Fugu-MT: arXiv paper translations

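This site translates into Japanese those arXiv papers that are 30 pages or shorter and are released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body is not under a CC license, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translations are produced with Fugu-Machine Translator.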

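The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility whatsoever for any consequences arising from the use of this site (including all information and translations).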

The following papers have a publication date of 20231123.

Title	Authors	Abstract	Publication date / translation date
# Adversarial defense based on distribution transfer ( http://arxiv.org/abs/2311.13841v1 ) License: see link	Jiahao Chen, Diqun Yan, Li Dong	The presence of adversarial examples poses a significant threat to deep learning models and their applications. Existing defense methods provide some resilience against adversarial examples, but they often suffer from decreased accuracy and generalization performance, which makes it difficult to achieve a trade-off between robustness and generalization. To address this, we interpret the adversarial example problem from the perspective of sample distribution and propose a defense method based on distribution shift that leverages the distribution transfer capability of a diffusion model. The core idea is to exploit the discrepancy between the normal and adversarial sample distributions to achieve adversarial defense with a pretrained diffusion model. Specifically, an adversarial sample undergoes a forward diffusion process that moves it away from the source distribution. A reverse process, guided by the output of the protected model (the victim model), then maps it back to the normal distribution. Experimental evaluations are conducted on the CIFAR10 and ImageNet30 datasets, with comparisons against adversarial training and input preprocessing methods.
For L∞-norm attacks with a perturbation of 8/255, accuracies of 78.1% and 83.5% are achieved on CIFAR10 and ImageNet30, respectively. For L2-norm attacks with a perturbation of 128/255, the accuracies are 74.3% and 82.5%. Additional experiments on perturbation amplitude, the number of diffusion iterations, and adaptive attacks further validate the effectiveness of the proposed method. The results show that the proposed distribution-based method effectively withstands adversarial examples even when the attacker has knowledge of the defense. It fills gaps left by traditional approaches, restores high-quality original samples, and performs well in both model robustness and generalization.	Translated: 2024-03-25 13:16:38 Published: 2023-11-23
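The forward diffusion step described in the entry above can be sketched with the standard DDPM closed form $q(x_t \mid x_0) = \mathcal{N}(\sqrt{\bar\alpha_t}\,x_0, (1-\bar\alpha_t)I)$. This is a minimal sketch under stated assumptions. It uses a linear beta schedule (1e-4 to 0.02 over 1000 steps), a flattened input vector, the step `t=100`, and the helper names shown. All of these are illustrative choices and are not taken from the paper. The victim-model-guided reverse process is not shown.

```python
import math
import random
from typing import List, Sequence

def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linearly spaced noise variances beta_1..beta_T (a common DDPM default, assumed here)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bar(betas: Sequence[float], t: int) -> float:
    """Cumulative product of (1 - beta_s) for s = 1..t; t = 0 returns 1.0 (clean input)."""
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return prod

def forward_diffuse(x0: Sequence[float], t: int, betas: Sequence[float], rng: random.Random) -> List[float]:
    """Sample x_t ~ N(sqrt(abar_t) * x_0, (1 - abar_t) * I) in a single step."""
    a = alpha_bar(betas, t)
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]

betas = linear_beta_schedule(1000)
rng = random.Random(0)
x_adv = [0.2, -0.5, 0.9]  # hypothetical flattened adversarial input in [-1, 1]
x_t = forward_diffuse(x_adv, t=100, betas=betas, rng=rng)
print(alpha_bar(betas, 100), x_t)
```

A larger `t` pushes the sample further from its source distribution, and a purification defense of this kind then has to undo that noise.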
# StockFormer: A Swing Trading Strategy Based on STL Decomposition and Self-Attention Networks ( http://arxiv.org/abs/2401.06139v1 ) License: see link	Bohan Ma, Yiheng Wang, Yuchao Lu, Tianzixuan Hu, Jinling Xu, Patrick Houlihan	Amid ongoing market recalibration and growing investor optimism, the U.S. stock market is experiencing a resurgence, which creates a need for sophisticated tools to protect and grow portfolios. To address this, we introduce "Stockformer," a deep learning framework optimized for swing trading. It features the TopKDropout method for improved stock selection. By integrating STL decomposition and self-attention networks, Stockformer uses complex S&P 500 data to refine stock return predictions. We split the data into a training and validation period (January 2021 to January 2023) and a test period (February to June 2023). During testing, Stockformer's predictions outperformed ten industry models on key predictive accuracy indicators (MAE, RMSE, MAPE) and reached an accuracy of 62.39% in detecting market trends. In our backtests, Stockformer's swing trading strategy yielded a cumulative return of 13.19% and an annualized return of 30.80%, significantly surpassing current state-of-the-art models.
Stockformer offers investors a potent tool for market forecasting in these volatile times. To advance the field and foster community collaboration, we have open-sourced Stockformer at https://github.com/Eric991005/Stockformer.	Translated: 2024-01-22 13:03:39 Published: 2023-11-23
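The three error metrics named in the abstract (MAE, RMSE, MAPE) have standard definitions, and a minimal sketch of them follows. The `direction_accuracy` helper is only an assumption about how a trend-detection accuracy might be computed: it counts the share of periods where the predicted and actual returns have the same sign. The paper's exact definition is not given here, and the sample returns are made up.

```python
import math
from typing import Sequence

def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; undefined if any true value is 0."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def direction_accuracy(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Hypothetical trend metric: fraction of periods whose predicted sign matches the actual sign."""
    hits = sum((t > 0) == (p > 0) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

# Made-up daily returns, for illustration only.
actual = [0.010, -0.020, 0.005, 0.015]
predicted = [0.012, -0.018, 0.004, 0.010]
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

MAPE is fragile for returns close to zero, which is one reason to report it alongside MAE and RMSE.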
# Federated Learning for Short Text Clustering ( http://arxiv.org/abs/2312.07556v1 ) License: see link	Mengling Hu, Chaochao Chen, Weiming Liu, Xinting Liao, and Xiaolin Zheng	Short text clustering has been widely studied for its value in mining insights from large numbers of short texts. In this paper, we focus on the federated short text clustering (FSTC) problem, i.e., clustering short texts that are distributed across different clients. This is a realistic problem under privacy requirements. Unlike the centralized short text clustering problem, in which short texts are stored on a central server, the FSTC problem has not yet been explored. To fill this gap, we propose a Federated Robust Short Text Clustering (FSTC) framework. FSTC includes two main modules: a robust short text clustering module and a federated cluster center aggregation module. The robust short text clustering module trains an effective short text clustering model on the local data of each client. We combine optimal transport, which generates pseudo-labels, with a Gaussian-uniform mixture model to ensure the reliability of the pseudo-supervised data.
The federated cluster center aggregation module exchanges knowledge across clients efficiently without sharing local raw data. In each communication round, the server aggregates the local cluster centers from the different clients and then sends the global centers back to all clients. Our empirical studies on three short text clustering datasets demonstrate that FSTC significantly outperforms federated short text clustering baselines.	Translated: 2024-01-15 14:36:32 Published: 2023-11-23
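The server-side step described above aggregates local cluster centers and broadcasts the global centers. A minimal sketch of that step follows. It makes two illustrative assumptions that the abstract does not state. First, cluster k on every client refers to the same cluster, so the centers are index-aligned. Second, each client's center is weighted by its local cluster size.

```python
from typing import List, Sequence

Vector = List[float]

def aggregate_centers(client_centers: Sequence[Sequence[Sequence[float]]],
                      client_counts: Sequence[Sequence[int]]) -> List[Vector]:
    """Size-weighted average of index-aligned cluster centers from all clients."""
    num_clusters = len(client_centers[0])
    dim = len(client_centers[0][0])
    global_centers: List[Vector] = []
    for k in range(num_clusters):
        total = sum(counts[k] for counts in client_counts)
        center = [0.0] * dim
        for centers, counts in zip(client_centers, client_counts):
            # Fall back to a plain mean if no client has samples in cluster k.
            weight = counts[k] / total if total else 1.0 / len(client_centers)
            for d in range(dim):
                center[d] += weight * centers[k][d]
        global_centers.append(center)
    return global_centers

# Two hypothetical clients, two clusters each, 2-D embeddings.
centers_a = [[0.0, 0.0], [10.0, 10.0]]
centers_b = [[2.0, 2.0], [20.0, 20.0]]
global_centers = aggregate_centers([centers_a, centers_b], [[1, 3], [3, 1]])
print(global_centers)
```

Only centers and counts leave each client here, which matches the abstract's point that raw local texts are never shared.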
# Rediscovering Ranganathan: A Prismatic View of His Life through the Knowledge Graph Spectrum ( http://arxiv.org/abs/2401.03343v1 ) License: see link	B. Dutta and S. Arzoo	The present study puts forward a novel biographical knowledge graph (KG) on Prof. S. R. Ranganathan, one of the pioneering figures in the Library and Information Science (LIS) domain. Most of the relevant facts about Ranganathan are scattered across a variety of resources (e.g., books, essays, journal articles, websites, and blogs), which offer information in a fragmented, piecemeal way. With this dedicated KG (henceforth RKG), we hope to provide a 360-degree view of his life and achievements. To the best of our knowledge, such a dedicated representation is unparalleled in its scope and coverage. It uses state-of-the-art technology so that anyone can openly access, use, reuse, and contribute to it. Inspired by Ranganathan's theories and ideas, the KG was developed using a "facet-based methodology" at two levels: identifying the vital biographical aspects and developing the ontological model. Finally, we call for a community-driven effort to enhance the KG. Such an effort would pay homage to the Father of Library Science on the hundredth anniversary of his revitalization of the LIS domain through his enduring participation.	Translated: 2024-01-15 09:19:36 Published: 2023-11-23
# Classifying cow stall numbers using YOLO ( http://arxiv.org/abs/2401.03340v1 ) License: see link	Dheeraj Vajjarapu	This paper introduces the CowStallNumbers dataset, designed to advance the field of cow stall number detection. It is a collection of images extracted from videos focusing on cow teats. The dataset comprises 1042 training images and 261 test images, with stall numbers ranging from 0 to 60. We fine-tuned a YOLO model and, to enhance the dataset, applied data augmentation techniques including random crop, center crop, and random rotation. The experiments show an accuracy of 95.4% in recognizing stall numbers.	Translated: 2024-01-15 09:19:13 Published: 2023-11-23
# Presentation Attack Detection using Convolutional Neural Networks and Local Binary Patterns ( http://arxiv.org/abs/2312.00041v1 ) License: see link	Justin Spencer, Deborah Lawrence, Prosenjit Chatterjee, Kaushik Roy, Albert Esterline, and Jung-Hee Kim	The use of biometrics to authenticate users and control access to secure areas has become extremely popular in recent years. Both governments and private corporations frequently use biometric access control systems. However, these systems may pose security risks when deployed without considering the possibility of biometric presentation attacks (also known as spoofing). Presentation attacks are a serious threat because they require little time, expense, or skill to carry out, yet they remain effective against many biometric systems in use today. This research compares three software-based methods for facial and iris presentation attack detection in images. The first method uses Inception-v3, a deep Convolutional Neural Network (CNN) pre-trained by Google for the ImageNet challenge, which is retrained for this problem. The second uses a shallow CNN based on a modified Spoofnet architecture, which is trained normally. The third is a texture-based method using Local Binary Patterns (LBP).
Two datasets are used. The ATVS-FIr dataset contains real and fake iris images. The CASIA Face Anti-Spoofing Dataset contains real images as well as warped-photo, cut-photo, and video-replay presentation attacks. We also present a third set of results based on cropped versions of the CASIA images.	Translated: 2023-12-11 03:58:41 Published: 2023-11-23
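The third, texture-based method in the entry above relies on Local Binary Patterns. A minimal sketch follows of the basic 3x3 LBP operator and a 256-bin histogram over a grayscale image stored as nested lists. Three details are common conventions assumed here rather than details taken from the paper: the clockwise neighbor order, the `>=` threshold, and skipping border pixels.

```python
from typing import List

Image = List[List[int]]

# Clockwise neighbor offsets (row, col), starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(img: Image, r: int, c: int) -> int:
    """8-bit LBP code: bit i is set when neighbor i is >= the center pixel."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img: Image) -> List[int]:
    """256-bin histogram of LBP codes over interior pixels (border pixels skipped)."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist

sample = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(lbp_code(sample, 1, 1), sum(lbp_histogram(sample)))
```

In a detection pipeline, the (usually normalized) histogram serves as the feature vector that a separate classifier then labels as real or spoofed.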
# Presentation Attack detection using Wavelet Transform and Deep Residual Neural Net ( http://arxiv.org/abs/2312.00040v1 ) License: see link	Prosenjit Chatterjee, Alex Yalchin, Joseph Shelton, Kaushik Roy, Xiaohong Yuan, and Kossi D. Edoh	Biometric authentication is becoming more prevalent in secure authentication systems. However, impostors can deceive biometric systems in several ways. Among impostor attacks, print attacks, mask attacks, and replay attacks fall under the presentation attack category. Biometric images, especially iris and face images, are vulnerable to various presentation attacks. This research applies deep learning approaches to mitigate presentation attacks in a biometric access control system. Our contribution is two-fold. First, we apply the wavelet transform to extract features from the biometric images. Second, we modify a deep residual neural network and apply it to the spoof datasets to detect presentation attacks. We apply the proposed approach to three biometric spoof datasets: ATVS, CASIA two-class, and CASIA cropped image sets. These datasets contain images captured in both controlled and uncontrolled environments, at different resolutions and sizes. We obtained a best accuracy of 93% on the ATVS iris dataset.
For the CASIA two-class and CASIA cropped datasets, we achieved test accuracies of 91% and 82%, respectively.	Translated: 2023-12-11 03:58:01 Published: 2023-11-23
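The abstract above does not say which wavelet is used for feature extraction. As an illustration only, a single-level orthonormal 2D Haar transform is sketched below. It splits an even-sized grayscale image into one approximation band and three detail bands, each half the size of the input. The band names `LL`, `LR`, `TB` and `DIAG` are labels chosen for this sketch.

```python
from typing import Dict, List

Matrix = List[List[float]]

def haar2d_level1(img: Matrix) -> Dict[str, Matrix]:
    """Single-level orthonormal 2D Haar transform of an image with even height and width."""
    height, width = len(img), len(img[0])
    if height % 2 or width % 2:
        raise ValueError("image height and width must be even")
    bands: Dict[str, Matrix] = {"LL": [], "LR": [], "TB": [], "DIAG": []}
    for r in range(0, height, 2):
        rows: Dict[str, List[float]] = {name: [] for name in bands}
        for c in range(0, width, 2):
            tl, tr = img[r][c], img[r][c + 1]
            bl, br = img[r + 1][c], img[r + 1][c + 1]
            rows["LL"].append((tl + tr + bl + br) / 2)    # approximation (scaled local mean)
            rows["LR"].append((tl - tr + bl - br) / 2)    # left column minus right column
            rows["TB"].append((tl + tr - bl - br) / 2)    # top row minus bottom row
            rows["DIAG"].append((tl - tr - bl + br) / 2)  # diagonal difference
        for name in bands:
            bands[name].append(rows[name])
    return bands

sample = [[1.0, 2.0, 0.0, 0.0], [3.0, 4.0, 0.0, 0.0]]
print(haar2d_level1(sample))
```

Because the transform is orthonormal, the total energy (sum of squares) of the four bands equals that of the input. The detail bands carry the edge and texture information that a downstream network could use.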
# Acoustic Cybersecurity: Exploiting Voice-Activated Systems ( http://arxiv.org/abs/2312.00039v1 ) License: see link	Forrest McKee and David Noever	In this study, we investigate the emerging threat of inaudible acoustic attacks targeting digital voice assistants. This is a critical concern given that such assistants are projected to outnumber the global population by 2024. Our research extends the feasibility of these attacks across platforms such as Amazon's Alexa, Android, iOS, and Cortana, revealing significant vulnerabilities in smart devices. We identify twelve attack vectors, including successful manipulation of smart home devices and automotive systems, potential breaches of military communication, and challenges to critical infrastructure security. We quantitatively show that attack success rates hover around 60%, and that devices can be activated remotely from over 100 feet away. These attacks also threaten critical infrastructure. Mitigating these risks requires multifaceted defensive strategies that combine acoustic shielding, advanced signal processing, machine learning, and robust user authentication.	Translated: 2023-12-11 03:57:32 Published: 2023-11-23
# Continuous Authentication Using Mouse Clickstream Data Analysis ( http://arxiv.org/abs/2312.00802v1 ) License: see link	Sultan Almalki, Prosenjit Chatterjee, and Kaushik Roy	Biometrics are used to authenticate individuals based on physiological or behavioral traits. Mouse dynamics is an example of a behavioral biometric that can be used for continuous authentication as protection against security breaches. Recent research on mouse dynamics has shown promising results in identifying users, but it has not yet reached an acceptable level of accuracy. In this paper, we empirically evaluate different classification techniques on a mouse dynamics dataset, the Balabit Mouse Challenge dataset. User identification is carried out using three mouse actions: mouse move, point and click, and drag and drop. Verification and authentication are performed with three machine-learning classifiers: Decision Tree, K-Nearest Neighbors, and Random Forest. The results show that all three classifiers can distinguish a genuine user from an impostor with a relatively high degree of accuracy. In verification mode, all the classifiers achieve a perfect accuracy of 100%.
In authentication mode, all three classifiers achieved their highest accuracy (ACC) and Area Under the Curve (AUC) in scenario B using the point-and-click action data: Decision Tree (ACC 87.6%, AUC 90.3%), K-Nearest Neighbors (ACC 99.3%, AUC 99.9%), and Random Forest (ACC 89.9%, AUC 92.5%).	Translated: 2023-12-11 03:46:06 Published: 2023-11-23
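Of the three classifiers compared above, k-nearest neighbors is the simplest to illustrate. The minimal sketch below assumes each mouse action has already been turned into a numeric feature vector. The two features, the sample values, the labels and `k=3` are hypothetical and are not taken from the Balabit dataset or the paper.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]

def knn_predict(train: List[Sample], query: Sequence[float], k: int = 3) -> str:
    """Label a feature vector by majority vote among its k nearest training samples."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical (mean cursor speed, click duration) features per point-and-click action.
train = [
    ((1.0, 1.0), "genuine"), ((1.2, 0.9), "genuine"), ((0.9, 1.1), "genuine"),
    ((5.0, 5.0), "impostor"), ((5.2, 4.8), "impostor"), ((4.9, 5.1), "impostor"),
]
print(knn_predict(train, (1.1, 1.0)), knn_predict(train, (5.0, 5.1)))
```

In practice, the features should be standardized before computing Euclidean distances, since raw mouse features live on very different scales.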
# Cluster trajectory of SOFA score in predicting mortality in sepsis ( http://arxiv.org/abs/2311.17066v1 ) License: see link	Yuhe Ke, Matilda Swee Sun Tang, Celestine Jia Ling Loh, Hairil Rizal Abdullah, Nicholas Brian Shannon	Objective: Sepsis is a life-threatening condition. The Sequential Organ Failure Assessment (SOFA) score is commonly used to assess organ dysfunction and predict ICU mortality. However, it is usually taken as a static measurement and fails to capture dynamic changes. This study investigates the relationship between dynamic changes in SOFA scores over the first 72 hours of ICU admission and patient outcomes. Design, setting, and participants: We analysed 3,253 patients in the Medical Information Mart for Intensive Care IV database who met the following conditions: they met the Sepsis-3 criteria, were admitted from the emergency department, stayed at least 72 hours in the ICU, and had full-active resuscitation status. Group-based trajectory modelling with dynamic time warping and k-means clustering identified distinct trajectory patterns in dynamic SOFA scores, which were subsequently compared using Python. Main outcome measures: The collected outcomes were hospital and ICU mortality, length of stay in hospital and ICU, and readmission during the hospital stay. Time to discharge from the ICU to the ward was recorded, with cut-offs at 7 and 14 days.
Results: Four clusters were identified: A (consistently low SOFA scores), B (a rapid increase followed by a decline in SOFA scores), C (higher baseline scores with gradual improvement), and D (persistently elevated scores). Cluster D had the longest ICU and hospital stays and the highest ICU and hospital mortality. ICU discharge rates were similar for Clusters A and B. Cluster C initially had comparable rates but transitioned to the ward more slowly. Conclusion: Monitoring dynamic changes in the SOFA score is valuable for assessing sepsis severity and treatment responsiveness.	Translated: 2023-12-03 13:08:34 Published: 2023-11-23
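The trajectory clustering in the entry above pairs dynamic time warping with k-means. Below is a minimal sketch of two pieces: the DTW distance between two SOFA-score sequences, and the k-means assignment step that places each trajectory in the cluster of its nearest centroid. The centroid-update step (often DTW barycenter averaging) is omitted. The example scores, time points and centroids are invented for illustration.

```python
from typing import List, Sequence

def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def assign_clusters(trajectories: Sequence[Sequence[float]],
                    centroids: Sequence[Sequence[float]]) -> List[int]:
    """k-means assignment step: index of the DTW-nearest centroid for each trajectory."""
    return [min(range(len(centroids)), key=lambda k: dtw_distance(t, centroids[k]))
            for t in trajectories]

# Hypothetical SOFA scores at 24 h, 48 h and 72 h after ICU admission.
patients = [[2, 2, 3], [9, 10, 10], [6, 4, 3]]
centroids = [[2, 2, 2], [10, 10, 10], [7, 5, 3]]
print(assign_clusters(patients, centroids))
```

DTW lets two patients with the same shape of deterioration and recovery fall into one cluster even when the shape is shifted in time, which plain Euclidean distance would penalize.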
# Many-body theory calculations of positronic-bonded molecular dianions ( http://arxiv.org/abs/2311.16318v1 ) License: see link	J. P. Cassidy, J. Hofierka, B. Cunningham and D. G. Green	The energetic stability of positron-dianion systems [A$^-$; $e^+$; A$^-$] is studied via many-body theory, where A$^-$ includes H$^{-}$, F$^{-}$, Cl$^{-}$, and the molecular anions (CN)$^{-}$ and (NCO)$^{-}$. Specifically, the energy of the system as a function of ionic separation is determined by solving the Dyson equation for the positron in the field of the two anions. The calculation uses a positron-anion self-energy constructed as in [J. Hofierka, B. Cunningham, C. M. Rawlins, C. H. Patterson and D. G. Green, Nature 606, 688 (2022)], which accounts for correlations including polarization, screening, and virtual-positronium formation. Calculations are performed for a positron interacting with H$_{2}^{2-}$, F$_{2}^{2-}$, and Cl$_{2}^{2-}$, and are in good agreement with previous theory.
In particular, we confirm the presence of two minima in the potential energy of the [H$^-$; $e^+$; H$^-$] system as a function of ionic separation. One is a positronically bonded [H$^-$; $e^+$; H$^-$] local minimum at ionic separations $r\sim3.4$ Å. The other is a global minimum at smaller ionic separations $r\lesssim1.6$ Å, which makes the system unstable overall with respect to dissociation into an H$_2$ molecule and a positronium negative ion, Ps$^-$. The first predictions are made for positronic bonding in dianions consisting of molecular anionic fragments, specifically (CN)$_{2}^{2-}$ and (NCO)$_{2}^{2-}$. In all cases, we find that the molecules formed by the creation of a positronic bond are stable relative to dissociation into A$^-$ and $e^+$A$^-$ (a positron bound to a single anion). Their bond energies are on the order of 1 eV and their bond lengths on the order of several ångströms.	Translated: 2023-12-03 13:07:35 Published: 2023-11-23
# Shortcut Bias Mitigation via Ensemble Diversity Using Diffusion Probabilistic Models ( http://arxiv.org/abs/2311.16176v1 ) License: see link	Luca Scimeca, Alexander Rubinstein, Damien Teney, Seong Joon Oh, Armand Mihai Nicolicioiu, Yoshua Bengio	Spurious correlations in the data, where multiple cues are predictive of the target labels, often lead to a phenomenon known as simplicity bias. Under simplicity bias, a model relies on erroneous, easy-to-learn cues while ignoring reliable ones. In this work, we propose an ensemble diversification framework that exploits Diffusion Probabilistic Models (DPMs) to mitigate shortcut bias. We show that at particular training intervals, DPMs can generate images with novel feature combinations, even when trained on images displaying correlated input features. We leverage this crucial property to generate synthetic counterfactuals that increase model diversity via ensemble disagreement. We show that DPM-guided diversification is sufficient to remove dependence on primary shortcut cues without the need for additional supervised signals. We further quantify its efficacy empirically on several diversification objectives. Finally, we show improved generalization and diversification performance on par with prior work that relies on auxiliary data collection.	Translated: 2023-12-03 13:06:29 Published: 2023-11-23
# Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision ( http://arxiv.org/abs/2311.14758v1 ) License: see link	Yu Yi, Xue Yang, Qingyun Li, Feipeng Da, Junchi Yan, Jifeng Dai, Yu Qiao	With the rapidly increasing demand for oriented object detection (OOD), recent research on weakly-supervised detectors has attracted growing attention. These detectors learn rotated boxes (RBoxes) from horizontal boxes (HBoxes). In this paper, we explore a more challenging yet label-efficient setting, namely single point-supervised OOD, and present an approach called Point2RBox. Specifically, we propose to leverage two principles. 1) Synthetic pattern knowledge combination: we sample around each labelled point on the image and transfer the object feature to synthetic visual patterns with known bounding boxes, which provide knowledge for box regression. 2) Transform self-supervision: given a transformed input image (e.g., scaled or rotated), the output RBoxes are trained to follow the same transformation, so that the network can perceive the relative size and rotation between objects. The detector is further enhanced by several techniques that cope with peripheral issues, e.g., the anchor/layer assignment, since the object size is not available in our point-supervision setting. To the best of our knowledge, Point2RBox is the first end-to-end solution for point-supervised OOD. In particular, our method uses a lightweight paradigm yet achieves competitive performance among point-supervised alternatives: 41.05%/27.62%/80.01% on the DOTA/DIOR/HRSC datasets.	Translated: 2023-11-30 09:52:37 Published: 2023-11-23
# PointOBB: Learning Oriented Object Detection via Single Point Supervision ( http://arxiv.org/abs/2311.14757v1 ) License: see link	Junwei Luo, Xue Yang, Yi Yu, Qingyun Li, Junchi Yan, Yansheng Li	Single point-supervised object detection is gaining attention due to its cost-effectiveness. However, existing approaches focus on generating horizontal bounding boxes (HBBs) while ignoring the oriented bounding boxes (OBBs) commonly used for objects in aerial images. This paper proposes PointOBB, the first single point-based OBB generation method for oriented object detection. PointOBB works through the collaborative use of three distinct views: an original view, a resized view, and a rotated/flipped (rot/flp) view. On top of the original view, we use the resized and rot/flp views to build a scale augmentation module and an angle acquisition module, respectively. In the former module, a Scale-Sensitive Consistency (SSC) loss is designed to enhance the deep network's ability to perceive object scale. For accurate object angle prediction, the latter module uses self-supervised learning to predict angles. It is combined with a scale-guided Dense-to-Sparse (DS) matching strategy that aggregates the dense angle predictions corresponding to sparse objects.
During training, a progressive multi-view switching strategy switches between the resized and rot/flp views to achieve coupled optimization of scale and angle. Experimental results on the DIOR-R and DOTA-v1.0 datasets demonstrate that PointOBB achieves promising performance and significantly outperforms potential point-supervised baselines.	Translated: 2023-11-30 09:52:09 Published: 2023-11-23
# Task-Distributionally Robust Data-Free Meta-Learning ( http://arxiv.org/abs/2311.14756v1 ) License: see link	Zixuan Hu, Li Shen, Zhenyi Wang, Yongxian Wei, Baoyuan Wu, Chun Yuan, Dacheng Tao	Data-Free Meta-Learning (DFML) aims to learn new tasks efficiently by leveraging multiple pre-trained models without requiring their original training data. Existing inversion-based DFML methods construct pseudo tasks from a learnable dataset that is inversely generated from the pre-trained model pool. For the first time, we reveal two major challenges that hinder their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC). TDS leads to a biased meta-learner because the task distribution is skewed towards newly generated tasks. TDC occurs when untrusted models, characterized by misleading labels or poor quality, pollute the task distribution. To tackle these issues, we introduce a robust DFML framework that ensures task-distributional robustness. We propose to meta-learn from a pseudo task distribution that is diversified through task interpolation within a compact task-memory buffer.
This approach reduces the meta-learner's overreliance on newly generated tasks by maintaining consistent performance across a broader range of interpolated memory tasks, thus ensuring its generalization for unseen tasks. Additionally, our framework seamlessly incorporates an automated model selection mechanism into the meta-training phase, parameterizing each model's reliability as a learnable weight. This is optimized with a policy gradient algorithm inspired by reinforcement learning, effectively addressing the non-differentiable challenge posed by model selection. Comprehensive experiments across various datasets demonstrate the framework's effectiveness in mitigating TDS and TDC, underscoring its potential to improve DFML in real-world scenarios.	翻訳日:2023-11-30 09:51:41 公開日:2023-11-23
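Task interpolation within a compact task-memory buffer can be illustrated with a minimal sketch. The abstract does not specify the interpolation rule, so this code assumes a mixup-style convex combination of two stored pseudo tasks and a FIFO buffer of fixed capacity. `TaskMemoryBuffer`, `interpolate` and `lam` are hypothetical names, not the authors' code.

```python
import random
from collections import deque


class TaskMemoryBuffer:
    """Compact FIFO buffer of pseudo tasks (hypothetical structure, not the authors' code)."""

    def __init__(self, capacity):
        # deque with maxlen silently evicts the oldest task once the buffer is full
        self.tasks = deque(maxlen=capacity)

    def add(self, features, labels):
        # features: list of feature vectors; labels: list of one-hot label vectors
        self.tasks.append((features, labels))

    def interpolate(self, lam, rng=random):
        """Mixup-style interpolation of two randomly chosen stored tasks (assumed rule)."""
        (xa, ya), (xb, yb) = rng.sample(list(self.tasks), 2)

        def mix(u, v):
            return [lam * a + (1 - lam) * b for a, b in zip(u, v)]

        features = [mix(fa, fb) for fa, fb in zip(xa, xb)]
        labels = [mix(la, lb) for la, lb in zip(ya, yb)]
        return features, labels


buf = TaskMemoryBuffer(capacity=2)
buf.add([[0.0, 0.0]], [[1.0, 0.0]])
buf.add([[2.0, 4.0]], [[0.0, 1.0]])
feats, labs = buf.interpolate(lam=0.5)
print(feats, labs)  # [[1.0, 2.0]] [[0.5, 0.5]]
```

Soft labels are interpolated together with the features, so each interpolated task remains a consistent supervised problem.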
# ExCeL: 分布外検出を強化するための極値ロジット情報と集合ロジット情報の統合 ExCeL : Combined Extreme and Collective Logit Information for Enhancing Out-of-Distribution Detection ( http://arxiv.org/abs/2311.14754v1 ) ライセンス: Link先を確認	Naveen Karunanayake, Suranga Seneviratne, Sanjay Chawla	(参考訳) ディープラーニングモデルは、分布外(OOD)データの予測において過信を示すことが多く、予測の信頼性を確保する上でのOOD検出の重要性を浮き彫りにしている。様々なOOD検出手法の中で、ポストホック検出器は、主に使いやすさと実装の容易さから、大きな人気を集めている。しかしながら、ほとんどのポストホックなOOD検出器の有効性は、最大ロジットのような極端な情報のみに依存するか、出力層に埋め込まれた集合情報(すなわち、クラスやトレーニングサンプルにまたがる情報)のみに依存しているため、制限されている。本稿では,OOD検出の精度を高めるために,出力層内の極端情報と集合情報を組み合わせたExCeLを提案する。最上位の予測クラスのロジットを極端な情報(すなわち最大ロジット)として活用する一方、集合情報は、様々なトレーニングサンプルにわたって他のクラスが後続の順位に現れる可能性を評価する新しいアプローチによって導かれる。我々の考えは、分布内(ID)データでは、予測クラス以降のクラスの順位がOODデータよりも決定論的である、という観察に動機づけられている。 CIFAR100とImageNet-200データセットで実施された実験により、近OODと遠OODの総合的な性能(AUROCとFPR95)を考慮した場合、ExCeLは既存の21のポストホックベースラインと比較して、一貫して上位5手法に入ることが示された。さらにExCeLは、一方のデータセットでは最良だが他方では性能が低下する他のベースラインとは異なり、両方のデータセットで最高の総合性能を示している。 Deep learning models often exhibit overconfidence in predicting out-of-distribution (OOD) data, underscoring the crucial role of OOD detection in ensuring reliability in predictions. Among various OOD detection approaches, post-hoc detectors have gained significant popularity, primarily due to their ease of use and implementation. However, the effectiveness of most post-hoc OOD detectors has been constrained as they rely solely on either extreme information, such as the maximum logit, or collective information (i.e., information spanned across classes or training samples) embedded within the output layer. In this paper, we propose ExCeL, which combines both extreme and collective information within the output layer for enhanced accuracy in OOD detection. 
We leverage the logit of the top predicted class as the extreme information (i.e., the maximum logit), while the collective information is derived through a novel approach that assesses the likelihood of other classes appearing in subsequent ranks across various training samples. Our idea is motivated by the observation that, for in-distribution (ID) data, the ranking of classes beyond the predicted class is more deterministic than in OOD data. Experiments conducted on the CIFAR100 and ImageNet-200 datasets demonstrate that ExCeL is consistently among the five top-performing methods when compared with twenty-one existing post-hoc baselines, considering the joint performance on near-OOD and far-OOD (i.e., in terms of AUROC and FPR95). Furthermore, ExCeL shows the best overall performance across both datasets, unlike other baselines that work best on one dataset but show a performance drop on the other.	翻訳日:2023-11-30 09:51:14 公開日:2023-11-23
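The combination of extreme and collective logit information can be sketched as follows. The abstract does not give ExCeL's exact formula, so this is an assumption. Rank statistics are estimated from training logits (how often class k appears at rank r when class c is top-1). At test time, the max logit is added to `alpha` times the mean frequency of the observed rank pattern. `fit_rank_stats`, `combined_score`, `depth` and `alpha` are hypothetical names. A higher score means "more in-distribution".

```python
from collections import defaultdict


def ranking(logits):
    """Class indices sorted by descending logit."""
    return sorted(range(len(logits)), key=lambda k: -logits[k])


def fit_rank_stats(train_logits, depth):
    """Count how often class k sits at rank r when class c is top-1 (training data)."""
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    totals = defaultdict(int)
    for logits in train_logits:
        order = ranking(logits)
        top = order[0]
        totals[top] += 1
        for r in range(1, depth + 1):
            counts[top][r][order[r]] += 1
    return counts, totals


def combined_score(logits, counts, totals, depth, alpha=1.0):
    """Max logit (extreme) + alpha * mean rank-agreement frequency (collective)."""
    order = ranking(logits)
    top = order[0]
    n = totals.get(top, 0)
    collective = 0.0
    if n:
        collective = sum(counts[top][r][order[r]] / n for r in range(1, depth + 1)) / depth
    return logits[top] + alpha * collective


train = [[3.0, 2.0, 1.0], [2.5, 1.5, 0.5]]
counts, totals = fit_rank_stats(train, depth=2)
print(combined_score([3.0, 2.0, 1.0], counts, totals, depth=2))  # 4.0 (familiar ranking)
print(combined_score([3.0, 1.0, 2.0], counts, totals, depth=2))  # 3.0 (unfamiliar ranking)
```

Both test inputs share the same maximum logit. Only the collective term separates them, which mirrors the observation that ID rankings beyond the top class are more deterministic.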
# 一般化ゼロショット学習のための属性認識型表現修正 Attribute-Aware Representation Rectification for Generalized Zero-Shot Learning ( http://arxiv.org/abs/2311.14750v1 ) ライセンス: Link先を確認	Zhijie Rao, Jingcai Guo, Xiaocheng Lu, Qihua Zhou, Jie Zhang, Kang Wei, Chenxin Li, Song Guo	(参考訳) 一般化ゼロショット学習(GZSL)は、一連の偏りのない視覚・意味マッピングを設計することで顕著な性能を上げており、その精度は既知クラスと未知クラスの両方から抽出された視覚特徴の完全性に大きく依存している。しかしながら、GZSLにおける一般的な慣例として、事前訓練された特徴抽出器は、下流のタスク/データセットのドメイン固有の特性を捉えることが難しく、きめ細かい識別特徴を提供できないことがある(すなわちドメインバイアス)。これは特に未知クラスにおいて、全体的な認識性能を妨げる。最近の研究では、特徴抽出器を微調整することで部分的にこの問題に対処しているが、必然的に破滅的忘却と過学習の問題を引き起こす可能性がある。本稿では,元の有用な特徴を保持しつつ新しい特徴を学習するよう特徴抽出器を適応的に修正する,GZSLのための簡易かつ効果的な属性認識型表現修正フレームワーク $\mathbf{(AR)^{2}}$ を提案する。具体的には,UAD (Unseen-Aware Distillation) とAGL (Attribute-Guided Learning) の2つの要素から構成される。トレーニング中、UADは、既知クラスと未知クラスの両方で共有される属性テキストの事前知識を注意機構とともに利用して、未知クラスに敏感な視覚的特徴を狙いを定めて検出・維持する。同時に、AGLは、属性誘導表現学習により、モデルが有用な特徴に注目し、既知クラスのノイズ要素への適合を抑えるよう導くことを目的としている。各種ベンチマークデータセットの大規模な実験により,本手法の有効性が示された。 Generalized Zero-shot Learning (GZSL) has yielded remarkable performance by designing a series of unbiased visual-semantics mappings, wherein the precision relies heavily on the completeness of the visual features extracted from both seen and unseen classes. However, as is common practice in GZSL, the pre-trained feature extractor may struggle to capture domain-specific traits of the downstream tasks/datasets and thus fail to provide fine-grained discriminative features (i.e., domain bias), which hinders the overall recognition performance, especially for unseen classes. Recent studies partially address this issue by fine-tuning feature extractors, but this may inevitably incur catastrophic forgetting and overfitting. In this paper, we propose a simple yet effective Attribute-Aware Representation Rectification framework for GZSL, dubbed $\mathbf{(AR)^{2}}$, to adaptively rectify the feature extractor to learn novel features while keeping original valuable features. 
Specifically, our method consists of two key components, namely Unseen-Aware Distillation (UAD) and Attribute-Guided Learning (AGL). During training, UAD uses attention mechanisms to exploit the prior knowledge of attribute texts shared by both seen and unseen classes, detecting and maintaining unseen-class-sensitive visual features in a targeted manner; meanwhile, AGL uses attribute-guided representation learning to steer the model toward valuable features and keep it from fitting noisy elements in the seen classes. Extensive experiments on various benchmark datasets demonstrate the effectiveness of our method.	翻訳日:2023-11-30 09:50:37 公開日:2023-11-23
# 段階的な言語ベース観測による合成ゼロショット学習 Compositional Zero-shot Learning via Progressive Language-based Observations ( http://arxiv.org/abs/2311.14749v1 ) ライセンス: Link先を確認	Lin Li, Guikun Chen, Jun Xiao, Long Chen	(参考訳) Compositional zero-shot learningは、トレーニング中に既知のプリミティブ(状態とオブジェクト)を活用することで、未知の状態-オブジェクトの組み合わせを認識することを目的としている。しかしながら、プリミティブ間の相互作用を効果的にモデル化し、新しい構成に知識を一般化することは、長年の課題である。主な要因は2つあり、オブジェクト条件付き分散と状態条件付き分散、すなわち、状態(またはオブジェクト)の見た目は、異なるオブジェクト(または状態)と組み合わせると著しく変化する。例えば、状態"old"は、"car"のヴィンテージデザインや"cat"の高齢を表すことができる。本稿では,事前観測されたプリミティブに基づいて合成カテゴリを予測することで,これらの分散を緩和できると主張する。そこで本研究では,プリミティブのより良い観測順序を動的に決定できるProgressive Language-based Observations (PLO)を提案する。これらの観察は、モデルがステップバイステップで画像の内容を理解することを可能にする一連の概念または言語から構成される。具体的には、PLOは事前に訓練された視覚言語モデル(VLM)を採用し、モデルに観察能力を付与する。さらに2つの変種を考案する: 1) PLO-VLM: 予備観測分類器が2つのプリミティブの観測順序を動的に決定する2段階法。 2) PLO-LLM: 大規模言語モデル(LLM)を用いて, ステップバイステップ観測のための組み合わせ固有のプロンプトを作成する多段階スキーム。 3つの挑戦的なデータセットに対する大規模なアブレーション実験は、最先端の手法と比較してPLOの優位性を示し、合成認識におけるその能力を確認している。 Compositional zero-shot learning aims to recognize unseen state-object compositions by leveraging known primitives (state and object) during training. However, effectively modeling interactions between primitives and generalizing knowledge to novel compositions remains a perennial challenge. There are two key factors: object-conditioned and state-conditioned variance, i.e., the appearance of states (or objects) can vary significantly when combined with different objects (or states). For instance, the state "old" can signify a vintage design for a "car" or an advanced age for a "cat". In this paper, we argue that these variances can be mitigated by predicting composition categories based on pre-observed primitives. To this end, we propose Progressive Language-based Observations (PLO), which can dynamically determine a better observation order of primitives. These observations comprise a series of concepts or languages that allow the model to understand image content in a step-by-step manner. 
Specifically, PLO adopts pre-trained vision-language models (VLMs) to empower the model with observation capabilities. We further devise two variants: 1) PLO-VLM: a two-step method, where a pre-observing classifier dynamically determines the observation order of two primitives. 2) PLO-LLM: a multi-step scheme, which utilizes large language models (LLMs) to craft composition-specific prompts for step-by-step observing. Extensive ablations on three challenging datasets demonstrate the superiority of PLO compared with state-of-the-art methods, affirming its abilities in compositional recognition.	翻訳日:2023-11-30 09:50:04 公開日:2023-11-23
# HOMOE: Hopfield NetworkとSoft Mixture of Expertsによるゼロショット学習のためのメモリベースかつ合成考慮型フレームワーク HOMOE: A Memory-Based and Composition-Aware Framework for Zero-Shot Learning with Hopfield Network and Soft Mixture of Experts ( http://arxiv.org/abs/2311.14747v1 ) ライセンス: Link先を確認	Do Huu Dat, Po Yuan Mao, Tien Hoang Nguyen, Wray Buntine, Mohammed Bennamoun	(参考訳) 合成ゼロショット学習(CZSL)は、合成思考を方法論に組み込むことで、従来のゼロショット学習の制約を克服することを目的として、機械学習において不可欠なパラダイムとして登場した。従来のゼロショット学習は、事前定義されたクラス埋め込みに依存するため、既知クラスと未知クラスの見慣れない組み合わせを管理するのが困難である。対照的に、合成ゼロショット学習はクラス間の固有の階層と構造的接続を使用し、属性やコンポーネント、その他の意味要素を組み合わせて新しいクラス表現を作成する。本稿では,未知オブジェクトの組み合わせを分類するため,Modern Hopfield NetworkとMixture of Experts(HOMOE)を初めて組み合わせた新しいフレームワークを提案する。具体的には、Modern Hopfield Networkは、ラベルのプロトタイプを格納し、入力画像に関連するラベルを識別するメモリを作成する。その後、Mixture of Expertsモデルが、画像と適合するプロトタイプを統合して、最終的な組み合わせ分類を生成する。提案手法は,MIT-StatesやUT-Zapposなど,いくつかのベンチマークにおいてSOTA性能を実現する。また,各コンポーネントが一般化にどのように寄与するかについても検討した。 Compositional Zero-Shot Learning (CZSL) has emerged as an essential paradigm in machine learning, aiming to overcome the constraints of traditional zero-shot learning by incorporating compositional thinking into its methodology. Conventional zero-shot learning has difficulty managing unfamiliar combinations of seen and unseen classes because it depends on pre-defined class embeddings. In contrast, Compositional Zero-Shot Learning uses the inherent hierarchies and structural connections among classes, creating new class representations by combining attributes, components, or other semantic elements. In our paper, we propose a novel framework that for the first time combines the Modern Hopfield Network with a Mixture of Experts (HOMOE) to classify the compositions of previously unseen objects. Specifically, the Modern Hopfield Network creates a memory that stores label prototypes and identifies relevant labels for a given input image. Following this, the Mixture of Experts model integrates the image with the fitting prototype to produce the final composition classification. 
Our approach achieves SOTA performance on several benchmarks, including MIT-States and UT-Zappos. We also examine how each component contributes to improved generalization.	翻訳日:2023-11-30 09:49:36 公開日:2023-11-23
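The retrieval step of a Modern Hopfield Network, which the HOMOE abstract uses to look up label prototypes for an input image, follows the standard modern-Hopfield update ξ_new = X softmax(β Xᵀ ξ). Here the columns of X are the stored patterns and ξ is the query. The sketch below assumes plain Python lists, a hand-picked inverse temperature `beta`, and toy two-dimensional "prototypes". It does not reproduce the paper's embeddings or its Mixture of Experts stage.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def hopfield_retrieve(prototypes, query, beta=8.0):
    """One modern-Hopfield update: xi_new = X softmax(beta * X^T xi)."""
    scores = [beta * sum(p_d * q_d for p_d, q_d in zip(p, query)) for p in prototypes]
    weights = softmax(scores)
    retrieved = [sum(w * p[d] for w, p in zip(weights, prototypes)) for d in range(len(query))]
    return retrieved, weights


protos = [[1.0, 0.0], [0.0, 1.0]]  # two hypothetical label prototypes
vec, w = hopfield_retrieve(protos, [0.9, 0.1])
print(max(range(len(w)), key=w.__getitem__))  # 0
```

A larger `beta` makes retrieval sharper, approaching a hard nearest-prototype lookup. A smaller value blends several prototypes.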
# オールインワン: RGB、RGB-D、RGB-T顕著性物体検出 All in One: RGB, RGB-D, and RGB-T Salient Object Detection ( http://arxiv.org/abs/2311.14746v1 ) ライセンス: Link先を確認	Xingzhao Jia, Zhongqiu Zhao, Changlei Dongye, and Zhao Zhang	(参考訳) 顕著性物体検出(SOD)は、画像内の最も注意を引く物体を特定することを目的としている。検出されるデータの種類によって、SODはRGB、RGB-D(Depth)、RGB-T(Thermal)、ライトフィールドSODなど様々な形態に分類できる。これまでの研究は、個々のデータ型に対する顕著性検出に重点を置いてきた。 RGB-D SODモデルにRGB-Tデータを処理させると、性能は低下する。 3種類のデータ(RGB, RGB-D, RGB-T)の顕著性物体検出タスクに対して, 統一解を提供する革新的なモデルフレームワークを提案する。 3種類のデータは、同じ重みパラメータを持つ1つのモデル(オールインワン)で処理できる。本フレームワークでは,3種類のデータを単一入力バッチ内で順序的に結合し,トランスフォーマネットワークを用いて特徴を抽出する。本フレームワークに基づき, 高速でRGB, RGB-D, RGB-Tデータ(RGBデータで780FPS, RGB-DまたはRGB-Tデータで485FPS)を検出可能な,効率的な軽量SODモデルAiOSODを提案する。 625万のパラメータだけで、AiOSODはRGB、RGB-D、RGB-Tデータセット上で優れたパフォーマンスを達成する。 Salient object detection (SOD) aims to identify the most attractive objects within an image. Depending on the type of data being detected, SOD can be categorized into various forms, including RGB, RGB-D (Depth), RGB-T (Thermal) and light field SOD. Previous research has focused on saliency detection for individual data types. When an RGB-D SOD model is forced to process RGB-T data, it performs poorly. We propose an innovative model framework that provides a unified solution for the salient object detection task of three types of data (RGB, RGB-D, and RGB-T). The three types of data can be handled in one model (all in one) with the same weight parameters. In this framework, the three types of data are concatenated in an ordered manner within a single input batch, and features are extracted using a transformer network. Based on this framework, we propose an efficient lightweight SOD model, namely AiOSOD, which can detect any RGB, RGB-D, and RGB-T data with high speed (780FPS for RGB data, 485FPS for RGB-D or RGB-T data). Notably, with only 6.25M parameters, AiOSOD achieves excellent performance on RGB, RGB-D, and RGB-T datasets.	翻訳日:2023-11-30 09:49:16 公開日:2023-11-23
# 第2回海洋コンピュータビジョンワークショップ(MaCVi)2024 The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 ( http://arxiv.org/abs/2311.14762v1 ) ライセンス: Link先を確認	Benjamin Kiefer, Lojze Žust, Matej Kristan, Janez Perš, Matija Teršek, Arnold Wiliem, Martin Messmer, Cheng-Yen Yang, Hsiang-Wei Huang, Zhongyu Jiang, Heng-Cheng Kuo, Jie Mei, Jenq-Neng Hwang, Daniel Stadler, Lars Sommer, Kaer Huang, Aiguo Zheng, Weitu Chong, Kanokphan Lertniphonphan, Jun Xie, Feng Chen, Jian Li, Zhepeng Wang, Luca Zedda, Andrea Loddo, Cecilia Di Ruberto, Tuan-Anh Vu, Hai Nguyen-Truong, Tan-Sang Ha, Quan-Dung Pham, Sai-Kit Yeung, Yuan Feng, Nguyen Thanh Thien, Lixin Tian, Sheng-Yao Kuan, Yuan-Hao Ho, Angel Bueno Rodriguez, Borja Carrillo-Perez, Alexander Klein, Antje Alex, Yannik Steiniger, Felix Sattler, Edgardo Solano-Carrillo, Matej Fabijanić, Magdalena Šumunec, Nadir Kapetanović, Andreas Michel, Wolfgang Gross, Martin Weinmann	(参考訳) 第2回海洋コンピュータビジョンワークショップ (MaCVi) 2024は、無人航空機 (UAV) と無人水上艇 (USV) による海上コンピュータビジョンを扱っている。 3つの課題カテゴリが設けられている。 (i)再識別を伴うUAVによる海上物体追跡 (ii)USVによる海上障害物のセグメンテーション・検出 (iii)USVによる海上ボート追跡。 USVベースの海上障害物のセグメンテーションと検出は、実世界の組み込みデバイス上での効率のよい推論に取り組む新しい組込みチャレンジを含む3つのサブチャレンジから構成される。本報告では,課題から得られた知見を包括的に概観する。統計的および定性的な分析を行い、195件以上の応募からトレンドを評価する。すべてのデータセット、評価コード、およびリーダボードがhttps://macvi.org/workshop/macvi24で公開されている。 The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV). Three challenge categories are considered: (i) UAV-based Maritime Object Tracking with Re-identification, (ii) USV-based Maritime Obstacle Segmentation and Detection, (iii) USV-based Maritime Boat Tracking. The USV-based Maritime Obstacle Segmentation and Detection features three sub-challenges, including a new embedded challenge addressing efficient inference on real-world embedded devices. This report offers a comprehensive overview of the findings from the challenges. 
We provide both statistical and qualitative analyses, evaluating trends from over 195 submissions. All datasets, evaluation code, and the leaderboard are available to the public at https://macvi.org/workshop/macvi24.	翻訳日:2023-11-30 09:35:57 公開日:2023-11-23
# SinSR:1ステップで拡散に基づく超解像 SinSR: Diffusion-Based Image Super-Resolution in a Single Step ( http://arxiv.org/abs/2311.14760v1 ) ライセンス: Link先を確認	Yufei Wang, Wenhan Yang, Xinyuan Chen, Yaohui Wang, Lanqing Guo, Lap-Pui Chau, Ziwei Liu, Yu Qiao, Alex C. Kot, Bihan Wen	(参考訳) 拡散モデルに基づく超解像(SR)法は有望な結果を示すが、その実用性は相当な数の推論ステップによって妨げられる。最近の方法では初期状態での劣化画像を利用してマルコフ連鎖を短縮する。しかしながら、これらの解は分解過程の正確な定式化に依存するか、あるいは比較的長い生成経路(例えば15回)を必要とする。推論速度を向上させるため,SinSRと呼ばれる単一ステップのSR生成を実現するための簡易かつ効果的な手法を提案する。具体的には、拡散型SRを高速化する最新のSOTA法から、まず決定論的サンプリング過程を導出する。これにより、入力されたランダムノイズと生成された高解像度画像とのマッピングが、トレーニング中の推論ステップの削減および許容される数で得られる。この決定論的マッピングを1つの推論ステップでsrを実行する学生モデルに蒸留できることを示す。さらに, 蒸留プロセスにおいて, 地上構造画像を同時に活用する新たな一貫性保存損失を提案し, 生徒モデルの性能が教師モデルの特徴多様体にのみ束縛されることを保証し, さらなる性能向上をもたらす。合成および実世界のデータセットを用いた大規模な実験により,提案手法は従来のSOTA法と教師モデルに比較して,1つのサンプリングステップで同等あるいはそれ以上の性能を達成できることが実証された。私たちのコードはhttps://github.com/wyf0912/SinSRでリリースされます。 While super-resolution (SR) methods based on diffusion models exhibit promising results, their practical application is hindered by the substantial number of required inference steps. Recent methods utilize degraded images in the initial state, thereby shortening the Markov chain. Nevertheless, these solutions either rely on a precise formulation of the degradation process or still necessitate a relatively lengthy generation path (e.g., 15 iterations). To enhance inference speed, we propose a simple yet effective method for achieving single-step SR generation, named SinSR. Specifically, we first derive a deterministic sampling process from the most recent state-of-the-art (SOTA) method for accelerating diffusion-based SR. This allows the mapping between the input random noise and the generated high-resolution image to be obtained in a reduced and acceptable number of inference steps during training. We show that this deterministic mapping can be distilled into a student model that performs SR within only one inference step. 
Additionally, we propose a novel consistency-preserving loss to simultaneously leverage the ground-truth image during the distillation process, ensuring that the performance of the student model is not solely bound by the feature manifold of the teacher model, resulting in further performance improvement. Extensive experiments conducted on synthetic and real-world datasets demonstrate that the proposed method can achieve performance comparable or even superior to both previous SOTA methods and the teacher model in just one sampling step, yielding an inference speedup of up to 10×. Our code will be released at https://github.com/wyf0912/SinSR	翻訳日:2023-11-30 09:35:43 公開日:2023-11-23
# ディープラーニングによる暗号通貨価格予測 - 金融、ブロックチェーン、テキストデータの統合 Forecasting Cryptocurrency Prices Using Deep Learning: Integrating Financial, Blockchain, and Text Data ( http://arxiv.org/abs/2311.14759v1 ) ライセンス: Link先を確認	Vincent Gurgul, Stefan Lessmann, Wolfgang Karl Härdle	(参考訳) 本稿では,暗号通貨の価格予測,特にBitcoin(BTC)とEthereum(ETH)における機械学習(ML)と自然言語処理(NLP)の応用について検討する。主にTwitterとRedditのニュースやソーシャルメディアのデータに焦点を当て,高度なディープラーニングNLP手法を用いて,世論の感情が暗号通貨評価に与える影響を分析した。従来の価格回帰に加えて、暗号通貨の価格予測を分類問題として扱う。これには価格変動の予測(上昇または下降)と局所極値の識別の両方が含まれる。我々は,NLPデータ統合の有無にかかわらず,各種MLモデルの性能を比較した。その結果,NLPデータの導入により予測性能が大幅に向上することが判明した。我々は,Twitter-RoBERTaやBART MNLIといった事前学習モデルが市場感情を捉える上で非常に有効であること,そして大規模言語モデル(LLM)の微調整も大幅な予測改善をもたらすことを発見した。特に、BART MNLIゼロショット分類モデルは、テキストデータから強気シグナルや弱気シグナルを抽出する能力がかなり高い。我々のモデルはすべて、さまざまな検証シナリオで一貫して利益を生み出しており、時間の経過に伴う利益の減少やNLPデータの影響の低下は観測されなかった。本研究は,金融予測の改善におけるテキスト分析の可能性を浮き彫りにし,微妙な市場センチメントを捉える様々なNLP手法の有効性を実証する。 This paper explores the application of Machine Learning (ML) and Natural Language Processing (NLP) techniques in cryptocurrency price forecasting, specifically Bitcoin (BTC) and Ethereum (ETH). Focusing on news and social media data, primarily from Twitter and Reddit, we analyse the influence of public sentiment on cryptocurrency valuations using advanced deep learning NLP methods. Alongside conventional price regression, we treat cryptocurrency price forecasting as a classification problem. This includes both the prediction of price movements (up or down) and the identification of local extrema. We compare the performance of various ML models, both with and without NLP data integration. Our findings reveal that incorporating NLP data significantly enhances the forecasting performance of our models. We discover that pre-trained models, such as Twitter-RoBERTa and BART MNLI, are highly effective in capturing market sentiment, and that fine-tuning Large Language Models (LLMs) also yields substantial forecasting improvements. 
Notably, the BART MNLI zero-shot classification model shows considerable proficiency in extracting bullish and bearish signals from textual data. All of our models consistently generate profit across different validation scenarios, with no observed decline in profits or reduction in the impact of NLP data over time. The study highlights the potential of text analysis in improving financial forecasts and demonstrates the effectiveness of various NLP techniques in capturing nuanced market sentiment.	翻訳日:2023-11-30 09:35:18 公開日:2023-11-23
# 呼吸器疾患検出のための融合音声インスタンスと表現 Fused Audio Instance and Representation for Respiratory Disease Detection ( http://arxiv.org/abs/2204.10581v4 ) ライセンス: Link先を確認	Tuan Truong, Matthias Lenga, Antoine Serrurier, Sadegh Mohammadi	(参考訳) 体音に対する音声ベースの分類技術は、呼吸器疾患の診断を助けるために長年研究されてきた。ほとんどの研究は、主要なバイオマーカーとして咳の使用に重点を置いているが、他の体音も呼吸器疾患を検出する可能性を持っている。新型コロナウイルスに関する最近の研究によると、咳に加えて、呼吸音と発声音もこの病気と相関している。本研究は,呼吸器疾患の検出方法としてFAIR(Fused Audio Instance and Representation)を提案する。FAIRは、波形とスペクトログラムで表される様々な体音からジョイント特徴ベクトルを構築することに基づいている。体音の波形とスペクトログラムの表現を組み合わせることで、COVID-19検出のユースケースについて実験を行った。その結果, 自己注意機構を用いて咳, 呼吸, 音声から抽出した特徴を組み合わせることで最良の性能が得られ, 受信者動作特性曲線下面積(AUC)スコアが0.8658, 感度が0.8057, 特異度が0.7958となった。スペクトログラムのみまたは波形のみで訓練されたモデルと比較して、両表現の使用によりAUCスコアが向上し、スペクトログラムと波形表現の組み合わせは抽出した特徴を豊かにし、1つの表現のみを使用するモデルよりも優れていることを示す。 Audio-based classification techniques on body sounds have long been studied to aid in the diagnosis of respiratory diseases. While most research is centered on the use of cough as the main biomarker, other body sounds also have the potential to detect respiratory diseases. Recent studies on COVID-19 have shown that breath and speech sounds, in addition to cough, correlate with the disease. Our study proposes Fused Audio Instance and Representation (FAIR) as a method for respiratory disease detection. FAIR relies on constructing a joint feature vector from various body sounds represented in waveform and spectrogram form. We conducted experiments on the use case of COVID-19 detection by combining waveform and spectrogram representation of body sounds. Our findings show that the use of self-attention to combine extracted features from cough, breath, and speech sounds leads to the best performance with an Area Under the Receiver Operating Characteristic Curve (AUC) score of 0.8658, a sensitivity of 0.8057, and a specificity of 0.7958. 
Compared to models trained solely on spectrograms or waveforms, the use of both representations results in an improved AUC score, demonstrating that combining spectrogram and waveform representation helps to enrich the extracted features and outperforms the models that use only one representation.	翻訳日:2023-11-28 05:21:44 公開日:2023-11-23
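FAIR combines per-sound features with self-attention. The sketch below shows that fusion step in its simplest form, under several stated assumptions. It uses a single head, identity query/key/value projections (no learned weights) and mean pooling into one joint vector. The six two-dimensional embeddings are made-up numbers standing in for waveform and spectrogram features of cough, breath and speech. The paper's feature extractors and classifier are not reproduced.

```python
import math


def self_attention_fuse(tokens):
    """Single-head scaled dot-product self-attention (identity Q/K/V projections),
    followed by mean pooling into one joint feature vector."""
    d = len(tokens[0])
    attended = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)
        e = [math.exp(s - m) for s in scores]
        total = sum(e)
        weights = [x / total for x in e]
        attended.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return [sum(t[i] for t in attended) / len(attended) for i in range(d)]


# Hypothetical embeddings: (cough, breath, speech) x (waveform, spectrogram)
features = {
    ("cough", "wave"): [0.2, 0.9], ("cough", "spec"): [0.1, 0.8],
    ("breath", "wave"): [0.5, 0.4], ("breath", "spec"): [0.6, 0.3],
    ("speech", "wave"): [0.9, 0.1], ("speech", "spec"): [0.8, 0.2],
}
joint = self_attention_fuse(list(features.values()))
print(len(joint))  # 2
```

Every output is a convex combination of the input tokens, so each modality can borrow information from the others before pooling.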
# ビッグデータクラスタリングにK-meansを使うには? How to Use K-means for Big Data Clustering? ( http://arxiv.org/abs/2204.07485v3 ) ライセンス: Link先を確認	Rustam Mussabayev, Nenad Mladenovic, Bassem Jarboui, Ravil Mussabayev	(参考訳) K-meansはデータマイニングにおいて重要な役割を担い、ユークリッド最小二乗和クラスタリング(MSSC)モデルの下で最もシンプルで広く使われているアルゴリズムである。しかし、その性能は膨大なデータに適用すると劇的に低下する。したがって、データ、時間、アルゴリズム要素といった計算資源を可能な限り少なく使用してビッグデータにスケールさせることで、K-meansを改善することが重要である。そこで我々は,‘true Big Data’アルゴリズムの特性を満たし,解の品質と実行時間の観点から古典的および最新のMSSCアプローチを上回る,K-meansとK-means++アルゴリズムを用いたビッグデータクラスタリングのための新たな並列方式を提案する。新たなアプローチでは,追加のメタヒューリスティクスを使わずにMSSC問題を分解することで,大域探索を自然に実現している。この研究は、ビッグデータクラスタリング問題を解決するための基本的なアプローチがデータの分解であることを示している。新しいアルゴリズムの実証的な成功により、優れたクラスタリング解を得るにはより多くのデータが必要であるという通説に異を唱えることができた。さらに,より良いクラスタリング解を得るためには,より洗練されたハイブリッドアプローチとアルゴリズムが必要であるという確立された傾向に疑問を呈する。 K-means plays a vital role in data mining and is the simplest and most widely used algorithm under the Euclidean Minimum Sum-of-Squares Clustering (MSSC) model. However, its performance drastically drops when applied to vast amounts of data. Therefore, it is crucial to improve K-means by scaling it to big data using as few of the following computational resources as possible: data, time, and algorithmic ingredients. We propose a new parallel scheme of using K-means and K-means++ algorithms for big data clustering that satisfies the properties of a “true big data” algorithm and outperforms the classical and recent state-of-the-art MSSC approaches in terms of solution quality and runtime. The new approach naturally implements global search by decomposing the MSSC problem without using additional metaheuristics. This work shows that data decomposition is the basic approach to solving the big data clustering problem. The empirical success of the new algorithm allowed us to challenge the common belief that more data is required to obtain a good clustering solution. 
Moreover, the present work questions the established trend that more sophisticated hybrid approaches and algorithms are required to obtain a better clustering solution.	翻訳日:2023-11-28 05:21:24 公開日:2023-11-23
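The decomposition idea can be sketched in a simplified, sequential form. The scheme clusters small random subsamples instead of the full dataset, warm-starts each round from the best centroids found so far (K-means++ seeding only in the first round), and keeps the incumbent with the lowest objective on its sample. This is not the authors' parallel scheme. The sample size, number of rounds and the rule that an empty cluster keeps its old center are assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: pick each new center with probability proportional to D(x)^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
        else:
            centers.append(points[-1])  # guard against floating-point round-off
    return centers


def lloyd(points, centers, iters=20):
    """Plain Lloyd iterations; an empty cluster keeps its previous center (assumption)."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


def sse(points, centers):
    """MSSC objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def sampled_kmeans(points, k, sample_size, rounds, seed=0):
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(rounds):
        sample = rng.sample(points, sample_size)
        # warm-start from the incumbent; seed with K-means++ only in the first round
        init = best if best is not None else kmeans_pp_init(sample, k, rng)
        centers = lloyd(sample, init)
        cost = sse(sample, centers)
        if cost < best_cost:
            best, best_cost = centers, cost
    return best


data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers = sampled_kmeans(data, k=2, sample_size=6, rounds=5)
print(sse(data, centers))
```

Each round touches only `sample_size` points, so the cost per round is independent of the full dataset size. Only a final assignment pass needs all the data.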
# 量子コンピュータ上での重力波マッチドフィルタリング Gravitational-wave matched filtering on a quantum computer ( http://arxiv.org/abs/2204.04159v3 ) ライセンス: Link先を確認	Doğa Veske, Cenk Tüysüz, Mirko Amico, Nicholas T. Bronn, Olivia T. Lanes, Imre Bartos, Zsuzsa Márka, Sebastian Will, Szabolcs Márka	(参考訳) 最先端の量子コンピュータは、正確な計算への適用性が非常に限られている。本稿では,連星ブラックホール合体からの重力波信号を検出するための,量子ビットベースのマッチドフィルタリングの最初の実験的実証を報告する。ノイズのある超伝導量子ビット上での実装により、連星ブラックホール合体について古典計算で達成可能なものと同程度の信号対雑音比が得られ、実用的なタスクに対する量子ビットの有用性を示す証拠が得られた。この応用のために考案したアルゴリズムは,量子計算と古典計算を併用したモンテカルロアルゴリズムである。これは時間領域畳み込みに対して準二次的な高速化をもたらし、高速フーリエ変換で達成可能なものと同程度である。 State-of-the-art quantum computers have very limited applicability for accurate calculations. Here we report the first experimental demonstration of qubit-based matched filtering for the detection of the gravitational-wave signal from a binary black hole merger. With our implementation on noisy superconducting qubits, we obtained a signal-to-noise ratio for the binary black hole merger similar to that achievable with classical computation, providing evidence for the utility of qubits for practically relevant tasks. The algorithm we invented for this application is a Monte Carlo algorithm which uses quantum and classical computation together. It provides a quasi-quadratic speed-up for time-domain convolution, similar to that achievable with the fast Fourier transform.	翻訳日:2023-11-28 05:21:04 公開日:2023-11-23
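For reference, the classical operation that the quantum algorithm accelerates is a time-domain matched filter. It is a sliding inner product between the data and a template, normalized by the template norm, and costs O(N·M) for N data samples and an M-sample template. FFT-based correlation reduces this to O(N log N). The sketch assumes white noise of unit variance, so no whitening is applied. It uses a toy chirp rather than a real binary-black-hole waveform and does not implement the paper's quantum Monte Carlo algorithm.

```python
import math


def matched_filter(data, template):
    """Time-domain matched filter: normalized sliding inner product of data and template.
    Assumes white noise with unit variance (no whitening); cost is O(N * M)."""
    norm = math.sqrt(sum(h * h for h in template))
    m = len(template)
    return [
        sum(data[t + i] * template[i] for i in range(m)) / norm
        for t in range(len(data) - m + 1)
    ]


template = [math.sin(0.02 * i * i) for i in range(64)]  # toy chirp, not a real waveform
data = [0.0] * 200
for i, h in enumerate(template):
    data[50 + i] += h  # inject the signal at offset 50
snr = matched_filter(data, template)
peak = max(range(len(snr)), key=snr.__getitem__)
print(peak)  # 50
```

By the Cauchy-Schwarz inequality, the output peaks exactly where the template aligns with the injected signal, and the peak value equals the template norm.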
# 安全・信頼性の観点からの物体(誤)検出の評価:議論と評価指標 Evaluating Object (mis)Detection from a Safety and Reliability Perspective: Discussion and Measures ( http://arxiv.org/abs/2203.02205v3 ) ライセンス: Link先を確認	Andrea Ceccarelli and Leonardo Montecchi	(参考訳) 安全クリティカルドメインの物体検出器は、自律的なアクターの動作に最も干渉しそうな物体の検出を優先すべきである、と我々は主張する。これは特に、アクターの安全性と信頼性に影響を与える可能性のある物体に当てはまる。自動運転における物体の(誤)検出が安全性と信頼性に与える影響を定量化するために,最も危険で運転判断に影響する可能性が最も高い物体の正しい識別に報酬を与える新しい物体検出評価指標を提案する。これを実現するために,自車両に対する近接度,向き,相対速度に基づいて物体の検出に報酬を与える物体重要度(criticality)モデルを構築した。次に、最近の自動運転データセットnuScenesにモデルを適用し、9つの物体検出器を比較した。その結果、いくつかの設定では、安全性と信頼性に重点を置いた場合、nuScenesランキングで最良の物体検出器が好ましいとは限らないことが判明した。 We argue that object detectors in the safety-critical domain should prioritize detection of objects that are most likely to interfere with the actions of the autonomous actor. This applies especially to objects that can impact the actor's safety and reliability. To quantify the impact of object (mis)detection on safety and reliability in the context of autonomous driving, we propose new object detection measures that reward the correct identification of objects that are most dangerous and most likely to affect driving decisions. To achieve this, we build an object criticality model to reward the detection of the objects based on proximity, orientation, and relative velocity with respect to the subject vehicle. Then, we apply our model to the recent autonomous driving dataset nuScenes, and we compare nine object detectors. Results show that, in several settings, object detectors that perform best according to the nuScenes ranking are not the preferable ones when the focus is shifted to safety and reliability.	翻訳日:2023-11-28 05:20:19 公開日:2023-11-23
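A criticality-weighted detection measure can be sketched as follows. Each ground-truth object receives a weight built from proximity, orientation and relative (closing) velocity. A missed object then costs its weight instead of counting as one miss. The functional forms, the 50 m horizon, the 5 m/s velocity scale and the 0.5/0.25/0.25 mixing coefficients are all assumptions for illustration. The paper defines its own criticality model, which is not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass
class GroundTruthObject:
    x: float   # longitudinal offset from the ego vehicle (m), +x = ahead
    y: float   # lateral offset (m)
    vx: float  # velocity relative to the ego vehicle (m/s)
    vy: float


def criticality(o, d_max=50.0, v_scale=5.0):
    """Hypothetical score in [0, 1] from proximity, orientation and closing speed."""
    dist = math.hypot(o.x, o.y)
    if dist == 0:
        return 1.0
    proximity = max(0.0, 1.0 - dist / d_max)    # 1 at the ego, 0 beyond d_max
    orientation = 0.5 * (1.0 + o.x / dist)      # 1 dead ahead, 0 directly behind
    closing = -(o.x * o.vx + o.y * o.vy) / dist  # > 0 when the object approaches
    velocity = 1.0 - math.exp(-max(0.0, closing) / v_scale)
    return proximity * (0.5 + 0.25 * orientation + 0.25 * velocity)


def criticality_weighted_recall(objects, detected):
    """Recall in which each missed object costs its criticality weight."""
    weights = [criticality(o) for o in objects]
    total = sum(weights)
    if total == 0:
        return 1.0
    return sum(w for w, hit in zip(weights, detected) if hit) / total


near = GroundTruthObject(x=10.0, y=0.0, vx=-5.0, vy=0.0)  # close, ahead, approaching
far = GroundTruthObject(x=-40.0, y=0.0, vx=-2.0, vy=0.0)  # far behind, moving away
r_hit_near = criticality_weighted_recall([near, far], [True, False])
r_hit_far = criticality_weighted_recall([near, far], [False, True])
print(round(r_hit_near, 2), round(r_hit_far, 2))  # 0.88 0.12
```

Plain recall is 0.5 in both cases. The weighted version strongly penalizes missing the nearby approaching object, which is the behavior the abstract argues for.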
# スピノダル分解による超高インダクタンス材料 Ultrahigh-inductance materials from spinodal decomposition ( http://arxiv.org/abs/2111.05088v2 ) ライセンス: Link先を確認	Ran Gao, Hsiang-Sheng Ku, Hao Deng, Wenlong Yu, Tian Xia, Feng Wu, Zhijun Song, Xiaohe Miao, Chao Zhang, Yue Lin, Yaoyun Shi, Hui-Hai Zhao, Chunqing Deng	(参考訳) カイネティックインダクタンスを有する乱れた超伝導窒化物は、長い間、高インダクタンス量子回路応用の主要な材料候補と見なされてきた。カイネティックインダクタンスと対応する回路インピーダンスを増加させるため、材料寸法を縮小する努力が続けられているが、材料品質を損なうことなくさらに改善することは根本的な課題となっている。そこで本研究では,マイクロ波損失を低く抑えつつ,スピノダル分解によって超伝導材料のカイネティックインダクタンスを劇的に増加させる手法を提案する。モデル系としてエピタキシャルTi$_{0.48}$Al$_{0.52}$Nを用い, スピノダル分解を利用して材料の乱れを大幅に増大させ,絶縁体-超伝導体転移を誘起できることを初めて実証した。測定されたカイネティックインダクタンスは、これまでに報告された最良の乱れた超伝導窒化物と比較して2～3桁増加した。本研究は, 先進的な超伝導量子回路のインダクタンスを大幅に向上させ, 決定論的に制御する道を開くものである。 Disordered superconducting nitrides with kinetic inductance have long been considered a leading material candidate for high-inductance quantum-circuit applications. Despite continuing efforts to reduce material dimensions to increase the kinetic inductance and the corresponding circuit impedance, further improvement without compromising material quality has become a fundamental challenge. To this end, we propose a method to drastically increase the kinetic inductance of superconducting materials via spinodal decomposition while keeping a low microwave loss. We use epitaxial Ti$_{0.48}$Al$_{0.52}$N as a model system, and for the first time demonstrate the utilization of spinodal decomposition to trigger the insulator-to-superconductor transition with a drastically enhanced material disorder. The measured kinetic inductance has increased by 2-3 orders of magnitude compared with the best previously reported disordered superconducting nitrides. Our work paves the way for substantially enhancing and deterministically controlling the inductance for advanced superconducting quantum circuits.	翻訳日:2023-11-28 05:18:53 公開日:2023-11-23
# エピタキシャル窒化チタンマイクロ波共振器:構造、化学、電気およびマイクロ波特性 Epitaxial titanium nitride microwave resonators: Structural, chemical, electrical, and microwave properties ( http://arxiv.org/abs/2111.04227v4 ) ライセンス: Link先を確認	Ran Gao, Wenlong Yu, Hao Deng, Hsiang-Sheng Ku, Zhisheng Li, Minghua Wang, Xiaohe Miao, Yue Lin, Chunqing Deng	(参考訳) 窒化チタンはマイクロ波損失が少なく、表面インダクタンスが高く、化学的安定性があるため、超伝導量子回路応用の魅力的な材料である。しかし、物理的特性とデバイス性能は材料の品質に大きく依存している。ここでは中間温度(300$^{\circ}$C)でマグネトロンスパッタリングによりサファイア基板上に堆積した高結晶性かつエピタキシャルな窒化チタン薄膜に注目した。構造的, 化学的, 輸送的性質を徹底的に理解するために, 体系的かつ包括的な材料特性評価を行う。パターン化したマイクロ波共振器を用いて低温でのマイクロ波損失を調べたところ, 単一光子領域における最高の内部品質係数は$3.3\times 10^6$、高パワー領域では$> 1.0\times 10^7$と測定された。共振器の材料充填係数で補正したマイクロ波損失正接は、これまでに報告された超伝導共振器の最良値に匹敵する。この研究は、エピタキシャル窒化チタンを用いた低損失超伝導量子回路の基礎を築くものである。 Titanium nitride is an attractive material for a range of superconducting quantum-circuit applications owing to its low microwave losses, high surface inductance, and chemical stability. The physical properties and device performance, nevertheless, depend strongly on the quality of the materials. Here we focus on the highly crystalline and epitaxial titanium nitride thin films deposited on sapphire substrates using magnetron sputtering at an intermediate temperature (300$^{\circ}$C). We perform systematic and comprehensive material characterization to thoroughly understand the structural, chemical, and transport properties. Microwave losses at low temperatures are studied using patterned microwave resonators, where the best internal quality factor in the single-photon regime is measured to be $3.3\times 10^6$, and $> 1.0\times 10^7$ in the high-power regime. Adjusted with the material filling factor of the resonators, the microwave loss tangent here compares well with the previously reported best values for superconducting resonators. This work lays the foundation for using epitaxial titanium nitride for low-loss superconducting quantum circuits.	翻訳日:2023-11-28 05:18:33 公開日:2023-11-23
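"Adjusted with the material filling factor" refers to the standard participation-ratio relation between internal quality factor and loss tangent: 1/Q_i = Σ_j F_j tan δ_j, summed over the lossy materials j. When the losses are dominated by one material, this inverts to the worked example below. Q_i is the single-photon value quoted above. The filling factor F = 0.1 is a hypothetical number chosen only to show the arithmetic and is not taken from the paper.

```latex
\frac{1}{Q_i} = \sum_j F_j \tan\delta_j
\quad\Longrightarrow\quad
\tan\delta \approx \frac{1}{F\,Q_i}
= \frac{1}{0.1 \times 3.3\times 10^{6}}
\approx 3.0\times 10^{-6}
```

A smaller filling factor implies a larger loss tangent for the same measured Q_i. This is why loss tangents, rather than raw quality factors, are compared across resonator geometries.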
# ニューラルネットワーク検証のための共有証明書 Shared Certificates for Neural Network Verification ( http://arxiv.org/abs/2109.00542v4 ) ライセンス: Link先を確認	Marc Fischer, Christian Sprecher, Dimitar I. Dimitrov, Gagandeep Singh, Martin Vechev	(参考訳) 既存のニューラルネットワーク検証器は、各層における到達可能な値の象徴的抽象化を伝播することにより、各入力が所定の摂動の下で正しく扱われることを示す。このプロセスは、各入力(画像など)と摂動(回転など)に対して独立してスクラッチから繰り返されるため、データセット全体を扱う場合のコストがかかる。本研究では,入力と摂動の異なる中間層で得られる抽象概念が重なり,あるいは互いに包含できるという重要な洞察に基づいて,精度を損なうことなく検証コストを削減する新しい手法を提案する。この知見を活かして,共有証明書の一般概念を導入し,複数の入力をまたいだ検証作業の再利用を可能にし,検証コストを削減した。一般的なパッチや幾何学的摂動を含む画像分類器に対する攻撃仕様やデータセットに対する検証コストの低減に有効な共有証明書の有効性を示すための実験的な評価を行った。実装はhttps://github.com/eth-sri/proof-sharingでリリースします。 Existing neural network verifiers compute a proof that each input is handled correctly under a given perturbation by propagating a symbolic abstraction of reachable values at each layer. This process is repeated from scratch independently for each input (e.g., image) and perturbation (e.g., rotation), leading to an expensive overall proof effort when handling an entire dataset. In this work, we introduce a new method for reducing this verification cost without losing precision based on a key insight that abstractions obtained at intermediate layers for different inputs and perturbations can overlap or contain each other. Leveraging our insight, we introduce the general concept of shared certificates, enabling proof effort reuse across multiple inputs to reduce overall verification costs. We perform an extensive experimental evaluation to demonstrate the effectiveness of shared certificates in reducing the verification cost on a range of datasets and attack specifications on image classifiers including the popular patch and geometric perturbations. We release our implementation at https://github.com/eth-sri/proof-sharing.	翻訳日:2023-11-28 05:17:48 公開日:2023-11-23
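The key insight of shared certificates is that an intermediate abstraction contained in an already-verified one needs no further propagation. This can be illustrated with the simple interval (box) domain. Propagation is sound, so a box inside a template box at layer k can only reach outputs that the template already covers. The sketch makes two simplifying assumptions: every layer is affine followed by ReLU, and templates are given per layer as boxes already proven safe for the same property. `verify` and `templates` are hypothetical names, not the paper's implementation.

```python
def affine_box(lo, hi, W, b):
    """Interval bound propagation through y = W x + b."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        out_lo.append(bias + sum(w * (lo[j] if w >= 0 else hi[j]) for j, w in enumerate(row)))
        out_hi.append(bias + sum(w * (hi[j] if w >= 0 else lo[j]) for j, w in enumerate(row)))
    return out_lo, out_hi


def relu_box(lo, hi):
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]


def contained(box, template):
    """True if the interval box lies inside the template box in every dimension."""
    (lo, hi), (tlo, thi) = box, template
    return all(a >= c for a, c in zip(lo, tlo)) and all(a <= c for a, c in zip(hi, thi))


def verify(lo, hi, layers, templates, check_output):
    """Propagate a box through ReLU layers; stop early if an intermediate box is
    contained in a template already proven to satisfy check_output."""
    box = (lo, hi)
    for k, (W, b) in enumerate(layers):
        box = relu_box(*affine_box(box[0], box[1], W, b))
        if any(contained(box, t) for t in templates.get(k, [])):
            return True, k  # proof effort reused from the template
    return check_output(box), len(layers) - 1


layers = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]), ([[1.0, 1.0]], [0.0])]
safe = lambda box: box[1][0] < 10.0          # property: output stays below 10
templates = {0: [([0.0, 0.0], [2.0, 2.0])]}  # layer-0 box proven safe earlier
print(verify([0.5, 0.5], [1.0, 1.0], layers, templates, safe))  # (True, 0)
print(verify([3.0, 3.0], [4.0, 4.0], layers, templates, safe))  # (True, 1)
```

The first query stops after layer 0 because its box fits inside the stored template. The second query is not covered and falls back to full propagation.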
# 非マルコフ系における初期系相関によるサイドバンド冷却の最適化 Optimized sideband cooling with initial system correlations in non-Markovian regime ( http://arxiv.org/abs/2007.14094v2 ) ライセンス: Link先を確認	Wen-Zhao Zhang, Ting Tan, Jie Zhao, Wenlin Li, and Jiong Cheng	(参考訳) 一般的な力学的非マルコフ熱浴と結合した標準光力学系において,初期系相関の存在下でのサイドバンド冷却を最適化した。本研究では,初期相関の効果をハイゼンベルク方程式の時間依存係数に組み込むことで,フォノン数の進化を研究する。冷却速度の概念を導入し,非マルコフ領域におけるサイドバンド冷却効果を記述するために,平均フォノン数減少関数を定義する。その結果,パラメトリック増幅型またはビームスプリッター型の初期相関を導入することにより,瞬時フォノン数を大幅に削減できることがわかった。また,ビームスプリッタ型の初期相関を高めることにより,基底状態冷却の速度を高めることができる。システムの初期状態を最適化し、Q変調技術を活用することにより、非常に短時間で安定した機械的基底状態を得ることができる。我々の最適化冷却プロトコルは固体系のフォノン操作と量子情報処理のための魅力的なプラットフォームを提供する。 Optimized sideband cooling in the presence of initial system correlations is investigated for a standard optomechanical system coupled to a general mechanical non-Markovian reservoir. We study the evolution of phonon number by incorporating the effects of initial correlations into the time-dependent coefficients in the Heisenberg equation. We introduce the concept of cooling rate and define an average phonon reduction function to describe the sideband cooling effect in the non-Markovian regime. Our results show that the instantaneous phonon number can be significantly reduced by introducing either the parametric-amplification type or the beam-splitter type initial correlations. In addition, the ground state cooling rate can be accelerated by enhancing the initial correlation of beam-splitter type. By optimizing the initial state of the system and utilizing Q-modulation technology, a stable mechanical ground state can be obtained in a very short time. Our optimized cooling protocol provides an appealing platform for phonon manipulation and quantum information processing in solid-state systems.	翻訳日:2023-11-28 05:16:32 公開日:2023-11-23
# スマートフォンリアルタイムアプリケーションのための知覚画像強調 Perceptual Image Enhancement for Smartphone Real-Time Applications ( http://arxiv.org/abs/2210.13552v2 ) ライセンス: Link先を確認	Marcos V. Conde, Florin Vasluianu, Javier Vazquez-Corral, Radu Timofte	(参考訳) 近年のカメラ設計や画像パイプラインの進歩により,スマートフォンによる高品質な画像の撮影が可能になった。しかし、スマートフォンカメラの小ささとレンズの限界のため、処理後の画像にはアーチファクトや劣化がよく見られる。最もよく見られる不快な効果は、ノイズアーティファクト、回折アーティファクト、ぼかし、HDR過剰露光である。画像復元のためのディープラーニング手法は、これらのアーティファクトをうまく取り除くことができる。しかし、ほとんどのアプローチは、計算量とメモリ要件が重いため、モバイルデバイスのリアルタイムアプリケーションには適していない。本稿では,スマートフォンへの実装に焦点を当てた,知覚的画像強調のための軽量ネットワークLPIENetを提案する。実験の結果,本モデルははるかに少ないパラメータと演算で前述のアーティファクトに対処でき,標準ベンチマークにおいて最先端手法に匹敵する性能を達成することがわかった。さらに,提案手法の効率性と信頼性を証明するため,市販スマートフォンに直接モデルをデプロイし,性能評価を行った。我々のモデルは中級の市販スマートフォンで2K解像度画像を1秒未満で処理することができる。 Recent advances in camera designs and imaging pipelines allow us to capture high-quality images using smartphones. However, due to the small size and lens limitations of the smartphone cameras, we commonly find artifacts or degradation in the processed images. The most common unpleasant effects are noise artifacts, diffraction artifacts, blur, and HDR overexposure. Deep learning methods for image restoration can successfully remove these artifacts. However, most approaches are not suitable for real-time applications on mobile devices due to their heavy computation and memory requirements. In this paper, we propose LPIENet, a lightweight network for perceptual image enhancement, with a focus on deploying it on smartphones. Our experiments show that, with far fewer parameters and operations, our model can deal with the mentioned artifacts and achieve competitive performance compared with state-of-the-art methods on standard benchmarks. Moreover, to prove the efficiency and reliability of our approach, we deployed the model directly on commercial smartphones and evaluated its performance. Our model can process 2K resolution images in under 1 second on mid-range commercial smartphones.	翻訳日:2023-11-28 05:10:26 公開日:2023-11-23
# 転移学習による病理組織像の自動スコア化 Automatically Score Tissue Images Like a Pathologist by Transfer Learning ( http://arxiv.org/abs/2209.05954v4 ) ライセンス: Link先を確認	Iris Yan	(参考訳) がんは世界で2番目に多い死因である。早期にがんを診断することで多くの命を救える。病理学者は、腫瘍を特定するために手動で組織マイクロアレイ(TMA)画像を見る必要があり、これには時間がかかるうえ、一貫性を欠き主観的になりうる。既存の自動アルゴリズムは病理学者の正確性レベルに達していないか、あるいはかなりの人間の関与を必要とする。最大の課題は、異なる形状、サイズ、位置のTMA画像が同じスコアを持つ可能性があることである。 TMA画像における染色パターンの学習には膨大な数の画像が必要であり、医療機関のプライバシーや規制上の懸念からかなり制限されている。異なるがんタイプのTMA画像は、特定の共通の特徴を共有しうるが、それらを直接組み合わせると、染色パターンの不均一性のために精度が損なわれる。転移学習は、類似の問題から強みを借りることのできる、新たな学習パラダイムである。しかし、既存のアプローチでは、通常、類似した学習問題の大規模なサンプルを必要とするが、異なるがんタイプのTMA画像は小さなサンプルしか得られないことが多く、さらに既存のアルゴリズムは、1つの類似問題からの転移学習に限られている。本稿では,複数の関連問題から学習可能な新しい転移学習アルゴリズムを提案する。各問題には小さなサンプルしかなく,元の問題とはかなり異なる分布を持ちうる。提案したアルゴリズムは、スタンフォード組織マイクロアレイデータベース(Stanford Tissue Microarray Database)の乳がんTMA画像で75.9%の精度を報告し、重要な精度障壁(病理医の75%の精度レベル)を破ることを可能にした。この手法は転移学習理論の最近の発展とクラスタリング技術の実証的証拠によって支持されている。これにより、病理学者は腫瘍をリアルタイムでより高い精度で一貫して認識する自動アルゴリズムを確信をもって採用できる。 Cancer is the second leading cause of death in the world. Diagnosing cancer early on can save many lives. Pathologists have to look at tissue microarray (TMA) images manually to identify tumors, which can be time-consuming, inconsistent and subjective. Existing automatic algorithms either have not achieved the accuracy level of a pathologist or require substantial human involvement. A major challenge is that TMA images with different shapes, sizes, and locations can have the same score. Learning staining patterns in TMA images requires a huge number of images, which are severely limited due to privacy and regulation concerns in medical organizations. TMA images from different cancer types may share certain common characteristics, but combining them directly harms the accuracy due to heterogeneity in their staining patterns. Transfer learning is an emerging learning paradigm that allows borrowing strength from similar problems. 
However, existing approaches typically require a large sample from similar learning problems, whereas TMA images of different cancer types are often available only in small samples; moreover, existing algorithms are limited to transfer learning from a single similar problem. We propose a new transfer learning algorithm that can learn from multiple related problems, each of which has a small sample and may have a distribution substantially different from that of the original problem. The proposed algorithm has made it possible to break the critical accuracy barrier (the 75% accuracy level of pathologists), with a reported accuracy of 75.9% on breast cancer TMA images from the Stanford Tissue Microarray Database. The approach is supported by recent developments in transfer learning theory and by empirical evidence from clustering technology. This will allow pathologists to confidently adopt automatic algorithms that recognize tumors consistently, with higher accuracy, in real time.	翻訳日:2023-11-28 05:09:12 公開日:2023-11-23
# 量子誤差緩和のためのユニバーサルサンプリング下限 Universal Sampling Lower Bounds for Quantum Error Mitigation ( http://arxiv.org/abs/2208.09178v4 ) ライセンス: Link先を確認	Ryuji Takagi and Hiroyasu Tajima and Mile Gu	(参考訳) 中間スケールの量子デバイスにおけるノイズ効果を抑制するために、多くの量子誤り軽減プロトコルが提案されている。しかし、その一般的な可能性と限界はいまだ解明されていない。特に、量子エラー軽減の究極の実現可能性を理解するためには、基本サンプリングコスト -- 任意の緩和プロトコルがノイズの多い量子デバイスを実行しなければならない回数 -- を特徴付けることが不可欠である。本稿では,量子誤り軽減が高い確率で所望の精度を達成するために必要なサンプリングコストについて,普遍的な下限を確立する。我々の下限は、非線形後処理を含む一般的な緩和プロトコルや、未発見のプロトコルにも当てはまる。この結果は、様々なノイズモデルにおいて、幅広い種類のプロトコルがエラーを緩和するために必要とするサンプリングコストが回路の深さとともに指数関数的に増大しなければならないことを意味し、有用なノイズのある短期量子デバイスのスケーラビリティにおける根本的な障害を明らかにしている。 Numerous quantum error-mitigation protocols have been proposed, motivated by the critical need to suppress noise effects on intermediate-scale quantum devices. Yet, their general potential and limitations remain elusive. In particular, to understand the ultimate feasibility of quantum error mitigation, it is crucial to characterize the fundamental sampling cost -- how many times an arbitrary mitigation protocol must run a noisy quantum device. Here, we establish universal lower bounds on the sampling cost for quantum error mitigation to achieve the desired accuracy with high probability. Our bounds apply to general mitigation protocols, including those involving nonlinear postprocessing and those yet to be discovered. The results imply that the sampling cost required for a wide class of protocols to mitigate errors must grow exponentially with the circuit depth for various noise models, revealing the fundamental obstacles in the scalability of useful noisy near-term quantum devices.	翻訳日:2023-11-28 05:08:22 公開日:2023-11-23
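The bounds in this paper are protocol-independent *lower* bounds. For intuition only, the sketch below shows the familiar *upper* bound for one specific protocol, probabilistic error cancellation (PEC): if each noisy layer inflates the estimator's range by an assumed factor `gamma_per_layer > 1`, every shot lies in $[-\gamma^L, \gamma^L]$, and Hoeffding's inequality gives $N \ge 2\gamma^{2L}\ln(2/\delta)/\epsilon^2$ shots. This illustrates exponential growth in depth; it is not the paper's bound, and the per-layer cost is a hypothetical parameter.

```python
import math


def pec_sample_count(gamma_per_layer: float, depth: int, eps: float, delta: float) -> int:
    """Hoeffding-based number of circuit runs so that a PEC estimate of a
    Pauli expectation value is within eps of the ideal value with probability
    at least 1 - delta. Each shot returns a value in [-C, C] with
    C = gamma_per_layer ** depth, hence N >= 2 C^2 ln(2/delta) / eps^2."""
    c = gamma_per_layer ** depth
    return math.ceil(2 * c * c * math.log(2 / delta) / eps**2)


# Shot counts for a hypothetical 5% per-layer overhead.
for depth in (10, 20, 40, 80):
    print(depth, pec_sample_count(1.05, depth, eps=0.01, delta=0.05))
```

Doubling the depth squares the overhead factor $\gamma^{2L}$, which is the exponential scaling the abstract refers to.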
# 分割統治:点単位の二値化による3D点群インスタンスセグメンテーション Divide and Conquer: 3D Point Cloud Instance Segmentation With Point-Wise Binarization ( http://arxiv.org/abs/2207.11209v4 ) ライセンス: Link先を確認	Weiguang Zhao, Yuyao Yan, Chaolong Yang, Jianan Ye, Xi Yang, Kaizhu Huang	(参考訳) ポイントクラウド上のインスタンスセグメンテーションは、3Dシーン理解にとって極めて重要である。ほとんどの最先端手法は距離クラスタリングを採用しており、通常は有効であるが、同じセマンティックラベルを持つ隣接オブジェクト(特に近傍点を共有する場合)の分割にはうまく機能しない。オフセットポイントの不均一な分布のため、これらの既存のメソッドはすべてのインスタンスポイントをほとんどクラスタ化できない。そこで本研究では,各点を二値化し、それぞれ別々にクラスタリングしてインスタンスを分割する、PBNetという新しい分割統治戦略を設計する。我々のバイナリクラスタリングでは、オフセットインスタンスポイントを高密度点と低密度点(HP対LP)の2つのカテゴリに分けています。隣接オブジェクトは、LPを除去して明確に分離し、近傍投票法でLPを割り当てることで補完および洗練することができる。過剰なセグメンテーションを抑制するために,各インスタンスの重みマスクを用いてローカルシーンを構築することを提案する。プラグインとして提案されているバイナリクラスタリングは、従来の距離クラスタリングを置き換えることができ、多くの主流ベースラインで一貫したパフォーマンス向上につながる。 ScanNetV2とS3DISデータセットに関する一連の実験は、我々のモデルの優位性を示している。特にPBNetは、ScanNetV2の公式ベンチマークチャレンジでトップにランクインし、最も高いmAPを達成した。コードはhttps://github.com/weiguangzhao/PBNetで公開される予定だ。 Instance segmentation on point clouds is crucially important for 3D scene understanding. Most state-of-the-art methods adopt distance clustering, which is typically effective but does not perform well in segmenting adjacent objects with the same semantic label (especially when they share neighboring points). Due to the uneven distribution of offset points, these existing methods can hardly cluster all instance points. To this end, we design a novel divide-and-conquer strategy named PBNet that binarizes each point and clusters them separately to segment instances. Our binary clustering divides offset instance points into two categories: high and low density points (HPs vs. LPs). Adjacent objects can be clearly separated by removing LPs, and then be completed and refined by assigning LPs via a neighbor voting method. To suppress potential over-segmentation, we propose to construct local scenes with the weight mask for each instance. 
As a plug-in, the proposed binary clustering can replace traditional distance clustering and lead to consistent performance gains on many mainstream baselines. A series of experiments on ScanNetV2 and S3DIS datasets indicate the superiority of our model. In particular, PBNet ranks first on the ScanNetV2 official benchmark challenge, achieving the highest mAP. Code will be available publicly at https://github.com/weiguangzhao/PBNet.	翻訳日:2023-11-28 05:07:22 公開日:2023-11-23
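The binarize, cluster, then vote pipeline described above can be illustrated with a toy 2-D version. This is a hypothetical simplification, not PBNet: the density measure, `radius`, `min_neighbors` and `k_vote` are our own illustrative choices, real PBNet works on learned 3-D offset points, and the weight-mask local scenes are not modelled.

```python
import math
from collections import Counter


def binary_clustering(points, radius=1.0, min_neighbors=3, k_vote=3):
    """Toy divide-and-conquer clustering of 2-D (offset) points.

    1. Binarize: a point is high-density (HP) if at least `min_neighbors`
       other points lie within `radius`; otherwise it is low-density (LP).
    2. Cluster HPs only, as connected components of the radius graph.
    3. Give each LP the majority label among its `k_vote` nearest HPs.
    All three parameters are hypothetical choices for this sketch.
    """
    n = len(points)
    neighbors = [
        [j for j in range(n) if j != i and math.dist(points[i], points[j]) <= radius]
        for i in range(n)
    ]
    is_hp = [len(nb) >= min_neighbors for nb in neighbors]

    labels = [-1] * n
    next_label = 0
    for seed in range(n):
        if not is_hp[seed] or labels[seed] != -1:
            continue
        labels[seed] = next_label
        stack = [seed]
        while stack:  # flood-fill over HP-to-HP edges only, so LPs cannot bridge objects
            i = stack.pop()
            for j in neighbors[i]:
                if is_hp[j] and labels[j] == -1:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1

    hps = [i for i in range(n) if is_hp[i]]
    for i in range(n):
        if not is_hp[i] and hps:  # neighbor voting for low-density points
            nearest = sorted(hps, key=lambda h: math.dist(points[i], points[h]))[:k_vote]
            labels[i] = Counter(labels[h] for h in nearest).most_common(1)[0][0]
    return labels


# Two dense 3x3 patches joined by a single sparse "bridge" point.
patch_a = [(0.5 * x, 0.5 * y) for x in range(3) for y in range(3)]
patch_b = [(3 + 0.5 * x, 0.5 * y) for x in range(3) for y in range(3)]
print(binary_clustering(patch_a + patch_b + [(2.0, 0.5)]))
```

With plain distance clustering at the same radius, the bridge point would chain the two patches into one instance; removing it as an LP first keeps them apart.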
# メモリ効率の良い学習型プライマル・デュアル・アーキテクチャによる3次元ヘリカルCT再構成 3D helical CT Reconstruction with a Memory Efficient Learned Primal-Dual Architecture ( http://arxiv.org/abs/2205.11952v2 ) ライセンス: Link先を確認	Jevgenija Rudzusika, Buda Baji\'c, Thomas Koehler, Ozan \"Oktem	(参考訳) 深層学習によるCT(computed tomography)再構成は, シミュレーション2次元低線量CTデータにおいて顕著な性能を示した。これは特に、CTイメージングのための手作り物理モデルを含む、ドメイン適応ニューラルネットワークに当てはまる。このようなアーキテクチャを採用することで、トレーニングデータの需要が減少し、汎化性能が向上する、という実証的な証拠がある。しかし、その訓練には膨大な計算資源が必要であり、医用画像で最も一般的な撮像形態である3次元ヘリカルCTではすぐに現実的でなくなる。さらに、臨床データには、フラックス測定の誤差、解像度の不一致、そして最も重要なことに、真の正解画像(ground truth)が存在しないことなど、シミュレーションで考慮されていない他の課題も伴っている。計算上実行可能な訓練が必要であることと、これらの課題に対処する必要性が相まって、臨床3次元ヘリカルCTにおける深層学習ベースの再構成の評価を困難にしている。本稿では,学習型プライマル・デュアル (Learned Primal-Dual; LPD) というドメイン適応型ニューラルネットワークアーキテクチャを改良し,この設定での再構成に訓練・適用できるようにする。ヘリカル軌道をセクションに分割し,それらのセクションに展開(unrolled)したLPD反復を順次適用することで,これを実現する。我々の知る限りでは、この研究は、低線量CT画像・投影データセット(LDCT)のようなフルサイズの臨床データに、展開型(unrolled)のディープラーニングアーキテクチャを再構成のために適用した最初のものである。さらに、トレーニングとテストは、24GBのメモリを持つ単一のGPUカード上で行われる。 Deep learning based computed tomography (CT) reconstruction has demonstrated outstanding performance on simulated 2D low-dose CT data. This applies in particular to domain adapted neural networks, which incorporate a handcrafted physics model for CT imaging. Empirical evidence shows that employing such architectures reduces the demand for training data and improves generalisation. However, their training requires large computational resources that quickly become prohibitive in 3D helical CT, which is the most common acquisition geometry used for medical imaging. Furthermore, clinical data also comes with other challenges not accounted for in simulations, like errors in flux measurement, resolution mismatch and, most importantly, the absence of the real ground truth. The necessity to have a computationally feasible training combined with the need to address these issues has made it difficult to evaluate deep learning based reconstruction on clinical 3D helical CT. 
This paper modifies a domain adapted neural network architecture, the Learned Primal-Dual (LPD), so that it can be trained and applied to reconstruction in this setting. We achieve this by splitting the helical trajectory into sections and applying the unrolled LPD iterations to those sections sequentially. To the best of our knowledge, this work is the first to apply an unrolled deep learning architecture for reconstruction on full-sized clinical data, like those in the Low dose CT image and projection data set (LDCT). Moreover, training and testing are done on a single GPU card with 24GB of memory.	翻訳日:2023-11-28 05:05:50 公開日:2023-11-23
# 低周波・高周波同時ブートストラップによる大規模時系列表現学習 Large Scale Time-Series Representation Learning via Simultaneous Low and High Frequency Feature Bootstrapping ( http://arxiv.org/abs/2204.11291v2 ) ライセンス: Link先を確認	Vandan Gorade, Azad Singh and Deepak Mishra	(参考訳) ラベルのない時系列データからの表現の学習は難しい問題である。時系列領域における既存の自己教師ありおよび教師なしアプローチの多くは、低周波と高周波の特徴を同時に捉えていない。さらに、これらの方法のいくつかは、トランスフォーマーのような大規模モデルを採用するか、コントラスト学習のような計算コストの高い技術に依存している。これらの問題に対処するために,低周波と高周波の時間変化特徴を低コストで効率的に捉える,非コントラスト型の自己教師あり学習手法を提案する。本手法は,生の時系列データを入力とし,同じ系統(ファミリー)のデータ拡張からランダムにサンプリングすることで,モデルの2つの分岐に対して2つの異なる拡張ビューを生成する。 BYOLの用語に従い、2つのブランチはオンラインとターゲットネットワークと呼ばれ、潜在表現のブートストラップを可能にする。 バックボーンエンコーダの後に多層パーセプトロン(MLP)ヘッドが続くBYOLとは対照的に、提案モデルは追加の時間畳み込みネットワーク(TCN)ヘッドを含む。拡張ビューはエンコーダの大きなカーネル畳み込みブロックを通過するため、後続のMLPとTCNの組み合わせは、異なる受容野によって低周波と高周波の時間変化の特徴を効果的に表現することができる。 2つのモジュール (MLP と TCN) は相補的に作用する。オンラインネットワークを、その各モジュールがターゲットネットワークブランチの対応するモジュールの出力を予測するように訓練する。モデルの堅牢性を実証するために,5つの実世界の時系列データセットに関する広範な実験とアブレーション研究を行った。本手法は,5つの実世界のデータセットすべてにおいて最先端のパフォーマンスを達成した。 Learning representation from unlabeled time series data is a challenging problem. Most existing self-supervised and unsupervised approaches in the time-series domain do not capture low and high-frequency features at the same time. Further, some of these methods employ large scale models like transformers or rely on computationally expensive techniques such as contrastive learning. To tackle these problems, we propose a non-contrastive self-supervised learning approach that efficiently captures low and high-frequency time-varying features in a cost-effective manner. Our method takes raw time series data as input and creates two different augmented views for two branches of the model, by randomly sampling the augmentations from the same family. Following the terminology of BYOL, the two branches are called online and target network which allows bootstrapping of the latent representation. 
In contrast to BYOL, where a backbone encoder is followed by multilayer perceptron (MLP) heads, the proposed model contains additional temporal convolutional network (TCN) heads. As the augmented views are passed through large kernel convolution blocks of the encoder, the subsequent combination of MLP and TCN enables an effective representation of low as well as high-frequency time-varying features due to the varying receptive fields. The two modules (MLP and TCN) act in a complementary manner. We train an online network where each module learns to predict the outcome of the respective module of target network branch. To demonstrate the robustness of our model, we performed extensive experiments and ablation studies on five real-world time-series datasets. Our method achieved state-of-the-art performance on all five real-world datasets.	翻訳日:2023-11-28 05:05:18 公開日:2023-11-23
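The online/target bootstrapping above follows BYOL, where the target weights are an exponential moving average (EMA) of the online weights and the loss is a normalized MSE between an online prediction and a target projection. The sketch below shows those two pieces on plain Python lists; `two_head_loss` is a hypothetical way of combining the MLP and TCN head pairs (the abstract only says each online module predicts its target counterpart), and the stop-gradient on the target branch is not expressible without an autodiff framework.

```python
import math


def ema_update(target, online, tau=0.99):
    """BYOL-style target update: target <- tau * target + (1 - tau) * online."""
    return [tau * t + (1 - tau) * o for t, o in zip(target, online)]


def byol_loss(prediction, target_projection):
    """Normalized MSE between an online prediction and a target projection;
    equals 2 - 2 * cosine_similarity(prediction, target_projection)."""
    dot = sum(p * z for p, z in zip(prediction, target_projection))
    norm_p = math.sqrt(sum(p * p for p in prediction))
    norm_z = math.sqrt(sum(z * z for z in target_projection))
    return 2.0 - 2.0 * dot / (norm_p * norm_z)


def two_head_loss(online_mlp, target_mlp, online_tcn, target_tcn):
    """Hypothetical combination: one BYOL loss per head pair (MLP, TCN), summed."""
    return byol_loss(online_mlp, target_mlp) + byol_loss(online_tcn, target_tcn)


print(ema_update([0.0, 1.0], [1.0, 1.0], tau=0.9))
print(two_head_loss([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]))
```

Because the loss depends only on directions, rescaling a prediction leaves it unchanged; the EMA keeps the target slowly moving, which is what lets the non-contrastive setup avoid collapse in practice.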
# 高速ノイズ動作を用いた高性能繰り返し猫符号 High-performance repetition cat code using fast noisy operations ( http://arxiv.org/abs/2212.11927v4 ) ライセンス: Link先を確認	Francois-Marie Le R\'egent, Camille Berdou, Zaki Leghtas, J\'er\'emie Guillaud and Mazyar Mirrahimi	(参考訳) 2光子駆動の散逸によって安定化されるボソニックキャットキュービットは、ビットフリップエラーの指数関数的な抑制と、この保護を保つ豊富なゲート群の恩恵を受ける。これらの特性により、ハードウェア効率が高くフォールトトレラントな量子プロセッサのビルディングブロックとして有望である。本稿では,スタビライザー測定に高速だがノイズの多いCNOTゲートを用いる,繰り返しキャットコードアーキテクチャの性能最適化手法を提案する。この最適化は、ボソニックモードの内在的な単光子損失率と人工的に設計した2光子損失率との比として与えられる物理的性能指数に対する高い閾値をもたらし、また、所望の論理誤り率に到達するために必要なオーバーヘッドについて、しきい値以下で非常に良好なスケーリングをもたらす。キャット量子ビット演算の特定の誤差モデルに基づき、この最適化は高速パリティ測定を利用して、高速化された低忠実度CNOTゲートと高速アンシラパリティチェックキュービットを組み合わせる。性能の大幅な向上は、(1) 制御(アンシラ)キュービット側に主要成分を持つ、キャットキュービットCNOTゲートの高度に非対称な誤差モデル、および (2) 高速動作によって誘起されるリークが存在する場合の誤り訂正性能の堅牢性によって説明される。これらの性能を示すために,猫キュービット状態のリークも考慮した回路レベルの雑音下での繰り返しコードのサンプリング法を開発した。 Bosonic cat qubits stabilized by two-photon driven dissipation benefit from exponential suppression of bit-flip errors and an extensive set of gates preserving this protection. These properties make them promising building blocks of a hardware-efficient and fault-tolerant quantum processor. In this paper, we propose a performance optimization of the repetition cat code architecture using fast but noisy CNOT gates for stabilizer measurements. This optimization leads to high thresholds for the physical figure of merit, given as the ratio between intrinsic single-photon loss rate of the bosonic mode and the engineered two-photon loss rate, as well as a very favorable below-threshold scaling of the overhead required to reach a target logical error rate. Relying on the specific error models for cat qubit operations, this optimization exploits fast parity measurements, using accelerated low-fidelity CNOT gates, combined with fast ancilla parity-check qubits. 
The significant enhancement in performance is explained by (1) the highly asymmetric error model of cat-qubit CNOT gates, whose dominant component acts on the control (ancilla) qubits, and (2) the robustness of the error-correction performance in the presence of the leakage induced by fast operations. To demonstrate this performance, we develop a method to sample the repetition code under circuit-level noise that also takes into account cat qubit state leakage.	翻訳日:2023-11-28 04:56:52 公開日:2023-11-23
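For intuition about why a repetition code suffices here (cat stabilization already suppresses bit flips exponentially, leaving phase flips for the code to correct), the textbook majority-vote failure probability of a distance-d repetition code is sketched below. This is a code-capacity model with perfect syndrome extraction; it ignores the circuit-level noise, the asymmetric CNOT errors and the leakage that the paper actually simulates.

```python
from math import comb


def repetition_logical_error(p: float, d: int) -> float:
    """Failure probability of majority-vote decoding for an odd-distance-d
    repetition code when each of the d qubits suffers an error independently
    with probability p (code-capacity model: perfect syndrome measurements)."""
    if d < 1 or d % 2 == 0:
        raise ValueError("distance must be a positive odd integer")
    return sum(comb(d, k) * p**k * (1 - p) ** (d - k) for k in range(d // 2 + 1, d + 1))


for d in (1, 3, 5, 7):
    print(d, repetition_logical_error(0.01, d))
```

Below the (here, 50%) threshold, each increase of d by 2 multiplies the logical error rate by roughly another factor of p, which is the kind of below-threshold overhead scaling the abstract discusses.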
# 単一サーバキューシステムにおける学習型最適アドミッション制御 Learning-based Optimal Admission Control in a Single Server Queuing System ( http://arxiv.org/abs/2212.11316v2 ) ライセンス: Link先を確認	Asaf Cohen, Vijay G. Subramanian, Yili Zhang	(参考訳) 我々は,サービス率と到着率が未知のM/M/1待ち行列システムにおいて,長期平均利益を最大化する受付制御問題を考える。サービス完了時に得られる定額の報酬と、待ち行列で待機している顧客に課される単位時間当たりのコストのもとで、ディスパッチャは、システムの待ち行列長の観測の全履歴に基づいて、到着客を受け入れるか否かを到着時に判断する。Naor (1969, Econometrica) は、モデルの全パラメータが既知であれば、静的しきい値ポリシー(待ち行列長があらかじめ定めたしきい値未満なら受け入れ、そうでなければ拒否する)を用いるのが最適であることを示した。本研究では,学習に基づくディスパッチアルゴリズムを提案し,Naor(1969)の全情報モデルに対する最適ディスパッチポリシーを基準としたその後悔(regret)を特徴づける。我々は,全情報下の最適しきい値がすべてゼロでない場合にはアルゴリズムが$O(1)$の後悔を達成し,全情報下の最適しきい値が$0$である場合(すなわち全ての到着を拒否するのが最適な場合)には,任意に指定した$\epsilon>0$に対して$O(\ln^{1+\epsilon}(N))$の後悔を達成することを示す。ここで$N$は到着数である。 We consider a long-term average profit maximizing admission control problem in an M/M/1 queuing system with unknown service and arrival rates. With a fixed reward collected upon service completion and a cost per unit of time enforced on customers waiting in the queue, a dispatcher decides upon arrivals whether to admit the arriving customer or not based on the full history of observations of the queue-length of the system. Naor (1969, Econometrica) showed that if all the parameters of the model are known, then it is optimal to use a static threshold policy -- admit if the queue-length is less than a predetermined threshold and otherwise not. We propose a learning-based dispatching algorithm and characterize its regret with respect to optimal dispatch policies for the full information model of Naor (1969). We show that the algorithm achieves an $O(1)$ regret when all optimal thresholds with full information are non-zero, and achieves an $O(\ln^{1+\epsilon}(N))$ regret for any specified $\epsilon>0$, in the case that an optimal threshold with full information is $0$ (i.e., an optimal policy is to reject all arrivals), where $N$ is the number of arrivals.	翻訳日:2023-11-28 04:56:27 公開日:2023-11-23
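The full-information benchmark in this abstract is Naor's static threshold policy. Assuming the arrival rate `lam` and service rate `mu` are known (exactly what the paper's learning algorithm does *not* assume), the best threshold can be found by brute force: under "admit iff fewer than n customers are present" the system is an M/M/1/n queue whose stationary distribution is proportional to $\rho^k$. This is a sketch; charging the holding cost per customer in the system (rather than only in the waiting line) is our modelling assumption, as are the function names.

```python
def threshold_profit(lam, mu, reward, cost, n):
    """Long-run average profit of the static policy 'admit iff fewer than n
    customers are present' (n = 0 rejects everyone). Under this policy the
    system behaves as an M/M/1/n queue. Assumes the holding cost is charged
    per customer per unit time in the system."""
    if n == 0:
        return 0.0
    rho = lam / mu
    weights = [rho**k for k in range(n + 1)]  # unnormalised stationary probabilities
    total = sum(weights)
    pi = [w / total for w in weights]
    throughput = lam * (1 - pi[n])  # admitted (= eventually served) customers per unit time
    mean_in_system = sum(k * p for k, p in enumerate(pi))
    return reward * throughput - cost * mean_in_system


def best_threshold(lam, mu, reward, cost, n_max=200):
    """Brute-force search over thresholds 0..n_max; ties go to the smallest."""
    return max(range(n_max + 1), key=lambda n: threshold_profit(lam, mu, reward, cost, n))


print(best_threshold(lam=1.0, mu=2.0, reward=10.0, cost=1.0))
```

When `reward * mu <= cost`, even a customer served immediately costs more than it earns, so the optimal threshold is 0 (reject all arrivals); this is the regime where the abstract's regret bound degrades to $O(\ln^{1+\epsilon}(N))$.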
# 物体検出のための適応的自己学習 Adaptive Self-Training for Object Detection ( http://arxiv.org/abs/2212.05911v2 ) ライセンス: Link先を確認	Renaud Vandeghen and Gilles Louppe and Marc Van Droogenbroeck	(参考訳) ディープラーニングは、画像内のオブジェクト検出のタスクを解決する効果的なソリューションとして登場したが、大きなラベル付きデータセットを必要とするコストがかかる。このコストを軽減するために、豊富なラベルのないデータを活用する半教師付き物体検出手法が提案され、既に印象的な結果が出ている。しかし、これらの方法のほとんどはしきい値処理によって擬似ラベルを正解(ground-truth)物体に対応付ける必要がある。以前の研究では、このしきい値は通常経験的に決定され、それは時間がかかり、1つのデータ分布に対してのみ実行される。ドメイン、つまりデータ分布が変化すると、新しくコストのかかるパラメータ探索が必要となる。本稿では,単純かつ効果的な教師・生徒(teacher-student)手法である物体検出のための適応型自己学習法(ASTOD)を提案する。 ASTODはスコアヒストグラムの基底値に基づいて閾値をコストなしで決定する。また,教師の予測の質を向上させるために,新しい擬似ラベル手法を提案する。疑似ラベル付けステップでは,未ラベル画像の異なるビューを用いて,見逃される予測の数を削減し,よりよい候補ラベルを得る。教師と生徒は別々に学習され、教師を生徒に置き換えることで、反復的に利用することができる。 MS-COCOデータセットでは、しきい値パラメータを必要としない最先端のメソッドに対して一貫して良好に動作し、パラメータスイープ探索を必要とする手法と比べても遜色ない結果を示す。衛星画像を含むDIORデータセット上の教師付きベースラインに関する追加実験は、同様の結論を導き、データ分布に関係なく、自己学習においてスコア閾値を自動で適応させることが可能であることを証明した。コードはhttps://github.com/rvandeghen/ASTODで公開されている。 Deep learning has emerged as an effective solution for solving the task of object detection in images but at the cost of requiring large labeled datasets. To mitigate this cost, semi-supervised object detection methods, which consist of leveraging abundant unlabeled data, have been proposed and have already shown impressive results. However, most of these methods require linking a pseudo-label to a ground-truth object by thresholding. In previous works, this threshold value is usually determined empirically, which is time-consuming, and only done for a single data distribution. When the domain, and thus the data distribution, changes, a new and costly parameter search is necessary. In this work, we introduce our method Adaptive Self-Training for Object Detection (ASTOD), which is a simple yet effective teacher-student method. ASTOD determines a threshold value at no extra cost, based directly on the ground value of the score histogram. To improve the quality of the teacher predictions, we also propose a novel pseudo-labeling procedure. 
We use different views of the unlabeled images during the pseudo-labeling step to reduce the number of missed predictions and thus obtain better candidate labels. Our teacher and our student are trained separately, and our method can be used in an iterative fashion by replacing the teacher with the student. On the MS-COCO dataset, our method consistently performs favorably against state-of-the-art methods that do not require a threshold parameter, and shows competitive results with methods that require a parameter sweep search. Additional experiments with respect to a supervised baseline on the DIOR dataset containing satellite images lead to similar conclusions, and prove that it is possible to adapt the score threshold automatically in self-training, regardless of the data distribution. The code is available at https://github.com/rvandeghen/ASTOD	翻訳日:2023-11-28 04:56:04 公開日:2023-11-23
# 雑音量子回路シミュレーションのための近似アルゴリズム Approximation Algorithm for Noisy Quantum Circuit Simulation ( http://arxiv.org/abs/2211.17028v2 ) ライセンス: Link先を確認	Mingyu Huang, Ji Guan, Wang Fang and Mingsheng Ying	(参考訳) ノイズ量子回路のシミュレーションは、量子ノイズが避けられない現在のNISQ(ノイズあり中規模量子)時代の量子アルゴリズムの設計と検証に不可欠である。しかし、量子状態の爆発問題(状態空間の次元は量子ビット数に対して指数関数的である)とノイズの複雑な(非ユニタリな)表現のため、ノイズのない場合に比べてはるかに非効率である。そのため、約50キュービットまでのノイズ回路しか近似的にうまくシミュレートできない。本稿では,シミュレーション可能な回路のスケーラビリティを向上させるために,ノイズの影響が小さい場合に,ノイズ量子回路をシミュレートするための新しい近似アルゴリズムを提案する。このアルゴリズムは、ノイズシミュレーションのための新しいテンソルネットワーク図に基づいており、特異値分解を用いて、ダイアグラム内の量子ノイズのテンソルを近似する。テンソルネットワークダイアグラムの収縮は、GoogleのTensorNetwork上に実装されている。このアルゴリズムの有効性と実用性は、現実的な超伝導ノイズモデルを持つ実用的な量子回路の一連の実験によって実証される。その結果、アルゴリズムは最大225キュービット、20ノイズ(約1.8時間)の量子回路を近似的にシミュレートできる。特に,本手法は,広く用いられる近似(サンプリング)アルゴリズムである量子軌道法(quantum trajectories method)よりも高速である。さらに,提案手法は,雑音が十分に小さい場合,量子軌道法におけるサンプル数を大幅に削減することができる。 Simulating noisy quantum circuits is vital in designing and verifying quantum algorithms in the current NISQ (Noisy Intermediate-Scale Quantum) era, where quantum noise is unavoidable. However, it is much less efficient than its noiseless counterpart because of the quantum state explosion problem (the dimension of state space is exponential in the number of qubits) and the complex (non-unitary) representation of noises. Consequently, only noisy circuits with up to about 50 qubits can be simulated approximately well. This paper introduces a novel approximation algorithm for simulating noisy quantum circuits when the effect of noise is small, to improve the scalability of the circuits that can be simulated. The algorithm is based on a new tensor network diagram for the noisy simulation and uses the singular value decomposition to approximate the tensors of quantum noises in the diagram. The contraction of the tensor network diagram is implemented on Google's TensorNetwork. 
The effectiveness and utility of the algorithm are demonstrated by experimenting on a series of practical quantum circuits with realistic superconducting noise models. As a result, our algorithm can approximately simulate quantum circuits with up to 225 qubits and 20 noises (within about 1.8 hours). In particular, our method offers a speedup over the commonly-used approximation (sampling) algorithm -- quantum trajectories method. Furthermore, our approach can significantly reduce the number of samples in the quantum trajectories method when the noise rate is small enough.	翻訳日:2023-11-28 04:54:51 公開日:2023-11-23
# MECCH:メタパスコンテキスト畳み込みに基づく異種グラフニューラルネットワーク MECCH: Metapath Context Convolution-based Heterogeneous Graph Neural Networks ( http://arxiv.org/abs/2211.12792v2 ) ライセンス: Link先を確認	Xinyu Fu, Irwin King	(参考訳) 複数の種類のノードとエッジを持つ構造データによる表現学習のために,ヘテロジニアスグラフニューラルネットワーク(HGNNs)が提案されている。 HGNNが深くなったときのパフォーマンス劣化問題に対処するため、研究者はHGNNにメタパスを組み込み、意味的には密接に関連するがグラフ上では離れたノード同士を関連付ける。しかし、既存のメタパスベースのモデルは情報損失または高い計算コストに悩まされている。これらの問題を解決するために,メタパスコンテキスト畳み込みに基づく異種グラフニューラルネットワーク(MECCH)を提案する。 MECCHは、冗長性を避けながら損失のないノード情報の集約を容易にする新しいタイプのグラフ構造であるメタパスコンテキストを活用する。具体的には,特徴前処理の後に(1)メタパスコンテキスト構築,(2)メタパスコンテキストエンコーダ,(3)畳み込みメタパス融合という3つの新しい構成要素を適用して,入力グラフから包括的情報を効率的に抽出する。ノード分類とリンク予測のための5つの実世界の異種グラフデータセットの実験により、MECCHは最先端のベースラインと比較して優れた予測精度を達成し、計算効率も向上していることが示された。 Heterogeneous graph neural networks (HGNNs) were proposed for representation learning on structural data with multiple types of nodes and edges. To deal with the performance degradation issue when HGNNs become deep, researchers combine metapaths into HGNNs to associate nodes closely related in semantics but far apart in the graph. However, existing metapath-based models suffer from either information loss or high computation costs. To address these problems, we present a novel Metapath Context Convolution-based Heterogeneous Graph Neural Network (MECCH). MECCH leverages metapath contexts, a new kind of graph structure that facilitates lossless node information aggregation while avoiding any redundancy. Specifically, MECCH applies three novel components after feature preprocessing to extract comprehensive information from the input graph efficiently: (1) metapath context construction, (2) metapath context encoder, and (3) convolutional metapath fusion. Experiments on five real-world heterogeneous graph datasets for node classification and link prediction show that MECCH achieves superior prediction accuracy compared with state-of-the-art baselines with improved computational efficiency.	翻訳日:2023-11-28 04:54:32 公開日:2023-11-23
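Metapaths link nodes that are semantically close but far apart in the graph (for example, author-paper-author links co-authors). The sketch below expands a metapath relation by relation on a small hypothetical author-paper graph and returns both the metapath-based neighbours and every node touched on the way, which loosely mirrors the idea of keeping intermediate nodes in a "metapath context". It is our simplified illustration; MECCH's actual definition of metapath contexts, its encoder and its fusion step are not reproduced.

```python
from collections import defaultdict


def expand_metapath(edges, start, metapath):
    """Follow the relations of `metapath` in order, starting from `start`.

    edges: dict mapping a relation name to a list of (src, dst) pairs.
    Returns (endpoints, touched): endpoints are the metapath-based neighbours
    of `start`; touched is every node reached along the way (a superset of
    the nodes lying on complete metapath instances).
    """
    adjacency = defaultdict(lambda: defaultdict(set))
    for rel in set(metapath):
        for src, dst in edges[rel]:
            adjacency[rel][src].add(dst)

    frontier, touched = {start}, {start}
    for rel in metapath:
        frontier = set().union(*(adjacency[rel][node] for node in frontier))
        touched |= frontier
    return frontier, touched


# Hypothetical author-paper graph; the metapath A-P-A links co-authors.
edges = {
    "AP": [("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("a3", "p2")],
    "PA": [("p1", "a1"), ("p1", "a2"), ("p2", "a2"), ("p2", "a3")],
}
print(expand_metapath(edges, "a1", ["AP", "PA"]))
print(expand_metapath(edges, "a1", ["AP", "PA", "AP", "PA"]))
```

Longer metapaths reach semantically related nodes several hops away without stacking many message-passing layers, which is the motivation the abstract gives for combining metapaths with HGNNs.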
# 部分空間間の量子コヒーレンス:状態変換、コヒーレンスパワー、$k$コヒーレンスおよびその他の性質 Quantum coherence between subspaces: State transformation, Cohering Power, $k$-coherence and other properties ( http://arxiv.org/abs/2302.13148v3 ) ライセンス: Link先を確認	Azam Mani, Fatemeh Rezazadeh, Vahid Karimipour	(参考訳) 最初に[1]で導入され[2,3]で発展したブロックコヒーレンスの概念は、個々の原子に対して任意の精密な測定を行えるほど実験能力が繊細でない場合を包含する。我々は,この資源理論のさらなる研究を促進する枠組みを,いくつかの点で開発する。この枠組みを用いて、非コヒーレント操作による状態変換の問題を調べ、クラウス演算子の明示的な形を示すことにより、ブロックコヒーレンスの文脈における状態変換のための多数化(majorization)に類似した十分条件を導出する。我々はまた、他の全ての状態およびすべてのユニタリゲートが非コヒーレント操作によって構築できる最大コヒーレント状態の形式を決定する。その後、量子チャネルのブロックコヒーリングパワーおよびブロックデコヒーリングパワーの概念を定義し、これらのパワーを複数の種類のチャネルについて決定する。最後に、ブロックコヒーレンスと、$k$-コヒーレンスと呼ばれる以前のコヒーレンスの拡張との関係について検討する。 The concept of block-coherence, first introduced in [1] and developed in [2,3] encompasses the case where experimental capabilities are not so delicate to perform arbitrary refined measurements on individual atoms. We develop a framework which facilitates further investigation of this resource theory in several respects. Using this framework, we investigate the problem of state conversion by incoherent operations and by presenting the explicit form of Kraus operators, we derive a majorization-like sufficient condition for state conversion within the context of block coherence. We also determine the form of the maximally coherent state from which all other states and all unitary gates can be constructed by incoherent operations. Thereafter, we define the concept of block-cohering and block-decohering powers of quantum channels and determine these powers for several types of channels. Finally, we explore the relation between block coherence and a previous extension of coherence, known as $k$-coherence.	翻訳日:2023-11-28 04:44:41 公開日:2023-11-23
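The sufficient condition in this abstract is described as "majorization-like". For readers unfamiliar with the underlying relation, the sketch below checks ordinary vector majorization, $x \prec y$: equal totals and every partial sum of the decreasingly sorted $y$ at least that of $x$. It is only the standard relation; the paper's block-structured condition and the vectors it is applied to are not reproduced here.

```python
def majorizes(y, x, tol=1e-12):
    """True if y majorizes x (x ≺ y): equal totals, and every partial sum of
    the decreasingly sorted y is at least the corresponding partial sum of x."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    px = py = 0.0
    for a, b in zip(xs, ys):
        px += a
        py += b
        if py < px - tol:
            return False
    return abs(px - py) <= tol


# A point mass majorizes the uniform distribution, but not the other way round.
print(majorizes([1.0, 0.0, 0.0], [1 / 3, 1 / 3, 1 / 3]))
print(majorizes([1 / 3, 1 / 3, 1 / 3], [1.0, 0.0, 0.0]))
```

The tolerance guards against floating-point rounding in the partial sums; with exact arithmetic it could be set to zero.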
# 相互作用するカオス小体量子系における普遍スペクトル相関 Universal spectral correlations in interacting chaotic few-body quantum systems ( http://arxiv.org/abs/2302.09955v3 ) ライセンス: Link先を確認	Felix Fritzsch and Maximilian F. I. Kieler	(参考訳) 相互作用量子系におけるランダム行列スペクトル相関の出現は、量子カオスの定義的特徴である。このような相関関係を,適切なランダム行列アンサンブルでモデル化した,相互作用するカオス的少数体・多体系におけるスペクトル形状因子とそのモーメントの観点から検討した。スペクトル形状因子は、大きなヒルベルト空間次元に対して厳密に得られる。これらの結果を有限ヒルベルト空間次元に外挿すると、非相互作用から強相互作用の場合への普遍的な遷移が見つかり、これはこれら2つの極限の単純な組み合わせとして記述できる。この遷移は単一のスケーリングパラメータによって制御される。二成分の場合、スペクトル形状因子の全てのモーメントについても同様の結果が得られる。その結果を広範囲な数値研究により確認し, 量子化された一対のキックドローターによって与えられるより現実的なシステムにも適用できることを実証した。最後に、弱結合領域を扱う摂動的アプローチで解析を補完する。 The emergence of random matrix spectral correlations in interacting quantum systems is a defining feature of quantum chaos. We study such correlations in terms of the spectral form factor and its moments in interacting chaotic few- and many-body systems, modeled by suitable random-matrix ensembles. We obtain the spectral form factor exactly in the limit of large Hilbert space dimension. Extrapolating those results to finite Hilbert space dimension we find a universal transition from the non-interacting to the strongly interacting case, which can be described as a simple combination of these two limits. This transition is governed by a single scaling parameter. In the bipartite case we derive similar results also for all moments of the spectral form factor. We confirm our results by extensive numerical studies and demonstrate that they apply to more realistic systems given by a pair of quantized kicked rotors as well. Finally, we complement our analysis by a perturbative approach covering the small coupling regime.	翻訳日:2023-11-28 04:44:21 公開日:2023-11-23
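The central quantity here is the spectral form factor. A minimal sketch of one common convention, $K(t) = |\sum_n e^{-iE_n t}|^2 / N$, follows; normalisations differ between papers (some divide by $N^2$), and random-matrix predictions refer to the ensemble average $\langle K(t) \rangle$ over many spectra, which `averaged_sff` computes for whatever spectra it is given. Generating random-matrix spectra is left out to keep the sketch dependency-free.

```python
import cmath


def spectral_form_factor(energies, t):
    """K(t) = |sum_n exp(-i E_n t)|^2 / N for one spectrum {E_n}.
    With this normalisation K(0) = N, and for a non-degenerate spectrum the
    long-time average of K(t) is 1."""
    z = sum(cmath.exp(-1j * e * t) for e in energies)
    return abs(z) ** 2 / len(energies)


def averaged_sff(spectra, t):
    """Ensemble average <K(t)> over several spectra (e.g. random-matrix draws)."""
    return sum(spectral_form_factor(s, t) for s in spectra) / len(spectra)


levels = [0.1, 0.7, 2.3, 3.1]
for t in (0.0, 1.0, 10.0):
    print(t, spectral_form_factor(levels, t))
```

A single spectrum gives a wildly fluctuating $K(t)$; the smooth ramp-plateau curves discussed for chaotic systems only emerge after averaging.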
# 集団測定による混合量子状態の判別 Discriminating mixed qubit states with collective measurements ( http://arxiv.org/abs/2302.08882v2 ) ライセンス: Link先を確認	Lorcan O. Conlon, Falk Eilenberger, Ping Koy Lam and Syed M. Assad	(参考訳) 非直交状態が完全に区別できないのは量子力学の中心的な事実である。この特性は量子鍵分布の安全性を保証する。したがって、量子状態を最適に区別する戦略を設計し実装する量子通信において重要なタスクである。一般に、複数の量子状態のコピーにアクセスすると、最適な測定は集合的な測定となる。しかし、これまでは量子状態の識別を強化するために集団計測は使われていない。この主な理由の1つは、同じ事前確率の通常の状態識別設定では、量子状態の少なくとも3つのコピーを総合的に測定し、分離可能な測定値を上回る必要があるという事実である。これは実験的に非常に難しい。本研究では,不平等な先行確率を考慮し,非絡み合い測定で達成できるよりも低い誤差率を実現する集合計測を用いて,単一量子状態の2つのコピーを識別するプロトコルを実験的に提案する。我々は、超伝導量子プロセッサであるIBM Q System Oneデバイス上で測定を実装した。さらに,未知状態の3と4のコピーに対して集団測定を行い,その有効性が低かった。 It is a central fact in quantum mechanics that non-orthogonal states cannot be distinguished perfectly. This property ensures the security of quantum key distribution. It is therefore an important task in quantum communication to design and implement strategies to optimally distinguish quantum states. In general, when we have access to multiple copies of quantum states the optimal measurement will be a collective measurement. However, to date, collective measurements have not been used to enhance quantum state discrimination. One of the main reasons for this is the fact that, in the usual state discrimination setting with equal prior probabilities, at least three copies of a quantum state are required to be measured collectively to outperform separable measurements. This is very challenging experimentally. In this work, by considering unequal prior probabilities, we propose and experimentally demonstrate a protocol for distinguishing two copies of single qubit states using collective measurements which achieves a lower probability of error than can be achieved by any non-entangling measurement. We implement our measurements on an IBM Q System One device, a superconducting quantum processor. Additionally, we implemented collective measurements on three and four copies of the unknown state and found they performed poorly.	
翻訳日:2023-11-28 04:44:07 公開日:2023-11-23
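The benchmark for any discrimination protocol with unequal priors is the Helstrom bound, P_err = (1 - ||p0 rho0 - p1 rho1||_1) / 2, which is attained by the optimal measurement over all measurements, collective (entangling) ones included, when rho0 and rho1 are multi-copy states. A minimal sketch, assuming depolarized single-qubit states as the "mixed qubit states" (an illustrative choice, not the states used in the experiment); it computes only the collective optimum, since the best non-entangling measurement requires a separate optimization (typically a semidefinite program) that is not shown. All names are hypothetical.

```python
import numpy as np

def ket(theta, phi=0.0):
    """Pure qubit state cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

def density(psi):
    return np.outer(psi, psi.conj())

def depolarize(rho, lam):
    """Mix a state with the maximally mixed state: (1 - lam) rho + lam I/d."""
    d = rho.shape[0]
    return (1 - lam) * rho + lam * np.eye(d) / d

def copies(rho, k):
    """k-fold tensor power of rho (the state of k identical copies)."""
    out = np.array([[1.0 + 0j]])
    for _ in range(k):
        out = np.kron(out, rho)
    return out

def helstrom_error(p0, rho0, rho1):
    """Minimum error probability for rho0 (prior p0) versus rho1 (prior 1 - p0)."""
    gamma = p0 * rho0 - (1 - p0) * rho1
    trace_norm = np.abs(np.linalg.eigvalsh(gamma)).sum()  # gamma is Hermitian
    return 0.5 * (1 - trace_norm)

if __name__ == "__main__":
    rho0 = depolarize(density(ket(0.0)), 0.2)
    rho1 = depolarize(density(ket(np.pi / 3)), 0.2)
    for k in (1, 2, 3):
        print(k, helstrom_error(0.7, copies(rho0, k), copies(rho1, k)))
```

Adding copies can never increase the optimal error (the extra copies may simply be ignored), which the usage loop makes visible.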
# 確率的表現によるPDE学習のためのモンテカルロニューラルPDE解法 Monte Carlo Neural PDE Solver for Learning PDEs via Probabilistic Representation ( http://arxiv.org/abs/2302.05104v2 ) ライセンス: Link先を確認	Rui Zhang, Qi Meng, Rongchan Zhu, Yue Wang, Wenlei Shi, Shihua Zhang, Zhi-Ming Ma, Tie-Yan Liu	(参考訳) 利用可能なデータや高品質なデータが限られるシナリオでは、関数から関数へのニューラルPDEソルバを教師なしで訓練することが不可欠である。しかし、既存手法の効率と精度は、訓練段階に組み込まれた有限差分法や擬スペクトル法といった数値アルゴリズムの特性によって制約される。これらの手法は妥当な精度を得るために慎重な時空間離散化を必要とし、特に時空間変動が大きい場合には、大きな計算上の課題と不正確なシミュレーションをもたらす。これらの制限に対処するため、本研究では、マクロな現象をランダム粒子のアンサンブルとみなすPDEの確率表現を介して教師なしニューラルソルバを訓練する、モンテカルロニューラルPDEソルバ（MCNPソルバ）を提案する。他の教師なし手法と比較して、MCNPソルバはモンテカルロ法の利点を自然に受け継いでおり、時空間変動に対して頑健で、粗い時間刻み幅を許容できる。粒子のランダムウォークをシミュレートする際、対流過程にはHeun法を用い、拡散過程では近傍格子点の確率密度関数を介して期待値を計算する。これらの手法により精度が向上し、モンテカルロサンプリングに伴う計算メモリと計算時間の問題が回避され、従来のモンテカルロ法よりも改善される。対流拡散方程式、Allen-Cahn方程式、Navier-Stokes方程式に関する数値実験により、他の教師なしベースラインと比較して精度と効率が大幅に向上することを示す。ソースコードは https://github.com/optray/MCNP で公開予定である。 In scenarios with limited available or high-quality data, training the function-to-function neural PDE solver in an unsupervised manner is essential. However, the efficiency and accuracy of existing methods are constrained by the properties of numerical algorithms, such as finite difference and pseudo-spectral methods, integrated during the training stage. These methods necessitate careful spatiotemporal discretization to achieve reasonable accuracy, leading to significant computational challenges and inaccurate simulations, particularly in cases with substantial spatiotemporal variations. To address these limitations, we propose the Monte Carlo Neural PDE Solver (MCNP Solver) for training unsupervised neural solvers via the PDEs' probabilistic representation, which regards macroscopic phenomena as ensembles of random particles. Compared to other unsupervised methods, MCNP Solver naturally inherits the advantages of the Monte Carlo method, which is robust against spatiotemporal variations and can tolerate coarse step size. 
In simulating the random walk of particles, we employ Heun's method for the convection process and calculate the expectation via the probability density function of neighbouring grid points during the diffusion process. These techniques enhance accuracy and circumvent the computational memory and time issues associated with Monte Carlo sampling, offering an improvement over traditional Monte Carlo methods. Our numerical experiments on convection-diffusion, Allen-Cahn, and Navier-Stokes equations demonstrate significant improvements in accuracy and efficiency compared to other unsupervised baselines. The source code will be publicly available at: https://github.com/optray/MCNP.	翻訳日:2023-11-28 04:43:24 公開日:2023-11-23
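The probabilistic (Feynman-Kac) representation that MCNP builds on can be sketched without any neural network: for u_t + c(x) u_x = nu u_xx, one step is u(x, t+dt) = E[u(X, t)] with X the back-traced characteristic of x plus Gaussian noise of variance 2 nu dt. A minimal, non-neural sketch, assuming a 1D periodic domain; it mirrors the two ingredients named in the abstract (Heun's method for convection, an expectation built from neighbouring grid values for diffusion) but replaces the paper's density-weighted grid sum with Gauss-Hermite quadrature plus linear interpolation. It is not the authors' MCNP Solver, and the names are hypothetical.

```python
import numpy as np

def step(u, x, dt, velocity, nu, n_quad=10):
    """One step of u_t + c(x) u_x = nu u_xx on a periodic grid over [0, 2*pi)."""
    # Convection: trace the characteristic backwards with Heun's method.
    k1 = velocity(x)
    k2 = velocity(x - dt * k1)
    x_back = x - 0.5 * dt * (k1 + k2)
    # Diffusion: E[u(x_back + sqrt(2 nu dt) Z)], Z ~ N(0, 1), via Gauss-Hermite nodes;
    # off-grid values are linearly interpolated from the neighbouring grid points.
    z, w = np.polynomial.hermite_e.hermegauss(n_quad)
    w = w / np.sqrt(2 * np.pi)  # weights for exp(-z^2/2) sum to sqrt(2 pi)
    s = np.sqrt(2 * nu * dt)
    u_new = np.zeros_like(u)
    for zk, wk in zip(z, w):
        u_new += wk * np.interp(x_back + s * zk, x, u, period=2 * np.pi)
    return u_new

def solve(u0, x, t_end, n_steps, velocity, nu):
    u, dt = u0.copy(), t_end / n_steps
    for _ in range(n_steps):
        u = step(u, x, dt, velocity, nu)
    return u

if __name__ == "__main__":
    x = np.linspace(0, 2 * np.pi, 256, endpoint=False)
    u = solve(np.sin(x), x, 1.0, 50, lambda y: np.ones_like(y), 0.1)
    # For constant c = 1 the exact solution is exp(-nu t) sin(x - c t).
    print(np.max(np.abs(u - np.exp(-0.1) * np.sin(x - 1.0))))
```

Because each step is an expectation rather than a finite-difference stencil, the step size is not tied to the grid spacing by a CFL-type condition, which is the robustness to coarse steps the abstract refers to.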
# 回路量子電磁力学の非分散領域におけるポラリトン状態の特徴づけ Characterising Polariton States in Non-Dispersive Regime of Circuit Quantum Electrodynamics ( http://arxiv.org/abs/2302.04523v2 ) ライセンス: Link先を確認	Arvind Mamgain, Samarth Hawaldar, Athreya Shankar and Baladitya Suri	(参考訳) 読み出し共振器に結合された超伝導量子ビットは、現在、多くの量子コンピューティングおよび量子光学実験の基本構成要素である。典型的な量子ビット-共振器系は分散領域で結合されており、そこでは量子ビットと共振器の離調が両者の結合よりもはるかに大きい。本研究では、非分散領域にある超伝導トランスモン-共振器系を作製し、測定した。裸の量子ビット状態と共振器状態の混成によって形成されるドレスト状態は、量子ビットに駆動を加えることでさらに混成し、ポラリトン状態が形成される。さまざまな駆動強度と周波数におけるポラリトン状態間の遷移を実験的に調べ、量子ビット-共振器系の高次準位の非分散結合が、ポラリトン固有状態と対応する遷移周波数をどのように変化させるかを示す。また、分散領域を超えた駆動Jaynes-Cummingsモデルから得られる数値結果とよく一致することを報告する。 A superconducting qubit coupled to a read-out resonator is currently the building block of multiple quantum computing as well as quantum optics experiments. A typical qubit-resonator system is coupled in the dispersive regime, where the detuning between qubit and resonator is much greater than the coupling between them. In this work, we fabricated and measured a superconducting transmon-resonator system in the non-dispersive regime. The dressed states formed by the mixing of the bare qubit and resonator states can be further mixed by applying a drive on the qubit, leading to the formation of polariton states. We report experimental studies of transitions between polariton states at varying driving powers and frequencies and show how the non-dispersive coupling of the higher levels of the qubit-resonator system modifies the polariton eigenstates and the corresponding transition frequencies. We also report close agreement with numerical results obtained from a driven Jaynes-Cummings Model beyond the dispersive regime.	翻訳日:2023-11-28 04:42:56 公開日:2023-11-23
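The dressed states referred to above can be computed directly from the undriven Jaynes-Cummings Hamiltonian, which makes the difference between the dispersive and non-dispersive regimes concrete: when |Delta| is not much larger than g, each excitation manifold splits by 2 sqrt(Delta^2/4 + g^2 (n+1)) instead of showing only a small dispersive shift. A minimal sketch, assuming a two-level qubit (the transmon's higher levels and the drive studied in the paper are omitted), hbar = 1, and illustrative GHz-scale parameters; names are hypothetical.

```python
import numpy as np

def jc_hamiltonian(omega_r, omega_q, g, n_fock):
    """H = omega_r a^dag a + omega_q |e><e| + g (a sigma_+ + a^dag sigma_-)
    on qubit (x) cavity, qubit basis ordered (|g>, |e>), cavity truncated to n_fock levels."""
    a = np.diag(np.sqrt(np.arange(1, n_fock)), k=1)   # cavity annihilation operator
    n_op = a.T @ a
    sp = np.array([[0.0, 0.0], [1.0, 0.0]])           # sigma_+ = |e><g|
    pe = np.array([[0.0, 0.0], [0.0, 1.0]])           # projector |e><e|
    i2, ic = np.eye(2), np.eye(n_fock)
    return (omega_r * np.kron(i2, n_op) + omega_q * np.kron(pe, ic)
            + g * (np.kron(sp, a) + np.kron(sp.T, a.T)))

def dressed_pair(omega_r, omega_q, g, n):
    """Analytic energies of the two dressed states built from {|e, n>, |g, n+1>};
    valid at any detuning, not only in the dispersive limit."""
    delta = omega_q - omega_r
    centre = omega_r * (n + 0.5) + 0.5 * omega_q
    split = np.sqrt(0.25 * delta**2 + g**2 * (n + 1))
    return centre - split, centre + split

if __name__ == "__main__":
    wr, wq, g = 5.0, 5.05, 0.1   # |Delta| = 0.05 < g: non-dispersive
    for n in range(3):
        lo, hi = dressed_pair(wr, wq, g, n)
        print(n, round(lo, 4), round(hi, 4), "splitting", round(hi - lo, 4))
```

The sqrt(n+1) growth of the splitting is what makes transition frequencies between dressed (and, with a drive, polariton) states depend on the excitation number outside the dispersive regime.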
# ベイズ階層モデルの比較のための深層学習法 A Deep Learning Method for Comparing Bayesian Hierarchical Models ( http://arxiv.org/abs/2301.11873v4 ) ライセンス: Link先を確認	Lasse Elsemüller, Martin Schnuerch, Paul-Christian Bürkner, Stefan T. Radev	(参考訳) ベイズモデル比較（BMC）は、競合する計算モデルの相対的な優劣を評価し、不確実性をモデル選択の判断に伝播させる原理的なアプローチを提供する。しかし、高次元の入れ子状パラメータ構造のため、広く使われている階層モデルのクラスではBMCはしばしば計算困難である。この困難に対処するため、確率的プログラムとしてインスタンス化できる任意の階層モデルの集合に対してBMCを実行する深層学習手法を提案する。本手法は償却推論を可能にするため、事後モデル確率の効率的な再推定と、実データへの適用前の高速な性能検証が可能となる。一連の大規模な検証研究において、提案手法の性能を最先端のブリッジサンプリング法と比較し、すべてのBMC設定で優れた償却推論を示す。次に、尤度が部分的に陰的であるため従来はBMCが困難とされてきた4つの階層的証拠蓄積モデルを比較することで、本手法を実証する。さらに、転移学習を活用して訓練効率を向上させる方法を示す。すべての解析について再現可能なコードを提供し、本手法をオープンソースで実装する。 Bayesian model comparison (BMC) offers a principled approach for assessing the relative merits of competing computational models and propagating uncertainty into model selection decisions. However, BMC is often intractable for the popular class of hierarchical models due to their high-dimensional nested parameter structure. To address this intractability, we propose a deep learning method for performing BMC on any set of hierarchical models which can be instantiated as probabilistic programs. Since our method enables amortized inference, it allows efficient re-estimation of posterior model probabilities and fast performance validation prior to any real-data application. In a series of extensive validation studies, we benchmark the performance of our method against the state-of-the-art bridge sampling method and demonstrate excellent amortized inference across all BMC settings. We then showcase our method by comparing four hierarchical evidence accumulation models that have previously been deemed intractable for BMC due to partly implicit likelihoods. Additionally, we demonstrate how transfer learning can be leveraged to enhance training efficiency. We provide reproducible code for all analyses and an open-source implementation of our method.	翻訳日:2023-11-28 04:41:48 公開日:2023-11-23
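The idea of amortized model comparison can be illustrated on a toy problem: train a probabilistic classifier once on datasets simulated from each candidate model; its output then approximates posterior model probabilities for any new dataset without refitting. A minimal sketch, assuming two non-hierarchical Gaussian models, a hand-picked sufficient statistic, equal prior model probabilities, and logistic regression in place of the paper's neural networks; the closed-form log-odds is included only to check the approximation, and all names are hypothetical.

```python
import numpy as np

def simulate(n_sims, n_obs, rng):
    """M0: y_i ~ N(0, 1).  M1: mu ~ N(0, 1), y_i ~ N(mu, 1).  Returns (mean of y, model index)."""
    m = rng.integers(0, 2, size=n_sims)                  # equal prior model probabilities
    mu = np.where(m == 1, rng.standard_normal(n_sims), 0.0)
    y = mu[:, None] + rng.standard_normal((n_sims, n_obs))
    return y.mean(axis=1), m

def features(ybar):
    return np.column_stack([np.ones_like(ybar), ybar**2])

def fit_logistic(x, t, iters=25):
    """Logistic regression by Newton-Raphson (IRLS)."""
    beta = np.zeros(x.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-x @ beta))
        grad = x.T @ (t - p)
        hess = x.T @ (x * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta

def exact_log_odds(ybar, n_obs):
    """log p(M1 | data) - log p(M0 | data) with equal priors; ybar is sufficient here."""
    v0, v1 = 1 / n_obs, 1 + 1 / n_obs                    # Var(ybar) under M0 and M1
    return -0.5 * np.log(v1 / v0) - 0.5 * ybar**2 * (1 / v1 - 1 / v0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ybar, m = simulate(20000, 10, rng)                   # simulation cost is paid once
    beta = fit_logistic(features(ybar), m)
    new = np.array([0.0, 0.3, 0.6])                      # "observed" datasets, no refitting
    print(1 / (1 + np.exp(-features(new) @ beta)))       # approx. p(M1 | data)
    print(1 / (1 + np.exp(-exact_log_odds(new, 10))))
```

In the toy case the true log-odds is linear in the chosen features, so the classifier can match it exactly; for hierarchical models neither the statistic nor the functional form is known, which is where learned networks come in.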
# 原子スケールにおける量子崩壊モデルの指紋としての相殺効果 Cancellation effects as a fingerprint of quantum collapse models at atomic scale ( http://arxiv.org/abs/2301.09920v2 ) ライセンス: Link先を確認	Kristian Piscicchia, Sandro Donadi, Simone Manti, Angelo Bassi, Maaneli Derakhshani, Lajos Diosi and Catalina Curceanu	(参考訳) 本研究では、動的な波動関数の崩壊によって誘起される原子系からの自発的な電磁放射を、X線領域で調べる。これまで文献で考慮されてきた単純な場合、すなわち放射が完全にコヒーレント（同一原子核内の陽子）または非コヒーレント（電子）である場合と比べて、大きな逸脱が見られる。この低エネルギー領域では、自発放射率は調べる原子種に強く依存し、さらに初めて、特定の崩壊モデルにも依存することが見いだされた。 In this work the spontaneous electromagnetic radiation from atomic systems, induced by dynamical wave-function collapse, is investigated in the X-rays domain. Strong departures are evidenced with respect to the simple cases considered until now in the literature, in which the emission is either perfectly coherent (protons in the same nuclei) or incoherent (electrons). In this low-energy regime the spontaneous radiation rate strongly depends on the atomic species under investigation and, for the first time, is found to depend on the specific collapse model.	翻訳日:2023-11-28 04:41:09 公開日:2023-11-23
# 分割弦とホログラフィ Segmented strings and holography ( http://arxiv.org/abs/2304.10389v2 ) ライセンス: Link先を確認	Bercel Boldis, Péter Lévay	(参考訳) 本論文では、$AdS_{d+1}$ 中を伝播する分割弦と、真空状態に対して計算される量子情報理論的な量によって特徴づけられる、ミンコフスキー時空内の $CFT_d$ 部分系との間の対応を確立する。AdS側における弦セグメントの世界面の面積が、CFT側のフィデリティ感受率（量子幾何テンソルの実部）と結びつけられることを示す。この量には、運動学的空間の計量に従って空間的に変位した因果ダイヤモンドに対応する、無限小だけ離れた状態間の計算複雑性という別の解釈もある。これらの変位した因果ダイヤモンドは、弦の世界面セグメントをホログラフィックに一意に再構成するための情報を符号化する。双対的に、バルクのセグメントは、ブーストされた慣性系、あるいは一定加速度で運動する非慣性系における、連続する境界事象の因果的に順序づけられた集合を表す。$AdS_3$ の特別な場合には、分割弦の面積を $4GL$（$G$ はニュートン定数、$L$ はAdS長）を単位として、ブーストされた空間的区間 $A$, $B$, $C$ から生じる台形配置に対して計算した条件付き相互情報量 $I(A,C\vert B)$ とみなすこともできる。この特別な場合、離散化された南部・後藤作用の変分から、境界理論における絡み合いエントロピーに対する戸田方程式の形の方程式が導かれる。任意の $d$ に対して、弦の世界面パッチは絡み合いウェッジのモジュラースライス内に存在する。これらのパッチは補間アンザッツ（南部・後藤作用の運動方程式の離散版）によって互いに結びつけられ、絡み合いウェッジのある種のトモグラフィを与えているように見える。 In this paper we establish a connection between segmented strings propagating in $AdS_{d+1}$ and $CFT_d$ subsystems in Minkowski spacetime characterized by quantum information theoretic quantities calculated for the vacuum state. We show that the area of the world sheet of a string segment on the AdS side can be connected to fidelity susceptibility (the real part of the quantum geometric tensor) on the CFT side. This quantity has another interpretation as the computational complexity for infinitesimally separated states corresponding to causal diamonds that are displaced in a spacelike manner according to the metric of kinematic space. These displaced causal diamonds encode information for a unique reconstruction of the string world sheet segments in a holographic manner. Dually the bulk segments are representing causally ordered sets of consecutive boundary events in boosted inertial frames or in noninertial ones proceeding with constant acceleration. 
For the special case of $AdS_3$ one can also see the segmented stringy area in units of $4GL$ ($G$ is Newton's constant and $L$ is the AdS length) as the conditional mutual information $I(A,C\vert B)$ calculated for a trapezoid configuration arising from boosted spacelike intervals $A$,$B$ and $C$. In this special case the variation of the discretized Nambu-Goto action leads to an equation for entanglement entropies in the boundary theory of the form of a Toda equation. For arbitrary $d$ the string world sheet patches are living in the modular slices of the entanglement wedge. They seem to provide some sort of tomography of the entanglement wedge where the patches are linked together by the interpolation ansatz, i.e. the discretized version of the equations of motion for the Nambu-Goto action.	翻訳日:2023-11-28 04:33:02 公開日:2023-11-23
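The conditional mutual information I(A,C|B) that the abstract relates to the segmented-string area in $AdS_3$ can be evaluated directly on the boundary for the vacuum of a 2D CFT, where a spacelike interval of invariant length L has S = (c/6) log(L^2/eps^2). A minimal sketch, assuming three consecutive (possibly boosted) intervals A = [p0, p1], B = [p1, p2], C = [p2, p3] and this standard vacuum formula; it computes only the boundary quantity, not the string area itself, and the function names are hypothetical.

```python
import numpy as np

def entropy(p, q, c=1.0, eps=1e-3):
    """Vacuum entanglement entropy of the interval between events p = (t, x) and q = (t, x)
    in a 2D CFT: S = (c/6) log(L^2 / eps^2) with invariant length L^2 = dx^2 - dt^2 > 0."""
    dt, dx = q[0] - p[0], q[1] - p[1]
    l2 = dx**2 - dt**2
    if l2 <= 0:
        raise ValueError("endpoints must be spacelike separated")
    return c / 6 * np.log(l2 / eps**2)

def conditional_mutual_information(p0, p1, p2, p3, c=1.0, eps=1e-3):
    """I(A:C|B) = S(AB) + S(BC) - S(B) - S(ABC); the UV cutoff eps cancels."""
    return (entropy(p0, p2, c, eps) + entropy(p1, p3, c, eps)
            - entropy(p1, p2, c, eps) - entropy(p0, p3, c, eps))

if __name__ == "__main__":
    equal_time = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]
    # Three unit intervals at equal time: (c/3) log(4/3).
    print(conditional_mutual_information(*equal_time), np.log(4 / 3) / 3)
```

Because only invariant lengths enter, the result is cutoff-independent and Lorentz-invariant (a function of a cross-ratio), which is what allows it to be compared with a geometric bulk quantity.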
# HRS-Bench: テキストから画像への生成モデルのための包括的で信頼性が高くスケーラブルなベンチマーク HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models ( http://arxiv.org/abs/2304.05390v2 ) ライセンス: Link先を確認	Eslam Mohamed Bakr, Pengzhan Sun, Xiaoqian Shen, Faizan Farooq Khan, Li Erran Li, Mohamed Elhoseiny	(参考訳) 近年、テキストから画像への生成（T2I）モデルは広く研究されており、特にT2I合成タスクで最先端の結果を達成する拡散モデルの登場以降その傾向が強い。しかし、既存のベンチマークは主観的な人手評価に大きく依存しており、モデルの能力を包括的に評価することが難しい。さらに、新しいT2Iアーキテクチャの開発に向けた取り組みと、評価に向けた取り組みとの間には大きなギャップがある。そこで本研究では、包括的（Holistic）で信頼性が高く（Reliable）、スケーラブル（Scalable）なT2Iモデル向けの具体的な評価ベンチマークHRS-Benchを提案する。限られた側面に焦点を当てた既存のベンチマークとは異なり、HRS-Benchは13のスキルを測定し、それらは正確性、頑健性、汎化、公平性、バイアスの5つの主要カテゴリに分類できる。さらに、HRS-Benchはファッション、動物、交通、食べ物、衣服を含む50のシナリオをカバーする。幅広いスキルをカバーする指標を用いて、最近の9つの大規模T2Iモデルを評価した。HRS-Benchの有効性を検証するため、平均して我々の評価の95%と一致する人手評価を実施した。実験により、既存のモデルは、指定された数の物体、視覚的テキスト、あるいは特定の感情を伴う画像の生成にしばしば苦労することが示された。本ベンチマークが今後のテキストからの画像生成研究の助けとなることを期待する。コードとデータは https://eslambakr.github.io/hrsbench.github.io で入手できる。 In recent years, Text-to-Image (T2I) models have been extensively studied, especially with the emergence of diffusion models that achieve state-of-the-art results on T2I synthesis tasks. However, existing benchmarks heavily rely on subjective human evaluation, limiting their ability to holistically assess the model's capabilities. Furthermore, there is a significant gap between efforts in developing new T2I architectures and those in evaluation. To address this, we introduce HRS-Bench, a concrete evaluation benchmark for T2I models that is Holistic, Reliable, and Scalable. Unlike existing benchmarks that focus on limited aspects, HRS-Bench measures 13 skills that can be categorized into five major categories: accuracy, robustness, generalization, fairness, and bias. In addition, HRS-Bench covers 50 scenarios, including fashion, animals, transportation, food, and clothes. We evaluate nine recent large-scale T2I models using metrics that cover a wide range of skills. 
A human evaluation aligned with 95% of our evaluations on average was conducted to probe the effectiveness of HRS-Bench. Our experiments demonstrate that existing models often struggle to generate images with the desired count of objects, visual text, or grounded emotions. We hope that our benchmark helps ease future text-to-image generation research. The code and data are available at https://eslambakr.github.io/hrsbench.github.io	翻訳日:2023-11-28 04:31:57 公開日:2023-11-23
# フェデレーション学習を用いた映画推薦のためのプライバシー保護システム A Privacy Preserving System for Movie Recommendations Using Federated Learning ( http://arxiv.org/abs/2303.04689v3 ) ライセンス: Link先を確認	David Neumann, Andreas Lutz, Karsten Müller, Wojciech Samek	(参考訳) 推薦システムはここ数年でユビキタスになった。推薦システムは多くのユーザーが直面する「選択の専制」の問題を解決し、多くのオンラインビジネスがエンゲージメントと売上を高めるために利用している。ソーシャルネットワーク内でフィルターバブルを生み出すといった他の批判に加えて、推薦システムは大量の個人データを収集することでしばしば非難される。しかし、推薦をパーソナライズするには個人情報が本質的に必要である。フェデレーション学習と呼ばれる最近の分散学習方式により、個人ユーザーデータを中央に収集することなく、そこから学習できるようになった。そこで本研究では、複数のレベルでプライバシー、ひいては信頼性を提供する映画推薦システムを提示する。第一に、このシステムはフェデレーション学習を用いて訓練されるため、その性質上プライバシーを保護しつつ、ユーザーはグローバルな知見の恩恵を受けられる。さらに、FedQと呼ばれる新しいフェデレーション学習方式を採用しており、これは非i.i.d.性や小さなローカルデータセットの問題に対処するだけでなく、クライアントの更新を早期に集約することで入力データ再構成攻撃も防ぐ。最後に、通信オーバーヘッドを削減するために圧縮を適用し、交換されるニューラルネットワークのパラメータを元のサイズのごく一部にまで大幅に圧縮する。この圧縮の非可逆な量子化段階によって、データのプライバシーも向上する可能性があると推測する。 Recommender systems have become ubiquitous in the past years. They solve the tyranny of choice problem faced by many users, and are utilized by many online businesses to drive engagement and sales. Besides other criticisms, like creating filter bubbles within social networks, recommender systems are often reproved for collecting considerable amounts of personal data. However, to personalize recommendations, personal information is fundamentally required. A recent distributed learning scheme called federated learning has made it possible to learn from personal user data without its central collection. Consequently, we present a recommender system for movie recommendations, which provides privacy and thus trustworthiness on multiple levels: First and foremost, it is trained using federated learning and thus, by its very nature, privacy-preserving, while still enabling users to benefit from global insights. Furthermore, a novel federated learning scheme, called FedQ, is employed, which not only addresses the problem of non-i.i.d.-ness and small local datasets, but also prevents input data reconstruction attacks by aggregating client updates early. 
Finally, to reduce the communication overhead, compression is applied, which significantly compresses the exchanged neural network parametrizations to a fraction of their original size. We conjecture that this may also improve data privacy through its lossy quantization stage.	翻訳日:2023-11-28 04:29:07 公開日:2023-11-23
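The abstract does not spell out FedQ or the compression scheme, so the sketch below shows only the generic pattern they build on: federated averaging (FedAvg) of client updates that are uniformly quantized before transmission, so raw data never leaves the clients and each update is sent as 8-bit integers plus one scale. Assumptions: symmetric 8-bit quantization with a single scale per update and aggregation weights proportional to local dataset size; this is not FedQ, and the names are hypothetical.

```python
import numpy as np

def quantize(update, bits=8):
    """Uniform symmetric quantization of a parameter update to signed integers."""
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(update)) / levels if np.any(update) else 1.0
    q = np.round(update / scale).astype(np.int8 if bits <= 8 else np.int32)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float64) * scale

def federated_average(client_updates, client_sizes, bits=8):
    """Server side: dequantize each client's update and average, weighted by local data size."""
    sizes = np.asarray(client_sizes, dtype=np.float64)
    weights = sizes / sizes.sum()
    agg = np.zeros_like(client_updates[0], dtype=np.float64)
    for w, upd in zip(weights, client_updates):
        q, scale = quantize(upd, bits)   # what the client would actually transmit
        agg += w * dequantize(q, scale)
    return agg

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    updates = [rng.standard_normal(1000) for _ in range(3)]
    sizes = [10, 30, 60]
    exact = sum(s * u for s, u in zip(sizes, updates)) / sum(sizes)
    print(np.max(np.abs(federated_average(updates, sizes) - exact)))
```

Each transmitted entry shrinks from 64 to 8 bits, and the per-entry rounding error is at most half a quantization step, so the aggregated update stays within that bound of the unquantized average.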
# インコンテキスト強化学習のための構造化状態空間モデル Structured State Space Models for In-Context Reinforcement Learning ( http://arxiv.org/abs/2303.03982v3 ) ライセンス: Link先を確認	Chris Lu, Yannick Schroecker, Albert Gu, Emilio Parisotto, Jakob Foerster, Satinder Singh, Feryal Behbahani	(参考訳) 構造化状態空間系列（S4）モデルは、最近、長距離系列モデリングタスクで最先端の性能を達成している。これらのモデルは推論が高速で訓練を並列化できるため、多くの強化学習の設定で有用となりうる。本研究では、隠れ状態を並列に初期化・リセットできるようにするS4の一変種への修正を提案し、これにより強化学習タスクに取り組めるようにする。修正したアーキテクチャは、系列長に関してTransformerよりも漸近的に高速に動作し、単純な記憶ベースのタスクでRNNよりも優れた性能を示す。部分観測可能な環境群で修正アーキテクチャを評価したところ、実際にRNNを上回る性能を示しつつ、5倍以上高速に動作した。さらに、長距離系列を扱えるモデルの能力を活用することで、ランダムにサンプリングされた連続制御環境と、その環境の観測と行動に対するランダムにサンプリングされた線形射影とを組み合わせてエージェントに与える、難しいメタ学習タスクで高い性能を達成する。加えて、得られたモデルが分布外の保持タスクに適応できることを示す。全体として、本論文の結果は、構造化状態空間モデルがインコンテキスト強化学習タスクに対して高速かつ高性能であることを示している。コードは https://github.com/luchris429/popjaxrl で提供している。 Structured state space sequence (S4) models have recently achieved state-of-the-art performance on long-range sequence modeling tasks. These models also have fast inference speeds and parallelisable training, making them potentially useful in many reinforcement learning settings. We propose a modification to a variant of S4 that enables us to initialise and reset the hidden state in parallel, allowing us to tackle reinforcement learning tasks. We show that our modified architecture runs asymptotically faster than Transformers in sequence length and performs better than RNNs on a simple memory-based task. We evaluate our modified architecture on a set of partially-observable environments and find that, in practice, our model outperforms RNNs while also running over five times faster. Then, by leveraging the model's ability to handle long-range sequences, we achieve strong performance on a challenging meta-learning task in which the agent is given a randomly-sampled continuous control environment, combined with a randomly-sampled linear projection of the environment's observations and actions. 
Furthermore, we show the resulting model can adapt to out-of-distribution held-out tasks. Overall, the results presented in this paper show that structured state space models are fast and performant for in-context reinforcement learning tasks. We provide code at https://github.com/luchris429/popjaxrl.	翻訳日:2023-11-28 04:28:44 公開日:2023-11-23
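Resetting the hidden state "in parallel" is possible because a diagonal linear recurrence h_t = a_t * h_{t-1} + b_t can be computed with an associative scan, and an episode boundary can be expressed inside the same operator. A minimal sketch, assuming real diagonal transition coefficients and resets encoded by setting a_t = 0 at episode starts (one simple way to obtain the behaviour the abstract describes, not necessarily the paper's exact operator); the sequential loop is the reference it must match, and the names are hypothetical.

```python
import numpy as np

def combine(left, right):
    """Associative operator for h_t = a_t * h_{t-1} + b_t:
    applying (a1, b1) and then (a2, b2) equals (a1 * a2, a2 * b1 + b2)."""
    a1, b1 = left
    a2, b2 = right
    return a1 * a2, a2 * b1 + b2

def sequential(a, b, done):
    """Reference loop; done[t] = True means step t starts a new episode (h_{t-1} is dropped)."""
    h, out = np.zeros_like(b[0]), []
    for t in range(len(a)):
        h = np.where(done[t], 0.0, a[t]) * h + b[t]
        out.append(h)
    return np.stack(out)

def parallel_scan(a, b, done):
    """Hillis-Steele inclusive scan: O(log T) vectorized rounds instead of T sequential steps.
    A reset is just a_t := 0, which keeps the operator associative."""
    a = np.where(done[:, None], 0.0, a)
    b = b.copy()
    shift = 1
    while shift < len(a):
        a_new, b_new = combine((a[:-shift], b[:-shift]), (a[shift:], b[shift:]))
        a = np.concatenate([a[:shift], a_new])
        b = np.concatenate([b[:shift], b_new])
        shift *= 2
    return b  # with h_{-1} = 0, the accumulated offset b equals h_t

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.uniform(0.5, 1.0, (16, 3))
    b = rng.standard_normal((16, 3))
    done = np.zeros(16, dtype=bool)
    done[[0, 6, 11]] = True   # three episodes packed into one sequence
    print(np.allclose(sequential(a, b, done), parallel_scan(a, b, done)))
```

Zeroing a_t cuts the dependence on everything before step t, so several episodes can be packed into one training sequence without leaking state across boundaries.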
# ニューラルネットワークにおける重要度推定器の信頼性の高い評価のための特徴摂動拡張 Feature Perturbation Augmentation for Reliable Evaluation of Importance Estimators in Neural Networks ( http://arxiv.org/abs/2303.01538v2 ) ライセンス: Link先を確認	Lennart Brocki and Neo Christopher Chung	(参考訳) 事後説明手法は、深層ニューラルネットワークの内部動作をより解釈しやすくしようとするものである。しかし、一般に正解が存在しないため、入力特徴に重要度スコアを割り当てる局所的な事後解釈手法を評価することは難しい。最も一般的な評価フレームワークの一つは、解釈手法によって重要とみなされた特徴を摂動させ、予測精度の変化を測定するものである。直感的には、予測精度が大きく低下すれば、その説明が予測結果（例えばロジット）に対する特徴の重要度を正しく定量化したことを示す。しかし、テストデータセット内の摂動サンプルは訓練データセットに対して分布外（OOD）であり、予期しない形でモデルを乱す可能性があるため、予測結果の変化は摂動アーティファクトに起因するかもしれない。この課題を克服するため、モデルの訓練中に摂動画像を生成して追加する特徴摂動拡張（FPA）を提案する。大規模な計算実験を通じて、FPAが深層ニューラルネットワーク（DNN）を摂動に対してより頑健にすることを示す。さらに、FPAを用いてDNNを訓練すると、重要度スコアの符号が従来想定されていたよりも意味のある形でモデルを説明しうることが示される。全体として、FPAは事後解釈手法の評価を改善する直感的なデータ拡張手法である。 Post-hoc explanation methods attempt to make the inner workings of deep neural networks more interpretable. However, since a ground truth is in general lacking, local post-hoc interpretability methods, which assign importance scores to input features, are challenging to evaluate. One of the most popular evaluation frameworks is to perturb features deemed important by an interpretability method and to measure the change in prediction accuracy. Intuitively, a large decrease in prediction accuracy would indicate that the explanation has correctly quantified the importance of features with respect to the prediction outcome (e.g., logits). However, the change in the prediction outcome may stem from perturbation artifacts, since perturbed samples in the test dataset are out of distribution (OOD) compared to the training dataset and can therefore potentially disturb the model in an unexpected manner. To overcome this challenge, we propose feature perturbation augmentation (FPA) which creates and adds perturbed images during the model training. Through extensive computational experiments, we demonstrate that FPA makes deep neural networks (DNNs) more robust against perturbations. 
Furthermore, training DNNs with FPA demonstrates that the sign of importance scores may explain the model more meaningfully than has previously been assumed. Overall, FPA is an intuitive data augmentation technique that improves the evaluation of post-hoc interpretability methods.	翻訳日:2023-11-28 04:28:22 公開日:2023-11-23
# 選好に基づく強化学習におけるクエリとポリシーの不整合 Query-Policy Misalignment in Preference-Based Reinforcement Learning ( http://arxiv.org/abs/2305.17400v2 ) ライセンス: Link先を確認	Xiao Hu, Jianxiong Li, Xianyuan Zhan, Qing-Shan Jia, Ya-Qin Zhang	(参考訳) 選好に基づく強化学習（PbRL）は、RLエージェントの振る舞いを人間が望む結果に整合させる自然な方法を提供するが、コストの高い人間のフィードバックによって制約されることが多い。フィードバック効率を改善するため、既存のPbRL手法の多くは報酬モデル全体の品質を最大限に改善するクエリの選択に注力しているが、直感に反して、これが必ずしも性能の向上につながらないことを我々は見いだした。この謎を解明するため、既存のPbRL研究のクエリ選択方式において長らく見過ごされてきた問題、すなわちクエリ・ポリシーの不整合（Query-Policy Misalignment）を特定する。報酬モデル全体の品質を改善するために選ばれた一見有益なクエリが、実際にはRLエージェントの関心と一致せず、ポリシー学習にほとんど役立たず、結果としてフィードバック効率が低下しうることを示す。この問題は、ほぼオンポリシーなクエリと、特別に設計されたハイブリッド経験再生を組み合わせて双方向のクエリ・ポリシー整合を強制することで、効果的に解決できることを示す。本手法はシンプルかつエレガントで、数行のコードを変更するだけで既存の手法に容易に組み込める。包括的な実験により、本手法が人間のフィードバック効率とRLのサンプル効率の両面で大幅な向上を達成することを示し、PbRLタスクにおいてクエリ・ポリシーの不整合に対処することの重要性を実証する。 Preference-based reinforcement learning (PbRL) provides a natural way to align RL agents' behavior with human desired outcomes, but is often restrained by costly human feedback. To improve feedback efficiency, most existing PbRL methods focus on selecting queries to maximally improve the overall quality of the reward model, but counter-intuitively, we find that this may not necessarily lead to improved performance. To unravel this mystery, we identify a long-neglected issue in the query selection schemes of existing PbRL studies: Query-Policy Misalignment. We show that the seemingly informative queries selected to improve the overall quality of reward model actually may not align with RL agents' interests, thus offering little help on policy learning and eventually resulting in poor feedback efficiency. We show that this issue can be effectively addressed via near on-policy query and a specially designed hybrid experience replay, which together enforce the bidirectional query-policy alignment. Simple yet elegant, our method can be easily incorporated into existing approaches by changing only a few lines of code. 
We showcase in comprehensive experiments that our method achieves substantial gains in both human feedback and RL sample efficiency, demonstrating the importance of addressing query-policy misalignment in PbRL tasks.	翻訳日:2023-11-28 04:19:39 公開日:2023-11-23
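In PbRL the reward model is commonly fitted to pairwise human preferences with a Bradley-Terry likelihood over summed segment rewards; query selection and replay schemes such as the ones described above change which pairs are labeled and how the policy is trained, not this loss. A minimal sketch of that standard loss, assuming (this is not stated in the abstract) that the paper uses this common formulation; the names are hypothetical.

```python
import numpy as np

def segment_returns(reward_fn, segments):
    """Sum of predicted per-step rewards over each trajectory segment (array of shape (L, obs_dim))."""
    return np.array([reward_fn(seg).sum() for seg in segments])

def preference_loss(r0, r1, label):
    """Bradley-Terry cross-entropy. r0, r1: summed predicted rewards of segments sigma0, sigma1;
    label = 1 if the human preferred sigma1.
    P(sigma1 preferred) = exp(r1) / (exp(r0) + exp(r1)) = sigmoid(r1 - r0)."""
    logits = r1 - r0
    # Numerically stable log-sigmoid: log(sigmoid(z)) = -logaddexp(0, -z).
    log_p1 = -np.logaddexp(0.0, -logits)
    log_p0 = -np.logaddexp(0.0, logits)
    return -np.mean(label * log_p1 + (1 - label) * log_p0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = np.array([1.0, -0.5])                     # hypothetical linear reward model
    reward_fn = lambda seg: seg @ w
    seg_a = [rng.standard_normal((5, 2)) for _ in range(4)]
    seg_b = [rng.standard_normal((5, 2)) for _ in range(4)]
    labels = np.array([1, 0, 1, 1])
    print(preference_loss(segment_returns(reward_fn, seg_a),
                          segment_returns(reward_fn, seg_b), labels))
```

Because the loss only compares sums of predicted rewards, it says nothing about which segments are worth querying; that choice is exactly where the query-policy misalignment described in the abstract arises.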
# 1次元相互作用非エルミートStark系におけるエルゴード性から多体局在へ From Ergodicity to Many-Body Localization in a One-Dimensional Interacting Non-Hermitian Stark System ( http://arxiv.org/abs/2305.13636v3 ) ライセンス: Link先を確認	Jinghu Liu and Zhihao Xu	(参考訳) 非エルミート量子系における無秩序誘起多体局在（MBL）の研究が最近大きな関心を集めている。しかし、無秩序のない非エルミートMBLについては、まだ解明が必要である。本研究では、時間反転対称性を持つ非相反ホッピングを備えた1次元相互作用Starkモデルを考える。このモデルの性質は境界条件に依存する。周期境界条件（PBC）の下では、このモデルは3種類の相転移、すなわち固有エネルギーの実数-複素数転移、トポロジカル相転移、非エルミートStark MBL転移を示す。実数-複素数転移とトポロジカル相転移は熱力学的極限で同じ点で起こるが、非エルミートStark MBL転移とは一致せず、これは非エルミート無秩序系の場合とはかなり異なる。準位統計によると、線形傾斜ポテンシャルの強さが増すにつれ、系はジニブルアンサンブル（GE）からガウス直交アンサンブル（GOE）へ、さらにポアソンアンサンブルへと遷移する。固有値の実数-複素数転移は、エルゴード領域におけるGEからGOEへの遷移を伴う。さらに、準位統計の2番目の遷移は非エルミートStark MBLの発生に対応する。非エルミートStark MBLは頑健であり、無秩序誘起MBLと多くの類似点を持つことを示す。これはスペクトル統計と固有状態の性質に関する既存のいくつかの特徴量によって確認できる。絡み合いエントロピーと密度不均衡の時間発展により、実数-複素数転移とStark MBL転移を区別できる。最後に、開境界条件の下ではこの系に実数-複素数転移は存在せず、非エルミートStark MBLの転移はPBCの場合と同じであることを見いだす。 Recent studies on disorder-induced many-body localization (MBL) in non-Hermitian quantum systems have attracted great interest. However, the non-Hermitian disorder-free MBL still needs to be clarified. We consider a one-dimensional interacting Stark model with nonreciprocal hoppings having time-reversal symmetry, the properties of which are boundary dependent. Under periodic boundary conditions (PBCs), such a model exhibits three types of phase transitions: the real-complex transition of eigenenergies, the topological phase transition, and the non-Hermitian Stark MBL transition. The real-complex and topological phase transitions occur at the same point in the thermodynamic limit but do not coincide with the non-Hermitian Stark MBL transition, which is quite different from the non-Hermitian disordered cases. By the level statistics, the system transitions from the Ginibre ensemble (GE) to the Gaussian orthogonal ensemble (GOE) to the Poisson ensemble with the increase of the linear tilt potential's strength. The real-complex transition of the eigenvalues is accompanied by the GE-to-GOE transition in the ergodic regime. 
Moreover, the second transition of the level statistics corresponds to the occurrence of non-Hermitian Stark MBL. We demonstrate that the non-Hermitian Stark MBL is robust and shares many similarities with disorder-induced MBL, which several existing characteristic quantities of the spectral statistics and eigenstate properties can confirm. The dynamical evolutions of the entanglement entropy and the density imbalance can distinguish the real-complex and Stark MBL transitions. Finally, we find that our system under open boundary conditions lacks a real-complex transition, and the transition of non-Hermitian Stark MBL is the same as that under PBCs.	翻訳日:2023-11-28 04:18:58 公開日:2023-11-23
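The GOE-to-Poisson part of the level-statistics transition above is usually diagnosed with the mean ratio of consecutive level spacings, which needs no spectral unfolding: about 0.53 for GOE and 2 ln 2 - 1 (about 0.386) for uncorrelated Poisson levels. A minimal sketch for real spectra only, with random GOE matrices and uniformly scattered levels as stand-ins for the ergodic and localized regimes; the Ginibre (complex-spectrum) regime requires a complex spacing-ratio statistic that is not shown, and the names are hypothetical.

```python
import numpy as np

def mean_gap_ratio(levels):
    """Mean of r_n = min(s_n, s_{n+1}) / max(s_n, s_{n+1}) over consecutive spacings
    of a real spectrum; no unfolding is needed."""
    s = np.diff(np.sort(levels))
    r = np.minimum(s[:-1], s[1:]) / np.maximum(s[:-1], s[1:])
    return r.mean()

def goe_levels(n, rng):
    m = rng.standard_normal((n, n))
    return np.linalg.eigvalsh((m + m.T) / 2)

def poisson_levels(n, rng):
    return rng.uniform(0.0, 1.0, n)   # uncorrelated levels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Keep the central half of each GOE spectrum to avoid edge effects.
    goe = np.mean([mean_gap_ratio(goe_levels(200, rng)[50:150]) for _ in range(20)])
    poi = np.mean([mean_gap_ratio(poisson_levels(200, rng)) for _ in range(20)])
    print(f"GOE <r> ~ {goe:.3f} (ref. ~0.53), Poisson <r> ~ {poi:.3f} (ref. ~0.386)")
```

Level repulsion in the ergodic phase suppresses small spacings and pushes the ratio up; in the localized phase levels are uncorrelated and small ratios become common.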
# Clembench: チャット最適化言語モデルを会話エージェントとして評価するためにゲームプレイを使用する Clembench: Using Game Play to Evaluate Chat-Optimized Language Models as Conversational Agents ( http://arxiv.org/abs/2305.13455v3 ) ライセンス: Link先を確認	Kranti Chalamalasetti and Jana Götze and Sherzod Hakimov and Brielen Madureira and Philipp Sadler and David Schlangen	(参考訳) 最近の研究では、豊かな言語的・非言語的文脈の中で行動するエージェント、すなわち「状況に置かれた言語理解エージェント（Situated Language Understanding Agents）」を、注意深く構築された対話的設定でテストすることで体系的に評価する方法論が提案されている。また別の最近の研究は、大規模言語モデル（LLM）が適切に設定されれば、そのようなエージェント（のシミュレータ）として理解できると主張している。ここから一つのつながりが示唆され、本論文はそれを探る。すなわち、特定の能力に挑戦するよう構築された制約付きのゲーム的設定にLLMを置くことで、LLMを有意義に評価できるだろうか。概念実証として、本論文では5つの対話設定を調べ、現在のチャット最適化LLMがある程度ゲームプレイの指示に従えることを示す。この能力と、異なるゲームの目的がどれだけ満たされたかで測られるゲームプレイの質の両方が開発サイクルに沿っており、新しいモデルほど性能が高い。比較的単純な例のゲームでさえ指標は飽和からほど遠く、提案した評価手段が今後も診断的価値を持ち続けることを示唆している。LLMを用いたゲームを実装・評価するための汎用フレームワークは https://github.com/clembench で公開されている。 Recent work has proposed a methodology for the systematic evaluation of "Situated Language Understanding Agents"-agents that operate in rich linguistic and non-linguistic contexts-through testing them in carefully constructed interactive settings. Other recent work has argued that Large Language Models (LLMs), if suitably set up, can be understood as (simulators of) such agents. A connection suggests itself, which this paper explores: Can LLMs be evaluated meaningfully by exposing them to constrained game-like settings that are built to challenge specific capabilities? As a proof of concept, this paper investigates five interaction settings, showing that current chat-optimised LLMs are, to an extent, capable of following game-play instructions. Both this capability and the quality of the game play, measured by how well the objectives of the different games are met, follow the development cycle, with newer models performing better. The metrics even for the comparatively simple example games are far from being saturated, suggesting that the proposed instrument will retain its diagnostic value. 
Our general framework for implementing and evaluating games with LLMs is available at https://github.com/clembench .	翻訳日:2023-11-28 04:18:31 公開日:2023-11-23
# 大規模言語モデルによる複合視覚手がかりを用いたゼロショット視覚関係検出 Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models ( http://arxiv.org/abs/2305.12476v3 ) ライセンス: Link先を確認	Lin Li, Jun Xiao, Guikun Chen, Jian Shao, Yueting Zhuang, Long Chen	(参考訳) CLIPのような事前学習済み視覚言語モデルは強力な汎化能力を示しており、ゼロショット視覚認識の分野で有望なツールとなっている。視覚関係検出（VRD）は、画像内の物体ペア間の関係（または相互作用）の種類を特定する典型的なタスクである。しかし、ゼロショットVRDに一般的なクラスベースのプロンプトとともにCLIPを素朴に用いると、細粒度の異なる関係タイプを区別しにくい、2つの物体の本質的な空間情報を無視する、といったいくつかの弱点がある。そこで本研究では、ゼロショットVRDのための新手法RECODEを提案する。RECODEは複合記述プロンプト（COmposite DEscription prompts）によって関係検出（RElation detection）を解く。具体的には、RECODEはまず各述語カテゴリを主語、目的語、空間の構成要素に分解する。次に、大規模言語モデル（LLM）を活用して、各構成要素に対する記述ベースのプロンプト（視覚的手がかり）を生成する。異なる視覚的手がかりは、類似した関係カテゴリの識別性を異なる観点から高め、VRDの性能を大幅に向上させる。異なる手がかりを動的に融合するために、さらに、LLMに異なる視覚的手がかりに対する妥当な重みを生成させるチェーン・オブ・ソート手法を導入する。4つのVRDベンチマークでの大規模な実験により、RECODEの有効性と解釈可能性が示された。 Pretrained vision-language models, such as CLIP, have demonstrated strong generalization capabilities, making them promising tools in the realm of zero-shot visual recognition. Visual relation detection (VRD) is a typical task that identifies relationship (or interaction) types between object pairs within an image. However, naively utilizing CLIP with prevalent class-based prompts for zero-shot VRD has several weaknesses, e.g., it struggles to distinguish between different fine-grained relation types and it neglects essential spatial information of two objects. To this end, we propose a novel method for zero-shot VRD: RECODE, which solves RElation detection via COmposite DEscription prompts. Specifically, RECODE first decomposes each predicate category into subject, object, and spatial components. Then, it leverages large language models (LLMs) to generate description-based prompts (or visual cues) for each component. Different visual cues enhance the discriminability of similar relation categories from different perspectives, which significantly boosts performance in VRD. 
To dynamically fuse different cues, we further introduce a chain-of-thought method that prompts LLMs to generate reasonable weights for different visual cues. Extensive experiments on four VRD benchmarks have demonstrated the effectiveness and interpretability of RECODE.	翻訳日:2023-11-28 04:17:28 公開日:2023-11-23
# 高次アニールLangevin拡散を用いた線形逆問題の解法 Solving Linear Inverse Problems using Higher-Order Annealed Langevin Diffusion ( http://arxiv.org/abs/2305.05014v3 ) ライセンス: Link先を確認	Nicolas Zilberstein, Ashutosh Sabharwal, Santiago Segarra	(参考訳) 高次Langevin拡散に基づく線形逆問題の解法を提案する。より正確には、前処理付きの2次および3次Langevinダイナミクスを提案する。これらは関心のある未知変数の事後分布から理論的保証のもとでサンプリングでき、しかも1次の対応手法や、両ダイナミクスの前処理なし版よりも計算効率が高い。さらに、前処理付きダイナミクスがいずれも適切に定義され、前処理なしの場合と同じ一意な不変分布を持つことを証明する。また、アルゴリズムの収束をさらに加速し、未知変数が離散的な場合にも対応できるという二重の利点を持つアニーリング手順を組み込む。通信における2つのタスク（MIMOシンボル検出とチャネル推定）と画像に関する3つのタスクでの数値実験により、本手法の汎用性を示し、同等以下の計算量で、競合手法（学習ベースの手法を含む）に比べて高い性能を達成することを示す。 We propose a solution for linear inverse problems based on higher-order Langevin diffusion. More precisely, we propose pre-conditioned second-order and third-order Langevin dynamics that provably sample from the posterior distribution of our unknown variables of interest while being computationally more efficient than their first-order counterpart and the non-conditioned versions of both dynamics. Moreover, we prove that both pre-conditioned dynamics are well-defined and have the same unique invariant distributions as the non-conditioned cases. We also incorporate an annealing procedure that has the double benefit of further accelerating the convergence of the algorithm and allowing us to accommodate the case where the unknown variables are discrete. Numerical experiments in two different tasks in communications (MIMO symbol detection and channel estimation) and in three tasks for images showcase the generality of our method and illustrate the high performance achieved relative to competing approaches (including learning-based ones) while having comparable or lower computational complexity.	翻訳日:2023-11-28 04:16:42 公開日:2023-11-23
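Second-order (kinetic) Langevin dynamics augment the unknown x with a velocity v and inject noise only into v; the x-marginal of the stationary distribution is still the target posterior. For a linear inverse problem y = A x + noise with a Gaussian prior the posterior score is available in closed form, so the idea can be checked against the exact posterior mean. A minimal sketch, assuming a standard-normal prior, known noise level, unit mass, a plain semi-implicit Euler discretization, and neither preconditioning nor annealing (both of which the paper adds); names and parameter values are hypothetical.

```python
import numpy as np

def posterior_score(x, A, y, sigma):
    """grad_x log p(x | y) for y = A x + N(0, sigma^2 I) and prior x ~ N(0, I).
    x has shape (n_chains, dim)."""
    return (y - x @ A.T) @ A / sigma**2 - x

def underdamped_langevin(A, y, sigma, n_chains=2000, n_steps=2000, dt=0.01, gamma=2.0, seed=0):
    """dx = v dt,  dv = (score(x) - gamma v) dt + sqrt(2 gamma) dW, run for many chains in parallel."""
    rng = np.random.default_rng(seed)
    dim = A.shape[1]
    x = np.zeros((n_chains, dim))
    v = np.zeros((n_chains, dim))
    for _ in range(n_steps):
        v += dt * (posterior_score(x, A, y, sigma) - gamma * v) \
             + np.sqrt(2 * gamma * dt) * rng.standard_normal(v.shape)
        x += dt * v
    return x

if __name__ == "__main__":
    A = np.array([[1.0, 0.5], [0.0, 1.0], [1.0, 1.0]])
    y = np.array([1.0, -0.5, 0.3])
    sigma = 0.5
    precision = np.eye(2) + A.T @ A / sigma**2
    exact_mean = np.linalg.solve(precision, A.T @ y / sigma**2)
    samples = underdamped_langevin(A, y, sigma)
    print(samples.mean(axis=0), exact_mean)
```

The friction gamma trades off momentum-driven exploration against damping; the paper's preconditioning and third-order variant refine this same structure.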
# 多目的進化強化学習に基づくロードバランサによる金融クラウドサービスのアイドル状態の削減 Reducing Idleness in Financial Cloud Services via Multi-objective Evolutionary Reinforcement Learning based Load Balancer ( http://arxiv.org/abs/2305.03463v2 ) ライセンス: Link先を確認	Peng Yang, Laoming Zhang, Haifeng Liu, Guiying Li	(参考訳) 近年、さまざまな企業がデータサービスを従来のデータセンターからクラウドへ移行し始めている。主な動機の一つは、クラウドの弾力性を活用して運用コストを削減することである。本稿では、サーバ側から切断することなく、ごく少数のユーザー接続しか保持していないアイドルサーバの発生を減らしたいという、金融サービスにおける新たなニーズについて論じる。本稿ではこのニーズを二目的オンライン負荷分散問題として捉える。必要な弾力性を得るために、ユーザー要求を可変数のサーバにルーティングする、ニューラルネットワークベースのスケーラブルなポリシーを設計する。ポリシーの重みを最適化するために、進化的多目的訓練フレームワークを提案する。アイドル状態という新たな目的が従来の産業的ソリューションよりも130%以上削減されるだけでなく、本来の負荷分散の目的自体もわずかに改善される。合成データと実データの両方を用いた大規模なシミュレーションにより、金融サービスにおけるアイドル状態の削減という新たな問題に対する提案手法の適用可能性が詳細に明らかになる。 In recent years, various companies have started to shift their data services from traditional data centers to the cloud. One of the major motivations is to save on operational costs with the aid of cloud elasticity. This paper discusses an emerging need from financial services to reduce the incidence of idle servers retaining very few user connections, without disconnecting them from the server side. This paper considers this need as a bi-objective online load balancing problem. A neural network based scalable policy is designed to route user requests to varied numbers of servers for the required elasticity. An evolutionary multi-objective training framework is proposed to optimize the weights of the policy. Not only is the new objective of idleness reduced by over 130% more than traditional industrial solutions, but the original load balancing objective itself is also slightly improved. Extensive simulations with both synthetic and real-world data help reveal the detailed applicability of the proposed method to the emergent problem of reducing idleness in financial services.	翻訳日:2023-11-28 04:16:24 公開日:2023-11-23
# Pick-a-Pic: テキスト対画像生成のためのユーザ嗜好のオープンデータセット Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation ( http://arxiv.org/abs/2305.01569v2 ) ライセンス: Link先を確認	Yuval Kirstain and Adam Polyak and Uriel Singer and Shahbuland Matiana and Joe Penna and Omer Levy	(参考訳) テキスト・ツー・イメージのユーザから人間の好みの大規模なデータセットを収集する能力は通常、企業に限定されており、そのようなデータセットは一般にはアクセスできない。この問題に対処するため,テキスト・ツー・イメージのユーザが画像を生成し,好みを指定できるWebアプリを開発した。このWebアプリを使って,テキスト・ツー・イメージのプロンプトと,生成画像に対する実ユーザーの好みからなる大規模でオープンなデータセットPick-a-Picを構築する。このデータセットを利用して、CLIPベースのスコアリング関数PickScoreをトレーニングし、人間の好みを予測するタスクで超人的なパフォーマンスを示す。次に、モデル評価を行うPickScoreの能力を検証し、他の自動評価指標よりも人間によるランキングとの相関が優れていることを観察する。そこで我々は、将来のテキスト・画像生成モデルの評価にPickScoreを使うこと、MS-COCOよりも関連性の高いデータセットとしてPick-a-Picプロンプトを使用することを推奨する。最後に、PickScoreによるランキングが既存のテキスト・ツー・イメージモデルをどのように強化できるかを示す。 The ability to collect a large dataset of human preferences from text-to-image users is usually limited to companies, making such datasets inaccessible to the public. To address this issue, we create a web app that enables text-to-image users to generate images and specify their preferences. Using this web app we build Pick-a-Pic, a large, open dataset of text-to-image prompts and real users' preferences over generated images. We leverage this dataset to train a CLIP-based scoring function, PickScore, which exhibits superhuman performance on the task of predicting human preferences. Then, we test PickScore's ability to perform model evaluation and observe that it correlates better with human rankings than other automatic evaluation metrics. Therefore, we recommend using PickScore for evaluating future text-to-image generation models, and using Pick-a-Pic prompts as a more relevant dataset than MS-COCO. Finally, we demonstrate how PickScore can enhance existing text-to-image models via ranking.	翻訳日:2023-11-28 04:15:49 公開日:2023-11-23
# LeCo: シリアル相関学習による軽量圧縮 LeCo: Lightweight Compression via Learning Serial Correlations ( http://arxiv.org/abs/2306.15374v3 ) ライセンス: Link先を確認	Yihao Liu, Xinyu Zeng, Huanchen Zhang	(参考訳) 軽量データ圧縮は、カラムストアが分析クエリのパフォーマンスを向上する鍵となる技術である。シャノンのエントロピーに近づくための辞書ベースのエンコーディングに関する包括的な研究にもかかわらず、圧縮のための列のシリアル相関を体系的に利用した先行研究はほとんどない。本稿では,機械学習を用いて値列の連続冗長性を自動的に除去し,優れた圧縮率と解凍性能を同時に達成するフレームワークであるLeCo(すなわちLearned Compression、学習圧縮)を提案する。 LeCoはこの目的に対して一般的なアプローチを示し、既存の(アドホックな)アルゴリズムであるFrame-of-Reference(FOR)、デルタ符号化(Delta Encoding)、Run-Length Encoding(RLE)をフレームワークの特殊ケースとして位置づける。 3つの合成データと6つの実世界のデータセットを用いたマイクロベンチマークは、LeCoのプロトタイプが既存のソリューションよりも圧縮比とランダムアクセス速度の両方においてParetoの改善を達成していることを示している。 LeCoを広く使われているアプリケーションに統合したところ、Arrowカラム型実行エンジンのデータ分析クエリで最大5.2倍のスピードアップ、RocksDBのスループットで16%の向上が確認された。 Lightweight data compression is a key technique that allows column stores to exhibit superior performance for analytical queries. Despite a comprehensive study on dictionary-based encodings to approach Shannon's entropy, few prior works have systematically exploited the serial correlation in a column for compression. In this paper, we propose LeCo (i.e., Learned Compression), a framework that uses machine learning to remove the serial redundancy in a value sequence automatically to achieve an outstanding compression ratio and decompression performance simultaneously. LeCo presents a general approach to this end, making existing (ad-hoc) algorithms such as Frame-of-Reference (FOR), Delta Encoding, and Run-Length Encoding (RLE) special cases under our framework. Our microbenchmark with three synthetic and six real-world data sets shows that a prototype of LeCo achieves a Pareto improvement on both compression ratio and random access speed over the existing solutions. When integrating LeCo into widely-used applications, we observe up to 5.2x speed up in a data analytical query in the Arrow columnar execution engine and a 16% increase in RocksDB's throughput.	翻訳日:2023-11-28 04:08:09 公開日:2023-11-23
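The abstract frames FOR and Delta Encoding as special cases of a model of the value sequence. As a minimal illustration under that reading (not LeCo's actual implementation), the Python sketch below stores a column as "model prediction + integer residual": FOR uses a constant model (the minimum), Delta uses the previous value, and a least-squares line stands in for a learned regressor. All function names are hypothetical. It also shows why random access differs: FOR and the linear model decode position `i` on its own, while Delta needs a prefix sum.

```python
from typing import List, Tuple


def for_encode(values: List[int]) -> Tuple[int, List[int]]:
    """Frame-of-Reference: a constant model (the minimum) plus non-negative offsets."""
    base = min(values)
    return base, [v - base for v in values]


def for_get(base: int, offsets: List[int], i: int) -> int:
    """Random access in O(1): only offsets[i] is needed."""
    return base + offsets[i]


def delta_encode(values: List[int]) -> Tuple[int, List[int]]:
    """Delta encoding: a 'previous value' model; store the first value and differences."""
    return values[0], [b - a for a, b in zip(values, values[1:])]


def delta_get(first: int, deltas: List[int], i: int) -> int:
    """Random access in O(i): requires a prefix sum over the deltas."""
    return first + sum(deltas[:i])


def linear_encode(values: List[int]) -> Tuple[float, float, List[int]]:
    """Fit v ~ a + b*i by least squares and keep integer residuals (hypothetical stand-in)."""
    n = len(values)
    mean_i = (n - 1) / 2
    mean_v = sum(values) / n
    var_i = sum((i - mean_i) ** 2 for i in range(n))
    cov = sum((i - mean_i) * (v - mean_v) for i, v in enumerate(values))
    b = cov / var_i if var_i else 0.0
    a = mean_v - b * mean_i
    residuals = [v - round(a + b * i) for i, v in enumerate(values)]
    return a, b, residuals


def linear_get(a: float, b: float, residuals: List[int], i: int) -> int:
    """Random access in O(1): re-evaluate the model at position i and add the residual."""
    return round(a + b * i) + residuals[i]
```

On a nearly linear column such as `[100, 103, 105, 108, 111, 113]`, the linear model's residuals are all zero, while the FOR offsets grow to 13; smaller residuals need fewer bits per value once bit-packed.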
# ブロックチェーンによるフェデレーション学習 - リファレンスアーキテクチャ設計、実装、検証 Blockchain-Enabled Federated Learning: A Reference Architecture Design, Implementation, and Verification ( http://arxiv.org/abs/2306.10841v3 ) ライセンス: Link先を確認	Eunsu Goh, Dae-Yeol Kim, Kwangkee Lee, Suyeong Oh, Jong-Eui Chae, Do-Yup Kim	(参考訳) 本稿では,ブロックチェーン対応連合学習(BCFL)のための新たなリファレンスアーキテクチャを提案する。このアプローチは,連合学習とブロックチェーン技術の強みを融合させるものである。我々は,スマートコントラクト関数,利害関係者とその役割,および惑星間ファイルシステム(IPFS)の利用をBCFLの重要なコンポーネントとして定義し,包括的な分析を行う。従来の集中型フェデレーション学習では、各ラウンド毎のローカルノードの選択と学習結果の収集は、中央サーバの制御の下でマージされる。対照的にBCFLでは、これらのプロセスはすべて監視され、スマートコントラクトを通じて管理されます。さらに,クロスデバイスとクロスサイロ連合学習シナリオの両方をサポートする拡張アーキテクチャを提案する。さらに,実際のEthereum開発環境におけるアーキテクチャの実装と検証を行う。私たちのBCFL参照アーキテクチャは柔軟性と拡張性を提供し、特定の要件やユースケースに応じて様々な追加要素を統合することで、広範囲のBCFLアプリケーションに適応可能なソリューションになります。拡張性の顕著な例として、DID(decentralized identifiers)がBCFLで実用的利用を導入するための認証手法として採用されている。この研究は、研究と実践的な展開の間に重要なギャップを埋めるだけでなく、BCFLの領域における将来の探査の基盤となる。この研究の重要な貢献は、現実的なBCFL参照アーキテクチャの実装と検証の成功である。私たちは近いうちにソースコードを公開し、コミュニティ内のさらなる進歩と適応を促進するつもりです。 This paper presents a novel reference architecture for blockchain-enabled federated learning (BCFL), a state-of-the-art approach that amalgamates the strengths of federated learning and blockchain technology. We define smart contract functions, stakeholders and their roles, and the use of interplanetary file system (IPFS) as key components of BCFL and conduct a comprehensive analysis. In traditional centralized federated learning, the selection of local nodes and the collection of learning results for each round are merged under the control of a central server. In contrast, in BCFL, all these processes are monitored and managed via smart contracts. Additionally, we propose an extension architecture to support both cross-device and cross-silo federated learning scenarios. Furthermore, we implement and verify the architecture in a practical real-world Ethereum development environment. 
Our BCFL reference architecture provides significant flexibility and extensibility, accommodating the integration of various additional elements, as per specific requirements and use cases, thereby rendering it an adaptable solution for a wide range of BCFL applications. As a prominent example of extensibility, decentralized identifiers (DIDs) have been employed as an authentication method to introduce practical utilization within BCFL. This study not only bridges a crucial gap between research and practical deployment but also lays a solid foundation for future explorations in the realm of BCFL. The pivotal contribution of this study is the successful implementation and verification of a realistic BCFL reference architecture. We intend to make the source code publicly accessible shortly, fostering further advancements and adaptations within the community.	翻訳日:2023-11-28 04:06:47 公開日:2023-11-23
# MARBLE:ユニバーサル評価のための音楽オーディオ表現ベンチマーク MARBLE: Music Audio Representation Benchmark for Universal Evaluation ( http://arxiv.org/abs/2306.10548v4 ) ライセンス: Link先を確認	Ruibin Yuan, Yinghao Ma, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Le Zhuo, Yiqi Liu, Jiawen Huang, Zeyue Tian, Binyue Deng, Ningzhi Wang, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Roger Dannenberg, Wenhu Chen, Gus Xia, Wei Xue, Si Liu, Shi Wang, Ruibo Liu, Yike Guo, Jie Fu	(参考訳) 画像生成やフィクションの共創など、芸術と人工知能(AI)の広範な交差の時代において、音楽のためのAIは、特に音楽の理解において比較的初期段階にある。これは、深い音楽表現に関する限られた作業、大規模データセットの不足、普遍的でコミュニティ主導のベンチマークの欠如によって明らかである。この問題に対処するため,MARBLEと呼ばれるUniversaL評価のためのMusic Audio Representation Benchmarkを導入する。音響、パフォーマンス、スコア、ハイレベル記述を含む4つの階層レベルを持つ包括的分類を定義することで、様々な音楽情報検索(MIR)タスクのベンチマークを提供する。次に,8つの公開データセット上で14のタスクに基づく統一プロトコルを構築し,音楽録音上で開発されたすべてのオープンソース事前学習モデルの表現をベースラインとして,公平かつ標準的に評価する。さらに、MARBLEは、データセットの著作権問題に関する明確な声明とともに、使いやすく、拡張可能で、再現可能なスイートをコミュニティに提供する。その結果、近年提案されている大規模事前学習音楽言語モデルが多くのタスクにおいて最も高い性能を示し、さらなる改善の余地もあることがわかった。リーダーボードとツールキットのリポジトリは、将来の音楽AI研究を促進するためにhttps://marble-bm.shef.ac.ukで公開されている。 In the era of extensive intersection between art and Artificial Intelligence (AI), such as image generation and fiction co-creation, AI for music remains relatively nascent, particularly in music understanding. This is evident in the limited work on deep music representations, the scarcity of large-scale datasets, and the absence of a universal and community-driven benchmark. To address this issue, we introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE. It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description. 
We then establish a unified protocol based on 14 tasks on 8 public-available datasets, providing a fair and standard assessment of representations of all open-sourced pre-trained models developed on music recordings as baselines. Besides, MARBLE offers an easy-to-use, extendable, and reproducible suite for the community, with a clear statement on copyright issues on datasets. Results suggest recently proposed large-scale pre-trained musical language models perform the best in most tasks, with room for further improvement. The leaderboard and toolkit repository are published at https://marble-bm.shef.ac.uk to promote future music AI research.	翻訳日:2023-11-28 04:06:22 公開日:2023-11-23
# 「平均化」による不均一時系列予測の改善と食料需要予測への応用 Improving Forecasts for Heterogeneous Time Series by "Averaging", with Application to Food Demand Forecast ( http://arxiv.org/abs/2306.07119v2 ) ライセンス: Link先を確認	Lukas Neubauer, Peter Filzmoser	(参考訳) 実世界のアプリケーションにおける一般的な予測設定は、同一領域のおそらく異種時系列の集合を考える。長さなどの各時系列の特性が異なるため、各時系列の予測を単純な方法で得ることは困難である。本稿では,動的時間ウォーピングにおける類似度尺度を用いて類似する時系列を見つけ,k-最近傍法の要領で近傍を構築し,(場合によっては単純な)モデルの予測を平均化によって改善する一般的な枠組みを提案する。平均化を行ういくつかの方法が提案され、理論的議論は平均化が予測に有用であることを示す。さらに、診断ツールの提案により、手順の深い理解が可能になる。 A common forecasting setting in real world applications considers a set of possibly heterogeneous time series of the same domain. Due to different properties of each time series such as length, obtaining forecasts for each individual time series in a straightforward way is challenging. This paper proposes a general framework utilizing a similarity measure in Dynamic Time Warping to find similar time series to build neighborhoods in a k-Nearest Neighbor fashion, and improve forecasts of possibly simple models by averaging. Several ways of performing the averaging are suggested, and theoretical arguments underline the usefulness of averaging for forecasting. Additionally, diagnostics tools are proposed allowing a deep understanding of the procedure.	翻訳日:2023-11-28 04:05:31 公開日:2023-11-23
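The abstract combines two standard ingredients: a Dynamic Time Warping (DTW) distance to find similar series, and averaging of forecasts over a k-nearest-neighbor neighborhood. The Python sketch below shows one plausible instance under stated assumptions: classic DTW with an absolute-difference cost, a deliberately simple base model (the mean of the last `window` observations), and an unweighted average over the target and its `k` nearest neighbors. The paper suggests several averaging variants, and this is not claimed to be any of them; all function names are hypothetical.

```python
import math
from typing import List, Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Classic O(len(x) * len(y)) dynamic time warping with absolute-difference cost."""
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def naive_forecast(series: Sequence[float], window: int = 3) -> float:
    """Hypothetical simple base model: mean of the last `window` observations."""
    tail = list(series[-window:])
    return sum(tail) / len(tail)


def knn_averaged_forecast(target: Sequence[float], pool: List[Sequence[float]],
                          k: int = 2, window: int = 3) -> float:
    """Average the base forecasts of the target and its k DTW-nearest neighbors."""
    neighbors = sorted(pool, key=lambda s: dtw_distance(target, s))[:k]
    forecasts = [naive_forecast(target, window)]
    forecasts += [naive_forecast(s, window) for s in neighbors]
    return sum(forecasts) / len(forecasts)
```

DTW is used here because it tolerates series of different lengths, which is the heterogeneity the abstract highlights; a Euclidean distance would require equal-length inputs.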
# drive-bath interplayのデコード:超伝導向上のための指針 Decoding the drive-bath interplay: A guideline to enhance superconductivity ( http://arxiv.org/abs/2306.02861v2 ) ライセンス: Link先を確認	Rui Lin, Aline Ramires, R. Chitra	(参考訳) 駆動散逸物理学は量子光学の核にある。しかし、駆動量子多体系とその環境との完全な相互作用は、固体領域では比較的解明されていない。本研究では, 駆動型超伝導体の具体例に基づいて, 一般に採用されているストロボスコピック・ハミルトニアン・ピクチャーを超えて, この相互作用を検証した。シャーリー・フロケットとケルディッシュの定式化と、駆動されたケースに対する超伝導適合性の概念の一般化を用いて、超伝導ギャップ演算子と反交換する駆動が、熱浴の観点から見たスペクトル関数に一般に異常な粒子ホール構造を誘起することを示した。基礎となる相互作用の固有遮断周波数とほぼ共振する駆動周波数と相まって、このスペクトル構造を利用して超伝導遷移温度を高めることができる。我々の研究は、固体系における物質のエキゾチック相の駆動散逸工学のさらなる研究の道を開く。 Driven-dissipative physics lies at the core of quantum optics. However, the full interplay between a driven quantum many-body system and its environment remains relatively unexplored in the solid state realm. In this work, we inspect this interplay beyond the commonly employed stroboscopic Hamiltonian picture based on the specific example of a driven superconductor. Using the Shirley-Floquet and Keldysh formalisms as well as a generalization of the notion of superconducting fitness to the driven case, we show how a drive which anti-commutes with the superconducting gap operator generically induces an unusual particle-hole structure in the spectral functions from the perspective of the thermal bath. Concomitant with a driving frequency which is near resonant with the intrinsic cutoff frequency of the underlying interaction, this spectral structure can be harnessed to enhance the superconducting transition temperature. Our work paves the way for further studies for driven-dissipative engineering of exotic phases of matter in solid-state systems.	翻訳日:2023-11-28 04:04:33 公開日:2023-11-23
# 量子ダイナミクスからの固有エネルギー推定:統一的なノイズ耐性測定駆動アプローチ Estimating Eigenenergies from Quantum Dynamics: A Unified Noise-Resilient Measurement-Driven Approach ( http://arxiv.org/abs/2306.01858v3 ) ライセンス: Link先を確認	Yizhi Shen, Daan Camps, Aaron Szasz, Siva Darbha, Katherine Klymko, David B. Williams-Young, Norm M. Tubman, Roel Van Beeumen	(参考訳) 物理、化学、材料科学における基底状態エネルギーの推定は、量子コンピューティングの最も有望な応用の1つである。本稿では,実時間の測定値を収集し,動的モード分解(DMD)の手法を用いてそれらを後処理することで固有エネルギーを求める新しいハイブリッド手法を提案する。量子ダイナミクスの観点からは、量子多体系から利用可能な可観測量の関数空間上の安定な変分法として、このアプローチを形式的に理解できることを確かめる。また,本手法が大きな摂動雑音の存在下においても急速に収束することを示す理論的・数値的な証拠も提供し,様々な科学コミュニティで独立に開発された頑健な行列分解法に対する同型性を示す。スピン系および分子系に関する数値ベンチマークにより,最先端アルゴリズムに対する収束の促進と資源削減が示された。 DMD中心の戦略は、ノイズを系統的に緩和し、主要なハイブリッド量子古典的固有解法として際立っている。 Ground state energy estimation in physical, chemical, and materials sciences is one of the most promising applications of quantum computing. In this work, we introduce a new hybrid approach that finds the eigenenergies by collecting real-time measurements and post-processing them using the machinery of dynamic mode decomposition (DMD). From the perspective of quantum dynamics, we establish that our approach can be formally understood as a stable variational method on the function space of observables available from a quantum many-body system. We also provide strong theoretical and numerical evidence that our method converges rapidly even in the presence of a large degree of perturbative noise, and show that the method bears an isomorphism to robust matrix factorization methods developed independently across various scientific communities. Our numerical benchmarks on spin and molecular systems demonstrate an accelerated convergence and a favorable resource reduction over state-of-the-art algorithms. The DMD-centric strategy can systematically mitigate noise and stands out as a leading hybrid quantum-classical eigensolver.	翻訳日:2023-11-28 04:04:17 公開日:2023-11-23
# XTransCT:2本の直交X線投影による超高速CT再構成によるトランスフォーマーネットワークを用いた画像誘導放射線治療 XTransCT: Ultra-Fast Volumetric CT Reconstruction using Two Orthogonal X-Ray Projections for Image-guided Radiation Therapy via a Transformer Network ( http://arxiv.org/abs/2305.19621v2 ) ライセンス: Link先を確認	Chulong Zhang, Lin Liu, Jingjing Dai, Xuan Liu, Wenfeng He, Yinping Chan, Yaoqin Xie, Feng Chi, and Xiaokun Liang	(参考訳) CTスキャンは、患者の内臓の詳細な三次元的表現を提供する。しかし、従来のCT再構成技術は、体の完全な回転スキャンを通して数百から数千のX線投影を取得する必要があり、手術中のナビゲーションや位置決めは不可能である。画像誘導放射線療法において,超疎なX線投影からCT画像を再構成する手法があれば,放射線線量を大幅に削減し,位置決めやナビゲーションのための機器負担を最小限に抑えることができる。本研究では,2次元X線画像からのCT画像のリアルタイム再構成を容易にするために,XTransCTと呼ばれる新しいトランスフォーマーアーキテクチャを提案する。病院から提供された50人の患者のデータセットと、数千人の患者を含むより大規模な公開データセットLIDC-IDRIを用いて、画像品質と構造的信頼性の観点から本手法を評価する。さらに,LNDbデータセット上でアルゴリズムの汎化性能を検証した。その結果, 画像品質, 構造精度, 汎化性能において, 本アルゴリズムが他の手法を上回ることが示された。さらに,従来の3次元畳み込みベースの手法と比較して約300%の速度向上を示し,3次元画像1枚あたり44msでの再構成を達成した。 Computed tomography (CT) scans offer a detailed, three-dimensional representation of patients' internal organs. However, conventional CT reconstruction techniques necessitate acquiring hundreds or thousands of X-ray projections through a complete rotational scan of the body, making navigation or positioning during surgery infeasible. In image-guided radiation therapy, a method that reconstructs ultra-sparse X-ray projections into CT images would allow us to exploit a substantially reduced radiation dose and to minimize the equipment burden for localization and navigation. In this study, we introduce a novel Transformer architecture, termed XTransCT, devised to facilitate real-time reconstruction of CT images from two-dimensional X-ray images. We assess our approach regarding image quality and structural reliability using a dataset of fifty patients, supplied by a hospital, as well as the larger public dataset LIDC-IDRI, which encompasses thousands of patients. Additionally, we validated our algorithm's generalizability on the LNDb dataset. 
Our findings indicate that our algorithm surpasses other methods in image quality, structural precision, and generalizability. Moreover, in comparison to previous 3D convolution-based approaches, we note a substantial speed increase of approximately 300%, achieving 44 ms per 3D image reconstruction.	翻訳日:2023-11-28 04:04:00 公開日:2023-11-23
# 無損失可視化を用いた分類・混合データの説明可能な機械学習 Explainable Machine Learning for Categorical and Mixed Data with Lossless Visualization ( http://arxiv.org/abs/2305.18437v3 ) ライセンス: Link先を確認	Boris Kovalerchuk, Elijah McCoy	(参考訳) 不均一/混合データのための正確で解釈可能な機械学習(ML)モデルの構築は、数値データ用に設計されたアルゴリズムの長年にわたる課題である。この研究は、正確で説明可能なMLモデルをサポートするMLアルゴリズムの非数値属性のための数値符号化スキーム、これらの視覚化における視覚的ルール発見を伴うn-D非数値分類データの無意味な可視化方法、そして分類データのための正確で説明可能なMLモデルの開発に焦点を当てる。本研究では、混合データ型を分類し、機械学習におけるそれらの重要な役割を分析する。混合データ上での視覚的データ探索により、MLアルゴリズムのすべての内部操作の解釈可能性を高めるツールキットを提供する。カテゴリーデータを用いた説明可能なルール生成のための新しい逐次ルール生成(SRG)アルゴリズムを提案し,複数の計算実験で評価した。この研究は、Parallel Coordinatesを超えたGeneral Line Coordinatesにおけるn-Dデータのロスレス可視化をサポートする混合データのための全スコープMLアルゴリズムのステップの1つである。 Building accurate and interpretable Machine Learning (ML) models for heterogeneous/mixed data is a long-standing challenge for algorithms designed for numeric data. This work focuses on developing numeric coding schemes for non-numeric attributes for ML algorithms to support accurate and explainable ML models, methods for lossless visualization of n-D non-numeric categorical data with visual rule discovery in these visualizations, and accurate and explainable ML models for categorical data. This study proposes a classification of mixed data types and analyzes their important role in Machine Learning. It presents a toolkit for enforcing interpretability of all internal operations of ML algorithms on mixed data with a visual data exploration on mixed data. A new Sequential Rule Generation (SRG) algorithm for explainable rule generation with categorical data is proposed and successfully evaluated in multiple computational experiments. This work is one of the steps to the full scope ML algorithms for mixed data supported by lossless visualization of n-D data in General Line Coordinates beyond Parallel Coordinates.	翻訳日:2023-11-28 04:02:48 公開日:2023-11-23
# 多成分ボソニック系の記述における正準アンサンブル対大正準アンサンブル Canonical Ensemble vs. Grand Canonical Ensemble in the Description of Multicomponent Bosonic Systems ( http://arxiv.org/abs/2307.08518v2 ) ライセンス: Link先を確認	D. Anchishkin, V. Gnatovskyy, D. Zhuravel, V. Karpenko, I. Mishustin, and H. Stoecker	(参考訳) ボース・アインシュタイン凝縮体の存在下でのボゾン粒子と反粒子の相互作用系の熱力学は、スカイムのような平均場モデルの枠組みで研究されている。全電荷密度(アイソスピン密度)は全温度で保存されていると仮定される。 2つのケースが明確に考慮されている: 系のゼロと非ゼロのアイソスピン電荷。カノニカル・アンサンブルとグランド・カノニカル・アンサンブルを用いて比較分析を行う。大正準アンサンブルは凝縮物の存在下で粒子と反粒子のボソニック系を記述するのに適していないことが示されている。 The thermodynamics of a system of interacting bosonic particles and antiparticles in the presence of the Bose-Einstein condensate is studied in the framework of the Skyrme-like mean-field model. It is assumed that the total charge density (isospin density) is conserved at all temperatures. Two cases are explicitly considered: zero and nonzero isospin charge of the system. A comparative analysis is carried out using Canonical Ensemble and Grand Canonical Ensemble. It is shown that the Grand Canonical Ensemble is not suitable for describing bosonic systems of particles and antiparticles in the presence of condensate.	翻訳日:2023-11-28 03:53:19 公開日:2023-11-23
# 社会AI学派 : 発達心理学から社会・文化エージェントへ The SocialAI School: Insights from Developmental Psychology Towards Artificial Socio-Cultural Agents ( http://arxiv.org/abs/2307.07871v2 ) ライセンス: Link先を確認	Grgur Kovač, Rémy Portelas, Peter Ford Dominey, Pierre-Yves Oudeyer	(参考訳) 発達心理学者は、人間の知性における社会認知能力の重要性を長い間確立してきた。これらの能力により、私たちは人間の文化に入り、参加し、利益を得ることができます。社会対話エージェントに関するAI研究は、主にマルチエージェント環境での文化の出現を扱っている(しばしば発達心理学の基盤が強くない)。我々は、AI研究は心理学の知見に基づき、文化への参入を可能にする社会認知能力も研究すべきだと論じる。我々は、Michael Tomasello と Jerome Bruner の理論を議論し、彼らの概念のいくつかをAIに導入し、重要な概念と社会認知能力の概要を説明する。 The SocialAI School - 手続き的に生成された環境からなる、カスタマイズ可能でパラメータ化されたスイートを含むツールで、それらの概念に関する実験を単純化する。 RLエージェントと大規模言語モデルを用いた実験の例を示す。この研究の主な動機は、発達心理学の知見に基づく社会知能の問題にAIコミュニティを巻き込むことと、この方向への第一歩を単純化するためのツールを提供することである。コードと追加情報についてはプロジェクトのWebサイトを参照してください。 Developmental psychologists have long established the importance of socio-cognitive abilities in human intelligence. These abilities enable us to enter, participate in, and benefit from human culture. AI research on social interactive agents mostly concerns the emergence of culture in a multi-agent setting (often without a strong grounding in developmental psychology). We argue that AI research should be informed by psychology and should also study the socio-cognitive abilities that enable entering a culture. We discuss the theories of Michael Tomasello and Jerome Bruner to introduce some of their concepts to AI and outline key concepts and socio-cognitive abilities. We present The SocialAI School - a tool including a customizable parameterized suite of procedurally generated environments, which simplifies conducting experiments regarding those concepts. We show examples of such experiments with RL agents and Large Language Models. The main motivation of this work is to engage the AI community around the problem of social intelligence informed by developmental psychology, and to provide a tool to simplify first steps in this direction. 
Refer to the project website for code and additional information: https://sites.google.com/view/socialai-school.	翻訳日:2023-11-28 03:53:09 公開日:2023-11-23
# 大規模言語モデルの包括的概要 A Comprehensive Overview of Large Language Models ( http://arxiv.org/abs/2307.06435v6 ) ライセンス: Link先を確認	Humza Naveed, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Naveed Akhtar, Nick Barnes, Ajmal Mian	(参考訳) 大規模言語モデル(LLM)は、最近自然言語処理タスクなどにおいて顕著な機能を示した。 LLMの成功は、この方向に多くの研究貢献をもたらした。これらの作業は、アーキテクチャの革新、より良いトレーニング戦略、コンテキスト長の改善、微調整、マルチモーダルllm、ロボティクス、データセット、ベンチマーク、効率など、さまざまなトピックをカバーする。 LLM研究における技術の急速な発展と定期的なブレークスルーにより、この方向の進歩の全体像を理解することは極めて困難になっている。 LLMに関する文献が急速に増えていることを考えると、研究コミュニティは、この分野の最近の発展の簡潔かつ包括的概要から恩恵を受けることができることが不可欠である。本稿では, LLM関連概念の幅広い範囲について, 既存の文献について概説する。 LLM研究の最前線における先進的なトピックを取り上げ,その背景概念について概観した。このレビュー記事は、体系的な調査だけでなく、研究者や実践者が既存の研究の広範な情報的要約から洞察を引き出し、LLM研究を前進させることも意図している。 Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks and beyond. This success of LLMs has led to a large influx of research contributions in this direction. These works encompass diverse topics such as architectural innovations, better training strategies, context length improvements, fine-tuning, multi-modal LLMs, robotics, datasets, benchmarking, efficiency, and more. With the rapid development of techniques and regular breakthroughs in LLM research, it has become considerably challenging to perceive the bigger picture of the advances in this direction. Considering the rapidly emerging plethora of literature on LLMs, it is imperative that the research community is able to benefit from a concise yet comprehensive overview of the recent developments in this field. This article provides an overview of the existing literature on a broad range of LLM-related concepts. Our self-contained comprehensive overview of LLMs discusses relevant background concepts along with covering the advanced topics at the frontier of research in LLMs. 
This review article is intended to not only provide a systematic survey but also a quick comprehensive reference for the researchers and practitioners to draw insights from extensive informative summaries of the existing works to advance the LLM research.	翻訳日:2023-11-28 03:52:22 公開日:2023-11-23
# 平衡から外れた位相的近藤模型 The topological Kondo model out of equilibrium ( http://arxiv.org/abs/2307.03773v2 ) ライセンス: Link先を確認	Matteo M. Wauters, Chia-Min Chung, Lorenzo Maffi, Michele Burrello	(参考訳) 位相的近藤効果はマヨラナモードの非局所性の真の顕現である。それらのトポロジカルモードを4つホストするクーパーペアボックスを用いたモデルにおいて, それぞれが金属鉛に結合した平衡外シグネチャについて検討する。超伝導体の力学を研究するために調整された高度なマトリックス-生成物-状態アプローチにより、マヨラナ磁化の緩和をシミュレートし、関連する近藤温度を判定し、鉛電圧の量子クエンチ後の電気輸送の開始を分析する。本研究は, 二重ナノワイヤで作製したMajorana Cooper-pairボックスに適用し, 弱結合状態から強相関の近藤政権への交叉の非摂動的証拠を提供する。後者は超伝導電荷縮退点で支配的であり、期待される普遍分数ゼロバイアスコンダクタンスを表示する。 The topological Kondo effect is a genuine manifestation of the nonlocality of Majorana modes. We investigate its out-of-equilibrium signatures in a model with a Cooper-pair box hosting four of these topological modes, each connected to a metallic lead. Through an advanced matrix-product-state approach tailored to study the dynamics of superconductors, we simulate the relaxation of the Majorana magnetization, which allows us to determine the related Kondo temperature, and we analyze the onset of electric transport after a quantum quench of a lead voltage. Our results apply to Majorana Cooper-pair boxes fabricated in double nanowire devices and provide nonperturbative evidence of the crossover from weak-coupling states to the strongly correlated topological Kondo regime. The latter dominates at the superconductor charge degeneracy points and displays the expected universal fractional zero-bias conductance.	翻訳日:2023-11-28 03:51:55 公開日:2023-11-23
# 同変フローマッチング Equivariant flow matching ( http://arxiv.org/abs/2306.15030v2 ) ライセンス: Link先を確認	Leon Klein, Andreas Krämer, Frank Noé	(参考訳) 正規化フロー(英: normalizing flow)は、物理学における確率分布のモデル化において特に興味深い深層生成モデルの一種であり、フローの厳密な尤度によって、既知の目標エネルギー関数への再重み付けや不偏な観測量の計算が可能になる。例えば、ボルツマン発生器は、小さな分子やタンパク質のような多体系の平衡サンプルを生成するようにフローを訓練することで、統計物理学における長年のサンプリング問題に取り組む。このようなシステムに対して効果的なモデルを構築するためには、目標エネルギーの対称性をモデルに組み込むことが重要であり、これは同変連続正規化フロー(CNF)によって達成できる。しかし、CNFはトレーニングやサンプル生成に計算コストがかかるため、スケーラビリティや実用的応用が妨げられてきた。本稿では,最近提案された最適輸送フローマッチングに基づく同変CNFの新しいトレーニング目的関数である同変フローマッチングを提案する。同変フローマッチングは、目標エネルギーの物理対称性を利用して、同変CNFの効率的でシミュレーション不要な訓練を行う。本稿では, 回転および置換不変な多粒子系および小分子アラニンジペプチドに対するフローマッチングの有効性を実証し,後者では専用の内部座標特徴化に依存せずに,有意なサンプリング効率を持つボルツマン発生器を初めて得た。この結果から,同変フローマッチングの目的関数は,従来の手法に比べて,より短い積分経路,サンプリング効率の向上,スケーラビリティの向上をもたらすことが示された。 Normalizing flows are a class of deep generative models that are especially interesting for modeling probability distributions in physics, where the exact likelihood of flows allows reweighting to known target energy functions and computing unbiased observables. For instance, Boltzmann generators tackle the long-standing sampling problem in statistical physics by training flows to produce equilibrium samples of many-body systems such as small molecules and proteins. To build effective models for such systems, it is crucial to incorporate the symmetries of the target energy into the model, which can be achieved by equivariant continuous normalizing flows (CNFs). However, CNFs can be computationally expensive to train and generate samples from, which has hampered their scalability and practical application. In this paper, we introduce equivariant flow matching, a new training objective for equivariant CNFs that is based on the recently proposed optimal transport flow matching. Equivariant flow matching exploits the physical symmetries of the target energy for efficient, simulation-free training of equivariant CNFs. 
We demonstrate the effectiveness of flow matching on rotation and permutation invariant many-particle systems and a small molecule, alanine dipeptide, where for the first time we obtain a Boltzmann generator with significant sampling efficiency without relying on tailored internal coordinate featurization. Our results show that the equivariant flow matching objective yields flows with shorter integration paths, improved sampling efficiency, and higher scalability compared to existing methods.	翻訳日:2023-11-28 03:51:38 公開日:2023-11-23
# ディック状態のエントロピー円錐と絡み合い進化 Entropy Cones and Entanglement Evolution for Dicke States ( http://arxiv.org/abs/2306.13146v2 ) ライセンス: Link先を確認	William Munizzi, Howard J. Schnitzer	(参考訳) ハミング重み$k$の$N$量子ビットDicke状態$|D^N_k\rangle$は、量子アルゴリズムの最適化において重要な役割を果たす絡み合った状態のクラスである。ディッケ状態における絡み合いエントロピーの一般計算を行い, $|D^N_k\rangle$エントロピー円錐を記述する。我々は、すべての$|D^N_k\rangle$エントロピーベクトルが対称化された形で現れることを示し、これを用いて、$|D^N_k\rangle$エントロピーベクトルを実現するスターグラフ上のmin-cutプロトコルを定義する。$N$-qubit Pauli群と2-qubit Clifford群の作用の下で、すべての$|D^N_k\rangle$に対する安定化群を同定し、それを用いて$|D^N_k\rangle$の到達可能性グラフを構成する。これらの到達可能性グラフを用いて、Clifford回路における$|D^N_k\rangle$エントロピーベクトルの進化を解析し、その範囲を評価する。 The $N$-qubit Dicke states $|D^N_k\rangle$, of Hamming-weight $k$, are a class of entangled states which play an important role in quantum algorithm optimization. We present a general calculation of entanglement entropy in Dicke states, which we use to describe the $|D^N_k\rangle$ entropy cone. We demonstrate that all $|D^N_k\rangle$ entropy vectors emerge symmetrized, and use this to define a min-cut protocol on star graphs which realizes $|D^N_k\rangle$ entropy vectors. We identify the stabilizer group for all $|D^N_k\rangle$, under the action of the $N$-qubit Pauli group and two-qubit Clifford group, which we use to construct $|D^N_k\rangle$ reachability graphs. We use these reachability graphs to analyze and bound the evolution of $|D^N_k\rangle$ entropy vectors in Clifford circuits.	翻訳日:2023-11-28 03:50:28 公開日:2023-11-23
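The entanglement entropy of a Dicke state has a standard closed form that makes the "symmetrized" entropy vectors above concrete. Cutting $|D^N_k\rangle$ into a block of $m$ qubits and the remaining $N-m$ gives the Schmidt decomposition $|D^N_k\rangle = \sum_j \sqrt{p_j}\,|D^m_j\rangle|D^{N-m}_{k-j}\rangle$ with hypergeometric weights $p_j = \binom{m}{j}\binom{N-m}{k-j}/\binom{N}{k}$, so the entropy depends only on the block size $m$, not on which qubits are chosen. The Python sketch below (assuming NumPy; function names are illustrative, not from the paper) evaluates this formula and cross-checks it against a brute-force SVD of the state vector for small $N$.

```python
import math

import numpy as np


def dicke_schmidt_weights(N: int, k: int, m: int) -> list:
    """Schmidt weights p_j for an m | (N - m) cut of the Dicke state |D^N_k>."""
    total = math.comb(N, k)
    lo, hi = max(0, k - (N - m)), min(m, k)
    return [math.comb(m, j) * math.comb(N - m, k - j) / total for j in range(lo, hi + 1)]


def entropy_bits(weights) -> float:
    """Von Neumann entropy (base 2) of a probability vector of Schmidt weights."""
    return float(-sum(p * math.log2(p) for p in weights if p > 0))


def dicke_state(N: int, k: int) -> np.ndarray:
    """Equal superposition of all N-bit basis states with Hamming weight k."""
    vec = np.array([1.0 if bin(x).count("1") == k else 0.0 for x in range(2 ** N)])
    return vec / np.linalg.norm(vec)


def entropy_bruteforce(N: int, k: int, m: int) -> float:
    """Entropy of the first m qubits from the singular values of the reshaped state."""
    psi = dicke_state(N, k).reshape(2 ** m, 2 ** (N - m))
    p = np.linalg.svd(psi, compute_uv=False) ** 2
    return entropy_bits(p[p > 1e-12])
```

For example, the two-qubit case $N=2$, $k=1$ is a Bell state with one bit of entanglement, and the three-qubit W state ($N=3$, $k=1$) has weights $2/3$ and $1/3$ across a single-qubit cut.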
# Karasu: ビッグデータ分析のための効率的なクラスタ構成のためのコラボレーションアプローチ Karasu: A Collaborative Approach to Efficient Cluster Configuration for Big Data Analytics ( http://arxiv.org/abs/2308.11792v2 ) ライセンス: Link先を確認	Dominik Scheinert, Philipp Wiesner, Thorsten Wittkopp, Lauritz Thamsen, Jonathan Will, and Odej Kao	(参考訳) マシンタイプやクラスタサイズなど、さまざまな設定オプションがあるため、ビッグデータ分析ジョブの適切なリソースの選択は困難です。不適切な選択は資源効率、コスト、エネルギー利用に重大な影響を与えうるため、自動化アプローチが人気を集めています。既存のメソッドのほとんどは、時間とともに最適に近いソリューションを見つけるために、繰り返し発生するワークロードのプロファイリングに依存している。コールドスタートの問題のため、これはしばしば長くコストのかかるプロファイリングフェーズにつながる。しかし、ユーザ間のビッグデータ分析ジョブは、多くの共通プロパティを共有しうる。それらは多くの場合、類似したインフラ上で、類似したフレームワークで実装された類似したアルゴリズムを用いて動作する。集約されたプロファイリング実行を共有して協調的にコールドスタート問題に対処する可能性は、ほとんど検討されていない。 Karasuは、同様のインフラストラクチャ、フレームワーク、アルゴリズム、データセットを扱うユーザ間のデータ共有を促進する、より効率的なリソース構成プロファイリングのアプローチである。 Karasuはコラボレータの集約ランタイム情報を使用して軽量なパフォーマンスモデルをトレーニングし、それらをアンサンブルメソッドに組み合わせ、構成検索空間の固有の知識を利用する。さらに、Karasuでは複数の目的を同時に最適化できる。評価は,パブリッククラウド環境における多様なワークロード実行のパフォーマンスデータに基づく。対象のジョブと部分的な特徴のみを共有する比較可能なプロファイリング実行がわずかしかない場合でも,Karasuがパフォーマンス,検索時間,コストの観点から既存手法を大幅に向上できることを示す。 Selecting the right resources for big data analytics jobs is hard because of the wide variety of configuration options like machine type and cluster size. As poor choices can have a significant impact on resource efficiency, cost, and energy usage, automated approaches are gaining popularity. Most existing methods rely on profiling recurring workloads to find near-optimal solutions over time. Due to the cold-start problem, this often leads to lengthy and costly profiling phases. However, big data analytics jobs across users can share many common properties: they often operate on similar infrastructure, using similar algorithms implemented in similar frameworks. The potential in sharing aggregated profiling runs to collaboratively address the cold start problem is largely unexplored. We present Karasu, an approach to more efficient resource configuration profiling that promotes data sharing among users working with similar infrastructures, frameworks, algorithms, or datasets. 
Karasu trains lightweight performance models using aggregated runtime information of collaborators and combines them into an ensemble method to exploit inherent knowledge of the configuration search space. Moreover, Karasu allows the optimization of multiple objectives simultaneously. Our evaluation is based on performance data from diverse workload executions in a public cloud environment. We show that Karasu is able to significantly boost existing methods in terms of performance, search time, and cost, even when few comparable profiling runs are available that share only partial common characteristics with the target job.	翻訳日:2023-11-28 03:43:11 公開日:2023-11-23
# 変分量子ビット効率MaxCutヒューリスティックアルゴリズム A Variational Qubit-Efficient MaxCut Heuristic Algorithm ( http://arxiv.org/abs/2308.10383v2 ) ライセンス: Link先を確認	Yovav Tene-Cohen, Tomer Kelman, Ohad Lev, and Adi Makmal	(参考訳) MaxCutはNP-Hard組合せ最適化グラフ問題であり、Isingモデルやチップ設計を含む広範な理論および工業的応用を持つ。量子コンピューティングは、古典的なスキームよりも優れた組合せ問題に対する新しい解決策を提供する一方で、Quantum Approximate Optimization Algorithm (QAOA)は最先端の例であるが、その性能は現在、ハードウェアノイズと限定量子ビット数によって妨げられている。本稿では,グラフサイズに対して対数個の量子ビットしか必要とせず,QAOAと比較して指数関数的な削減となる変分Qubit-Efficient MaxCut (QEMC)アルゴリズムを提案する。実超伝導ハードウェア上では最大32ノード (5 qubits) のグラフインスタンスに対して,ノイズレスシミュレーションでは最大2048ノード (11 qubits) のグラフに対して,Goemans and Williamson (GW) の確立した古典的アルゴリズムを上回る最先端の性能を示す。 QEMCアルゴリズムの革新的なエンコーディング方式は、一方で高いノイズ耐性をもたらすが、他方でその効率的な古典シミュレーションをも可能にするため、明確な量子優位性が見えにくくなる。にもかかわらず、量子優位性がなくても、QEMCアルゴリズムは量子に着想を得た潜在的なアルゴリズムとして機能し、QAOAの挑戦的なベンチマークを提供し、他の量子および古典的アルゴリズムに拡張する可能性のある新しいエンコーディングパラダイムを提供する。 MaxCut is a key NP-Hard combinatorial optimization graph problem with extensive theoretical and industrial applications, including the Ising model and chip design. While quantum computing offers new solutions for such combinatorial challenges which are potentially better than classical schemes, with the Quantum Approximate Optimization Algorithm (QAOA) being a state-of-the-art example, its performance is currently hindered by hardware noise and limited qubit number. Here, we present a new variational Qubit-Efficient MaxCut (QEMC) algorithm that requires a logarithmic number of qubits with respect to the graph size, an exponential reduction compared to QAOA. We demonstrate cutting-edge performance for graph instances consisting of up to 32 nodes (5 qubits) on real superconducting hardware, and for graphs with up to 2048 nodes (11 qubits) using noiseless simulations, outperforming the established classical algorithm of Goemans and Williamson (GW). 
The QEMC algorithm's innovative encoding scheme gives it strong noise resilience on the one hand, but also enables its efficient classical simulation on the other, thus obscuring a distinct quantum advantage. Nevertheless, even in the absence of quantum advantage, the QEMC algorithm can serve as a quantum-inspired algorithm, provides a challenging benchmark for QAOA, and presents a novel encoding paradigm with potential applications to other quantum and classical algorithms.	Translated: 2023-11-28 03:42:49 Published: 2023-11-23
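The abstract reports 32 nodes on 5 qubits and 2048 nodes on 11 qubits. Both pairs match a qubit count of ⌈log2 N⌉, compared with the one qubit per node used by standard QAOA for MaxCut.

The minimal sketch below reproduces that arithmetic and the MaxCut objective (the number of cut edges). It makes two assumptions:

- The ⌈log2 N⌉ formula is inferred from the two reported pairs.
- The helper names are hypothetical.

It does not implement the QEMC encoding or optimizer itself.

```python
import math


def qemc_qubits(num_nodes: int) -> int:
    """Qubit count under an assumed logarithmic encoding: ceil(log2(N))."""
    return max(1, math.ceil(math.log2(num_nodes)))


def qaoa_qubits(num_nodes: int) -> int:
    """Standard QAOA for MaxCut assigns one qubit per graph node."""
    return num_nodes


def cut_value(edges, partition) -> int:
    """Count edges whose endpoints lie on different sides of the cut.

    edges: iterable of (u, v) node pairs
    partition: dict mapping each node to side 0 or 1
    """
    return sum(1 for u, v in edges if partition[u] != partition[v])


if __name__ == "__main__":
    for n in (32, 2048):
        print(f"{n} nodes: {qemc_qubits(n)} qubits (log encoding) vs {qaoa_qubits(n)} (QAOA)")
    # A 4-cycle: the alternating partition cuts every edge.
    square = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(cut_value(square, {0: 0, 1: 1, 2: 0, 3: 1}))
```

The exponential gap is easy to see here: at 2048 nodes, the logarithmic count is 11 qubits, while a one-qubit-per-node encoding needs 2048.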
# EDDense-Net: Fully Dense Encoder Decoder Network for Joint Segmentation of Optic Cup and Disc ( http://arxiv.org/abs/2308.10192v2 ) License: see link	Mehwish Mehmood, Khuram Naveed, Khursheed Aurangzeb, Haroon Ahmed Khan, Musaed Alhussein, Syed Saud Naqvi	Glaucoma is an eye disease that damages the optic nerve, which can lead to vision loss and permanent blindness. Early glaucoma detection is therefore critical to avoid permanent blindness. The cup-to-disc ratio (CDR), estimated during an examination of the optic disc (OD), is used in the diagnosis of glaucoma. In this paper, we present the EDDense-Net segmentation network for the joint segmentation of the optic cup (OC) and OD. The encoder and decoder in this network are made up of dense blocks with a grouped convolutional layer in each block, allowing the network to acquire and convey spatial information from the image while simultaneously reducing the network's complexity. To reduce spatial information loss, the optimal number of filters was used in all convolution layers. For semantic segmentation, Dice pixel classification is employed in the decoder to alleviate the problem of class imbalance. The proposed network was evaluated on two publicly available datasets, where it outperformed existing state-of-the-art methods in terms of accuracy and efficiency.
For the diagnosis and analysis of glaucoma, this method can be used as a second-opinion system to assist ophthalmologists.	Translated: 2023-11-28 03:42:20 Published: 2023-11-23
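The abstract relies on two quantities it does not define: the cup-to-disc ratio used for diagnosis, and the Dice overlap that underlies the Dice pixel classification. The sketch below computes both from binary segmentation masks. It makes two assumptions:

- Masks are given as lists of 0/1 rows.
- The CDR is the vertical one (cup height divided by disc height), which is one common definition.

The paper's exact CDR definition and its decoder layer are not reproduced here.

```python
from typing import List

Mask = List[List[int]]  # binary mask as rows of 0/1 pixels


def vertical_extent(mask: Mask) -> int:
    """Height in rows from the topmost to the bottommost foreground pixel."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0


def vertical_cdr(cup: Mask, disc: Mask) -> float:
    """Vertical cup-to-disc ratio: cup height divided by disc height."""
    disc_height = vertical_extent(disc)
    if disc_height == 0:
        raise ValueError("optic disc mask is empty")
    return vertical_extent(cup) / disc_height


def dice(pred: Mask, target: Mask, eps: float = 1e-8) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    overlap = sum(p & t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, target))
    return (2.0 * overlap + eps) / (total + eps)
```

The small `eps` keeps the Dice score defined when both masks are empty. Dice rewards overlap relative to the total foreground, which is why it is less sensitive to the large background class than pixel accuracy.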
# Kernel-Based Tests for Likelihood-Free Hypothesis Testing ( http://arxiv.org/abs/2308.09043v2 ) License: see link	Patrik Róbert Gerber, Tianze Jiang, Yury Polyanskiy, Rui Sun	Given $n$ observations from two balanced classes, consider the task of labeling an additional $m$ inputs that are known to all belong to the same one of the two classes. Special cases of this problem are well known: with complete knowledge of the class distributions ($n=\infty$) the problem is solved optimally by the likelihood-ratio test; when $m=1$ it corresponds to binary classification; and when $m\approx n$ it is equivalent to two-sample testing. The intermediate settings occur in the field of likelihood-free inference, where labeled samples are obtained by running forward simulations and the unlabeled sample is collected experimentally. Recent work discovered a fundamental trade-off between $m$ and $n$: increasing the data sample $m$ reduces the amount $n$ of training/simulation data needed.
In this work we (a) introduce a generalization where unlabeled samples come from a mixture of the two classes, a case often encountered in practice; (b) study the minimax sample complexity for non-parametric classes of densities under maximum mean discrepancy (MMD) separation; and (c) investigate the empirical performance of kernels parameterized by neural networks on two tasks: detection of the Higgs boson and detection of planted DDPM-generated images amidst CIFAR-10 images. For both problems we confirm the existence of the theoretically predicted asymmetric $m$ vs $n$ trade-off.	Translated: 2023-11-28 03:41:40 Published: 2023-11-23
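The tests studied above are built on the maximum mean discrepancy (MMD). The sketch below shows the standard unbiased MMD² estimate between two samples, as a rough illustration of that underlying statistic.

It is not the paper's test, which uses kernels parameterized by neural networks and handles the mixture setting. Instead it uses a fixed Gaussian kernel, and the default bandwidth of 1.0 is an arbitrary assumption.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs: Sequence[Vector], ys: Sequence[Vector],
                  kernel: Callable[[Vector, Vector], float] = gaussian_kernel) -> float:
    """Unbiased estimate of MMD^2 between samples xs ~ P and ys ~ Q.

    Within-sample sums skip the diagonal (i == j), which removes the bias
    of the plug-in estimator; the result can therefore be slightly negative.
    """
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("need at least two samples from each distribution")
    k_xx = sum(kernel(xs[i], xs[j]) for i in range(m) for j in range(m) if i != j)
    k_yy = sum(kernel(ys[i], ys[j]) for i in range(n) for j in range(n) if i != j)
    k_xy = sum(kernel(x, y) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)
```

Values near zero suggest the two samples come from similar distributions, and large positive values suggest they differ. Turning this statistic into a test requires a threshold, for example one calibrated by permutation; that step is not shown here.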
# Qubit Dynamics beyond Lindblad: Non-Markovianity versus Rotating Wave Approximation ( http://arxiv.org/abs/2308.06029v2 ) License: see link	Kiyoto Nakamura, Joachim Ankerhold	With the increasing performance of actual qubit devices, even subtle effects in the interaction between qubits and environmental degrees of freedom become progressively relevant and experimentally visible. This applies particularly to the timescale separations underlying the most commonly used numerical simulation platform for qubit operations, namely the conventional Lindblad master equation (LE): the Markov approximation and the rotating wave approximation (RWA). In this contribution we shed light on the questions (i) to what extent violations of either of these timescale separations can be monitored experimentally and (ii) which of them is the most severe obstacle to highly accurate predictions within (approximate) numerical schemes in relevant parameter ranges. For this purpose, we compare three simulation methods for the reduced density matrix with progressively increasing accuracy. In particular, predictions for relaxation and decoherence of a qubit system in the presence of reservoirs with Ohmic and sub-Ohmic spectral densities are explored and, with the aid of suitable protocols based on Ramsey experiments, the roles of non-Markovianity and the RWA are revealed.
We discuss potential implications for future experiments and for the design of approximate yet accurate numerical approaches.	Translated: 2023-11-28 03:40:44 Published: 2023-11-23
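For reference, the conventional Lindblad (GKSL) master equation referred to above takes the standard form shown below. The jump operators $L_k$ and rates $\gamma_k$ are generic placeholders, not the specific qubit-reservoir model studied in the paper.

$$
\frac{d\rho}{dt} = -\frac{i}{\hbar}\,[H, \rho] + \sum_k \gamma_k \left( L_k \rho L_k^\dagger - \tfrac{1}{2}\,\{ L_k^\dagger L_k, \rho \} \right)
$$

For a single qubit, the usual choices are $L = \sigma_-$ for relaxation and $L = \sigma_z$ for pure dephasing. The Markov approximation and the RWA are what reduce the reservoir's influence to this time-local form with constant rates. This is why violations of either timescale separation show up as deviations from Lindblad predictions.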
# Non-Markovian Quantum Gate Set Tomography ( http://arxiv.org/abs/2307.14696v3 ) License: see link	Ze-Tong Li, Cong-Cong Zheng, Fan-Xu Meng, Han Zeng, Tian Luan, Zai-Chen Zhang, Xu-Tao Yu	Engineering quantum devices requires reliable characterization of the quantum system, including qubits, quantum operations (also known as instruments), and quantum noise. Recently, quantum gate set tomography (GST) has emerged as a powerful technique for self-consistently describing quantum states, gates, and measurements. However, non-Markovian correlations between the quantum system and its environment impair the reliability of GST. To address this, we propose a self-consistent operational framework called instrument set tomography (IST) for non-Markovian GST. Based on the stochastic quantum process, the instrument set describes both the instruments and the system-environment (SE) correlations. We introduce a linear inversion IST (LIST) to describe instruments and SE correlations without physical constraints. Inconsistencies ("disharmony") in the linear relationships between instruments are detected. Furthermore, we propose a physically constrained statistical method based on maximum likelihood estimation for IST (MLE-IST), with a number of parameters polynomial in the Markovian order.
MLE-IST shows significant flexibility in adapting to different types of devices, such as noisy intermediate-scale quantum (NISQ) devices, by adjusting the model and constraints. Experimental results demonstrate the effectiveness and necessity of simultaneously describing instruments and SE correlations. Specifically, compared with their respective comparison methods, LIST and MLE-IST achieve significant improvements in average squared error reduction in simulations of imperfect implementations, by orders of -23.77 and -6.21, respectively. Consequently, IST provides an essential and self-consistent framework for characterizing, benchmarking, and developing quantum devices in terms of the instrument set.	Translated: 2023-11-28 03:38:19 Published: 2023-11-23
# Proceeding of the 1st Workshop on Social Robots Personalisation At the crossroads between engineering and humanities (CONCATENATE) ( http://arxiv.org/abs/2307.12777v2 ) License: see link	Imene Tarakli, Georgios Angelopoulos, Mehdi Hellou, Camille Vindolet, Boris Abramovic, Rocco Limongelli, Dimitri Lacroix, Andrea Bertolini, Silvia Rossi, Alessandro Di Nuovo, Angelo Cangelosi, Gordon Cheng	Nowadays, robots are expected to interact more physically, cognitively, and socially with people. They should adapt to unpredictable contexts alongside individuals with various behaviours. For this reason, personalisation is a valuable attribute for social robots, as it allows them to act according to a specific user's needs and preferences and to achieve natural and transparent robot behaviours for humans. If correctly implemented, personalisation could also be the key to the large-scale adoption of social robotics. However, achieving personalisation is arduous, as it requires us to expand the boundaries of robotics by drawing on the expertise of various domains. Indeed, personalised robots need to analyse and model user interactions while considering the users' involvement in the adaptive process.
It also requires us to address ethical and socio-cultural aspects of personalised HRI to achieve inclusive and diverse interaction and to avoid deception and misplaced trust when interacting with users. At the same time, policymakers need to put regulations in place in view of possible short-term and long-term adaptive HRI. This workshop aims to raise an interdisciplinary discussion on personalisation in robotics. It aims to bring researchers from different fields together to propose guidelines for personalisation while addressing the following questions: how to define it, how to achieve it, and how it should be guided to fit legal and ethical requirements.	Translated: 2023-11-28 03:37:40 Published: 2023-11-23
# Generating Natural Language Queries for More Effective Systematic Review Screening Prioritisation ( http://arxiv.org/abs/2309.05238v3 ) License: see link	Shuai Wang, Harrisen Scells, Martin Potthast, Bevan Koopman, Guido Zuccon	Screening prioritisation in medical systematic reviews aims to rank the set of documents retrieved by complex Boolean queries. Prioritising the most important documents ensures that subsequent review steps can be carried out more efficiently and effectively. The current state of the art uses the final title of the review as a query to rank the documents using BERT-based neural rankers. However, the final title is only formulated at the end of the review process, which makes this approach impractical, as it relies on ex post facto information. At the time of screening, only a rough working title is available, with which the BERT-based ranker performs significantly worse than with the final title. In this paper, we explore alternative sources of queries for prioritising screening, such as the Boolean query used to retrieve the documents to be screened and queries generated by instruction-based generative large language models such as ChatGPT and Alpaca. Our best approach is not only viable based on the information available at the time of screening, but also has similar effectiveness to the final title.	Translated: 2023-11-28 03:28:50 Published: 2023-11-23
# SAM3D: Segment Anything Model in Volumetric Medical Images ( http://arxiv.org/abs/2309.03493v3 ) License: see link	Nhat-Tan Bui and Dinh-Hieu Hoang and Minh-Triet Tran and Gianfranco Doretto and Donald Adjeroh and Brijesh Patel and Arabinda Choudhary and Ngan Le	Image segmentation remains a pivotal component in medical image analysis, aiding in the extraction of critical information for precise diagnosis. With the advent of deep learning, automated image segmentation methods have risen to prominence, showing exceptional proficiency in processing medical imagery. Motivated by the Segment Anything Model (SAM), a foundation model renowned for its remarkable precision and robust generalization capabilities in segmenting 2D natural images, we introduce SAM3D, an innovative adaptation tailored for 3D volumetric medical image analysis. Unlike current SAM-based methods that segment volumetric data by converting the volume into separate 2D slices for individual analysis, our SAM3D model processes the entire 3D volume in a unified approach. Extensive experiments on multiple medical image datasets demonstrate that our network attains competitive results compared with other state-of-the-art methods in 3D medical segmentation tasks while being significantly more efficient in terms of parameters. Code and checkpoints are available at https://github.com/UARK-AICV/SAM3D.	Translated: 2023-11-28 03:28:10 Published: 2023-11-23
# A study on the impact of pre-trained model on Just-In-Time defect prediction ( http://arxiv.org/abs/2309.02317v2 ) License: see link	Yuxiang Guo, Xiaopeng Gao, Zhenyu Zhang, W.K.Chan and Bo Jiang	Previous research on Just-In-Time (JIT) defect prediction has primarily focused on the performance of individual pre-trained models, without exploring the relationships between different pre-trained models used as backbones. In this study, we build six models: RoBERTaJIT, CodeBERTJIT, BARTJIT, PLBARTJIT, GPT2JIT, and CodeGPTJIT, each with a distinct pre-trained model as its backbone. We systematically explore the differences and connections between these models. Specifically, we investigate the performance of the models when using commit code and commit messages as inputs, as well as the relationship between training efficiency and model distribution among these six models. Additionally, we conduct an ablation experiment to explore the sensitivity of each model to its inputs. Furthermore, we investigate how the models perform in zero-shot and few-shot scenarios.
Our findings indicate that each model based on a different backbone shows improvements, and that when the backbones' pre-trained models are similar, the training resources they consume are much closer. We also observe that commit code plays a significant role in defect detection, and that different pre-trained models demonstrate better defect detection ability with a balanced dataset in few-shot scenarios. These results provide new insights for optimizing JIT defect prediction tasks using pre-trained models and highlight the factors that require more attention when constructing such models. Additionally, with 2,000 training samples, CodeGPTJIT and GPT2JIT achieved better performance than DeepJIT and CC2Vec on the two datasets, respectively. These findings emphasize the effectiveness of transformer-based pre-trained models in JIT defect prediction tasks, especially in scenarios with limited training data.	Translated: 2023-11-28 03:27:51 Published: 2023-11-23
# From Kubernetes to Knactor: A Data-Centric Rethink of Service Composition ( http://arxiv.org/abs/2309.01805v2 ) License: see link	Silvery Fu, Hong Zhang, Ryan Teoh, Taras Priadka, Sylvia Ratnasamy	Microservices are increasingly used in modern applications, leading to a growing need for effective service composition solutions. However, we argue that traditional API-centric composition mechanisms (e.g., RPC, REST, and Pub/Sub) hamper the modularity of microservices. These mechanisms introduce rigid code-level coupling, scatter composition logic, and hinder visibility into cross-service data exchanges. Ultimately, these limitations complicate the maintenance and evolution of microservice-based applications. In response, we propose a rethinking of service composition and present Knactor, a new data-centric composition framework to restore the modularity that microservices were intended to offer. Knactor decouples service composition from service development, allowing composition to be implemented as explicit data exchanges among multiple services. Our initial case study suggests that Knactor simplifies service composition and creates new opportunities for optimizations.	Translated: 2023-11-28 03:26:57 Published: 2023-11-23
# Matter relative to quantum hypersurfaces ( http://arxiv.org/abs/2308.12912v2 ) License: see link	Philipp A. Hoehn, Andrea Russo, and Alexander R. H. Smith	We explore the canonical description of a scalar field as a parameterized field theory on an extended phase space that includes additional embedding fields characterizing the spacetime hypersurfaces $\mathsf{X}$ relative to which the scalar field is described. This theory is quantized via the Dirac prescription, and physical states of the theory are used to define conditional wave functionals $|\psi_\phi[\mathsf{X}]\rangle$, interpreted as the state of the field relative to the hypersurface $\mathsf{X}$, thereby extending the Page-Wootters formalism to quantum field theory. It is shown that this conditional wave functional satisfies the Tomonaga-Schwinger equation, demonstrating the formal equivalence between this extended Page-Wootters formalism and standard quantum field theory. We also construct relational Dirac observables and define a quantum deparameterization of the physical Hilbert space leading to a relational Heisenberg picture; both are shown to be unitarily equivalent to the Page-Wootters formalism. Moreover, by treating hypersurfaces as quantum reference frames, we extend recently developed quantum frame transformations to changes between classical and nonclassical hypersurfaces.
This allows us to exhibit the transformation properties of a quantum field under a larger class of transformations, which leads to a frame-dependent particle creation effect.	Translated: 2023-11-28 03:26:07 Published: 2023-11-23
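For orientation, the Tomonaga-Schwinger equation mentioned above reads as follows in its textbook interaction-picture form:

$$
i\hbar\,\frac{\delta \Psi[\sigma]}{\delta \sigma(x)} = \mathcal{H}_{\mathrm{int}}(x)\,\Psi[\sigma]
$$

The symbols are:

- $\sigma$ is a spacelike hypersurface.
- $\delta/\delta\sigma(x)$ is the functional derivative with respect to a local deformation of $\sigma$ at the spacetime point $x$.
- $\mathcal{H}_{\mathrm{int}}(x)$ is the interaction Hamiltonian density.

In the paper's setting, the hypersurface is labeled $\mathsf{X}$ and the state is the conditional wave functional. The paper's exact form (picture, sign and unit conventions) may differ from this textbook version.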
# Fewer is More: Trojan Attacks on Parameter-Efficient Fine-Tuning ( http://arxiv.org/abs/2310.00648v3 ) License: see link	Lauren Hong, Ting Wang	Parameter-efficient fine-tuning (PEFT) enables efficient adaptation of pre-trained language models (PLMs) to specific tasks. By tuning only a minimal set of (extra) parameters, PEFT achieves performance comparable to full fine-tuning. However, despite its prevalent use, the security implications of PEFT remain largely unexplored. In this paper, we conduct a pilot study revealing that PEFT exhibits a unique vulnerability to trojan attacks. Specifically, we present PETA, a novel attack that accounts for downstream adaptation through bilevel optimization: the upper-level objective embeds the backdoor into a PLM, while the lower-level objective simulates PEFT to retain the PLM's task-specific performance. With extensive evaluation across a variety of downstream tasks and trigger designs, we demonstrate PETA's effectiveness in terms of both attack success rate and unaffected clean accuracy, even after the victim user performs PEFT over the backdoored PLM using untainted data. Moreover, we empirically provide possible explanations for PETA's efficacy: the bilevel optimization inherently 'orthogonalizes' the backdoor and PEFT modules, thereby retaining the backdoor throughout PEFT.
Based on this insight, we explore a simple defense that omits PEFT in selected layers of the backdoored PLM and unfreezes a subset of these layers' parameters, which is shown to effectively neutralize PETA.	Translated: 2023-11-28 03:17:28 Published: 2023-11-23
# City Foundation Models for Learning General Purpose Representations from OpenStreetMap ( http://arxiv.org/abs/2310.00583v2 ) License: see link	Pasquale Balsebre, Weiming Huang, Gao Cong, Yi Li	Pre-trained Foundation Models (PFMs) have ushered in a paradigm shift in Artificial Intelligence, due to their ability to learn general-purpose representations that can be readily employed in a wide range of downstream tasks. While PFMs have been successfully adopted in various fields such as Natural Language Processing and Computer Vision, their capacity for handling geospatial data and answering urban questions remains limited. This can be attributed to the intrinsic heterogeneity of geospatial data, which encompasses different data types, including points, segments and regions, as well as multiple information modalities, such as spatial position, visual characteristics and textual annotations. The proliferation of Volunteered Geographic Information initiatives and the ever-increasing availability of open geospatial data sources, like OpenStreetMap (OSM), which is freely accessible globally, unveil a promising opportunity to bridge this gap.
In this paper, we present CityFM, a self-supervised framework to train a foundation model within a selected geographical area of interest, such as a city. CityFM relies solely on open data from OSM and produces multimodal representations of entities of different types, incorporating spatial, visual, and textual information. We analyse the entity representations generated using our foundation models from a qualitative perspective, and conduct quantitative experiments on road-, building-, and region-level downstream tasks. We compare the results to algorithms tailored specifically for the respective applications. In all the experiments, CityFM achieves performance superior to, or on par with, the baselines.	Translated: 2023-11-28 03:17:05 Published: 2023-11-23
# YFlows: Systematic Dataflow Exploration and Code Generation for Efficient Neural Network Inference using SIMD Architectures on CPUs ( http://arxiv.org/abs/2310.00574v3 ) License: see link	Cyrus Zhou, Zack Hassman, Ruize Xu, Dhirpal Shah, Vaugnn Richard, Yanjing Li	We address the challenges associated with deploying neural networks on CPUs, with a particular focus on minimizing inference time while maintaining accuracy. Our novel approach uses the dataflow (i.e., computation order) of a neural network to explore data reuse opportunities, using heuristic-guided analysis and a code generation framework that enables exploration of various Single Instruction, Multiple Data (SIMD) implementations to achieve optimized neural network execution. Our results demonstrate that the dataflow that keeps outputs in SIMD registers while also maximizing both input and weight reuse consistently yields the best performance for a wide variety of inference workloads, achieving up to 3x speedup for 8-bit neural networks and up to 4.8x speedup for binary neural networks over today's optimized neural network implementations.	Translated: 2023-11-28 03:16:40 Published: 2023-11-23
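Binary neural networks are a natural fit for SIMD because, with weights and activations in {-1, +1} packed one per bit, a dot product reduces to an XNOR followed by a population count. The sketch below shows only this standard bit trick, in scalar Python. It assumes +1 is encoded as bit 1 and -1 as bit 0. It is not YFlows' dataflow or its code generator, and the helper names are hypothetical.

```python
from typing import Sequence


def pack_bits(values: Sequence[int]) -> int:
    """Pack a vector of +1/-1 values into an int: bit i is set when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v not in (-1, 1):
            raise ValueError("binary networks use values in {-1, +1}")
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two packed +1/-1 vectors of length n via XNOR + popcount."""
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")  # XNOR limited to n bits
    return 2 * matches - n  # each match adds +1, each mismatch adds -1


def naive_dot(a: Sequence[int], b: Sequence[int]) -> int:
    """Reference dot product for checking the bitwise version."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    a = [1, -1, 1, 1, -1]
    b = [1, 1, -1, 1, -1]
    print(binary_dot(pack_bits(a), pack_bits(b), len(a)), naive_dot(a, b))
```

On real hardware, one SIMD register holds hundreds of such bits, so a single XNOR and popcount sequence replaces hundreds of multiply-adds. How inputs, weights and outputs stay resident in registers between these operations is the dataflow question the paper explores.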
# Prompt-based test-time real image dehazing: a novel pipeline ( http://arxiv.org/abs/2309.17389v3 ) License: see link	Zixuan Chen, Zewei He, Ziqian Lu, Xuecheng Sun, Zhe-Ming Lu	Existing methods attempt to improve models' generalization ability on real-world hazy images by exploring well-designed training schemes (e.g., CycleGAN, prior loss). However, most of them need very complicated training procedures to achieve satisfactory results. In this work, we present a novel testing pipeline called Prompt-based Test-Time Dehazing (PTTD) that helps generate visually pleasing results for real-captured hazy images during the inference phase. We experimentally find that, given a dehazing model trained on synthetic data, fine-tuning the statistics (i.e., mean and standard deviation) of its encoding features allows PTTD to narrow the domain gap and boost real image dehazing performance. Accordingly, we first apply a prompt generation module (PGM) to generate a visual prompt, which is the source of appropriate statistical perturbations for the mean and standard deviation. We then insert a feature adaptation module (FAM) into existing dehazing models to adjust the original statistics under the guidance of the generated prompt.
Note that PTTD is model-agnostic and can be equipped with various state-of-the-art dehazing models trained on synthetic hazy-clean pairs. Extensive experimental results demonstrate that PTTD is flexible and achieves superior performance against state-of-the-art dehazing methods in real-world scenarios. The source code of PTTD will be made available at https://github.com/cecret3350/PTTD-Dehazing.	Translated: 2023-11-28 03:15:50 Published: 2023-11-23
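The core operation described above re-normalizes an encoder feature channel so that its mean and standard deviation match adjusted target statistics. It can be sketched in the style of adaptive instance normalization (AdaIN). This is an assumed illustration on a single flattened channel, with the target statistics passed in directly. In PTTD, they are instead derived from the generated visual prompt via the PGM and FAM modules, which are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def channel_stats(feature: Sequence[float], eps: float = 1e-5) -> Tuple[float, float]:
    """Mean and (epsilon-stabilized) standard deviation of one flattened feature channel."""
    n = len(feature)
    mean = sum(feature) / n
    var = sum((f - mean) ** 2 for f in feature) / n
    return mean, math.sqrt(var + eps)


def shift_statistics(feature: Sequence[float], target_mean: float, target_std: float) -> List[float]:
    """Normalize a channel, then rescale it to the target mean and std (AdaIN-style)."""
    mean, std = channel_stats(feature)
    return [target_std * (f - mean) / std + target_mean for f in feature]
```

Only first- and second-order statistics change, while the spatial pattern of the channel is preserved. This is consistent with the paper's observation that adjusting these statistics alone can narrow the synthetic-to-real domain gap.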
# An evaluation of GPT models for phenotype concept recognition ( http://arxiv.org/abs/2309.17169v2 ) License: see link	Tudor Groza, Harry Caufield, Dylan Gration, Gareth Baynam, Melissa A Haendel, Peter N Robinson, Christopher J Mungall and Justin T Reese	Objective: Clinical deep phenotyping and phenotype annotation play a critical role both in diagnosing patients with rare disorders and in building computationally tractable knowledge in the rare disorders field. These processes rely on ontology concepts, often from the Human Phenotype Ontology, in conjunction with a phenotype concept recognition task (usually supported by machine learning methods) to curate patient profiles or existing scientific literature. Given the significant shift towards large language models (LLMs) for most NLP tasks, we examine the performance of the latest Generative Pre-trained Transformer (GPT) models underpinning ChatGPT as a foundation for the tasks of clinical phenotyping and phenotype annotation. Materials and Methods: The experimental setup of the study included seven prompts of various levels of specificity, two GPT models (gpt-3.5-turbo and gpt-4.0), and two established gold-standard corpora for phenotype recognition, one consisting of publication abstracts and the other of clinical observations.
Results: Our results show that, with an appropriate setup, these models can achieve state-of-the-art performance. The best run, using few-shot learning, achieved 0.58 macro F1 score on publication abstracts and 0.75 macro F1 score on clinical observations, the former being comparable with the state of the art and the latter surpassing the current best-in-class tool. Conclusion: While the results are promising, the non-deterministic nature of the outcomes, the high cost and the lack of concordance between different runs using the same prompt and input make the use of these LLMs challenging for this particular task.	翻訳日:2023-11-28 03:14:54 公開日:2023-11-23
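The abstract reports macro F1 scores. As a minimal sketch, assuming "macro F1" means the unweighted mean of per-label F1 (the scikit-learn `average="macro"` convention), it can be computed as below. The paper's exact matching rules for HPO concepts are not given in the abstract, and the concept IDs here are hypothetical placeholders.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over every label seen in gold or pred.

    gold, pred: parallel lists of labels (hypothetical concept IDs here).
    Per-label F1 is 2*TP / (2*TP + FP + FN).
    """
    labels = set(gold) | set(pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[g] += 1  # the gold label g was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical concept IDs, for illustration only.
gold = ["HP:A", "HP:A", "HP:B", "HP:C"]
pred = ["HP:A", "HP:B", "HP:B", "HP:C"]
print(round(macro_f1(gold, pred), 3))
```

Because every label counts equally, a rare concept that is always missed pulls the macro score down as much as a frequent one.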
# N-of-1試験における運動推奨のためのオンライン強化学習エージェントの設計と評価 Designing and evaluating an online reinforcement learning agent for physical exercise recommendations in N-of-1 trials ( http://arxiv.org/abs/2309.14156v2 ) ライセンス: Link先を確認	Dominik Meier, Ipek Ensari, Stefan Konigorski	(参考訳) パーソナライズされた適応型介入は患者の利益を高める機会を提供するが、計画と実施には課題がある。一旦実施すると、パーソナライズされた適応的介入が、固定されたゴールドスタンダード介入よりも実際に臨床的に効果的であるかどうかが重要な問題となる。本稿では,オンライン強化学習エージェントによるパーソナライズされた介入の実装が実現可能かつ有効であるかを検証する,革新的なN-of-1試験デザインを提案する。全体を通して, 子宮内膜症の痛みを軽減するための運動推奨に関する新しい研究を例として用いる。本稿では,文脈付きバンディット(contextual bandit)推薦エージェントの設計とシミュレーション研究における評価について述べる。その結果、第一に、オンライン強化学習エージェントによるパーソナライズされた介入の実装は実現可能であった。第二に、このような適応的介入は、わずかな観察しか得られなくても、患者の利益を改善する可能性がある。課題のひとつは、設計と実装プロセスに複雑さが加わることである。期待される利益を定量化するためには、過去の介入研究のデータが必要である。本アプローチは他の介入や臨床介入にも応用できるものと期待している。 Personalized adaptive interventions offer the opportunity to increase patient benefits; however, there are challenges in their planning and implementation. Once implemented, it is an important question whether personalized adaptive interventions are indeed clinically more effective compared to a fixed gold standard intervention. In this paper, we present an innovative N-of-1 trial study design testing whether implementing a personalized intervention by an online reinforcement learning agent is feasible and effective. Throughout, we use a new study on physical exercise recommendations to reduce pain in endometriosis for illustration. We describe the design of a contextual bandit recommendation agent and evaluate the agent in simulation studies. The results show that, first, implementing a personalized intervention by an online reinforcement learning agent is feasible. Second, such adaptive interventions have the potential to improve patients' benefits even if only a few observations are available. As one challenge, they add complexity to the design and implementation process. In order to quantify the expected benefit, data from previous interventional studies is required. 
We expect our approach to be transferable to other interventions and clinical interventions.	翻訳日:2023-11-28 03:13:27 公開日:2023-11-23
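The abstract names a contextual bandit agent but does not specify its algorithm. The sketch below is an assumed stand-in, not the authors' agent. It uses epsilon-greedy selection over per-(context, arm) running mean rewards. The context names ("low_pain", "high_pain"), the two arms, and the deterministic reward function are all hypothetical.

```python
import random


class EpsilonGreedyContextualBandit:
    """Epsilon-greedy agent keeping a running mean reward per (context, arm)."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {}  # (context, arm) -> number of pulls
        self.means = {}   # (context, arm) -> running mean reward

    def best_arm(self, context):
        # Greedy choice; untried arms default to 0.0 and ties go to the lowest index.
        return max(range(self.n_arms), key=lambda a: self.means.get((context, a), 0.0))

    def select(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)  # explore
        return self.best_arm(context)               # exploit

    def update(self, context, arm, reward):
        key = (context, arm)
        n = self.counts.get(key, 0) + 1
        self.counts[key] = n
        old = self.means.get(key, 0.0)
        self.means[key] = old + (reward - old) / n  # incremental mean


def simulated_reward(context, arm):
    # Hypothetical: arm 0 helps on "low_pain" days, arm 1 on "high_pain" days.
    best = 0 if context == "low_pain" else 1
    return 1.0 if arm == best else 0.0


agent = EpsilonGreedyContextualBandit(n_arms=2, epsilon=0.1, seed=42)
env_rng = random.Random(7)
for _ in range(2000):
    ctx = env_rng.choice(["low_pain", "high_pain"])
    arm = agent.select(ctx)
    agent.update(ctx, arm, simulated_reward(ctx, arm))

print(agent.best_arm("low_pain"), agent.best_arm("high_pain"))
```

The per-context estimates are what make the agent personalized: the same arm can be best in one context and worst in another.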
# Associative Transformerはスパース表現学習者 Associative Transformer Is A Sparse Representation Learner ( http://arxiv.org/abs/2309.12862v2 ) ライセンス: Link先を確認	Yuwei Sun, Hideya Ochiai, Zhirong Wu, Stephen Lin, Ryota Kanai	(参考訳) 従来のトランスフォーマーモデルのモノリシックなペアワイズアテンション機構を超えて、生物学的原理とより密接に一致するスパースな相互作用を活用することへの関心が高まっている。Set TransformerやPerceiverを含むアプローチでは、潜在空間とクロスアテンションを統合し、容量の限られた注意のボトルネックを形成している。近年のグローバルワークスペース理論と連想記憶に関する神経科学研究に基づいて,Associative Transformer(AiT)を提案する。AiTは低ランクの明示的メモリを導入し、このメモリは共有ワークスペースにおけるボトルネック注意を導く事前分布(prior)として、またホップフィールドネットワークの連想メモリ内のアトラクタとして機能する。エンドツーエンドの同時学習を通じて、これらの事前分布は自然にモジュールの特化を発達させ、それぞれが注意のボトルネックを形成するために異なる帰納的バイアスをもたらす。ボトルネックは、情報をメモリに書き込む際の入力間の競合を促進しうる。AiTはスパース表現学習者であり、入力の量や次元に対して計算複雑性が不変なボトルネックを通じて、異なる事前分布を学習することを示す。AiTは、様々な視覚タスクにおいて、Set Transformer、Vision Transformer、Coordinationなどの手法よりも優れていることを示す。 Beyond the monolithic pairwise attention mechanism of conventional Transformer models, there is growing interest in leveraging sparse interactions that align more closely with biological principles. Approaches including the Set Transformer and the Perceiver employ cross-attention consolidated with a latent space that forms an attention bottleneck with limited capacity. Building upon recent neuroscience studies of Global Workspace Theory and associative memory, we propose the Associative Transformer (AiT). AiT induces low-rank explicit memory that serves as both priors to guide bottleneck attention in the shared workspace and attractors within associative memory of a Hopfield network. Through joint end-to-end training, these priors naturally develop module specialization, each contributing a distinct inductive bias to form attention bottlenecks. A bottleneck can foster competition among inputs for writing information into the memory. We show that AiT is a sparse representation learner, learning distinct priors through the bottlenecks that are complexity-invariant to input quantities and dimensions. 
AiT demonstrates its superiority over methods such as the Set Transformer, Vision Transformer, and Coordination in various vision tasks.	翻訳日:2023-11-28 03:12:48 公開日:2023-11-23
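The abstract uses attractors in the associative memory of a Hopfield network. As a generic illustration only, the sketch below shows the continuous "modern Hopfield" retrieval step, not AiT's implementation. The step replaces a query ξ by X softmax(β Xᵀξ), where the columns of X are the stored patterns, and it pulls a noisy query toward the most similar stored pattern. The value of β, the step count, and the patterns are arbitrary assumptions.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_retrieve(patterns, query, beta=8.0, steps=3):
    """Iterate xi <- X softmax(beta * X^T xi); stored patterns are the rows of `patterns`."""
    xi = list(query)
    for _ in range(steps):
        sims = [sum(p_d * x_d for p_d, x_d in zip(p, xi)) for p in patterns]
        weights = softmax([beta * s for s in sims])
        # New state: convex combination of the stored patterns.
        xi = [sum(w * p[d] for w, p in zip(weights, patterns)) for d in range(len(xi))]
    return xi


stored = [
    [1.0, 1.0, -1.0, -1.0],
    [-1.0, 1.0, 1.0, -1.0],
    [1.0, -1.0, 1.0, -1.0],
]
noisy = [0.9, 0.7, -0.8, -1.1]  # corrupted version of stored[0]
print([round(v, 3) for v in hopfield_retrieve(stored, noisy)])
```

A larger β makes the softmax sharper, so each stored pattern acts more like a discrete attractor.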
# XAIの公平性上の利点に関する批判的調査 A Critical Survey on Fairness Benefits of XAI ( http://arxiv.org/abs/2310.13007v4 ) ライセンス: Link先を確認	Luca Deck, Jakob Schoeffer, Maria De-Arteaga, Niklas Kühl	(参考訳) 本稿では,説明可能なAI(XAI)と公平性の関係に関する典型的な主張を分析し,これら2つの概念間の多次元的な関係を解きほぐす。体系的な文献レビューとその後の質的内容分析に基づいて,XAIの公平性上の利点とされるものに関する175本の論文から7つの典型的な主張を特定した。我々はこれらの主張に関して重要な注意点を提示し、特定の公平性要件(desiderata)に対するXAIの可能性と限界に関する今後の議論の出発点を提供する。文献では、XAIがいくつかの公平性要件を実現する手段であると示唆されることが多いが、我々はこれらの要件とXAIの能力との間に隔たりがあることに気付いた。我々は,XAIをアルゴリズムの公平性という多次元的な社会技術的課題に取り組むための多くのツールの1つとして捉え,どのようなXAI手法が誰にとってどの公平性要件への対処を可能にするのかを,より具体的に示すことを推奨する。 In this critical survey, we analyze typical claims on the relationship between explainable AI (XAI) and fairness to disentangle the multidimensional relationship between these two concepts. Based on a systematic literature review and a subsequent qualitative content analysis, we identify seven archetypal claims from 175 papers on the alleged fairness benefits of XAI. We present crucial caveats with respect to these claims and provide an entry point for future discussions around the potentials and limitations of XAI for specific fairness desiderata. While the literature often suggests XAI to be an enabler for several fairness desiderata, we notice a divide between these desiderata and the capabilities of XAI. We encourage conceiving of XAI as one of many tools to approach the multidimensional, sociotechnical challenge of algorithmic fairness and being more specific about exactly what kind of XAI method enables whom to address which fairness desideratum.	翻訳日:2023-11-28 03:06:13 公開日:2023-11-23
# 感度を考慮した償却ベイズ推論 Sensitivity-Aware Amortized Bayesian Inference ( http://arxiv.org/abs/2310.11122v3 ) ライセンス: Link先を確認	Lasse Elsemüller, Hans Olischläger, Marvin Schmitt, Paul-Christian Bürkner, Ullrich Köthe, Stefan T. Radev	(参考訳) ベイズ推論は不確実性の下で確率的推論と意思決定を行うための強力なフレームワークである。現代のベイズワークフローにおける基本的な選択は、尤度関数と事前分布、事後分布の近似器、およびデータの指定に関するものである。各選択はモデルに基づく推論とその後の意思決定に大きく影響しうるため、感度分析が必要となる。本研究では,償却ベイズ推論(ABI,すなわちニューラルネットワークを用いたシミュレーションベース推論)に感度分析を統合するための多面的手法を提案する。まず,計算オーバーヘッドを最小に抑えつつ,代替となる尤度と事前分布の指定の間の構造的類似性を学習過程に符号化するために,重み共有を利用する。第2に,ニューラルネットワークの高速な推論を利用して,様々なデータ摂動や前処理手順に対する感度を評価する。他のほとんどのベイズ的アプローチとは対照的に、どちらのステップも、尤度、事前分布、データセットの選択ごとにモデルを再フィッティングするというコストのかかるボトルネックを回避する。最後に,ニューラルネットワークアンサンブルを用いて,未知データに対する信頼性の低い近似に起因する結果のばらつきを評価することを提案する。本稿では,疫病の発生動態や地球温暖化閾値の推定から,人間の意思決定モデルの比較に至る応用モデリング問題において,本手法の有効性を示す。実験では,本手法によって実務者がモデリング上の選択と推論上の結論との間の隠れた関係を効果的に明らかにできることを示す。 Bayesian inference is a powerful framework for making probabilistic inferences and decisions under uncertainty. Fundamental choices in modern Bayesian workflows concern the specification of the likelihood function and prior distributions, the posterior approximator, and the data. Each choice can significantly influence model-based inference and subsequent decisions, thereby necessitating sensitivity analysis. In this work, we propose a multifaceted approach to integrate sensitivity analyses into amortized Bayesian inference (ABI, i.e., simulation-based inference with neural networks). First, we utilize weight sharing to encode the structural similarities between alternative likelihood and prior specifications in the training process with minimal computational overhead. Second, we leverage the rapid inference of neural networks to assess sensitivity to various data perturbations or pre-processing procedures. In contrast to most other Bayesian approaches, both steps circumvent the costly bottleneck of refitting the model(s) for each choice of likelihood, prior, or dataset. 
Finally, we propose to use neural network ensembles to evaluate variation in results induced by unreliable approximation on unseen data. We demonstrate the effectiveness of our method in applied modeling problems, ranging from the estimation of disease outbreak dynamics and global warming thresholds to the comparison of human decision-making models. Our experiments showcase how our approach enables practitioners to effectively unveil hidden relationships between modeling choices and inferential conclusions.	翻訳日:2023-11-28 03:05:57 公開日:2023-11-23
# 深層学習に基づく空間依存音響特性の回復 Deep Learning based Spatially Dependent Acoustical Properties Recovery ( http://arxiv.org/abs/2310.10970v2 ) ライセンス: Link先を確認	Ruixian Liu, Peter Gerstoft	(参考訳) 物理インフォームドニューラルネットワーク(PINN)は、空間領域全体で一定である偏微分方程式(PDE)の係数を物理測定から直接回復することができる。本研究では,単一のニューラルネットワークを用いて空間依存型PDEの係数を回復でき,領域固有の物理的専門知識を必要としない空間依存物理インフォームドニューラルネットワーク(SD-PINN)を提案する。SD-PINNを空間依存波動方程式の係数回復に適用し,不均質媒質中の音響特性の空間分布を明らかにする。提案手法は、仮定したPDEが満たされなければならないという物理的制約に対する損失関数を組み込むことで、雑音に対する頑健性を示す。空間2次元のPDEの係数回復では、対象とする2次元領域のすべての位置におけるPDE係数を行列に格納し、その行列に対する低ランクの仮定を組み込むことで、測定が得られない位置の係数を回復する。 The physics-informed neural network (PINN) is capable of recovering partial differential equation (PDE) coefficients that remain constant throughout the spatial domain directly from physical measurements. In this work, we propose a spatially dependent physics-informed neural network (SD-PINN), which enables the recovery of coefficients in spatially dependent PDEs using a single neural network, eliminating the requirement for domain-specific physical expertise. We apply the SD-PINN to the recovery of spatially dependent wave-equation coefficients to reveal the spatial distribution of acoustical properties in an inhomogeneous medium. The proposed method exhibits robustness to noise owing to the incorporation of a loss term for the physical constraint that the assumed PDE must be satisfied. For coefficient recovery in spatially two-dimensional PDEs, we store the PDE coefficients at all locations in the 2D region of interest in a matrix and incorporate a low-rank assumption for this matrix to recover the coefficients at locations without available measurements.	翻訳日:2023-11-28 03:05:35 公開日:2023-11-23
# 多様なAI監督のための原理に基づく探索 Exploration with Principles for Diverse AI Supervision ( http://arxiv.org/abs/2310.08899v2 ) ライセンス: Link先を確認	Hao Liu, Matei Zaharia, Pieter Abbeel	(参考訳) 次トークン予測を用いた大規模トランスフォーマーのトレーニングは、AIに画期的な進歩をもたらした。この生成AIのアプローチは印象的な結果をもたらしたが、人間による監督に大きく依存している。ChatGPTのような最先端のAIモデルでさえ、人間のデモンストレーションによる微調整に依存しており、大量の人間の入力とドメインの専門知識を必要とする。この人間の監督への強い依存は、AIイノベーションの進歩にとって大きなハードルとなっている。この制限に対処するために,我々は高品質な学習データを自律的に生成することを目的とした新しいパラダイムであるExploratory AI(EAI)を提案する。教師なし強化学習(RL)の事前学習から着想を得て、EAIは自然言語空間内での探索を実現する。我々は,生成されたコンテンツの新規性を評価するために大規模言語モデルを活用することでこれを実現する。このアプローチでは,探索原理に従って新規なコンテンツを生成するアクタと,生成されたコンテンツを評価してアクタを導くための批評を与えるクリティックという2つの重要な構成要素を用いる。実証的な評価により、EAIが複雑な推論タスクにおけるモデルの性能を大幅に向上させ、人間に大きく依存する監督の限界に対処することが示された。 Training large transformers using next-token prediction has given rise to groundbreaking advancements in AI. While this generative AI approach has produced impressive results, it heavily leans on human supervision. Even state-of-the-art AI models like ChatGPT depend on fine-tuning through human demonstrations, demanding extensive human input and domain expertise. This strong reliance on human oversight poses a significant hurdle to the advancement of AI innovation. To address this limitation, we propose a novel paradigm termed Exploratory AI (EAI) aimed at autonomously generating high-quality training data. Drawing inspiration from unsupervised reinforcement learning (RL) pretraining, EAI achieves exploration within the natural language space. We accomplish this by harnessing large language models to assess the novelty of generated content. Our approach employs two key components: an actor that generates novel content following exploration principles and a critic that evaluates the generated content, offering critiques to guide the actor. Empirical evaluations demonstrate that EAI significantly boosts model performance on complex reasoning tasks, addressing the limitations of human-intensive supervision.	翻訳日:2023-11-28 03:04:55 公開日:2023-11-23
# 超伝導材料の不規則性が量子ビットのコヒーレンスに果たす役割の解明 Unraveling the role of disorderness in superconducting materials on qubit coherence ( http://arxiv.org/abs/2310.06621v2 ) ライセンス: Link先を確認	Ran Gao, Feng Wu, Hantao Sun, Jianjun Chen, Hao Deng, Xizheng Ma, Xiaohe Miao, Zhijun Song, Xin Wan, Fei Wang, Tian Xia, Make Ying, Chao Zhang, Yaoyun Shi, Hui-Hai Zhao, Chunqing Deng	(参考訳) 超伝導材料への不規則性の導入は、電磁インピーダンスを高め、雑音耐性のある超伝導量子ビットを実現する有望な手段と考えられてきた。多くの先駆的な実装があるにもかかわらず、材料の不規則性と量子ビットのコヒーレンスとの相関についての理解はまだ発展途上である。ここでは, 不規則性を変化させた窒化チタンアルミニウム製スーパーインダクタを用いたフラクソニウム量子ビットについて, 初めての体系的な特性評価を示す。量子ビットの雑音分光から、コヒーレンス特性の指標として磁束雑音と誘電損失を抽出する。その結果, $1/f$ 磁束雑音が磁束フラストレーション点付近の量子ビットのデコヒーレンスを支配し、材料の不規則性と強く相関している一方で、誘電損失は広範囲の材料特性にわたって低いままであることがわかった。磁束雑音振幅から, 現象論的スピン欠陥の面密度($\sigma$)と材料の不規則性は, $\sigma \propto \rho_{xx}^3$, すなわち実効的に$(k_F l)^{-3}$でおおよそ相関していることがわかった。この研究は超伝導体内のデコヒーレンスチャネルの起源に関する新たな知見を与えるものであり、材料設計と最適化のための有用な指針となりうる。 Introducing disorderness in the superconducting materials has been considered promising to enhance the electromagnetic impedance and realize noise-resilient superconducting qubits. Despite a number of pioneering implementations, the understanding of the correlation between the material disorderness and the qubit coherence is still developing. Here, we present the first systematic characterization of fluxonium qubits with the superinductors made from titanium-aluminum-nitride with varied disorderness. From qubit noise spectroscopy, the flux noise and the dielectric loss are extracted as a measure of the coherence properties. Our results reveal that the $1/f$ flux noise dominates the qubit decoherence around the flux-frustration point, strongly correlated with the material disorderness, while the dielectric loss remains low under a wide range of material properties. From the flux-noise amplitudes, the areal density ($\sigma$) of the phenomenological spin defects and material disorderness are found to be approximately correlated by $\sigma \propto \rho_{xx}^3$, or effectively $(k_F l)^{-3}$. 
This work has provided new insights into the origin of decoherence channels within superconductors, and could serve as a useful guideline for material design and optimization.	翻訳日:2023-11-28 03:04:07 公開日:2023-11-23
# CoT3DRef:思考の連鎖によるデータ効率のよい3Dビジュアルグラウンディング CoT3DRef: Chain-of-Thoughts Data-Efficient 3D Visual Grounding ( http://arxiv.org/abs/2310.06214v2 ) ライセンス: Link先を確認	Eslam Mohamed Bakr, Mohamed Ayman, Mahmoud Ahmed, Habib Slim, Mohamed Elhoseiny	(参考訳) 3Dビジュアルグラウンディングとは、発話を条件として3Dシーン内のオブジェクトの位置を特定する能力である。既存のほとんどの手法は参照ヘッドを用いて参照オブジェクトを直接特定するため、複雑なシナリオでは失敗する。さらに、ネットワークがどのように、なぜ最終的な決定に至ったのかも示されない。本稿では,「人間の知覚システムを模倣する可能性を持つ,解釈可能な3Dビジュアルグラウンディングのフレームワークを設計できるか」という問いに取り組む。この目的のために、まずアンカーの連鎖を、次に最終的なターゲットを予測することによって、3Dビジュアルグラウンディング問題をシーケンス・ツー・シーケンスのタスクとして定式化する。解釈可能性は全体的な性能を向上させるだけでなく、失敗事例の特定にも役立つ。思考の連鎖(chain of thoughts)のアプローチに従うことで、参照タスクを解釈可能な中間ステップに分解でき、性能が向上するとともにフレームワークのデータ効率が極めて高くなる。さらに,提案するフレームワークは既存の任意のアーキテクチャに容易に組み込むことができる。Nr3D,Sr3D,ScanReferベンチマークでの包括的な実験によって本手法を検証し,手動でアノテーションしたデータを必要とせずに既存手法と比較して一貫した性能向上を示す。さらに,CoT3DRefと名付けた提案フレームワークはデータ効率が非常に高く,Sr3Dデータセットではデータの10%のみで学習した場合でも,全データで学習したSOTA手法の性能に匹敵する。 3D visual grounding is the ability to localize objects in 3D scenes conditioned by utterances. Most existing methods use a referring head to localize the referred object directly, which causes failures in complex scenarios. In addition, they do not illustrate how and why the network reaches the final decision. In this paper, we address the question: can we design an interpretable 3D visual grounding framework that has the potential to mimic the human perception system? To this end, we formulate the 3D visual grounding problem as a sequence-to-sequence task by first predicting a chain of anchors and then the final target. Interpretability not only improves the overall performance but also helps us identify failure cases. Following the chain of thoughts approach enables us to decompose the referring task into interpretable intermediate steps, boosting the performance and making our framework extremely data-efficient. Moreover, our proposed framework can be easily integrated into any existing architecture. 
We validate our approach through comprehensive experiments on the Nr3D, Sr3D, and ScanRefer benchmarks and show consistent performance gains compared to existing methods without requiring manually annotated data. Furthermore, our proposed framework, dubbed CoT3DRef, is highly data-efficient: on the Sr3D dataset, when trained on only 10% of the data, it matches the SOTA performance of methods trained on the entire dataset.	翻訳日:2023-11-28 03:03:42 公開日:2023-11-23
# Lyapunovの予測通り、Lionは密かに制約付き最適化を解く Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts ( http://arxiv.org/abs/2310.05898v3 ) ライセンス: Link先を確認	Lizhang Chen, Bo Liu, Kaizhao Liang, Qiang Liu	(参考訳) プログラム探索によって発見された新しいオプティマイザであるLion(Evolved Sign Momentum)は、大規模なAIモデルの学習において有望な結果を示している。AdamWと同等以上の性能を示し、メモリ効率も高い。ランダム探索プログラムの結果から予想されるように、Lionは符号付きモーメンタム、分離型重み減衰(decoupled weight decay)、Polyakモーメンタム、Nesterovモーメンタムを含むいくつかの既存アルゴリズムの要素を取り入れているが、理論的に裏付けられた既存のオプティマイザのどのカテゴリにも当てはまらない。したがって、Lionは幅広いタスクの汎用オプティマイザとして良好に機能するように見えるが、その理論的根拠は定かではない。この理論的明確さの欠如は、Lionの有効性をさらに強化・拡張する機会を制限している。本研究はLionの謎を解き明かすことを目的とする。連続時間解析と離散時間解析の両方に基づき、Lionが有界制約 $\|x\|_\infty \leq 1/\lambda$ を課しつつ一般の損失関数 $f(x)$ を最小化する、理論的に新しく原理的なアプローチであることを示す。Lionはこれを分離型重み減衰によって実現しており、$\lambda$は重み減衰係数を表す。この解析は、Lionの更新に対する新しいリアプノフ関数を構築することで可能となった。この解析はより広いLion-$\kappa$アルゴリズムのファミリーにも適用でき、そこではLionの$\text{sign}(\cdot)$演算子が凸関数 $\kappa$ の劣勾配に置き換えられ、一般的な合成最適化問題 $\min_x f(x) + \kappa^*(x)$ の解につながる。我々の知見はLionのダイナミクスに関する貴重な洞察を与え、Lion関連アルゴリズムのさらなる改良と拡張への道を開く。 Lion (Evolved Sign Momentum), a new optimizer discovered through program search, has shown promising results in training large AI models. It performs comparably or favorably to AdamW but with greater memory efficiency. As we can expect from the results of a random search program, Lion incorporates elements from several existing algorithms, including signed momentum, decoupled weight decay, Polyak, and Nesterov momentum, but does not fit into any existing category of theoretically grounded optimizers. Thus, even though Lion appears to perform well as a general-purpose optimizer for a wide range of tasks, its theoretical basis remains uncertain. This lack of theoretical clarity limits opportunities to further enhance and expand Lion's efficacy. This work aims to demystify Lion. 
Based on both continuous-time and discrete-time analysis, we demonstrate that Lion is a theoretically novel and principled approach for minimizing a general loss function $f(x)$ while enforcing a bound constraint $\|x\|_\infty \leq 1/\lambda$. Lion achieves this through the incorporation of decoupled weight decay, where $\lambda$ represents the weight decay coefficient. Our analysis is made possible by the development of a new Lyapunov function for the Lion updates. It applies to a broader family of Lion-$\kappa$ algorithms, where the $\text{sign}(\cdot)$ operator in Lion is replaced by the subgradient of a convex function $\kappa$, leading to the solution of a general composite optimization problem of $\min_x f(x) + \kappa^*(x)$. Our findings provide valuable insights into the dynamics of Lion and pave the way for further improvements and extensions of Lion-related algorithms.	翻訳日:2023-11-28 03:02:37 公開日:2023-11-23
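The Lion update rule itself shows why the box constraint holds. This is a minimal scalar sketch, assuming the standard Lion rule: interpolate the momentum with β1, take a sign step, apply decoupled weight decay λ, then update the momentum with β2. The hyperparameters and the toy loss f(x) = (x − 10)²/2 are illustrative choices. The step can be rewritten as x ← (1 − ηλ)x − η·sign(c). If |x| ≤ 1/λ and 0 ≤ ηλ ≤ 1, the new iterate satisfies |x| ≤ (1 − ηλ)/λ + η = 1/λ, so the box is invariant.

```python
def sign(v):
    """Return -1, 0 or 1 (the scalar sign operator used by Lion)."""
    return (v > 0) - (v < 0)


def lion_step(x, m, grad, lr=0.01, beta1=0.9, beta2=0.99, wd=1.0):
    """One scalar Lion update with decoupled weight decay wd (the lambda above)."""
    c = beta1 * m + (1 - beta1) * grad   # interpolated update direction
    x = x - lr * (sign(c) + wd * x)      # sign step plus decoupled weight decay
    m = beta2 * m + (1 - beta2) * grad   # momentum is tracked with beta2
    return x, m


# Toy loss f(x) = 0.5 * (x - 10)^2: the unconstrained minimum x = 10 lies outside
# the box |x| <= 1/wd = 1, so the iterates settle on the boundary x = 1 instead.
x, m = 0.0, 0.0
trajectory = []
for _ in range(2000):
    grad = x - 10.0
    x, m = lion_step(x, m, grad)
    trajectory.append(x)

print(round(x, 6), max(abs(v) for v in trajectory) <= 1.0 + 1e-9)
```

Replacing `sign` with the subgradient of another convex κ gives the Lion-κ family described in the abstract. The sketch above covers only the sign case, κ(x) = |x|.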
# FlexTrain: 異種デバイス環境のための動的トレーニングフレームワーク FlexTrain: A Dynamic Training Framework for Heterogeneous Devices Environments ( http://arxiv.org/abs/2310.20457v2 ) ライセンス: Link先を確認	Mert Unsal, Ali Maatouk, Antonio De Domenico, Nicola Piovesan, Fadhel Ayed	(参考訳) ディープラーニングモデルが大きくなるにつれて、異種デバイス環境において大きな課題が生じる。ディープラーニングモデルのサイズは、低消費電力またはリソース制約のデバイスにそれらをデプロイすることを難しくし、長い推論時間と高エネルギー消費をもたらす。これらの課題に対処するため、トレーニング期間中に異なるデバイスで利用可能な多様なストレージと計算資源に対応するフレームワークFlexTrainを提案する。FlexTrainは、デバイス制約を尊重し、通信コストを最小化し、多様なデバイスとのシームレスな統合を確保しながら、ディープラーニングモデルの効率的なデプロイを可能にする。CIFAR-100データセットにおいてFlexTrainの有効性を実証し、FlexTrainで学習した単一のグローバルモデルを異種デバイスに容易に展開でき、学習時間とエネルギー消費を節約できることを示す。また、FlexTrainをフェデレーション学習の設定に拡張し、本手法がCIFAR-10およびCIFAR-100データセットの両方で標準的なフェデレーション学習ベンチマークを上回ることを示す。 As deep learning models become increasingly large, they pose significant challenges in heterogeneous device environments. The size of deep learning models makes it difficult to deploy them on low-power or resource-constrained devices, leading to long inference times and high energy consumption. To address these challenges, we propose FlexTrain, a framework that accommodates the diverse storage and computational resources available on different devices during the training phase. FlexTrain enables efficient deployment of deep learning models, while respecting device constraints, minimizing communication costs, and ensuring seamless integration with diverse devices. We demonstrate the effectiveness of FlexTrain on the CIFAR-100 dataset, where a single global model trained with FlexTrain can be easily deployed on heterogeneous devices, saving training time and energy consumption. We also extend FlexTrain to the federated learning setting, showing that our approach outperforms standard federated learning benchmarks on both CIFAR-10 and CIFAR-100 datasets.	翻訳日:2023-11-28 02:53:13 公開日:2023-11-23
# ChatGPTはソフトウェアテストインテリジェンスを前進させることができるか? メタモルフィックテストの経験報告 Can ChatGPT advance software testing intelligence? An experience report on metamorphic testing ( http://arxiv.org/abs/2310.19204v2 ) ライセンス: Link先を確認	Quang-Hung Luu, Huai Liu, and Tsong Yueh Chen	(参考訳) ChatGPTは人間の質問に答えるために使われている人工知能チャットボットとしてよく知られているが、ソフトウェアテストを前進させるその可能性を探りたいと考える人もいるだろう。本稿では,最先端のソフトウェアテスト技術であるメタモルフィックテスト(MT)のケーススタディを通じて,ソフトウェアテストのインテリジェンス向上におけるChatGPTの能力を検討する。我々はChatGPTに、メタモルフィック関係(MR)の候補を生成するよう依頼する。MRは本質的には対象プログラムが満たすべき必要特性であり、従来はその特定に人間の知性を必要としてきた。これらのMR候補は、ドメインの専門家によって正しさの観点から評価される。ChatGPTを用いて、複数のソフトウェアシステムをテストするための新しい正しいMRを生成できることを示す。とはいえ、MR候補の大多数は曖昧に定義されているか誤っており、特にMTでテストされたことのないシステムでその傾向が強い。ChatGPTは、後にテストの実装に採用できるMR候補を提案することでソフトウェアテストインテリジェンスの前進に利用できるが、その正しさを正当化し修正するためには、依然として人間の知性の関与が不可欠である。 While ChatGPT is a well-known artificial intelligence chatbot used to answer humans' questions, one may want to discover its potential for advancing software testing. We examine the capability of ChatGPT in advancing the intelligence of software testing through a case study on metamorphic testing (MT), a state-of-the-art software testing technique. We ask ChatGPT to generate candidates of metamorphic relations (MRs), which are basically necessary properties of the object program and which traditionally require human intelligence to identify. These MR candidates are then evaluated in terms of correctness by domain experts. We show that ChatGPT can be used to generate new correct MRs to test several software systems. Having said that, the majority of MR candidates are either defined vaguely or incorrect, especially for systems that have never been tested with MT. ChatGPT can be used to advance software testing intelligence by proposing MR candidates that can be later adopted for implementing tests, but human intelligence should still inevitably be involved to justify and rectify their correctness.	翻訳日:2023-11-28 02:52:28 公開日:2023-11-23
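A metamorphic relation (MR) relates the outputs of several executions, so no oracle is needed for any single output. As a minimal sketch, assume Python's `math.sin` is the program under test and use the textbook MR sin(x) = sin(π − x). The tolerance, the input range, and the deliberately buggy `buggy_sin` are illustrative choices, not taken from the paper.

```python
import math
import random


def mr_sine_supplement(f, x, tol=1e-9):
    """MR: for a correct sine, f(x) and f(pi - x) must agree (up to rounding)."""
    source_output = f(x)
    follow_up_output = f(math.pi - x)
    return abs(source_output - follow_up_output) <= tol


def run_metamorphic_tests(f, n_cases=1000, seed=0):
    """Return the inputs that violate the MR (an empty list means no fault found)."""
    rng = random.Random(seed)
    inputs = [rng.uniform(-100.0, 100.0) for _ in range(n_cases)]
    return [x for x in inputs if not mr_sine_supplement(f, x)]


def buggy_sin(x):
    # Hypothetical faulty implementation: loses the odd symmetry of sine.
    return math.sin(abs(x))


print(len(run_metamorphic_tests(math.sin)))       # correct implementation
print(len(run_metamorphic_tests(buggy_sin)) > 0)  # the MR exposes the fault
```

Neither check needs the "true" value of sin(x) for a random x. Only the relation between the source and follow-up executions is tested. This is why identifying correct MRs, the step the study delegates to ChatGPT, is the hard part.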
# 加法混合事前分布を用いたベイズ予後共変量調整 Bayesian Prognostic Covariate Adjustment With Additive Mixture Priors ( http://arxiv.org/abs/2310.18027v3 ) ライセンス: Link先を確認	Alyssa M. Vanderbeek and Arman Sabbaghi and Jon R. Walsh and Charles K. Fisher	(参考訳) ランダム化比較試験(RCT)による効果的かつ迅速な意思決定には、偏りがなく精度の高い治療効果の推論が必要である。この要求に対処する2つの戦略は、アウトカムと高い相関を持つ共変量で調整することと、ベイズの定理を通じて過去の対照群の情報を活用することである。我々は,これら2つの戦略を組み合わせた新たなベイズ予後共変量調整手法であるベイジアンPROCOVAを提案する。ベイジアンPROCOVAにおける共変量調整は、RCT参加者のためのデジタルツインジェネレータ(DTG)を構築する生成人工知能(AI)アルゴリズムに基づいている。DTGは過去の対照群データで学習され、対照治療下での各RCT参加者のアウトカムに対するデジタルツイン(DT)確率分布を与える。DT分布の期待値は予後スコアと呼ばれ、調整に用いる共変量を定義する。過去の対照群の情報は、過去の対照群データに基づいて指定される情報的事前確率分布と、弱情報的事前分布という2つの成分からなる加法混合事前分布によって活用される。混合重みは、事後推論が弱情報的成分に対して情報的成分からどの程度引き出されるかを決定する。この重みにも事前分布が与えられるため、加法混合事前分布全体をRCTの情報を一切用いずに完全に事前指定できる。ベイジアンPROCOVAにおいて,事後分布からサンプリングするための効率的なギブスアルゴリズムを確立し,重みを条件とした治療効果パラメータの事後平均と事後分散の閉形式表現を導出する。さまざまな乖離を含むシミュレーション研究において,頻度論的PROCOVAと比較したバイアス制御と分散低減の観点から,ベイジアンPROCOVAの効率向上を評価する。これらの利得はより小規模なRCTにつながる。 Effective and rapid decision-making from randomized controlled trials (RCTs) requires unbiased and precise treatment effect inferences. Two strategies to address this requirement are to adjust for covariates that are highly correlated with the outcome, and to leverage historical control information via Bayes' theorem. We propose a new Bayesian prognostic covariate adjustment methodology, referred to as Bayesian PROCOVA, that combines these two strategies. Covariate adjustment in Bayesian PROCOVA is based on generative artificial intelligence (AI) algorithms that construct a digital twin generator (DTG) for RCT participants. The DTG is trained on historical control data and yields a digital twin (DT) probability distribution for each RCT participant's outcome under the control treatment. The expectation of the DT distribution, referred to as the prognostic score, defines the covariate for adjustment. 
Historical control information is leveraged via an additive mixture prior with two components: an informative prior probability distribution specified based on historical control data, and a weakly informative prior distribution. The mixture weight determines the extent to which posterior inferences are drawn from the informative component, versus the weakly informative component. This weight has a prior distribution as well, and so the entire additive mixture prior is completely pre-specifiable without involving any RCT information. We establish an efficient Gibbs algorithm for sampling from the posterior distribution, and derive closed-form expressions for the posterior mean and variance of the treatment effect parameter conditional on the weight, in Bayesian PROCOVA. We evaluate efficiency gains of Bayesian PROCOVA via its bias control and variance reduction compared to frequentist PROCOVA in simulation studies that encompass different discrepancies. These gains translate to smaller RCTs.	翻訳日:2023-11-28 02:50:47 公開日:2023-11-23
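The sketch below shows why an additive mixture prior stays tractable. It assumes a simplified setting that is not Bayesian PROCOVA itself: a single treatment-effect estimate y ~ N(θ, s²) with known s², and a prior θ ~ w·N(m₁, v₁) + (1 − w)·N(m₂, v₂) with w held fixed. Under these assumptions the posterior is again a two-component normal mixture. Each component is updated conjugately, and its weight is rescaled by the marginal likelihood N(y; m_k, v_k + s²). The paper instead places a prior on w and samples with a Gibbs algorithm. All numbers below are hypothetical.

```python
import math


def normal_pdf(y, mean, var):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def mixture_posterior(y, s2, w, m1, v1, m2, v2):
    """Posterior of theta under a two-component normal mixture prior (w fixed).

    Returns [(weight, mean, variance), ...], one tuple per posterior component.
    """
    components = []
    for prior_w, m, v in ((w, m1, v1), (1 - w, m2, v2)):
        evidence = prior_w * normal_pdf(y, m, v + s2)  # marginal likelihood of y
        post_var = 1.0 / (1.0 / v + 1.0 / s2)          # conjugate normal update
        post_mean = post_var * (m / v + y / s2)
        components.append([evidence, post_mean, post_var])
    total = sum(c[0] for c in components)
    for c in components:
        c[0] /= total  # normalise the posterior mixture weights
    return [tuple(c) for c in components]


def mixture_mean_var(components):
    """Overall posterior mean and variance of a normal mixture."""
    mean = sum(w * mu for w, mu, _ in components)
    second_moment = sum(w * (var + mu ** 2) for w, mu, var in components)
    return mean, second_moment - mean ** 2


# Hypothetical: informative "historical" component N(0.5, 0.1), weak component N(0, 100).
post = mixture_posterior(y=0.6, s2=0.2, w=0.5, m1=0.5, v1=0.1, m2=0.0, v2=100.0)
print(post, mixture_mean_var(post))
```

When the new estimate agrees with the historical component, that component's posterior weight grows. When the two conflict, weight shifts toward the weakly informative component, which limits borrowing.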
# データ駆動交通シミュレーション:総括的レビュー Data-driven Traffic Simulation: A Comprehensive Review ( http://arxiv.org/abs/2310.15975v2 ) ライセンス: Link先を確認	Di Chen, Meixin Zhu, Hao Yang, Xuesong Wang, Yinhai Wang	(参考訳) 自動運転車(avs)は安全で効率的な交通手段を提供することで社会を大きく変革する可能性を秘めている。近年、自律運転の認識と予測において顕著な進歩が見られるが、AVの性能を検証するという課題はほとんど解決されていない。データ駆動型微視的交通シミュレーションは自動運転テストにとって重要なツールとなった 1) 高忠実度交通データの提供 2)大規模テストとシナリオ再現性の実現のメリット 3)反応的かつ現実的な交通シミュレーションの可能性。しかし、現在このトピックに関する包括的なレビューは欠落している。本稿では,このギャップを埋めるために関連する研究を要約する。本稿の目的は,現在の研究成果を概観し,この分野の今後の発展に資する未来的視点を提供することである。データ駆動トラフィックシミュレーションの一般的な問題を紹介し、重要な概念と用語を概説する。トラヒックシミュレーションを概観した後、一般的に使用される様々なデータセットと評価メトリクスをレビューする。そこで本研究では,模倣学習,強化学習,深層生成学習,深層学習を総合的に評価し,それぞれを要約し,その利点と欠点を詳細に分析する。さらに、最先端、既存の課題、そして将来の研究方向性を評価する。 Autonomous vehicles (AVs) have the potential to significantly revolutionize society by providing a secure and efficient mode of transportation. Recent years have witnessed notable advancements in autonomous driving perception and prediction, but the challenge of validating the performance of AVs remains largely unresolved. Data-driven microscopic traffic simulation has become an important tool for autonomous driving testing due to 1) availability of high-fidelity traffic data; 2) its advantages of enabling large-scale testing and scenario reproducibility; and 3) its potential in reactive and realistic traffic simulation. However, a comprehensive review of this topic is currently lacking. This paper aims to fill this gap by summarizing relevant studies. The primary objective of this paper is to review current research efforts and provide a futuristic perspective that will benefit future developments in the field. It introduces the general issues of data-driven traffic simulation and outlines key concepts and terms. After overviewing traffic simulation, various datasets and evaluation metrics commonly used are reviewed. 
The paper then offers a comprehensive evaluation of imitation learning, reinforcement learning, deep generative and deep learning methods, summarizing each and analyzing their advantages and disadvantages in detail. Moreover, it evaluates the state-of-the-art, existing challenges, and future research directions.	翻訳日:2023-11-28 02:49:37 公開日:2023-11-23
# Koopman表現の軌道修正 Course Correcting Koopman Representations ( http://arxiv.org/abs/2310.15386v2 ) ライセンス: Link先を確認	Mahan Fathi and Clement Gehring and Jonathan Pilault and David Kanaa and Pierre-Luc Bacon and Ross Goroshin	(参考訳) Koopman表現は、潜在空間における線形力学をもたらす非線形力学系(NLDS)の特徴を学習することを目的としている。理論的には、これらの特徴はNLDSのモデリングと制御における多くの問題を単純化するために使用できる。本研究では, この問題のオートエンコーダによる定式化と, それらを用いてダイナミクスをモデル化するさまざまな方法, 特に長い予測ホライズンにわたる将来の状態予測について検討する。我々は、潜在空間で将来の状態を予測する際のいくつかの制限を発見し、長期的なダイナミクスを忠実に捉えるために、周期的再符号化(Periodic Reencoding)と呼ぶ推論時の機構を提案する。我々は,低次元および高次元NLDSでの実験を通して,この手法を解析的および経験的に正当化する。 Koopman representations aim to learn features of nonlinear dynamical systems (NLDS) which lead to linear dynamics in the latent space. Theoretically, such features can be used to simplify many problems in modeling and control of NLDS. In this work we study autoencoder formulations of this problem, and different ways they can be used to model dynamics, specifically for future state prediction over long horizons. We discover several limitations of predicting future states in the latent space and propose an inference-time mechanism, which we refer to as Periodic Reencoding, for faithfully capturing long term dynamics. We justify this method both analytically and empirically via experiments in low and high dimensional NLDS.	翻訳日:2023-11-28 02:49:21 公開日:2023-11-23
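This sketch shows one reading of the inference-time idea in the abstract. The linear latent dynamics are rolled forward as usual, but every `period` steps the prediction is decoded and re-encoded, so the latent state is re-anchored instead of drifting unchecked. The callables `encode`, `decode`, and `koopman_step` are hypothetical placeholders for trained networks and a learned linear operator. The toy system uses an identity encoder so the output can be checked by hand.

```python
def rollout(x0, encode, decode, koopman_step, n_steps, period=None):
    """Predict n_steps future states from x0.

    period=None: encode once and evolve purely in latent space.
    period=k: every k steps, decode the prediction and re-encode it.
    """
    z = encode(x0)
    states = []
    for t in range(1, n_steps + 1):
        z = koopman_step(z)  # linear evolution in latent space
        x = decode(z)
        states.append(x)
        if period is not None and t % period == 0:
            z = encode(x)    # periodic re-encoding of the current prediction
    return states


def identity(v):
    return list(v)


def halve(z):
    # Exact Koopman operator for the toy system x_{t+1} = 0.5 * x_t.
    return [0.5 * zi for zi in z]


states = rollout([8.0, -4.0], identity, identity, halve, n_steps=3, period=2)
print(states)  # [[4.0, -2.0], [2.0, -1.0], [1.0, -0.5]]
```

With a perfect encoder/decoder pair, as here, re-encoding changes nothing. The two modes diverge only when `decode` followed by `encode` is not the identity, and `period` then trades extra encoder calls against latent drift.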
# 重み付き結合最大平均不一致による複数ソース・複数ターゲットの教師なしドメイン適応障害診断 Weighted Joint Maximum Mean Discrepancy Enabled Multi-Source-Multi-Target Unsupervised Domain Adaptation Fault Diagnosis ( http://arxiv.org/abs/2310.14790v2 ) ライセンス: Link先を確認	Zixuan Wang, Haoran Tang, Haibo Wang, Bo Qin, Mark D. Butala, Weiming Shen, Hongwei Wang	(参考訳) データ駆動型インテリジェント障害診断技術は顕著な結果を達成できるものの、訓練データとテストデータが同一の分布に従うことと、十分なラベル付きデータがあることを前提としている。実際のシナリオではさまざまな運転状態が存在することが多く、ドメインシフトの問題が生じて障害診断の有効性が損なわれる。近年の教師なしドメイン適応手法はクロスドメイン障害診断を可能にするが、複数のソースドメインからの情報を効果的に活用し、複数のターゲットドメインで同時に効果的に障害を診断することは難しい。本稿では,重み付き結合最大平均不一致に基づく複数ソース・複数ターゲット教師なしドメイン適応手法(WJMMD-MDA)を提案し,障害診断の分野で初めて複数ソース・複数ターゲットのシナリオにおけるドメイン適応を実現する。提案手法では,複数のラベル付きソースドメインから十分な情報を抽出し,改良した重み付き距離損失によってソースドメインとターゲットドメインのアライメントを実現する。その結果、複数のソースドメインとターゲットドメインの間でドメイン不変かつ識別的な特徴が学習され、クロスドメイン障害診断が実現される。提案手法の性能を3つのデータセットでの包括的な比較実験で評価し,本手法の優位性を実証した。 Despite the remarkable results that can be achieved by data-driven intelligent fault diagnosis techniques, they presuppose the same distribution of training and test data as well as sufficient labeled data. Various operating states often exist in practical scenarios, leading to the problem of domain shift that hinders the effectiveness of fault diagnosis. While recent unsupervised domain adaptation methods enable cross-domain fault diagnosis, they struggle to effectively utilize information from multiple source domains and to effectively diagnose faults in multiple target domains simultaneously. In this paper, we propose a weighted joint maximum mean discrepancy enabled multi-source-multi-target unsupervised domain adaptation method (WJMMD-MDA), which realizes domain adaptation under multi-source-multi-target scenarios in the field of fault diagnosis for the first time. The proposed method extracts sufficient information from multiple labeled source domains and achieves domain alignment between source and target domains through an improved weighted distance loss. 
As a result, domain-invariant and discriminative features between multiple source and target domains are learned with cross-domain fault diagnosis realized. The performance of the proposed method is evaluated in comprehensive comparative experiments on three datasets, and the experimental results demonstrate the superiority of this method.	翻訳日:2023-11-28 02:48:54 公開日:2023-11-23
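WJMMD builds on the maximum mean discrepancy (MMD) between feature distributions. This minimal sketch makes three assumptions: a Gaussian kernel, the biased empirical estimator MMD² = mean k(x, x′) + mean k(y, y′) − 2·mean k(x, y), and fixed hypothetical weights over (source, target) pairs. The paper's joint formulation and its weighting scheme are not reproduced here. The feature vectors are made-up 2-D points.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def mmd2(xs, ys, sigma=1.0):
    """Biased empirical estimate of the squared MMD between two samples."""
    def mean_kernel(us, vs):
        return sum(gaussian_kernel(u, v, sigma) for u in us for v in vs) / (len(us) * len(vs))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)


def weighted_mmd_loss(sources, targets, weights, sigma=1.0):
    """Weighted sum of MMD^2 over all (source, target) pairs; weights[i][j] is hypothetical."""
    return sum(
        weights[i][j] * mmd2(src, tgt, sigma)
        for i, src in enumerate(sources)
        for j, tgt in enumerate(targets)
    )


# Hypothetical 2-D features: source_a resembles the target domain, source_b does not.
source_a = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]]
source_b = [[2.0, 2.0], [2.1, 1.9]]
target = [[0.05, 0.0], [-0.05, 0.05]]
print(mmd2(source_a, target) < mmd2(source_b, target))
print(weighted_mmd_loss([source_a, source_b], [target], [[0.7], [0.3]]))
```

In a training loop, a loss like this would be added to the classification loss, so the feature extractor is pushed to make source and target features indistinguishable under the kernel.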
# プロサッカー選手の市場価値識別のための説明可能な人工知能モデル Explainable artificial intelligence model for identifying Market Value in Professional Soccer Players ( http://arxiv.org/abs/2311.04599v2 ) ライセンス: Link先を確認	Chunyang Huang, Shaoliang Zhang	(参考訳) 本研究では,サッカー選手の市場価値を予測するための高度な機械学習手法を提案し,アンサンブルモデルとShapley Additive Explanations (SHAP)を組み合わせて解釈可能とした。 sofifaの約12,000人のプレイヤーのデータを利用して、borutaアルゴリズムは特徴の選択を合理化した。グラディエントブースティング決定木(GBDT)モデルは予測精度に優れ、R-squaredは0.901、Root Mean Squared Error(RMSE)は3,221,632.175である。プレイヤーのスキル、フィットネス、認知領域の属性は市場価値に大きく影響した。これらの洞察は選手の評価においてスポーツ業界のステークホルダーを助ける。しかし、この研究には、スーパースタープレーヤーの値を過小評価したり、より大きなデータセットを必要とするような制限がある。今後の研究の方向性には、モデルの適用性の向上と、さまざまなコンテキストにおける価値予測の探索が含まれる。 This study introduces an advanced machine learning method for predicting soccer players' market values, combining ensemble models and the Shapley Additive Explanations (SHAP) for interpretability. Utilizing data from about 12,000 players from Sofifa, the Boruta algorithm streamlined feature selection. The Gradient Boosting Decision Tree (GBDT) model excelled in predictive accuracy, with an R-squared of 0.901 and a Root Mean Squared Error (RMSE) of 3,221,632.175. Player attributes in skills, fitness, and cognitive areas significantly influenced market value. These insights aid sports industry stakeholders in player valuation. However, the study has limitations, like underestimating superstar players' values and needing larger datasets. Future research directions include enhancing the model's applicability and exploring value prediction in various contexts.	翻訳日:2023-11-28 02:40:32 公開日:2023-11-23
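The GBDT results are reported as R-squared and RMSE. For reference, this minimal pure-Python sketch shows how the two metrics are usually computed from true and predicted values. The function names are illustrative, and no data from the study is used.

```python
import math

def rmse(y_true, y_pred):
    # Root mean squared error: sqrt(mean((y - y_hat)^2)), in the units of the target
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot (unitless)
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

RMSE is expressed in the same units as the predicted market value, which is why it can be a large number while R-squared stays between 0 and 1 for a reasonable model.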
# Big-Meansアルゴリズムの並列化戦略: 効果的なビッグデータクラスタリングのための総合的チュートリアル Strategies for Parallelizing the Big-Means Algorithm: A Comprehensive Tutorial for Effective Big Data Clustering ( http://arxiv.org/abs/2311.04517v2 ) ライセンス: Link先を確認	Ravil Mussabayev and Rustam Mussabayev	(参考訳) 本研究では,大規模データセットをクラスタリングするためのBig-meansアルゴリズムの最適化に注目し,4つの異なる並列化戦略を探索する。各アプローチの計算効率,スケーラビリティ,クラスタリング性能を評価し,そのメリットと限界を明らかにするため,広範な実験を行った。また,計算効率とクラスタリング品質のトレードオフについても検討し,各種要因の影響について検討した。今回の知見は,利用可能なリソースとデータセット特性に基づく最良並列化戦略の選択に関する実践的ガイダンスを提供し,big-meansアルゴリズムの並列化手法のより深い理解に寄与する。 This study focuses on the optimization of the Big-means algorithm for clustering large-scale datasets, exploring four distinct parallelization strategies. We conducted extensive experiments to assess the computational efficiency, scalability, and clustering performance of each approach, revealing their benefits and limitations. The paper also delves into the trade-offs between computational efficiency and clustering quality, examining the impacts of various factors. Our insights provide practical guidance on selecting the best parallelization strategy based on available resources and dataset characteristics, contributing to a deeper understanding of parallelization techniques for the Big-means algorithm.	翻訳日:2023-11-28 02:40:18 公開日:2023-11-23
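To make the object of the parallelization study concrete, here is a simplified, sequential sketch of a Big-means-style loop. It repeatedly draws a random sample, runs k-means on it warm-started from the best centroids found so far, and keeps the result if its sample objective improves. This is not the authors' code. The farthest-first seeding, the handling of empty clusters and all parameter names are assumptions made for a short, deterministic example.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two points
    return sum((x - y) ** 2 for x, y in zip(a, b))

def objective(points, centroids):
    # k-means objective: sum of squared distances to the nearest centroid
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def farthest_first(points, k, rng):
    # Seeding assumption: random first centre, then the point farthest from the chosen centres
    centres = [rng.choice(points)]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    return centres

def kmeans(points, centroids, iters=20):
    # Plain Lloyd iterations starting from the given centroids
    centroids = [list(c) for c in centroids]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid (simplification)
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids

def big_means(data, k, sample_size, n_samples, seed=0):
    # Sequential sketch: cluster random samples, warm-start from the incumbent, keep the best
    rng = random.Random(seed)
    best, best_obj = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(data, sample_size)
        init = best if best is not None else farthest_first(sample, k, rng)
        candidate = kmeans(sample, init)
        cand_obj = objective(sample, candidate)
        if cand_obj < best_obj:
            best, best_obj = candidate, cand_obj
    return best
```

The abstract compares four ways of parallelizing this kind of loop without spelling them out, so the sketch deliberately stays sequential.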
# 企業データガバナンス4.0の基盤としての組織知識のセマンティックモデリング --統一臨床データモデルへの応用 Semantic Modelling of Organizational Knowledge as a Basis for Enterprise Data Governance 4.0 -- Application to a Unified Clinical Data Model ( http://arxiv.org/abs/2311.02082v3 ) ライセンス: Link先を確認	Miguel AP Oliveira, Stephane Manara, Bruno Molé, Thomas Muller, Aurélien Guillouche, Lysann Hesske, Bruce Jordan, Gilles Hubert, Chinmay Kulkarni, Pralipta Jagdev and Cedric R. Berger	(参考訳) 個人や組織は常に増加するデータ量に対応し、その内容や形式は異質である。データの品質とライフサイクルの制御をもたらす適切なデータ管理プロセスは、このデータから価値を取り出し、複数の利用に関する固有のリスクを最小化するための前提条件である。一般的なデータガバナンスフレームワークは人、ポリシー、プロセスに依存しているが、それらはデータの圧倒的な複雑さに対応しきれていない。しかし、高品質な標準を達成するためには、この複雑さを活用する必要がある。後者は、このデータで学習された生成人工知能を含む、あらゆる下流のデータ利用の成果を左右する。本稿では,メタデータ駆動,アジャイル,(準)自動データガバナンス(すなわちデータガバナンス 4.0)を実現する,シンプルでコスト効率のよいフレームワークを構築した具体的経験を報告する。本稿では,25年分の臨床試験データを企業規模の本番環境で統合するために,このフレームワークをどのように実装・活用したかを説明する。このフレームワークはセマンティックウェブの原則を利用する方法論と技術の両方を含んでいる。ガバナンスの原則を含む、ビジネスコンテキストにおけるデータ資産のアバターを記述する知識グラフを構築した。エンタープライズ上位オントロジーによって連結された複数のオントロジーにより、FAIRification、ライフサイクル管理、役割と責任の定義、変換をまたぐリネージ、ソースシステムからの来歴といった重要なガバナンスのアクションが可能になる。このメタデータモデルは、ビジネスコンテキストをアジャイルに考慮して各ユースケースにガバナンスの制約を適用し、ビジネスの変化に基づいて動的に調整する半自動的なデータ管理プロセスであるデータガバナンス4.0の要となるものである。 Individuals and organizations cope with an always-growing amount of data, which is heterogeneous in its contents and formats. An adequate data management process yielding data quality and control over its lifecycle is a prerequisite to getting value out of this data and minimizing inherent risks related to multiple usages. Common data governance frameworks rely on people, policies, and processes that fall short of the overwhelming complexity of data. Yet, harnessing this complexity is necessary to achieve high-quality standards. The latter will condition any downstream data usage outcome, including generative artificial intelligence trained on this data. 
In this paper, we report our concrete experience establishing a simple, cost-efficient framework that enables metadata-driven, agile and (semi-)automated data governance (i.e. Data Governance 4.0). We explain how we implement and use this framework to integrate 25 years of clinical study data at an enterprise scale in a fully productive environment. The framework encompasses both methodologies and technologies leveraging semantic web principles. We built a knowledge graph describing avatars of data assets in their business context, including governance principles. Multiple ontologies articulated by an enterprise upper ontology enable key governance actions such as FAIRification, lifecycle management, definition of roles and responsibilities, lineage across transformations and provenance from source systems. This metadata model is the keystone to data governance 4.0: a semi-automatised data management process that considers the business context in an agile manner to adapt governance constraints to each use case and dynamically tune it based on business changes.	翻訳日:2023-11-28 02:37:27 公開日:2023-11-23
# VCISR:ビデオ圧縮合成データを用いたBlind Single Image Super-Resolution VCISR: Blind Single Image Super-Resolution with Video Compression Synthetic Data ( http://arxiv.org/abs/2311.00996v2 ) ライセンス: Link先を確認	Boyang Wang, Bowen Liu, Shiyu Liu, Fengyu Yang	(参考訳) ブラインド単一画像超解像(SISR)タスクでは、既存研究は画像レベルの未知の劣化の回復に成功している。しかし、単一のビデオフレームが入力となると、これらの手法は通常、モスキートノイズ、リンギング、ブロックノイズ、階段状ノイズなどのビデオ圧縮による劣化に対処できない。本稿では,ブラインドSISRタスクにおいて低解像度画像データを合成するための,映像圧縮に基づく劣化モデルを初めて提示する。提案手法は既存の画像データセットに広く適用可能であり,非可逆な映像圧縮アルゴリズムによる歪みを1枚の劣化画像に含めることができる。これにより、ビデオデータにおける特徴の多様性の欠如が克服され、トレーニング効率が維持される。 SISR劣化モデルにビデオ符号化アーティファクトを導入することで、ニューラルネットワークはビデオ圧縮による劣化を回復する能力をもって画像を超解像でき、画像圧縮による一般的な歪みの回復においてもより良い結果を得ることができる。提案手法は, SOTAのノーリファレンス画像品質評価において優れた性能を達成し, 各種データセットでより良い視覚品質を示す。さらに,本劣化モデルで学習したSISRニューラルネットワークをビデオ超解像(VSR)データセットで評価する。 VSR用に特別に設計されたアーキテクチャと比較して同等以上の性能を示しており、ビデオベースの劣化を注入する提案戦略が、時間的手がかりがなくても、より複雑な圧縮アーティファクトに対処できるよう一般化可能であることが示された。 In the blind single image super-resolution (SISR) task, existing works have been successful in restoring image-level unknown degradations. However, when a single video frame becomes the input, these works usually fail to address degradations caused by video compression, such as mosquito noise, ringing, blockiness, and staircase noise. In this work, we present, for the first time, a video compression-based degradation model to synthesize low-resolution image data in the blind SISR task. Our proposed image synthesizing method is widely applicable to existing image datasets, so that a single degraded image can contain distortions caused by the lossy video compression algorithms. This overcomes the lack of feature diversity in video data and thus retains the training efficiency. By introducing video coding artifacts to SISR degradation models, neural networks can super-resolve images with the ability to restore video compression degradations, and achieve better results on restoring generic distortions caused by image compression as well. 
Our proposed approach achieves superior performance in SOTA no-reference Image Quality Assessment, and shows better visual quality on various datasets. In addition, we evaluate the SISR neural network trained with our degradation model on video super-resolution (VSR) datasets. Compared to architectures specifically designed for the VSR purpose, our method exhibits similar or better performance, evidencing that the presented strategy on infusing video-based degradation is generalizable to address more complicated compression artifacts even without temporal cues.	翻訳日:2023-11-28 02:36:35 公開日:2023-11-23
# ゼロコーディネートシフト:物理インフォームド演算子学習のためのWhetted Automatic Differentiation Zero Coordinate Shift: Whetted Automatic Differentiation for Physics-informed Operator Learning ( http://arxiv.org/abs/2311.00860v2 ) ライセンス: Link先を確認	Kuangdai Leng, Mallikarjun Shankar, Jeyan Thiyagalingam	(参考訳) 自動微分(AD)は、ネットワーク出力w.r.t.座標の高次微分を計算するために必要となる物理インフォームド機械学習における重要なステップである。本稿では,ゼロ座標シフト (zcs) のトリックと呼ばれる,物理に変形した演算子学習のためのadを行う新しい軽量アルゴリズムを提案する。すべてのサンプル座標をリーフ変数にするのではなく、zcsは空間的または時間的次元ごとにスカラー値のリーフ変数を1つだけ導入し、望んでいた微分を"many-roots-many-leaves"から"one-root-many-leaves"へと単純化した。これは関数の次元(物理パラメータ)に沿って計算グラフの重複を避けることによって、優れた性能の飛躍をもたらした。 ZCSは現在のディープラーニングライブラリで簡単に実装できますが、私たちの独自の実装はDeepXDEパッケージを拡張して実現しています。我々は、データなしで偏微分方程式(PDE)を解くために、総合的なベンチマーク分析といくつかのケーススタディを行い、物理情報を用いたDeepONetsを訓練する。以上の結果から,ZCSはGPUメモリ使用量とトレーニングのウォール時間を桁違いに削減し,その削減係数は関数数に比例して拡大した。低レベルの最適化手法として、ZCSはデータ、物理(PDE)、ネットワークアーキテクチャに制限を課さず、あらゆる面からトレーニング結果を妥協しない。 Automatic differentiation (AD) is a critical step in physics-informed machine learning, required for computing the high-order derivatives of network output w.r.t. coordinates of collocation points. In this paper, we present a novel and lightweight algorithm to conduct AD for physics-informed operator learning, which we call the trick of Zero Coordinate Shift (ZCS). Instead of making all sampled coordinates as leaf variables, ZCS introduces only one scalar-valued leaf variable for each spatial or temporal dimension, simplifying the wanted derivatives from "many-roots-many-leaves" to "one-root-many-leaves" whereby reverse-mode AD becomes directly utilisable. It has led to an outstanding performance leap by avoiding the duplication of the computational graph along the dimension of functions (physical parameters). ZCS is easy to implement with current deep learning libraries; our own implementation is achieved by extending the DeepXDE package. 
We carry out a comprehensive benchmark analysis and several case studies, training physics-informed DeepONets to solve partial differential equations (PDEs) without data. The results show that ZCS has persistently reduced GPU memory consumption and wall time for training by an order of magnitude, and such reduction factor scales with the number of functions. As a low-level optimisation technique, ZCS imposes no restrictions on data, physics (PDE) or network architecture and does not compromise training results from any aspect.	翻訳日:2023-11-28 02:36:09 公開日:2023-11-23
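ZCS rests on the identity d/dz u(x + z) at z = 0 equals u'(x). Shifting every collocation coordinate by the same scalar z and differentiating with respect to that one scalar recovers the pointwise derivative at every point. The sketch below checks this identity with a tiny forward-mode dual-number class in pure Python. It illustrates the identity only, not the reverse-mode ZCS implementation described in the paper. The `Dual` class and the example function `u` are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Dual:
    """Minimal forward-mode dual number val + eps * der, with eps^2 = 0."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

def sin(d: Dual) -> Dual:
    # Chain rule for sin on a dual number
    return Dual(math.sin(d.val), math.cos(d.val) * d.der)

def u(x):
    # Stand-in for a network output as a function of one coordinate (hypothetical)
    return x * x + sin(x)

def zcs_derivatives(xs):
    # One shared scalar shift z = 0 (seeded with dz/dz = 1) is added to every coordinate
    z = Dual(0.0, 1.0)
    return [u(x + z).der for x in xs]  # d/dz u(x + z) at z = 0 equals u'(x)
```

All points share the single variable `z` instead of each coordinate being its own leaf, which is the "one-root-many-leaves" simplification the abstract describes.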
# agramplifier: 局所更新増幅による中毒攻撃に対する連合学習の防御 AGRAMPLIFIER: Defending Federated Learning Against Poisoning Attacks Through Local Update Amplification ( http://arxiv.org/abs/2311.06996v2 ) ライセンス: Link先を確認	Zirui Gong, Liyue Shen, Yanjun Zhang, Leo Yu Zhang, Jingwei Wang, Guangdong Bai, and Yong Xiang	(参考訳) 連合学習の協調性(fl)は、ビザンチン中毒攻撃として知られる局所的なトレーニングデータと局所的な更新を操作するという形で大きな脅威となる。この問題に対処するために、ビザンチン参加者がアップロードした不審なローカルアップデートをフィルタリングまたは緩和するために、多くのビザンチン・ロバスト集約ルール(agr)が提案されている。本稿では,既存のAGRの堅牢性,忠実性,効率性を同時に向上することを目的とした,AGRAMPLIFIERと呼ばれる新しいアプローチを提案する。 AGRAMPLIFIERの中核となる考え方は、各勾配更新の最も抑圧的な特徴を特定して、ローカル更新の「道徳」を増幅することであり、悪意のある更新と良心的な更新を明確に区別し、その結果、検出効果を改善することである。この目的を達成するために、AGRMPとAGRXAIという2つのアプローチを提案する。 AGRMPはパッチへのローカルアップデートを整理し、各パッチから最大の値を抽出する一方、AGRXAIは説明可能なAIメソッドを活用して、最もアクティブな機能の勾配を抽出する。 AGRAMPLIFIERに既存のビザンチン・ロバスト機構を組み込むことで、モデルの堅牢性を向上し、その忠実性を維持し、全体的な効率を向上する。 AGRAMPLIFIERは、既存のビザンチン・ロバスト機構と普遍的に互換性がある。本報告では, 主要なAGR機構に組み込むことにより, 有効性を示す。 7つの代表的な毒殺攻撃に対する多様なドメインから7つのデータセットに対して行われた広範な評価では、ロバスト性、忠実性、効率性が一貫して向上し、それぞれ40.08%、39.18%、10.68%の値が得られた。 The collaborative nature of federated learning (FL) poses a major threat in the form of manipulation of local training data and local updates, known as the Byzantine poisoning attack. To address this issue, many Byzantine-robust aggregation rules (AGRs) have been proposed to filter out or moderate suspicious local updates uploaded by Byzantine participants. This paper introduces a novel approach called AGRAMPLIFIER, aiming to simultaneously improve the robustness, fidelity, and efficiency of the existing AGRs. The core idea of AGRAMPLIFIER is to amplify the "morality" of local updates by identifying the most repressive features of each gradient update, which provides a clearer distinction between malicious and benign updates, consequently improving the detection effect. To achieve this objective, two approaches, namely AGRMP and AGRXAI, are proposed. 
AGRMP organizes local updates into patches and extracts the largest value from each patch, while AGRXAI leverages explainable AI methods to extract the gradient of the most activated features. By equipping AGRAMPLIFIER with the existing Byzantine-robust mechanisms, we successfully enhance the model's robustness, maintaining its fidelity and improving overall efficiency. AGRAMPLIFIER is universally compatible with the existing Byzantine-robust mechanisms. The paper demonstrates its effectiveness by integrating it with all mainstream AGR mechanisms. Extensive evaluations conducted on seven datasets from diverse domains against seven representative poisoning attacks consistently show enhancements in robustness, fidelity, and efficiency, with average gains of 40.08%, 39.18%, and 10.68%, respectively.	翻訳日:2023-11-28 02:28:32 公開日:2023-11-23
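AGRMP organizes each local update into patches and keeps the largest value of each patch, which behaves like max pooling over the flattened gradient. The sketch below shows that step plus a cosine similarity that a downstream AGR could apply to the amplified updates. The abstract does not say whether the raw maximum or the maximum magnitude is used, nor how a final partial patch is treated. The raw maximum and the short last patch here are assumptions, as are the function names.

```python
import math

def patch_max(update, patch_size):
    """Split a flattened update into consecutive patches and keep each patch's maximum."""
    return [max(update[i:i + patch_size]) for i in range(0, len(update), patch_size)]

def cosine(a, b):
    # Cosine similarity between two amplified updates
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)
```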
# 生体神経力学からの因果関係発見への注意 Attention for Causal Relationship Discovery from Biological Neural Dynamics ( http://arxiv.org/abs/2311.06928v3 ) ライセンス: Link先を確認	Ziyu Lu, Anika Tabassum, Shruti Kulkarni, Lu Mi, J. Nathan Kutz, Eric Shea-Brown, Seung-Hwan Lim	(参考訳) 本稿では,神経生物学的および生体物理ネットワークのように,各ノードに複雑な非線形ダイナミクスを持つネットワークにおけるグランガー因果関係を学習するためのトランスフォーマーモデルの可能性について検討する。本研究は主に、基礎となる接続マトリックスを介して基底的因果関係が知られているシミュレーションニューラルネットワークに基づく概念実証研究に焦点をあてた。神経集団動態を予測するために訓練されたトランスフォーマーモデルに対し、クロスアテンションモジュールはニューロン間の因果関係を効果的に捉え、最も一般的なグランガー因果解析法と同等かそれ以上の精度で得ることを示した。現実の神経生物学のデータは、動的接続性や観測されていない変動性など、さらなる課題をもたらすことを認めていますが、この研究は、神経科学における因果表現学習のためのトランスフォーマーモデルの有用性について、前向きな予見を与えてくれます。 This paper explores the potential of the transformer models for learning Granger causality in networks with complex nonlinear dynamics at every node, as in neurobiological and biophysical networks. Our study primarily focuses on a proof-of-concept investigation based on simulated neural dynamics, for which the ground-truth causality is known through the underlying connectivity matrix. For transformer models trained to forecast neuronal population dynamics, we show that the cross attention module effectively captures the causal relationship among neurons, with an accuracy equal or superior to that for the most popular Granger causality analysis method. While we acknowledge that real-world neurobiology data will bring further challenges, including dynamic connectivity and unobserved variability, this research offers an encouraging preliminary glimpse into the utility of the transformer model for causal representation learning in neuroscience.	翻訳日:2023-11-28 02:27:59 公開日:2023-11-23
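The paper benchmarks the cross-attention read-out against Granger causality analysis. For readers unfamiliar with that baseline, the sketch below implements the classic pairwise linear Granger test with a single lag. It compares how well y is predicted from its own past with and without the past of x. The lag order, the missing intercept, the log-RSS-ratio score and the toy simulation are all assumptions for illustration. They are not the specific Granger method or neural simulation used in the paper.

```python
import math
import random

def granger_score(cause, effect):
    """Lag-1 linear Granger score: log(RSS without cause / RSS with cause).

    Larger values mean the past of `cause` helps predict `effect` beyond its own past.
    """
    y = effect[1:]        # targets y_t
    ylag = effect[:-1]    # own past y_{t-1}
    xlag = cause[:-1]     # candidate cause x_{t-1}

    # Restricted model: y_t ~ a * y_{t-1}
    a_r = sum(t * yl for t, yl in zip(y, ylag)) / sum(yl * yl for yl in ylag)
    rss_r = sum((t - a_r * yl) ** 2 for t, yl in zip(y, ylag))

    # Unrestricted model: y_t ~ a * y_{t-1} + b * x_{t-1}, via 2x2 normal equations
    s_yy = sum(yl * yl for yl in ylag)
    s_xx = sum(xl * xl for xl in xlag)
    s_xy = sum(yl * xl for yl, xl in zip(ylag, xlag))
    r_y = sum(t * yl for t, yl in zip(y, ylag))
    r_x = sum(t * xl for t, xl in zip(y, xlag))
    det = s_yy * s_xx - s_xy ** 2
    a_u = (r_y * s_xx - s_xy * r_x) / det
    b_u = (s_yy * r_x - s_xy * r_y) / det
    rss_u = sum((t - a_u * yl - b_u * xl) ** 2 for t, yl, xl in zip(y, ylag, xlag))
    return math.log(rss_r / rss_u)

def simulate(n=500, seed=0):
    # Toy system (hypothetical): x drives y with one step of lag; y does not drive x
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [0.0]
    for t in range(1, n):
        y.append(0.5 * y[-1] + 0.8 * x[t - 1] + rng.gauss(0.0, 0.1))
    return x, y
```

On the toy system, `granger_score(x, y)` is large and `granger_score(y, x)` is near zero, recovering the known direction of influence.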
# TrainerAgent: LLM搭載マルチエージェントシステムによるカスタマイズ可能かつ効率的なモデルトレーニング TrainerAgent: Customizable and Efficient Model Training through LLM-Powered Multi-Agent System ( http://arxiv.org/abs/2311.06622v2 ) ライセンス: Link先を確認	Haoyuan Li, Hao Jiang, Tianke Zhang, Zhelun Yu, Aoxiong Yin, Hao Cheng, Siming Fu, Yuhao Zhang, Wanggui He	(参考訳) AIモデルのトレーニングは、特にパーソナライズされたサービスを提供するカスタムモデルが必要な場合、常に困難だった。アルゴリズムエンジニアは、特定のビジネス要件に合わせて反復的にモデルを開発するための長いプロセスに直面することが多く、非専門家にとってはさらに困難である。高品質で効率的なモデル開発の探求は、大規模言語モデル(LLM)エージェントの出現とともに、業界において重要な焦点となっている。 LLMの強力な分析,計画,意思決定機能を活用し,タスク,データ,モデル,サーバエージェントを含むマルチエージェントフレームワークからなるTrainerAgentシステムを提案する。これらのエージェントは、ユーザ定義のタスク、入力データ、要求(例えば、精度、速度)を分析し、データとモデルの両方の観点から包括的な最適化を行い、満足なモデルを取得し、最終的にこれらのモデルをオンラインサービスとしてデプロイする。コンピュータビジョンおよび自然言語処理領域における古典的識別的・生成的タスクに関する実験的評価は,我々のシステムが所望の基準を満たすモデルを一貫して生成していることを示す。さらに、システムは、空想的なシナリオや非倫理的な要求など、達成不可能なタスクを批判的に識別し、拒否する能力を示し、堅牢性と安全性を確保する。本研究は, LLMを用いた分析, 意思決定, 実行能力の統合, および4つのエージェント間の協調により, 従来のモデル開発と比較して, 効率と品質が向上した望ましいモデルの実現において, 大幅な進歩を示すものである。我々は,AI分野におけるモデル開発の新たなパラダイムとして,学術および産業コミュニティにおけるTrainerAgentの研究の進展に,我々の研究が貢献することを期待している。 Training AI models has always been challenging, especially when there is a need for custom models to provide personalized services. Algorithm engineers often face a lengthy process to iteratively develop models tailored to specific business requirements, making it even more difficult for non-experts. The quest for high-quality and efficient model development, along with the emergence of Large Language Model (LLM) Agents, has become a key focus in the industry. Leveraging the powerful analytical, planning, and decision-making capabilities of LLMs, we propose a TrainerAgent system comprising a multi-agent framework including Task, Data, Model and Server agents. These agents analyze user-defined tasks, input data, and requirements (e.g., accuracy, speed), optimizing them comprehensively from both data and model perspectives to obtain satisfactory models, and finally deploy these models as online services. 
Experimental evaluations on classical discriminative and generative tasks in computer vision and natural language processing domains demonstrate that our system consistently produces models that meet the desired criteria. Furthermore, the system exhibits the ability to critically identify and reject unattainable tasks, such as fantastical scenarios or unethical requests, ensuring robustness and safety. This research presents a significant advancement in achieving desired models with increased efficiency and quality as compared to traditional model development, facilitated by the integration of LLM-powered analysis, decision-making, and execution capabilities, as well as the collaboration among four agents. We anticipate that our work will contribute to the advancement of research on TrainerAgent in both academic and industry communities, potentially establishing it as a new paradigm for model development in the field of AI.	翻訳日:2023-11-28 02:27:42 公開日:2023-11-23
# Instant3D:スパースビュー生成と大規模再構成モデルによる高速テキストから3D Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model ( http://arxiv.org/abs/2311.06214v2 ) ライセンス: Link先を確認	Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, Sai Bi	(参考訳) 拡散モデルを用いたText-to-3Dは近年著しく進歩している。しかし, 既存の手法は, 推論の遅さ, 多様性の低さ, ヤヌス問題を抱えるスコア蒸留ベースの最適化に依存するか, 3次元学習データの不足により低品質な結果しか生成できないフィードフォワード法である。本稿では,テキストプロンプトから高品質で多様な3Dアセットをフィードフォワードで生成する新しい手法であるInstant3Dを提案する。我々はまず,微調整した2次元テキスト・画像拡散モデルを用いてテキストから4つの構造的・一貫したビューのスパースセットを1ショットで生成し,その後,新しいトランスフォーマーベースのスパースビュー再構成器を用いて生成画像から直接NeRFを回帰する2段階のパラダイムを採用する。広範にわたる実験により,本手法は高画質で多様な3Dアセットを20秒以内に生成でき,1～10時間を要する従来の最適化ベースの手法よりも2桁高速であることが実証された。私たちのプロジェクトwebページは、https://jiahao.ai/instant3d/です。 Text-to-3D with diffusion models has achieved remarkable progress in recent years. However, existing methods either rely on score distillation-based optimization which suffers from slow inference, low diversity and Janus problems, or are feed-forward methods that generate low-quality results due to the scarcity of 3D training data. In this paper, we propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner. We adopt a two-stage paradigm, which first generates a sparse set of four structured and consistent views from text in one shot with a fine-tuned 2D text-to-image diffusion model, and then directly regresses the NeRF from the generated images with a novel transformer-based sparse-view reconstructor. Through extensive experiments, we demonstrate that our method can generate diverse 3D assets of high visual quality within 20 seconds, which is two orders of magnitude faster than previous optimization-based methods that can take 1 to 10 hours. Our project webpage: https://jiahao.ai/instant3d/.	翻訳日:2023-11-28 02:26:09 公開日:2023-11-23
# ChiMed-GPT:フルトレーニングレジームと人間の嗜好への適応性を備えた中国医学大言語モデル ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences ( http://arxiv.org/abs/2311.06025v2 ) ライセンス: Link先を確認	Yuanhe Tian, Ruyi Gan, Yan Song, Jiaxing Zhang, Yongdong Zhang	(参考訳) 近年,医療サービスに対する需要の高まりが,医療インフラの格差を浮き彫りにしている。ビッグデータ、特にテキストは医療サービスの基盤を形成するため、医療領域に合わせた効果的な自然言語処理(NLP)ソリューションが必要不可欠である。事前学習モデルを活用する従来のアプローチは、この領域で有望な結果をもたらし、現在の大規模言語モデル(LLM)は、医療テキスト処理の高度な基盤を提供する。しかし、ほとんどの医療用LLMは教師付き微調整(SFT)のみで訓練されている。SFTはLLMが医療指示を理解し応答することを効率的に可能にするが、ドメイン知識の学習や人間の嗜好との整合には効果的ではない。現在の医療用LLMがテキスト処理能力を改善するのを妨げるもう1つの工学的障壁は、制限されたコンテキスト長(2,048トークンなど)であり、医学領域で頻繁に必要とされる長いコンテキストを処理するのが困難である。本研究では,中国医学領域向けに明示的に設計された新しいベンチマーク LLM であるChiMed-GPT を提案する。ChiMed-GPTはコンテキスト長を4,096トークンに拡大し,事前学習,SFT,RLHFからなる包括的な訓練を経ている。情報抽出,質問応答,対話生成などの実世界のタスクの評価は,一般的なドメインLLMよりもChiMed-GPTの方が優れた性能を示している。さらに,ChiMed-GPTに患者差別に関する態度尺度を回答させることで潜在的なバイアスを分析し,医療領域におけるLLMの責任ある開発に貢献する。コードとモデルはhttps://github.com/synlp/ChiMed-GPTで公開されている。 Recently, the increasing demand for superior medical services has highlighted the discrepancies in the medical infrastructure. With big data, especially texts, forming the foundation of medical services, there is an exigent need for effective natural language processing (NLP) solutions tailored to the healthcare domain. Conventional approaches leveraging pre-trained models present promising results in this domain and current large language models (LLMs) offer an advanced foundation for medical text processing. However, most medical LLMs are trained only with supervised fine-tuning (SFT), which efficiently enables LLMs to understand and respond to medical instructions but is ineffective for learning domain knowledge and aligning with human preferences. Another engineering barrier that prevents current medical LLMs from achieving better text processing ability is their restricted context length (e.g., 2,048 tokens), which makes it hard for them to process the long contexts frequently required in the medical domain. 
In this work, we propose ChiMed-GPT, a new benchmark LLM designed explicitly for the Chinese medical domain, whose context length is enlarged to 4,096 tokens and which undergoes a comprehensive training regime of pre-training, SFT, and RLHF. Evaluations on real-world tasks including information extraction, question answering, and dialogue generation demonstrate ChiMed-GPT's superior performance over general domain LLMs. Furthermore, we analyze possible biases by prompting ChiMed-GPT to complete attitude scales regarding discrimination of patients, so as to contribute to further responsible development of LLMs in the medical domain. The code and model are released at https://github.com/synlp/ChiMed-GPT.	翻訳日:2023-11-28 02:25:32 公開日:2023-11-23
# コスト正則化最適輸送による空間をまたぐ構造化変換 Structured Transforms Across Spaces with Cost-Regularized Optimal Transport ( http://arxiv.org/abs/2311.05788v2 ) ライセンス: Link先を確認	Othmane Sebbouh and Marco Cuturi and Gabriel Peyré	(参考訳) ソース確率測度を目標確率測度にマッチさせる問題は、点間の差異を定量化する地上コスト関数によってパラメータ化される線形最適輸送(OT)問題をインスタンス化することでしばしば解決される。これらの測度が同じ距離空間にある場合、地上コストにはその距離がそのまま用いられることが多い。しかし、2つの異なる空間にまたがってインスタンス化する場合、整列データがない状況でそのコストを選択することは難題である。その結果、実践者は代わりに二次グロモフ=ワッサーシュタイン(Gromov-Wasserstein, GW)問題を解くことが多い。本研究では,GWと,地上コストでパラメータ化された線形OT目的関数の正則化最小化であるコスト正則化OTとの対応関係を活用する。我々は、このコスト正則化された定式化を用いて2つの異なるユークリッド空間における測度をマッチさせ、変換されたソース点と目標点の間でコストを評価する。いくつかの二次OT問題がこのカテゴリに含まれることを示し、構造誘導正則化項を導入することで線形変換に構造(例えばスパース性)を課すことを考える。非整合データからそのような変換を抽出する近接(proximal)アルゴリズムを提案し,単一細胞空間トランスクリプトミクス/マルチオミクスのマッチングタスクへの適用可能性を示す。 Matching a source to a target probability measure is often solved by instantiating a linear optimal transport (OT) problem, parameterized by a ground cost function that quantifies discrepancy between points. When these measures live in the same metric space, the ground cost often defaults to its distance. When instantiated across two different spaces, however, choosing that cost in the absence of aligned data is a conundrum. As a result, practitioners often resort to solving instead a quadratic Gromov-Wasserstein (GW) problem. We exploit in this work a parallel between GW and cost-regularized OT, the regularized minimization of a linear OT objective parameterized by a ground cost. We use this cost-regularized formulation to match measures across two different Euclidean spaces, where the cost is evaluated between transformed source points and target points. We show that several quadratic OT problems fall in this category, and consider enforcing structure in the linear transform (e.g., sparsity), by introducing structure-inducing regularizers. We provide a proximal algorithm to extract such transforms from unaligned data, and demonstrate its applicability to single-cell spatial transcriptomics/multiomics matching tasks.	翻訳日:2023-11-28 02:24:47 公開日:2023-11-23
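In the cost-regularized formulation, the ground cost is evaluated between transformed source points and target points, which may live in spaces of different dimension. The minimal sketch below makes two assumptions. First, the linear map `A` is fixed, whereas the paper learns it. Second, the OT step uses entropic regularization solved with plain Sinkhorn iterations. Sinkhorn is a standard OT solver, not the proximal algorithm proposed in the paper. `transformed_cost`, `eps` and the toy data are illustrative.

```python
import math

def transformed_cost(xs, ys, A):
    # Squared Euclidean cost between A @ x_i (A may be rectangular) and targets y_j
    def apply(M, x):
        return [sum(M[r][c] * x[c] for c in range(len(x))) for r in range(len(M))]
    mapped = [apply(A, x) for x in xs]
    return [[sum((p - q) ** 2 for p, q in zip(mx, y)) for y in ys] for mx in mapped]

def sinkhorn(a, b, cost, eps=0.1, iters=500):
    # Entropic OT via Sinkhorn scaling; returns a coupling with marginals close to a and b
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

In a full cost-regularized method, a step like this would alternate with an update of `A` under a structure-inducing regularizer. That outer loop is not shown.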
# 「階層的」視覚表現は階層的か? Are "Hierarchical" Visual Representations Hierarchical? ( http://arxiv.org/abs/2311.05784v2 ) ライセンス: Link先を確認	Ethan Shen, Ali Farhadi, Aditya Kusupati	(参考訳) 学習された視覚表現はしばしば、正確な下流アプリケーションのための大量の意味情報をキャプチャする。世界に対する人間の理解は階層構造に根ざしている。これを模倣し、さらに表現能力を改善するために、コミュニティは視覚世界の基盤となる階層をモデル化することを目的とした「階層的な」視覚表現を探求した。本研究では,階層的視覚表現が標準的な学習表現よりも人間の知覚階層を本当に捉えているかを検討する。この目的のために、ImageNetのBREEDsサブセットから3種類の階層にまたがる12のデータセットからなるHierNetを作成した。さまざまな学習設定におけるハイパーボリック表現とマトリョーシカ表現の広範な評価の結果, これらは標準表現より階層をうまく捉えているわけではないが, 検索効率や解釈可能性などの他の側面では役立ちうると結論付けた。ベンチマークとデータセットはhttps://github.com/ethanlshen/HierNetで公開されている。 Learned visual representations often capture large amounts of semantic information for accurate downstream applications. Human understanding of the world is fundamentally grounded in hierarchy. To mimic this and further improve representation capabilities, the community has explored "hierarchical" visual representations that aim at modeling the underlying hierarchy of the visual world. In this work, we set out to investigate if hierarchical visual representations truly capture the human perceived hierarchy better than standard learned representations. To this end, we create HierNet, a suite of 12 datasets spanning 3 kinds of hierarchy from the BREEDs subset of ImageNet. After extensive evaluation of Hyperbolic and Matryoshka Representations across training setups, we conclude that they do not capture hierarchy any better than the standard representations but can assist in other aspects like search efficiency and interpretability. Our benchmark and the datasets are open-sourced at https://github.com/ethanlshen/HierNet.	翻訳日:2023-11-28 02:24:26 公開日:2023-11-23
# テキストからの因果推論:変数間の相互作用を明らかにする Causal Inference from Text: Unveiling Interactions between Variables ( http://arxiv.org/abs/2311.05286v2 ) ライセンス: Link先を確認	Yuxiang Zhou, Yulan He	(参考訳) 観測テキストデータから因果効果を推定するには潜在共変量の調整が不可欠である。既存の方法の多くは、治療と結果の両方に影響を及ぼす共変量の結合のみを考慮し、潜在的に偏りのある因果効果をもたらす。このバイアスは、治療または結果にのみ関係する非共変量に対する不十分な考慮から生じる。本研究では,テキストから因果効果を推定する際,異なる変数間の相互作用を露呈し,非折りたたみ共変体を乱すことによりバイアスを軽減することを目的とする。分離過程は、共変数がそれぞれの目的にのみ寄与することを保証し、変数間の独立を可能にする。さらに,選択バイアスを軽減するために,治療群と対照群からの表現のバランスをとるための制約を課す。様々なシナリオにおいて, 2つの異なる治療因子について実験を行い, 提案モデルは近年の強基線を著しく上回っている。さらに、収支報告書の徹底的な分析により、我々のモデルが変数を効果的に解体できることが示され、現実世界のシナリオに関するさらなる調査は、投資家が情報的な意思決定を行うためのガイダンスを提供する。 Adjusting for latent covariates is crucial for estimating causal effects from observational textual data. Most existing methods only account for confounding covariates that affect both treatment and outcome, potentially leading to biased causal effects. This bias arises from insufficient consideration of non-confounding covariates, which are relevant only to either the treatment or the outcome. In this work, we aim to mitigate the bias by unveiling interactions between different variables to disentangle the non-confounding covariates when estimating causal effects from text. The disentangling process ensures covariates only contribute to their respective objectives, enabling independence between variables. Additionally, we impose a constraint to balance representations from the treatment group and control group to alleviate selection bias. We conduct experiments on two different treatment factors under various scenarios, and the proposed model significantly outperforms recent strong baselines. Furthermore, our thorough analysis on earnings call transcripts demonstrates that our model can effectively disentangle the variables, and further investigations into real-world scenarios provide guidance for investors to make informed decisions.	翻訳日:2023-11-28 02:24:11 公開日:2023-11-23
# オープンワールドにおけるクロスドメインシーケンシャルレコメンデーション:モデルに依存しないコントラスティブ・デノイジングアプローチ Towards Open-world Cross-Domain Sequential Recommendation: A Model-Agnostic Contrastive Denoising Approach ( http://arxiv.org/abs/2311.04760v2 ) ライセンス: Link先を確認	Wujiang Xu, Xuying Ning, Wenfang Lin, Mingming Ha, Qiongxu Ma, Qianqiao Liang, Xuewen Tao, Linxun Chen, Bing Han, Minnan Luo	(参考訳) クロスドメインシーケンシャルレコメンデーション(CDSR)は、従来のシーケンシャルレコメンデーション(SR)システムに存在するデータスパース性の問題に対処することを目的としている。既存手法は,豊富な行動を持つ重複ユーザーに依存して,複数のドメインにまたがって情報を伝達・伝播する特定のクロスドメインユニットを設計することを目的としている。しかし、現実のレコメンデーションシステムでは、CDSRシナリオは通常、疎な行動しか持たない大多数のロングテールユーザーと、一つのドメインにしか存在しないコールドスタートユーザーから構成される。これにより、現実世界の業界プラットフォームにおける既存のCDSR手法の性能が低下する。したがって、オープンワールドCDSRシナリオにおけるモデルの一貫性と有効性を改善することは、CDSRモデルを構築する上で重要である(1st CH)。近年,一部のSR手法は,ロングテールユーザーの情報を補完するために補助行動を利用している。しかし、これらのマルチビヘイビアSR法は、ターゲット行動と補助行動のセマンティックなギャップや、ドメイン間のユーザ関心の偏り(2nd CH)を見落としているため、CDSRにおいて有望な性能をもたらすことはできない。 Cross-domain sequential recommendation (CDSR) aims to address the data sparsity problems that exist in traditional sequential recommendation (SR) systems. The existing approaches aim to design a specific cross-domain unit that can transfer and propagate information across multiple domains by relying on overlapping users with abundant behaviors. However, in real-world recommender systems, CDSR scenarios usually consist of a majority of long-tailed users with sparse behaviors and cold-start users who only exist in one domain. This leads to a drop in the performance of existing CDSR methods in the real-world industry platform. Therefore, improving the consistency and effectiveness of models in open-world CDSR scenarios is crucial for constructing CDSR models (1st CH). Recently, some SR approaches have utilized auxiliary behaviors to complement the information for long-tailed users. 
However, these multi-behavior SR methods cannot deliver promising performance in CDSR, as they overlook the semantic gap between target and auxiliary behaviors, as well as user interest deviation across domains (2nd CH).	翻訳日:2023-11-28 02:23:48 公開日:2023-11-23
# ProAgent: ロボットプロセス自動化からエージェントプロセス自動化へ ProAgent: From Robotic Process Automation to Agentic Process Automation ( http://arxiv.org/abs/2311.10751v2 ) ライセンス: Link先を確認	Yining Ye, Xin Cong, Shizuo Tian, Jiannan Cao, Hao Wang, Yujia Qin, Yaxi Lu, Heyang Yu, Huadong Wang, Yankai Lin, Zhiyuan Liu, Maosong Sun	(参考訳) 古代の水車からロボットプロセス自動化(RPA)まで、自動化技術は歴史を通じて進化し、人間を困難な仕事から解放してきた。しかし、RPAは人間のような知性を必要とするタスク、特にワークフロー構築の精巧な設計とワークフロー実行における動的意思決定に苦慮している。大規模言語モデル (LLM) が人間のような知性を示すようになったことを受け, 本稿では, 構築・実行に関わる人的労働をエージェントに委ねることで高度な自動化を実現する, LLMベースのエージェントを用いた画期的な自動化パラダイムである Agentic Process Automation (APA) を導入する。そして、人間の指示からワークフローを作り、特殊エージェントを調整することで複雑な決定を下すように設計されたLLMベースのエージェントであるProAgentをインスタンス化する。ワークフローの構築と実行手順を詳細に説明し、APAの実現可能性を示し、エージェントによって駆動される新しい自動化パラダイムの可能性を明らかにする実証実験を行った。私たちのコードはhttps://github.com/OpenBMB/ProAgentで公開しています。 From ancient water wheels to robotic process automation (RPA), automation technology has evolved throughout history to liberate human beings from arduous tasks. Yet, RPA struggles with tasks needing human-like intelligence, especially in elaborate design of workflow construction and dynamic decision-making in workflow execution. As Large Language Models (LLMs) have exhibited human-like intelligence, this paper introduces Agentic Process Automation (APA), a groundbreaking automation paradigm using LLM-based agents for advanced automation by offloading the human labor to agents associated with construction and execution. We then instantiate ProAgent, an LLM-based agent designed to craft workflows from human instructions and make intricate decisions by coordinating specialized agents. Empirical experiments are conducted to detail its workflow construction and execution procedures, showcasing the feasibility of APA and unveiling the possibility of a new paradigm of automation driven by agents. Our code is public at https://github.com/OpenBMB/ProAgent.	翻訳日:2023-11-28 02:15:27 公開日:2023-11-23
# 注意を再考する - トランスフォーマーの注意層に代わる、浅層フィードフォワードニューラルネットワークの探索 Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers ( http://arxiv.org/abs/2311.10642v2 ) ライセンス: Link先を確認	Vukasin Bozic, Danilo Dordevic, Daniele Coppola, Joseph Thommes, Sidak Pal Singh	(参考訳) 本研究は,sequence-to-sequenceタスクのための最先端アーキテクチャであるオリジナルのTransformerモデルにおいて,アテンション機構の挙動を模倣するために,標準的な浅層フィードフォワードネットワークを用いた場合の有効性の分析を行う。Transformerの注意機構のキー要素を単純なフィードフォワードネットワークに置き換え, 元のコンポーネントを教師とした知識蒸留によって学習させる。 IWSLT2017データセットで実施した実験では,これらの“アテンションレストランスフォーマー”の能力が,元のアーキテクチャのパフォーマンスに匹敵することを示した。厳密なアブレーション研究と、様々な代替ネットワークタイプとサイズの実験を通じて、我々のアプローチの実現可能性を支える洞察を提供する。これは、アテンション機構をエミュレートする上での浅いフィードフォワードネットワークの適応性に光を当てるだけでなく、シーケンスからシーケンスへのタスクの複雑なアーキテクチャを合理化する可能性にも光を当てている。 This work presents an analysis of the effectiveness of using standard shallow feed-forward networks to mimic the behavior of the attention mechanism in the original Transformer model, a state-of-the-art architecture for sequence-to-sequence tasks. We substitute key elements of the attention mechanism in the Transformer with simple feed-forward networks, trained using the original components via knowledge distillation. Our experiments, conducted on the IWSLT2017 dataset, reveal the capacity of these "attentionless Transformers" to rival the performance of the original architecture. Through rigorous ablation studies, and experimenting with various replacement network types and sizes, we offer insights that support the viability of our approach. This not only sheds light on the adaptability of shallow feed-forward networks in emulating attention mechanisms but also underscores their potential to streamline complex architectures for sequence-to-sequence tasks.	翻訳日:2023-11-28 02:15:07 公開日:2023-11-23
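The distillation idea above (a simple network trained to reproduce the outputs of an existing component) can be illustrated with a toy sketch. Assumption, loudly: the "teacher" here is a fixed linear map standing in for a frozen attention sublayer, and the "student" is a single linear layer trained by plain stochastic gradient descent on the squared error; this is not the paper's architecture or training setup.

```python
import random

# Hypothetical teacher: a fixed linear map standing in for a frozen attention
# sublayer (NOT the Transformer attention used in the paper).
TEACHER_W = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.5]]


def teacher(x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in TEACHER_W]


def student(params, x):
    # Student: one linear layer, a deliberately shallow stand-in for an FFN.
    return [sum(w * xi for w, xi in zip(row, x)) for row in params]


def distill(steps=2000, lr=0.05, seed=0):
    rng = random.Random(seed)
    params = [[0.0] * 3 for _ in range(2)]
    for _ in range(steps):
        x = [rng.uniform(-1, 1) for _ in range(3)]
        target = teacher(x)  # the teacher's output is the regression target
        pred = student(params, x)
        # SGD step on 0.5 * ||pred - target||^2 for every weight.
        for i in range(2):
            err = pred[i] - target[i]
            for j in range(3):
                params[i][j] -= lr * err * x[j]
    return params


def mse(params, n=200, seed=1):
    # Mean squared distance between student and teacher outputs on fresh inputs.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in range(3)]
        total += sum((p - t) ** 2 for p, t in zip(student(params, x), teacher(x)))
    return total / n
```

Calling `mse(distill())` gives a near-zero error, while an untrained student (all-zero weights) stays far from the teacher; the paper's replacement networks and distillation targets are more involved than this.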
# 視覚受容野の一般化ガウス微分モデルに基づく時空間受容野の幾何学的画像変換下での結合共分散特性 Joint covariance property under geometric image transformations for spatio-temporal receptive fields according to the generalized Gaussian derivative model for visual receptive fields ( http://arxiv.org/abs/2311.10543v3 ) ライセンス: Link先を確認	Tony Lindeberg	(参考訳) 自然な画像変換が受容野応答に与える影響は、コンピュータビジョンと生体視覚の視覚操作のモデリングに不可欠である。この点において、視覚階層の最初期の層における幾何学的画像変換に関する共分散特性は、ロバストな画像操作の表現や、高レベルでの不変な視覚操作の定式化に不可欠である。本稿では,空間的スケーリング変換,空間的アフィン変換,ガリレオ変換,時間的スケーリング変換の合成の下での結合共分散特性を定義・証明し,異なる種類の画像変換が互いにどのように相互作用するかを特徴付けることを可能にする。具体的には、導出された関係式は、時空間受容野からの出力を基礎となる時空間画像変換と一致させるために、受容野パラメータをどのように変換する必要があるかを示す。 The influence of natural image transformations on receptive field responses is crucial for modelling visual operations in computer vision and biological vision. In this regard, covariance properties with respect to geometric image transformations in the earliest layers of the visual hierarchy are essential for expressing robust image operations and for formulating invariant visual operations at higher levels. This paper defines and proves a joint covariance property under compositions of spatial scaling transformations, spatial affine transformations, Galilean transformations and temporal scaling transformations, which makes it possible to characterize how different types of image transformations interact with each other. Specifically, the derived relations show how the receptive field parameters need to be transformed, in order to match the output from spatio-temporal receptive fields with the underlying spatio-temporal image transformations.	翻訳日:2023-11-28 02:14:47 公開日:2023-11-23
# 複数インスタンス学習による本質的に解釈可能な時系列分類 Inherently Interpretable Time Series Classification via Multiple Instance Learning ( http://arxiv.org/abs/2311.10049v2 ) ライセンス: Link先を確認	Joseph Early, Gavin KC Cheung, Kurt Cutajar, Hanting Xie, Jas Kandola, Niall Twomey	(参考訳) 従来の時系列分類 (TSC) 法は、意思決定過程の本質的な解釈を妨げるブラックボックスであることが多い。本研究では、この問題を解決するためにMIL(Multiple Instance Learning)を活用し、MILLET: Multiple Instance Learning for Locally Explainable Time Series Classificationという新しいフレームワークを提案する。我々はMILLETを既存のディープラーニングTSCモデルに適用し、予測性能を損なうことなく(場合によっては改善しつつ)本質的に解釈可能になることを示す。 85 UCR TSCデータセット上でMILLETを評価し,解釈可能性評価を容易にするために特別に設計された新しい合成データセットを提案する。これらのデータセットにおいて,MILLETは,他のよく知られた解釈可能性手法よりも高品質でスパースな説明を高速に生成することを示した。私たちの知る限り、GitHubで入手可能なMILLET(https://github.com/JAEarly/MILTimeSeriesClassification)は、TSCのための汎用的なMIL手法を開発し、それらを多様なドメインに適用した最初の研究である。 Conventional Time Series Classification (TSC) methods are often black boxes that obscure inherent interpretation of their decision-making processes. In this work, we leverage Multiple Instance Learning (MIL) to overcome this issue, and propose a new framework called MILLET: Multiple Instance Learning for Locally Explainable Time series classification. We apply MILLET to existing deep learning TSC models and show how they become inherently interpretable without compromising (and in some cases, even improving) predictive performance. We evaluate MILLET on 85 UCR TSC datasets and also present a novel synthetic dataset that is specially designed to facilitate interpretability evaluation. On these datasets, we show MILLET produces sparse explanations quickly that are of higher quality than other well-known interpretability methods. To the best of our knowledge, our work with MILLET, which is available on GitHub (https://github.com/JAEarly/MILTimeSeriesClassification), is the first to develop general MIL methods for TSC and apply them to an extensive variety of domains.	翻訳日:2023-11-28 02:13:28 公開日:2023-11-23
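The MIL framing above treats a time series as a bag of timestep instances whose individual scores are pooled into one prediction. The sketch below assumes plain mean pooling of hypothetical per-timestep class scores; MILLET's actual pooling functions may differ. Because the pooled score is an average, each timestep's share of the predicted class can be read off directly as an explanation.

```python
from typing import List, Tuple


def mil_predict(instance_scores: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Mean-pool per-timestep class scores into a bag (whole-series) score.

    instance_scores[t][c] is a hypothetical score for class c at timestep t.
    Returns (bag scores, per-timestep contributions to the predicted class).
    """
    n = len(instance_scores)
    n_classes = len(instance_scores[0])
    bag = [sum(step[c] for step in instance_scores) / n for c in range(n_classes)]
    predicted = max(range(n_classes), key=lambda c: bag[c])
    # With mean pooling, the bag score is exactly the sum of these shares,
    # so each timestep's contribution doubles as a local explanation.
    contributions = [step[predicted] / n for step in instance_scores]
    return bag, contributions
```

For example, with scores `[[0.0, 0.1], [0.0, 0.9], [0.2, 0.0], [0.0, 0.2]]` the bag predicts class 1, and the second timestep carries the largest share of that prediction.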
# 超解像の再定義：古典的シミュレーションを伴わない細粒度メッシュPDE予測 Redefining Super-Resolution: Fine-mesh PDE predictions without classical simulations ( http://arxiv.org/abs/2311.09740v2 ) ライセンス: Link先を確認	Rajat Kumar Sarkar, Ritam Majumdar, Vishal Jadhav, Sagar Srinivas Sakhinana, Venkataramana Runkana	(参考訳) 計算流体力学(CFD)では、粗いメッシュシミュレーションは計算効率を提供するが、精度は低いことが多い。これらのシミュレーションに従来の超解像を適用することは、高解像度画像のダウンサンプリングと、低解像度の物理を忠実にエミュレートすることとの根本的な違いのために大きな課題となる。前者の手法は、現実のシナリオの通常の制約を超えて、基礎となる物理をより多く保存してしまう。 PDEに基づく問題に適した超解像の新たな定義を提案する。高解像度データセットから単純にダウンサンプリングする代わりに、粗いグリッドのシミュレーションデータを入力として使用し、細かいグリッドのシミュレーション結果を予測する。物理情報を組み込んだUNetアップスケーリング法を用いて,バーガース方程式における不連続検出,メタン燃焼,産業用熱交換器のファウリングなど,様々な2次元CFD問題に対して有効性を示す。提案手法は,従来のシミュレーションを経ずに細かいメッシュの解を生成でき,大幅な計算量の削減と元の正解結果に対する忠実性を確保する。トレーニング中の境界条件の多様さにより,本手法の堅牢性をさらに確立し,工学および科学的CFDソルバーにおける幅広い応用の道を開く。 In Computational Fluid Dynamics (CFD), coarse mesh simulations offer computational efficiency but often lack precision. Applying conventional super-resolution to these simulations poses a significant challenge due to the fundamental contrast between downsampling high-resolution images and authentically emulating low-resolution physics. The former method conserves more of the underlying physics, surpassing the usual constraints of real-world scenarios. We propose a novel definition of super-resolution tailored for PDE-based problems. Instead of simply downsampling from a high-resolution dataset, we use coarse-grid simulated data as our input and predict fine-grid simulated outcomes. Employing a physics-infused UNet upscaling method, we demonstrate its efficacy across various 2D-CFD problems such as discontinuity detection in Burgers' equation, methane combustion, and fouling in industrial heat exchangers. Our method enables the generation of fine-mesh solutions bypassing traditional simulation, ensuring considerable computational saving and fidelity to the original ground truth outcomes.
Through diverse boundary conditions during training, we further establish the robustness of our method, paving the way for its broad applications in engineering and scientific CFD solvers.	翻訳日:2023-11-28 02:13:09 公開日:2023-11-23
# 事前学習済みコードモデルに対する敵対的攻撃に関する広範な研究 An Extensive Study on Adversarial Attack against Pre-trained Models of Code ( http://arxiv.org/abs/2311.07553v2 ) ライセンス: Link先を確認	Xiaohu Du, Ming Wen, Zichao Wei, Shangwen Wang, Hai Jin	(参考訳) Transformerベースのコード事前学習モデル(PTMC)は広く利用され、多くのミッションクリティカルなアプリケーションで最先端の性能を達成している。しかし、識別子置換やコーディングスタイル変換による敵対的攻撃に対して脆弱であり、精度を著しく低下させ、さらにセキュリティ上の懸念を生じさせる可能性がある。 PTMCの敵対的事例を生成するためのいくつかの手法が提案されているが、このような手法の有効性と効率性は、特に異なるコードインテリジェンスタスクにおいてよく理解されていない。このギャップを埋めるために,本研究では,5つの最先端の敵対的攻撃手法を,有効性,効率,生成された事例の品質という3つの観点から体系的に分析した。結果は、5つの手法のいずれもこれらの観点のバランスが取れていないことを示している。特に攻撃成功率の高い手法は時間を要する傾向があり、生成される敵対的コードは自然さに欠けることが多く、その逆もまた然りである。この制限に対処するために、異なるコンテキスト下で識別子を摂動させることの影響を調べ、for文およびif文内の識別子置換が最も効果的であることを見出した。これらの知見に基づき,様々なタスクに対して異なる種類の文を優先し,さらにビーム探索を用いて敵対的事例を生成する新しい手法を提案する。評価の結果、提案手法は生成される敵対的事例の自然さを保ちつつ、有効性と効率の両面で最先端手法ALERTを上回ることが示された。 Transformer-based pre-trained models of code (PTMC) have been widely utilized and have achieved state-of-the-art performance in many mission-critical applications. However, they can be vulnerable to adversarial attacks through identifier substitution or coding style transformation, which can significantly degrade accuracy and may further incur security concerns. Although several approaches have been proposed to generate adversarial examples for PTMC, the effectiveness and efficiency of such approaches, especially on different code intelligence tasks, have not been well understood. To bridge this gap, this study systematically analyzes five state-of-the-art adversarial attack approaches from three perspectives: effectiveness, efficiency, and the quality of generated examples. The results show that none of the five approaches balances all these perspectives. Particularly, approaches with a high attack success rate tend to be time-consuming; the adversarial code they generate often lacks naturalness, and vice versa.
To address this limitation, we explore the impact of perturbing identifiers under different contexts and find that identifier substitution within for and if statements is the most effective. Based on these findings, we propose a new approach that prioritizes different types of statements for various tasks and further utilizes beam search to generate adversarial examples. Evaluation results show that it outperforms the state-of-the-art ALERT in terms of both effectiveness and efficiency while preserving the naturalness of the generated adversarial examples.	翻訳日:2023-11-28 02:11:10 公開日:2023-11-23
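The finding that identifiers inside `for` and `if` statements are the most effective substitution targets suggests a simple candidate-ranking step. The sketch below is a hypothetical heuristic built on Python's standard `ast` module: it only ranks variable names so that those appearing in loop headers or `if` conditions come first. It does not perform the substitution or the beam search, and it is not the authors' implementation.

```python
import ast
from typing import List


def prioritized_identifiers(source: str) -> List[str]:
    """Rank variable names so that those used in for/if headers come first.

    Hypothetical heuristic: only ranks candidates; substitution and beam
    search over replacements are out of scope for this sketch.
    """
    tree = ast.parse(source)
    in_control: set = set()
    seen: list = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.For, ast.If)):
            # Only the loop target/iterator or the if condition count, not the body.
            parts = [node.target, node.iter] if isinstance(node, ast.For) else [node.test]
            for part in parts:
                for sub in ast.walk(part):
                    if isinstance(sub, ast.Name):
                        in_control.add(sub.id)
        if isinstance(node, ast.Name) and node.id not in seen:
            seen.append(node.id)
    # Stable sort: False (used in a for/if header) sorts before True.
    return sorted(seen, key=lambda name: name not in in_control)
```

For a function that loops `for item in data:` and tests `if item > limit:`, the names `item`, `data` and `limit` are ranked ahead of an accumulator such as `total`. Function parameters are `ast.arg` nodes rather than `ast.Name`, so they only count where they are used.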
# Mobile-Seed: 移動ロボットのための意味的セグメンテーションと境界検出の統合 Mobile-Seed: Joint Semantic Segmentation and Boundary Detection for Mobile Robots ( http://arxiv.org/abs/2311.12651v2 ) ライセンス: Link先を確認	Youqi Liao, Shuhao Kang, Jianping Li, Yang Liu, Yun Liu, Zhen Dong, Bisheng Yang, Xieyuanli Chen	(参考訳) 鋭い境界と頑健なセマンティクスを高精度かつ迅速に描出することは、ロボットの把持と操作、リアルタイムセマンティックマッピング、エッジコンピューティングユニットで実行されるオンラインセンサーキャリブレーションなど、多くの下流ロボットタスクに不可欠である。境界検出とセマンティックセグメンテーションは相補的なタスクであるが、ほとんどの研究はセマンティックセグメンテーションの軽量モデルに焦点を当てており、境界検出の重要な役割を見落としている。本研究では,セマンティックセグメンテーションと境界検出を同時に行うために設計された軽量なデュアルタスクフレームワークであるMobile-Seedを紹介する。我々のフレームワークは、2ストリームエンコーダ、アクティブフュージョンデコーダ(AFD)、デュアルタスク正則化アプローチを備えている。エンコーダは2つの経路に分けられる: 1つはカテゴリを考慮したセマンティック情報を捉え、もう1つはマルチスケールの特徴から境界を識別する。 AFDモジュールは、チャネル間の関係を学習することで意味情報と境界情報の融合を動的に適応させ、各チャネルへの正確な重み付けを可能にする。さらに,デュアルタスク学習と深層多様性監督(deep diversity supervision)における矛盾を軽減するために,正則化損失を導入する。既存の手法と比較して,提案するMobile-Seedはセマンティックセグメンテーション性能を改善すると同時にオブジェクト境界を正確に特定する軽量なフレームワークを提供する。 Cityscapesデータセットの実験によると、Mobile-Seedは、RTX 2080 Ti GPU上で1024x2048の解像度の入力に対して23.9フレーム/秒(FPS)のオンライン推論速度を維持しながら、最先端(SOTA)ベースラインに対してmIoUで2.2ポイント(pp)、mFスコアで4.2ppの顕著な改善を達成している。 CamVidおよびPASCAL Contextデータセットに関する追加実験により、本手法の汎化性が確認された。コードと追加結果は https://whu-usi3dv.github.io/Mobile-Seed/ で公開されている。 Precise and rapid delineation of sharp boundaries and robust semantics is essential for numerous downstream robotic tasks, such as robot grasping and manipulation, real-time semantic mapping, and online sensor calibration performed on edge computing units. Although boundary detection and semantic segmentation are complementary tasks, most studies focus on lightweight models for semantic segmentation but overlook the critical role of boundary detection. In this work, we introduce Mobile-Seed, a lightweight, dual-task framework tailored for simultaneous semantic segmentation and boundary detection. Our framework features a two-stream encoder, an active fusion decoder (AFD) and a dual-task regularization approach.
The encoder is divided into two pathways: one captures category-aware semantic information, while the other discerns boundaries from multi-scale features. The AFD module dynamically adapts the fusion of semantic and boundary information by learning channel-wise relationships, allowing for precise weight assignment of each channel. Furthermore, we introduce a regularization loss to mitigate the conflicts in dual-task learning and deep diversity supervision. Compared to existing methods, the proposed Mobile-Seed offers a lightweight framework to simultaneously improve semantic segmentation performance and accurately locate object boundaries. Experiments on the Cityscapes dataset have shown that Mobile-Seed achieves notable improvement over the state-of-the-art (SOTA) baseline by 2.2 percentage points (pp) in mIoU and 4.2 pp in mF-score, while maintaining an online inference speed of 23.9 frames-per-second (FPS) with 1024x2048 resolution input on an RTX 2080 Ti GPU. Additional experiments on CamVid and PASCAL Context datasets confirm our method's generalizability. Code and additional results are publicly available at https://whu-usi3dv.github.io/Mobile-Seed/.	翻訳日:2023-11-28 02:03:23 公開日:2023-11-23
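mIoU, the segmentation metric quoted above, averages per-class intersection-over-union. A minimal per-image sketch on flattened label maps follows. Assumption: classes absent from both maps are skipped. Benchmarks such as Cityscapes typically accumulate intersections and unions over the whole evaluation set and ignore a void label, which this sketch omits.

```python
from typing import List


def mean_iou(pred: List[int], target: List[int], num_classes: int) -> float:
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)
```

For `pred = [0, 0, 1, 1]` and `target = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.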
# DISPLACE Challenge 2023の概要 -- 会話環境における話者と言語のダイアリゼーション Summary of the DISPLACE Challenge 2023 -- DIarization of SPeaker and LAnguage in Conversational Environments ( http://arxiv.org/abs/2311.12564v2 ) ライセンス: Link先を確認	Shikha Baghel, Shreyas Ramoji, Somil Jain, Pratik Roy Chowdhuri, Prachi Singh, Deepu Vijayasenan, Sriram Ganapathy	(参考訳) 複数の言語が小さな地理的近傍で話される多言語社会では、非公式な会話はしばしば言語が混在する。既存の音声技術は、音声データが複数の言語や話者を含み多様性に富んでいるような会話から情報を抽出するには非効率である可能性がある。 DISPLACE (DIarization of SPeaker and LAnguage in Conversational Environments) チャレンジは、この困難な条件下で話者と言語のダイアリゼーション技術を評価・ベンチマークするための公募である。このチャレンジは2つのトラックからなり、トラック1は多言語環境での話者ダイアリゼーション(SD)に焦点を当て、トラック2は多話者シナリオでの言語ダイアリゼーション(LD)を扱った。両トラックは同じ音声データを用いて評価された。この評価を容易にするために,多言語・多話者の会話遠距離音声を含む実世界のデータセットを収録・配布した。さらに、SDタスクとLDタスクの両方で、これらのタスクの最先端技術を模したベースラインシステムが提供された。このチャレンジは世界中から合計42件の登録を集め、トラック1とトラック2を合わせて合計19件の投稿を受け付けた。本稿では,課題,データセット,タスク,ベースラインシステムの詳細について述べる。さらに,両トラックで提出されたシステムの概要を簡潔に示し,上位のシステムに重点を置く。また,SDタスクとLDタスクに対する洞察と今後の展望を述べるとともに,このような会話に広く商用展開する前にシステムが克服すべき重要な課題に焦点をあてる。 In multi-lingual societies, where multiple languages are spoken in a small geographic vicinity, informal conversations often involve a mix of languages. Existing speech technologies may be inefficient in extracting information from such conversations, where the speech data is rich in diversity with multiple languages and speakers. The DISPLACE (DIarization of SPeaker and LAnguage in Conversational Environments) challenge constitutes an open call for evaluating and benchmarking the speaker and language diarization technologies on this challenging condition. The challenge entailed two tracks: Track-1 focused on speaker diarization (SD) in multilingual situations, while Track-2 addressed the language diarization (LD) in a multi-speaker scenario. Both the tracks were evaluated using the same underlying audio data. To facilitate this evaluation, a real-world dataset featuring multilingual, multi-speaker conversational far-field speech was recorded and distributed.
Furthermore, a baseline system mimicking the state of the art was made available for both the SD and LD tasks. The challenge garnered a total of 42 world-wide registrations and received a total of 19 combined submissions for Track-1 and Track-2. This paper describes the challenge, details of the datasets, tasks, and the baseline system. Additionally, the paper provides a concise overview of the submitted systems in both tracks, with an emphasis given to the top performing systems. The paper also presents insights and future perspectives for SD and LD tasks, focusing on the key challenges that the systems need to overcome before widespread commercial deployment on such conversations.	翻訳日:2023-11-28 02:02:26 公開日:2023-11-23
# HoVer-UNet:知識蒸留によるUNetをベースとした多クラス核セグメンテーションによるHoVerNetの高速化 "HoVer-UNet": Accelerating HoVerNet with UNet-based multi-class nuclei segmentation via knowledge distillation ( http://arxiv.org/abs/2311.12553v2 ) ライセンス: Link先を確認	Cristian Tommasino, Cristiano Russo, Antonio Maria Rinaldi, Francesco Ciompi	(参考訳) 本稿では,病理組織学における核のインスタンスセグメンテーションと分類のためのマルチブランチHoVerNetフレームワークの知識を蒸留する手法として,HoVer-UNetを提案する。我々は,Mix Vision Transformerのバックボーンを備えたコンパクトで合理化された単一UNetネットワークを提案し,HoVerNetの蒸留知識を最適に符号化するカスタム損失関数を備えることで,性能を損なうことなく計算要求を減らした。提案モデルは,公開PanNukeデータセットとConsepデータセットでHoVerNetに匹敵する結果を達成し,推論時間を3分の1に短縮したことを示す。モデルのコードは https://github.com/DIAGNijmegen/HoVer-UNet で公開している。 We present "HoVer-UNet", an approach to distill the knowledge of the multi-branch HoVerNet framework for nuclei instance segmentation and classification in histopathology. We propose a compact, streamlined single UNet network with a Mix Vision Transformer backbone, and equip it with a custom loss function to optimally encode the distilled knowledge of HoVerNet, reducing computational requirements without compromising performance. We show that our model achieved results comparable to HoVerNet on the public PanNuke and Consep datasets with a three-fold reduction in inference time. We make the code of our model publicly available at https://github.com/DIAGNijmegen/HoVer-UNet.	翻訳日:2023-11-28 02:01:56 公開日:2023-11-23
# PF-LRM: 姿勢と形状の同時予測のためのポーズフリー大規模再構成モデル PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction ( http://arxiv.org/abs/2311.12024v2 ) ライセンス: Link先を確認	Peng Wang, Hao Tan, Sai Bi, Yinghao Xu, Fujun Luan, Kalyan Sunkavalli, Wenping Wang, Zexiang Xu, Kai Zhang	(参考訳) 視覚的な重なりが少ない少数のポーズなし画像から3Dオブジェクトを再構成すると同時に、単一のA100 GPU上で約1.3秒で相対カメラポーズを推定するポーズフリー大規模再構成モデル(PF-LRM)を提案する。 PF-LRMは3Dオブジェクトトークンと2D画像トークン間の情報を交換するために自己注意ブロックを利用する高度にスケーラブルな手法であり、各ビューで粗い点群を予測し、微分可能なPerspective-n-Point (PnP) ソルバを用いてカメラポーズを得る。 PF-LRMは, 約1Mオブジェクトの膨大なマルチビューのポーズ付きデータで訓練すると, 強力なクロスデータセット汎化能力を示し, 様々な未知の評価データセットにおいて, ポーズ予測精度と3次元再構成品質の点でベースライン手法を大きなマージンで上回る。また,高速フィードフォワード推論による下流のテキスト/画像から3Dへのタスクにおけるモデルの適用性を示す。プロジェクトのWebサイトは https://totoro97.github.io/pf-lrm である。 We propose a Pose-Free Large Reconstruction Model (PF-LRM) for reconstructing a 3D object from a few unposed images even with little visual overlap, while simultaneously estimating the relative camera poses in ~1.3 seconds on a single A100 GPU. PF-LRM is a highly scalable method utilizing the self-attention blocks to exchange information between 3D object tokens and 2D image tokens; we predict a coarse point cloud for each view, and then use a differentiable Perspective-n-Point (PnP) solver to obtain camera poses. When trained on a huge amount of multi-view posed data of ~1M objects, PF-LRM shows strong cross-dataset generalization ability, and outperforms baseline methods by a large margin in terms of pose prediction accuracy and 3D reconstruction quality on various unseen evaluation datasets. We also demonstrate our model's applicability in downstream text/image-to-3D tasks with fast feed-forward inference. Our project website is at: https://totoro97.github.io/pf-lrm .	翻訳日:2023-11-28 02:00:44 公開日:2023-11-23
# ダイヤモンドスズ空孔中心の電荷状態と光学遷移周波数のヘラルド付き初期化 Heralded initialization of charge state and optical transition frequency of diamond tin-vacancy centers ( http://arxiv.org/abs/2311.11962v3 ) ライセンス: Link先を確認	Julia M. Brevoord, Lorenzo De Santis, Takashi Yamamoto, Matteo Pasini, Nina Codreanu, Tim Turan, Hans K. C. Beukers, Christopher Waas, Ronald Hanson	(参考訳) ダイヤモンドのスズ空孔中心は、量子情報科学と技術のための有望なプラットフォームとして登場した。より複雑な量子実験やスケーラブルな応用で使用する上で重要な課題は、中心を所望の電荷状態に、かつ光学遷移があらかじめ定めた周波数にある状態に準備する能力である。本稿では,レーザー励起,光子検出,リアルタイム論理を組み合わせて,このような準備の成功をヘラルド(予告)することを報告する。まず、最適化された共鳴プローブパルス中に収集した蛍光光子数が、その後の電荷状態および光学遷移周波数と強く相関することを示し、閾値による光子計数によって所望の状態をリアルタイムにヘラルドできることを示した。次に,このヘラルド手法を実装し、光ルミネッセンス励起測定,コヒーレント光駆動,光学ラムゼイ実験に適用したところ,閾値の上昇に伴って光学コヒーレンスが大きく改善することが分かった。最後に、準備された光学周波数が不均一線幅にわたってプローブレーザーに追従し、複数の均一線幅にわたる遷移周波数のチューニングが可能であることを実証する。 Diamond Tin-Vacancy centers have emerged as a promising platform for quantum information science and technology. A key challenge for their use in more complex quantum experiments and scalable applications is the ability to prepare the center in the desired charge state with the optical transition at a pre-defined frequency. Here we report on heralding such successful preparation using a combination of laser excitation, photon detection, and real-time logic. We first show that fluorescence photon counts collected during an optimized resonant probe pulse strongly correlate with the subsequent charge state and optical transition frequency, enabling real-time heralding of the desired state through threshold photon counting. We then implement and apply this heralding technique to photoluminescence excitation measurements, coherent optical driving, and an optical Ramsey experiment, finding strongly improved optical coherence with increasing threshold. Finally, we demonstrate that the prepared optical frequency follows the probe laser across the inhomogeneous linewidth, enabling tuning of the transition frequency over multiple homogeneous linewidths.	翻訳日:2023-11-28 02:00:09 公開日:2023-11-23
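The heralding scheme above amounts to a repeat-until-success loop gated by a photon-count threshold. In the sketch below, `probe` is a hypothetical callback that performs one preparation attempt followed by the probe pulse and returns the detected photon count. The real-time logic in the experiment runs in dedicated hardware, so this shows only the control flow, not the authors' implementation.

```python
from typing import Callable, Optional


def herald_preparation(
    probe: Callable[[], int], threshold: int, max_attempts: int = 1000
) -> Optional[int]:
    """Repeat prepare-and-probe until the photon count reaches the threshold.

    `probe` is a hypothetical callback for one preparation attempt plus the
    resonant probe pulse; it returns the detected photon count.
    Returns the number of attempts used, or None if the limit was reached.
    """
    for attempt in range(1, max_attempts + 1):
        if probe() >= threshold:
            return attempt  # heralded success: proceed with the experiment
    return None
```

Raising `threshold` makes the heralded state purer at the cost of more attempts, which mirrors the trade-off implied by the coherence improvement with increasing threshold reported above.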
# MUVO: 幾何表現を用いた自律運転のためのマルチモーダル生成世界モデル MUVO: A Multimodal Generative World Model for Autonomous Driving with Geometric Representations ( http://arxiv.org/abs/2311.11762v2 ) ライセンス: Link先を確認	Daniel Bogdoll, Yitian Yang, J. Marius Zöllner	(参考訳) 自律運転のための教師なしの世界モデルを学習することは、今日のシステムの推論能力を劇的に向上させる可能性がある。しかし、ほとんどの研究は世界の物理的特性を無視し、センサーデータのみに焦点を当てている。本稿では,この課題に対処するため,幾何学的ボクセル表現を持つマルチモーダル世界モデルであるMUVOを提案する。生のカメラとライダーデータを用いて,計画などの下流タスクで直接利用できる、センサに依存しない世界の幾何学的表現を学習する。マルチモーダルな将来予測を実証し,この幾何表現により,カメラ画像とライダー点群の両方の予測品質が向上することを示す。 Learning unsupervised world models for autonomous driving has the potential to improve the reasoning capabilities of today's systems dramatically. However, most work neglects the physical attributes of the world and focuses on sensor data alone. We propose MUVO, a MUltimodal World Model with Geometric VOxel Representations to address this challenge. We utilize raw camera and lidar data to learn a sensor-agnostic geometric representation of the world, which can directly be used by downstream tasks, such as planning. We demonstrate multimodal future predictions and show that our geometric representation improves the prediction quality of both camera images and lidar point clouds.	翻訳日:2023-11-28 01:59:34 公開日:2023-11-23
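A geometric voxel representation starts from mapping 3D points (for example, a lidar sweep) into discrete grid cells. This is a generic occupancy-voxel sketch, assuming a uniform cubic voxel size. MUVO's own representation is learned and predicted by the model and is not specified in this abstract.

```python
from typing import Iterable, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Point], voxel_size: float) -> Set[Voxel]:
    """Map 3D points (e.g. from a lidar sweep) to the set of occupied voxel indices."""
    occupied = set()
    for x, y, z in points:
        # Floor division assigns each point to the cell that contains it,
        # so negative coordinates land in negative indices (-0.1 -> -1).
        occupied.add((int(x // voxel_size), int(y // voxel_size), int(z // voxel_size)))
    return occupied
```

With a voxel size of 0.5, the points `(0.1, 0.2, 0.3)` and `(0.4, 0.4, 0.4)` share voxel `(0, 0, 0)`, while `(1.2, -0.1, 0.0)` falls into `(2, -1, 0)`.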
# 抽出型対話要約のためのLLM支援半教師あり学習 LLM aided semi-supervision for Extractive Dialog Summarization ( http://arxiv.org/abs/2311.11462v2 ) ライセンス: Link先を確認	Nishant Mishra, Gaurav Sahu, Iacer Calixto, Ameen Abu-Hanna, Issam H. Laradji	(参考訳) チャット対話の高品質な要約を生成するには、しばしば大規模なラベル付きデータセットが必要になる。本研究では,ラベルなしデータを効率的に利用して顧客とエージェントの対話の抽出型要約を行う手法を提案する。本手法では,要約を質問応答問題として定式化し,最先端の大規模言語モデル(LLM)を用いて対話の擬似ラベルを生成する。次に、これらの擬似ラベルを用いてチャット要約モデルを微調整し、大規模LLMからの知識をより小さな専門モデルに効果的に転移する。TweetSummデータセットで本手法を実証し,元のラベル付きデータセットの10%を使って65.9/57.0/61.0 ROUGE-1/-2/Lを達成できる一方,トレーニングデータセット全体で訓練された現在の最先端手法は65.16/55.81/64.37 ROUGE-1/-2/Lを得ることを示す。言い換えれば、最悪の場合(ROUGE-L)でも、データの10%しか使用せずに性能の94.7%を維持している。 Generating high-quality summaries for chat dialogs often requires large labeled datasets. We propose a method to efficiently use unlabeled data for extractive summarization of customer-agent dialogs. In our method, we frame summarization as a question-answering problem and use state-of-the-art large language models (LLMs) to generate pseudo-labels for a dialog. We then use these pseudo-labels to fine-tune a chat summarization model, effectively transferring knowledge from the large LLM into a smaller specialized model. We demonstrate our method on the TweetSumm dataset, and show that using 10% of the original labelled data set we can achieve 65.9/57.0/61.0 ROUGE-1/-2/-L, whereas the current state-of-the-art trained on the entire training data set obtains 65.16/55.81/64.37 ROUGE-1/-2/-L. In other words, in the worst case (i.e., ROUGE-L) we still effectively retain 94.7% of the performance while using only 10% of the data.	翻訳日:2023-11-28 01:59:23 公開日:2023-11-23
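The "94.7% in the worst case" figure follows from dividing each score by the corresponding full-data score. The worked arithmetic below uses only the numbers quoted in the abstract.

```python
# ROUGE scores quoted above: (10% labels + LLM pseudo-labels, full-data SOTA).
scores = {
    "ROUGE-1": (65.9, 65.16),
    "ROUGE-2": (57.0, 55.81),
    "ROUGE-L": (61.0, 64.37),
}

# Fraction of the full-data score retained for each metric.
retained = {name: ours / full for name, (ours, full) in scores.items()}
worst = min(retained, key=retained.get)

for name, r in retained.items():
    print(f"{name}: {r:.2%}")
print("worst case:", worst)
```

ROUGE-1 and ROUGE-2 actually exceed the full-data baseline, and ROUGE-L is the worst case at about 94.76% (61.0 / 64.37), consistent with the roughly 94.7% stated in the abstract.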
# BrainZ-BP:脳バイオインピーダンスと心電図を利用した非侵襲的カフレス血圧推定法 BrainZ-BP: A Non-invasive Cuff-less Blood Pressure Estimation Approach Leveraging Brain Bio-impedance and Electrocardiogram ( http://arxiv.org/abs/2311.10996v2 ) ライセンス: Link先を確認	Bufang Yang, Le Liu, Wenxuan Wu, Mengliang Zhou, Hongxing Liu, Xinbao Ning	(参考訳) 心血管疾患の早期予防には,正確かつ連続的な血圧(BP)モニタリングが不可欠である。近年,非侵襲的かつカフレスなBP推定アルゴリズムが注目されている。これまでの研究では、脳バイオインピーダンス(BIOZ)が非侵襲的頭蓋内圧(ICP)モニタリングの有望な技術であることが示されている。臨床的には、外傷性脳損傷(TBI)患者の治療には、ICPとBPを同時に監視する必要がある。脳BIOZから直接BPを推定すれば、患者に装着するセンサーの数を減らし、快適さを向上させることができる。そこで本研究では,脳BIOZをBP推定に利用する実現可能性を検討し,新しいカフレスBP推定手法であるBrainZ-BPを提案する。脳BIOZ計測のため、頭部の前額部と後頭骨に前後方向に2つの電極を配置する。脈波伝達時間や脳BIOZの形態的特徴などの様々な特徴を抽出し, BP推定のための4つの回帰モデルに入力した。その結果, ランダムフォレスト回帰モデルの平均絶対誤差, 二乗平均平方根誤差, 相関係数は, 収縮期血圧推定で2.17 mmHg, 3.91 mmHg, 0.90, 拡張期血圧推定で1.71 mmHg, 3.02 mmHg, 0.89であった。提案するBrainZ-BPは、脳BIOZに基づくICPモニタリングのシナリオに適用でき、同時にBPを監視することができる。 Accurate and continuous blood pressure (BP) monitoring is essential to the early prevention of cardiovascular diseases. Non-invasive and cuff-less BP estimation algorithm has gained much attention in recent years. Previous studies have demonstrated that brain bio-impedance (BIOZ) is a promising technique for non-invasive intracranial pressure (ICP) monitoring. Clinically, treatment for patients with traumatic brain injuries (TBI) requires monitoring the ICP and BP of patients simultaneously. Estimating BP by brain BIOZ directly can reduce the number of sensors attached to the patients, thus improving their comfort. To address the issues, in this study, we explore the feasibility of leveraging brain BIOZ for BP estimation and propose a novel cuff-less BP estimation approach called BrainZ-BP. Two electrodes are placed on the forehead and occipital bone of the head in the anterior-posterior direction for brain BIOZ measurement. Various features including pulse transit time and morphological features of brain BIOZ are extracted and fed into four regression models for BP estimation.
Results show that the mean absolute error, root mean square error, and correlation coefficient of random forest regression model are 2.17 mmHg, 3.91 mmHg, and 0.90 for systolic pressure estimation, and are 1.71 mmHg, 3.02 mmHg, and 0.89 for diastolic pressure estimation. The presented BrainZ-BP can be applied in the brain BIOZ-based ICP monitoring scenario to monitor BP simultaneously.	翻訳日:2023-11-28 01:59:03 公開日:2023-11-23
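The three figures of merit reported above (mean absolute error, root mean square error and the correlation coefficient, here assumed to be Pearson's r) can be computed as follows. The blood-pressure values in the usage note are made up for illustration and are not data from the paper.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Mean absolute error, in the same unit as the inputs (e.g. mmHg).
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Root mean square error; penalizes large deviations more than MAE.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    # Pearson correlation coefficient between reference and estimated values.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

With hypothetical references `[120, 130, 110, 140]` and estimates `[118, 133, 111, 138]`, MAE is 2.0 mmHg, RMSE is about 2.12 mmHg and r is about 0.98.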
# サルポックス病検出のための深層学習技術の進歩に関する最近の調査 A Recent Survey of the Advancements in Deep Learning Techniques for Monkeypox Disease Detection ( http://arxiv.org/abs/2311.10754v2 ) ライセンス: Link先を確認	Saddam Hussain Khan, Rashid Iqbal, Saeeda Naz (Artificial Intelligence Lab, Department of Computer Systems Engineering, University of Engineering and Applied Science (UEAS), Swat, Pakistan)	(参考訳) サル痘(MPox)は、ポックスウイルス科オルソポックスウイルス属に属するMPoxウイルスによって引き起こされる人獣共通感染症であり、最初にアフリカで発見され、2022年半ばに流行地域外で症例が報告されたことで世界的な注目を集めた。症状には頭痛、悪寒、発熱、天然痘・麻疹・水痘に似た皮膚症状などがあり、WHOは2022年7月にMPoxを世界的な公衆衛生上のパンデミックとして正式に発表した。従来、WHOは皮膚病変のPCR検査を一次診断の基準とし、主な治療は対症療法で、重症例にはテコビリマットなどの抗ウイルス薬が用いられる。しかし、病院内での手動分析は、医療専門家の大きな負担、限られた施設、医師の確保と疲労、公衆衛生上の緊急事態時のヒューマンエラーなど、大きな課題を抱えている。そこで本調査論文では,皮膚病変画像からのMPox自動検出のための深層学習(DL)手法について広範かつ効率的な分析を行う。これらのDL手法は、深層CNN、深層CNNアンサンブル、深層ハイブリッド学習、新たに開発された手法、MPox診断のためのVision Transformerといったカテゴリに大別される。さらに, 本研究はDL手法の進化の過程を体系的に調査し, 従来の手法の限界を特定して対処するとともに, 価値ある貢献とイノベーションを強調する。加えて,本論文ではベンチマークデータセットと様々な信頼できる情報源からのその収集,前処理技術,評価指標についても扱う。また調査では新たな概念を簡潔に掘り下げ,研究のギャップ,限界,応用を特定し,診断プロセスにおける課題を概説する。本調査はDLの革新的なアイデアの有望な領域に関する貴重な洞察を提供し,研究者にとっての道しるべとなることが期待される。 Monkeypox (MPox) is a zoonotic infectious disease caused by the MPox virus, a member of the Orthopoxvirus genus of the Poxviridae family; it was initially discovered in Africa and gained global attention in mid-2022 with cases reported outside endemic areas. Symptoms include headaches, chills, fever, and smallpox-, measles-, and chickenpox-like skin manifestations, and the WHO officially announced MPox as a global public health pandemic in July 2022. Traditionally, PCR testing of skin lesions is considered the WHO benchmark for primary diagnosis, with symptom management as the primary treatment and antiviral drugs like tecovirimat for severe cases. However, manual analysis within hospitals poses substantial challenges, including the heavy burden on healthcare professionals, limited facilities, the availability and fatigue of doctors, and human error during public health emergencies.
Therefore, this survey paper provides an extensive and efficient analysis of deep learning (DL) methods for the automatic detection of MPox in skin lesion images. These DL techniques are broadly grouped into categories, including deep CNNs, deep CNN ensembles, deep hybrid learning, newly developed architectures, and vision transformers for diagnosing MPox. Moreover, this study offers a systematic exploration of the evolutionary progression of DL techniques and identifies and addresses limitations in previous methods while highlighting their valuable contributions and innovations. Additionally, the paper addresses benchmark datasets and their collection from various authentic sources, pre-processing techniques, and evaluation metrics. The survey also briefly delves into emerging concepts, identifies research gaps, limitations, and applications, and outlines challenges in the diagnosis process. This survey furnishes valuable insights into promising areas for innovative DL ideas and is anticipated to serve as a roadmap for researchers.	翻訳日:2023-11-28 01:58:19 公開日:2023-11-23
# プロンプト誘導マルチモーダルTransformerによる結晶材料の状態密度予測 Density of States Prediction of Crystalline Materials via Prompt-guided Multi-Modal Transformer ( http://arxiv.org/abs/2311.12856v2 ) ライセンス: Link先を確認	Namkyeong Lee, Heewoong Noh, Sungwon Kim, Dongmin Hyun, Gyoung S. Na, Chanyoung Park	(参考訳) 状態密度 (DOS) は結晶材料のスペクトル特性であり、物質の様々な特性に関する基本的な知見を提供する。従来の研究は主にDOS予測のための結晶材料の高品質な表現の獲得に焦点を当てていたが、我々はDOSの性質を反映して、得られた表現からDOSを予測することに重点を置く。DOSはエネルギーの関数として状態の全体的な分布を決定する。つまり、DOSは結晶材料だけでなくエネルギー準位によっても決定されるが、この点は従来の研究では無視されてきた。本稿では,マルチモーダルTransformerを用いて結晶材料とエネルギーから得られる異種情報を統合し,結晶材料中の原子と様々なエネルギー準位との複雑な関係をモデル化してDOSを予測することを提案する。さらに, プロンプトを利用して、結晶材料とエネルギーの間の結晶構造系固有の相互作用を学習するようモデルを導くことを提案する。 Phonon DOSとElectron DOSの2種類のDOSについて、様々な実世界のシナリオで行った大規模な実験は、DOSTransformerの優位性を実証している。 DOSTransformerのソースコードは https://github.com/HeewoongNoh/DOSTransformer で入手できる。 The density of states (DOS) is a spectral property of crystalline materials, which provides fundamental insights into various characteristics of the materials. While previous works mainly focus on obtaining high-quality representations of crystalline materials for DOS prediction, we focus on predicting the DOS from the obtained representations by reflecting the nature of DOS: DOS determines the general distribution of states as a function of energy. That is, DOS is not solely determined by the crystalline material but also by the energy levels, which has been neglected in previous works. In this paper, we propose to integrate heterogeneous information obtained from the crystalline materials and the energies via a multi-modal transformer, thereby modeling the complex relationships between the atoms in the crystalline materials and various energy levels for DOS prediction. Moreover, we propose to utilize prompts to guide the model to learn the crystal structural system-specific interactions between crystalline materials and energies. Extensive experiments on two types of DOS, i.e., Phonon DOS and Electron DOS, with various real-world scenarios demonstrate the superiority of DOSTransformer.
The source code for DOSTransformer is available at https://github.com/HeewoongNoh/DOSTransformer.	翻訳日:2023-11-28 01:46:13 公開日:2023-11-23
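Because the DOS is a function of energy, it can be evaluated on an energy grid. A common textbook way to obtain a smooth DOS from a set of discrete levels is to broaden each level with a normalized Gaussian. The sketch below assumes that convention (it is not how DOSTransformer computes or predicts the DOS) and shows that the curve integrates to the number of states.

```python
import math
from typing import List, Sequence


def gaussian_dos(levels: Sequence[float], energies: Sequence[float], sigma: float) -> List[float]:
    """DOS on an energy grid: each discrete level broadened by a unit-area Gaussian."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return [
        sum(norm * math.exp(-0.5 * ((e - level) / sigma) ** 2) for level in levels)
        for e in energies
    ]
```

On a grid from -5 to 5 with step 0.01 and `sigma = 0.1`, the levels `[-1.0, 0.0, 0.0, 1.5]` give a curve whose integral is about 4 (one per state), with a doubled peak at the degenerate level 0.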
# minimax: JAX における Autocurricula の効率的なベースライン minimax: Efficient Baselines for Autocurricula in JAX ( http://arxiv.org/abs/2311.12716v2 ) ライセンス: Link先を確認	Minqi Jiang, Michael Dennis, Edward Grefenstette, Tim Rocktäschel	(参考訳) 教師なし環境設計(UED)は、頑健な意思決定エージェントを訓練し、未知の環境へゼロショット転移させるための自動カリキュラム学習の一形態である。このようなオートカリキュラムはRLコミュニティから大きな関心を集めている。しかし、CPUロールアウトとGPUモデルの更新に基づくUED実験は、しばしば数週間のトレーニングを必要とした。この計算要求は、この分野の急速な革新の大きな障害である。本研究は、アクセラレータ上でのUEDトレーニングのためのminimaxライブラリを紹介する。 JAXを使って完全にテンソル化された環境とオートカリキュラムアルゴリズムを実装することで、minimaxはハードウェアアクセラレーションのためにトレーニングループ全体をコンパイルできる。迅速な実験のためのペトリ皿として、minimaxは手続き的に生成された環境でオートカリキュラムを行うための再利用可能な抽象化に加えて、MiniGridに基づくテンソル化グリッドワールドを含む。これらのコンポーネントにより、minimaxは新しい並列化版を含む強力なUEDベースラインを提供し、同じバッチサイズで訓練した場合に従来の実装と比べて実時間で120倍以上の高速化を達成する。 minimaxライブラリはApache 2.0ライセンスで https://github.com/facebookresearch/minimax から入手できる。 Unsupervised environment design (UED) is a form of automatic curriculum learning for training robust decision-making agents to zero-shot transfer into unseen environments. Such autocurricula have received much interest from the RL community. However, UED experiments, based on CPU rollouts and GPU model updates, have often required several weeks of training. This compute requirement is a major obstacle to rapid innovation for the field. This work introduces the minimax library for UED training on accelerated hardware. Using JAX to implement fully-tensorized environments and autocurriculum algorithms, minimax allows the entire training loop to be compiled for hardware acceleration. To provide a petri dish for rapid experimentation, minimax includes a tensorized grid-world based on MiniGrid, in addition to reusable abstractions for conducting autocurricula in procedurally-generated environments.
With these components, minimax provides strong UED baselines, including new parallelized variants, which achieve over 120$\times$ speedups in wall time compared to previous implementations when training with equal batch sizes. The minimax library is available under the Apache 2.0 license at https://github.com/facebookresearch/minimax.	翻訳日:2023-11-28 01:45:40 公開日:2023-11-23
# Transformer-based Named Entity Recognition in Construction Supply Chain Risk Management in Australia ( http://arxiv.org/abs/2311.13755v1 ) License: see link	Milad Baghalzadeh Shishehgarkhaneh, Robert C. Moehler, Yihai Fang, Amer A. Hijazi, Hamed Aboutorab	The construction industry in Australia is characterized by its intricate supply chains and vulnerability to myriad risks. As such, effective supply chain risk management (SCRM) becomes imperative. This paper employs different transformer models and trains them for Named Entity Recognition (NER) in the context of Australian construction SCRM. Using NER, the transformer models identify and classify specific risk-associated entities in news articles, offering detailed insight into supply chain vulnerabilities. By analysing news articles through different transformer models, we can extract relevant entities and insights related to specific risk taxonomies local (milieu) to the Australian construction landscape. This research emphasises the potential of NLP-driven solutions, like transformer models, in revolutionising SCRM for construction in geo-media specific contexts.	Translated: 2023-11-28 00:59:55 Published: 2023-11-23
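Transformer NER models of this kind typically emit one BIO tag per token. Those tags are then grouped into entity spans. The sketch below shows one minimal way to do BIO decoding. The tag names `RISK` and `PROJECT` and the example sentence are hypothetical, not the paper's risk taxonomy.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) spans.

    A "B-X" tag, or an "I-X" tag whose type differs from the open entity,
    starts a new entity; "O" closes any open entity.
    """
    spans, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:  # "O": outside any entity
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type:  # flush an entity that runs to the end of the sentence
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hypothetical tagger output for one news sentence
tokens = ["Steel", "price", "rise", "delays", "Sydney", "Metro", "project"]
tags = ["B-RISK", "I-RISK", "I-RISK", "O", "B-PROJECT", "I-PROJECT", "I-PROJECT"]
print(bio_to_spans(tokens, tags))
# [('RISK', 'Steel price rise'), ('PROJECT', 'Sydney Metro project')]
```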
# 3D-MIR: A Benchmark and Empirical Study on 3D Medical Image Retrieval in Radiology ( http://arxiv.org/abs/2311.13752v1 ) License: see link	Asma Ben Abacha, Alberto Santamaria-Pang, Ho Hin Lee, Jameson Merkow, Qin Cai, Surya Teja Devarakonda, Abdullah Islam, Julia Gong, Matthew P. Lungren, Thomas Lin, Noel C Codella, Ivan Tarapov	The increasing use of medical imaging in healthcare settings presents a significant challenge due to the increasing workload for radiologists, yet it also offers an opportunity for enhancing healthcare outcomes if effectively leveraged. 3D image retrieval holds potential to reduce radiologist workloads by enabling clinicians to efficiently search through diagnostically similar or otherwise relevant cases, resulting in faster and more precise diagnoses. However, the field of 3D medical image retrieval is still emerging, lacking established evaluation benchmarks, comprehensive datasets, and thorough studies. This paper attempts to bridge this gap by introducing a novel benchmark for 3D Medical Image Retrieval (3D-MIR) that encompasses four different anatomies imaged with computed tomography. Using this benchmark, we explore a diverse set of search strategies that use aggregated 2D slices, 3D volumes, and multi-modal embeddings from popular multi-modal foundation models as queries. Quantitative and qualitative assessments of each approach are provided alongside an in-depth discussion that offers insight for future research. 
To promote the advancement of this field, our benchmark, dataset, and code are made publicly available.	Translated: 2023-11-28 00:59:40 Published: 2023-11-23
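One strategy named above is querying with aggregated 2D slices. A minimal sketch of that strategy follows, under three assumptions:

- Slice embeddings are aggregated by mean pooling.
- Cases are ranked by cosine similarity.
- The case names and 3-dimensional vectors are made-up stand-ins for foundation-model features.

The benchmark's actual aggregation and scoring may differ.

```python
import math


def mean_pool(slice_embeddings):
    """Aggregate per-slice 2D embeddings into one volume-level vector by averaging."""
    n = len(slice_embeddings)
    dim = len(slice_embeddings[0])
    return [sum(e[d] for e in slice_embeddings) / n for d in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query_slices, database, top_k=2):
    """Rank database volumes (name -> list of slice embeddings) by cosine similarity."""
    q = mean_pool(query_slices)
    scored = [(cosine(q, mean_pool(slices)), name) for name, slices in database.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Hypothetical database: two slices per case, 3-dimensional embeddings
db = {
    "case_liver_a": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "case_lung_b": [[0.0, 0.9, 0.3], [0.1, 0.8, 0.4]],
    "case_liver_c": [[0.7, 0.3, 0.0], [0.9, 0.0, 0.1]],
}
query = [[0.85, 0.15, 0.05], [0.8, 0.1, 0.0]]
print(retrieve(query, db))  # ['case_liver_a', 'case_liver_c']
```

Mean pooling ignores slice order. A 3D-volume encoder, the other strategy listed, would instead see the whole scan at once.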
# Towards Transferable Multi-modal Perception Representation Learning for Autonomy: NeRF-Supervised Masked AutoEncoder ( http://arxiv.org/abs/2311.13750v1 ) License: see link	Xiaohao Xu	This work proposes a unified self-supervised pre-training framework for transferable multi-modal perception representation learning via masked multi-modal reconstruction in Neural Radiance Field (NeRF), namely NeRF-Supervised Masked AutoEncoder (NS-MAE). Specifically, conditioned on certain view directions and locations, multi-modal embeddings extracted from corrupted multi-modal input signals, i.e., Lidar point clouds and images, are rendered into projected multi-modal feature maps via neural rendering. Then, original multi-modal signals serve as reconstruction targets for the rendered multi-modal feature maps to enable self-supervised representation learning. Extensive experiments show that the representation learned via NS-MAE has promising transferability for diverse multi-modal and single-modal (camera-only and Lidar-only) perception models on diverse 3D perception downstream tasks (3D object detection and BEV map segmentation) with diverse amounts of fine-tuning labeled data. Moreover, we empirically find that NS-MAE benefits from the synergy between the masked autoencoder and neural radiance field mechanisms. Our code shall be released upon acceptance.	Translated: 2023-11-28 00:59:16 Published: 2023-11-23
# On Principles of Emergent Organization ( http://arxiv.org/abs/2311.13749v1 ) License: see link	Adam T. Rupe and James P. Crutchfield	After more than a century of concerted effort, physics still lacks basic principles of spontaneous self-organization. To appreciate why, we first state the problem, outline historical approaches, and survey the present state of the physics of self-organization. This frames the particular challenges arising from mathematical intractability and the resulting need for computational approaches, as well as those arising from a chronic failure to define structure. Then, an overview of two modern mathematical formulations of organization -- intrinsic computation and evolution operators -- lays out a way to overcome these challenges. Together, the vantage point they afford shows how to account for the emergence of structured states via a statistical mechanics of systems arbitrarily far from equilibrium. The result is a constructive path forward to principles of organization that builds on mathematical identification of structure.	Translated: 2023-11-28 00:58:46 Published: 2023-11-23
# Sample-Efficient Training for Diffusion ( http://arxiv.org/abs/2311.13745v1 ) License: see link	Shivam Gupta, Aditya Parulekar, Eric Price, Zhiyang Xun	Score-based diffusion models have become the most popular approach to deep generative modeling of images, largely due to their empirical performance and reliability. Recently, a number of theoretical works \citep{chen2022, Chen2022ImprovedAO, Chenetal23flowode, benton2023linear} have shown that diffusion models can efficiently sample, assuming $L^2$-accurate score estimates. The score-matching objective naturally approximates the true score in $L^2$, but the sample complexity of existing bounds depends *polynomially* on the data radius and desired Wasserstein accuracy. By contrast, the time complexity of sampling is only logarithmic in these parameters. We show that estimating the score in $L^2$ *requires* this polynomial dependence, but that a number of samples that scales polylogarithmically in the Wasserstein accuracy actually does suffice for sampling. We show that with a polylogarithmic number of samples, the ERM of the score-matching objective is $L^2$ accurate on all but a probability $\delta$ fraction of the true distribution, and that this weaker guarantee is sufficient for efficient sampling.	Translated: 2023-11-28 00:58:33 Published: 2023-11-23
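An "$L^2$-accurate score estimate" has error $\mathbb{E}_{x\sim p}[(\hat s(x) - \nabla_x \log p(x))^2]$, measured under the data distribution. The toy sketch below makes that quantity concrete. It assumes the data are a 1-D standard Gaussian, whose true score is $-x$. The estimate has the shape of a Gaussian score with mean `m` and variance `v`, which gives the error a closed form to check against. This illustrates only the error quantity, not the paper's sample-complexity analysis.

```python
import random


def true_score(x):
    # Score of N(0, 1): d/dx log p(x) = -x
    return -x


def gaussian_score(m, v):
    # Score of N(m, v), used here as a (possibly misspecified) estimate
    return lambda x: -(x - m) / v


def l2_score_error(score_hat, n=100_000, seed=0):
    """Monte Carlo estimate of E_{x ~ N(0,1)}[(score_hat(x) - true_score(x))^2]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        total += (score_hat(x) - true_score(x)) ** 2
    return total / n


def l2_score_error_exact(m, v):
    # Closed form for this Gaussian case: (1 - 1/v)^2 + (m / v)^2
    return (1 - 1 / v) ** 2 + (m / v) ** 2


print(l2_score_error_exact(0.3, 1.5))            # about 0.151
print(l2_score_error(gaussian_score(0.3, 1.5)))  # close to the closed form
```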
# Security and Privacy Challenges in Deep Learning Models ( http://arxiv.org/abs/2311.13744v1 ) License: see link	Gopichandh Golla	These days, deep learning models have achieved great success in multiple fields, from autonomous driving to medical diagnosis. These models have expanded the abilities of artificial intelligence by offering effective solutions to complex problems that were very difficult to solve earlier. Despite this success in various fields, research has shown that deep learning models can be subjected to various attacks that compromise the model security and data privacy of deep neural network models. Deep learning models can be subjected to various attacks at different stages of their lifecycle. During the testing phase, attackers can exploit vulnerabilities through different kinds of attacks, such as Model Extraction attacks, Model Inversion attacks, and Adversarial attacks. Model Extraction attacks aim to reverse-engineer a trained deep learning model, with the primary objective of revealing its architecture and parameters. 
Model Inversion attacks aim to compromise the privacy of the data used to train the deep learning model. They break the model's confidentiality by inferring sensitive training data from the model's predictions: by analyzing the model's responses, attackers aim to reconstruct sensitive information, thereby compromising the model's data privacy. Adversarial attacks, mainly employed against computer vision models, use malicious test inputs to mislead models into confidently making incorrect predictions. These attacks subtly alter the input data so that it looks normal but leads the deep learning model to incorrect decisions. Such attacks can happen during both the model's evaluation and training phases. Data Poisoning attacks add harmful data to the training set, disrupting the learning process and reducing the reliability of the deep learning model.	Translated: 2023-11-28 00:58:11 Published: 2023-11-23
# FinMe: A Performance-Enhanced Large Language Model Trading Agent with Layered Memory and Character Design ( http://arxiv.org/abs/2311.13743v1 ) License: see link	Yangyang Yu, Haohang Li, Zhi Chen, Yuechen Jiang, Yang Li, Denghui Zhang, Rong Liu, Jordan W. Suchow, Khaldoun Khashanah	Recent advancements in Large Language Models (LLMs) have exhibited notable efficacy in question-answering (QA) tasks across diverse domains. Their prowess in integrating extensive web knowledge has fueled interest in developing LLM autonomous agents. While LLMs are efficient in decoding human instructions and deriving solutions by holistically processing historical inputs, transitioning to purpose-driven agents requires a supplementary rational architecture to process multi-source information, establish reasoning chains, and prioritize critical tasks. 
Addressing this, we introduce FinMe, a novel LLM-based agent framework devised for financial decision-making, encompassing three core modules: Profiling, to outline the agent's characteristics; Memory, with layered processing, to aid the agent in assimilating realistic hierarchical financial data; and Decision-making, to convert insights gained from memories into investment decisions. Notably, FinMe's memory module aligns closely with the cognitive structure of human traders, offering robust interpretability and real-time tuning. Its adjustable cognitive span allows for the retention of critical information beyond human perceptual limits, thereby enhancing trading outcomes. This framework enables the agent to self-evolve its professional knowledge, react agilely to new investment cues, and continuously refine trading decisions in the volatile financial environment. We first compare FinMe with various algorithmic agents on a scalable real-world financial dataset, underscoring its leading trading performance in stocks and funds. We then fine-tune the agent's perceptual spans to achieve significant trading performance gains. Collectively, FinMe presents a cutting-edge LLM agent framework for automated trading, boosting cumulative investment returns.	Translated: 2023-11-28 00:57:46 Published: 2023-11-23
# OASIS: Offsetting Active Reconstruction Attacks in Federated Learning ( http://arxiv.org/abs/2311.13739v1 ) License: see link	Tre' R. Jeter, Truc Nguyen, Raed Alharbi, My T. Thai	Federated Learning (FL) has garnered significant attention for its potential to protect user privacy while enhancing model training efficiency. However, recent research has demonstrated that FL protocols can be easily compromised by active reconstruction attacks executed by dishonest servers. These attacks involve the malicious modification of global model parameters, allowing the server to obtain a verbatim copy of users' private data by inverting their gradient updates. Tackling this class of attack remains a crucial challenge due to the strong threat model. In this paper, we propose OASIS, a defense mechanism based on image augmentation that effectively counteracts active reconstruction attacks while preserving model performance. We first uncover the core principle of gradient inversion that enables these attacks and theoretically identify the main conditions under which the defense can be robust regardless of the attack strategies. We then construct OASIS with image augmentation, showing that it can undermine the attack principle. Comprehensive evaluations demonstrate the efficacy of OASIS, highlighting its feasibility as a solution.	Translated: 2023-11-28 00:57:09 Published: 2023-11-23
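A well-known instance of gradient inversion is easiest to see for a single input passing through a fully connected layer $y = Wx + b$. Since $\partial L/\partial W = (\partial L/\partial y)\,x^\top$ and $\partial L/\partial b = \partial L/\partial y$, any row with a nonzero bias gradient reveals $x$ exactly. The sketch below shows only this textbook single-sample case, using a squared-error loss with arbitrary weights and inputs. It does not implement OASIS's augmentation-based defense, and it is not the paper's formal statement of the principle.

```python
def linear_forward(W, b, x):
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i for row, b_i in zip(W, b)]


def gradients(W, b, x, target):
    """Gradients of L = 0.5 * ||W x + b - target||^2 for a single input x."""
    y = linear_forward(W, b, x)
    g = [y_i - t_i for y_i, t_i in zip(y, target)]   # dL/dy, which equals dL/db
    grad_W = [[g_i * x_j for x_j in x] for g_i in g]  # dL/dW = g x^T
    return grad_W, g


def invert(grad_W, grad_b, eps=1e-12):
    """Recover x from any row i with a nonzero bias gradient: x = grad_W[i] / grad_b[i]."""
    for row, g_i in zip(grad_W, grad_b):
        if abs(g_i) > eps:
            return [v / g_i for v in row]
    raise ValueError("all bias gradients are zero; x is not recoverable this way")


W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]
b = [0.1, -0.3]
x = [1.0, 2.0, -1.5]  # the "private" input
grad_W, grad_b = gradients(W, b, x, target=[0.0, 1.0])
print(invert(grad_W, grad_b))  # approximately [1.0, 2.0, -1.5], up to rounding
```

With batches, the shared gradients are summed over samples. Active attacks therefore manipulate parameters so that individual samples become separable again.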
# Enhancing Intrusion Detection In Internet Of Vehicles Through Federated Learning ( http://arxiv.org/abs/2311.13800v1 ) License: see link	Abhishek Sebastian, Pragna R, Sudhakaran G, Renjith P N and Leela Karthikeyan H	Federated learning is a decentralized machine learning technique that allows multiple parties to collaborate and learn a shared model without sharing their raw data. Our paper proposes a federated learning framework for intrusion detection in the Internet of Vehicles (IoV) using the CIC-IDS 2017 dataset. The proposed framework employs SMOTE for handling class imbalance, outlier detection for identifying and removing abnormal observations, and hyperparameter tuning to optimize the model's performance. The authors evaluated the proposed framework using various performance metrics and demonstrated its effectiveness in detecting intrusions, including on other datasets (KDD-Cup 99 and UNSW-NB15) and in comparison with conventional classifiers. Furthermore, the proposed framework can protect sensitive data while achieving high intrusion detection performance.	Translated: 2023-11-28 00:47:33 Published: 2023-11-23
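The SMOTE step mentioned above creates synthetic minority-class samples. Each one is an interpolation between a minority sample and one of its nearest minority neighbours. A minimal pure-Python sketch of that interpolation follows. The parameters `k` and `seed` and the toy points are assumptions. In practice a maintained implementation, such as the `SMOTE` class in imbalanced-learn, would normally be used.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point is x + lam * (nn - x), where x is a random minority
    sample, nn is one of its k nearest minority neighbours, and lam ~ U(0, 1).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority samples (Euclidean)
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: math.dist(x, p))
        nn = rng.choice(others[:k])
        lam = rng.random()
        synthetic.append([xi + lam * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(smote(minority, n_new=5, k=2))
```

Each synthetic point lies on a segment between two real minority points. This is why SMOTE adds variety without leaving the minority region.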
# Evidential Active Recognition: Intelligent and Prudent Open-World Embodied Perception ( http://arxiv.org/abs/2311.13793v1 ) License: see link	Lei Fan, Mingfu Liang, Yunxuan Li, Gang Hua and Ying Wu	Active recognition enables robots to intelligently explore novel observations, thereby acquiring more information while circumventing undesired viewing conditions. Recent approaches favor learning policies from simulated or collected data, wherein appropriate actions are more frequently selected when the recognition is accurate. However, most recognition modules are developed under the closed-world assumption, which makes them ill-equipped to handle unexpected inputs, such as the absence of the target object in the current observation. To address this issue, we propose treating active recognition as a sequential evidence-gathering process, providing step-by-step uncertainty quantification and reliable prediction under the evidence combination theory. Additionally, the reward function developed in this paper effectively characterizes the merit of actions when operating in open-world environments. To evaluate the performance, we collect a dataset from an indoor simulator, encompassing various recognition challenges such as distance, occlusion levels, and visibility. 
Through a series of experiments on recognition and robustness analysis, we demonstrate the necessity of introducing uncertainties to active recognition and the superior performance of the proposed method.	Translated: 2023-11-28 00:47:20 Published: 2023-11-23
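The abstract does not spell out its combination rule. One common choice in evidential deep learning is the Dirichlet (subjective-logic) formulation, with belief $b_k = e_k/S$ and uncertainty $u = K/S$, where $S = \sum_k (e_k + 1)$. Assuming that formulation, evidence from two views could be fused with a reduced Dempster's rule, as sketched below. This illustrates the general idea, not the paper's exact method, and the evidence values are made up.

```python
def opinion_from_evidence(evidence):
    """Map non-negative class evidence to (beliefs, uncertainty) via a Dirichlet.

    alpha_k = e_k + 1, S = sum(alpha), b_k = e_k / S, u = K / S.
    """
    K = len(evidence)
    S = sum(evidence) + K
    return [e / S for e in evidence], K / S


def combine(op1, op2):
    """Dempster's rule for opinions whose focal sets are single classes plus the full frame."""
    b1, u1 = op1
    b2, u2 = op2
    # Conflict: mass the two views assign to *different* single classes
    conflict = sum(b1[i] * b2[j] for i in range(len(b1)) for j in range(len(b2)) if i != j)
    norm = 1.0 - conflict
    b = [(b1[k] * b2[k] + b1[k] * u2 + b2[k] * u1) / norm for k in range(len(b1))]
    u = u1 * u2 / norm
    return b, u


view1 = opinion_from_evidence([9.0, 1.0, 0.0])  # first observation
view2 = opinion_from_evidence([6.0, 0.0, 2.0])  # second observation
b, u = combine(view1, view2)
print([round(v, 3) for v in b], round(u, 3))  # [0.846, 0.026, 0.051] 0.077
```

After fusion the beliefs and uncertainty still sum to 1. Uncertainty shrinks when the views agree, which is the behaviour a sequential evidence-gathering policy can exploit.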
# Lee-Yang Zeros of a Bosonic system associated with a single trapped ion ( http://arxiv.org/abs/2311.13790v1 ) License: see link	Wenjie Shao, Yulian Chen, Ren-Liu, Yiheng Lin	Zeros of partition functions, in particular Lee-Yang zeros, in the complex plane provide important information for understanding phase transitions. A recent discovery of the equivalence between the coherence of a central quantum system and the partition function of its environment in the complex plane enabled the experimental study of Lee-Yang zeros, with several pioneering experiments on spin systems. Lee-Yang zeros have not been observed in Bosonic systems. Here we propose an experimental scheme to demonstrate Lee-Yang zeros in Bosonic systems associated with a single trapped ion by introducing strong coupling between the spin and motion degrees of freedom, i.e., beyond the weak-coupling Lamb-Dicke regime. Our scheme provides new possibilities for quantum simulation of the thermodynamics of Bosonic systems in the complex plane.	Translated: 2023-11-28 00:47:00 Published: 2023-11-23
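For readers new to Lee-Yang zeros, the standard two-spin Ising model gives a minimal worked example. Note that this is a ferromagnetic spin toy model, not the bosonic trapped-ion system studied here. Take $H = -J\sigma_1\sigma_2 - h(\sigma_1+\sigma_2)$ with $\sigma_i = \pm 1$, $J > 0$, and fugacity $z = e^{2\beta h}$:

```latex
\begin{aligned}
Z &= e^{\beta(J+2h)} + e^{\beta(J-2h)} + 2e^{-\beta J}
   = e^{\beta J}\left(z + z^{-1}\right) + 2e^{-\beta J}, \\
Z = 0 &\iff z^2 + 2e^{-2\beta J} z + 1 = 0
   \iff z_\pm = -e^{-2\beta J} \pm i\sqrt{1 - e^{-4\beta J}}, \\
|z_\pm|^2 &= e^{-4\beta J} + \left(1 - e^{-4\beta J}\right) = 1 .
\end{aligned}
```

Both zeros lie on the unit circle $|z| = 1$, that is, at a purely imaginary field $h$. This is what the Lee-Yang circle theorem predicts for ferromagnetic couplings.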
# Knowledge Distillation Based Semantic Communications For Multiple Users ( http://arxiv.org/abs/2311.13789v1 ) License: see link	Chenguang Liu, Yuxin Zhou, Yunfei Chen and Shuang-Hua Yang	Deep learning (DL) has shown great potential in revolutionizing traditional communication systems. Many applications in communications have adopted DL techniques due to their powerful representation ability. However, learning-based methods can depend heavily on the training dataset and perform worse on unseen interference due to limited model generalizability and complexity. In this paper, we consider a semantic communication (SemCom) system with multiple users, where there is a limited number of training samples and unexpected interference. To improve the model generalization ability and reduce the model size, we propose a knowledge distillation (KD) based system where a Transformer-based encoder-decoder is implemented as the semantic encoder-decoder and fully connected neural networks are implemented as the channel encoder-decoder. Specifically, four types of knowledge transfer and model compression are analyzed. Important system and model parameters are considered, including the level of noise and interference, the number of interfering users, and the size of the encoder and decoder. 
Numerical results demonstrate that KD significantly improves the robustness and the generalization ability when applied to unexpected interference, and it reduces the performance loss when compressing the model size.	Translated: 2023-11-28 00:46:45 Published: 2023-11-23
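The abstract analyses four types of knowledge transfer but does not name them. The most common one, logit-based soft-target distillation in the style of Hinton et al., can be sketched as below. The temperature `T = 4.0` and the logits are assumptions. In a full system this term would usually be added to the student's ordinary task loss.

```python
import math


def softmax(logits, T=1.0):
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, T=4.0):
    """Soft-target distillation loss: T^2 * KL(p_teacher || p_student) at temperature T."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return (T ** 2) * kl  # T^2 keeps gradient scale comparable across temperatures


# Hypothetical logits from a large teacher and a compressed student
print(kd_loss([1.0, 0.5, -0.2], [2.0, 0.3, -1.0]))
```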
# DaG LLM ver 1.0: Pioneering Instruction-Tuned Language Modeling for Korean NLP ( http://arxiv.org/abs/2311.13784v1 ) License: see link	Dongjun Jang, Sangah Lee, Sungjoo Byun, Jinwoong Kim, Jean Seo, Minseok Kim, Soyeon Kim, Chaeyoung Oh, Jaeyoon Kim, Hyemi Jo, Hyopil Shin	This paper presents the DaG LLM (David and Goliath Large Language Model), a language model specialized for Korean and fine-tuned through Instruction Tuning across 41 tasks within 13 distinct categories.	Translated: 2023-11-28 00:46:25 Published: 2023-11-23
# Scalable AI Generative Content for Vehicular Network Semantic Communication ( http://arxiv.org/abs/2311.13782v1 ) License: see link	Hao Feng, Yi Yang, Zhu Han	Perceiving vehicles in a driver's blind spot is vital for safe driving. The detection of potentially dangerous vehicles in these blind spots can benefit from vehicular network semantic communication technology. However, efficient semantic communication involves a trade-off between accuracy and delay, especially in bandwidth-limited situations. This paper unveils a scalable Artificial Intelligence Generated Content (AIGC) system that leverages an encoder-decoder architecture. This system converts images into textual representations and reconstructs them into quality-acceptable images, optimizing transmission for vehicular network semantic communication. Moreover, when bandwidth allows, auxiliary information is integrated. The encoder-decoder aims to maintain semantic equivalence with the original images across various tasks. The proposed approach then employs reinforcement learning to enhance the reliability of the generated content. Experimental results suggest that the proposed method surpasses the baseline in perceiving vehicles in blind spots and effectively compresses communication data. 
While this method is specifically designed for driving scenarios, this encoder-decoder architecture also holds potential for wide use across various semantic communication scenarios.	Translated: 2023-11-28 00:46:19 Published: 2023-11-23
# Dynamic Compositional Graph Convolutional Network for Efficient Composite Human Motion Prediction ( http://arxiv.org/abs/2311.13781v1 ) License: see link	Wanying Zhang, Shen Zhao, Fanyang Meng, Songtao Wu, Mengyuan Liu	With potential applications in fields including intelligent surveillance and human-robot interaction, human motion prediction has become a hot research topic and has achieved considerable success, especially with recent Graph Convolutional Networks (GCNs). Current human motion prediction work usually focuses on predicting human motions for atomic actions. Observing that atomic actions can happen at the same time and thereby form composite actions, we propose the composite human motion prediction task. To handle this task, we first present a Composite Action Generation (CAG) module to generate synthetic composite actions for training, thus avoiding the laborious work of collecting composite action samples. Moreover, we reduce the demand for a more complicated model that composite actions would otherwise impose by presenting a Dynamic Compositional Graph Convolutional Network (DC-GCN). Extensive experiments on the Human3.6M dataset and our newly collected CHAMP dataset consistently verify the efficiency of our DC-GCN method, which achieves state-of-the-art motion prediction accuracies while requiring little extra computational cost compared with traditional GCN-based human motion methods.	Translated: 2023-11-28 00:46:03 Published: 2023-11-23
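For reference, the sketch below shows a single graph convolution layer of the widely used form $H' = \mathrm{ReLU}(\hat A H W)$, with $\hat A = D^{-1/2}(A + I)D^{-1/2}$. It runs on a hypothetical three-joint chain (for example shoulder, elbow, wrist). This is the generic GCN building block, not DC-GCN's dynamic compositional design, and the joint features and weights are made up.

```python
import math


def normalized_adjacency(num_nodes, edges):
    """Build D^{-1/2} (A + I) D^{-1/2} for an undirected graph."""
    A = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1.0
    for i in range(num_nodes):
        A[i][i] = 1.0  # self-loops keep each joint's own features
    deg = [sum(row) for row in A]
    return [[A[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def gcn_layer(A_hat, H, W):
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(A_hat, H), W)]


# Hypothetical chain: joint 0 - joint 1 - joint 2, two features per joint
A_hat = normalized_adjacency(3, [(0, 1), (1, 2)])
H = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
W = [[1.0, -1.0], [0.5, 2.0]]
out = gcn_layer(A_hat, H, W)
print(out)
```

Each output row mixes a joint's features with those of its skeletal neighbours. Motion-prediction GCNs stack such layers, often over time as well.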
# Detection and Identification Accuracy of PCA-Accelerated Real-Time Processing of Hyperspectral Imagery ( http://arxiv.org/abs/2311.13779v1 ) License: see link	Abigail Basener and Meagan Herald	Real-time or near real-time hyperspectral detection and identification are extremely useful and needed in many fields. These data sets can be quite large, and the algorithms can require numerous computations that slow the process down. A common way of speeding up the process is to use principal component analysis (PCA) for dimension reduction. In the reduced dimensional space, provided by a subset of the principal components, fewer computations are needed to process the data, resulting in a faster run time. In this paper, we propose a way to further decrease the time required to use PCA by investigating how many principal components may be omitted with minimal impact on the detection rate. Using ACE to perform the detection, and probability and spectral fit for identification, we find that the number of principal components can be reduced by a substantial amount before seeing a noticeable change in detection rates.	Translated: 2023-11-28 00:45:42 Published: 2023-11-23
# GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence ( http://arxiv.org/abs/2311.13777v1 ) License: see link	Pengyuan Wang, Takuya Ikeda, Robert Lee, Koichi Nishiwaki	Category-level pose estimation is a challenging task with many potential applications in computer vision and robotics. Recently, deep-learning-based approaches have made great progress, but are typically hindered by the need for large datasets of either pose-labelled real images or carefully tuned photorealistic simulators. This can be avoided by using only geometry inputs such as depth images to reduce the domain gap, but these approaches suffer from a lack of semantic information, which can be vital in the pose estimation problem. To resolve this conflict, we propose to utilize both geometric and semantic features obtained from a pre-trained foundation model. Our approach projects 2D features from this foundation model into 3D for a single object model per category, and then, with a trained matching network, matches new single-view observations of unseen object instances against it. This requires significantly less training data than prior methods since the semantic features are robust to object texture and appearance. We demonstrate this with a rich evaluation, showing improved performance over prior methods with a fraction of the data required.	Translated: 2023-11-28 00:45:29 Published: 2023-11-23
# メソスコピック超高速非線形光学 --多モード量子非ガウス物理学の出現 Mesoscopic ultrafast nonlinear optics -- The emergence of multimode quantum non-Gaussian physics ( http://arxiv.org/abs/2311.13775v1 ) ライセンス: Link先を確認	Ryotatsu Yanagimoto, Edwin Ng, Marc Jankowski, Rajveer Nehra, Timothy P. McKenna, Tatsuhiro Onodera, Logan G. Wright, Ryan Hamerly, Alireza Marandi, M. M. Fejer, Hideo Mabuchi	(参考訳) 過去数十年間で非線形光学は大幅に非線形になり、エネルギー効率は10億倍近く向上し、特に超高速非線形ナノフォトニクスは空間工学と時間工学を融合するフロンティアとして登場した。現在、非線形ナノフォトニクスにおける最先端の実験は、数百個の光子で非線形飽和を引き起こせるメソスコピック領域のすぐ上に位置している。古典光学や深量子光学とは対照的に、メソスケールは平均場、ガウス的、非ガウス的量子特徴の間の動的相互作用によって特徴づけられ、それらはすべて近接したスケール階層の中にある。光場が本来持つ多モードの複雑さと組み合わさると、このようなハイブリッド量子古典ダイナミクスは、現代の量子光学の枠組みに理論的、実験的、工学的な課題をもたらす。本稿では、メソスケールの多モード非線形光学において現れる特異な物理を取り上げ、古典的特徴と量子的特徴の両方を活用して新たな機能を設計するための鍵となる原理を概説する。実験の現状を簡単に概観し、メソスコピック動作を実現するための材料、分散工学、デバイス設計における未解決の技術的課題に注意を促す。最後に、量子拡張情報処理から、非古典光駆動のダイナミクスや現象、全光学的な非ガウス計測やセンシングに至るまで、これらの能力が量子フォトニクスにどのような新しいパラダイムをもたらしうるかを推測する。メソスケールで解き放たれる物理は、理論と実験の両面で重要な課題と機会を提示しており、本レビューは、超高速量子非線形光学におけるこの新たなフロンティアを進む上での道しるべとなることを意図している。 Over the last few decades, nonlinear optics has become significantly more nonlinear, traversing nearly a billionfold improvement in energy efficiency, with ultrafast nonlinear nanophotonics in particular emerging as a frontier for combining both spatial and temporal engineering. At present, cutting-edge experiments in nonlinear nanophotonics place us just above the mesoscopic regime, where a few hundred photons suffice to trigger nonlinear saturation. In contrast to classical or deep-quantum optics, the mesoscale is characterized by dynamical interactions between mean-field, Gaussian, and non-Gaussian quantum features, all within a close hierarchy of scales. When combined with the inherent multimode complexity of optical fields, such hybrid quantum-classical dynamics present theoretical, experimental, and engineering challenges to the contemporary framework of quantum optics. 
In this review, we highlight the unique physics that emerges in multimode nonlinear optics at the mesoscale and outline key principles for exploiting both classical and quantum features to engineer novel functionalities. We briefly survey the experimental landscape and draw attention to outstanding technical challenges in materials, dispersion engineering, and device design for accessing mesoscopic operation. Finally, we speculate on how these capabilities might usher in some new paradigms in quantum photonics, from quantum-augmented information processing to nonclassical-light-driven dynamics and phenomena to all-optical non-Gaussian measurement and sensing. The physics unlocked at the mesoscale present significant challenges and opportunities in theory and experiment alike, and this review is intended to serve as a guidepost as we begin to navigate this new frontier in ultrafast quantum nonlinear optics.	翻訳日:2023-11-28 00:45:07 公開日:2023-11-23
# 3層ニューラルネットワークによる階層多項式の学習 Learning Hierarchical Polynomials with Three-Layer Neural Networks ( http://arxiv.org/abs/2311.13774v1 ) ライセンス: Link先を確認	Zihao Wang, Eshaan Nichani, Jason D. Lee	(参考訳) 標準ガウス分布上の階層多項式を3層ニューラルネットワークで学習する問題を研究する。具体的には、$h = g \circ p$ の形の目標関数を考える。ここで、$p : \mathbb{R}^d \rightarrow \mathbb{R}$ は次数 $k$ の多項式、$g: \mathbb{R} \rightarrow \mathbb{R}$ は次数 $q$ の多項式である。この関数クラスは、$k=1$ に対応する単一インデックスモデルを一般化するものであり、基礎となる階層構造を持つ関数の自然なクラスである。主結果として、次数 $k$ の多項式 $p$ の大規模なサブクラスに対して、二乗損失上の層ごとの勾配降下によって学習した3層ニューラルネットワークが、$\widetilde{\mathcal{O}}(d^k)$ サンプルと多項式時間で、目標 $h$ をテスト誤差が消失するまで学習することを示す。これは、$\widetilde \Theta(d^{kq})$ サンプルを必要とするカーネル法や、目標関数が低ランクであることを要求する2層ネットワークに関する既存の保証に対する厳密な改善である。また本結果は、$p$ が二次多項式である場合に限定されていた3層ニューラルネットワークに関する先行研究を一般化する。実際に $p$ が二次多項式である場合には、情報理論的に最適なサンプル複雑性 $\widetilde{\mathcal{O}}(d^2)$ を達成し、これは $\widetilde\Theta(d^4)$ のサンプルサイズを必要とした先行研究に対する改善である。証明では、学習の初期段階でネットワークが特徴学習を行い、$\widetilde{\mathcal{O}}(d^k)$ サンプルで特徴 $p$ を復元することを示す。本研究は、3層ニューラルネットワークが複雑な特徴を学習し、その結果として広範な階層関数のクラスを学習できることを示している。 We study the problem of learning hierarchical polynomials over the standard Gaussian distribution with three-layer neural networks. We specifically consider target functions of the form $h = g \circ p$ where $p : \mathbb{R}^d \rightarrow \mathbb{R}$ is a degree $k$ polynomial and $g: \mathbb{R} \rightarrow \mathbb{R}$ is a degree $q$ polynomial. This function class generalizes the single-index model, which corresponds to $k=1$, and is a natural class of functions possessing an underlying hierarchical structure. Our main result shows that for a large subclass of degree $k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error in $\widetilde{\mathcal{O}}(d^k)$ samples and polynomial time. This is a strict improvement over kernel methods, which require $\widetilde \Theta(d^{kq})$ samples, as well as existing guarantees for two-layer networks, which require the target function to be low-rank. 
Our result also generalizes prior works on three-layer neural networks, which were restricted to the case of $p$ being a quadratic. When $p$ is indeed a quadratic, we achieve the information-theoretically optimal sample complexity $\widetilde{\mathcal{O}}(d^2)$, which is an improvement over prior work (Nichani et al., 2023) requiring a sample size of $\widetilde\Theta(d^4)$. Our proof proceeds by showing that during the initial stage of training the network performs feature learning to recover the feature $p$ with $\widetilde{\mathcal{O}}(d^k)$ samples. This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.	翻訳日:2023-11-28 00:44:34 公開日:2023-11-23
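The function class in the abstract above can be made concrete with a small sketch. It builds one hypothetical target $h = g \circ p$ with $k = 2$ (a centred quadratic form) and $q = 3$ (a cubic), then compares the sample-size scales $d^k$ and $d^{kq}$ with log factors dropped. The matrix `A`, the cubic `g`, and $d = 100$ are illustrative choices only. The paper's conditions on the admissible subclass of $p$ are not reproduced, and no network is trained here.

```python
import numpy as np

# Hypothetical instance: k = 2 (p is a centred quadratic form), q = 3 (g is a cubic).
d, k, q = 100, 2, 3
rng = np.random.default_rng(0)
A = rng.normal(size=(d, d))
A = (A + A.T) / (2 * np.sqrt(d))  # symmetric coefficient matrix

def p(X):
    # x^T A x - tr(A) has mean zero when x ~ N(0, I_d).
    return np.einsum("ni,ij,nj->n", X, A, X) - np.trace(A)

def g(z):
    return z ** 3

def h(X):
    return g(p(X))  # a degree k * q = 6 polynomial of the input

X = rng.normal(size=(1000, d))  # inputs drawn from the standard Gaussian
y = h(X)

# Sample-size scales quoted in the abstract, ignoring the log factors hidden by the tilde.
proposed = d ** k        # three-layer network: O~(d^k)
kernel = d ** (k * q)    # kernel methods: Theta~(d^{kq})
print(proposed, kernel)  # 10000 1000000000000
```

Even at this modest dimension the gap between the two scales is eight orders of magnitude. That gap is the practical content of the "strict improvement over kernel methods" claim.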
# 身体運動のアーカイブ:中国書道の集合的生成 Archiving Body Movements: Collective Generation of Chinese Calligraphy ( http://arxiv.org/abs/2311.13770v1 ) ライセンス: Link先を確認	Aven Le Zhou, Jiayi Ye, Tianchen Liu, Kang Zhang	(参考訳) コミュニケーションチャネルとしての身体運動は、行動研究やキネシクスで広く研究されてきた。舞台芸術と視覚芸術も同じ関心を共有しているが、舞踊譜や視覚作品の制作など、人間の身体運動の記録と表現に焦点を当てている。本稿では、東洋の書道における身体運動と、書道の原理を応用して身体運動を刺激し、アーカイブする方法を検討する。作品(Wushu)を通じて、著者らは、観客の身体的参加を促し、その身体運動を生成された書の集成としてアーカイブする、インタラクティブかつ生成的なアプローチを試みた。観客は書き手と読み手の両方の役割を担い、生成された書を創作する(「書く」)ことと鑑賞する(「読む」)ことが、この無限の「書物」の中で循環的なプロセスとなり、漢字や書道に関するさらなる関心と議論を喚起しうる。 As a communication channel, body movements have been widely explored in behavioral studies and kinesics. Performing and visual arts share the same interests but focus on documenting and representing human body movements, such as for dance notation and visual work creation. This paper investigates body movements in oriental calligraphy and how to apply calligraphy principles to stimulate and archive body movements. Through an artwork (Wushu), the authors experiment with an interactive and generative approach to engage the audience's bodily participation and archive the body movements as a compendium of generated calligraphy. The audience assumes the role of both writers and readers; creating ("writing") and appreciating ("reading") the generated calligraphy becomes a cyclical process within this infinite "Book," which can motivate further attention and discussions concerning Chinese characters and calligraphy.	翻訳日:2023-11-28 00:43:52 公開日:2023-11-23
# 効果的なグラフ学習によるフェアスペクトルクラスタリングのための統一フレームワーク A Unified Framework for Fair Spectral Clustering With Effective Graph Learning ( http://arxiv.org/abs/2311.13766v1 ) ライセンス: Link先を確認	Xiang Zhang, Qiao Wang	(参考訳) グループ公平性制約の下でのスペクトルクラスタリングの問題を考える。ここでは、各センシティブグループのサンプルが各クラスタにほぼ比例して含まれることを求める。従来の公平スペクトルクラスタリング(FSC)手法は、与えられたグラフ上での公平なスペクトル埋め込みと、離散的なクラスタラベルを得るための $k$-means という、2つの連続した段階からなる。しかし実際にはグラフは通常未知であり、ノイズを含みうるデータから基礎となるグラフを構築する必要があり、その品質はその後の公平なクラスタリングの性能に必然的に影響する。さらに、FSCを別々のステップで実行すると、ステップ間のつながりが断たれ、最適でない結果につながる。そこで、まず構築されたグラフがFSCに与える影響を理論的に解析する。この解析に基づき、ノード適応グラフフィルタを用いてノイズの多いデータからグラフを学習する新しいグラフ構築手法を提案する。次に、従来のFSCの独立したすべての段階を単一の目的関数に統合し、生データを入力として離散クラスタラベルを出力するエンドツーエンドのフレームワークを構成する。各段階の変数を同時かつ交互に更新するアルゴリズムを開発する。最後に、合成データ、ベンチマークデータ、実データに対する広範な実験を行い、本モデルが最先端の公平クラスタリング手法より優れていることを示す。 We consider the problem of spectral clustering under group fairness constraints, where samples from each sensitive group are approximately proportionally represented in each cluster. Traditional fair spectral clustering (FSC) methods consist of two consecutive stages, i.e., performing fair spectral embedding on a given graph and conducting $k$-means to obtain discrete cluster labels. However, in practice, the graph is usually unknown, and we need to construct the underlying graph from potentially noisy data, the quality of which inevitably affects subsequent fair clustering performance. Furthermore, performing FSC through separate steps breaks the connections among these steps, leading to suboptimal results. To this end, we first theoretically analyze the effect of the constructed graph on FSC. Motivated by the analysis, we propose a novel graph construction method with a node-adaptive graph filter to learn graphs from noisy data. Then, all independent stages of conventional FSC are integrated into a single objective function, forming an end-to-end framework that inputs raw data and outputs discrete cluster labels. An algorithm is developed to jointly and alternately update the variables in each stage. 
Finally, we conduct extensive experiments on synthetic, benchmark, and real data, which show that our model is superior to state-of-the-art fair clustering methods.	翻訳日:2023-11-28 00:43:37 公開日:2023-11-23
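The abstract contrasts its end-to-end model with the conventional two-stage FSC pipeline. Below is a minimal sketch of that conventional baseline, not the authors' method: a fixed Gaussian-kernel graph, then a fair spectral embedding, then $k$-means. It assumes the common group-balance formulation $F^\top H = 0$, in which each column of `F` is a group indicator minus that group's share, solved by restricting $H$ to the null space of $F^\top$. The kernel width, the farthest-point initialisation, and the toy data are illustrative choices.

```python
import numpy as np

def fair_spectral_embedding(X, groups, k, sigma=1.0):
    """Stage 1: spectral embedding under the group-balance constraint F^T H = 0."""
    # Build a Gaussian-kernel similarity graph from the raw data (a fixed, non-learned choice).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W  # unnormalised graph Laplacian
    # One column per sensitive group (except the last): indicator minus group share.
    group_ids = np.unique(groups)
    F = np.stack([(groups == g).astype(float) - np.mean(groups == g)
                  for g in group_ids[:-1]], axis=1)
    # Orthonormal basis Z of the null space of F^T, so any H = Z Y satisfies F^T H = 0.
    _, svals, Vt = np.linalg.svd(F.T)
    rank = int(np.sum(svals > 1e-10))
    Z = Vt[rank:].T
    # k eigenvectors of Z^T L Z with the smallest eigenvalues, mapped back through Z.
    _, vecs = np.linalg.eigh(Z.T @ L @ Z)
    return Z @ vecs[:, :k]

def kmeans(H, k, iters=100):
    """Stage 2: Lloyd's k-means on the rows of the embedding."""
    # Deterministic farthest-point initialisation (a simplification of k-means++).
    centers = [H[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((H - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(H[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        assign = np.argmin(((H[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
        new = np.array([H[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return assign

# Toy data: two well-separated blobs; the two sensitive groups alternate inside each blob.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(0, 0.5, (20, 2)) + [10.0, 0.0]])
groups = np.arange(40) % 2
labels = kmeans(fair_spectral_embedding(X, groups, k=2), k=2)
print(labels)
```

Each stage here is optimised on its own. The graph is fixed before the embedding, and the embedding is fixed before $k$-means. That separation is what the paper argues leads to suboptimal results and replaces with a single joint objective.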
# 運用中に収集したデータに基づく希少な社会資源のオンライン配分のための最適かつ公平な政策の学習 Learning Optimal and Fair Policies for Online Allocation of Scarce Societal Resources from Data Collected in Deployment ( http://arxiv.org/abs/2311.13765v1 ) ライセンス: Link先を確認	Bill Tang, Çağıl Koçyiğit, Eric Rice, Phebe Vayanos	(参考訳) 本研究では、異なる種類の希少な社会資源(恒久的住宅、移植用の死体ドナー腎臓、人工呼吸器など)を、待機リスト上の異質な被配分者(ホームレス状態にある人々、末期腎不全患者、COVID-19患者など)に、観測された共変量に基づいて配分する問題を研究する。運用中に収集された行政データを活用し、長期的に予算制約を満たしつつ期待される成果を最大化するオンライン政策を設計する。提案する政策は、各個人を、推定平均処置成果と推定された資源の双対価格(大まかに言えば、その資源を使用する機会費用)との差が最大となる資源の待機リストに登録する。資源は到着し次第、先着順に配分される。穏やかな技術的仮定の下で、このデータ駆動型政策が、最適なサンプル外政策の期待成果をほぼ確実に漸近的に達成することを示す。さらに、様々な公平性制約を組み込めるように枠組みを拡張する。ロサンゼルスでホームレス状態にある人々に希少な住宅資源を配分する政策の設計問題において、ホームレス管理情報システムのデータに基づいて提案手法の性能を評価する。特に、提案する政策を用いるとホームレス状態からの脱出率が1.9%向上し、人種に関して配分または成果が公平な政策も、ごくわずかな公平性の代償で実現できることを示す。 We study the problem of allocating scarce societal resources of different types (e.g., permanent housing, deceased donor kidneys for transplantation, ventilators) to heterogeneous allocatees on a waitlist (e.g., people experiencing homelessness, individuals suffering from end-stage renal disease, Covid-19 patients) based on their observed covariates. We leverage administrative data collected in deployment to design an online policy that maximizes expected outcomes while satisfying budget constraints, in the long run. Our proposed policy waitlists each individual for the resource maximizing the difference between their estimated mean treatment outcome and the estimated resource dual-price or, roughly, the opportunity cost of using the resource. Resources are then allocated as they arrive, in a first-come first-serve fashion. We demonstrate that our data-driven policy almost surely asymptotically achieves the expected outcome of the optimal out-of-sample policy under mild technical assumptions. We extend our framework to incorporate various fairness constraints. 
We evaluate the performance of our approach on the problem of designing policies for allocating scarce housing resources to people experiencing homelessness in Los Angeles based on data from the homeless management information system. In particular, we show that using our policies improves rates of exit from homelessness by 1.9% and that policies that are fair in either allocation or outcomes by race come at a very low price of fairness.	翻訳日:2023-11-28 00:43:16 公開日:2023-11-23
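The waitlisting rule described in the abstract can be sketched in a few lines. Each person joins the queue of the resource $r$ that maximises the estimated outcome minus the estimated dual price of $r$, and arriving resource units are then served first-come first-served. In this sketch the outcome estimates and dual prices are simply given as numbers. How the paper estimates them from administrative data, and its fairness extensions, are not reproduced, and the helper names are hypothetical.

```python
import numpy as np
from collections import deque

def waitlist_choice(mu_hat, dual_price):
    """Resource type each person is waitlisted for.

    mu_hat[i, r]: estimated mean outcome of person i under resource r.
    dual_price[r]: estimated dual price (opportunity cost) of resource r.
    """
    return np.argmax(mu_hat - dual_price[None, :], axis=1)

def allocate(waitlist, resource_stream):
    """Give each arriving resource unit to the longest-waiting person on its waitlist.

    waitlist[i] is the resource type chosen by person i, listed in order of arrival.
    Returns (resource_type, person) pairs; units that find an empty queue go unused.
    """
    queues = {}
    for person, r in enumerate(waitlist):
        queues.setdefault(int(r), deque()).append(person)
    matches = []
    for r in resource_stream:
        queue = queues.get(r)
        if queue:
            matches.append((r, queue.popleft()))
    return matches

mu_hat = np.array([[0.5, 0.4], [0.3, 0.6], [0.7, 0.2]])
dual_price = np.array([0.3, 0.1])
waitlist = waitlist_choice(mu_hat, dual_price)  # array([1, 1, 0])
print(allocate(waitlist, [1, 0, 1, 0]))         # [(1, 0), (0, 2), (1, 1)]
```

Person 0 prefers resource 0 on raw outcomes (0.5 vs 0.4) but is waitlisted for resource 1, because resource 0's higher dual price makes it the costlier unit to spend on them.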
# J-TEXTにおけるニューラルネットワークに基づくロックモード検出器によるn = 0ピックアップの抽出 Extraction of n = 0 pick-up by locked mode detectors based on neural networks in J-TEXT ( http://arxiv.org/abs/2311.13763v1 ) ライセンス: Link先を確認	Chengshuo Shen, Jianchao Li, Yonghua Ding, Jiaolong Dong, Nengchao Wang, Dongliang Han, Feiyue Mao, Da Li, Zhipeng Chen, Zhoujun Yang, Zhongyong Chen, Yuan Pan and J-Text Team	(参考訳) ロックモード(LM)の測定は、磁気流体力学(MHD)不安定性とプラズマディスラプションの物理研究において重要である。LMの振幅と位相を計算するには、n = 0 ピックアップを抽出して差し引く必要がある。J-TEXTでは、ニューラルネットワーク(NN)に基づいてLM検出器上の n = 0 ピックアップ brn=0 を予測することで、このピックアップを抽出する新たな手法を開発した。Power Multiple Time Scale (PMTS) と呼ばれる手法を開発し、複数の周波数範囲で優れた回帰性能を得た。PMTS NNに基づいて3つのモデルを構築した。PMTSは、時間領域と周波数領域の両方でほとんど誤差なく、LM検出器上の brn=0 をフィッティングできた。抽出した brn=0 を差し引くことで、共鳴磁気摂動(RMP)によって生じる n>0 ピックアップ brn>0 が得られる。この新手法では、brn=0 の抽出に4個ではなく1個のLM検出器のみを用いる。したがって、この新手法に基づいてLM検出器の配置も最適化できる。 Measurement of locked mode (LM) is important for the physical research of Magnetohydrodynamic (MHD) instabilities and plasma disruption. The n = 0 pick-up needs to be extracted and subtracted to calculate the amplitude and phase of the LM. A new method to extract this pick-up has been developed by predicting the n = 0 pick-up brn=0 by the LM detectors based on Neural Networks (NNs) in J-TEXT. An approach called Power Multiple Time Scale (PMTS) has been developed with outstanding regressing effect in multiple frequency ranges. Three models have been developed based on PMTS NNs. PMTS could fit the brn=0 on the LM detectors with little errors both in time domain and frequency domain. The n>0 pick-up brn>0 generated by resonant magnetic perturbations (RMPs) can be obtained after subtracting the extracted brn=0. This new method uses only one LM detector instead of four to extract brn=0. Therefore, the distribution of the LM detectors can also be optimized based on this new method.	翻訳日:2023-11-28 00:42:50 公開日:2023-11-23
# 可変レート画像圧縮のためのビジュアルプロンプトチューニングによるプログレッシブラーニング Progressive Learning with Visual Prompt Tuning for Variable-Rate Image Compression ( http://arxiv.org/abs/2311.13846v1 ) ライセンス: Link先を確認	Shiyu Qin, Yimin Zhou, Jinpeng Wang, Bin Chen, Baoyi An, Tao Dai, Shu-Tao Xia	(参考訳) 本稿では、Transformerベースの可変レート画像圧縮のためのプログレッシブ学習パラダイムを提案する。提案手法は、Layer-adaptive Prompt Module (LPM) の助けを借りて、幅広い圧縮率をカバーする。ビジュアルプロンプトチューニングに着想を得て、LPMを用いてエンコーダ側では入力画像の、デコーダ側では隠れ特徴のプロンプトをそれぞれ抽出し、事前学習済みのTransformerベース画像圧縮モデルのSwin Transformer層に付加情報として与える。これにより注意領域とビットの割り当てが変化し、ひいてはモデルの目標圧縮率が変わる。ネットワークをより軽量にするため、畳み込み層の少ないプロンプトネットワークを統合する。網羅的な実験により、異なる目標レートごとに個別に最適化された複数のモデルに基づく手法と比べ、提案手法はパラメータ保存量を80%、データセットを90%削減しつつ同等の性能に到達することが示された。また、本モデルはレート歪み性能において現在のすべての可変ビットレート画像圧縮手法を上回り、スクラッチから学習した最先端の固定ビットレート画像圧縮手法に迫る。 In this paper, we propose a progressive learning paradigm for transformer-based variable-rate image compression. Our approach covers a wide range of compression rates with the assistance of the Layer-adaptive Prompt Module (LPM). Inspired by visual prompt tuning, we use LPM to extract prompts for input images and hidden features at the encoder side and decoder side, respectively, which are fed as additional information into the Swin Transformer layer of a pre-trained transformer-based image compression model to affect the allocation of attention region and the bits, which in turn changes the target compression ratio of the model. To ensure the network is more lightweight, we integrate prompt networks with fewer convolutional layers. Exhaustive experiments show that compared to methods based on multiple models, which are optimized separately for different target rates, the proposed method arrives at the same performance with 80% savings in parameter storage and 90% savings in datasets. Meanwhile, our model outperforms all current variable bitrate image methods in terms of rate-distortion performance and approaches the state-of-the-art fixed bitrate image compression methods trained from scratch.	翻訳日:2023-11-28 00:36:03 公開日:2023-11-23
# プッシュフォワード写像で巡るサンプリング手法 Touring sampling with pushforward maps ( http://arxiv.org/abs/2311.13845v1 ) ライセンス: Link先を確認	Vivien Cabannes, Charles Arnal	(参考訳) 強力な機械学習手法を特定の問題に適用しようとする実務者にとって、サンプリング手法の多さは気後れするものかもしれない。本稿では理論的な立場から、いくつかの訓練例に類似した新しいデータを生成したい「生成モデリング」の設定における多くのサンプリング手法を整理・概観する。既存手法間のつながりを明らかにすることで、拡散のシミュレーションに起因する長い推論時間や生成サンプルの多様性の欠如など、拡散モデルを用いたサンプリングにおける現在の課題のいくつかを克服する助けとなるかもしれない。 The number of sampling methods could be daunting for a practitioner looking to cast powerful machine learning methods to their specific problem. This paper takes a theoretical stance to review and organize many sampling approaches in the "generative modeling" setting, where one wants to generate new data that are similar to some training examples. By revealing links between existing methods, it might prove useful to overcome some of the current challenges in sampling with diffusion models, such as long inference time due to diffusion simulation, or the lack of diversity in generated samples.	翻訳日:2023-11-28 00:35:26 公開日:2023-11-23
# 時間的アテンション型グラフニューラルネットワークによる厳密組合せ最適化 Exact Combinatorial Optimization with Temporo-Attentional Graph Neural Networks ( http://arxiv.org/abs/2311.13843v1 ) ライセンス: Link先を確認	Mehdi Seyfi, Amin Banitalebi-Dehkordi, Zirui Zhou, and Yong Zhang	(参考訳) 組合せ最適化は、離散的な変数と制約の集合の中から最適解を見つける問題である。この分野は研究と産業の両面で大きく進歩してきた。過去10年間のディープラーニングの成功に伴い、組合せ最適化における最近の傾向は、主要なヒューリスティック要素を機械学習(ML)モデルに置き換えることで、最先端の組合せ最適化ソルバーを改善することである。本稿では、組合せ最適化のための機械学習アルゴリズムの2つの重要な側面、すなわち時間的特性とアテンションについて考察する。分枝限定(B&B)アルゴリズムにおける変数選択のタスクでは、時間情報と二部グラフアテンションを組み込むことでソルバーの性能が向上すると主張する。この主張を、文献やコンペティションで用いられるいくつかの標準データセットに対する直観と数値結果によって裏付ける。コードは https://developer.huaweicloud.com/develop/aigallery/notebook/detail?id=047c6cf2-8463-40d7-b92f-7b2ca998e935 で公開されている。 Combinatorial optimization finds an optimal solution within a discrete set of variables and constraints. The field has seen tremendous progress both in research and industry. With the success of deep learning in the past decade, a recent trend in combinatorial optimization has been to improve state-of-the-art combinatorial optimization solvers by replacing key heuristic components with machine learning (ML) models. In this paper, we investigate two essential aspects of machine learning algorithms for combinatorial optimization: temporal characteristics and attention. We argue that for the task of variable selection in the branch-and-bound (B&B) algorithm, incorporating the temporal information as well as the bipartite graph attention improves the solver's performance. We support our claims with intuitions and numerical results over several standard datasets used in the literature and competitions. Code is available at: https://developer.huaweicloud.com/develop/aigallery/notebook/detail?id=047c6cf2-8463-40d7-b92f-7b2ca998e935	翻訳日:2023-11-28 00:35:10 公開日:2023-11-23
# Lego: テキスト画像拡散モデルにおけるオブジェクトの外観を超えた概念の分離と反転の学習 Lego: Learning to Disentangle and Invert Concepts Beyond Object Appearance in Text-to-Image Diffusion Models ( http://arxiv.org/abs/2311.13833v1 ) ライセンス: Link先を確認	Saman Motamed and Danda Pani Paudel and Luc Van Gool	(参考訳) 拡散モデルは生成的コンテンツ作成に革命をもたらし、特にテキストから画像への(T2I)拡散モデルは、自然言語によるシーン合成を可能にすることでユーザーの創造的自由度を高めた。T2Iモデルは、名詞、外観、スタイルといった概念の合成に優れている。概念の少数の例画像に基づくカスタマイズされたコンテンツ作成を可能にするため、Textual InversionやDreamBoothなどの手法は、所望の概念を反転させ、新しいシーンでそれを合成できるようにする。しかし、オブジェクトの外観やスタイルを超えたより一般的な概念(形容詞や動詞)を自然言語を通じて反転させることは、依然として課題である。これらの概念の2つの重要な特徴が、現在の反転手法の限界の原因となっている。1) 形容詞や動詞は名詞(被写体)と絡み合っており、被写体の外観が概念埋め込みに漏れ込むため、外観ベースの反転手法を妨げうる。2) このような概念の記述は、しばしば単一の単語埋め込みを超える(氷の中に凍りついている、綱渡りをしている、など)が、現在の手法はこれを扱えない。本研究では、少数の例画像から被写体と絡み合った概念を反転させるよう設計されたテキスト反転手法Legoを提案する。Legoは、単純かつ効果的な被写体分離(Subject Separation)ステップを用いて概念を関連する被写体から分離し、単一/複数埋め込みの概念の反転を導くコンテキスト損失(Context Loss)を用いる。徹底的なユーザ調査では、Legoで生成した概念がベースラインと比較して70%以上の割合で好まれた。さらに、大規模言語モデルを用いた視覚的質問応答により、Legoで生成した概念が概念のテキスト記述とよりよく整合していることが示唆された。 Diffusion models have revolutionized generative content creation and text-to-image (T2I) diffusion models in particular have increased the creative freedom of users by allowing scene synthesis using natural language. T2I models excel at synthesizing concepts such as nouns, appearances, and styles. To enable customized content creation based on a few example images of a concept, methods such as Textual Inversion and DreamBooth invert the desired concept and enable synthesizing it in new scenes. However, inverting more general concepts that go beyond object appearance and style (adjectives and verbs) through natural language, remains a challenge. Two key characteristics of these concepts contribute to the limitations of current inversion methods. 1) Adjectives and verbs are entangled with nouns (subject) and can hinder appearance-based inversion methods, where the subject appearance leaks into the concept embedding and 2) describing such concepts often extends beyond single word embeddings (being frozen in ice, walking on a tightrope, etc.) that current methods do not handle. In this study, we introduce Lego, a textual inversion method designed to invert subject entangled concepts from a few example images. Lego disentangles concepts from their associated subjects using a simple yet effective Subject Separation step and employs a Context Loss that guides the inversion of single/multi-embedding concepts. In a thorough user study, Lego-generated concepts were preferred over 70% of the time when compared to the baseline. Additionally, visual question answering using a large language model suggested Lego-generated concepts are better aligned with the text description of the concept.	翻訳日:2023-11-28 00:34:45 公開日:2023-11-23
# 後部蒸留サンプリング Posterior Distillation Sampling ( http://arxiv.org/abs/2311.13831v1 ) ライセンス: Link先を確認	Juil Koo, Chanho Park, Minhyuk Sung	(参考訳) 拡散モデルに基づくパラメトリック画像編集のための新しい最適化手法である PDS (Posterior Distillation Sampling) を導入する。様々なパラメトリック画像の処理に拡散モデルの強力な2次元前処理を利用する既存の最適化手法は,主に生成に重点を置いている。生成とは異なり、編集にはターゲット属性への準拠とソースコンテンツのアイデンティティ保持のバランスが必要となる。近年の2次元画像編集法は,拡散モデルの生成過程に符号化された確率的潜伏を利用してこのバランスを達成している。画素空間で示される拡散モデルのパラメータ空間への編集能力を拡張するため、2次元画像編集法をPDSという最適化形式に再構成する。 PDSはソースとターゲットの確率的潜在値と一致し、ソースのアイデンティティを維持しながら、望ましい属性と整合する多様なパラメータ空間におけるターゲットのサンプリングを可能にする。この最適化は、生成プロセスをターゲット属性で実行するのに似ているが、ソースの生成プロセスの軌跡と一致させることを実証する。 Neural Radiance Fields と Scalable Vector Graphics representations の広範囲な編集結果は、PDSが上記パラメータ空間間のバランスを満たすためにターゲットをサンプリングできることを示している。 We introduce Posterior Distillation Sampling (PDS), a novel optimization method for parametric image editing based on diffusion models. Existing optimization-based methods, which leverage the powerful 2D prior of diffusion models to handle various parametric images, have mainly focused on generation. Unlike generation, editing requires a balance between conforming to the target attribute and preserving the identity of the source content. Recent 2D image editing methods have achieved this balance by leveraging the stochastic latent encoded in the generative process of diffusion models. To extend the editing capabilities of diffusion models shown in pixel space to parameter space, we reformulate the 2D image editing method into an optimization form named PDS. PDS matches the stochastic latents of the source and the target, enabling the sampling of targets in diverse parameter spaces that align with a desired attribute while maintaining the source's identity. We demonstrate that this optimization resembles running a generative process with the target attribute, but aligning this process with the trajectory of the source's generative process. 
Extensive editing results in Neural Radiance Fields and Scalable Vector Graphics representations demonstrate that PDS is capable of sampling targets to fulfill the aforementioned balance across various parameter spaces.	翻訳日:2023-11-28 00:33:59 公開日:2023-11-23
# モデル平均化における安定性とL2ペナルティ Stability and L2-penalty in Model Averaging ( http://arxiv.org/abs/2311.13827v1 ) ライセンス: Link先を確認	Hengkun Zhu, Guohua Zou	(参考訳) モデル平均化は、候補となるモデルにわたって平均をとることで利用可能な情報を統合する手法であり、過去20年間に多くの注目を集めてきた。様々なモデル平均化手法が開発されているが、安定性の観点からモデル平均化の理論的性質を扱った文献は少なく、またこれらの手法の多くはモデルの重みを単体(シンプレックス)上に制約している。本論文の目的は、統計的学習理論における安定性をモデル平均化に導入することである。そのために、モデル平均化の安定性、漸近的経験リスク最小化、汎化、一致性を定義し、それらの関係を調べる。結果は、妥当な条件の下で、安定性によってモデル平均化が良好な汎化性能と一致性を持つことが保証されることを示している。ここで一致性とは、モデル平均化推定量が平均二乗予測誤差を漸近的に最小化することを意味する。また、モデルの重みを制限しないL2ペナルティ付きモデル平均化法を提案し、それが安定性と一致性を持つことを証明する。チューニングパラメータ選択の影響を低減するため、10分割交差検証を用いてチューニングパラメータの候補集合を選択し、推定誤差に基づいてモデル重みの推定量の重み付き平均をとる。モンテカルロシミュレーションと例示的な応用により、提案手法の有用性を示す。 Model averaging has received much attention in the past two decades, which integrates available information by averaging over potential models. Although various model averaging methods have been developed, there is little literature on the theoretical properties of model averaging from the perspective of stability, and the majority of these methods constrain model weights to a simplex. The aim of this paper is to introduce stability from statistical learning theory into model averaging. Thus, we define the stability, asymptotic empirical risk minimizer, generalization, and consistency of model averaging and study the relationship among them. Our results indicate that stability can ensure that model averaging has good generalization performance and consistency under reasonable conditions, where consistency means model averaging estimator can asymptotically minimize the mean squared prediction error. We also propose an L2-penalty model averaging method without limiting model weights and prove that it has stability and consistency. In order to reduce the impact of tuning parameter selection, we use 10-fold cross-validation to select a candidate set of tuning parameters and perform a weighted average of the estimators of model weights based on estimation errors. 
The Monte Carlo simulation and an illustrative application demonstrate the usefulness of the proposed method.	翻訳日:2023-11-28 00:33:21 公開日:2023-11-23
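As a rough illustration of an L2-penalised model-averaging criterion with unconstrained weights, the sketch below minimises $\lVert y - P w\rVert^2 + \lambda \lVert w\rVert^2$ in closed form. Here $P$ stacks the predictions of nested OLS candidate models. This is an assumed, simplified criterion: the abstract does not state the paper's exact objective, and its 10-fold cross-validation and error-weighted averaging over tuning parameters are not reproduced. Helper names and data are hypothetical.

```python
import numpy as np

def candidate_predictions(X, y, subsets):
    """In-sample fitted values of OLS candidate models, one column per covariate subset."""
    cols = []
    for S in subsets:
        XS = X[:, S]
        beta, *_ = np.linalg.lstsq(XS, y, rcond=None)
        cols.append(XS @ beta)
    return np.column_stack(cols)

def l2_averaging_weights(P, y, lam):
    """Minimise ||y - P w||^2 + lam * ||w||^2 over unconstrained weights w (ridge form)."""
    M = P.shape[1]
    return np.linalg.solve(P.T @ P + lam * np.eye(M), P.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=200)
P = candidate_predictions(X, y, subsets=[[0], [0, 1], [0, 1, 2]])

w0 = l2_averaging_weights(P, y, lam=0.0)    # no penalty
w10 = l2_averaging_weights(P, y, lam=10.0)  # penalised, shrunk towards zero
yhat = P @ w10                              # model-averaged prediction
print(w0, w10)
```

With $\lambda = 0$ and in-sample fitted values, the weights put everything on the largest model, so `w0` comes out as (0, 0, 1) up to rounding. The penalty, and in the paper cross-validation, is what keeps the weights from simply favouring the most complex candidate.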
# 有限温度における改良ハーツリー・フォック近似による均質希薄ボース気体の非凝縮分率 The non-condensed fraction of a homogeneous dilute Bose gas within the improved Hartree-Fock approximation at finite temperature ( http://arxiv.org/abs/2311.13822v1 ) ライセンス: Link先を確認	Nguyen Van Thu	(参考訳) Cornwall-Jackiw-Tomboulis有効作用を用いて、有限温度で弱く相互作用する希薄ボース気体を調べる。臨界温度のシフトは普遍形 $\Delta T_C/T_C^{(0)} = cn_0^{1/3}a_s$ で得られる。非凝縮分率は、量子揺らぎ、熱揺らぎ、およびその両方に対応する3つの項の和で表される。 By means of Cornwall-Jackiw-Tomboulis effective action we investigate a dilute weakly interacting Bose gas at finite temperature. The shift of critical temperature is obtained in the universal form $\Delta T_C/T_C^{(0)} = cn_0^{1/3}a_s$. The non-condensate fraction is expressed in sum of three terms, which correspond to the quantum fluctuation, thermal fluctuation and both.	翻訳日:2023-11-28 00:32:55 公開日:2023-11-23
# HypUC: 不均衡な心電図に対する信頼性の高い回帰のための勾配ブースト補正付き超微細不確実性校正 HypUC: Hyperfine Uncertainty Calibration with Gradient-boosted Corrections for Reliable Regression on Imbalanced Electrocardiograms ( http://arxiv.org/abs/2311.13821v1 ) ライセンス: Link先を確認	Uddeshya Upadhyay, Sairam Bade, Arjun Puranik, Shahir Asfahan, Melwin Babu, Francisco Lopez-Jimenez, Samuel J. Asirvatham, Ashim Prasad, Ajit Rajasekharan, Samir Awasthi, Rakesh Barve	(参考訳) 心電図(ECG)、脳波図(EEG)、パルスオキシメトリーなどの医療時系列の自動解析は、診断上の意思決定のための貴重なツールとなり、患者の遠隔監視や、高価で時間を要する医療処置のより効率的な活用を可能にする可能性がある。ディープニューラルネットワーク(DNN)は、そのような信号を効果的に処理できることが示されている。しかし、これまでの研究は主に医療時系列の分類に焦点を当てており、診断の中心となる連続値の生理的パラメータを回帰しようとするものではなかった。この点における重要な課題の1つはデータセットの不均衡性である。異常な状態の有病率が低いとデータが大きく偏り、デプロイ時に不正確な予測や予測の確信度の欠如を招く可能性がある。これらの課題に対処するため、医療時系列における不均衡確率回帰の枠組みであるHypUCを提案し、いくつかの貢献を行う。(i) 医療時系列における不均衡回帰問題に取り組むための、カーネル密度に基づく簡単な手法を導入する。(ii) さらに、予測された連続値に対する不確実性推定を可能にする確率回帰フレームワークを用いる。(iii) 予測の不確実性をさらに校正する新たな手法を提案する。(iv) 最後に、校正された不確実性推定を用いて予測された連続値を改善する手法を示し、校正された不確実性推定が信頼できない予測の検出に有効であることを示す。HypUCは、数百万人の患者から収集された多種多様なECGの大規模な実世界データセットで評価され、様々な診断タスクにおいて従来のいくつかのベースラインを上回り、深層学習モデルの信頼性の高い臨床展開への応用の可能性を示唆している。 The automated analysis of medical time series, such as the electrocardiogram (ECG), electroencephalogram (EEG), pulse oximetry, etc, has the potential to serve as a valuable tool for diagnostic decisions, allowing for remote monitoring of patients and more efficient use of expensive and time-consuming medical procedures. Deep neural networks (DNNs) have been demonstrated to process such signals effectively. However, previous research has primarily focused on classifying medical time series rather than attempting to regress the continuous-valued physiological parameters central to diagnosis. One significant challenge in this regard is the imbalanced nature of the dataset, as a low prevalence of abnormal conditions can lead to heavily skewed data that results in inaccurate predictions and a lack of certainty in such predictions when deployed. 
To address these challenges, we propose HypUC, a framework for imbalanced probabilistic regression in medical time series, making several contributions. (i) We introduce a simple kernel density-based technique to tackle the imbalanced regression problem with medical time series. (ii) Moreover, we employ a probabilistic regression framework that allows uncertainty estimation for the predicted continuous values. (iii) We also present a new approach to calibrate the predicted uncertainty further. (iv) Finally, we demonstrate a technique to use calibrated uncertainty estimates to improve the predicted continuous value and show the efficacy of the calibrated uncertainty estimates to flag unreliable predictions. HypUC is evaluated on a large, diverse, real-world dataset of ECGs collected from millions of patients, outperforming several conventional baselines on various diagnostic tasks, suggesting a potential use-case for the reliable clinical deployment of deep learning models.	翻訳日:2023-11-28 00:32:44 公開日:2023-11-23
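The abstract names a "kernel density-based technique" for imbalanced regression without giving its form. One common variant, assumed here and not necessarily HypUC's exact scheme, weights each training sample by the inverse of a Gaussian KDE of the target values. Rare target values then count more in a per-sample loss, here a Gaussian negative log-likelihood with a predicted mean and log-variance (a simple probabilistic-regression loss). The bandwidth rule, helper names, and toy data are illustrative, and the calibration and gradient-boosted correction steps are not reproduced.

```python
import numpy as np

def kde_inverse_weights(y, bandwidth=None, eps=1e-12):
    """Per-sample weights proportional to 1 / (Gaussian KDE density of the target)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if bandwidth is None:
        # Silverman's rule of thumb for a 1-D Gaussian kernel.
        bandwidth = 1.06 * y.std() * n ** (-1 / 5)
    diffs = (y[:, None] - y[None, :]) / bandwidth
    density = np.exp(-0.5 * diffs ** 2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))
    w = 1.0 / (density + eps)
    return w / w.mean()  # normalise so the average weight is 1

def weighted_gaussian_nll(y, mu, log_var, w):
    """Weighted Gaussian negative log-likelihood (constant term dropped)."""
    nll = 0.5 * (log_var + (y - mu) ** 2 / np.exp(log_var))
    return np.mean(w * nll)

rng = np.random.default_rng(0)
# Skewed targets: 95% "normal" values and a rare 5% abnormal cluster.
y = np.concatenate([rng.normal(0.0, 1.0, 950), rng.normal(6.0, 0.5, 50)])
w = kde_inverse_weights(y)
print(w[:950].mean(), w[950:].mean())  # rare targets receive larger weights
```

Normalising the weights to mean 1 keeps the overall loss scale comparable to the unweighted case. In practice such inverse-density weights are often clipped or smoothed so that isolated outliers do not dominate training.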
# 分子同定とピークアサインメント:NMRによるマルチレベルマルチモーダルアライメントの活用 Molecular Identification and Peak Assignment: Leveraging Multi-Level Multimodal Alignment on NMR ( http://arxiv.org/abs/2311.13817v1 ) ライセンス: Link先を確認	Hao Xu, Zhengyang Zhou, Pengyu Hong	(参考訳) 核磁気共鳴(NMR)分光は、様々な科学分野において重要な役割を担い、分子動力学と相互作用に関する貴重な洞察を提供する。 AIによるNMR予測モデルの約束にもかかわらず、分子検索、異性体認識、ピーク割り当てといったタスクのスペクトルの解釈には課題が続いている。そこで本研究では、分子グラフ(構造)とNMRスペクトルの2つの不均一なモード間の有意な対応を確立するために、知識誘導型インスタンスワイズ識別を用いたマルチレベルマルチモーダルアライメント(K-M3AID)を提案する。特に、K-M3AIDは二重協調型コントラスト学習アーキテクチャを採用し、グラフレベルのアライメントモジュール、ノードレベルのアライメントモジュール、通信チャネルを備えている。特に、このフレームワークは、ノードレベルのアライメントモジュール内でのコントラスト学習に知識誘導型インスタンスワイド識別を導入し、クロスモーダルアライメントの精度を大幅に向上させる。さらにK-M3AIDは,ノードレベルのアライメントによって獲得したスキルがグラフレベルのアライメントに肯定的な影響を与えることを示すことで,メタラーニングの能力を示す。経験的検証は、K-M3AIDが複数のゼロショットタスクに対処する効果を強調し、複雑なNMRシナリオにおける構造情報とスペクトルデータのギャップを埋めるための有望な解決策を提供する。 Nuclear magnetic resonance (NMR) spectroscopy plays an essential role across various scientific disciplines, providing valuable insights into molecular dynamics and interactions. Despite the promise of AI-enhanced NMR prediction models, challenges persist in the interpretation of spectra for tasks such as molecular retrieval, isomer recognition, and peak assignment. In response, this paper introduces Multi-Level Multimodal Alignment with Knowledge-Guided Instance-Wise Discrimination (K-M3AID) to establish meaningful correspondences between two heterogeneous modalities: molecular graphs (structures) and NMR spectra. In particular, K-M3AID employs a dual-coordinated contrastive learning architecture, and incorporates a graph-level alignment module, a node-level alignment module, and a communication channel. Notably, the framework introduces knowledge-guided instance-wise discrimination into contrastive learning within the node-level alignment module, significantly enhancing accuracy in cross-modal alignment. 
Additionally, K-M3AID showcases its capability of meta-learning by demonstrating that skills acquired during node-level alignment positively impact graph-level alignment. Empirical validation underscores K-M3AID's effectiveness in addressing multiple zero-shot tasks, offering a promising solution to bridge the gap between structural information and spectral data in complex NMR scenarios.	翻訳日:2023-11-28 00:32:12 公開日:2023-11-23
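Aligning molecular-graph embeddings with NMR-spectrum embeddings by contrastive learning is often done with a symmetric InfoNCE (CLIP-style) loss. The sketch below shows only that generic graph-level loss, under the assumption that row $i$ of each embedding matrix describes the same molecule. K-M3AID's encoders, its node-level alignment, and its knowledge-guided instance-wise discrimination are not reproduced. The temperature and the random embeddings are illustrative.

```python
import numpy as np

def log_softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def symmetric_info_nce(graph_emb, spec_emb, temperature=0.1):
    """CLIP-style contrastive loss: row i of each matrix describes the same molecule."""
    g = graph_emb / np.linalg.norm(graph_emb, axis=1, keepdims=True)
    s = spec_emb / np.linalg.norm(spec_emb, axis=1, keepdims=True)
    logits = g @ s.T / temperature  # cosine similarities; matching pairs on the diagonal
    idx = np.arange(len(g))
    graph_to_spec = -log_softmax(logits, axis=1)[idx, idx].mean()
    spec_to_graph = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (graph_to_spec + spec_to_graph)

rng = np.random.default_rng(0)
graph_emb = rng.normal(size=(8, 16))
spec_emb = graph_emb + 0.01 * rng.normal(size=(8, 16))       # well-aligned pairs
aligned = symmetric_info_nce(graph_emb, spec_emb)
mismatched = symmetric_info_nce(graph_emb, spec_emb[::-1])  # pairs scrambled
print(aligned, mismatched)
```

Averaging both directions (graph to spectrum and spectrum to graph) makes the loss symmetric in the two modalities, so neither encoder is favoured during training.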
# 共変量シフトと依存シフトの下での公平性を考慮したドメイン汎化 Fairness-Aware Domain Generalization under Covariate and Dependence Shifts ( http://arxiv.org/abs/2311.13816v1 ) ライセンス: Link先を確認	Chen Zhao, Kai Jiang, Xintao Wu, Haoliang Wang, Latifur Khan, Christan Grant, Feng Chen	(参考訳) モデルの公平性を同時に考慮しながら、ソースドメインからシフトしたターゲットドメインへの不変分類器の汎化を達成することは、機械学習における重要かつ複雑な課題である。既存のドメイン汎化研究は通常、ドメインシフトの原因を、クラスラベルの変化に関わる概念シフトと、データスタイルの変化に関わる共変量シフトに帰している。本論文では、ドメイン間での公平な依存パターンの変化を伴う依存シフトと呼ばれる別の形態の分布シフトを導入し、共変量シフトと依存シフトの両方を考慮してドメインシフトに対処する新しいドメイン汎化手法を提案する。我々は、データをあるドメインから別のドメインへ変換できる基礎的な変換モデルが存在すると仮定する。このモデルを用いて合成ドメインのデータを生成することで、未知のドメインにおいてモデルの精度と公平性の両方を保証する、公平性を考慮した不変分類器を学習する。4つのベンチマークデータセットに関する広範な実証研究により、本手法が最先端の手法を上回ることが示された。 Achieving the generalization of an invariant classifier from source domains to shifted target domains while simultaneously considering model fairness is a substantial and complex challenge in machine learning. Existing domain generalization research typically attributes domain shifts to concept shift, which relates to alterations in class labels, and covariate shift, which pertains to variations in data styles. In this paper, by introducing another form of distribution shift, known as dependence shift, which involves variations in fair dependence patterns across domains, we propose a novel domain generalization approach that addresses domain shifts by considering both covariate and dependence shifts. We assert the existence of an underlying transformation model that can transform data from one domain to another. By generating data in synthetic domains through the model, a fairness-aware invariant classifier is learned that enforces both model accuracy and fairness in unseen domains. Extensive empirical studies on four benchmark datasets demonstrate that our approach surpasses state-of-the-art methods.	翻訳日:2023-11-28 00:31:45 公開日:2023-11-23
# ニューラル演算子を用いた確率的構造メタマテリアルの機械的特性評価と逆設計 Mechanical Characterization and Inverse Design of Stochastic Architected Metamaterials Using Neural Operators ( http://arxiv.org/abs/2311.13812v1 ) ライセンス: Link先を確認	Hanxun Jin, Enrui Zhang, Boyu Zhang, Sridhar Krishnaswamy, George Em Karniadakis, Horacio D. Espinosa	(参考訳) 機械学習(ML)は、構造材料(architected materials)を設計するための変革的なツールとして登場し、ラボベースの試行錯誤手法によって達成可能なものをはるかに超える特性を提供している。しかし、現在の逆設計戦略における大きな課題は、大規模な計算および/または実験データセットへの依存であり、特に非線形機械的挙動を示すマイクロスケールの確率的構造材料の設計において問題となる。本稿では,スパースだが高品質なその場(in situ)実験データから,構造メタマテリアルの完全な微細構造と機械的応答の関係を直接学習するために,ディープニューラル演算子(DeepONet)を活用した新しいエンド・ツー・エンドの科学MLフレームワークを紹介する。このアプローチは、特定の非線形機械的挙動に合わせた構造の逆設計を容易にする。 2光子リソグラフィで印刷したスピノダル微細構造から得られた結果は, 機械的応答の予測誤差が5～10%の範囲内にあることを明らかにした。我々の研究は、ニューラル演算子を先進的なマイクロメカニクス実験技術と組み合わせることで、データ不足に制約されたシナリオにおいても、所望の特性を持つ複雑なマイクロ構造材料の設計が実現可能になることを強調している。我々の研究は、材料設計(materials-by-design)の分野における重要な進歩であり、実験的な知見から直接得られる比類のない機械的特性を持つ次世代メタマテリアルの発見と開発における新しい時代を告げる可能性を秘めている。 Machine learning (ML) is emerging as a transformative tool for the design of architected materials, offering properties that far surpass those achievable through lab-based trial-and-error methods. However, a major challenge in current inverse design strategies is their reliance on extensive computational and/or experimental datasets, which becomes particularly problematic for designing micro-scale stochastic architected materials that exhibit nonlinear mechanical behaviors. Here, we introduce a new end-to-end scientific ML framework, leveraging deep neural operators (DeepONet), to directly learn the relationship between the complete microstructure and mechanical response of architected metamaterials from sparse but high-quality in situ experimental data. The approach facilitates the inverse design of structures tailored to specific nonlinear mechanical behaviors. Results obtained from spinodal microstructures, printed using two-photon lithography, reveal that the prediction error for mechanical responses is within a range of 5-10%. 
Our work underscores that by employing neural operators with advanced micro-mechanics experimental techniques, the design of complex micro-architected materials with desired properties becomes feasible, even in scenarios constrained by data scarcity. Our work marks a significant advancement in the field of materials-by-design, potentially heralding a new era in the discovery and development of next-generation metamaterials with unparalleled mechanical characteristics derived directly from experimental insights.	翻訳日:2023-11-28 00:31:29 公開日:2023-11-23
# 教育蒸留:学生モデルを学校で学ばせる Education distillation: getting student models to learn in schools ( http://arxiv.org/abs/2311.13811v1 ) ライセンス: Link先を確認	Ling Feng, Danyang Li, Tianhao Wu, Xuliang Duan	(参考訳) 知識蒸留はモデル圧縮の方法の一つであり、既存の知識蒸留技術は蒸留効率を高めるために蒸留アルゴリズムを改善する方法に焦点を当てている。本稿では,知識蒸留に動的漸進学習を導入し,教育蒸留と呼ぶ蒸留戦略を提案する。具体的には,完全な学生モデルから分割した断片化学生モデルを低学年モデルとして扱う。学年が上がるにつれて、断片化された学生モデルは設計された教育参照層と共に深くなり、より多くの教師モデルから学習・蒸留する。低学年から高学年へ移行するにつれて、断片化された学生モデルは徐々に完全な対象学生モデルに統合され、学生モデルの性能は低学年から高学年へと徐々に向上した。教育蒸留戦略と蒸留アルゴリズムの組み合わせは、公開データセットであるCIFAR100、Caltech256、Food-101で単一の蒸留アルゴリズムの結果を上回る。 Knowledge distillation is one of the methods for model compression, and existing knowledge distillation techniques focus on how to improve the distillation algorithm so as to enhance the distillation efficiency. This paper introduces dynamic incremental learning into knowledge distillation and proposes a distillation strategy called education distillation. Specifically, the fragmented student models obtained by dividing the full student model are treated as lower-grade models. As the grade level rises, fragmented student models deepen in conjunction with designed teaching reference layers, while learning and distilling from more teacher models. By moving from lower to higher grades, fragmented student models were gradually integrated into a complete target student model, and the performance of the student models gradually improved from the lower to the senior grades. Education distillation strategies combined with distillation algorithms outperform the results of single distillation algorithms on the public datasets CIFAR100, Caltech256, and Food-101.	翻訳日:2023-11-28 00:31:03 公開日:2023-11-23
# 古典と量子機械学習のブリッジ:知識蒸留を用いた古典から量子ニューラルネットワークへの知識伝達 Bridging Classical and Quantum Machine Learning: Knowledge Transfer From Classical to Quantum Neural Networks Using Knowledge Distillation ( http://arxiv.org/abs/2311.13810v1 ) ライセンス: Link先を確認	Mohammad Junayed Hasan and M.R.C.Mahdy	(参考訳) ごく最近の研究では、同程度の数の学習可能なパラメータを用いた場合、量子ニューラルネットワークが画像分類のようなタスクにおいて古典的なニューラルネットワークを上回ることが示されている。しかし、量子モデルの開発と最適化は、現在、量子ビットの不安定性や利用可能な量子ビット数の制限といった問題によって妨げられており、誤りが生じやすく性能の低いシステムとなっている。対照的に、古典的なモデルは豊富なリソースを利用できるため高い性能を示すことができる。その結果、古典と量子を統合したハイブリッドに焦点を当てる研究が増えている。特に、古典量子統合や量子量子アプローチによる転移学習に焦点を当てた研究の流れがある。従来の研究とは異なり、本論文では知識蒸留を用いて古典的ニューラルネットワークから量子ニューラルネットワークへ知識を転移する新しい手法を提案し、古典的機械学習と新興の量子コンピューティング技術の間のギャップを効果的に橋渡しする。我々は、LeNetやAlexNetのような古典的畳み込みニューラルネットワーク(CNN)アーキテクチャを教師ネットワークとして利用し、KLダイバージェンスを通じてバックプロパゲーション中に教師信号を送ることで、学生量子モデルの学習を促進する。このアプローチは、古典的CNNのみに依存して量子モデルの性能を大幅に改善し、量子モデルの平均精度はMNISTデータセットで0.80%、より複雑なFashion MNISTデータセットで5.40%向上した。この技術を適用することで、リソース制約のある環境での転移学習のための巨大な量子モデルの煩雑な学習が不要になり、既存の事前学習済み古典モデルを再利用して性能を向上させることができる。 Very recently, studies have shown that quantum neural networks surpass classical neural networks in tasks like image classification when a similar number of learnable parameters are used. However, the development and optimization of quantum models are currently hindered by issues such as qubit instability and limited qubit availability, leading to error-prone systems with weak performance. In contrast, classical models can exhibit high performance owing to substantial resource availability. As a result, more studies have been focusing on hybrid classical-quantum integration. A line of research particularly focuses on transfer learning through classical-quantum integration or quantum-quantum approaches. Unlike previous studies, this paper introduces a new method to transfer knowledge from classical to quantum neural networks using knowledge distillation, effectively bridging the gap between classical machine learning and emergent quantum computing techniques. 
We adapt classical convolutional neural network (CNN) architectures like LeNet and AlexNet to serve as teacher networks, facilitating the training of student quantum models by sending supervisory signals during backpropagation through KL-divergence. The approach yields significant performance improvements for the quantum models by solely depending on classical CNNs, with quantum models achieving an average accuracy improvement of 0.80% on the MNIST dataset and 5.40% on the more complex Fashion MNIST dataset. Applying this technique eliminates the cumbersome training of huge quantum models for transfer learning in resource-constrained settings and enables re-using existing pre-trained classical models to improve performance. Thus, this study paves the way for future research in quantum machine learning (QML) by positioning knowledge distillation as a core technique for advancing QML applications.	翻訳日:2023-11-28 00:30:49 公開日:2023-11-23
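The teacher-to-student signal described in this entry is soft-target knowledge distillation. As a minimal sketch, assuming Hinton-style temperature-softened outputs and a KL(teacher ‖ student) term scaled by T² (the abstract does not state the exact loss, temperature, or weighting used), the per-example loss could be computed as below. The function names are illustrative; the student logits stand in for whatever read-out the quantum model produces.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Soft-target KD loss: KL(teacher || student) on softened outputs, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return (temperature ** 2) * kl_divergence(p_teacher, p_student)


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]  # e.g. logits from a classical CNN teacher (hypothetical values)
    student = [2.0, 1.5, 0.2]  # e.g. logits read out from a quantum student (hypothetical values)
    print(distillation_loss(teacher, student))
```

In common practice this term is mixed with the ordinary cross-entropy on hard labels; the T² factor keeps gradient magnitudes comparable when the temperature changes.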
# AdaTyper:適応型セマンティックカラム型検出 AdaTyper: Adaptive Semantic Column Type Detection ( http://arxiv.org/abs/2311.13806v1 ) ライセンス: Link先を確認	Madelon Hulsebos and Paul Groth and Çağatay Demiralp	(参考訳) 関係表の意味を理解することは、データ探索と準備システムの自動化に役立つ。テーブルを理解するための重要な情報源は列のセマンティクスである。ディープラーニングの台頭に伴い、学習したテーブル表現が利用可能になり、セマンティックな型検出に適用でき、ベンチマークで良好な性能を達成している。それでも我々は,この性能と実用性とのギャップを観察する。本稿では,最も重要なデプロイメント課題の1つである適応性に対処するために,AdaTyperを提案する。 AdaTyperは弱教師あり学習を用い、最小限の人間のフィードバックで、ハイブリッド型予測器を新しいセマンティック型や推論時にシフトしたデータ分布に適応させる。 AdaTyperのハイブリッド型予測器は,ルールベースの手法と,意味列型検出のための軽量な機械学習モデルを組み合わせる。本稿では,クラウドソーシングによって意味列型を人手でアノテーションした実世界のデータベーステーブル上でAdaTyperの適応性能を評価し,新規および既存の型に対してF1スコアが改善することを示す。 AdaTyperはわずか5つの例を見ただけで平均適合率0.6に近づき、人間が与えた正規表現や辞書に基づく既存の適応法を大きく上回っている。 Understanding the semantics of relational tables is instrumental for automation in data exploration and preparation systems. A key source for understanding a table is the semantics of its columns. With the rise of deep learning, learned table representations are now available, which can be applied for semantic type detection and achieve good performance on benchmarks. Nevertheless, we observe a gap between this performance and its applicability in practice. In this paper, we propose AdaTyper to address one of the most critical deployment challenges: adaptation. AdaTyper uses weak supervision to adapt a hybrid type predictor towards new semantic types and shifted data distributions at inference time, using minimal human feedback. The hybrid type predictor of AdaTyper combines rule-based methods and a light machine learning model for semantic column type detection. We evaluate the adaptation performance of AdaTyper on real-world database tables hand-annotated with semantic column types through crowdsourcing and find that the F1 score improves for new and existing types. AdaTyper approaches an average precision of 0.6 after only seeing 5 examples, significantly outperforming existing adaptation methods based on human-provided regular expressions or dictionaries.	翻訳日:2023-11-28 00:30:21 公開日:2023-11-23
# 100個以上のRydberg原子のキング格子上の最大独立集合問題の量子計算データセット Quantum Computing Dataset of Maximum Independent Set Problem on King's Lattice of over Hundred Rydberg Atoms ( http://arxiv.org/abs/2311.13803v1 ) ライセンス: Link先を確認	Kangheun Kim, Minhyuk Kim, Juyoung Park, Andrew Byun, Jaewook Ahn	(参考訳) 大規模グラフの最大独立集合(MIS)を見つけることは、古典計算では効率的に解けない非決定論的多項式時間(NP)完全問題である。ここでは、キング格子上にランダムに配置された最大141個の原子のMIS問題を解くために行われたRydberg原子実験の量子断熱計算データを示す。 733,853個の異なるグラフの実験的MIS解について、合計582,916回のRydberg原子測定イベントが収集された。生の画像データとともに、測定された多体基底状態の全二値判定結果と分類済みのグラフデータを提供し、Rydberg原子アプローチの性能検証とシステム改善のためのベンチマークテストおよび高度なデータ駆動解析を可能にする。 Finding the maximum independent set (MIS) of a large-size graph is a nondeterministic polynomial-time (NP)-complete problem not efficiently solvable with classical computations. Here, we present a set of quantum adiabatic computing data of Rydberg-atom experiments performed to solve the MIS problem of up to 141 atoms randomly arranged on the King's lattice. A total of 582,916 events of Rydberg-atom measurements are collected for experimental MIS solutions of 733,853 different graphs. We provide the raw image data along with the entire binary determinations of the measured many-body ground states and the classified graph data, to offer benchmark testing and advanced data-driven analyses for validation of the performance and system improvements of the Rydberg-atom approach.	翻訳日:2023-11-28 00:29:59 公開日:2023-11-23
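To make the underlying combinatorial problem concrete: assuming, as is common in Rydberg-blockade MIS experiments, that the blockade radius covers nearest and diagonal neighbours, each atom arrangement on the King's lattice defines a graph whose edges join occupied sites at Chebyshev distance 1. The brute-force solver below is a classical reference for tiny instances only (it is exponential in the number of atoms) and is not the quantum adiabatic procedure used in the experiments; all function names are illustrative.

```python
from itertools import combinations


def kings_graph(sites):
    """Edges between occupied lattice sites that are king-move neighbours
    (Chebyshev distance 1). Edges are index pairs (a, b) with a < b."""
    sites = list(sites)
    edges = set()
    for a, b in combinations(range(len(sites)), 2):
        (x1, y1), (x2, y2) = sites[a], sites[b]
        if max(abs(x1 - x2), abs(y1 - y2)) == 1:
            edges.add((a, b))
    return sites, edges


def is_independent(subset, edges):
    """True if no two chosen sites are neighbours (no two atoms both excited)."""
    return all((a, b) not in edges for a, b in combinations(sorted(subset), 2))


def max_independent_set(sites):
    """Exhaustive search from the largest subset size downwards."""
    sites, edges = kings_graph(sites)
    n = len(sites)
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if is_independent(subset, edges):
                return [sites[i] for i in subset]
    return []


if __name__ == "__main__":
    full_3x3 = [(x, y) for x in range(3) for y in range(3)]
    print(max_independent_set(full_3x3))
```

On a fully occupied 3x3 patch the only maximum independent set is the four corners; the randomly filled lattices in the dataset remove sites, which is what makes each graph instance different.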
# 機械学習応用のためのサブスペースへの投影による最適輸送の活用 Leveraging Optimal Transport via Projections on Subspaces for Machine Learning Applications ( http://arxiv.org/abs/2311.13883v1 ) ライセンス: Link先を確認	Clément Bonet	(参考訳) 最適輸送は、基礎となる空間の幾何学を利用して確率分布を比較することができるため、機械学習において多くの注目を集めている。しかし、元々の定式化では、この問題を解くにはかなりの計算負荷がかかる。したがって、有意義な研究の流れは、その性質を享受しつつ、この負担を軽減する代替手法を提案することにある。本論文では、部分空間上の射影を用いる代替手法に焦点を当てる。そのような代替手法の代表はスライス・ワッサーシュタイン(Sliced-Wasserstein)距離であり、まず、近年そのような空間の利用が有益であることが示されている機械学習の応用で用いるために、この距離をリーマン多様体に拡張することを提案する。また,いわゆる不均衡OT問題における正測度間のスライス距離についても検討する。確率測度間の元のユークリッド空間上のスライス・ワッサーシュタイン距離に戻り、通常のワッサーシュタイン距離の代わりにこの距離を空間に与えたときの勾配流のダイナミクスを研究する。次に、計量空間における内積の一般化であるブーゼマン関数の、確率測度の空間における利用について検討する。最後に、Gromov-Wasserstein 距離を用いて、部分空間迂回(subspace detour)アプローチを比較不能な空間に拡張する。 Optimal Transport has received much attention in Machine Learning as it allows one to compare probability distributions by exploiting the geometry of the underlying space. However, in its original formulation, solving this problem suffers from a significant computational burden. Thus, a meaningful line of work consists in proposing alternatives to reduce this burden while still enjoying its properties. In this thesis, we focus on alternatives which use projections on subspaces. The main such alternative is the Sliced-Wasserstein distance, which we first propose to extend to Riemannian manifolds in order to use it in Machine Learning applications for which using such spaces has been shown to be beneficial in recent years. We also study sliced distances between positive measures in the so-called unbalanced OT problem. Back to the original Euclidean Sliced-Wasserstein distance between probability measures, we study the dynamics of gradient flows when endowing the space with this distance in place of the usual Wasserstein distance. Then, we investigate the use of the Busemann function, a generalization of the inner product in metric spaces, in the space of probability measures. 
Finally, we extend the subspace detour approach to incomparable spaces using the Gromov-Wasserstein distance.	翻訳日:2023-11-28 00:22:33 公開日:2023-11-23
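The Sliced-Wasserstein distance mentioned in this entry replaces one d-dimensional OT problem with many 1-D ones, each solved exactly by sorting. A minimal Monte Carlo sketch for two equal-size point clouds in Euclidean space is shown below; the function names and the number of projections are illustrative choices, and the thesis's Riemannian and unbalanced variants are not covered.

```python
import math
import random


def random_direction(dim, rng):
    """Uniform direction on the unit sphere via a normalised Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def wasserstein_1d(u, v, p=2):
    """W_p^p between two 1-D empirical measures with the same number of atoms:
    the optimal coupling simply matches sorted samples."""
    u, v = sorted(u), sorted(v)
    return sum(abs(a - b) ** p for a, b in zip(u, v)) / len(u)


def sliced_wasserstein(xs, ys, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of SW_p between point clouds xs and ys (lists of d-dim points)."""
    assert len(xs) == len(ys), "this sketch assumes equal sample sizes"
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        theta = random_direction(dim, rng)
        proj_x = [sum(a * t for a, t in zip(x, theta)) for x in xs]
        proj_y = [sum(a * t for a, t in zip(y, theta)) for y in ys]
        total += wasserstein_1d(proj_x, proj_y, p)
    return (total / n_projections) ** (1.0 / p)
```

Each projection costs only a sort, O(n log n), instead of solving a full OT problem; the price is Monte Carlo error that shrinks as the number of projections grows. For a pure translation by a vector c in d dimensions, SW_2 concentrates around |c|/sqrt(d).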
# GDPRに基づくDPAのAI対応完全性チェックに関するマルチソリューション研究 A Multi-solution Study on GDPR AI-enabled Completeness Checking of DPAs ( http://arxiv.org/abs/2311.13881v1 ) ライセンス: Link先を確認	Muhammad Ilyas Azeem and Sallam Abualhaija	(参考訳) 適用される規制への準拠を確実にするためにソフトウェアシステムの法的要件を規定することは、要件エンジニアリング(RE)における主要な関心事である。組織によって収集される個人データは、特定の処理活動を行うために他の組織と共有されることが多い。このような場合、GDPR(General Data Protection Regulation)は、データ処理を規制し、個人データが保護され続けることを保証するデータ処理契約(DPA)の締結を求めている。 GDPRに違反すると、数十億ユーロに達する巨額の罰金が科される可能性がある。個人データ処理を含むソフトウェアシステムは、GDPRで規定されDPAに概説された法的義務に従わなければならない。要件エンジニアは、DPAから、ソフトウェアシステム内のデータ処理活動を規制するための法的要件を引き出すことができる。したがって、GDPRの規定に従ってDPAの完全性を確認することは、引き出された要件が完全であることを保証するための必須条件である。 DPAを完全に手動で分析するのは時間がかかり、適切な法的専門知識を必要とする。本稿では,GDPRに対するDPAの完全性チェックに対処する自動化戦略を提案する。具体的には,従来の機械学習,ディープラーニング,言語モデリング,少数ショット学習といった異なる技術によって実現される10の代替ソリューションを追求する。本研究の目標は、これらの異なる技術が法的領域においてどの程度機能するかを実証的に調べることである。30件の実際のDPAでF2スコアを算出した。評価の結果、最も性能の高いソリューションは事前学習済みのBERTおよびRoBERTa言語モデルに基づくもので、F2スコアは86.7%と89.7%であった。さらに分析の結果、ディープラーニング(例:BiLSTM)や少数ショット学習(例:SetFit)に基づく他の代替ソリューションは、同等の精度を達成しつつ、より効率的に開発できることが示された。 Specifying legal requirements for software systems to ensure their compliance with the applicable regulations is a major concern in requirements engineering (RE). Personal data which is collected by an organization is often shared with other organizations to perform certain processing activities. In such cases, the General Data Protection Regulation (GDPR) requires issuing a data processing agreement (DPA) which regulates the processing and further ensures that personal data remains protected. Violating GDPR can lead to huge fines reaching billions of euros. Software systems involving personal data processing must adhere to the legal obligations stipulated in GDPR and outlined in DPAs. Requirements engineers can elicit from DPAs legal requirements for regulating the data processing activities in software systems. 
Checking the completeness of a DPA according to the GDPR provisions is therefore an essential prerequisite to ensure that the elicited requirements are complete. Analyzing DPAs entirely manually is time-consuming and requires adequate legal expertise. In this paper, we propose an automation strategy to address the completeness checking of DPAs against GDPR. Specifically, we pursue ten alternative solutions which are enabled by different technologies, namely traditional machine learning, deep learning, language modeling, and few-shot learning. The goal of our work is to empirically examine how these different technologies fare in the legal domain. We computed the F2 score on a set of 30 real DPAs. Our evaluation shows that the best-performing solutions, which yield F2 scores of 86.7% and 89.7%, are based on pre-trained BERT and RoBERTa language models. Our analysis further shows that other alternative solutions based on deep learning (e.g., BiLSTM) and few-shot learning (e.g., SetFit) can achieve comparable accuracy, yet are more efficient to develop.	翻訳日:2023-11-28 00:22:17 公開日:2023-11-23
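The evaluation in this entry reports the F2 score, the F-beta measure with beta = 2, which weights recall more heavily than precision. For completeness checking, a plausible reading (an assumption here, not stated in the abstract) is that missing a violation costs more than raising a false alarm. A small sketch of the formula F_beta = (1 + beta²)PR / (beta²P + R), with illustrative function names:

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta > 1 weights recall more heavily than precision (beta=2 gives F2)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def f2_from_counts(tp, fp, fn):
    """F2 from raw counts of true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return f_beta(precision, recall, beta=2.0)
```

Swapping precision and recall shows the asymmetry: with P = 1.0 and R = 0.5, F2 is about 0.556, while with P = 0.5 and R = 1.0 it is about 0.833.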
# PointPCA+: PointPCA客観的品質評価尺度の拡張 PointPCA+: Extending PointPCA objective quality assessment metric ( http://arxiv.org/abs/2311.13880v1 ) ライセンス: Link先を確認	Xuemei Zhou, Evangelos Alexiou, Irene Viola, Pablo Cesar	(参考訳) 本稿では,PointPCAの拡張であるPointPCA+という,計算を簡略化し記述子を拡充した点群品質評価(PCQA)指標を提案する。PointPCAは,完全参照PCQAのために,点群の幾何データとテクスチャデータの両方に適用したPCA分解に基づく知覚的に関連する記述子セットを提案した。 PointPCA+は幾何データにのみPCAを使用し、既存の幾何記述子とテクスチャ記述子を拡充しつつ、それらをより効率的に計算する。 PointPCAと同様に、局所的な形状と外観特性をそれぞれ捉える幾何記述子とテクスチャ記述子からの個々の予測を学習ベースで融合することで、総合品質スコアが得られる。特徴融合の前に、提案されたスーパーセットから最も効果的な特徴を選択するために、特徴選択モジュールが導入される。実験結果から,PointPCA+は,公開データセットから得られた主観的な正解スコアに対して高い予測性能を示した。コードは https://github.com/cwi-dis/pointpca_suite/ で入手できる。 A computationally-simplified and descriptor-richer Point Cloud Quality Assessment (PCQA) metric, namely PointPCA+, is proposed in this paper, which is an extension of PointPCA. PointPCA proposed a set of perceptually-relevant descriptors based on PCA decomposition that were applied to both the geometry and texture data of point clouds for full-reference PCQA. PointPCA+ employs PCA only on the geometry data while enriching existing geometry and texture descriptors, which are computed more efficiently. Similarly to PointPCA, a total quality score is obtained through a learning-based fusion of individual predictions from geometry and texture descriptors that capture local shape and appearance properties, respectively. Before feature fusion, a feature selection module is introduced to choose the most effective features from a proposed super-set. Experimental results show that PointPCA+ achieves high predictive performance against subjective ground truth scores obtained from publicly available datasets. The code is available at https://github.com/cwi-dis/pointpca_suite/.	翻訳日:2023-11-28 00:21:50 公開日:2023-11-23
# 時空オントロジーの相対性:空間における相関が時間における相関項(correlata)になるとき Relativity of spacetime ontology: When correlations in space become correlata in time ( http://arxiv.org/abs/2311.13879v1 ) ライセンス: Link先を確認	Marek Czachor and Marcin Nowakowski	(参考訳) 「相関は物理的実在性を持つが、相関しているもの(correlata)はそうではない」というマーミンの見解に異議を唱え、我々は相関と相関項(correlata)は根本的に異なるものではないと主張する。これらは部分系を定義するテンソル積分解に依存する双対概念である。同じ量子状態は、あるテンソル積構造に関しては絡み合い、別のテンソル積構造に関しては分離可能であり得るため、ある文脈における空間的相関は別の文脈における時間的相関項となり得、その逆も成り立つ。結果として、$V\otimes V$ の下で不変な2量子ビット状態は、絡み合っていることも絡み合っていないこともあり得る。これは一重項状態に関するよく知られた一意性定理と矛盾し、量子測定理論に影響を及ぼし得る事実である。 Challenging Mermin's perspective that "correlations have physical reality; that which they correlate does not," we argue that correlations and correlata are not fundamentally distinct. These are dual concepts depending on the tensor product decomposition defining subsystems. Since the same quantum state may be entangled with respect to one tensor product structure and separable with respect to another, a spatial correlation in one context can become a temporal correlatum in another, and vice versa. In consequence, 2-qubit states invariant under $V\otimes V$ can be either entangled or unentangled, in conflict with the well-known uniqueness theorem about the singlet state, a fact with possible implications for the quantum measurement theory.	翻訳日:2023-11-28 00:21:32 公開日:2023-11-23
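For reference, the textbook identity behind the singlet-state uniqueness theorem cited in this abstract can be checked directly, in the standard tensor product structure. Writing an arbitrary single-qubit matrix as $V=\begin{pmatrix}a&b\\c&d\end{pmatrix}$ (notation introduced here for illustration, not taken from the paper), the $|00\rangle$ and $|11\rangle$ terms cancel and only the determinant survives:

```latex
\begin{aligned}
|\psi^-\rangle &= \tfrac{1}{\sqrt{2}}\bigl(|01\rangle - |10\rangle\bigr),
\qquad V|0\rangle = a|0\rangle + c|1\rangle,\quad V|1\rangle = b|0\rangle + d|1\rangle,\\
(V\otimes V)\bigl(|01\rangle - |10\rangle\bigr)
 &= (a|0\rangle + c|1\rangle)\otimes(b|0\rangle + d|1\rangle)
  - (b|0\rangle + d|1\rangle)\otimes(a|0\rangle + c|1\rangle)\\
 &= (ad - bc)\,|01\rangle + (cb - da)\,|10\rangle
  = \det(V)\,\bigl(|01\rangle - |10\rangle\bigr),\\
\Longrightarrow\quad (V\otimes V)\,|\psi^-\rangle &= \det(V)\,|\psi^-\rangle .
\end{aligned}
```

For unitary $V$, $\det(V)$ is a phase, so the singlet is invariant up to a global phase under every $V\otimes V$ (exactly invariant for $V\in SU(2)$). The authors' claim is that this conclusion depends on which tensor product structure is used to define the two qubits.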
# 大規模言語モデルにおける事実の不整合と幻覚の最小化 Minimizing Factual Inconsistency and Hallucination in Large Language Models ( http://arxiv.org/abs/2311.13878v1 ) ライセンス: Link先を確認	Muneeswaran I, Shreya Saxena, Siva Prasad, M V Sai Prakash, Advaith Shankar, Varun V, Vishal Vaddina, Saisubramaniam Gopalakrishnan	(参考訳) 大規模言語モデル(LLM)は、様々な言語関連タスクにおける顕著な能力により、医療、教育、金融といった重要な分野で広く使われている。しかし、LLMは事実として不正確な応答、すなわち「幻覚」を生じやすく、ユーザーからの信頼性と信用の喪失につながり得る。この問題に対処するため,我々は,まず根拠を生成し,誤った根拠を検証・修正し,それらを回答生成のための参照として使用する多段階フレームワークを提案する。生成された根拠は回答の透明性を高め、本フレームワークは、この根拠とコンテキストへの参照を用いることで、モデルがどのようにこの回答に到達したかに関する洞察を提供する。本稿では,生命科学産業における薬物関連の問い合わせに対する回答の質の向上に有効であることを示す。本フレームワークは従来の検索拡張生成(RAG)を改善し、2つのデータセットにおいてOpenAI GPT-3.5-turboの忠実性を14～25%、精度を16～22%向上させた。さらに,本フレームワークに基づくサンプルでの微調整により,小規模なオープンアクセスLLMの精度が33～42%向上し,商用モデルでのRAGに匹敵する。 Large Language Models (LLMs) are widely used in critical fields such as healthcare, education, and finance due to their remarkable proficiency in various language-related tasks. However, LLMs are prone to generating factually incorrect responses or "hallucinations," which can lead to a loss of credibility and trust among users. To address this issue, we propose a multi-stage framework that generates the rationale first, verifies and refines incorrect ones, and uses them as supporting references to generate the answer. The generated rationale enhances the transparency of the answer and our framework provides insights into how the model arrived at this answer, by using this rationale and the references to the context. In this paper, we demonstrate its effectiveness in improving the quality of responses to drug-related inquiries in the life sciences industry. Our framework improves traditional Retrieval Augmented Generation (RAG) by enabling OpenAI GPT-3.5-turbo to be 14-25% more faithful and 16-22% more accurate on two datasets. 
Furthermore, fine-tuning on samples generated with our framework improves the accuracy of smaller open-access LLMs by 33-42% and makes them competitive with RAG on commercial models.	翻訳日:2023-11-28 00:21:17 公開日:2023-11-23
# 動的ステップスケジューリングのための局所最適降下 Locally Optimal Descent for Dynamic Stepsize Scheduling ( http://arxiv.org/abs/2311.13877v1 ) ライセンス: Link先を確認	Gilad Yehudai, Alon Cohen, Amit Daniely, Yoel Drori, Tomer Koren, Mariano Schain	(参考訳) 本稿では,実際には手作業で時間のかかるスケジュールのチューニングを簡略化することを目的として,理論に基づく新しい動的学習率スケジューリング手法を提案する。本手法は,局所最適なステップサイズを推定することに基づいており,現在のステップの確率勾配の方向における最大降下を保証する。まず,滑らかな非凸確率最適化の文脈において,滑らか性パラメータの知識のみを仮定しつつ,最先端の境界に匹敵する理論的収束境界を確立する。次に,アルゴリズムの実用的な実装を示し,多種多様なデータセットと最適化アルゴリズムにまたがる系統的実験を行い,既存の最先端の学習率スケジューラと比較する。提案手法は,既存の手法と比較して最小限のチューニングしか必要とせず,補助的な手動スケジュールやウォームアップフェーズを不要とし,パラメータチューニングを劇的に削減しつつ同等の性能を達成できることを示す。 We introduce a novel dynamic learning-rate scheduling scheme grounded in theory with the goal of simplifying the manual and time-consuming tuning of schedules in practice. Our approach is based on estimating the locally-optimal stepsize, guaranteeing maximal descent in the direction of the stochastic gradient of the current step. We first establish theoretical convergence bounds for our method within the context of smooth non-convex stochastic optimization, matching state-of-the-art bounds while only assuming knowledge of the smoothness parameter. We then present a practical implementation of our algorithm and conduct systematic experiments across diverse datasets and optimization algorithms, comparing our scheme with existing state-of-the-art learning-rate schedulers. Our findings indicate that our method needs minimal tuning when compared to existing approaches, removing the need for auxiliary manual schedules and warm-up phases and achieving comparable performance with drastically reduced parameter tuning.	翻訳日:2023-11-28 00:20:55 公開日:2023-11-23
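The idea of a "locally optimal stepsize" along the gradient direction can be illustrated in the simplest deterministic setting. For a quadratic f(x) = ½xᵀAx − bᵀx, the step minimising f(x − ηg) is exactly η* = gᵀg / gᵀAg. For a general L-smooth f, the descent lemma f(x − ηg) ≤ f(x) − η(1 − Lη/2)‖g‖² gives a guaranteed decrease that is largest at η = 1/L. The sketch below is only this classical exact line search; it is not the paper's estimator, which works with stochastic gradients, and the helper names are illustrative.

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def quadratic_grad(A, b, x):
    """Gradient of f(x) = 0.5 * x^T A x - b^T x (A symmetric positive definite)."""
    return [ax - bi for ax, bi in zip(matvec(A, x), b)]


def locally_optimal_step(A, g):
    """Step size eta minimising f(x - eta * g) along the gradient direction: g.g / g.A.g."""
    return dot(g, g) / dot(g, matvec(A, g))


def steepest_descent(A, b, x, steps=50):
    """Gradient descent where every step uses the locally optimal step size."""
    for _ in range(steps):
        g = quadratic_grad(A, b, x)
        if dot(g, g) < 1e-20:  # already at the minimiser; avoid 0/0
            break
        eta = locally_optimal_step(A, g)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x
```

With an isotropic A a single step lands on the minimiser; with an ill-conditioned A the iterates zig-zag but still converge, which is the behaviour a fixed hand-tuned learning rate would otherwise have to be tuned around.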
# Denoising Autoencoders を用いた失読症検出のための脳波接続性解析 EEG Connectivity Analysis Using Denoising Autoencoders for the Detection of Dyslexia ( http://arxiv.org/abs/2311.13876v1 ) ライセンス: Link先を確認	Francisco Jesus Martinez-Murcia, Andrés Ortiz, Juan Manuel Górriz, Javier Ramírez, Pedro Javier Lopez-Perez, Miguel López-Zamora, Juan Luis Luque	(参考訳) テンポラルサンプリングフレームワーク(TSF)は、失読症に特徴的な音韻的困難は、1つ以上の時間レートにおける非定型的な振動サンプリングによって引き起こされると理論化している。 LEEDUCA研究は、低速リズムの韻律(0.5-1 Hz)、音節(4-8 Hz)、音素(12-40 Hz)のレートで振幅変調(AM)ノイズを聴く小児を対象に一連の脳波(EEG)実験を行い、失読症に関連し得る振動サンプリングの知覚の違いを検出することを目的とした。本研究の目的は、これらの違いが存在するかどうか、また、失読症の検出に一般的に用いられるさまざまな言語・認知課題における子どもの成績とどのように関連するかを確認することである。この目的のために、時間的およびスペクトル的なチャネル間EEG接続性を推定し、接続行列の低次元表現を学習するためにデノイジングオートエンコーダ(DAE)を訓練した。この表現を相関分析と分類分析によって調べたところ、0.8を超える精度と約0.7のバランス精度で失読症の被験者を検出できることが明らかになった。 DAE表現のいくつかの特徴は、音韻認識や高速シンボル呼称といった音韻仮説カテゴリーの言語・認知課題、ならびに読字効率と読解における子どもの成績と有意に相関した($p<0.005$)。最後に,隣接行列のより詳細な解析により,DD被験者では側頭葉(おおよそ一次聴覚野)の電極間の両側結合が減少し,おおよそブローカ野に位置するF7電極の接続性が増加していることが明らかになった。これらの結果は、脳波などのより客観的な方法論を用いた失読症の補完的評価への道を開くものである。 The Temporal Sampling Framework (TSF) theorizes that the characteristic phonological difficulties of dyslexia are caused by an atypical oscillatory sampling at one or more temporal rates. The LEEDUCA study conducted a series of Electroencephalography (EEG) experiments on children listening to amplitude-modulated (AM) noise at slow-rhythmic prosodic (0.5-1 Hz), syllabic (4-8 Hz), or phoneme (12-40 Hz) rates, aimed at detecting differences in perception of oscillatory sampling that could be associated with dyslexia. The purpose of this work is to check whether these differences exist and how they are related to children's performance in different language and cognitive tasks commonly used to detect dyslexia. To this end, temporal and spectral inter-channel EEG connectivity was estimated, and a denoising autoencoder (DAE) was trained to learn a low-dimensional representation of the connectivity matrices. 
This representation was studied via correlation and classification analysis, which revealed the ability to detect dyslexic subjects with an accuracy higher than 0.8 and a balanced accuracy of around 0.7. Some features of the DAE representation were significantly correlated ($p<0.005$) with children's performance in language and cognitive tasks of the phonological hypothesis category such as phonological awareness and rapid symbolic naming, as well as reading efficiency and reading comprehension. Finally, a deeper analysis of the adjacency matrix revealed a reduced bilateral connection between electrodes of the temporal lobe (roughly the primary auditory cortex) in developmental dyslexia (DD) subjects, as well as an increased connectivity of the F7 electrode, placed roughly on Broca's area. These results pave the way for a complementary assessment of dyslexia using more objective methodologies such as EEG.	翻訳日:2023-11-28 00:20:38 公開日:2023-11-23
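As a rough illustration of the "temporal inter-channel connectivity matrix" that this entry feeds to the autoencoder, the sketch below uses plain Pearson correlation between channel time series. This is an assumed stand-in: the abstract does not say which temporal or spectral connectivity measure was used, and the function names are hypothetical. The upper triangle of the symmetric matrix is a typical flat input vector for an autoencoder.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def connectivity_matrix(channels):
    """Symmetric channel-by-channel matrix of Pearson correlations."""
    n = len(channels)
    return [[1.0 if i == j else pearson(channels[i], channels[j]) for j in range(n)]
            for i in range(n)]


def flatten_upper(matrix):
    """Upper-triangular entries (excluding the diagonal) as a flat feature vector."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]
```

For C channels the flattened vector has C(C-1)/2 entries, which is why a low-dimensional learned representation is attractive for high-density EEG montages.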
# 法的要件分析 Legal Requirements Analysis ( http://arxiv.org/abs/2311.13871v1 ) ライセンス: Link先を確認	Sallam Abualhaija, Marcello Ceci, Lionel Briand	(参考訳) 現代のソフトウェアは多くの分野やアプリケーションコンテキストにおいて日常的な活動の不可欠な部分となっている。人工知能(AI)を活用したインテリジェントオートメーションの導入は、多くの分野でブレークスルーにつながった。 AIの有効性は、データの可用性の増加など、いくつかの要因に起因すると考えられる。欧州連合(EU)におけるGDPR(General Data Protection Regulation)などの規制は、個人データの保護を保証するために導入されている。個人データを収集、処理、共有するソフトウェアシステムは、そのような規制への準拠が求められる。準拠したソフトウェアの開発は、適用される規制に規定された法的要件への対処に大きく依存しており、これはソフトウェア開発プロセスの要件工学(RE)フェーズにおける中心的な活動である。 REは、法的要件を含め、開発対象システム(system-to-be)の要件を特定し維持することに関わる。個人データ処理のために組織が実施する方針を記述した法的合意は、規制に加えて、法的要件を引き出すための追加の情報源となり得る。本章では、法的要件を分析するための様々な方法を検討し、GDPRを例にそれらを示す。具体的には、規制から機械分析可能な表現を作成するための選択肢について述べ、規制に対するコンプライアンス検証を可能にする既存の自動化手段を調査し、さらに法的要件分析の現在の課題について考察する。 Modern software has been an integral part of everyday activities in many disciplines and application contexts. Introducing intelligent automation by leveraging artificial intelligence (AI) has led to breakthroughs in many fields. The effectiveness of AI can be attributed to several factors, among which is the increasing availability of data. Regulations such as the General Data Protection Regulation (GDPR) in the European Union (EU) have been introduced to ensure the protection of personal data. Software systems that collect, process, or share personal data are subject to compliance with such regulations. Developing compliant software depends heavily on addressing legal requirements stipulated in applicable regulations, a central activity in the requirements engineering (RE) phase of the software development process. RE is concerned with specifying and maintaining requirements of a system-to-be, including legal requirements. Legal agreements that describe the policies organizations implement for processing personal data can provide an additional source, besides regulations, for eliciting legal requirements. In this chapter, we explore a variety of methods for analyzing legal requirements and exemplify them on GDPR. 
Specifically, we describe possible alternatives for creating machine-analyzable representations from regulations, survey the existing automated means for enabling compliance verification against regulations, and further reflect on the current challenges of legal requirements analysis.	翻訳日:2023-11-28 00:20:00 公開日:2023-11-23
# L(M)V-IQL:動物行動評価のための複数意図逆強化学習 L(M)V-IQL: Multiple Intention Inverse Reinforcement Learning for Animal Behavior Characterization ( http://arxiv.org/abs/2311.13870v1 ) ライセンス: Link先を確認	Hao Zhu, Brice De La Crompe, Gabriel Kalweit, Artur Schneider, Maria Kalweit, Ilka Diester, Joschka Boedecker	(参考訳) 意思決定プロセスの理解を深めるうえで、数理モデル、特に逆強化学習(Inverse Reinforcement Learning、IRL)は、複雑な行動の中から動物の複数の意図を再構築するのに有用であることが証明されている。近年,連続時間マルチインテンションIRLフレームワークが開発されたことを受け,マルチインテンションIRLアプローチによって離散的な時間変動報酬関数を推定することについての探究が続けられている。この課題に対処するために、離散的な内発的報酬を扱うよう設計された新しいIRLフレームワークであるL(M)V-IQLアルゴリズム(Latent (Markov) Variable Inverse Q-learning)を導入する。期待値最大化法を活用し,観測された軌跡を異なる意図にクラスタリングし,それぞれについて独立にIRL問題を解く。シミュレーション実験と実際の複数のマウス行動データセットへの適用によってL(M)V-IQLの有効性を示し,本手法は動物行動予測において現在のベンチマークを上回り,解釈可能な報酬関数を生成する。この進歩は神経科学と心理学にとって有望であり、動物の意思決定のより深い理解と、その基礎にある脳のメカニズムの解明に貢献する。 In advancing the understanding of decision-making processes, mathematical models, particularly Inverse Reinforcement Learning (IRL), have proven instrumental in reconstructing animals' multiple intentions amidst complex behaviors. Given the recent development of a continuous-time multi-intention IRL framework, there has been persistent inquiry into inferring discrete time-varying reward functions with multiple intention IRL approaches. To tackle the challenge, we introduce the Latent (Markov) Variable Inverse Q-learning (L(M)V-IQL) algorithms, a novel IRL framework tailored for accommodating discrete intrinsic rewards. Leveraging an Expectation-Maximization approach, we cluster observed trajectories into distinct intentions and independently solve the IRL problem for each. Demonstrating the efficacy of L(M)V-IQL through simulated experiments and its application to different real mouse behavior datasets, our approach surpasses current benchmarks in animal behavior prediction, producing interpretable reward functions. This advancement holds promise for neuroscience and psychology, contributing to a deeper understanding of animal decision-making and uncovering underlying brain mechanisms.	翻訳日:2023-11-28 00:19:43 公開日:2023-11-23
# 言語誘導による少数ショット意味セグメンテーション Language-guided Few-shot Semantic Segmentation ( http://arxiv.org/abs/2311.13865v1 ) ライセンス: Link先を確認	Jing Wang, Yuang Liu, Qiang Zhou, Fan Wang	(参考訳) 少数ショット学習は、小規模で適切にラベル付けされたサポートセットのガイダンスにより、新しいカテゴリへの適応におけるラベルコストを削減する有望な方法である。しかし、少数ショットセマンティックセグメンテーションでは、サポート画像のピクセルレベルのアノテーションは依然として高価である。本稿では,言語情報,すなわち画像レベルのテキストラベルのみを用いて,少数ショットセマンティックセグメンテーションの課題に取り組む革新的な解決法を提案する。提案手法では,視覚言語事前学習(VLP)モデルとマスクリファイナを含む視覚言語駆動型マスク蒸留方式を用いて,テキストプロンプトから高品質な擬似セマンティックマスクを生成する。さらに,サポート画像とクエリ画像間の正確な意味関係を探索するようモデルを導くため,分散プロトタイプ監督手法と補完相関マッチングモジュールを導入する。 2つのベンチマークデータセットにおける実験により,本手法は言語誘導少数ショットセマンティックセグメンテーションの新しいベースラインを確立し,近年の視覚誘導法と競合する結果を達成することが示された。 Few-shot learning is a promising way to reduce the label cost of adapting to new categories, guided by a small, well-labeled support set. But for few-shot semantic segmentation, the pixel-level annotations of support images are still expensive. In this paper, we propose an innovative solution to tackle the challenge of few-shot semantic segmentation using only language information, i.e., image-level text labels. Our approach involves a vision-language-driven mask distillation scheme, which contains a vision-language pretraining (VLP) model and a mask refiner, to generate high-quality pseudo-semantic masks from text prompts. We additionally introduce a distributed prototype supervision method and a complementary correlation matching module to guide the model in mining precise semantic relations between support and query images. The experiments on two benchmark datasets demonstrate that our method establishes a new baseline for language-guided few-shot semantic segmentation and achieves results competitive with recent vision-guided methods.	翻訳日:2023-11-28 00:19:20 公開日:2023-11-23
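For readers new to prototype-based few-shot segmentation, the basic matching step can be sketched as follows: a class prototype is the masked average of support-image features, and each query pixel is labelled by its cosine similarity to that prototype. This is a common baseline, not the paper's distributed prototype supervision or complementary correlation matching. In the paper the support mask would be the VLP-generated pseudo-mask; here it is simply given, and the per-pixel feature lists, threshold, and function names are illustrative assumptions.

```python
import math


def masked_average_pooling(features, mask):
    """Prototype = mean feature vector over pixels where the (pseudo-)mask is 1.
    features: list of per-pixel feature vectors; mask: list of 0/1 per pixel."""
    selected = [f for f, m in zip(features, mask) if m]
    if not selected:
        raise ValueError("mask selects no pixels")
    dim = len(selected[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def segment_query(query_features, prototype, threshold=0.5):
    """Foreground (1) where a query pixel's cosine similarity to the prototype exceeds threshold."""
    return [1 if cosine(f, prototype) > threshold else 0 for f in query_features]
```

Because the prototype averages features only inside the mask, noisy pseudo-masks from text prompts directly contaminate it, which motivates the mask refinement step described in the abstract.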
# ファンド投資の意思決定において何が最も重要か? 多粒度グラフ分離学習フレームワーク Which Matters Most in Making Fund Investment Decisions? A Multi-granularity Graph Disentangled Learning Framework ( http://arxiv.org/abs/2311.13864v1 ) ライセンス: Link先を確認	Chunjing Gan, Binbin Hu, Bo Huang, Tianyu Zhao, Yingru Lin, Wenliang Zhong, Zhiqiang Zhang, Jun Zhou, Chuan Shi	(参考訳) 本稿では、ファンド投資の意思決定において、個人的関心に加えて同調性とリスク選好の双方が重要であることを強調し、これらの側面を分離(disentangled)した形で同時に特徴付けることを目指す。そこで我々は,ファンド投資商品の知的マッチングを効果的に行うため,MGDLと呼ばれる新しい多粒度グラフ分離学習(Multi-granularity Graph Disentangled Learning)フレームワークを開発した。十分に構築されたファンドグラフとアテンションモジュールを活用し、過去の行動から多粒度のユーザ表現を導出して、個人的関心、同調性、リスク選好をきめ細かく個別に表現する。特定の意味を持つより強く分離された表現を得るため、MGDLは2つの自己教師信号、すなわちファンドタイプに基づくコントラストとファンドの人気度を明示的に取り入れる。オフラインおよびオンライン環境での大規模な実験によりMGDLの有効性が検証された。 In this paper, we highlight that both conformity and risk preference matter in making fund investment decisions beyond personal interest and seek to jointly characterize these aspects in a disentangled manner. Consequently, we develop a novel Multi-granularity Graph Disentangled Learning framework named MGDL to effectively perform intelligent matching of fund investment products. Benefiting from the well-established fund graph and the attention module, multi-granularity user representations are derived from historical behaviors to separately express personal interest, conformity and risk preference in a fine-grained way. To attain stronger disentangled representations with specific semantics, MGDL explicitly involves two self-supervised signals, i.e., fund-type-based contrasts and fund popularity. Extensive experiments in offline and online environments verify the effectiveness of MGDL.	翻訳日:2023-11-28 00:19:03 公開日:2023-11-23
# メンタルヘルスカウンセリングにおける大規模言語モデルの課題 Challenges of Large Language Models for Mental Health Counseling ( http://arxiv.org/abs/2311.13857v1 ) ライセンス: Link先を確認	Neo Christopher Chung, George Dyer, Lennart Brocki	(参考訳) 世界的メンタルヘルス危機は、精神障害の急速な増加、限られた資源、治療を求める社会的汚名を伴っている。近年、人工知能(AI)の分野が顕著な進歩を見せているため、人間のような文章を理解・生成できる大規模言語モデル(LLM)が心理カウンセリングの支援や提供に利用できる可能性がある。しかし、精神保健領域におけるLLMの適用は、提供された情報の正確性、有効性、信頼性に関する懸念を提起する。本稿では, モデル幻覚, 解釈可能性, バイアス, プライバシ, 臨床効果など, 心理カウンセリングのためのLLMの開発に伴う主要な課題について検討する。我々は、現在のAIパラダイムに適用可能な、これらの課題に対する実用的な解決策を探る。メンタルヘルスのためのLLMの開発とデプロイの経験から、LLMの落とし穴を慎重に回避し克服できれば、AIはメンタルヘルスケアを改善する大きな可能性を秘めていると考える。 The global mental health crisis is looming with a rapid increase in mental disorders, limited resources, and the social stigma of seeking treatment. As the field of artificial intelligence (AI) has witnessed significant advancements in recent years, large language models (LLMs) capable of understanding and generating human-like text may be used in supporting or providing psychological counseling. However, the application of LLMs in the mental health domain raises concerns regarding the accuracy, effectiveness, and reliability of the information provided. This paper investigates the major challenges associated with the development of LLMs for psychological counseling, including model hallucination, interpretability, bias, privacy, and clinical effectiveness. We explore potential solutions to these challenges that are practical and applicable to the current paradigm of AI. From our experience in developing and deploying LLMs for mental health, AI holds great promise for improving mental health care, if we can carefully navigate and overcome pitfalls of LLMs.	翻訳日:2023-11-28 00:18:47 公開日:2023-11-23
# うつ病診療ガイドラインを用いた診断説明可能性へのクロスアテンションアプローチ A Cross Attention Approach to Diagnostic Explainability using Clinical Practice Guidelines for Depression ( http://arxiv.org/abs/2311.13852v1 ) ライセンス: Link先を確認	Sumit Dalal, Deepa Tilwani, Manas Gaur, Sarika Jain, Valerie Shalin, and Amit Seth	(参考訳) 関連する臨床知識を用いた説明可能性の欠如は、非構造化臨床対話の人工知能による分析の採用を妨げている。 メンタルヘルス(MH)に関する未活用の豊富なデータがオンラインコミュニティで利用可能であり、オンラインとオフラインの両方のアプリケーションにおけるスクリーニングツールとして大きな影響を与えうる形で、説明可能性の問題に取り組む機会を提供している。本研究では, 一般的なTransformerモデルのアテンションを強化し, 外部の臨床知識を組み込むことで分類に対する臨床医が理解可能な説明を生成する手法を開発する。臨床医が患者と対話する際に専門知識に頼る方法に着想を得て,関連する臨床知識を活用して患者の入力をモデル化し,分類に有意義な説明を与える。これにより手作業によるレビュー時間が節約され、信頼が高まる。我々は,世界的に懸念される精神疾患であるうつ病の診断のための臨床診療ガイドライン(CPG)を用いて,MHの文脈でこのようなシステムを開発する。本稿では,アテンション計算時にCPGを組み込む、PSAT(ProcesS knowledge-infused cross ATtention)と呼ばれるアプリケーション固有の言語モデルを提案する。うつ病に関連する専門家がキュレーションした3つのデータセットでの厳密な評価を通じて, PSATのアプリケーションに即した説明可能性を示す。 PSATは9つのベースラインモデルの性能を上回り、他のベースラインでは説明が不十分な場合にも説明を提供できる。我々は,患者健康質問票(PHQ-9など)や関連する質問といった、うつ病に焦点を当てたCPGリソースを,SNOMED-CTを用いて機械可読なオントロジーに変換する。このリソースにより、PSATはGPT-3.5のようなモデルがアプリケーションに即した説明を生成する能力を高める。 The lack of explainability using relevant clinical knowledge hinders the adoption of Artificial Intelligence-powered analysis of unstructured clinical dialogue. A wealth of relevant, untapped Mental Health (MH) data is available in online communities, providing the opportunity to address the explainability problem with substantial potential impact as a screening tool for both online and offline applications. We develop a method to enhance attention in popular transformer models and generate clinician-understandable explanations for classification by incorporating external clinical knowledge. Inspired by how clinicians rely on their expertise when interacting with patients, we leverage relevant clinical knowledge to model patient inputs, providing meaningful explanations for classification. This will save manual review time and engender trust. 
We develop such a system in the context of MH using clinical practice guidelines (CPG) for diagnosing depression, a mental health disorder of global concern. We propose an application-specific language model called ProcesS knowledge-infused cross ATtention (PSAT), which incorporates CPGs when computing attention. Through rigorous evaluation on three expert-curated datasets related to depression, we demonstrate application-relevant explainability of PSAT. PSAT also surpasses the performance of nine baseline models and can provide explanations where other baselines fall short. We transform a CPG resource focused on depression, such as the Patient Health Questionnaire (e.g. PHQ-9) and related questions, into a machine-readable ontology using SNOMED-CT. With this resource, PSAT enhances the ability of models like GPT-3.5 to generate application-relevant explanations.	翻訳日:2023-11-28 00:18:30 公開日:2023-11-23
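The abstract says PSAT incorporates CPGs when computing attention. One generic way to picture this is cross-attention in which each input token queries a set of embedded guideline items, such as PHQ-9 questions. The resulting attention weights can then be read as an explanation of which item drove the prediction. The minimal sketch below shows plain scaled dot-product cross-attention under that reading. The toy embeddings and the `most_attended_item` helper are hypothetical, and this is not PSAT's actual formulation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product cross-attention.

    queries: one vector per input token (e.g. a patient post).
    keys/values: one vector per knowledge item (e.g. a questionnaire question).
    Returns (outputs, weights); weights[i][j] is how much token i attends to item j.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w[j] * values[j][c] for j in range(len(values)))
                        for c in range(len(values[0]))])
    return outputs, weights


def most_attended_item(weights: Matrix) -> int:
    """Hypothetical explanation cue: the knowledge item with the largest total attention."""
    totals = [sum(row[j] for row in weights) for j in range(len(weights[0]))]
    return max(range(len(totals)), key=totals.__getitem__)
```

Each row of `weights` sums to 1, so an item's totals are comparable across inputs of the same length.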
# 混合粒度重み付けトレーニングによる文法的誤り訂正 Grammatical Error Correction via Mixed-Grained Weighted Training ( http://arxiv.org/abs/2311.13848v1 ) ライセンス: Link先を確認	Jiahao Li, Quan Wang, Chiwei Zhu, Zhendong Mao, Yongdong Zhang	(参考訳) 文法的誤り訂正(GEC)の課題は,自然文の文法的誤りを自動的に訂正することである。ほとんど全ての先行研究は注釈付きトレーニングデータを同等に扱っており、データに内在する不一致は無視されている。本稿では,この内在的な不一致を、データアノテーションの精度と潜在的なアノテーションの多様性という2つの側面から捉える。そこで本研究では,データアノテーションの精度と潜在的多様性における不一致に基づいて,トークンレベルと文レベルのトレーニング重みをそれぞれ設計し,GECの学習効果を向上させるために混合粒度重み付けトレーニングを行うMainGECを提案する。実証評価により、Seq2Seq方式でもSeq2Edit方式でも、MainGECは2つのベンチマークデータセットで一貫した大幅な性能改善を達成し、混合粒度重み付けトレーニングの有効性と優位性が示された。さらにアブレーション実験により,MainGECにおいて設計した両粒度の重みの有効性が検証された。 The task of Grammatical Error Correction (GEC) aims to automatically correct grammatical errors in natural texts. Almost all previous works treat annotated training data equally, but inherent discrepancies in data are neglected. In this paper, the inherent discrepancies are manifested in two aspects, namely, accuracy of data annotation and diversity of potential annotations. To this end, we propose MainGEC, which designs token-level and sentence-level training weights based on inherent discrepancies in accuracy and potential diversity of data annotation, respectively, and then conducts mixed-grained weighted training to improve the training effect for GEC. Empirical evaluation shows that whether in the Seq2Seq or Seq2Edit manner, MainGEC achieves consistent and significant performance improvements on two benchmark datasets, demonstrating the effectiveness and superiority of the mixed-grained weighted training. Further ablation experiments verify the effectiveness of designed weights of both granularities in MainGEC.	翻訳日:2023-11-28 00:18:03 公開日:2023-11-23
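The abstract describes token-level and sentence-level training weights combined in a mixed-grained weighted training objective. A minimal way to picture such an objective is a negative log-likelihood in which each reference token's loss is scaled by both its token weight and its sentence weight. In the sketch below, the weights are supplied directly. How MainGEC derives them from annotation accuracy and diversity is not reproduced, and normalizing by the total token count is an assumption.

```python
import math
from typing import List


def mixed_grained_weighted_nll(token_probs: List[List[float]],
                               token_weights: List[List[float]],
                               sentence_weights: List[float]) -> float:
    """Weighted negative log-likelihood over a batch of reference sentences.

    token_probs[s][t]: model probability of reference token t in sentence s.
    token_weights[s][t]: per-token weight (assumed given, e.g. annotation accuracy).
    sentence_weights[s]: per-sentence weight (assumed given, e.g. annotation diversity).
    """
    total = 0.0
    n_tokens = 0
    for probs, tw, sw in zip(token_probs, token_weights, sentence_weights):
        for p, w in zip(probs, tw):
            total += -sw * w * math.log(p)
            n_tokens += 1
    return total / n_tokens  # mean over all tokens in the batch
```

With all weights equal to 1, this reduces to ordinary mean token-level cross-entropy. Setting a token weight to 0 removes that token's contribution while keeping it in the normalizer.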
# 協調的クロスモーダル側情報を用いた知覚的画像圧縮 Perceptual Image Compression with Cooperative Cross-Modal Side Information ( http://arxiv.org/abs/2311.13847v1 ) ライセンス: Link先を確認	Shiyu Qin, Bin Chen, Yujun Huang, Baoyi An, Tao Dai, Shu-Tao Xia	(参考訳) データの爆発的増加により、画像とともに送信される関連テキストがますます増えている。分散ソース符号化に着想を得て、多くの研究が画像の側情報を利用して画像圧縮を強化している。しかし、マルチモーダルな相乗効果の利点が研究で広く実証されているにもかかわらず、既存の手法は一般に、画像の知覚的圧縮を高めるためにテキストを側情報として使うことを考慮していない。そこで次の疑問が生じる: デコーダでのみ利用可能なテキストレベルの意味的依存関係を効果的に転送して画像圧縮を支援するには、どうすればよいのか? 本研究では,より良いレート・知覚・歪みのトレードオフを実現するため,テキスト誘導側情報を用いた新しい深層画像圧縮手法を提案する。具体的には,CLIPテキストエンコーダと効果的なSemantic-Spatial Awareブロックを用いてテキストと画像の特徴を融合する。これは、学習したテキスト適応アフィン変換をピクセルレベルで導くためのセマンティックマスクを予測することで実現される。さらに,再構成画像の知覚品質を向上させるために,テキスト条件付き敵対的生成ネットワークを設計する。 4つのデータセットと10の画像品質評価指標を用いた大規模な実験により、提案手法はレート・知覚のトレードオフと意味的歪みの点で優れた結果を達成することが示された。 The explosion of data has resulted in more and more associated text being transmitted along with images. Inspired by distributed source coding, many works utilize image side information to enhance image compression. However, existing methods generally do not consider using text as side information to enhance perceptual compression of images, even though the benefits of multimodal synergy have been widely demonstrated in research. This raises the following question: How can we effectively transfer text-level semantic dependencies, which are only available to the decoder, to help image compression? In this work, we propose a novel deep image compression method with text-guided side information to achieve a better rate-perception-distortion tradeoff. Specifically, we employ the CLIP text encoder and an effective Semantic-Spatial Aware block to fuse the text and image features. This is done by predicting a semantic mask to guide the learned text-adaptive affine transformation at the pixel level. Furthermore, we design a text-conditional generative adversarial network to improve the perceptual quality of reconstructed images. 
Extensive experiments involving four datasets and ten image quality assessment metrics demonstrate that the proposed approach achieves superior results in terms of rate-perception trade-off and semantic distortion.	翻訳日:2023-11-28 00:17:47 公開日:2023-11-23
# MetaFBP: パーソナライズされた顔の美しさ予測のための高次予測器の学習 MetaFBP: Learning to Learn High-Order Predictor for Personalized Facial Beauty Prediction ( http://arxiv.org/abs/2311.13929v1 ) ライセンス: Link先を確認	Luojun Lin, Zhifeng Shen, Jia-Li Yin, Qipeng Liu, Yuanlong Yu, Weijie Chen	(参考訳) 個人の美的嗜好を予測することは、人間社会にとって重要な実用的応用と学術的意義を持つ。しかし,既存の研究は主に顔の魅力の共通性を学習・予測することに焦点を当てており,Personalized Facial Beauty Prediction (PFBP)にはほとんど注目が払われていない。 PFBPは、各ユーザーが評価したわずかな画像のみで個人の美的嗜好に適応できるマシンを開発することを目的とする。本稿では,各ユーザーが1つのメタタスクに対応するというメタ学習の観点から,このタスクを定式化する。このようなPFBP課題に対処するため、社会における視覚的美観がガウス分布に従うという人間の美的メカニズムから着想を得て、ユーザーの嗜好を共通性と個性の部分に分離する。そこで本研究では,美的共通性を捉える汎用特徴抽出器を考案し,メタ学習機構を介して予測器の決定境界をシフトさせることで美的個性に適応するよう最適化する,新しいMetaFBPフレームワークを提案する。適応の遅さや小さなサポートセットへの過剰適合に悩まされうる従来のメタ学習手法とは異なり、高速適応のための高次予測器を最適化する新しい手法を提案する。提案手法の性能を検証するため,多数のユーザーが評価した既存の顔美観予測データセットを用いて複数のPFBPベンチマークを構築した。これらのベンチマークでの大規模な実験により,提案するMetaFBP法の有効性が示された。 Predicting individual aesthetic preferences holds significant practical applications and academic implications for human society. However, existing studies mainly focus on learning and predicting the commonality of facial attractiveness, with little attention given to Personalized Facial Beauty Prediction (PFBP). PFBP aims to develop a machine that can adapt to individual aesthetic preferences with only a few images rated by each user. In this paper, we formulate this task from a meta-learning perspective in which each user corresponds to a meta-task. To address such a PFBP task, we draw inspiration from the human aesthetic mechanism that visual aesthetics in society follows a Gaussian distribution, which motivates us to disentangle user preferences into a commonality and an individuality part. To this end, we propose a novel MetaFBP framework, in which we devise a universal feature extractor to capture the aesthetic commonality and then optimize to adapt the aesthetic individuality by shifting the decision boundary of the predictor via a meta-learning mechanism. 
Unlike conventional meta-learning methods that may struggle with slow adaptation or overfitting to tiny support sets, we propose a novel approach that optimizes a high-order predictor for fast adaptation. In order to validate the performance of the proposed method, we build several PFBP benchmarks by using existing facial beauty prediction datasets rated by numerous users. Extensive experiments on these benchmarks demonstrate the effectiveness of the proposed MetaFBP method.	翻訳日:2023-11-28 00:11:11 公開日:2023-11-23
# ロバストな動的ドメイン一般化のためのパラメータ交換 Parameter Exchange for Robust Dynamic Domain Generalization ( http://arxiv.org/abs/2311.13928v1 ) ライセンス: Link先を確認	Luojun Lin, Zhifeng Shen, Zhishu Sun, Yuanlong Yu, Lei Zhang, Weijie Chen	(参考訳) 未知のドメインシフトは、未知のターゲットドメインにおけるモデル劣化の主な原因であり、ドメイン一般化(DG)の開発が急務となっている。近年のDGでは、動的ネットワークを用いて未知のターゲットドメインへの学習不要な適応を実現しており、これは動的ドメイン一般化(DDG)と呼ばれ、固定重みを持つ静的モデルにおける自己適応性の欠如を補う。動的ネットワークのパラメータは、それぞれドメイン不変な特徴とドメイン固有の特徴を学ぶために設計された静的コンポーネントと動的コンポーネントに分離することができる。本研究では,既存の研究に基づき,静的および動的コンポーネントを最適化の観点からより徹底的に分離することにより,DDGの限界を押し広げることを試みる。主な考えは、ドメイン固有の情報を増強することで、静的コンポーネントがより包括的にドメイン不変な特徴を学べるようにすることである。その結果、静的コンポーネントによって学習されたより包括的なドメイン不変な特徴により、動的コンポーネントは適応的なドメイン固有の特徴の学習により集中するようになる。そこで本研究では,静的コンポーネントと動的コンポーネントの組み合わせを摂動させる,単純で効果的なパラメータ交換(PE)法を提案する。摂動ありと摂動なしの両方の順伝播から得た勾配を併用してモデルを最適化し, 前述の分離を暗黙的に達成する。このように、2つのコンポーネントは相互に有益な形で最適化でき、未知のドメインシフトに抵抗し、未知のターゲットドメインにおける自己適応性を改善することができる。大規模な実験により、PEは既存の動的ネットワークに容易に組み込むことができ、余計な工夫なしに汎化能力を向上させられることが示された。 Agnostic domain shift is the main cause of model degradation on unknown target domains, which creates an urgent need to develop Domain Generalization (DG). Recent advances in DG use dynamic networks to achieve training-free adaptation on the unknown target domains, termed Dynamic Domain Generalization (DDG), which compensates for the lack of self-adaptability in static models with fixed weights. The parameters of dynamic networks can be decoupled into a static and a dynamic component, which are designed to learn domain-invariant and domain-specific features, respectively. Building on existing work, we try in this work to push the limits of DDG by disentangling the static and dynamic components more thoroughly from an optimization perspective. Our main consideration is that we can enable the static component to learn domain-invariant features more comprehensively by augmenting the domain-specific information. 
As a result, the more comprehensive domain-invariant features learned by the static component can then force the dynamic component to focus more on learning adaptive domain-specific features. To this end, we propose a simple yet effective Parameter Exchange (PE) method to perturb the combination between the static and dynamic components. We optimize the model using the gradients from both the perturbed and non-perturbed feed-forward passes jointly to implicitly achieve the aforementioned disentanglement. In this way, the two components can be optimized in a mutually beneficial manner, which can resist the agnostic domain shifts and improve the self-adaptability on the unknown target domain. Extensive experiments show that PE can be easily plugged into existing dynamic networks to improve their generalization ability without bells and whistles.	翻訳日:2023-11-28 00:10:46 公開日:2023-11-23
# 機械学習分類アルゴリズムを用いた臨床データ・RT-PCRによるCOVID-19患者の回復・死亡予測 Predicting Recovery or Decease of COVID-19 Patients with Clinical and RT-PCR Using Machine Learning Classification Algorithms ( http://arxiv.org/abs/2311.13925v1 ) ライセンス: Link先を確認	Mohammad Dehghani, Zahra Yazdanparast	(参考訳) 新型コロナウイルスのパンデミックは世界経済と人々の日常生活を前例のない形で混乱させている。適切な判断を下すには、COVID-19を迅速かつ正確に診断する必要がある。臨床意思決定は患者から収集されたデータに左右される。人工知能の助けを借りて、COVID-19は症状、ポリメラーゼ連鎖反応(PCR)、CTスキャン、胸部X線検査、定期的な血液検査、さらには咳の音を分析して迅速に診断されてきた。さらに、これらのデータは患者の死亡の予測にも使用できるが、どのデータが最も正確な予測をもたらすかは明らかではない。したがって,本研究は2つの部分からなる。最初の目標は、データセットに含まれる特徴に基づいて、機械学習アルゴリズムがCOVID-19症例の転帰(回復か死亡か)を予測できるかどうかを調べることである。本研究の第2部では, 臨床データとRT-PCRが回復と死亡の予測に与える影響を調べ, どちらがより信頼できるかを検討した。特徴セットの異なる4つのステージを定義し,6つの機械学習手法を用いて予測モデルを構築した。 ランダムフォレストは78.7%の精度で、患者の死亡と回復の予測において有望な結果を示した。このことから,患者の回復と死亡は機械学習を用いて予測可能であると考えられる。第2の目的については、AdaBoostアルゴリズムで訓練した臨床データのみ(RT-PCRを使用しない)のモデルが82.1%の精度で最も正確であることが示された。本研究は、COVID-19に類似した危機やアウトブレイクが発生した場合に、医療専門家に指針を提供しうる。 The COVID-19 pandemic has disrupted the global economy and people's daily lives in unprecedented ways. To make appropriate decisions, it is necessary to diagnose COVID-19 rapidly and accurately. Clinical decision making is influenced by data collected from patients. With the aid of artificial intelligence, COVID-19 has been diagnosed quickly by analyzing symptoms, polymerase chain reaction (PCR), computed tomography scans, chest X-rays, routine laboratory blood tests and even cough sounds. Furthermore, these data can be used to predict a patient's mortality, although it remains unclear which data yield the most accurate predictions. Therefore, this study consists of two parts. Our first objective is to examine whether machine learning algorithms can predict the outcome of COVID-19 cases (recovery or death), based on the features present in the dataset. In the second part of the research, we investigated the impact of clinical and RT-PCR data on the prediction of recovery and death to determine which one is more reliable. 
We defined four stages with different feature sets and used six machine learning methods to build prediction models. With an accuracy of 78.7%, random forest showed promising results for predicting death and recovery of patients. Based on this, it appears that recovery and death of patients are predictable using machine learning. For the second objective, results indicate that a model using clinical data alone (without RT-PCR), trained with the AdaBoost algorithm, is the most accurate, with an accuracy of 82.1%. This study can provide guidance for medical professionals in the event of a crisis or outbreak similar to COVID-19.	翻訳日:2023-11-28 00:10:16 公開日:2023-11-23
# 産業アプリケーションのためのチェコ語意味的埋め込みモデル Some Like It Small: Czech Semantic Embedding Models for Industry Applications ( http://arxiv.org/abs/2311.13921v1 ) ライセンス: Link先を確認	Jiří Bednář, Jakub Náplava, Petra Barančíková, Ondřej Lisický	(参考訳) 本稿では,小型のチェコ語文埋め込みモデルの開発と評価について述べる。小型モデルは、資源制約のある環境におけるリアルタイム産業アプリケーションにとって重要なコンポーネントである。ラベル付きチェコ語データの利用が限られていることから、事前学習、知識蒸留、教師なしコントラスト微調整などの代替手法を検討する。包括的な内在的・外在的分析を行い,はるかに大きなモデルと比較して競争力のある性能を示した。我々のモデルは従来のBaseサイズのモデルに比べて約8倍小さく,約5倍高速である。協調と再現性を促進するため、モデルと評価パイプラインの両方を公開する。最後に,チェコの検索エンジンであるSeznam.czにおける文埋め込みモデルの実践的応用について述べる。これらのモデルは従来のモデルに効果的に取って代わり、オーガニック検索、強調スニペット、画像検索などで全体的な検索体験を向上させた。この移行により性能が向上した。 This article focuses on the development and evaluation of small-sized Czech sentence embedding models. Small models are important components for real-time industry applications in resource-constrained environments. Given the limited availability of labeled Czech data, alternative approaches, including pre-training, knowledge distillation, and unsupervised contrastive fine-tuning, are investigated. Comprehensive intrinsic and extrinsic analyses are conducted, showcasing the competitive performance of our models compared to significantly larger counterparts, with approximately 8 times smaller size and 5 times faster speed than conventional Base-sized models. To promote cooperation and reproducibility, both the models and the evaluation pipeline are made publicly accessible. Ultimately, this article presents practical applications of the developed sentence embedding models in Seznam.cz, the Czech search engine. These models have effectively replaced previous counterparts, enhancing the overall search experience, for instance, in organic search, featured snippets, and image search. This transition has yielded improved performance.	翻訳日:2023-11-28 00:09:50 公開日:2023-11-23
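The abstract lists unsupervised contrastive fine-tuning among the training strategies. A widely used form of this, as in SimCSE, is an in-batch InfoNCE loss. Two encodings of the same sentence form a positive pair, and the other sentences in the batch act as negatives. The sketch below assumes precomputed, non-zero embeddings and a temperature of 0.05, a common SimCSE default assumed here. It illustrates the loss only and is not the authors' training code.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def info_nce(anchors: List[Vector], positives: List[Vector], temperature: float = 0.05) -> float:
    """In-batch InfoNCE loss.

    anchors[i] and positives[i] embed two views of the same sentence; every
    positives[j] with j != i serves as a negative for anchors[i].
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over the row of logits.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum - logits[i]  # cross-entropy with target index i
    return total / n
```

The loss is near zero when each anchor is most similar to its own positive. It grows when the pairs are mismatched, which is the signal that pulls paraphrases together in embedding space.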
# 中空コアファイバによる超高真空中へのナノ粒子の装填 Hollow-core fiber loading of nanoparticles into ultra-high vacuum ( http://arxiv.org/abs/2311.13920v1 ) ライセンス: Link先を確認	Stefan Lindner and Paul Juschitz and Jakob Rieser and Yaakov Y. Fein and Mario Ciampini and Markus Aspelmeyer and Nikolai Kiesel	(参考訳) ナノ粒子を用いた光浮上の分野における今日の多くの実験は、利用可能な粒子装填技術によって制限されている。本稿では, 主要な課題、すなわち粒子の決定論的な位置決めと、量子実験に必要な超高真空レベルでのクリーンな導入を解決する新しい粒子装填法を提案する。我々は,直径$100-755\,\mathrm{nm}$のナノ粒子を定在波光トラップの異なる格子点へ効率的に装填,位置決め,再配置できること,および$10^{-9}\,\mathrm{mbar}$以下という前例のない圧力下でナノ粒子を直接装填できることを実証した。本手法は,光コンベヤベルトを用いた中空コアフォトニック結晶ファイバ内のナノ粒子の輸送に基づいており,ファイバはターゲットトラップに対して正確に位置決めできる。本研究は、多粒子ダイナミクスの研究におけるナノ粒子数の増加と、超高真空中の浮遊固体の量子領域を活用するための高いターンアラウンドへの道を開く。 Many experiments in the field of optical levitation with nanoparticles today are limited by the available technologies for particle loading. Here we introduce a new particle loading method that solves the main challenges, namely deterministic positioning of the particles and clean delivery at ultra-high vacuum levels as required for quantum experiments. We demonstrate the efficient loading, positioning, and repositioning of nanoparticles in the range of $100-755\,\mathrm{nm}$ diameter into different lattice sites of a standing wave optical trap, as well as direct loading of nanoparticles at an unprecedented pressure below $10^{-9}\,\mathrm{mbar}$. Our method relies on the transport of nanoparticles within a hollow-core photonic crystal fiber using an optical conveyor belt, which can be precisely positioned with respect to the target trap. Our work opens the path for increasing nanoparticle numbers in the study of multiparticle dynamics and high turn-around times for exploiting the quantum regime of levitated solids in ultra-high vacuum.	翻訳日:2023-11-28 00:09:33 公開日:2023-11-23
# 周期的に駆動される量子点接触を流れる電流に対するデフェージングの影響 Effect of dephasing on the current through a periodically driven quantum point contact ( http://arxiv.org/abs/2311.13918v1 ) ライセンス: Link先を確認	Igor Ermakov, Oleg Lychkovskiy	(参考訳) 周期的に駆動される量子点接触(QPC)でつながれた2つの1次元量子$XX$磁石を考える。磁石が最初に互いに逆向きに偏極していれば、QPCを通るスピン電流が生じると予想される。最近 [Phys. Rev. B 103, L041405 (2021)] 、実際には駆動周波数が臨界値を超えると電流が完全に停止し、QPCが実効的に絶縁体となることが示された。ここでは、この描像が量子デフェージングによってどのように影響を受けるかを調べる。その結果,ゼロでないデフェージングがあれば電流が回復することが明らかとなった。 We consider two one-dimensional quantum $XX$ magnets linked by a periodically driven quantum point contact (QPC). If magnets are initially polarized in opposite directions, one expects that a spin current through the QPC will be established. It has been shown recently [Phys. Rev. B 103, L041405 (2021)] that, in fact, when the driving frequency exceeds a critical value, the current halts completely, the QPC being effectively insulating. Here we enquire how this picture is affected by quantum dephasing. Our findings reveal that any non-zero dephasing restores the current.	翻訳日:2023-11-28 00:09:14 公開日:2023-11-23
# 社会的ストレスがCOVID-19の適応動態に及ぼす影響を探る : 流行に直面する「ナイーブな」集団の行動の類型化 Exploring the impact of social stress on the adaptive dynamics of COVID-19: Typing the behavior of naïve populations faced with epidemics ( http://arxiv.org/abs/2311.13917v1 ) ライセンス: Link先を確認	Innokentiy Kastalskiy, Andrei Zinovyev, Evgeny Mirkes, Victor Kazantsev and Alexander N. Gorban	(参考訳) 自然災害の文脈では、人間の反応は必然的に自然要因と絡み合う。新型コロナウイルス(COVID-19)のパンデミックは大きなストレス要因として、さまざまな地域での感染拡大への対処における適応的なダイナミクスについて、各国間の大きな違いを浮き彫りにした。これは自然災害解析における文化的特徴の重要な役割を強調している。大規模な流行の理論的理解は主に平均場運動論モデルに依存している。しかし、従来のSIR型モデルでは、COVID-19流行の発生当初に観測された現象を十分に説明できなかった。これらの現象には、指数関数的成長の予期せぬ停止、プラトーへの到達、マルチウェーブダイナミクスの発生が含まれる。病原性が高く未知の感染症が流行した場合、負の社会経済的影響を軽減するために、医療以外のレベルで迅速に対応することが重要となる。本稿では、シンプルなSIRSSモデル(SIR with Social Stress)に基づいて、流行の第1波に関する理論的検討を行う。我々は、世界各国におけるナイーブな集団の行動の社会文化的特徴を分析する。各国・地域固有の特徴は、COVID-19統計へのフィッティングから得られる、モデル内のわずか数個の定数に集約される。これらの定数は外的ストレス要因に対する社会の反応ダイナミクスも反映しており、地球規模の社会的災害における人類と自然要因の相互作用を研究することの重要性を示している。特定地域のこれらの特徴に基づき、地域当局はワクチンが開発されるまで疫病に効果的に対処するための戦略を最適化できる。 In the context of natural disasters, human responses inevitably intertwine with natural factors. The COVID-19 pandemic, as a significant stress factor, has brought to light profound variations among different countries in terms of their adaptive dynamics in addressing the spread of infection outbreaks across different regions. This emphasizes the crucial role of cultural characteristics in natural disaster analysis. The theoretical understanding of large-scale epidemics primarily relies on mean-field kinetic models. However, conventional SIR-like models failed to fully explain the observed phenomena at the onset of the COVID-19 outbreak. These phenomena included the unexpected cessation of exponential growth, the reaching of plateaus, and the occurrence of multi-wave dynamics. 
In situations where an outbreak of a highly virulent and unfamiliar infection arises, it becomes crucial to respond swiftly at a non-medical level to mitigate the negative socio-economic impact. Here we present a theoretical examination of the first wave of the epidemic based on a simple SIRSS model (SIR with Social Stress). We conduct an analysis of the socio-cultural features of naïve population behaviors across various countries worldwide. The unique characteristics of each country/territory are encapsulated in only a few constants within our model, derived from the fitted COVID-19 statistics. These constants also reflect the societal response dynamics to the external stress factor, underscoring the importance of studying the mutual behavior of humanity and natural factors during global social disasters. Based on these distinctive characteristics of specific regions, local authorities can optimize their strategies to effectively combat epidemics until vaccines are developed.	翻訳日:2023-11-28 00:09:05 公開日:2023-11-23
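The abstract contrasts its SIRSS model with conventional SIR-like models. For reference, the classic SIR baseline is dS/dt = -βSI, dI/dt = βSI - γI, dR/dt = γI, with S, I and R as population fractions. It can be sketched with explicit Euler steps, as below. The values of β, γ and the step size are hypothetical. The SIRSS social-stress terms are not given in the abstract, so they are not reproduced. The plain model has no behavioral feedback, so by itself it cannot produce the plateaus and multi-wave dynamics that the abstract describes.

```python
from typing import List, Tuple


def simulate_sir(beta: float, gamma: float, i0: float, days: int,
                 dt: float = 0.1) -> List[Tuple[float, float, float]]:
    """Integrate the classic SIR model with explicit Euler steps.

    Returns one (S, I, R) tuple of population fractions per day, starting at day 0.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    history = [(s, i, r)]
    steps_per_day = int(round(1 / dt))
    for _ in range(days):
        for _ in range(steps_per_day):
            new_inf = beta * s * i * dt   # S -> I transitions in this step
            new_rec = gamma * i * dt      # I -> R transitions in this step
            s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        history.append((s, i, r))
    return history
```

Early on, when S ≈ 1, the infected fraction grows roughly like exp((β - γ)t) if β > γ and decays otherwise. Each step only moves mass between compartments, so S + I + R stays equal to 1.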
# LVNC診断のためのディープラーニングモデルの拡張:限界とトレードオフ Expanding the deep-learning model to diagnosis LVNC: Limitations and trade-offs ( http://arxiv.org/abs/2311.13912v1 ) ライセンス: Link先を確認	Gregorio Bernabé and Pilar González-Férez and José M. García and Guillem Casas and Josefa González-Carrillo	(参考訳) 心筋の左心室における過剰な肉柱形成あるいは緻密化障害(LVNC)は、近年分類された心筋症の一形態である。左心室の肉柱を正確に定量化するためにいくつかの方法が提案されているが、医学界では特定のアプローチを用いることについて一般的な合意はない。前報では、U-Net CNNアーキテクチャに基づく左室肉柱定量化のための深層学習手法であるDL-LVTQを提案した。 DL-LVTQは、同一の心筋症(肥大型心筋症)の患者のデータセットから開発された自動診断ツールであった。本研究では, DL-LVTQを拡張・適応し, 異なる心筋症の患者に対応できるようにした。データセットは、異なる特性と心筋症を持つ3つのグループの最大379人の患者から成る。患者画像は異なるスキャナーと病院で撮影された。我々は,U-Net畳み込みニューラルネットワークを改良・適応し,分類不能・混合型・遺伝性のさまざまな心筋症を持つ異種患者群の異なる特性を考慮した。新たな患者グループを取り入れることで、提案した自動深層学習法の感度を維持しつつ,精度,特異度,カッパ値が向上した。したがって、異なる特徴を持つさまざまな心筋症に対して、より充実した診断ツールが利用可能となる。心臓専門医は、評価した出力の98.9%が診断において臨床的に検証されていると判断した。したがって, 異なる心臓構造を高い精度でセグメンテーションできることにより, 堅牢で客観的かつ高速な診断システムを構築でき, 人的ミスと所要時間を削減できる。 Hyper-trabeculation or non-compaction in the left ventricle of the myocardium (LVNC) is a recently classified form of cardiomyopathy. Several methods have been proposed to quantify the trabeculae accurately in the left ventricle, but there is no general agreement in the medical community to use a particular approach. In previous work, we proposed DL-LVTQ, a deep learning approach for left ventricular trabecular quantification based on a U-Net CNN architecture. DL-LVTQ was an automatic diagnosis tool developed from a dataset of patients with the same cardiomyopathy (hypertrophic cardiomyopathy). In this work, we have extended and adapted DL-LVTQ to cope with patients with different cardiomyopathies. The dataset consists of up to 379 patients in three groups with different particularities and cardiomyopathies. Patient images were taken from different scanners and hospitals. We have modified and adapted the U-Net convolutional neural network to account for the different particularities of a heterogeneous group of patients with various unclassifiable or mixed and inherited cardiomyopathies. 
The inclusion of new groups of patients has increased the accuracy, specificity and kappa values while maintaining the sensitivity of the automatic deep learning method proposed. Therefore, a better-prepared diagnosis tool is ready for various cardiomyopathies with different characteristics. Cardiologists judged that 98.9% of the evaluated outputs were clinically verified for diagnosis. Therefore, the high precision in segmenting the different cardiac structures allows us to build a robust, objective and faster diagnostic system, reducing human error and time spent.	翻訳日:2023-11-28 00:08:22 公開日:2023-11-23
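The abstract reports accuracy, sensitivity, specificity and kappa values. For a binary decision such as LVNC versus non-LVNC, all four can be computed from a 2×2 confusion matrix, as in the sketch below. Cohen's kappa compares the observed agreement p_o with the chance agreement p_e derived from the marginals: κ = (p_o - p_e) / (1 - p_e). The counts in the example are hypothetical. The abstract does not say whether the paper computes these metrics per patient or per pixel.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n            # observed agreement p_o
    sensitivity = tp / (tp + fn)        # true-positive rate
    specificity = tn / (tn + fp)        # true-negative rate
    # Expected chance agreement p_e from the prediction and label marginals.
    p_pos = ((tp + fp) / n) * ((tp + fn) / n)
    p_neg = ((fn + tn) / n) * ((fp + tn) / n)
    p_e = p_pos + p_neg
    kappa = (accuracy - p_e) / (1 - p_e)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "kappa": kappa}
```

Take the hypothetical counts tp=40, fp=10, fn=5 and tn=45. They give accuracy 0.85 and chance agreement 0.5, so kappa is 0.7 up to floating-point rounding.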
# 顧客サポート会話のための対話品質と感情アノテーション Dialogue Quality and Emotion Annotations for Customer Support Conversations ( http://arxiv.org/abs/2311.13910v1 ) ライセンス: Link先を確認	John Mendonça and Patrícia Pereira and Miguel Menezes and Vera Cabarrão and Ana C. Farinha and Helena Moniz and João Paulo Carvalho and Alon Lavie and Isabel Trancoso	(参考訳) タスク指向の会話データセットは、トピックの多様性や言語的多様性を欠くことが多い。しかし、大規模で多言語かつ多様なテキストデータで事前学習された大規模言語モデル(LLM)の登場により、これらの制限は克服されたように見える。それでも、ベンチマークデータセットがなければ、対話アプリケーションにおける異なる言語やドメインへの汎化性は不確かなままである。本稿では、二言語の顧客サポート会話を対象に、感情と会話品質に対する全体論的アノテーション手法を提案する。会話を構成するインスタンス全体を考慮したアノテーションを行うことで、対話全体をより広い視点で捉えることができる。さらに、テキスト分類モデルの開発のためのユニークで貴重なリソースを提供する。そこで本研究では,感情認識と対話品質推定のベンチマークを示し,本番環境でこれらのモデルを活用するにはさらなる研究が必要であることを示す。 Task-oriented conversational datasets often lack topic variability and linguistic diversity. However, with the advent of Large Language Models (LLMs) pretrained on extensive, multilingual and diverse text data, these limitations seem to have been overcome. Nevertheless, their generalisability to different languages and domains in dialogue applications remains uncertain without benchmarking datasets. This paper presents a holistic annotation approach for emotion and conversational quality in the context of bilingual customer support conversations. By performing annotations that take into consideration the complete instances that compose a conversation, one can form a broader perspective of the dialogue as a whole. Furthermore, it provides a unique and valuable resource for the development of text classification models. To this end, we present benchmarks for Emotion Recognition and Dialogue Quality Estimation and show that further research is needed to leverage these models in a production setting.	翻訳日:2023-11-28 00:07:44 公開日:2023-11-23
# 自転車用信号の確保に要する待ち時間削減のためのDRLソリューション A DRL solution to help reduce the cost in waiting time of securing a traffic light for cyclists ( http://arxiv.org/abs/2311.13905v1 ) ライセンス: Link先を確認	Lucas Magnana (AGORA), Hervé Rivano (AGORA), Nicolas Chiabaut	(参考訳) サイクリストは、自分たちを自動車交通から分離するインフラを使うことを好む。交通信号を使って自動車と自転車の流れを分離し、自転車専用の青信号フェーズを追加することは、自転車専用レーンのようなより大規模なインフラの導入可否を評価するために動的に展開できる、軽量で安価なソリューションである。本論文では,これらの新しいフェーズによって生じる待ち時間の増加を補うため,交通信号の青信号フェーズのサイクルを交通状況に適応させる深層強化学習ソリューションを提案する。車両カウンタデータを用いて、DRLアプローチと感応式信号制御アルゴリズムを終日にわたって比較する。その結果,DRLはほぼすべての時間帯で車両の待ち時間をより小さく抑えられることがわかった。我々のDRLアプローチは、自転車交通量の適度な変化に対しても頑健である。本論文のコードはhttps://github.com/LucasMagnana/A-DRL-solution-to-help-reduce-the-cost-in-waiting-time-of-securing-a-traffic-light-for-cyclistsで公開されている。 Cyclists prefer to use infrastructure that separates them from motorized traffic. Using a traffic light to segregate car and bike flows, with the addition of bike-specific green phases, is a lightweight and cheap solution that can be deployed dynamically to assess the opportunity of a heavier infrastructure such as a separate bike lane. To compensate for the increased waiting time induced by these new phases, we introduce in this paper a deep reinforcement learning solution that adapts the green phase cycle of a traffic light to the traffic. Vehicle counter data are used to compare the DRL approach with the actuated traffic light control algorithm over whole days. Results show that DRL achieves better minimization of vehicle waiting time at almost all hours. Our DRL approach is also robust to moderate changes in bike traffic. The code of this paper is available at https://github.com/LucasMagnana/A-DRL-solution-to-help-reduce-the-cost-in-waiting-time-of-securing-a-traffic-light-for-cyclists.	翻訳日:2023-11-28 00:07:22 公開日:2023-11-23
# 野生でのアクティビティビデオによるクエリ Query by Activity Video in the Wild ( http://arxiv.org/abs/2311.13895v1 ) ライセンス: Link先を確認	Tao Hu, William Thong, Pascal Mettes, Cees G.M. Snoek	(参考訳) 本稿では,不均衡シナリオにおけるビデオクエリからのアクティビティ検索に着目した。現在のクェリ・バイ・アクティビティ・ビデオの文献では、埋め込みを学ぶ際に全てのアクティビティに十分なラベル付き例があるという仮定が一般的である。しかし、この仮定は実際には成立せず、一部の活動には多くの例があるが、他の活動は少数の例によってのみ記述される。本稿では,アクティビティ検索における不均衡シナリオを明示的に扱う視覚意味埋め込みネットワークを提案する。私たちのネットワークには2つの新しいモジュールがあります。視覚アライメントモジュールは、すべてのアクティビティに対して、入力ビデオと固定サイズの視覚バンク表現のグローバルアライメントを実行する。セマンティックモジュールは、入力ビデオと固定サイズのセマンティックアクティビティ表現のアライメントを実行する。すべてのアクティビティに対して同じ大きさの視覚的および意味的なアクティビティ表現とマッチングすることにより、検索中の頻繁なアクティビティを無視することが可能になる。新たな不均衡アクティビティ検索ベンチマーク実験では,あらゆるタイプのアクティビティに対するアプローチの有効性が示された。 This paper focuses on activity retrieval from a video query in an imbalanced scenario. In current query-by-activity-video literature, a common assumption is that all activities have sufficient labelled examples when learning an embedding. This assumption does however practically not hold, as only a portion of activities have many examples, while other activities are only described by few examples. In this paper, we propose a visual-semantic embedding network that explicitly deals with the imbalanced scenario for activity retrieval. Our network contains two novel modules. The visual alignment module performs a global alignment between the input video and fixed-sized visual bank representations for all activities. The semantic module performs an alignment between the input video and fixed-sized semantic activity representations. By matching videos with both visual and semantic activity representations that are of equal size over all activities, we no longer ignore infrequent activities during retrieval. Experiments on a new imbalanced activity retrieval benchmark show the effectiveness of our approach for all types of activities.	翻訳日:2023-11-28 00:06:42 公開日:2023-11-23
# General Phrase Debiaser:マルチトークンレベルでのマスク言語モデルのデバイアス General Phrase Debiaser: Debiasing Masked Language Models at a Multi-Token Level ( http://arxiv.org/abs/2311.13892v1 ) ライセンス: Link先を確認	Bingkang Shi, Xiaodan Zhang, Dehan Kong, Yulei Wu, Zongzhen Liu, Honglei Lyu, Longtao Huang	(参考訳) 事前学習済み言語モデルによって明らかになった社会的バイアスと不適切なステレオタイプは、その応用の障害になりつつある。単語レベルを対象とする多数のデバイアス手法と比べて、フレーズレベルに存在するバイアスへの関心は比較的少なく、これが学問分野領域におけるデバイアスの性能を制限している。本稿では,マスク言語モデルにおけるフレーズレベルのバイアスを緩和できる、General Phrase Debiaserと呼ばれる自動マルチトークン・デバイアスパイプラインを提案する。具体的には、Wikipediaページから定型的なフレーズを生成するフレーズフィルタ段階(phrase filter stage)と、マルチトークンレベルでモデルをデバイアスしてフレーズに関するバイアスの課題に取り組むモデルデバイアス段階(model debias stage)からなる。後者はモデルのバイアスを引き起こすプロンプトを探索し、それをデバイアスに使用する。標準的なデータセットと指標における最先端の結果から、我々のアプローチは、さまざまなパラメータサイズのモデルにおいて、職業と複数の学問分野の両方における性別バイアスを大幅に低減できることが示された。 The social biases and unwelcome stereotypes revealed by pretrained language models are becoming obstacles to their application. Compared to the numerous debiasing methods targeting the word level, relatively little attention has been paid to biases present at the phrase level, limiting debiasing performance in discipline domains. In this paper, we propose an automatic multi-token debiasing pipeline called General Phrase Debiaser, which is capable of mitigating phrase-level biases in masked language models. Specifically, our method consists of a phrase filter stage that generates stereotypical phrases from Wikipedia pages as well as a model debias stage that can debias models at the multi-token level to tackle bias challenges on phrases. The latter searches for prompts that trigger the model's bias, and then uses them for debiasing. State-of-the-art results on standard datasets and metrics show that our approach can significantly reduce gender biases on both career and multiple disciplines, across models with varying parameter sizes.	翻訳日:2023-11-28 00:06:22 公開日:2023-11-23
# Unsupervised Learning for Topological Classification of Transportation Networks ( http://arxiv.org/abs/2311.13887v1 ) License: see link	Sina Sabzekar, Mohammad Reza Valipour Malakshah, Zahra Amini	With increasing urbanization, transportation plays an increasingly critical role in city development. The number of studies on modeling, optimization, simulation, and data analysis of transportation systems is on the rise. Many of these studies use transportation test networks to represent real-world transportation systems in urban areas and to examine the efficacy of their proposed approaches. Each of these networks has a distinctive topology, which makes each one suited to different study objectives. Despite their widespread use in research, there is no comprehensive study that classifies these networks by their topological characteristics. This study aims to fill this gap by employing unsupervised learning methods, particularly clustering. We present a comprehensive framework for evaluating various topological network characteristics. 
Additionally, we employ two dimensionality reduction techniques, namely Principal Component Analysis (PCA) and Isometric Feature Mapping (ISOMAP), to reduce overlaps of highly correlated features and enhance the interpretability of the subsequent classification results. We then use two clustering algorithms, K-means and HDBSCAN, to classify 14 transportation networks. PCA followed by K-means clustering outperforms the other alternatives with a Silhouette score of 0.510, enabling the classification of the transportation networks into five clusters. We also provide a detailed discussion of the resulting classification.	Translated: 2023-11-28 00:05:43 Published: 2023-11-23
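The Silhouette score reported above (0.510 for PCA followed by K-means) ranges from -1 to 1. For each point, it compares the mean distance to the point's own cluster with the mean distance to the nearest other cluster, and the score is the average over all points. Below is a minimal pure-Python sketch of this standard definition. It is not the authors' code, and the toy points are hypothetical. In practice a library routine such as scikit-learn's `sklearn.metrics.silhouette_score` would normally be used.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points.

    For point i: a(i) is the mean distance to the other members of its own cluster,
    b(i) is the lowest mean distance to the members of any other cluster, and
    s(i) = (b(i) - a(i)) / max(a(i), b(i)). Points in singleton clusters count as 0.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster: s(i) = 0
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Two well-separated pairs of points in a hypothetical 2-D feature space
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # about 0.90
bad = silhouette_score(points, [0, 1, 0, 1])   # negative: clusters mix distant points
```

Scores near 1 indicate compact, well-separated clusters, and negative scores indicate points that sit closer to another cluster than to their own.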
# Can Physics Informed Neural Operators Self Improve? ( http://arxiv.org/abs/2311.13885v1 ) License: see link	Ritam Majumdar, Amey Varhade, Shirish Karande, Lovekesh Vig	Self-training techniques have shown remarkable value across many deep learning models and tasks. However, such techniques remain largely unexplored for learning fast solvers for systems of partial differential equations (e.g., Neural Operators). In this work, we explore the use of self-training for Fourier Neural Operators (FNO). Neural Operators emerged as a data-driven technique; however, data from experiments or traditional solvers is not always readily available. Physics Informed Neural Operators (PINO) overcome this constraint by using a physics loss for training, but the accuracy of PINO trained without data does not match the performance obtained by training with data. In this work, we show that self-training can be used to close this gap in performance. We examine canonical examples, namely the 1D Burgers and 2D Darcy PDEs, to showcase the efficacy of self-training. Specifically, FNOs trained exclusively with physics loss through self-training come within 1.07x (Burgers) and 1.02x (Darcy) of the performance of FNOs trained with both data and physics loss. 
Furthermore, we find that pseudo-labels can be used for self-training without necessarily training to convergence in each iteration. As a consequence, we are able to discover self-training schedules that improve upon the baseline performance of PINO in terms of both accuracy and time.	Translated: 2023-11-28 00:05:23 Published: 2023-11-23
# Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach ( http://arxiv.org/abs/2311.13884v1 ) License: see link	Bin Zhang, Hangyu Mao, Jingqing Ruan, Ying Wen, Yang Li, Shao Zhang, Zhiwei Xu, Dapeng Li, Ziyue Li, Rui Zhao, Lijuan Li, Guoliang Fan	The significant advancements in large language models (LLMs) have presented novel opportunities for tackling planning and decision-making within multi-agent systems. However, as the number of agents increases, the issues of hallucination in LLMs and coordination in multi-agent systems (MAS) become increasingly pronounced. Additionally, the efficient use of tokens becomes a critical consideration when employing LLMs to facilitate the interactions of large numbers of agents. In this paper, we present a novel framework aimed at enhancing the coordination and decision-making capabilities of LLMs within large-scale multi-agent environments. Our approach draws inspiration from the actor-critic framework employed in multi-agent reinforcement learning, and we develop a modular and token-efficient solution that effectively addresses the challenges presented by LLMs and MAS. Through experiments on system resource allocation and robot grid transportation, we demonstrate the considerable advantages of our proposed approach.	Translated: 2023-11-28 00:04:57 Published: 2023-11-23
# Deep Interactive Segmentation of Medical Images: A Systematic Review and Taxonomy ( http://arxiv.org/abs/2311.13964v1 ) License: see link	Zdravko Marinov, Paul F. Jäger, Jan Egger, Jens Kleesiek, Rainer Stiefelhagen	Interactive segmentation is a crucial research area in medical image analysis that aims to boost the efficiency of costly annotations by incorporating human feedback. This feedback takes the form of clicks, scribbles, or masks and allows for iterative refinement of the model output so as to efficiently guide the system towards the desired behavior. In recent years, deep learning-based approaches have propelled results to a new level, causing rapid growth in the field, with 121 methods proposed in the medical imaging domain alone. In this review, we provide a structured overview of this emerging field featuring a comprehensive taxonomy, a systematic review of existing methods, and an in-depth analysis of current practices. Based on these contributions, we discuss the challenges and opportunities in the field. For instance, we find a severe lack of comparison across methods, which needs to be addressed with standardized baselines and benchmarks.	Translated: 2023-11-27 23:58:21 Published: 2023-11-23
# Investigating the use of publicly available natural videos to learn Dynamic MR image reconstruction ( http://arxiv.org/abs/2311.13963v1 ) License: see link	Olivier Jaubert, Michele Pascale, Javier Montalt-Tordera, Julius Akesson, Ruta Virsinskaite, Daniel Knight, Simon Arridge, Jennifer Steeden, Vivek Muthurangu	Purpose: To develop and assess a deep learning (DL) pipeline to learn dynamic MR image reconstruction from publicly available natural videos (Inter4K). Materials and Methods: Learning was performed for a range of DL architectures (VarNet, 3D UNet, FastDVDNet) and corresponding sampling patterns (Cartesian, radial, spiral), either from true multi-coil cardiac MR data (N=692) or from pseudo-MR data simulated from Inter4K natural videos (N=692). Real-time undersampled dynamic MR images were reconstructed using DL networks trained with cardiac data, DL networks trained with natural videos, and compressed sensing (CS). 
Differences were assessed in simulations (N=104 datasets) in terms of MSE, PSNR, and SSIM, and prospectively for cardiac (short axis, four chambers, N=20) and speech (N=10) data in terms of subjective image quality ranking, SNR, and edge sharpness. Friedman chi-square tests with post-hoc Nemenyi analysis were performed to assess statistical significance. Results: For all simulation metrics, DL networks trained with cardiac data outperformed DL networks trained with natural videos, which in turn outperformed CS (p<0.05). However, in prospective experiments, DL reconstructions using either training dataset were ranked similarly (and higher than CS) and showed no statistical differences in SNR and edge sharpness for most conditions. Additionally, high SSIM was measured between the DL methods trained with cardiac data and with natural videos (SSIM>0.85). Conclusion: The developed pipeline enabled learning dynamic MR reconstruction from natural videos. It preserved DL reconstruction advantages such as high-quality fast and ultra-fast reconstructions while overcoming some limitations (data scarcity or sharing). The natural video dataset, code, and pre-trained networks are made readily available on GitHub. Key Words: real-time; dynamic MRI; deep learning; image reconstruction; machine learning	Translated: 2023-11-27 23:58:06 Published: 2023-11-23
# Human Machine Co-Creation. A Complementary Cognitive Approach to Creative Character Design Process Using GANs ( http://arxiv.org/abs/2311.13960v1 ) License: see link	Mohammad Lataifeh, Xavier A Carrascoa, Ashraf M Elnagara, Naveed Ahmeda, Imran Junejo	Recent advances in Generative Adversarial Network (GAN) applications continue to attract the attention of researchers in different fields. In such a framework, two neural networks compete adversarially to generate new visual content indistinguishable from the original dataset. The objective of this research is to create a complementary co-design process between humans and machines that augments character designers' abilities in visualizing and creating new characters for multimedia projects such as games and animation. Driven by design cognitive scaffolding, the proposed approach aims to inform the process of perceiving, knowing, and making. The machine-generated concepts are used as a launching platform for character designers to conceptualize new characters. A labelled dataset of 22,000 characters was developed for this work and used to train different GANs in order to identify the one best suited to the context. This was followed by a mixed-methods evaluation of the machine output and the human derivations. 
The discussed results substantiate the value of the proposed co-creation framework. They show how designers use the generated concepts as cognitive material that interacts with their competencies in versatile ways and influences the creative process of conceptualizing novel characters.	Translated: 2023-11-27 23:57:29 Published: 2023-11-23
# RankFeat&RankWeight: Rank-1 Feature/Weight Removal for Out-of-distribution Detection ( http://arxiv.org/abs/2311.13959v1 ) License: see link	Yue Song, Nicu Sebe, Wei Wang	The task of out-of-distribution (OOD) detection is crucial for deploying machine learning models in real-world settings. In this paper, we observe that the singular value distributions of in-distribution (ID) and OOD features are quite different: the OOD feature matrix tends to have a larger dominant singular value than the ID feature, and the class predictions of OOD samples are largely determined by it. This observation motivates us to propose RankFeat, a simple yet effective post hoc approach for OOD detection that removes the rank-1 matrix composed of the largest singular value and the associated singular vectors from the high-level feature. 
RankFeat achieves state-of-the-art performance and reduces the average false positive rate (FPR95) by 17.90% compared with the previous best method. The success of RankFeat motivates us to investigate whether a similar phenomenon exists in the parameter matrices of neural networks. We thus propose RankWeight, which removes the rank-1 weight from the parameter matrices of a single deep layer. RankWeight is also post hoc and only requires computing the rank-1 matrix once. As a standalone approach, RankWeight is very competitive with other methods across various backbones. Moreover, RankWeight is flexibly compatible with a wide range of OOD detection methods. The combination of RankWeight and RankFeat sets a new state of the art, achieving an FPR95 as low as 16.13% on the ImageNet-1k benchmark. Extensive ablation studies and comprehensive theoretical analyses are presented to support the empirical results.	Translated: 2023-11-27 23:57:08 Published: 2023-11-23
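The core operation behind RankFeat and RankWeight is to subtract the dominant rank-1 component sigma_1 * u_1 * v_1^T from a matrix. A minimal pure-Python sketch of that operation follows. This is an illustrative assumption, not the authors' implementation. It approximates the top singular triplet with plain power iteration, whereas a real pipeline would more likely call a batched SVD such as `torch.linalg.svd` on a reshaped feature map. It also omits the downstream OOD scoring. The function names are hypothetical.

```python
import math
import random


def top_singular_triplet(m, iters=200, seed=0):
    """Approximate (sigma_1, u_1, v_1) of matrix m (a list of rows) by power
    iteration on m^T m; this converges when sigma_1 is strictly larger than sigma_2."""
    rows, cols = len(m), len(m[0])
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(cols)]  # random positive start vector
    for _ in range(iters):
        u = [sum(m[r][c] * v[c] for c in range(cols)) for r in range(rows)]  # u = m v
        v = [sum(m[r][c] * u[r] for r in range(rows)) for c in range(cols)]  # v = m^T u
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:  # zero matrix: no dominant component to remove
            return 0.0, [0.0] * rows, [0.0] * cols
        v = [x / norm for x in v]
    u = [sum(m[r][c] * v[c] for c in range(cols)) for r in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    return sigma, [x / sigma for x in u], v


def remove_rank1(m, iters=200):
    """Return (m - sigma_1 * u_1 v_1^T, sigma_1): m with its dominant rank-1 part removed."""
    sigma, u, v = top_singular_triplet(m, iters)
    residual = [[m[r][c] - sigma * u[r] * v[c] for c in range(len(m[0]))]
                for r in range(len(m))]
    return residual, sigma


# Hypothetical 2x2 "feature" with singular values 3 and 1
residual, sigma = remove_rank1([[3.0, 0.0], [0.0, 1.0]])
# sigma is about 3.0 and residual is about [[0, 0], [0, 1]]
```

The abstract reports that OOD inputs tend to have a larger sigma_1, so removing this component changes their predictions more than those of ID inputs.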
# High-Order Tensor Recovery with A Tensor $U_1$ Norm ( http://arxiv.org/abs/2311.13958v1 ) License: see link	Jingjing Zheng, Wenzhe Wang, Xiaoqin Zhang, Yankai Cao, Xianta Jiang	Recently, numerous tensor SVD (t-SVD)-based tensor recovery methods have emerged, showing promise in processing visual data. However, these methods often suffer performance degradation on high-order tensor data with non-smooth changes. Such changes are common in real-world scenarios but are ignored by traditional t-SVD-based methods. Our objective in this study is to provide an effective tensor recovery technique that handles non-smooth changes in tensor data and efficiently explores the correlations of high-order tensor data across its various dimensions without introducing numerous variables and weights. To this end, we introduce a new tensor decomposition and a new tensor norm called the Tensor $U_1$ norm. We use these techniques to solve the high-order tensor completion problem and provide theoretical guarantees for the exact recovery of the resulting tensor completion models. An optimization algorithm is proposed to solve the resulting tensor completion model iteratively by combining the proximal algorithm with the Alternating Direction Method of Multipliers. Theoretical analysis shows the convergence of the algorithm to the Karush-Kuhn-Tucker (KKT) point of the optimization problem. 
Numerical experiments demonstrate the effectiveness of the proposed method in high-order tensor completion, especially for tensor data with non-smooth changes.	Translated: 2023-11-27 23:56:27 Published: 2023-11-23
# Efficient Trigger Word Insertion ( http://arxiv.org/abs/2311.13957v1 ) License: see link	Yueqi Zeng, Ziqiang Li, Pengfei Xia, Lei Liu, Bin Li	With the boom in the natural language processing (NLP) field in recent years, backdoor attacks pose immense threats to deep neural network models. However, previous works hardly consider the effect of the poisoning rate. In this paper, our main objective is to reduce the number of poisoned samples while still achieving a satisfactory Attack Success Rate (ASR) in text backdoor attacks. To accomplish this, we propose an efficient trigger word insertion strategy based on trigger word optimization and poisoned sample selection. Extensive experiments on different datasets and models demonstrate that our proposed method can significantly improve attack effectiveness in text classification tasks. Remarkably, our approach achieves an ASR of over 90% with only 10 poisoned samples in the dirty-label setting and requires merely 1.5% of the training data in the clean-label setting.	Translated: 2023-11-27 23:56:06 Published: 2023-11-23
# Electric Network Frequency Optical Sensing Devices ( http://arxiv.org/abs/2311.13954v1 ) License: see link	Christos Moysiadis, Georgios Karantaidis, Constantine Kotropoulos	Electric Network Frequency (ENF) acts as a fingerprint in multimedia forensics applications. In indoor environments, ENF variations affect the intensity of light sources connected to power mains. Accordingly, the light intensity variations captured by sensing devices can be exploited to estimate the ENF. A first optical sensing device based on a photodiode is developed for capturing ENF variations in indoor lighting environments. In addition, a device that captures the ENF directly from power mains is implemented. This device serves as a ground-truth ENF collector. Video recordings captured by a camera are also employed to estimate the ENF; the camera serves as a second optical sensor. The factors affecting ENF estimation are thoroughly studied. Estimation accuracy is measured by the maximum correlation coefficient between the ENF estimated by each of the two optical sensors and the ENF estimated directly from the power mains. The paper's major contribution is extensive experimental evidence on ENF estimation in scenes ranging from static ones, such as a white wall, to non-static ones that include human activity.	Translated: 2023-11-27 23:55:51 Published: 2023-11-23
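The accuracy measure described above is the maximum correlation coefficient between an optically estimated ENF sequence and the ground-truth ENF from the mains. The sketch below illustrates it. Searching over small integer time shifts to find the maximum is an assumption of this sketch, a common way to absorb misalignment between recordings. The abstract does not say how the maximum is taken. The function names and the synthetic 50 Hz data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:  # a constant sequence has no defined correlation
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def max_correlation(estimated, reference, max_lag):
    """Return (best_r, best_lag): the maximum Pearson coefficient between two ENF
    sequences over integer shifts in [-max_lag, max_lag]. A positive lag means
    estimated[k + lag] is aligned with reference[k]."""
    best_r, best_lag = float("-inf"), 0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = estimated[lag:], reference
        else:
            a, b = estimated, reference[-lag:]
        n = min(len(a), len(b))
        if n < 2:
            continue
        r = pearson(a[:n], b[:n])
        if r > best_r:
            best_r, best_lag = r, lag
    return best_r, best_lag


# Synthetic mains ENF around 50 Hz, and an optical estimate that lags it by 3 samples
reference = [50.0 + 0.02 * math.sin(0.3 * k) for k in range(120)]
estimated = [50.0] * 3 + reference[:117]
r, lag = max_correlation(estimated, reference, max_lag=10)  # r close to 1.0, lag == 3
```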
# Learning Uniform Clusters on Hypersphere for Deep Graph-level Clustering ( http://arxiv.org/abs/2311.13953v1 ) License: see link	Mengling Hu, Chaochao Chen, Weiming Liu, Xinyi Zhang, Xinting Liao, and Xiaolin Zheng	Graph clustering has been widely studied in recent years. However, most existing graph clustering methods focus on node-level clustering, i.e., grouping the nodes of a single graph into clusters. In contrast, graph-level clustering, i.e., grouping multiple graphs into clusters, remains largely unexplored. Graph-level clustering is critical in a variety of real-world applications, such as property prediction of molecules and community analysis in social networks. However, graph-level clustering is challenging because graph-level representations are insufficiently discriminative. This makes deep clustering more likely to reach degenerate solutions (cluster collapse). To address the issue, we propose a novel deep graph-level clustering method called Uniform Deep Graph Clustering (UDGC). 
UDGC assigns instances evenly to different clusters and then scatters those clusters on a unit hypersphere, leading to a more uniform cluster-level distribution and milder cluster collapse. Specifically, we first propose Augmentation-Consensus Optimal Transport (ACOT) for generating uniformly distributed and reliable pseudo labels for partitioning clusters. We then adopt contrastive learning to scatter those clusters. In addition, we propose Center Alignment Optimal Transport (CAOT) to guide the model toward better parameters, which further improves clustering performance. Our empirical study on eight well-known datasets demonstrates that UDGC significantly outperforms the state-of-the-art models.	Translated: 2023-11-27 23:55:40 Published: 2023-11-23
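ACOT and CAOT are specific to the paper and are not reproduced here. A common generic way to obtain evenly distributed pseudo-labels, popularized by SwAV, is to solve an entropic optimal-transport problem with Sinkhorn-Knopp iterations. The sketch below shows only that generic balancing step, as an assumption-labelled illustration. The function names and scores are hypothetical.

```python
import math


def sinkhorn_pseudo_labels(scores, epsilon=0.05, iters=100):
    """Balanced soft assignment of N samples to K clusters (Sinkhorn-Knopp).

    scores[i][k] is the similarity of sample i to cluster prototype k, assumed to lie
    on a bounded scale such as cosine similarity in [-1, 1] so exp() does not underflow.
    Returns Q whose rows sum to 1 and whose columns each sum to approximately N / K.
    """
    n, k = len(scores), len(scores[0])
    top = max(max(row) for row in scores)  # shift for numerical stability
    q = [[math.exp((s - top) / epsilon) for s in row] for row in scores]
    for _ in range(iters):
        # Column step: give every cluster the same total mass n / k
        for c in range(k):
            col = sum(q[i][c] for i in range(n))
            for i in range(n):
                q[i][c] *= (n / k) / col
        # Row step: every sample distributes exactly one unit of mass
        for i in range(n):
            row = sum(q[i])
            q[i] = [x / row for x in q[i]]
    return q


def hard_labels(q):
    """Pick the highest-probability cluster for every sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


# Every sample prefers cluster 0, so a plain argmax would collapse to [0, 0, 0, 0]
scores = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.5], [0.55, 0.5]]
labels = hard_labels(sinkhorn_pseudo_labels(scores))  # balanced: [0, 0, 1, 1]
```

The equal-column-mass constraint is what prevents the degenerate "everything in one cluster" solution that the abstract calls cluster collapse.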
# MLLM-Bench, Evaluating Multi-modal LLMs using GPT-4V ( http://arxiv.org/abs/2311.13951v1 ) License: see link	Wentao Ge, Shunian Chen, Guiming Chen, Junying Chen, Zhihong Chen, Shuo Yan, Chenghao Zhu, Ziyue Lin, Wenya Xie, Xidong Wang, Anningzhe Gao, Zhiyi Zhang, Jianquan Li, Xiang Wan, Benyou Wang	In the pursuit of Artificial General Intelligence (AGI), the integration of vision in language models has marked a significant milestone. The advent of vision-language models (MLLMs) like GPT-4V has expanded AI applications, aligning with the multi-modal capabilities of the human brain. However, evaluating the efficacy of MLLMs poses a substantial challenge due to the subjective nature of tasks that lack definitive answers. Existing automatic evaluation methodologies for multi-modal large language models rely on objective queries with standard answers. Such queries do not adequately address the nuances of creative and associative multi-modal tasks. To address this, we introduce MLLM-Bench, an innovative benchmark inspired by Vicuna, spanning a diverse array of scenarios, including Perception, Understanding, Applying, Analyzing, Evaluating, and Creation, along with ethical considerations. MLLM-Bench is designed to reflect user experience more accurately and provide a more holistic assessment of model performance. 
Comparative evaluations indicate a significant performance gap between existing open-source models and GPT-4V. We posit that MLLM-Bench will catalyze progress in the open-source community towards developing user-centric vision-language models that meet a broad spectrum of real-world applications. See the online leaderboard at https://mllm-bench.llmzoo.com.	Translated: 2023-11-27 23:55:15 Published: 2023-11-23
# Object Location Prediction in Real-time using LSTM Neural Network and Polynomial Regression ( http://arxiv.org/abs/2311.13950v1 ) License: see link	Petar Stojković, Predrag Tadić	This paper details the design and implementation of a system for predicting and interpolating object location coordinates. Our solution is based on processing inertial measurements and global positioning system data through a Long Short-Term Memory (LSTM) neural network and polynomial regression. LSTM is a type of recurrent neural network (RNN) particularly suited for processing data sequences and avoiding the long-term dependency problem. We employed data from real-world vehicles and global positioning system (GPS) sensors. A critical pre-processing step was developed to address varying sensor frequencies and inconsistent GPS time steps and dropouts. The performance of the LSTM-based system was compared with that of the Kalman filter. The system was tuned to work in real time with low latency and high precision. We tested our system on roads under various driving conditions, including acceleration, turns, deceleration, and straight paths. We evaluated the accuracy and inference time of the proposed solution and showed that it runs in real time. Our LSTM-based system yielded an average error of 0.11 meters with an inference time of 2 ms. 
This represents a 76% reduction in error compared with the traditional Kalman filter method, which has an average error of 0.46 meters and an inference time similar to that of the LSTM-based system.	Translated: 2023-11-27 23:54:51 Published: 2023-11-23
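The abstract does not specify how polynomial regression is combined with the LSTM. As a hedged illustration of the general idea only, the sketch below fits a low-degree polynomial by least squares to a few timestamped positions along one axis. It then evaluates the fit between fixes, for example to fill a GPS dropout. The function names and readings are hypothetical. Solving the normal equations directly is adequate for low degrees and zero-based time offsets. For larger problems, a library least-squares routine such as `numpy.polyfit` is the better choice.

```python
def fit_polynomial(ts, ys, degree):
    """Least-squares polynomial fit: solve the normal equations (A^T A) c = A^T y with
    Gaussian elimination (partial pivoting). Returns c with y ~ c[0] + c[1] t + c[2] t^2 + ..."""
    n = degree + 1
    ata = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    aty = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    # Forward elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for r in range(col + 1, n):
            f = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    # Back substitution
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        acc = aty[i] - sum(ata[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = acc / ata[i][i]
    return coeffs


def evaluate(coeffs, t):
    """Evaluate the fitted polynomial at time t."""
    return sum(c * t ** k for k, c in enumerate(coeffs))


# Hypothetical positions (m) at t = 0..4 s, measured from the first fix
ts = [0, 1, 2, 3, 4]
ys = [2.0, 5.5, 10.0, 15.5, 22.0]  # follows 2 + 3t + 0.5t^2
coeffs = fit_polynomial(ts, ys, degree=2)
gap_fill = evaluate(coeffs, 2.5)  # 12.625, a position between two fixes
```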
# Optimal Power Flow in Highly Renewable Power System Based on Attention Neural Networks ( http://arxiv.org/abs/2311.13949v1 ) License: see link	Chen Li, Alexander Kies, Kai Zhou, Markus Schlott, Omar El Sayed, Mariia Bilousova and Horst Stoecker	The Optimal Power Flow (OPF) problem is pivotal for power system operations, guiding generator output and power distribution to meet demand at minimum cost while adhering to physical and engineering constraints. The integration of renewable energy sources such as wind and solar, however, poses challenges due to their inherent variability. This variability, driven largely by changing weather conditions, demands frequent recalibration of power settings and thus repeated OPF solutions. This task is daunting with traditional numerical methods, particularly for extensive power systems. In this work, we present a cutting-edge, physics-informed machine learning methodology, trained using imitation learning and historical European weather datasets. Our approach directly maps electricity demand and weather patterns to power dispatch and generation, circumventing the iterative computations of traditional OPF solvers. This offers a faster solution suited to real-time applications. Rigorous evaluations on aggregated European power systems validate our method's superiority over existing data-driven techniques for solving OPF. 
By presenting a quick, robust, and efficient solution, this research sets a new standard for real-time OPF solving, paving the way for more resilient power systems in the era of renewable energy.	Translated: 2023-11-27 23:54:32 Published: 2023-11-23
# Quantum network-entanglement measures ( http://arxiv.org/abs/2311.13945v1 ) License: see link	Zhen-Peng Xu, Julio I. de Vicente, Liang-Liang Sun and Sixia Yu	Quantum networks are of high interest nowadays, and a quantum internet has long been envisioned. Network-entanglement adapts the notion of entanglement to the network scenario, and network-entangled states are considered a resource for overcoming the limitations of a given network structure. In this work, we introduce measures of quantum network-entanglement that are well-defined within the general framework of quantum resource theories. At the same time, they have a clear operational interpretation: they characterize the extra resources needed to prepare a target quantum state within a given network. In particular, we define the network communication cost and the network round complexity, which turn out to be intimately related to graph-theoretic parameters. We also provide device-dependent and device-independent methods to estimate these measures.	Translated: 2023-11-27 23:54:10 Published: 2023-11-23
# Exploring Methods for Cross-lingual Text Style Transfer: The Case of Text Detoxification ( http://arxiv.org/abs/2311.13937v1 ) License: see link	Daryna Dementieva, Daniil Moskovskiy, David Dale and Alexander Panchenko	Text detoxification is the task of transferring the style of text from toxic to neutral. While there are approaches yielding promising results in a monolingual setup, e.g., (Dale et al., 2021; Hallinan et al., 2022), cross-lingual transfer for this task remains a challenging open problem (Moskovskiy et al., 2022). In this work, we present a large-scale study of strategies for cross-lingual text detoxification: given a parallel detoxification corpus for one language, the goal is to transfer detoxification ability to another language for which we do not have such a corpus. Moreover, we are the first to explore a new task where text translation and detoxification are performed simultaneously, providing several strong baselines for this task. Finally, we introduce new automatic detoxification evaluation metrics with higher correlations with human judgments than previous benchmarks. We also assess the most promising approaches with manual markup, determining the best strategy to transfer the knowledge of text detoxification between languages.	Translated: 2023-11-27 23:53:57 Published: 2023-11-23
# Robustness-Reinforced Knowledge Distillation with Correlation Distance and Network Pruning ( http://arxiv.org/abs/2311.13934v1 ) License: see link	Seonghak Kim, Gyeongdo Ham, Yucheol Cho, and Daeshik Kim	The improvement in the performance of efficient and lightweight models (i.e., the student model) is achieved through knowledge distillation (KD), which involves transferring knowledge from more complex models (i.e., the teacher model). However, most existing KD techniques rely on Kullback-Leibler (KL) divergence, which has certain limitations. First, if the teacher distribution has high entropy, the KL divergence's mode-averaging nature hinders the transfer of sufficient target information. Second, when the teacher distribution has low entropy, the KL divergence tends to excessively focus on specific modes, which fails to convey an abundant amount of valuable knowledge to the student. Consequently, when dealing with datasets that contain numerous confounding or challenging samples, student models may struggle to acquire sufficient knowledge, resulting in subpar performance. Furthermore, in previous KD approaches, we observed that data augmentation, a technique aimed at enhancing a model's generalization, can have an adverse impact. Therefore, we propose Robustness-Reinforced Knowledge Distillation (R2KD), which leverages correlation distance and network pruning. This approach enables KD to effectively incorporate data augmentation for performance improvement. Extensive experiments on various datasets, including CIFAR-100, FGVR, TinyImagenet, and ImageNet, demonstrate our method's superiority over current state-of-the-art methods.	Translated: 2023-11-27 23:53:35 Published: 2023-11-23
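To make the KL-versus-correlation contrast concrete, here is a minimal NumPy sketch. It assumes correlation distance means one minus the per-sample Pearson correlation, applied here to raw logits. The abstract does not say whether R2KD applies it to logits or to softmax outputs, or how it is weighted, so this is only an illustration and not the authors' loss. The example shows the property that matters: correlation distance ignores a positive rescaling and shift of the logits, whereas KL between the softmax outputs does not.

```python
import numpy as np

def softmax(z, temperature=1.0):
    # numerically stable softmax over the last axis
    z = np.asarray(z, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) per sample, averaged over the batch
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1).mean())

def correlation_distance(a, b, eps=1e-12):
    # 1 - Pearson correlation per sample, averaged over the batch
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    ac = a - a.mean(axis=-1, keepdims=True)
    bc = b - b.mean(axis=-1, keepdims=True)
    num = (ac * bc).sum(axis=-1)
    den = np.linalg.norm(ac, axis=-1) * np.linalg.norm(bc, axis=-1) + eps
    return float((1.0 - num / den).mean())

teacher_logits = np.array([[1.0, 2.0, 0.5]])
student_logits = 2.0 * teacher_logits + 3.0  # same ranking and shape, different scale

print(kl_divergence(softmax(teacher_logits), softmax(student_logits)))  # > 0
print(correlation_distance(teacher_logits, student_logits))             # ~0
```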
# Periodically Exchange Teacher-Student for Source-Free Object Detection ( http://arxiv.org/abs/2311.13930v1 ) License: see link	Qipeng Liu, Luojun Lin, Zhifeng Shen, Zhifeng Yang	Source-free object detection (SFOD) aims to adapt the source detector to unlabeled target domain data in the absence of source domain data. Most SFOD methods follow the same self-training paradigm using the mean-teacher (MT) framework, in which the student model is guided by only one single teacher model. However, such a paradigm is prone to training instability: when the teacher model collapses uncontrollably due to the domain shift, the student model also suffers drastic performance degradation. To address this issue, we propose the Periodically Exchange Teacher-Student (PETS) method, a simple yet novel approach that introduces a multiple-teacher framework consisting of a static teacher, a dynamic teacher, and a student model. During the training phase, we periodically exchange the weights between the static teacher and the student model. Then, we update the dynamic teacher using the moving average of the student model that has already been exchanged by the static teacher. In this way, the dynamic teacher can integrate knowledge from past periods, effectively reducing error accumulation and enabling a more stable training process within the MT-based framework. Further, we develop a consensus mechanism to merge the predictions of the two teacher models to provide higher-quality pseudo labels for the student model. Extensive experiments on multiple SFOD benchmarks show that the proposed method achieves state-of-the-art performance compared with other related methods, demonstrating the effectiveness and superiority of our method on the SFOD task.	Translated: 2023-11-27 23:53:11 Published: 2023-11-23
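The weight schedule in the PETS abstract can be sketched as follows. This rests on assumptions that are our reading, not the paper's code. Parameters are plain dicts. The dynamic teacher receives an EMA update after every step. Every `period` steps, the student and the static teacher swap weights before that EMA update. The hypothetical `train_step` stands in for one detector optimisation step, and the consensus pseudo-labelling is omitted.

```python
def ema(target, source, momentum):
    # exponential moving average of parameter dicts
    return {k: momentum * target[k] + (1.0 - momentum) * source[k] for k in target}

def run_pets(student, static_teacher, dynamic_teacher, train_step,
             n_steps, period, momentum=0.999):
    for step in range(1, n_steps + 1):
        # one optimisation step of the student under teacher supervision
        student = train_step(student, static_teacher, dynamic_teacher)
        if step % period == 0:
            # periodic exchange: student and static teacher swap weights
            student, static_teacher = static_teacher, student
        # dynamic teacher tracks the (possibly just exchanged) student
        dynamic_teacher = ema(dynamic_teacher, student, momentum)
    return student, static_teacher, dynamic_teacher

# toy "training" (hypothetical): every step nudges the single weight by +1
toy_step = lambda s, st, dy: {"w": s["w"] + 1.0}
s, st, dy = run_pets({"w": 0.0}, {"w": 100.0}, {"w": 0.0}, toy_step,
                     n_steps=2, period=2, momentum=0.5)
print(s, st, dy)  # {'w': 100.0} {'w': 2.0} {'w': 50.25}
```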
# Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandit ( http://arxiv.org/abs/2311.14003v1 ) License: see link	Tian Huang, Ke Li	Optimization problems find widespread use in both single-objective and multi-objective scenarios. In practical applications, users seek solutions that converge to the region of interest (ROI) along the Pareto front (PF). While the conventional approach involves approximating a fitness function or an objective function to reflect user preferences, this paper explores an alternative avenue. Specifically, we aim to discover a method that sidesteps the need for calculating the fitness function, relying solely on human feedback. Our proposed approach entails conducting direct preference learning facilitated by an active dueling bandit algorithm. The experimental phase is structured into three sessions. Firstly, we assess the performance of our active dueling bandit algorithm. Secondly, we implement our proposed method within the context of Multi-objective Evolutionary Algorithms (MOEAs). Finally, we deploy our method in a practical problem, specifically in protein structure prediction (PSP). This research presents a novel interactive preference-based MOEA framework that not only addresses the limitations of traditional techniques but also unveils new possibilities for optimization problems.	Translated: 2023-11-27 23:45:54 Published: 2023-11-23
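A dueling bandit learns only from pairwise "A beats B" feedback and never sees a fitness value. The generic sketch below is not the authors' active algorithm. It picks pairs uniformly at random, simulates the human with a Bradley-Terry model over hidden utilities, and returns the Copeland winner, meaning the arm that beats the most other arms empirically. An active variant would replace the uniform pair choice with an informed one.

```python
import numpy as np

def simulate_duels(utilities, n_rounds, rng):
    """Uniformly sample pairs and let a simulated user pick one (Bradley-Terry model)."""
    k = len(utilities)
    wins = np.zeros((k, k))  # wins[i, j] = times arm i beat arm j
    for _ in range(n_rounds):
        i, j = rng.choice(k, size=2, replace=False)
        p_i_wins = 1.0 / (1.0 + np.exp(-(utilities[i] - utilities[j])))
        if rng.random() < p_i_wins:
            wins[i, j] += 1
        else:
            wins[j, i] += 1
    return wins

def copeland_winner(wins):
    """Arm that beats the most other arms in empirical pairwise preference."""
    total = wins + wins.T
    with np.errstate(divide="ignore", invalid="ignore"):
        pref = np.where(total > 0, wins / total, 0.5)  # 0.5 = never compared
    scores = (pref > 0.5).sum(axis=1)
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
candidate_utilities = [0.0, 1.0, 3.0, 0.5]  # hidden from the optimiser
print(copeland_winner(simulate_duels(candidate_utilities, 2000, rng)))  # expected: 2
```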
# A quantum Otto-type heat engine with fixed frequency ( http://arxiv.org/abs/2311.13999v1 ) License: see link	Richard Q. Matos, Rogerio J. de Assis, and Norton G. de Almeida	In this work, we analyze an Otto-type cycle operating with a working substance composed of a quantum harmonic oscillator (QHO). Unlike other studies in which the work extraction is done by varying the frequency of the QHO and letting it thermalize with a squeezed reservoir, here we submit the QHO to a parametric pumping controlled by the squeezing parameter and let it thermalize with a thermal reservoir. We then investigate the role of the squeezing parameter in our Otto-type engine powered by parametric pumping and show that it is possible to reach the Carnot limit by arbitrarily increasing the squeezing parameter. Notably, for certain squeezing parameters $r$, e.g. $r=0.4$, the quasi-static Otto limit can be reached even at non-zero power. We also investigate the role of entropy production in the efficiency behavior during the unitary strokes, showing that positive (negative) changes in entropy production correspond to increases (decreases) in engine efficiency, as expected. Furthermore, we show that under thermal reservoirs a work extraction process that is more efficient than the Carnot engine is impossible, regardless of the quantum resource introduced via the Hamiltonian of the system.	Translated: 2023-11-27 23:45:38 Published: 2023-11-23
# GRJointNET: Synergistic Completion and Part Segmentation on 3D Incomplete Point Clouds ( http://arxiv.org/abs/2311.13997v1 ) License: see link	Yigit Gurses, Melisa Taspinar, Mahmut Yurt, Sedat Ozer	Segmentation of three-dimensional (3D) point clouds is an important task for autonomous systems. However, the success of segmentation algorithms depends greatly on the quality of the underlying point clouds (resolution, completeness, etc.). In particular, incomplete point clouds might reduce a downstream model's performance. GRNet was proposed as a recent deep learning solution for completing point clouds, but it is not capable of part segmentation. Our proposed solution, GRJointNet, is a successor of GRNet: an architecture that can perform joint completion and segmentation on point clouds. The features extracted for each of the two tasks are also used by the other to increase the overall performance. We evaluated our proposed network on the ShapeNet-Part dataset and compared its performance to GRNet. Our results demonstrate that GRJointNet can outperform GRNet on point completion. It should also be noted that GRNet is not capable of segmentation while GRJointNet is. This study, therefore, holds a promise to enhance the practicality and utility of point clouds in 3D vision for autonomous systems.	Translated: 2023-11-27 23:45:15 Published: 2023-11-23
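Point-cloud completion methods in the GRNet line are commonly scored with the Chamfer distance. The abstract does not name its completion metric, so treating Chamfer distance as the measure here is our assumption. The minimal NumPy sketch below uses the squared-L2 form and shows how a missing point in the partial cloud raises the distance.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance (squared L2) between point sets a (N,3) and b (M,3)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)  # (N, M) pairwise squared distances
    # mean nearest-neighbour distance in both directions
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

partial = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
complete = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
print(chamfer_distance(partial, complete))  # the missing point at x=2 contributes 1/3
```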
# EIGEN: Expert-Informed Joint Learning Aggregation for High-Fidelity Information Extraction from Document Images ( http://arxiv.org/abs/2311.13993v1 ) License: see link	Abhishek Singh, Venkatapathy Subramanian, Ayush Maheshwari, Pradeep Narayan, Devi Prasad Shetty, Ganesh Ramakrishnan	Information Extraction (IE) from document images is challenging due to the high variability of layout formats. Deep models such as LayoutLM and BROS have been proposed to address this problem and have shown promising results. However, they still require a large amount of field-level annotations for training. Other approaches using rule-based methods have also been proposed, based on an understanding of the layout and semantics of a form, such as the geometric position or type of the fields. In this work, we propose a novel approach, EIGEN (Expert-Informed Joint Learning aGgrEatioN), which combines rule-based methods with deep learning models using data programming approaches to circumvent the requirement of annotating large amounts of training data. Specifically, EIGEN consolidates weak labels induced from multiple heuristics through generative models and uses them along with a small number of annotated labels to jointly train a deep model. In our framework, we propose labeling functions that incorporate contextual information, thus capturing the visual and language context of a word for accurate categorization. We empirically show that our EIGEN framework can significantly improve the performance of state-of-the-art deep models when only very few labeled data instances are available. The source code is available at https://github.com/ayushayush591/EIGEN-High-Fidelity-Extraction-Document-Images.	Translated: 2023-11-27 23:44:57 Published: 2023-11-23
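Data programming begins with labeling functions that each vote for a label or abstain. The sketch below is illustrative only. The functions and label set are made up, and it aggregates the votes by simple majority, whereas EIGEN uses a generative label model. One of the functions is contextual and reads the preceding word, as the abstract describes.

```python
import re
from collections import Counter

ABSTAIN, DATE, AMOUNT = -1, 0, 1

def lf_date_pattern(word, left_context):
    # pattern rule: dd/mm/yyyy looks like a date
    return DATE if re.fullmatch(r"\d{2}/\d{2}/\d{4}", word) else ABSTAIN

def lf_after_total(word, left_context):
    # contextual rule: a number right after "Total" (or "Total:") is an amount
    prev = left_context[-1].lower().rstrip(":") if left_context else ""
    return AMOUNT if prev == "total" and re.fullmatch(r"\d+(\.\d{2})?", word) else ABSTAIN

def lf_currency_symbol(word, left_context):
    return AMOUNT if word.startswith("$") else ABSTAIN

LABELING_FUNCTIONS = [lf_date_pattern, lf_after_total, lf_currency_symbol]

def weak_label(word, left_context):
    """Majority vote over non-abstaining labeling functions; ties abstain."""
    votes = [lf(word, left_context) for lf in LABELING_FUNCTIONS]
    counts = Counter(v for v in votes if v != ABSTAIN).most_common()
    if not counts or (len(counts) > 1 and counts[0][1] == counts[1][1]):
        return ABSTAIN
    return counts[0][0]

print(weak_label("12/05/2023", ["Date:"]))  # 0 (DATE)
print(weak_label("42.50", ["Total:"]))       # 1 (AMOUNT)
print(weak_label("hello", []))               # -1 (ABSTAIN)
```

The words these functions label confidently would then serve as weak training targets alongside the few hand-annotated labels.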
# Docking Multirotors in Close Proximity using Learnt Downwash Models ( http://arxiv.org/abs/2311.13988v1 ) License: see link	Ajay Shankar, Heedo Woo, Amanda Prorok	Unmodeled aerodynamic disturbances pose a key challenge for multirotor flight when multiple vehicles are in close proximity to each other. However, certain missions require two multirotors to approach within 1-2 body-lengths of each other and hold formation. We consider one such practical instance: vertically docking two multirotors in the air. In this leader-follower setting, the follower experiences significant downwash interference from the leader in its final docking stages. To compensate for this, we employ a learnt downwash model online within an optimal feedback controller to accurately track a docking maneuver and then hold formation. Through real-world flights with different maneuvers, we demonstrate that this compensation is crucial for reducing the large vertical separation otherwise required by conventional/naive approaches. Our evaluations show a tracking error of less than 0.06m for the follower (a 3-4x reduction) when approaching vertically within two body-lengths of the leader. Finally, we deploy the complete system to effect a successful physical docking between two airborne multirotors in a single smooth planned trajectory.	Translated: 2023-11-27 23:44:36 Published: 2023-11-23
# Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark ( http://arxiv.org/abs/2311.13987v1 ) License: see link	Ondřej Cífka, Constantinos Dimitriou, Cheng-i Wang, Hendrik Schreiber, Luke Miner, Fabian-Robert Stöter	Current automatic lyrics transcription (ALT) benchmarks focus exclusively on word content and ignore the finer nuances of written lyrics, including formatting and punctuation. This leads to a potential misalignment with the creative products of musicians and songwriters as well as with listeners' experiences. For example, line breaks are important in conveying information about rhythm, emotional emphasis, rhyme, and high-level structure. To address this issue, we introduce Jam-ALT, a new lyrics transcription benchmark based on the JamendoLyrics dataset. Our contribution is twofold. Firstly, a complete revision of the transcripts, geared specifically towards ALT evaluation by following a newly created annotation guide that unifies the music industry's guidelines, covering aspects such as punctuation, line breaks, spelling, background vocals, and non-word sounds. Secondly, a suite of evaluation metrics designed, unlike the traditional word error rate, to capture such phenomena. We hope that the proposed benchmark contributes to the ALT task, enabling more precise and reliable assessments of transcription systems and enhancing the user experience in lyrics applications such as subtitle renderings for live captioning or karaoke.	Translated: 2023-11-27 23:44:15 Published: 2023-11-23
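Here is a simplified illustration of why traditional WER overlooks formatting. The tokenisation rules and the `<LB>` line-break token belong to this sketch and are not Jam-ALT's actual metric suite. The same edit-distance WER is computed twice. The first run uses classic normalisation (lowercase, punctuation stripped, line breaks ignored). The second keeps case, punctuation, and line breaks as tokens.

```python
import re

def edit_distance(ref, hyp):
    # Levenshtein distance over token sequences (substitution, insertion, deletion)
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]

def wer(ref_tokens, hyp_tokens):
    return edit_distance(ref_tokens, hyp_tokens) / max(len(ref_tokens), 1)

def normalized_tokens(text):
    # classic WER preprocessing: lowercase, drop punctuation, ignore line breaks
    return re.sub(r"[^\w\s']", "", text.lower()).split()

def formatting_aware_tokens(text):
    # keep case and punctuation; turn each line break into an explicit token
    tokens = []
    for line in text.split("\n"):
        tokens.extend(re.findall(r"\w+|[^\w\s]", line))
        tokens.append("<LB>")
    return tokens[:-1]

ref = "Hello, world\nGoodbye"
hyp = "hello world goodbye"
print(wer(normalized_tokens(ref), normalized_tokens(hyp)))              # 0.0
print(wer(formatting_aware_tokens(ref), formatting_aware_tokens(hyp)))  # 0.8
```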
# FViT-Grasp: Grasping Objects With Using Fast Vision Transformers ( http://arxiv.org/abs/2311.13986v1 ) License: see link	Arda Sarp Yenicesu, Berk Cicek and Ozgur S. Oguz	This study addresses the challenge of manipulation, a prominent issue in robotics. We have devised a novel methodology for swiftly and precisely identifying the optimal grasp point for a robot to manipulate an object. Our approach leverages a Fast Vision Transformer (FViT), a type of neural network designed for processing visual data and predicting the most suitable grasp location. Demonstrating state-of-the-art performance in terms of speed while maintaining a high level of accuracy, our method holds promise for potential deployment in real-time robotic grasping applications. We believe that this study provides a baseline for future research in vision-based robotic grasp applications. Its high speed and accuracy bring researchers closer to real-life applications.	Translated: 2023-11-27 23:43:48 Published: 2023-11-23
# Error mitigated variational algorithm on a photonic processor ( http://arxiv.org/abs/2311.13985v1 ) License: see link	O.V. Borzenkova, G.I. Struchalin, I. Kondratyev, A. Moiseevskiy, I.V. Dyakonov, and S.S. Straupe	We demonstrate successful error mitigation of indistinguishability-related noise in a quantum photonic processor. We apply the zero-noise extrapolation (ZNE) technique to a variational quantum eigensolver (VQE). The observable values measured at two different error levels allow us to extrapolate them towards the noise-free regime. We study the influence of the partial distinguishability of photons in a two-qubit processor which implements the VQE for a Schwinger Hamiltonian. The results show that the ZNE procedure improves the estimate of the Hamiltonian eigenvalue. Lastly, we analyze the potential applicability of this error mitigation method to other linear optical processors with externally controlled polarization.	Translated: 2023-11-27 23:43:37 Published: 2023-11-23
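Zero-noise extrapolation takes an expectation value measured at two noise strengths $\lambda_1, \lambda_2$ and extrapolates it to $\lambda = 0$. The sketch below assumes the dependence is linear, $E(\lambda) \approx E_0 + a\lambda$, which gives $E_0 = (\lambda_2 E_1 - \lambda_1 E_2)/(\lambda_2 - \lambda_1)$. The noise parameter and the energies are hypothetical. The abstract does not say how distinguishability is parametrised or which model the authors fit.

```python
import numpy as np

def zne_two_point(lam1, e1, lam2, e2):
    """Linear (Richardson) extrapolation of an expectation value to zero noise."""
    return (lam2 * e1 - lam1 * e2) / (lam2 - lam1)

def zne_polyfit(noise_levels, values, degree=1):
    """Same idea for more than two noise levels: fit a polynomial, evaluate at 0."""
    coeffs = np.polyfit(noise_levels, values, degree)
    return float(np.polyval(coeffs, 0.0))

# hypothetical energies measured at two levels of photon distinguishability
print(zne_two_point(0.1, -0.95, 0.2, -0.90))                # ≈ -1.0
print(zne_polyfit([0.1, 0.2, 0.3], [-0.95, -0.90, -0.85]))  # ≈ -1.0
```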
# Learning Dynamic Selection and Pricing of Out-of-Home Deliveries ( http://arxiv.org/abs/2311.13983v1 ) License: see link	Fabian Akkerman, Peter Dieter, Martijn Mes	Home delivery failures, traffic congestion, and relatively large handling times have a negative impact on the profitability of last-mile logistics. These external factors contribute to up to $28\%$ of the overall costs and $25\%$ of emissions for the home delivery supply chain. A potential solution, showing annual growth rates up to $36\%$, is the delivery to parcel lockers or parcel shops, denoted by out-of-home (OOH) delivery. In the academic literature, models of customer behavior with respect to OOH delivery were so far limited to deterministic settings, contrasting with the stochastic nature of actual customer choices. We model the sequential decision-making problem of which OOH location to offer against what incentive for each incoming customer, taking into account future customer arrivals and choices. We propose Dynamic Selection and Pricing of OOH (DSPO), an algorithmic pipeline that uses a novel spatial-temporal state encoding as input to a convolutional neural network. We demonstrate the performance of our method by benchmarking it against three state-of-the-art approaches. Our extensive numerical study, guided by real-world data, reveals that DSPO can save $20.8\%$ in costs compared to a situation without OOH locations, $8.1\%$ compared to a static selection and pricing policy, and $4.6\%$ compared to a state-of-the-art demand management benchmark. We provide comprehensive insights into the complex interplay between OOH delivery dynamics and customer behavior influenced by pricing strategies. Our findings suggest that practitioners should adopt dynamic selection and pricing policies as OOH delivery gains a larger market share.	Translated: 2023-11-27 23:43:30 Published: 2023-11-23
# Probabilistic Tree-of-thought Reasoning for Answering Knowledge-intensive Complex Questions ( http://arxiv.org/abs/2311.13982v1 ) License: see link	Shulin Cao, Jiajie Zhang, Jiaxin Shi, Xin Lv, Zijun Yao, Qi Tian, Juanzi Li, Lei Hou	Large language models (LLMs) are capable of answering knowledge-intensive complex questions with chain-of-thought (CoT) reasoning. However, they tend to generate factually incorrect reasoning steps when the required knowledge is not available or not up-to-date in the models' parameters. Recent works turn to retrieving external knowledge to augment CoT reasoning. Despite being promising, these chain-based methods suffer from: 1) Negative retrieval. Unnecessary or incorrect retrieval may mislead the reasoning; 2) Limited sight. Lacking the ability to look backward or forward, a local error in one step will propagate along the chain. In this paper, we propose a novel approach: Probabilistic Tree-of-thought Reasoning (ProbTree). First, LLMs translate a complex question into a query tree, in which each non-root node denotes a sub-question of its parent node. Then, probabilistic reasoning is conducted over the tree, by solving questions from leaf to root considering the confidence of both question decomposing and answering. During reasoning, for leaf nodes, LLMs choose a more confident answer from Closed-book QA that employs parametric knowledge and Open-book QA that employs retrieved external knowledge, thus eliminating the negative retrieval problem. For non-leaf nodes, with the hierarchical structure, LLMs have broader sight and are able to reason globally with the information from child nodes, thus recovering from local errors. Experiments on three Complex QA datasets under the open-domain setting show that our approach significantly outperforms SOTA methods, demonstrating the effect of probabilistic tree-of-thought reasoning.	Translated: 2023-11-27 23:43:00 Published: 2023-11-23
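Here is a toy version of the leaf-to-root selection described above. Several pieces are assumptions made for this sketch. LLM calls are replaced with precomputed candidates. Confidence is the mean token log-probability. The hypothetical `compose` stub stands in for the step that answers a parent question from its children's answers. The confidence of the question decomposition itself is ignored.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    answer: str
    token_logprobs: list  # log-probabilities the LLM assigned to the answer tokens

    @property
    def confidence(self):
        # mean token log-probability as a confidence proxy (assumption)
        return sum(self.token_logprobs) / len(self.token_logprobs)

@dataclass
class Node:
    question: str
    candidates: dict  # e.g. {"closed_book": Candidate, "open_book": Candidate}
    children: list = field(default_factory=list)

def solve(node, answer_from_children):
    """Answer leaves first, then parents, keeping the most confident candidate."""
    child_answers = [solve(child, answer_from_children) for child in node.children]
    options = list(node.candidates.values())
    if child_answers:
        # non-leaf: also reason over the children's answers
        options.append(answer_from_children(node.question, child_answers))
    return max(options, key=lambda c: c.confidence)

leaf = Node("What is the capital of France?", {
    "closed_book": Candidate("Lyon", [-2.0, -1.5]),
    "open_book": Candidate("Paris", [-0.4, -0.6]),
})
root = Node("Which city hosts the French government?", {
    "closed_book": Candidate("Marseille", [-1.5]),
}, children=[leaf])

# hypothetical stub standing in for an LLM call that composes the children's answers
compose = lambda q, answers: Candidate(answers[0].answer, [-0.2])
print(solve(root, compose).answer)  # Paris
```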
# MedISure: Towards Assuring Machine Learning-based Medical Image Classifiers using Mixup Boundary Analysis ( http://arxiv.org/abs/2311.13978v1 ) License: see link	Adam Byfield, William Poulett, Ben Wallace, Anusha Jose, Shatakshi Tyagi, Smita Shembekar, Adnan Qayyum, Junaid Qadir, and Muhammad Bilal	Machine learning (ML) models are becoming integral in healthcare technologies, presenting a critical need for formal assurance to validate their safety, fairness, robustness, and trustworthiness. These models are inherently prone to errors, potentially posing serious risks to patient health, and could even cause irreparable harm. Traditional software assurance techniques rely on fixed code and do not directly apply to ML models, since these algorithms are adaptable and learn from curated datasets through a training process. However, adapting established principles, such as boundary testing using synthetic test data, can effectively bridge this gap. To this end, we present a novel technique called Mix-Up Boundary Analysis (MUBA) that facilitates evaluating image classifiers in terms of prediction fairness. We evaluated MUBA on two important medical imaging tasks, brain tumour classification and breast cancer classification, and achieved promising results. This research aims to showcase the importance of adapting traditional assurance principles for assessing ML models to enhance the safety and reliability of healthcare technologies. To facilitate future research, we plan to publicly release our code for MUBA.	Translated: 2023-11-27 23:42:34 Published: 2023-11-23
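Below is one plausible reading of "mix-up boundary analysis", not the MUBA implementation. Blend an image of class A into an image of class B, sweep the mixing weight, and record where the prediction switches to A. Comparing these boundary positions across patient subgroups would then give a fairness-style check. The classifier and images are toy stand-ins.

```python
import numpy as np

def mixup(x_a, x_b, lam):
    # convex combination of two inputs; lam=1 gives x_a, lam=0 gives x_b
    return lam * x_a + (1.0 - lam) * x_b

def boundary_lambda(classify, x_a, x_b, steps=101):
    """Smallest mixing weight on x_a at which the prediction becomes classify(x_a)."""
    label_a = classify(x_a)
    for lam in np.linspace(0.0, 1.0, steps):
        if classify(mixup(x_a, x_b, lam)) == label_a:
            return float(lam)
    return None

# toy "classifier" (hypothetical): class 1 if mean intensity exceeds 0.55
toy_classify = lambda img: int(img.mean() > 0.55)
img_tumour = np.ones((8, 8))
img_healthy = np.zeros((8, 8))
print(boundary_lambda(toy_classify, img_tumour, img_healthy, steps=11))  # ≈ 0.6
```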
# Low Latency Instance Segmentation by Continuous Clustering for Rotating LiDAR Sensors ( http://arxiv.org/abs/2311.13976v1 ) License: see link	Andreas Reich and Hans-Joachim Wuensche	Low-latency instance segmentation of LiDAR point clouds is crucial in real-world applications because it serves as an initial and frequently-used building block in a robot's perception pipeline, where every task adds further delay. Particularly in dynamic environments, this total delay can result in significant positional offsets of dynamic objects, as seen in highway scenarios. To address this issue, we employ continuous clustering of obstacle points in order to obtain an instance-segmented point cloud. Unlike most existing approaches, which use a full revolution of the LiDAR sensor, we process the data stream in a continuous and seamless fashion. More specifically, each column of a range image is processed as soon as it is available. Obstacle points are clustered to existing instances in real time, and the algorithm checks at a high frequency which instances are completed and ready to be published. An additional advantage is that no problematic discontinuities between the points at the start and the end of a scan are observed. In this work we describe the two-layered data structure and the corresponding algorithm for continuous clustering, which is able to cluster the incoming data in real time. We explain the importance of a large perceptive field of view. Furthermore, we describe and evaluate important architectural design choices, which could be relevant for designing an architecture for deep-learning-based low-latency instance segmentation. We are publishing the source code at https://github.com/UniBwTAS/continuous_clustering.	Translated: 2023-11-27 23:42:02 Published: 2023-11-23
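The column-by-column idea can be sketched with a union-find structure. This is a simplified sketch, not the authors' two-layered data structure. A new point may join any point in the last `max_col_gap` columns that lies within `dist_thresh`. A cluster is published once its newest point is too old for any future column to reach. Column indices keep increasing across revolutions, so there is no seam at the start or end of a scan. All names and thresholds are illustrative.

```python
import numpy as np

class ContinuousClusterer:
    """Streaming Euclidean clustering of range-image columns (simplified sketch)."""

    def __init__(self, dist_thresh=0.5, max_col_gap=2):
        self.dist_thresh = dist_thresh   # max point-to-point distance within a cluster
        self.max_col_gap = max_col_gap   # how many earlier columns a new point may link to
        self.parent, self.points, self.col_of = [], [], []
        self.active = []                 # ids of points in not-yet-published clusters

    def _find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def add_column(self, col, xyz):
        """Cluster one column's obstacle points; return clusters that can no longer grow."""
        for p in np.asarray(xyz, dtype=float).reshape(-1, 3):
            pid = len(self.points)
            self.points.append(p)
            self.parent.append(pid)
            self.col_of.append(col)
            for q in self.active:
                recent = self.col_of[q] >= col - self.max_col_gap
                if recent and np.linalg.norm(self.points[q] - p) < self.dist_thresh:
                    self.parent[self._find(q)] = self._find(pid)  # union
            self.active.append(pid)
        return self._publish_finished(col)

    def _publish_finished(self, col):
        groups = {}
        for q in self.active:
            groups.setdefault(self._find(q), []).append(q)
        finished, still_open = [], []
        for members in groups.values():
            # no future column can reach a cluster whose newest point is this old
            if max(self.col_of[m] for m in members) <= col - self.max_col_gap:
                finished.append(np.array([self.points[m] for m in members]))
            else:
                still_open.extend(members)
        self.active = still_open
        return finished

columns = [[(0.0, 0.0, 0.0)], [(0.3, 0.0, 0.0)], [], [(5.0, 0.0, 0.0)], [], []]
clusterer = ContinuousClusterer(dist_thresh=0.5, max_col_gap=2)
for col, pts in enumerate(columns):
    for cluster in clusterer.add_column(col, pts):
        print(f"column {col}: published cluster with {len(cluster)} points")
```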
# The Evolution of Quantum Secure Direct Communication: On the Road to the Qinternet ( http://arxiv.org/abs/2311.13974v1 ) License: see link	Dong Pan, Gui-Lu Long, Liuguo Yin, Yu-Bo Sheng, Dong Ruan, Soon Xin Ng, Jianhua Lu, and Lajos Hanzo	Communication security has to evolve to a higher plane in the face of the threat from the massive computing power of the emerging quantum computers. Quantum secure direct communication (QSDC) constitutes a promising branch of quantum communication, which is provably secure and overcomes the threat of quantum computing, whilst conveying secret messages directly via the quantum channel. In this survey, we highlight the motivation and the status of QSDC research with special emphasis on its theoretical basis and experimental verification. We will detail the associated point-to-point communication protocols and show how information is protected and transmitted. Finally, we discuss the open challenges as well as the future trends of QSDC networks, emphasizing again that QSDC is not a pure quantum key distribution (QKD) protocol, but a fully-fledged secure communication scheme.	Translated: 2023-11-27 23:41:21 Published: 2023-11-23
# ファインマンとフォン・ノイマンの仮定から制限されたファインマン経路積分へ:時間的連続量子測定の数学的理論 From each of Feynman's and von Neumann's postulates to the restricted Feynman path integrals: a mathematical theory of temporally continuous quantum measurements ( http://arxiv.org/abs/2311.13972v1 ) ライセンス: Link先を確認	Wataru Ichinose	(参考訳) ファインマンは1948年の有名な論文で、仮定、すなわち量子化の方法を提案した。ファインマンの仮定を粒子の位置の時間的連続量子測定に適用し、メンスキーは現象論的考察の後、連続量子測定のための制限されたファインマン経路積分を提案した。本論文の目的は,メンスキーの制限されたファインマン経路積分が,単純な近似の下でファインマンの仮定から現れることを厳密に証明することである。加えて、制限されたファインマン経路積分は、ファインマンの仮定だけでなく、瞬間測定に関するフォン・ノイマンの仮定からも現れることが証明される。私たちが研究する量子系はスピン系を含む。これらの結果は、マルチスリット実験、量子ゼノン効果およびアハラノフ・ボーム効果の定式化に適用される。 Feynman proposed a postulate, or a method of quantization, in his celebrated 1948 paper. Applying Feynman's postulate to temporally continuous quantum measurements of the positions of particles, Mensky proposed the restricted Feynman path integrals for continuous quantum measurements after phenomenological considerations. Our aim in the present paper is to give a rigorous proof that Mensky's restricted Feynman path integrals emerge from Feynman's postulate under a simple approximation. In addition, it is proved that the restricted Feynman path integrals emerge from von Neumann's postulate on instantaneous measurements as well as from Feynman's postulate. The quantum systems that we study include spin systems. These results are applied to formulations of the multi-split experiments, the quantum Zeno effect, and the Aharonov-Bohm effect.	翻訳日:2023-11-27 23:40:56 公開日:2023-11-23
# 荷電マクロ分子を用いた連続自発局在モデルの試験 Testing Continuous Spontaneous Localization model with charged macro-molecules ( http://arxiv.org/abs/2311.13966v1 ) ライセンス: Link先を確認	Emil Lenler-Eriksen and Michael Drewsen and Matteo Carlesso	(参考訳) この10年間で、波動関数の自発的崩壊モデル(崩壊モデルとも呼ばれる)への関心が高まった。これらのモデルは、Schrödinger時間発展を適切に修正することにより、よく知られた量子測定問題を一貫した形で解決する。量子実験はようやく、そのようなモデルを検証できる(したがって量子理論の限界を検証できる)段階に達した。そこで本研究では,線形ポールトラップに閉じ込められた2つのイオンに基づき,このような実験の検証能力を高めうる手法を提案する。原子イオンと高分子イオンの組み合わせは、それぞれ運動自由度の冷却に好適であり、崩壊機構について無視できない知見を与える。 In the last decade, growing interest has been devoted to models of spontaneous collapse of the wavefunction, also known as collapse models. They coherently solve the well-known quantum measurement problem by suitably modifying the Schrödinger evolution. Quantum experiments are now finally within reach of testing such models (and thus testing the limits of quantum theory). Here, we propose a method based on two ions confined in a linear Paul trap to possibly enhance the testing capabilities of such experiments. The combination of an atomic and a macromolecular ion provides, respectively, a good match for cooling the motional degrees of freedom and non-negligible insight into the collapse mechanism.	翻訳日:2023-11-27 23:39:59 公開日:2023-11-23
# DPSUR: 選択的更新とリリースによる差分プライベート確率的勾配降下法の高速化 DPSUR: Accelerating Differentially Private Stochastic Gradient Descent Using Selective Update and Release ( http://arxiv.org/abs/2311.14056v1 ) ライセンス: Link先を確認	Jie Fu, Qingqing Ye, Haibo Hu, Zhili Chen, Lulu Wang, Kuncan Wang, Ran Xun	(参考訳) 機械学習モデルは、訓練損失を減らすためにプライベートデータを記憶することが知られており、モデルインバージョンやメンバシップ推論といったプライバシ攻撃によって意図せず悪用される可能性がある。これらの攻撃から保護するために、差分プライバシー(DP)は、特にDPSGDのような確率的勾配降下を用いた一般的な訓練アルゴリズムにおいて、プライバシ保護機械学習のデファクトスタンダードとなっている。それでも、DPSGDは収束が遅いために、依然として深刻な有用性の損失に悩まされている。これは、勾配にバイアスと分散をもたらすランダムサンプリングと、勾配更新の変動を引き起こすガウスノイズによって部分的に引き起こされる。これらの問題に対処するための重要なアイデアは、無用あるいは有害な更新を破棄しつつ、モデル訓練において更新を選択的に適用することである。そこで本研究では,各イテレーションの勾配を検証テストに基づいて評価し,収束につながる更新のみをモデルに適用する,選択的更新とリリースに基づく差分プライベートな訓練フレームワークDPSURを提案する。したがって、DPSURは訓練が正しい方向に進むことを保証し、DPSGDよりも速く収束することができる。主な課題は2つある：勾配評価に起因するプライバシの懸念と、モデル更新のための勾配選択戦略である。これらの課題に対処するため、DPSURは、更新のランダム化のためのクリッピング戦略と、勾配選択のためのしきい値メカニズムを導入した。 MNIST、FMNIST、CIFAR-10、IMDBのデータセットで行った実験では、DPSURは収束速度とモデルの有用性の点で、従来手法を大幅に上回った。 Machine learning models are known to memorize private data to reduce their training loss, which can be inadvertently exploited by privacy attacks such as model inversion and membership inference. To protect against these attacks, differential privacy (DP) has become the de facto standard for privacy-preserving machine learning, particularly for popular training algorithms based on stochastic gradient descent, such as DPSGD. Nonetheless, DPSGD still suffers from severe utility loss due to its slow convergence. This is partially caused by random sampling, which introduces bias and variance into the gradient, and partially by Gaussian noise, which causes gradient updates to fluctuate. Our key idea to address these issues is to apply updates selectively during model training, discarding useless or even harmful ones. 
Motivated by this, this paper proposes DPSUR, a Differentially Private training framework based on Selective Updates and Release, where the gradient from each iteration is evaluated based on a validation test, and only those updates leading to convergence are applied to the model. As such, DPSUR ensures that training proceeds in the right direction and thus can achieve faster convergence than DPSGD. The main challenges lie in two aspects: privacy concerns arising from gradient evaluation, and the gradient selection strategy for model updates. To address these challenges, DPSUR introduces a clipping strategy for update randomization and a threshold mechanism for gradient selection. Experiments conducted on MNIST, FMNIST, CIFAR-10, and IMDB datasets show that DPSUR significantly outperforms previous works in terms of convergence speed and model utility.	翻訳日:2023-11-27 23:33:02 公開日:2023-11-23
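The selective-update idea can be illustrated with a small sketch. It pairs a DP-SGD-style noisy gradient with an accept/reject test on the change in validation loss. This is a minimal sketch under stated assumptions and is not DPSUR's exact mechanism. The helper names (`clip_by_norm`, `dp_gradient`, `selective_update`) are hypothetical. The acceptance rule is also an illustrative choice: clip the validation-loss change to `clip_delta`, add Gaussian noise with scale `sigma_v * clip_delta`, and accept the step if the result is below `beta`. No privacy accounting is performed.

```python
import numpy as np

def clip_by_norm(v, max_norm):
    """Scale v down so that its L2 norm is at most max_norm."""
    return v * min(1.0, max_norm / (np.linalg.norm(v) + 1e-12))

def dp_gradient(per_example_grads, clip_norm, sigma, rng):
    """DP-SGD-style gradient: clip each example, sum, add Gaussian noise, average."""
    clipped = np.stack([clip_by_norm(g, clip_norm) for g in per_example_grads])
    noise = rng.normal(0.0, sigma * clip_norm, size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_example_grads)

def selective_update(w, grad, lr, val_loss, clip_delta, sigma_v, beta, rng):
    """Hypothetical accept/reject rule: keep the step only if the clipped,
    noised change in validation loss is below the threshold beta."""
    candidate = w - lr * grad
    delta = val_loss(candidate) - val_loss(w)
    delta = float(np.clip(delta, -clip_delta, clip_delta))
    noisy_delta = delta + rng.normal(0.0, sigma_v * clip_delta)
    if noisy_delta < beta:
        return candidate, True   # step judged to move toward convergence: apply it
    return w, False              # otherwise discard it and keep the old weights
```

With `sigma_v = 0` and `beta = 0`, the test reduces to "accept only steps that lower the validation loss". In a genuinely private setting, the noise scales and the threshold would have to be chosen together with a privacy accountant, which this sketch omits.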
# 量子行列乗法の改良法 An Improved Method for Quantum Matrix Multiplication ( http://arxiv.org/abs/2311.14044v1 ) ライセンス: Link先を確認	Nhat A. Nghiem and Tzu-Chieh Wei	(参考訳) 線形方程式を解くための有名な量子アルゴリズム(いわゆるHHLアルゴリズム)に続いて、Childs, Kothari and Somma (SIAM Journal on Computing 46, 1920 (2017))は、精度への依存を指数的に改善した線形方程式系の解法を提供した。本稿では、彼らのチェビシェフ多項式のアプローチに基づいて、行列をある量子状態に適用するためにそのような結果を補完することを目的とする。この応用を動機づけるいくつかの例を含め、この改良された行列適用アルゴリズムを効率的な量子手続きとともに明示的に応用する方法をさらに議論する。 Following the celebrated quantum algorithm for solving linear equations (the so-called HHL algorithm), Childs, Kothari and Somma [SIAM Journal on Computing 46, 1920 (2017)] provided an approach to solving a linear system of equations with exponentially improved dependence on precision. In this note, we aim to complement such a result for applying a matrix to some quantum state, based upon their Chebyshev polynomial approach. A few examples that motivate this application are included, and we further discuss an explicit application of this improved matrix application algorithm with an efficient quantum procedure.	翻訳日:2023-11-27 23:32:33 公開日:2023-11-23
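The Chebyshev-polynomial idea can be illustrated classically. A function $f(A)$ of a Hermitian matrix whose spectrum lies in $[-1, 1]$ is approximated by $\sum_k c_k T_k(A)$. Each vector $T_k(A)v$ is then built with the three-term recurrence $T_{k+1} = 2AT_k - T_{k-1}$. The sketch below covers only this classical expansion in plain NumPy, not the quantum procedure itself. The function name and the choice `f = np.exp` in the usage check are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def apply_matrix_function(A, v, f, degree):
    """Approximate f(A) @ v with a degree-`degree` Chebyshev expansion.

    Assumes A is Hermitian with all eigenvalues in [-1, 1] and degree >= 1.
    """
    coeffs = C.chebinterpolate(f, degree)   # f(x) ~ sum_k coeffs[k] * T_k(x) on [-1, 1]
    t_prev, t_curr = v, A @ v               # T_0(A) v and T_1(A) v
    result = coeffs[0] * t_prev + coeffs[1] * t_curr
    for c in coeffs[2:]:
        # three-term recurrence: T_{k+1}(A) v = 2 A T_k(A) v - T_{k-1}(A) v
        t_prev, t_curr = t_curr, 2 * (A @ t_curr) - t_prev
        result = result + c * t_curr
    return result

# Usage: compare against the exact result from an eigendecomposition.
A = np.array([[0.2, 0.3], [0.3, -0.4]])     # eigenvalues approx. 0.32 and -0.52
v = np.array([1.0, 2.0])
evals, evecs = np.linalg.eigh(A)
exact = evecs @ (np.exp(evals) * (evecs.T @ v))
approx = apply_matrix_function(A, v, np.exp, degree=20)
```

Only matrix-vector products with $A$ are needed, which is what makes the expansion attractive. Its accuracy depends on how well a low-degree polynomial approximates $f$ on the spectrum.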
# AdapterFL:資源制約型モバイルコンピューティングシステムのための適応的不均一フェデレーション学習 AdapterFL: Adaptive Heterogeneous Federated Learning for Resource-constrained Mobile Computing Systems ( http://arxiv.org/abs/2311.14037v1 ) ライセンス: Link先を確認	Ruixuan Liu and Ming Hu and Zeke Xia and Jun Xia and Pengyu Zhang and Yihao Huang and Yang Liu and Mingsong Chen	(参考訳) Federated Learning (FL)は、データ共有なしで大規模分散クライアントの協調学習を可能にする。しかし,大規模モバイル機器間での計算資源の格差から,従来の均質モデルに基づくフェデレーション学習(FL)の性能は極めて限られている。一方で、すべての多様なクライアントでモデル訓練を実現するために、モバイルコンピューティングシステムは、協調学習に小さな低性能モデルしか使えない。他方,高い計算資源を持つデバイスは,不十分な生データでは高性能な大規模モデルを訓練することができない。本稿では,モバイルコンピューティングシステムにおける資源制約問題に対処するために,モデル再組み立て戦略を用いて大規模異種モバイルデバイスの協調学習を適応的に行う,AdapterFLと呼ばれる新しいヘテロジニアスFLアプローチを提案する。具体的には,大規模モバイルデバイスの計算性能に基づいて複数の候補異種モデルを選択し,各異種モデルを2つのパーティションに分割する。パーティションを再組み立てすることで、大きなモデルの部分パラメータと小さなモデルの部分パラメータを組み合わせた、さまざまなサイズのモデルを生成することができる。これらの再組み立てモデルを用いてFL訓練を行うことで、低性能デバイスを用いて大規模モデルの部分パラメータを訓練できる。このようにして、資源制約による大規模モデルの性能劣化を軽減することができる。実験の結果,AdapterFLは資源制約のあるシナリオにおいて,最先端の不均一なフェデレーション学習手法と比較して最大12%の精度向上を実現できることがわかった。 Federated Learning (FL) enables collaborative learning of large-scale distributed clients without data sharing. However, due to the disparity of computing resources among massive mobile computing devices, the performance of traditional homogeneous model-based Federated Learning (FL) is seriously limited. On the one hand, to achieve model training in all the diverse clients, mobile computing systems can only use small low-performance models for collaborative learning. On the other hand, devices with high computing resources cannot train a high-performance large model with their insufficient raw data. To address the resource-constrained problem in mobile computing systems, we present a novel heterogeneous FL approach named AdapterFL, which uses a model reassembly strategy to adaptively facilitate collaborative training of massive numbers of heterogeneous mobile devices. 
Specifically, we select multiple candidate heterogeneous models based on the computing performance of massive mobile devices and then divide each heterogeneous model into two partitions. By reassembling the partitions, we can generate models of varied sizes that combine partial parameters of the large model with partial parameters of the small model. Using these reassembled models for FL training, we can train the partial parameters of the large model using low-performance devices. In this way, we can alleviate performance degradation in large models due to resource constraints. The experimental results show that AdapterFL can achieve up to 12% accuracy improvement compared to the state-of-the-art heterogeneous federated learning methods in resource-constrained scenarios.	翻訳日:2023-11-27 23:32:19 公開日:2023-11-23
# 量子コンピューティングアプローチによる高スピンモデルの2次元コヒーレントスペクトル Two-dimensional coherent spectrum of high-spin models via a quantum computing approach ( http://arxiv.org/abs/2311.14035v1 ) ライセンス: Link先を確認	Martin Mootz, Peter P. Orth, Chuankun Huang, Liang Luo, Jigang Wang, and Yong-Xin Yao	(参考訳) 本稿では,高スピンモデルの2次元コヒーレントスペクトル(2DCS)を計算するための量子コンピューティング手法を提案し,ベンチマークする。我々のアプローチは、時間間隔を置いた複数の磁場パルスの存在下でのリアルタイムダイナミクスをシミュレートすることに基づいている。回路がコンパクトな適応型変分量子ダイナミクスシミュレーション(AVQDS)アルゴリズムを用いることで,周波数空間で必要な分解能を得るのに十分な長時間のシミュレーションが可能になる。具体的には、Dzyaloshinskii-Moriya相互作用と単イオン異方性を含む反強磁性量子スピンモデルを考える。得られた2DCSスペクトルは、非摂動ハミルトニアンの異なる固有状態間の遷移から生じる、マグノン周波数の倍数における明確なピークを示す。 1次元コヒーレントスペクトルを2DCSと比較することにより、2DCSがエネルギースペクトルのより高い分解能を提供することを示す。さらに、高スピン演算子の2つの異なるバイナリエンコーディング(標準バイナリエンコーディングとグレイコード)を用いて、量子資源がスピンの大きさとともにどのようにスケールするかを調べる。低磁場ではどちらのエンコーディングも同等の量子資源を必要とするが、強い磁場ではグレイコードが有利である。最後に,2DCSの数値計算結果を希土類オルソフェライト系の実験結果と比較する。量子高スピンモデルの2DCSにおけるマグノン高調波発生信号の強度は実験データとよく一致し、対応する平均場結果に対して大きな改善を示した。 We present and benchmark a quantum computing approach to calculate the two-dimensional coherent spectrum (2DCS) of high-spin models. Our approach is based on simulating their real-time dynamics in the presence of several magnetic field pulses, which are spaced in time. We utilize the adaptive variational quantum dynamics simulation (AVQDS) algorithm for the study due to its compact circuits, which enables simulations over sufficiently long times to achieve the required resolution in frequency space. Specifically, we consider an antiferromagnetic quantum spin model that incorporates Dzyaloshinskii-Moriya interactions and single-ion anisotropy. The obtained 2DCS spectra exhibit distinct peaks at multiples of the magnon frequency, arising from transitions between different eigenstates of the unperturbed Hamiltonian. By comparing the one-dimensional coherent spectrum with 2DCS, we demonstrate that 2DCS provides a higher resolution of the energy spectrum. 
We further investigate how the quantum resources scale with the magnitude of the spin using two different binary encodings of the high-spin operators: the standard binary encoding and the Gray code. At low magnetic fields both encodings require comparable quantum resources, but at larger field strengths the Gray code is advantageous. Lastly, we compare the numerical 2DCS with experimental results on a rare-earth orthoferrite system. The observed strength of the magnonic high-harmonic generation signals in the 2DCS of the quantum high-spin model aligns well with the experimental data, showing significant improvement over the corresponding mean-field results.	翻訳日:2023-11-27 23:31:56 公開日:2023-11-23
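The two encodings can be compared with a small sketch. The $2S+1$ levels $m = -S, \dots, S$ of a spin are indexed by $k = m + S$ and written on $\lceil \log_2(2S+1) \rceil$ qubits. Each index is stored either as the plain binary representation of $k$ or as its Gray code $k \oplus (k \gg 1)$. Spin raising and lowering operators connect neighbouring levels, so one plausible source of the Gray code's advantage is that neighbouring levels then differ in a single qubit. That explanation and the level-to-index mapping used here are assumptions of this sketch, not details taken from the paper. The helper names are hypothetical.

```python
import math

def n_qubits(two_s):
    """Qubits needed for the 2S+1 levels of spin S (spin passed as 2S to stay integral)."""
    return max(1, math.ceil(math.log2(two_s + 1)))

def binary_code(k, n):
    return format(k, f"0{n}b")

def gray_code(k, n):
    return format(k ^ (k >> 1), f"0{n}b")   # reflected binary Gray code

def encode_levels(two_s, code):
    """Bit strings for level index k = m + S, with k = 0 .. 2S."""
    n = n_qubits(two_s)
    return [code(k, n) for k in range(two_s + 1)]

def max_adjacent_flips(codes):
    """Largest number of qubits that differ between neighbouring levels."""
    return max(sum(a != b for a, b in zip(x, y)) for x, y in zip(codes, codes[1:]))

# Spin 3/2: four levels on two qubits.
print(encode_levels(3, binary_code))   # ['00', '01', '10', '11']
print(encode_levels(3, gray_code))     # ['00', '01', '11', '10']
```

For spin 7/2 (eight levels on three qubits), the binary encoding needs three simultaneous bit flips between $k = 3$ and $k = 4$. The Gray code never needs more than one.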
# 正規化フローを用いた前日電力価格の多変量シナリオ生成 Multivariate Scenario Generation of Day-Ahead Electricity Prices using Normalizing Flows ( http://arxiv.org/abs/2311.14033v1 ) ライセンス: Link先を確認	Hannes Hilger, Dirk Witthaut, Manuel Dahmen, Leonardo Rydin Gorjao, Julius Trebbien, Eike Cramer	(参考訳) 電力市場での取引には、電力価格の実現値と予測に伴う不確実性に関する正確な情報が必要である。本稿では,完全にデータ駆動型の深層生成モデルである正規化フローを用いた前日電力価格の確率的予測手法を提案する。本モデリング手法は,残余負荷予測などの条件付き特徴量に基づいて前日電力価格の1日全体のシナリオを生成する。さらに, 過去の実現値からなる拡張特徴量セットと, 正規化フローを現代電力市場の変化する条件に適応させる定期的再訓練方式を提案する。特に、ロシアによるウクライナ侵攻に伴うエネルギー危機の影響について調査する。その結果,正規化フローは真の価格分布を再現し,高精度な予測をもたらす高品質なシナリオを生成することがわかった。さらに,レジーム変化への適応に向けた改善により,正規化フローが市場状況の変化に適応し,高品質な前日価格シナリオを継続的にサンプリングできることを示す。 Trading on electricity markets requires accurate information about the realization of electricity prices and the uncertainty attached to the predictions. We present a probabilistic forecasting approach for day-ahead electricity prices using the fully data-driven deep generative model called normalizing flows. Our modeling approach generates full-day scenarios of day-ahead electricity prices based on conditional features such as residual load forecasts. Furthermore, we propose extended feature sets of prior realizations and a periodic retraining scheme that allows the normalizing flow to adapt to the changing conditions of modern electricity markets. In particular, we investigate the impact of the energy crisis ensuing from the Russian invasion of Ukraine. Our results highlight that the normalizing flow generates high-quality scenarios that reproduce the true price distribution and yield highly accurate forecasts. Additionally, our analysis highlights how our improvements towards adaptation in changing regimes allow the normalizing flow to adapt to changing market conditions and enable continued sampling of high-quality day-ahead price scenarios.	翻訳日:2023-11-27 23:31:31 公開日:2023-11-23
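A normalizing flow assigns a density to a price scenario through the change-of-variables formula $\log p(x \mid c) = \log p_Z(z) + \log\left|\det \partial z / \partial x\right|$, with $z = g^{-1}(x; c)$. The sketch below shows this with the simplest possible conditional flow: an element-wise affine map whose shift and log-scale are linear in the conditioning features $c$ (for example, a residual-load forecast). The class name, the linear parameterization and all numbers are hypothetical. The paper's flow is a deep generative model and is not reproduced here.

```python
import numpy as np

class ConditionalAffineFlow:
    """Toy conditional flow: x = mu(c) + exp(s(c)) * z with z ~ N(0, I).

    mu(c) = W_mu @ c + b_mu and s(c) = W_s @ c + b_s are hypothetical linear maps.
    """

    def __init__(self, W_mu, b_mu, W_s, b_s):
        self.W_mu, self.b_mu, self.W_s, self.b_s = W_mu, b_mu, W_s, b_s

    def _params(self, c):
        return self.W_mu @ c + self.b_mu, self.W_s @ c + self.b_s

    def sample(self, c, n, rng):
        """Draw n scenarios (one per row) given the conditioning features c."""
        mu, s = self._params(c)
        z = rng.standard_normal((n, mu.size))
        return mu + np.exp(s) * z

    def log_prob(self, x, c):
        """Change of variables: log p(x|c) = log N(z; 0, I) + log|det dz/dx|."""
        mu, s = self._params(c)
        z = (x - mu) * np.exp(-s)                    # inverse transform
        log_base = -0.5 * np.sum(z**2 + np.log(2 * np.pi), axis=-1)
        log_det = -np.sum(s)                         # dz/dx is diagonal with entries exp(-s)
        return log_base + log_det

# Usage: 24 hourly prices conditioned on 2 features (all numbers are made up).
rng = np.random.default_rng(0)
D, K = 24, 2
flow = ConditionalAffineFlow(0.1 * rng.standard_normal((D, K)), np.full(D, 50.0),
                             np.zeros((D, K)), np.full(D, np.log(5.0)))
c = np.array([1.0, -0.5])
scenarios = flow.sample(c, n=100, rng=rng)   # shape (100, 24)
scores = flow.log_prob(scenarios, c)         # one log-density per scenario
```

This toy flow treats the hours independently given $c$. A realistic flow stacks invertible layers that also couple the hours, which is what allows it to capture correlated full-day price shapes.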
# 効率的なプライバシー保護LLMのためのPrivateLoRA PrivateLoRA For Efficient Privacy Preserving LLM ( http://arxiv.org/abs/2311.14030v1 ) ライセンス: Link先を確認	Yiming Wang, Yu Lin, Xiaodong Zeng, Guannan Zhang	(参考訳) 現在のLarge Language Model(LLM)サービスのパラダイムにおいて、エンドユーザはプライバシと効率のどちらかを選ぶことを迫られている。クラウドベースのパラダイムでは、ユーザは生成品質と処理速度のためにデータのローカリティを妥協せざるを得ない。逆にエッジデバイスのパラダイムはデータのローカリティを維持しているが、十分な性能を提供できない。本研究では,プライバシに敏感な計算をエッジデバイス上に,共有計算をクラウド上に分散する新しいLLMサービスパラダイムを提案する。中央クラウドとエッジデバイス間ではアクティベーションのみが送信され、データのローカリティが保証される。我々の中核となるイノベーションであるPrivateLoRAは、残差アクティベーションの低ランク性を活用して95%以上の通信削減を実現することで、困難な通信オーバーヘッドに対処する。その結果、PrivateLoRAはデータのローカリティを効果的に維持し、非常にリソース効率が高い。標準的な5Gネットワークでは、PrivateLoRAは7Bモデルでデバイスのみのソリューションの300%超、33BモデルでA100 GPUの80%超のスループットを実現している。 PrivateLoRAはまた、高度なパーソナライゼーションのためにLoRAに匹敵するチューニング性能を提供する。我々のアプローチは、エッジデバイス向けの最先端の生成AIへのアクセスを民主化し、一般の人々によりカスタマイズされたLLM体験への道を開く。我々の知る限り、我々の提案するフレームワークは文献における最初の効率的かつプライバシー保護のLLMソリューションである。 End users face a choice between privacy and efficiency in current Large Language Model (LLM) service paradigms. In cloud-based paradigms, users are forced to compromise data locality for generation quality and processing speed. Conversely, edge device paradigms maintain data locality but fail to deliver satisfactory performance. In this work, we propose a novel LLM service paradigm that distributes privacy-sensitive computation on edge devices and shared computation in the cloud. Only activations are transmitted between the central cloud and edge devices to ensure data locality. Our core innovation, PrivateLoRA, addresses the challenging communication overhead by exploiting the low rank of residual activations, achieving over 95% communication reduction. Consequently, PrivateLoRA effectively maintains data locality and is extremely resource efficient. Under standard 5G networks, PrivateLoRA achieves throughput over 300% of device-only solutions for 7B models and over 80% of an A100 GPU for 33B models. 
PrivateLoRA also provides tuning performance comparable to LoRA for advanced personalization. Our approach democratizes access to state-of-the-art generative AI for edge devices, paving the way for more tailored LLM experiences for the general public. To our knowledge, our proposed framework is the first efficient and privacy-preserving LLM solution in the literature.	翻訳日:2023-11-27 23:31:15 公開日:2023-11-23
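A back-of-the-envelope calculation shows why sending low-rank activations instead of full hidden states can cut traffic by more than 95%. The accounting below is hypothetical and is not PrivateLoRA's exact protocol. It assumes one activation vector per decoder layer is sent in each direction for every token, with 16-bit values. It also assumes a hidden size of 4096 and 32 layers (the dimensions of LLaMA-7B), and an illustrative rank of 64.

```python
def bytes_per_token(width, n_layers, bytes_per_value=2, directions=2):
    """Activation bytes exchanged per generated token under the assumed accounting."""
    return width * n_layers * bytes_per_value * directions

full_hidden = bytes_per_token(width=4096, n_layers=32)   # send full hidden states
low_rank = bytes_per_token(width=64, n_layers=32)        # send rank-64 projections
reduction = 1 - low_rank / full_hidden
print(f"{full_hidden} B vs {low_rank} B per token -> {reduction:.1%} less traffic")
# prints: 524288 B vs 8192 B per token -> 98.4% less traffic
```

Under these assumptions, the saving is simply $1 - r/d$. Any rank below 5% of the hidden width therefore clears the 95% mark, whatever the layer count or value precision.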
# 画像圧縮におけるCLIPの脆弱性の理解 Understanding the Vulnerability of CLIP to Image Compression ( http://arxiv.org/abs/2311.14029v1 ) ライセンス: Link先を確認	Cangxiong Chen, Vinay P. Namboodiri, Julian Padget	(参考訳) CLIPは、ゼロショット画像認識やその他の画像テキストアライメントタスクに使用される、広く使われている基盤的な視覚言語モデルである。我々は、圧縮による画質の変化に対してCLIPが脆弱であることを示す。この驚くべき結果を、帰属手法である統合勾配(Integrated Gradients)を用いてさらに解析する。この帰属手法を用いることで,圧縮がこのモデルのゼロショット認識精度にどのように影響するかを,定量的かつ定性的によりよく理解することができる。 CIFAR-10とSTL-10で広範に評価した。私たちの研究は、CLIPのこの脆弱性を理解する基盤を提供し、CLIPや他の視覚言語モデルの堅牢性を改善するためのより効果的な方法の開発に役立つ。 CLIP is a widely used foundational vision-language model that is used for zero-shot image recognition and other image-text alignment tasks. We demonstrate that CLIP is vulnerable to changes in image quality under compression. This surprising result is further analysed using an attribution method, Integrated Gradients. Using this attribution method, we are able to better understand, both quantitatively and qualitatively, how compression affects the zero-shot recognition accuracy of this model. We evaluate this extensively on CIFAR-10 and STL-10. Our work provides the basis for understanding this vulnerability of CLIP and can help us develop more effective methods to improve the robustness of CLIP and other vision-language models.	翻訳日:2023-11-27 23:30:52 公開日:2023-11-23
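Integrated Gradients attributes a prediction $F(x)$ to input features by integrating the gradient along the straight path from a baseline $x'$. The formula is $\mathrm{IG}_i = (x_i - x'_i)\int_0^1 \partial_i F(x' + \alpha(x - x'))\,d\alpha$. A minimal NumPy sketch using a midpoint Riemann sum follows. The toy model `F` and its hand-written gradient are assumptions standing in for CLIP. For CLIP, the gradient would instead come from automatic differentiation of the zero-shot score with respect to the image pixels.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=300):
    """Integrated Gradients with a midpoint Riemann sum along the straight path."""
    alphas = (np.arange(steps) + 0.5) / steps              # midpoints of [0, 1]
    path = baseline + alphas[:, None] * (x - baseline)     # one row per path point
    avg_grad = np.mean([grad_fn(p) for p in path], axis=0)
    return (x - baseline) * avg_grad

# Toy differentiable "model" standing in for CLIP (hypothetical).
w = np.array([0.5, -1.0, 2.0])
F = lambda z: np.tanh(w @ z)
grad_F = lambda z: (1.0 - np.tanh(w @ z) ** 2) * w

x = np.array([1.0, 0.5, 0.25])
baseline = np.zeros(3)                                     # e.g. a black image
attributions = integrated_gradients(grad_F, x, baseline)
# Completeness: the attributions sum to F(x) - F(baseline), up to the Riemann-sum error.
print(attributions.sum(), F(x) - F(baseline))
```

The completeness property is what makes the attributions comparable across compressed and uncompressed inputs. Per-pixel attributions always account for the full change in the model's output relative to the baseline.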
# 生成的蒸留を伴う拡散モデルの連続学習 Continual Learning of Diffusion Models with Generative Distillation ( http://arxiv.org/abs/2311.14028v1 ) ライセンス: Link先を確認	Sergi Masip, Pau Rodriguez, Tinne Tuytelaars, Gido M. van de Ven	(参考訳) 拡散モデルは画像合成などのタスクで最先端の性能を達成する強力な生成モデルである。しかし、その訓練には大量のデータと計算資源が必要である。継続学習が可能になれば、新しいタスクを漸進的に学習して知識を蓄積でき、既に訓練されたモデルを再利用できるようになる。有望なアプローチの一つは生成リプレイであり、以前のタスクで訓練された生成モデルのコピーが、現在のタスクのデータと交互に用いられる合成データを生成する。しかし、拡散モデルに標準的な生成リプレイを適用すると、ノイズ除去能力が壊滅的に失われる。本稿では,拡散モデルの逆過程全体を蒸留する生成蒸留法を提案する。本手法は,計算コストを緩やかに増やすだけで,生成リプレイの継続学習性能を大幅に向上させることを実証する。 Diffusion models are powerful generative models that achieve state-of-the-art performance in tasks such as image synthesis. However, training them demands substantial amounts of data and computational resources. Continual learning would allow new tasks to be learned incrementally and knowledge to be accumulated, making it possible to reuse already trained models. One potentially suitable approach is generative replay, where a copy of a generative model trained on previous tasks produces synthetic data that are interleaved with data from the current task. However, standard generative replay applied to diffusion models results in a catastrophic loss in denoising capabilities. In this paper, we propose generative distillation, an approach that distils the entire reverse process of a diffusion model. We demonstrate that our approach significantly improves the continual learning performance of generative replay with only a moderate increase in the computational costs.	翻訳日:2023-11-27 23:30:40 公開日:2023-11-23
# 雲の光学的厚さ推定のための合成データセットの作成とベンチマーク Creating and Benchmarking a Synthetic Dataset for Cloud Optical Thickness Estimation ( http://arxiv.org/abs/2311.14024v1 ) ライセンス: Link先を確認	Aleksis Pirinen, Nosheen Abid, Nuria Agues Paszkowsky, Thomas Ohlson Timoudas, Ronald Scheirer, Chiara Ceccobello, György Kovács, Anders Persson	(参考訳) 雲はしばしば衛星による地球表面の光学観測を妨げ、土地被覆マッピング、海色分析、農地モニタリングなどの地球観測(EO)活動を制限する。リモートセンシング分野への機械学習(ML)手法の統合は、雲の検出やフィルタリングを含む幅広いEOタスクの性能を大幅に向上させたが、まだ改善の余地は大きい。重要なボトルネックは、ML手法が一般に学習のために大量のアノテーション付きデータに依存しており、EOの文脈ではそうしたデータの入手が難しいことが多い点である。これは特に、雲の光学的厚さ(COT)の推定に当てはまる。 COTを信頼性高く推定できれば、実際に一般的に行われているように事前に定めた雲カテゴリを用いる場合と比べて、よりきめ細かくアプリケーションに応じた制御が可能になる。COTデータ不足の問題を緩和するため、本研究では、Sentinel-2プラットフォームに搭載されたマルチスペクトル装置(MSI)センサの12のスペクトル帯域について大気上端放射輝度をシミュレートした、COT推定のための新しい合成データセットを提案する。これらのデータ点は、異なる雲の種類、COT、地表および大気プロファイルを考慮してシミュレーションされている。スペクトル帯域の反射率の測定値からCOTを予測するよう複数のMLモデルを学習させた大規模な実験により,提案データセットの有用性が示された。実データへの汎化も、2つの衛星画像データセット(1つは公開データセット、もう1つは我々が収集しアノテーションを付けたデータセット)で実証されている。合成データ、新たに収集した実データセット、コード、モデルはhttps://github.com/aleksispi/ml-cloud-opt-thickで公開されている。 Cloud formations often obscure optical satellite-based monitoring of the Earth's surface, thus limiting Earth observation (EO) activities such as land cover mapping, ocean color analysis, and cropland monitoring. The integration of machine learning (ML) methods within the remote sensing domain has significantly improved performance on a wide range of EO tasks, including cloud detection and filtering, but there is still much room for improvement. A key bottleneck is that ML methods typically depend on large amounts of annotated data for training, which is often difficult to come by in EO contexts. This is especially true for the task of cloud optical thickness (COT) estimation. A reliable estimation of COT enables more fine-grained and application-dependent control compared to using pre-specified cloud categories, as is commonly done in practice. 
To alleviate the COT data scarcity problem, in this work we propose a novel synthetic dataset for COT estimation, where top-of-atmosphere radiances have been simulated for 12 of the spectral bands of the Multi-Spectral Instrument (MSI) sensor onboard Sentinel-2 platforms. These data points have been simulated under consideration of different cloud types, COTs, and ground surface and atmospheric profiles. Extensive experiments in which several ML models are trained to predict COT from the measured reflectivity of the spectral bands demonstrate the usefulness of our proposed dataset. Generalization to real data is also demonstrated on two satellite image datasets: one that is publicly available, and one that we have collected and annotated. The synthetic data, the newly collected real dataset, code and models have been made publicly available at https://github.com/aleksispi/ml-cloud-opt-thick.	翻訳日:2023-11-27 23:30:29 公開日:2023-11-23
# 境界状態強化量子メトロロジーの量子シミュレーション Quantum Simulation of Bound-State-Enhanced Quantum Metrology ( http://arxiv.org/abs/2311.14020v1 ) ライセンス: Link先を確認	Cheng-Ge Liu and Cong-Wei Lu and Na-Na Zhang and Qing Ai	(参考訳) 量子計測学は量子効果を利用して、物理量の測定精度を古典的な限界を超えて向上させる。しかし,系と環境の相互作用により,デコヒーレンスは測定精度を著しく低下させうる。長時間極限において測定精度を回復するための多くの手法が提案されている。最近、境界状態が誤差のない測定を補助し、$t^{-1}$スケーリングを回復できることがわかった[K. Bai, Z. Peng, H. G. Luo, J. H. An, Phys. Rev. Lett. 123, 040402 (2019)]。ここでは、$N$個の量子ビットを用いて、1つの原子と結合共振器からなるハイブリッド系の開放量子ダイナミクスをシミュレートする手法を提案する。境界状態の存在により、時間が経つにつれて測定誤差が消失しうることがわかった。解析的および数値的シミュレーションの両方により, ハイブリッド系に境界状態が存在する場合, 測定誤差の$t^{-1}$スケーリングが回復できることを証明した。興味深いことに、原子遷移周波数の評価に使用できる完全な振動が存在することが観察される。有限の$N$では、量子ビットが1つ増えるごとに完全振動の持続時間は2倍になる。 Quantum metrology explores quantum effects to improve the measurement accuracy of some physical quantities beyond the classical limit. However, due to the interaction between the system and the environment, decoherence can significantly reduce the accuracy of the measurement. Many methods have been proposed to restore the accuracy of the measurement in the long-time limit. Recently, it has been found that the bound state can assist error-free measurement and recover the $t^{-1}$ scaling [K. Bai, Z. Peng, H. G. Luo, and J. H. An, Phys. Rev. Lett. 123, 040402 (2019)]. Here, using $N$ qubits, we propose a method to simulate the open quantum dynamics of a hybrid system consisting of one atom and coupled resonators. We find that the measurement error can vanish as time increases, owing to the existence of the bound state. By both analytical and numerical simulations, we prove that the $t^{-1}$ scaling of the measurement error can be recovered when there is a bound state in the hybrid system. Interestingly, we observe perfect oscillations that can be used to evaluate the atomic transition frequency. For finite $N$, the duration of the perfect oscillations doubles with each additional qubit.	翻訳日:2023-11-27 23:29:59 公開日:2023-11-23
# 機械学習アルゴリズムのハイパーパラメータ景観について On the Hyperparameter Landscapes of Machine Learning Algorithms ( http://arxiv.org/abs/2311.14014v1 ) ライセンス: Link先を確認	Mingyu Huang, Ke Li	(参考訳) 近年、機械学習(ML)モデルのための多くのハイパーパラメータ最適化(HPO)手法が成功を収めているが、HPOを理解する上で重要な前提条件であるモデルハイパーパラメータ(HP)と予測損失(フィットネス)の間の複雑な相互作用は、我々のコミュニティではほとんど検討されていない。その結果、HPOプロセスの説明可能性は限られ、人間の信頼が得られず、アルゴリズムのボトルネックの特定も困難になっている。本稿では,6つのMLモデル(11以上のモデル構成)について,67のデータセットと異なる忠実度レベルにわたる1,500のHP損失ランドスケープに対して大規模なフィットネスランドスケープ分析(FLA)を行うことにより,このブラックボックスに光を当てる。我々は、その地形の滑らかさ、中立性、モダリティの観点から、初めての統一的かつ包括的な全体像を明らかにする。また,このような特性はデータセットや忠実度間で高い転移可能性を有しており,マルチ忠実度手法と転移学習手法の成功の基本的な根拠となることを示す。これらの発見は、視覚的および定量的な指標を組み合わせた専用のFLAフレームワークを開発することで可能となった。さらに、NAS-Bench-101のランドスケープを分析することでこのフレームワークの可能性を実証し、より幅広いAutoMLタスクの基本的な理解を促進できると考えている。 Despite the recent success of a plethora of hyperparameter optimization (HPO) methods for machine learning (ML) models, the intricate interplay between model hyperparameters (HPs) and predictive losses (a.k.a. fitness), which is a key prerequisite for understanding HPO, remains notably underexplored in our community. This results in limited explainability in the HPO process, leading to a lack of human trust and difficulty in pinpointing algorithm bottlenecks. In this paper, we aim to shed light on this black box by conducting large-scale fitness landscape analysis (FLA) on 1,500 HP loss landscapes of 6 ML models with more than 11 model configurations, across 67 datasets and different fidelity levels. We reveal the first unified, comprehensive portrait of their topographies in terms of smoothness, neutrality and modality. We also show that such properties are highly transferable across datasets and fidelities, providing fundamental evidence for the success of multi-fidelity and transfer learning methods. These findings are made possible by developing a dedicated FLA framework that incorporates a combination of visual and quantitative measures. 
We further demonstrate the potential of this framework by analyzing the NAS-Bench-101 landscape, and we believe it can facilitate fundamental understanding of a broader range of AutoML tasks.	翻訳日:2023-11-27 23:29:34 公開日:2023-11-23
# シャドー:シャムネットワークにおける効率的なトレーニングのための新しい損失関数 Shadow: A Novel Loss Function for Efficient Training in Siamese Networks ( http://arxiv.org/abs/2311.14012v1 ) ライセンス: Link先を確認	Alif Elham Khan, Mohammad Junayed Hasan, Humayra Anjum, Nabeel Mohammed	(参考訳) 近年の類似性検出タスクの大幅な進歩にもかかわらず、既存のアプローチはメモリ制約下で大きな課題を抱えている。その主な理由の1つは、シャムネットワークにおけるトリプレット損失のような計算コストの高い距離学習損失関数を使用することである。本稿では,性能を損なうことなく損失計算中に埋め込み空間の次元を圧縮する、シャドウロスと呼ばれる新しい損失関数を提案する。埋め込みの射影間の距離は、距離がクラス類似性の尺度に直接対応するコンパクトな射影空間上で入力から学習される。低次元の射影空間に射影することで損失関数はより速く収束し、その結果得られる分類済み画像クラスタはクラス間距離が大きく、クラス内距離が小さくなる。シャドウロスは、メモリ制約のあるデバイスに有利なように埋め込み次元を削減するだけでなく、多様なデータセットにおいて最先端のトリプレットマージンロスを精度で5%-10%一貫して上回る。提案した損失関数はモデル非依存でもあり、試験した複数のモデルで性能を維持する。バランスのとれた、不均衡な、医療・非医療の画像データセットにわたる有効性と堅牢性は、特定のモデルやデータセットに固有のものではなく、より少ないメモリと計算量で一貫して優れた性能を示すことを示唆している。 Despite significant recent advances in similarity detection tasks, existing approaches pose substantial challenges under memory constraints. One of the primary reasons for this is the use of computationally expensive metric learning loss functions such as Triplet Loss in Siamese networks. In this paper, we present a novel loss function called Shadow Loss that compresses the dimensions of an embedding space during loss calculation without loss of performance. The distance between the projections of the embeddings is learned from inputs on a compact projection space where distances directly correspond to a measure of class similarity. By projecting onto a lower-dimensional space, our loss function converges faster, and the resulting classified image clusters have larger inter-class and smaller intra-class distances. Shadow Loss not only reduces embedding dimensions, favoring memory-constrained devices, but also consistently outperforms the state-of-the-art Triplet Margin Loss by 5%-10% in accuracy across diverse datasets. The proposed loss function is also model agnostic, upholding its performance across several tested models. 
Its effectiveness and robustness across balanced, imbalanced, medical, and non-medical image datasets suggest that it is not specific to a particular model or dataset, and that it consistently delivers superior performance while using less memory and computation.	翻訳日:2023-11-27 23:29:11 公開日:2023-11-23
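For reference, the Triplet Margin Loss that Shadow Loss is compared against penalizes an anchor that is not closer to its positive than to its negative by at least a margin. The loss is $L = \max(\lVert a - p\rVert - \lVert a - n\rVert + m, 0)$. The sketch below implements only this baseline, with Euclidean distance and batch averaging. Shadow Loss itself, which measures distances between projections of the embeddings, is not reproduced because the abstract does not give its exact form.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Mean of max(||a - p|| - ||a - n|| + margin, 0) over a batch of triplets."""
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return float(np.maximum(d_pos - d_neg + margin, 0.0).mean())

anchors = np.array([[0.0, 0.0], [0.0, 0.0]])
positives = np.array([[1.0, 0.0], [1.0, 0.0]])
negatives = np.array([[3.0, 0.0], [1.5, 0.0]])   # the second negative lies inside the margin
loss = triplet_margin_loss(anchors, positives, negatives)   # (0.0 + 0.5) / 2 = 0.25
```

Each evaluation needs three full-dimensional embeddings per triplet. That per-triplet cost is what the abstract's memory argument targets when it proposes computing the loss in a compressed projection space.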
# 非慣性フレームにおけるW状態の量子条件相互情報 Quantum conditional mutual information of W state in non-inertial frames ( http://arxiv.org/abs/2311.14010v1 ) ライセンス: Link先を確認	H Saveetha, Peter P. Rohde and R Chandrashekar	(参考訳) 量子条件相互情報(Quantum Conditional mutual information, QCMI)は、多用途な情報理論的尺度である。これは、第3の量子ビットの観点から2つの量子ビット間の相関の量を求めるために用いられる。本研究では、量子ビットの一部が加速運動下にあるときの三者W状態のQCMIを特徴づける。ここでは, 単一モード近似における無質量フェルミオン場を考える。量子ビットの加速に関して可能なすべての状況を考える。結果から、QCMIは加速される量子ビットの役割に応じて増大または減少しうることがわかった。最後に, 双分離状態と分離状態を調べることで, QCMIと相関の関係について議論する。 Quantum conditional mutual information (QCMI) is a versatile information-theoretic measure. It is used to find the amount of correlations between two qubits from the perspective of a third qubit. In this work we characterise the QCMI of tripartite W-states when some of the qubits are under accelerated motion. For our investigations, we consider a massless fermionic field in the single-mode approximation. We consider all possible situations with respect to acceleration of the qubits. From our results we observe that QCMI can either increase or decrease depending on the role of the qubit being accelerated. Finally, we discuss the connection between QCMI and correlations by studying the biseparable and separable states.	翻訳日:2023-11-27 23:28:46 公開日:2023-11-23
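The quantity studied here is $I(A{:}B \mid C) = S(\rho_{AC}) + S(\rho_{BC}) - S(\rho_C) - S(\rho_{ABC})$, where $S$ is the von Neumann entropy. The sketch below evaluates it for the three-qubit W state in the inertial case only. No acceleration and no single-mode Unruh transformation are applied, and modelling the accelerated qubits is beyond this illustration. For the pure W state, $S(\rho_{ABC}) = 0$, and each of $\rho_{AC}$, $\rho_{BC}$ and $\rho_C$ has eigenvalues $\{1/3, 2/3\}$. The QCMI is therefore $h(1/3) \approx 0.918$ bits. The helper names are hypothetical.

```python
import numpy as np

def partial_trace(rho, keep, n):
    """Reduce an n-qubit density matrix to the qubits in `keep` (qubit 0 = most significant)."""
    t = rho.reshape([2] * (2 * n))
    m = n
    for q in sorted(set(range(n)) - set(keep), reverse=True):
        t = np.trace(t, axis1=q, axis2=q + m)   # contract row and column index of qubit q
        m -= 1
    return t.reshape(2 ** m, 2 ** m)

def entropy_bits(rho):
    """Von Neumann entropy in bits, ignoring numerically zero eigenvalues."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

def qcmi(rho, n=3):
    """I(A:B|C) = S(AC) + S(BC) - S(C) - S(ABC), with A, B, C = qubits 0, 1, 2."""
    def S(keep):
        return entropy_bits(partial_trace(rho, keep, n))
    return S([0, 2]) + S([1, 2]) - S([2]) - S([0, 1, 2])

# W state (|100> + |010> + |001>) / sqrt(3), with no acceleration applied.
psi = np.zeros(8)
psi[[4, 2, 1]] = 1 / np.sqrt(3)
rho_w = np.outer(psi, psi.conj())
print(qcmi(rho_w))   # about 0.918 bits, i.e. h(1/3)
```

In the paper's setting, the reduced states of the accelerated qubits would first be transformed into the relevant Rindler modes before applying the same entropy formula. This is how the QCMI can grow or shrink depending on which qubit accelerates.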
# Sentinel-1とSentinel-2から得られた高分解能人口地図 High-resolution Population Maps Derived from Sentinel-1 and Sentinel-2 ( http://arxiv.org/abs/2311.14006v1 ) ライセンス: Link先を確認	Nando Metzger, Rodrigo Caye Daudt, Devis Tuia, Konrad Schindler	(参考訳) 詳細な人口地図は人道行動から都市計画まで様々な分野で重要な役割を果たしている。このような地図をタイムリーかつスケーラブルに生成することは、特にデータが乏しい地域において課題となる。そこで我々は,無償で全世界的に利用可能なSentinel-1とSentinel-2の衛星画像と、較正のための粗い国勢調査区単位の少数の集計人口のみを入力とする人口マッピング手法POPCORNを開発した。最小限のデータ要求にもかかわらず、我々のアプローチは、高解像度画像から得られる建物フットプリントに依存するいくつかの手法を含む既存の手法のマッピング精度を上回る。例えば、400未満の地域国勢調査集計値に基づいて、ルワンダの人口地図を100m GSDで作成できた。キガリでは、これらの地図は地上参照地図に対して66%の$R^2$スコアに達し、平均誤差は1haあたりわずか$\pm$10人である。都合のよいことに、POPCORNは市街地と局所的な建物占有率の明示的な地図を取得するため、マッピングプロセスは解釈可能となり、例えば工業倉庫のような、建物はあるが人が住んでいない地域の分布について追加の洞察が得られる。さらに、一度訓練したモデルは人口変化の追跡のために繰り返し適用でき、ウガンダからルワンダへのように地理的に類似した地域へ転用できることがわかった。本研究の目的は,特に人口動態の激しい一部の地域では費用のかかるマイクロセンサスキャンペーンのための資源が不足している可能性があることを踏まえ,最新の高解像度人口地図へのアクセスを民主化することにある。 Detailed population maps play an important role in diverse fields ranging from humanitarian action to urban planning. Generating such maps in a timely and scalable manner presents a challenge, especially in data-scarce regions. To address it we have developed POPCORN, a population mapping method whose only inputs are free, globally available satellite images from Sentinel-1 and Sentinel-2, and a small number of aggregate population counts over coarse census districts for calibration. Despite the minimal data requirements our approach surpasses the mapping accuracy of existing schemes, including several that rely on building footprints derived from high-resolution imagery. E.g., we were able to produce population maps for Rwanda with 100m GSD based on fewer than 400 regional census counts. In Kigali, those maps reach an $R^2$ score of 66% w.r.t. a ground truth reference map, with an average error of only $\pm$10 inhabitants/ha. 
Conveniently, POPCORN retrieves explicit maps of built-up areas and of local building occupancy rates, making the mapping process interpretable and offering additional insights, for instance about the distribution of built-up but unpopulated areas, e.g., industrial warehouses. Moreover, we find that, once trained, the model can be applied repeatedly to track population changes, and that it can be transferred to geographically similar regions, e.g., from Uganda to Rwanda. With our work we aim to democratize access to up-to-date and high-resolution population maps, recognizing that some regions faced with particularly strong population dynamics may lack the resources for costly micro-census campaigns.	翻訳日:2023-11-27 23:28:38 公開日:2023-11-23
# サイドチャネル攻撃が組込み人工知能のブラックボックス特性を破る時 When Side-Channel Attacks Break the Black-Box Property of Embedded Artificial Intelligence ( http://arxiv.org/abs/2311.14005v1 ) ライセンス: Link先を確認	Benoit Coqueret, Mathieu Carbone, Olivier Sentieys, Gabriel Zaid	(参考訳) 人工知能、特にディープニューラルネットワーク(DNN)は、特定の広告から物体検出まで、いくつかのタスクの標準として過去10年間に急速に台頭した。その性能により、DNNアルゴリズムは効率と信頼性の両方が求められる重要な組込みシステムの一部となった。特に、DNNは、人間の観察者には検出できない一方でネットワークを騙すように設計された悪意のある例、すなわち敵対的サンプルの影響を受ける。以前の研究では、このような攻撃をブラックボックスの設定で実装するためのフレームワークが提案されていたが、攻撃者がニューラルネットワークのロジットにアクセスできるという、従来のブラックボックスの仮定を破る仮説に依存することが多い。本稿では,攻撃者がロジットにアクセスできない,真のブラックボックスのシナリオについて検討する。特に,ロジットを抽出することでこの制約を解決するアーキテクチャ非依存の攻撃を提案する。本手法はハードウェア攻撃とソフトウェア攻撃を組み合わせたもので,電磁的漏洩を利用して与えられた入力に対するロジットを抽出するサイドチャネル攻撃を行うことで,攻撃者が勾配を推定し,最先端の敵対的サンプルを生成して標的のニューラルネットワークを騙すことを可能にする。この敵対的攻撃の例を通じて,ロジットまたは信頼度スコアを必要とするより一般的な攻撃フレームワークの第一歩として,サイドチャネルを用いたロジット抽出が有効であることを示す。 Artificial intelligence, and specifically deep neural networks (DNNs), has rapidly emerged in the past decade as the standard for several tasks from specific advertising to object detection. The performance offered has led DNN algorithms to become a part of critical embedded systems, requiring both efficiency and reliability. In particular, DNNs are subject to malicious examples designed in a way to fool the network while being undetectable to the human observer: the adversarial examples. While previous studies propose frameworks to implement such attacks in black-box settings, those often rely on the hypothesis that the attacker has access to the logits of the neural network, breaking the assumption of the traditional black box. In this paper, we investigate a real black box scenario where the attacker has no access to the logits. In particular, we propose an architecture-agnostic attack which solves this constraint by extracting the logits. 
Our method combines hardware and software attacks, by performing a side-channel attack that exploits electromagnetic leakages to extract the logits for a given input, allowing an attacker to estimate the gradients and produce state-of-the-art adversarial examples to fool the targeted neural network. Through this example of adversarial attack, we demonstrate the effectiveness of logits extraction using side-channel as a first step for more general attack frameworks requiring either the logits or the confidence scores.	翻訳日:2023-11-27 23:28:06 公開日:2023-11-23
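The abstract says that leaked logits let the attacker estimate gradients. The following is a minimal sketch of that step. It assumes the logits can be queried for arbitrary inputs. `toy_logits` is a hypothetical stand-in for the side-channel oracle (a fixed two-class linear model). The gradient is estimated by central finite differences, which is an illustrative choice and not necessarily the authors' estimator.

```python
from typing import Callable, List


def toy_logits(x: List[float]) -> List[float]:
    # Hypothetical stand-in for logits recovered via electromagnetic leakage:
    # a fixed 2-class linear model.
    w = [[1.0, -2.0], [-1.0, 0.5]]
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def margin_loss(x: List[float], true_class: int = 0) -> float:
    # Best wrong-class logit minus true-class logit; increasing it pushes toward misclassification.
    z = toy_logits(x)
    other = max(z[j] for j in range(len(z)) if j != true_class)
    return other - z[true_class]


def estimate_gradient(f: Callable[[List[float]], float], x: List[float], h: float = 1e-4) -> List[float]:
    # Central finite differences: only evaluations of f (i.e. logit queries) are needed.
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def sign_step(f: Callable[[List[float]], float], x: List[float], eps: float) -> List[float]:
    # One FGSM-style step along the sign of the estimated gradient.
    g = estimate_gradient(f, x)
    return [xi + eps * (1 if gi > 0 else (-1 if gi < 0 else 0)) for xi, gi in zip(x, g)]
```

A real attack would replace `toy_logits` with logits reconstructed from electromagnetic traces and would iterate `sign_step` under a perturbation budget.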
# Empirical Comparison between Cross-Validation and Mutation-Validation in Model Selection ( http://arxiv.org/abs/2311.14079v1 ) License: see link	Jinyang Yu, Sami Hamdan, Leonard Sasse, Abigail Morrison, Kaustubh R. Patil	Mutation validation (MV) is a recently proposed approach for model selection, garnering significant interest due to its unique characteristics and potential benefits compared to the widely used cross-validation (CV) method. In this study, we empirically compared MV and $k$-fold CV using benchmark and real-world datasets. By employing Bayesian tests, we compared generalization estimates, yielding three posterior probabilities: practical equivalence, CV superiority, and MV superiority. We also evaluated the differences in the capacity of the selected models and in computational efficiency. We found that both MV and CV select models with practically equivalent generalization performance across various machine learning algorithms and the majority of benchmark datasets. MV exhibited advantages in terms of selecting simpler models and lower computational costs. However, in some cases MV selected overly simplistic models, leading to underfitting, and showed instability in hyperparameter selection. These limitations of MV became more evident in the evaluation of a real-world neuroscientific task of predicting sex at birth using brain functional connectivity.	Translated: 2023-11-27 23:20:23 Published: 2023-11-23
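The $k$-fold CV baseline in this comparison fits in a few lines of standard-library Python. This is a generic illustration with hypothetical `fit` and `score` callables. The mutation-validation score is not reproduced here, and the authors' setup (fold count, shuffling, models) may differ.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    # Split indices 0..n-1 into k contiguous folds; return (train, validation) pairs.
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # the first `extra` folds get one more item
        folds.append(list(range(start, start + size)))
        start += size
    splits = []
    for i in range(k):
        train = [j for f_idx, fold in enumerate(folds) if f_idx != i for j in fold]
        splits.append((train, folds[i]))
    return splits


def cv_score(fit: Callable, score: Callable, xs: Sequence, ys: Sequence, k: int = 5) -> float:
    # Average validation score over k folds.
    # fit(xs, ys) -> model; score(model, xs, ys) -> float (higher is better).
    total = 0.0
    for train, val in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        total += score(model, [xs[i] for i in val], [ys[i] for i in val])
    return total / k
```

For model selection, `cv_score` would be computed for each candidate model or hyperparameter setting, and the best-scoring one kept.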
# Machine learning-based decentralized TDMA for VLC IoT networks ( http://arxiv.org/abs/2311.14078v1 ) License: see link	Armin Makvandi, Yousef Seifi Kavian	In this paper, a machine learning-based decentralized time division multiple access (TDMA) algorithm for visible light communication (VLC) Internet of Things (IoT) networks is proposed. The proposed algorithm is based on Q-learning, a reinforcement learning algorithm. This paper considers a decentralized condition in which there is no coordinator node for sending synchronization frames and assigning transmission time slots to other nodes. The proposed algorithm performs synchronization in a decentralized manner, and each node uses the Q-learning algorithm to find the optimal transmission time slot for sending data without collisions. The proposed algorithm is implemented on a VLC hardware system that was designed and implemented in our laboratory. The evaluated parameters are average reward, convergence time, goodput, average delay, and data packet size. The results show that the proposed algorithm converges quickly and provides collision-free decentralized TDMA for the network. The proposed algorithm is compared with the carrier-sense multiple access with collision avoidance (CSMA/CA) algorithm as a potential choice for decentralized VLC IoT networks. The results show that the proposed algorithm provides up to 61% more goodput and up to 49% less average delay than CSMA/CA.	Translated: 2023-11-27 23:20:07 Published: 2023-11-23
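The per-node learning loop described above can be sketched as stateless, bandit-style Q-learning over time slots. This is a toy built on assumptions. The abstract does not give the state space, reward values, learning rate or exploration schedule, so the +1/-1 collision reward, `alpha` and `epsilon` below are hypothetical. No hardware synchronization is modeled.

```python
import random
from typing import List


def learn_slots(n_nodes: int, n_slots: int, frames: int = 2000, alpha: float = 0.1,
                epsilon: float = 0.1, seed: int = 0) -> List[int]:
    # Each node keeps its own Q-values over slots and learns only from its own reward.
    rng = random.Random(seed)
    q = [[0.0] * n_slots for _ in range(n_nodes)]
    for _ in range(frames):
        # Each node picks a slot epsilon-greedily (ties broken at random).
        choices = []
        for node in range(n_nodes):
            if rng.random() < epsilon:
                choices.append(rng.randrange(n_slots))
            else:
                best = max(q[node])
                choices.append(rng.choice([s for s, v in enumerate(q[node]) if v == best]))
        # Hypothetical reward: +1 for a collision-free transmission, -1 for a collision.
        for node, slot in enumerate(choices):
            reward = 1.0 if choices.count(slot) == 1 else -1.0
            q[node][slot] += alpha * (reward - q[node][slot])
    # Final greedy slot per node.
    return [max(range(n_slots), key=lambda s: q[node][s]) for node in range(n_nodes)]


def collisions(slots: List[int]) -> int:
    # Number of nodes that share a slot with an earlier node.
    return len(slots) - len(set(slots))
```

No node ever sees another node's Q-table, which mirrors the coordinator-free setting. The seeded `random.Random` makes a run reproducible.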
# RetroDiff: Retrosynthesis as Multi-stage Distribution Interpolation ( http://arxiv.org/abs/2311.14077v1 ) License: see link	Yiming Wang, Yuxuan Song, Minkai Xu, Rui Wang, Hao Zhou, Weiying Ma	Retrosynthesis poses a fundamental challenge in biopharmaceuticals: it aims to help chemists find appropriate reactant molecules and synthetic pathways for given product molecules. With the reactant and product represented as 2D graphs, retrosynthesis constitutes a conditional graph-to-graph generative task. Inspired by recent advances in discrete diffusion models for graph generation, we introduce Retrosynthesis Diffusion (RetroDiff), a novel diffusion-based method designed to address this problem. However, integrating a diffusion-based graph-to-graph framework while retaining essential chemical reaction template information presents a notable challenge. Our key innovation is a multi-stage diffusion process. In this method, we decompose the retrosynthesis procedure so that it first samples external groups from the dummy distribution given the products and then generates the external bonds connecting the products and the generated groups. Interestingly, this generation process is exactly the reverse of the widely adopted semi-template retrosynthesis procedure, i.e., from reaction center identification to synthon completion, which significantly reduces error accumulation. Experimental results on the benchmark demonstrate the superiority of our method over all other semi-template methods.	Translated: 2023-11-27 23:19:51 Published: 2023-11-23
# Searching for Snippets of Open-Domain Dialogue in Task-Oriented Dialogue Datasets ( http://arxiv.org/abs/2311.14076v1 ) License: see link	Armand Stricker, Patrick Paroubek	Most existing dialogue corpora and models have been designed to fit into two predominant categories: task-oriented dialogues portray functional goals, such as making a restaurant reservation or booking a plane ticket, while chit-chat/open-domain dialogues focus on holding a socially engaging talk with a user. However, humans tend to switch seamlessly between modes and even use chit-chat to enhance task-oriented conversations. To bridge this gap, new datasets have recently been created that blend both communication modes into conversation examples. The approaches used tend to rely on adding chit-chat snippets to pre-existing, human-generated task-oriented datasets. Given the tendencies observed in humans, however, we wonder whether the latter do not *already* hold chit-chat sequences. By using topic modeling and searching for topics that are most similar to a set of keywords related to social talk, we explore the training sets of Schema-Guided Dialogues and MultiWOZ. Our study shows that sequences related to social talk are indeed naturally present, motivating further research on the ways chit-chat is combined into task-oriented dialogues.	Translated: 2023-11-27 23:19:30 Published: 2023-11-23
# Learning Saliency From Fixations ( http://arxiv.org/abs/2311.14073v1 ) License: see link	Yasser Abdelaziz Dahou Djilali, Kevin McGuiness, Noel O'Connor	We present a novel approach for saliency prediction in images, leveraging parallel decoding in transformers to learn saliency solely from fixation maps. Models typically rely on continuous saliency maps to overcome the difficulty of optimizing for the discrete fixation map. We attempt to replicate the experimental setup that generates saliency datasets. Our approach treats saliency prediction as a direct set prediction problem, via a global loss that enforces unique fixation predictions through bipartite matching and a transformer encoder-decoder architecture. By utilizing a fixed set of learned fixation queries, the cross-attention reasons over the image features to directly output the fixation points, distinguishing it from other modern saliency predictors. Our approach, named Saliency TRansformer (SalTR), achieves metric scores on par with state-of-the-art approaches on the Salicon and MIT300 benchmarks.	Translated: 2023-11-27 23:19:08 Published: 2023-11-23
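The set-prediction loss relies on a one-to-one bipartite matching between predicted and ground-truth fixations. The following is a minimal sketch that assumes Euclidean distance as the matching cost; SalTR's actual cost terms are not given in the abstract. It brute-forces permutations, which is only viable for a handful of points. Practical implementations typically use the Hungarian algorithm, for example SciPy's `linear_sum_assignment`.

```python
from itertools import permutations
from math import dist
from typing import List, Tuple

Point = Tuple[float, float]


def match_fixations(pred: List[Point], gt: List[Point]) -> Tuple[List[Tuple[int, int]], float]:
    # Minimum-cost one-to-one matching; returns (pred_index, gt_index) pairs and the total cost.
    # Exhaustive search over assignments: exponential, for illustration only.
    if len(pred) < len(gt):
        raise ValueError("need at least as many predictions as ground-truth fixations")
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(pred)), len(gt)):
        # perm[g] is the prediction assigned to ground-truth fixation g.
        cost = sum(dist(pred[p], gt[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return [(p, g) for g, p in enumerate(best_perm)], best_cost
```

Unmatched predictions (here `pred[2]`, for example) would be treated as "no fixation" in a DETR-style set loss.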
# Enhancing Task-Oriented Dialogues with Chitchat: a Comparative Study Based on Lexical Diversity and Divergence ( http://arxiv.org/abs/2311.14067v1 ) License: see link	Armand Stricker, Patrick Paroubek	As a recent development, task-oriented dialogues (TODs) have been enriched with chitchat in an effort to make dialogues more diverse and engaging. This enhancement is particularly valuable because TODs are often confined to narrow domains, which makes mitigating repetitive and predictable responses a significant challenge. This paper presents a comparative analysis of three chitchat enhancements, aiming to identify the most effective approach in terms of diversity. Additionally, we quantify the divergence between the added chitchat, the original task-oriented language, and chitchat typically found in chitchat datasets, highlighting the top 20 divergent keywords for each comparison. Our findings drive a discussion on future enhancements for augmenting TODs, emphasizing the importance of grounding dialogues beyond the task to achieve more diverse and natural exchanges.	Translated: 2023-11-27 23:18:53 Published: 2023-11-23
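The abstract does not name its diversity metrics. Distinct-n (unique n-grams divided by total n-grams) is one common lexical-diversity measure, used here purely as an illustrative assumption with naive lowercased whitespace tokenization.

```python
from typing import List


def distinct_n(utterances: List[str], n: int) -> float:
    # Ratio of unique n-grams to total n-grams across whitespace-tokenized utterances.
    ngrams = []
    for utt in utterances:
        tokens = utt.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0
```

Higher values indicate less repetitive wording. Comparing responses before and after adding chitchat would show whether an enhancement increases lexical diversity.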
# HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding ( http://arxiv.org/abs/2311.14064v1 ) License: see link	Peng Xia, Xingtong Yu, Ming Hu, Lie Ju, Zhiyong Wang, Peibo Duan, Zongyuan Ge	Object categories are typically organized into a multi-granularity taxonomic hierarchy. When classifying categories at different hierarchy levels, traditional uni-modal approaches focus primarily on image features, revealing limitations in complex scenarios. Recent studies integrating Vision-Language Models (VLMs) with class hierarchies have shown promise, yet they fall short of fully exploiting the hierarchical relationships. These efforts are constrained by their inability to perform effectively across varied granularities of categories. To tackle this issue, we propose a novel framework (HGCLIP) that effectively combines CLIP with a deeper exploitation of the Hierarchical class structure via Graph representation learning. We explore building the class hierarchy into a graph, with its nodes representing the textual or image features of each category. After passing through a graph encoder, the textual features incorporate hierarchical structure information, while the image features emphasize class-aware features derived from prototypes through the attention mechanism. Our approach demonstrates significant improvements on both generic and fine-grained visual recognition benchmarks. Our code is fully available at https://github.com/richard-peng-xia/HGCLIP.	Translated: 2023-11-27 23:18:39 Published: 2023-11-23
# Do VSR Models Generalize Beyond LRS3? ( http://arxiv.org/abs/2311.14063v1 ) License: see link	Yasser Abdelaziz Dahou Djilali, Sanath Narayan, Eustache Le Bihan, Haithem Boussaid, Ebtessam Almazrouei, Merouane Debbah	The Lip Reading Sentences-3 (LRS3) benchmark has been the primary focus of intense research in visual speech recognition (VSR) during the last few years. As a result, there is an increased risk of overfitting to its excessively used test set, which is only one hour long. To alleviate this issue, we build a new VSR test set named WildVSR by closely following the LRS3 dataset creation processes. We then evaluate and analyse the extent to which current VSR models generalize to the new test data. We evaluate a broad range of publicly available VSR models and find significant drops in performance on our test set compared to their corresponding LRS3 results. Our results suggest that the increase in word error rates is caused by the models' inability to generalize to in-the-wild lip sequences that are slightly harder than those found in the LRS3 test set. Our new test benchmark is made public in order to enable future research towards more robust VSR models.	Translated: 2023-11-27 23:18:22 Published: 2023-11-23
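The performance drops above are reported as word error rates (WER). WER is the word-level Levenshtein distance (substitutions, deletions and insertions) divided by the number of reference words. The following is a minimal sketch that assumes whitespace tokenization and no text normalization; evaluation toolkits usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    # (substitutions + deletions + insertions) / number of reference words.
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,                # deletion of a reference word
                         cur[j - 1] + 1,             # insertion of a hypothesis word
                         prev[j - 1] + (r != h))     # substitution, or match at no cost
        prev = cur
    return prev[-1] / len(ref)
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.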
# Hardware Resilience Properties of Text-Guided Image Classifiers ( http://arxiv.org/abs/2311.14062v1 ) License: see link	Syed Talal Wasim, Kabila Haile Saboka, Abdulrahman Mahmoud, Salman Khan, David Brooks, Gu-Yeon Wei	This paper presents a novel method to enhance the reliability of image classification models during deployment in the face of transient hardware errors. By utilizing enriched text embeddings derived from GPT-3 with question prompts per class and a CLIP pretrained text encoder, we investigate their impact as an initialization for the classification layer. Our approach achieves a remarkable $5.5\times$ average increase in hardware reliability (and up to $14\times$) across various architectures in the most critical layer, with a minimal accuracy drop (0.3% on average) compared to baseline PyTorch models. Furthermore, our method integrates seamlessly with any image classification backbone, shows results across various network architectures, decreases parameter and FLOPs overhead, and follows a consistent training recipe. This research offers a practical and efficient solution to bolster the robustness of image classification models against hardware failures, with potential implications for future studies in this domain. Our code and models are released at https://github.com/TalalWasim/TextGuidedResilience.	Translated: 2023-11-27 23:18:04 Published: 2023-11-23
# Towards Explainable Strategy Templates using NLP Transformers ( http://arxiv.org/abs/2311.14061v1 ) License: see link	Pallavi Bagga, Kostas Stathis	This paper bridges the gap between mathematical heuristic strategies learned from Deep Reinforcement Learning (DRL) in automated agent negotiation and comprehensible natural language explanations. Our aim is to make these strategies more accessible to non-experts. By leveraging traditional Natural Language Processing (NLP) techniques and Large Language Models (LLMs) equipped with Transformers, we outline how the parts of DRL strategies that make up strategy templates can be transformed into user-friendly, human-like English narratives. To achieve this, we present a top-level algorithm that involves parsing the mathematical expressions of strategy templates, semantically interpreting variables and structures, generating rule-based primary explanations, and utilizing a Generative Pre-trained Transformer (GPT) model to refine and contextualize these explanations. Subsequent customization for varied audiences and meticulous validation processes, shown in an example, illustrate the applicability and potential of this approach.	Translated: 2023-11-27 23:17:44 Published: 2023-11-23
# Identification for Tree-shaped Structural Causal Models in Polynomial Time ( http://arxiv.org/abs/2311.14058v1 ) License: see link	Aaryan Gupta and Markus Bläser	Linear structural causal models (SCMs) are used to express and analyse the relationships between random variables. Direct causal effects are represented as directed edges and confounding factors as bidirected edges. Identifying the causal parameters from correlations between the nodes is an open problem in artificial intelligence. In this paper, we study SCMs whose directed component forms a tree. Van der Zander et al. (AISTATS'22, PMLR 151, pp. 6770-6792, 2022) give a PSPACE algorithm for the identification problem in this case. This is a significant improvement over the general Gröbner basis approach, whose time complexity is doubly exponential in the number of structural parameters. In this work, we present a randomized polynomial-time algorithm that solves the identification problem for tree-shaped SCMs. For every structural parameter, our algorithm decides whether it is generically identifiable, generically 2-identifiable, or generically unidentifiable. (No other cases can occur.) In the first two cases, it provides one or two fractional affine square root terms of polynomials (FASTPs), respectively, for the corresponding parameter.	Translated: 2023-11-27 23:17:26 Published: 2023-11-23
# Assessing the Impact of Noise on Quantum Neural Networks: An Experimental Analysis ( http://arxiv.org/abs/2311.14057v1 ) License: see link	Erik B. Terres Escudero, Danel Arias Alamo, Oier Mentxaka Gómez, Pablo García Bringas	In the race towards quantum computing, the potential benefits of quantum neural networks (QNNs) have become increasingly apparent. However, Noisy Intermediate-Scale Quantum (NISQ) processors are prone to errors, which poses a significant challenge for the execution of complex algorithms or quantum machine learning. To ensure the quality and security of QNNs, it is crucial to explore the impact of noise on their performance. This paper provides a comprehensive analysis of the impact of noise on QNNs, examining the Mottonen state preparation algorithm under various noise models and studying the degradation of quantum states as they pass through multiple layers of QNNs. Additionally, the paper evaluates the effect of noise on the performance of pre-trained QNNs and highlights the challenges posed by noise models in quantum computing. The findings of this study have significant implications for the development of quantum software, emphasizing the importance of prioritizing stability and noise-correction measures when developing QNNs to ensure reliable and trustworthy results. This paper contributes to the growing body of literature on quantum computing and quantum machine learning, providing new insights into the impact of noise on QNNs and paving the way towards the development of more robust and efficient quantum algorithms.	Translated: 2023-11-27 23:17:04 Published: 2023-11-23
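The following toy single-qubit model illustrates how noise accumulates as a state passes through successive layers. It assumes that each layer applies a depolarizing channel $\rho \mapsto (1-p)\rho + p\,I/2$. The paper studies several noise models and the Mottonen state preparation circuit, and neither is reproduced here. Starting from $|0\rangle$, the fidelity after $L$ layers is $(1-p)^L + \tfrac{1}{2}\bigl(1-(1-p)^L\bigr)$, and the simulation reproduces this value numerically.

```python
from typing import List

Matrix = List[List[complex]]


def depolarize(rho: Matrix, p: float) -> Matrix:
    # Single-qubit depolarizing channel: rho -> (1 - p) * rho + p * I / 2.
    return [[(1 - p) * rho[i][j] + (p / 2 if i == j else 0) for j in range(2)] for i in range(2)]


def fidelity_with_zero(rho: Matrix) -> float:
    # <0|rho|0> for a single-qubit density matrix.
    return rho[0][0].real


def fidelity_after_layers(p: float, layers: int) -> float:
    # Start in |0><0| and apply one depolarizing channel per layer (assumed noise model).
    rho: Matrix = [[1 + 0j, 0j], [0j, 0j]]
    for _ in range(layers):
        rho = depolarize(rho, p)
    return fidelity_with_zero(rho)
```

As the number of layers grows, the fidelity decays toward 0.5, the value for the maximally mixed state.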
# Coevolution of Neural Architectures and Features for Stock Market Forecasting: A Multi-objective Decision Perspective ( http://arxiv.org/abs/2311.14053v1 ) License: see link	Faizal Hafiz and Jan Broekaert and Davide La Torre and Akshya Swain	In a multi-objective setting, a portfolio manager's highly consequential decisions can benefit from assessing alternative forecasting models of stock index movement. The present investigation proposes a new approach to identify a set of nondominated neural network models for further selection by the decision maker. A new coevolution approach is proposed to simultaneously select the features and topology of neural networks (collectively referred to as the neural architecture), where the features are viewed from a topological perspective as input neurons. Further, the coevolution is posed as a multicriteria problem to evolve sparse and efficacious neural architectures. The well-known dominance- and decomposition-based multi-objective evolutionary algorithms are augmented with a nongeometric crossover operator to diversify and balance the search for neural architectures across conflicting criteria. Moreover, the coevolution is augmented to accommodate the data-based implications of distinct market behaviors prior to and during the ongoing COVID-19 pandemic. A detailed comparative evaluation is carried out against the conventional sequential approach of feature selection followed by neural topology design, as well as a scalarized coevolution approach. The results on the NASDAQ index in pre- and peri-COVID time windows convincingly demonstrate that the proposed coevolution approach can evolve a set of nondominated neural forecasting models with better generalization capabilities.	Translated: 2023-11-27 23:16:44 Published: 2023-11-23
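The "nondominated" set handed to the decision maker is the Pareto front of the candidate models. The following is a minimal sketch of Pareto filtering. It assumes that every objective is minimized, for instance validation error and number of selected input features. The objectives and evolutionary operators actually used in the paper are not reproduced.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    # a dominates b: no worse on every objective and strictly better on at least one (minimization).
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated(points: List[Sequence[float]]) -> List[int]:
    # Indices of the points not dominated by any other point (O(n^2) pairwise check).
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]
```

For example, with candidates scored as (validation error, number of inputs), a model with error 0.11 and 12 inputs is discarded when another model reaches error 0.10 with the same 12 inputs.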
# Assessment of Deep Learning Segmentation for Real-Time Free-Breathing Cardiac Magnetic Resonance Imaging ( http://arxiv.org/abs/2311.14049v1 ) License: see link	Martin Schilling and Christina Unterberg-Buchwald and Joachim Lotz and Martin Uecker	In recent years, a variety of deep learning networks for cardiac MRI (CMR) segmentation have been developed and analyzed. However, nearly all of them are focused on cine CMR under breath-hold. In this work, the accuracy of deep learning methods is assessed for volumetric analysis (via segmentation) of the left ventricle in real-time free-breathing CMR at rest and under exercise stress. Data from healthy volunteers (n=15) for cine and real-time free-breathing CMR were analyzed retrospectively. Segmentations from a commercial software (comDL) and a freely available neural network (nnU-Net) were compared to a reference created via manual correction of the comDL segmentation. Segmentation of the left ventricular endocardium (LV), left ventricular myocardium (MYO), and right ventricle (RV) is evaluated for both end-systolic and end-diastolic phases and analyzed with Dice's coefficient (DC). The volumetric analysis includes LV end-diastolic volume (EDV), LV end-systolic volume (ESV), and LV ejection fraction (EF). For cine CMR, nnU-Net and comDL achieve a DC above 0.95 for LV and above 0.9 for MYO and RV. For real-time CMR, the accuracy of nnU-Net exceeds that of comDL overall. For real-time CMR at rest, nnU-Net achieves a DC of 0.94 for LV, 0.89 for MYO, and 0.90 for RV; mean absolute differences between nnU-Net and reference are 2.9 mL for EDV, 3.5 mL for ESV, and 2.6% for EF. For real-time CMR under exercise stress, nnU-Net achieves a DC of 0.92 for LV, 0.85 for MYO, and 0.83 for RV; mean absolute differences between nnU-Net and reference are 11.4 mL for EDV, 2.9 mL for ESV, and 3.6% for EF. Deep learning methods designed or trained for cine CMR segmentation can perform well on real-time CMR. For real-time free-breathing CMR at rest, the performance of deep learning methods is comparable to inter-observer variability in cine CMR and is usable for fully automatic segmentation.	Translated: 2023-11-27 23:16:07 Published: 2023-11-23
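Both quantities reported above can be computed directly. The following is a minimal sketch that assumes binary masks flattened to equal-length sequences and volumes in mL. It computes Dice's coefficient $2|A \cap B| / (|A| + |B|)$ and the ejection fraction $\mathrm{EF} = (\mathrm{EDV} - \mathrm{ESV}) / \mathrm{EDV} \times 100\%$.

```python
from typing import Sequence


def dice(a: Sequence[int], b: Sequence[int]) -> float:
    # Dice's coefficient 2|A ∩ B| / (|A| + |B|) for two flattened binary masks.
    if len(a) != len(b):
        raise ValueError("masks must have the same shape")
    inter = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(1 for x in a if x) + sum(1 for y in b if y)
    return 1.0 if total == 0 else 2.0 * inter / total  # two empty masks count as perfect agreement


def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    # LV ejection fraction in percent.
    return (edv_ml - esv_ml) / edv_ml * 100.0
```

In practice the masks are 3D voxel arrays per cardiac phase. The EDV and ESV values would come from voxel counts multiplied by the voxel volume.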
# Weight fluctuations in (deep) linear neural networks and a derivation of the inverse-variance flatness relation ( http://arxiv.org/abs/2311.14120v1 ) License: see link	Markus Gross, Arne P. Raulf, Christoph Räth	We investigate the stationary (late-time) training regime of single- and two-layer linear neural networks within the continuum limit of stochastic gradient descent (SGD) for synthetic Gaussian data. In the case of a single-layer network in the weakly oversampled regime, the spectrum of the noise covariance matrix deviates notably from the Hessian, which can be attributed to the broken detailed balance of SGD dynamics. In this case, the weight fluctuations are generally anisotropic but experience an isotropic loss. For a two-layer network, we obtain the stochastic dynamics of the weights in each layer and analyze the associated stationary covariances. We identify the inter-layer coupling as a new source of anisotropy for the weight fluctuations. In contrast to the single-layer case, the weight fluctuations experience an anisotropic loss, the flatness of which is inversely related to the fluctuation variance. We thereby provide an analytical derivation of the recently observed inverse variance-flatness relation in a deep linear network model.	Translated: 2023-11-27 23:08:59 Published: 2023-11-23
# Magnetic penetration depth of Aluminum thin films ( http://arxiv.org/abs/2311.14119v1 ) License: see link	David López-Núñez, Queralt Portell Montserrat, Gemma Rius, Elia Bertoldo, Alba Torras-Coloma, M. Martínez, P. Forn-Díaz	We present a study of the superconducting penetration depth $\lambda$ in aluminum thin films of varying thickness. The range of thicknesses chosen spans from the thin-film regime to the regime approaching bulk behavior. The penetration depths observed range from $\lambda = 163.3\pm0.4~\rm{nm}$ for the thinnest $20~\rm{nm}$ samples down to $\lambda = 53.6\pm0.4~\rm{nm}$ for the $200~\rm{nm}$-thick ones. In order to accurately determine $\lambda$, we performed complementary measurements using the frequency of superconducting $LC$ resonators as well as the resistance of normal-state meanders. Both methods yield comparable results, providing a well-characterized set of values of $\lambda$ in aluminum in the relevant range for applications in fields such as quantum computing and microwave radiation detector technologies.	Translated: 2023-11-27 23:08:43 Published: 2023-11-23
# A density estimation perspective on learning from pairwise human preferences ( http://arxiv.org/abs/2311.14115v1 ) License: see link	Vincent Dumoulin, Daniel D. Johnson, Pablo Samuel Castro, Hugo Larochelle, Yann Dauphin	Learning from human feedback (LHF), and in particular learning from pairwise preferences, has recently become a crucial ingredient in training large language models (LLMs) and has been the subject of much research. Most recent works frame it as a reinforcement learning problem, where a reward function is learned from pairwise preference data and the LLM is treated as a policy that is adapted to maximize the rewards, often under additional regularization constraints. We propose an alternative interpretation that centers on the generative process for pairwise preferences and treats LHF as a density estimation problem. We provide theoretical and empirical results showing that, for a family of generative processes defined via preference behavior distribution equations, training a reward function on pairwise preferences effectively models an annotator's implicit preference distribution. Finally, we discuss and present findings on "annotator misspecification": failure cases where wrong modeling assumptions are made about annotator behavior, resulting in poorly adapted models. These findings suggest that approaches that learn from pairwise human preferences could have trouble learning from a population of annotators with diverse viewpoints.	Translated: 2023-11-27 23:08:30 Published: 2023-11-23
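Reward-function training on pairwise preferences is commonly based on the Bradley-Terry model, $P(a \succ b) = \sigma(r_a - r_b)$. The following is a minimal sketch that assumes one scalar reward per item (rather than a neural reward model) and uses plain gradient ascent on the pairwise log-likelihood. The abstract does not state that the authors use exactly this parameterization.

```python
import math
from typing import List, Tuple


def preference_prob(r_chosen: float, r_rejected: float) -> float:
    # Bradley-Terry probability that the first item is preferred: sigmoid(r_chosen - r_rejected).
    return 1.0 / (1.0 + math.exp(-(r_chosen - r_rejected)))


def fit_rewards(n_items: int, prefs: List[Tuple[int, int]], lr: float = 0.1,
                steps: int = 500) -> List[float]:
    # Gradient ascent on sum(log sigmoid(r_w - r_l)); prefs holds (winner, loser) index pairs.
    r = [0.0] * n_items
    for _ in range(steps):
        grad = [0.0] * n_items
        for w, l in prefs:
            g = 1.0 - preference_prob(r[w], r[l])  # d/dr_w of log sigmoid(r_w - r_l)
            grad[w] += g
            grad[l] -= g
        r = [ri + lr * gi for ri, gi in zip(r, grad)]
    return r
```

Only reward differences matter under this likelihood, so the fitted rewards are identifiable only up to an additive constant. Starting from zeros keeps their sum at zero.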
# SySMOL: 超低ビット・細粒度混合精度ニューラルネットワークのためのハードウェア・ソフトウェア協調設計フレームワーク SySMOL: A Hardware-software Co-design Framework for Ultra-Low and Fine-Grained Mixed-Precision Neural Networks ( http://arxiv.org/abs/2311.14114v1 ) ライセンス: Link先を確認	Cyrus Zhou, Vaughn Richard, Pedro Savarese, Zachary Hassman, Michael Maire, Michael DiBrino, Yanjing Li	(参考訳) 量子化と混合精度の技術における近年の進歩は、ニューラルネットワークの実行時間とエネルギー効率を改善するうえで大きな可能性を示している。本研究ではさらに、個々のパラメータや活性化が1ビットから4ビットの範囲で異なる精度をとり得るニューラルネットワークが、全精度のネットワークと同等以上の精度を達成できることを示した。しかしながら、こうしたネットワークの展開には、データの各要素に対する極めて細粒度な混合精度に伴う計算・通信・ストレージ要件を管理・制御する必要があることから、多くの課題が生じる。これらの特有で困難な要件に合わせた、効率的なハードウェアやシステムレベルの既存のサポートは存在しない。本研究は、これらのネットワークのための初の包括的なハードウェア・ソフトウェア協調設計手法を提案する。この手法は、ハードウェア設計、訓練、推論の間に継続的なフィードバックループを実現し、体系的な設計探索を容易にする。概念実証として、これらのネットワークに適した新しい構成可能なCPU SIMDアーキテクチャを設計し、アーキテクチャを新しいシステム考慮型の訓練・推論技術と密に統合することで、この協調設計手法を示す。このフレームワークを用いて体系的な設計空間探索を行い、さまざまなトレードオフを分析する。最適化されたトレードオフを達成する混合精度ネットワークの設計は、システム考慮型の訓練・推論最適化と組み合わせた場合、4つの構成可能な精度パターンを持ち1、2、4ビットの固定小数点演算をサポートするアーキテクチャに対応する。この設計向けに訓練されたネットワークは全精度ネットワークに近い精度を達成しつつ、全精度ネットワークと比較してニューラルネットワークを10〜20倍と大幅に圧縮し、実行効率を向上させる。 Recent advancements in quantization and mixed-precision techniques offer significant promise for improving the run-time and energy efficiency of neural networks. In this work, we further showed that neural networks, wherein individual parameters or activations can take on different precisions ranging between 1 and 4 bits, can achieve accuracies comparable to or exceeding the full-precision counterparts. However, the deployment of such networks poses numerous challenges, stemming from the necessity to manage and control the compute/communication/storage requirements associated with these extremely fine-grained mixed precisions for each piece of data. There is a lack of existing efficient hardware and system-level support tailored to these unique and challenging requirements. Our research introduces the first novel holistic hardware-software co-design approach for these networks, which enables a continuous feedback loop between hardware design, training, and inference to facilitate systematic design exploration. As a proof-of-concept, we illustrate this co-design approach by designing new, configurable CPU SIMD architectures tailored for these networks, tightly integrating the architecture with new system-aware training and inference techniques. We perform systematic design space exploration using this framework to analyze various tradeoffs. The design for mixed-precision networks that achieves optimized tradeoffs corresponds to an architecture that supports 1, 2, and 4-bit fixed-point operations with four configurable precision patterns, when coupled with system-aware training and inference optimization -- networks trained for this design achieve accuracies that closely match full-precision accuracies, while compressing and improving run-time efficiency of the neural networks drastically by 10-20x, compared to full-precision networks.	翻訳日:2023-11-27 23:08:06 公開日:2023-11-23
# 錐空間上の強文脈的な単体分布のホモトピー的特徴付け Homotopical characterization of strongly contextual simplicial distributions on cone spaces ( http://arxiv.org/abs/2311.14111v1 ) ライセンス: Link先を確認	Aziz Kharoof, Cihan Okay	(参考訳) 本稿では、二値の結果を持つ強文脈的な単体分布、特に1次元空間の錐上で定義されるものについて、新しいホモトピー的特徴付けを与える。層理論的枠組みでは、これらの分布は、各文脈が二値の結果を持つ2つの測定を含む測定シナリオ上の非シグナリング分布に対応する。結果を確立するために、測定空間の縮約(collapse)を含むホモトピー的アプローチを採用し、強文脈性を検出できる単体分布に付随する圏を導入する。 This paper offers a novel homotopical characterization of strongly contextual simplicial distributions with binary outcomes, specifically those defined on the cone of a 1-dimensional space. In the sheaf-theoretic framework, such distributions correspond to non-signaling distributions on measurement scenarios where each context contains 2 measurements with binary outcomes. To establish our results, we employ a homotopical approach that includes collapsing measurement spaces and introduce categories associated with simplicial distributions that can detect strong contextuality.	翻訳日:2023-11-27 23:07:40 公開日:2023-11-23
# オフポリシー評価はいつ有用か? データ中心の視点 When is Off-Policy Evaluation Useful? A Data-Centric Perspective ( http://arxiv.org/abs/2311.14110v1 ) ライセンス: Link先を確認	Hao Sun, Alex J. Chan, Nabeel Seedat, Alihan Hüyük, Mihaela van der Schaar	(参考訳) 記録済みのデータセットのみを用いて仮想的な目標方策の価値を評価することは重要だが難しい。一方では、臨床ガイドラインのような高リスクのシナリオにおいて安全な方策改善の機会をもたらす。他方で、そのような機会は正確なオフポリシー評価(OPE)の必要性を高める。OPEに関する従来の研究は価値推定におけるアルゴリズムの改善に注力してきたが、本研究ではオフラインデータセットの重要性を強調し、OPE問題を評価するためのデータ中心のフレームワークを提案する。我々は、OPEを評価するためのデータ中心のフレームワークであるDataCOPEを提案し、データセットが与えられたときに目標方策を評価できるかどうか、またどの程度評価できるかという問いに答える。DataCOPEは、(1) 環境にアクセスせずにOPEアルゴリズムの全体的な性能を予測する。これは、OPEの評価が不可能な実世界での展開前に特に有用である。(2) データセット内でOPEが不正確になり得るサブグループを特定する。(3) OPE問題のためのデータセットやデータ収集戦略の評価を可能にする。医療データセットを用いたログ付き文脈バンディット設定におけるDataCOPEの実証分析により、機械学習による方策と、臨床ガイドラインのような人間の専門家による方策の両方を評価できることが確認された。 Evaluating the value of a hypothetical target policy with only a logged dataset is important but challenging. On the one hand, it brings opportunities for safe policy improvement under high-stakes scenarios like clinical guidelines. On the other hand, such opportunities raise a need for precise off-policy evaluation (OPE). While previous work on OPE focused on improving the algorithm in value estimation, in this work, we emphasize the importance of the offline dataset, hence putting forward a data-centric framework for evaluating OPE problems. We propose DataCOPE, a data-centric framework for evaluating OPE, that answers the questions of whether and to what extent we can evaluate a target policy given a dataset. DataCOPE (1) forecasts the overall performance of OPE algorithms without access to the environment, which is especially useful before real-world deployment where evaluating OPE is impossible; (2) identifies the sub-group in the dataset where OPE can be inaccurate; (3) permits evaluations of datasets or data-collection strategies for OPE problems. Our empirical analysis of DataCOPE in the logged contextual bandit settings using healthcare datasets confirms its ability to evaluate both machine-learning and human expert policies like clinical guidelines.	翻訳日:2023-11-27 23:07:31 公開日:2023-11-23
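DataCOPE assesses when OPE can be trusted; it does not prescribe a single estimator. As background, the sketch below shows one textbook OPE estimator for logged contextual bandits: inverse propensity scoring (IPS). IPS is a swapped-in standard technique, not DataCOPE itself. `target_policy` is a hypothetical callable that returns the target policy's probability of taking an action in a given context.

```python
from typing import Callable, Hashable, Sequence


def ips_estimate(contexts: Sequence[Hashable],
                 actions: Sequence[int],
                 rewards: Sequence[float],
                 logging_probs: Sequence[float],
                 target_policy: Callable[[Hashable, int], float]) -> float:
    """Inverse propensity scoring estimate of the target policy's value.

    Each logged reward is reweighted by pi_target(a | x) / pi_logging(a | x).
    Assumes logging probabilities are known and strictly positive.
    """
    total = 0.0
    for x, a, r, p in zip(contexts, actions, rewards, logging_probs):
        total += target_policy(x, a) / p * r
    return total / len(rewards)


def always_action_one(x: Hashable, a: int) -> float:
    """Hypothetical deterministic target policy: always pick action 1."""
    return 1.0 if a == 1 else 0.0


# Uniform logging policy over two actions; action 1 always pays 1, action 0 pays 0.
value = ips_estimate(["u1", "u2", "u3", "u4"], [1, 0, 1, 0],
                     [1.0, 0.0, 1.0, 0.0], [0.5] * 4, always_action_one)
print(value)  # 1.0
```

IPS is unbiased when the logging propensities are correct and positive wherever the target policy acts. Its variance, however, grows as the target and logging policies diverge. That is one concrete reason why dataset coverage, DataCOPE's focus, determines whether OPE is reliable.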
# 自己一貫性訓練により小規模マルチモーダル推論モデルの能力を大規模モデル並みに高める Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training ( http://arxiv.org/abs/2311.14109v1 ) ライセンス: Link先を確認	Cheng Tan, Jingxuan Wei, Zhangyang Gao, Linzhuang Sun, Siyuan Li, Xihong Yang, Stan Z. Li	(参考訳) マルチモーダル推論は、質問に答えるために複数のモダリティにまたがる推論をモデルに要求する難しいタスクである。既存のアプローチは、言語と視覚のモダリティを2段階の推論フレームワークに組み込み、根拠(rationale)の生成を回答の推論から分離することで進歩を遂げてきた。しかし、これらのアプローチは、生成される根拠の品質が不十分なために、しばしば期待に届かない。本研究では、モデルの推論における根拠の重要性を掘り下げる。根拠が完全に正確な場合にはモデルの精度が大幅に向上することを観察し、高品質な根拠生成の必要性を明らかにする。これを動機として、複数の根拠と回答を生成し、投票プロセスによって最も正確なものを選択する自己一貫性訓練戦略MC-CoTを提案する。このアプローチは、生成される根拠の品質を高めるだけでなく、より正確で頑健な回答をもたらす。広範な実験を通じて、本手法がさまざまなベンチマークにおいてモデル性能を大幅に向上させることを示す。注目すべきことに、提案手法を適用すれば、より小規模なベースモデルであっても大規模モデルに匹敵する結果を達成できることを示し、マルチモーダル推論の改善に根拠の力を活用する本手法の可能性を示す。コードはhttps://github.com/chengtan9907/mc-cotで入手できる。 Multimodal reasoning is a challenging task that requires models to reason across multiple modalities to answer questions. Existing approaches have made progress by incorporating language and visual modalities into a two-stage reasoning framework, separating rationale generation from answer inference. However, these approaches often fall short due to the inadequate quality of the generated rationales. In this work, we delve into the importance of rationales in model reasoning. We observe that when rationales are completely accurate, the model's accuracy significantly improves, highlighting the need for high-quality rationale generation. Motivated by this, we propose MC-CoT, a self-consistency training strategy that generates multiple rationales and answers, subsequently selecting the most accurate through a voting process. This approach not only enhances the quality of generated rationales but also leads to more accurate and robust answers. Through extensive experiments, we demonstrate that our approach significantly improves model performance across various benchmarks. Remarkably, we show that even smaller base models, when equipped with our proposed approach, can achieve results comparable to those of larger models, illustrating the potential of our approach in harnessing the power of rationales for improved multimodal reasoning. The code is available at https://github.com/chengtan9907/mc-cot.	翻訳日:2023-11-27 23:07:10 公開日:2023-11-23
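The voting step in MC-CoT can be illustrated with a simple majority vote over sampled (rationale, answer) pairs. This sketch rests on several assumptions: answers are compared by exact string match, ties go to the answer seen first, and the helper name `select_by_vote` is hypothetical rather than taken from the linked repository.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def select_by_vote(samples: Sequence[Tuple[str, str]]) -> Tuple[str, List[str]]:
    """Pick the most frequent answer among (rationale, answer) samples.

    Returns the winning answer and the rationales that support it.
    Ties are broken by first occurrence, because Counter.most_common
    orders equal counts by insertion order.
    """
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(answer for _, answer in samples)
    winner, _ = counts.most_common(1)[0]
    supporting = [rationale for rationale, answer in samples if answer == winner]
    return winner, supporting


samples = [
    ("r1: the arrow points left", "B"),
    ("r2: the label reads 'north'", "A"),
    ("r3: the arrow is leftward", "B"),
]
answer, rationales = select_by_vote(samples)
print(answer, len(rationales))  # B 2
```

Returning the supporting rationales alongside the answer mirrors the abstract's point that the vote is meant to surface higher-quality rationales, not only the final label.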
# MINTY: 欠損値を持つ特徴量の補完の必要性を最小化するルールベースモデル MINTY: Rule-based Models that Minimize the Need for Imputing Features with Missing Values ( http://arxiv.org/abs/2311.14108v1 ) ライセンス: Link先を確認	Lena Stempfle and Fredrik D. Johansson	(参考訳) ルールモデルは自然言語で容易に解釈でき、より複雑なモデルと同等の予測性能を提供するため、表形式の入力を持つ予測タスクでしばしば好まれる。しかし、ほとんどのルールモデルの予測は一部の入力が欠損していると未定義または曖昧になり、ユーザーは統計的な補完(imputation)モデルやゼロ補完のようなヒューリスティックに頼らざるを得ず、モデルの解釈可能性が損なわれる。本研究では、欠損値を持つ特徴量への依存を避けるよう学習し、テスト時の補完への依存を抑える、簡潔かつ正確なルールモデルの当てはめを提案する。我々はMINTYを開発した。これは、1つ以上の変数が欠損したときに互いの代替として働く変数間の選言(disjunction)の形でルールを学習する手法である。これにより、欠損値を持つ特徴量への依存が小さくなるよう正則化された疎な線形ルールモデルが得られ、適合度、解釈可能性、テスト時の欠損値に対する頑健性の間のトレードオフが可能になる。合成データセットと実世界データセットを用いた実験でMINTYの価値を実証し、欠損値を持つ特徴量への依存をより小さく抑えつつ、その予測性能がベースラインと同等またはそれ以上であることを見出した。 Rule models are often preferred in prediction tasks with tabular inputs as they can be easily interpreted using natural language and provide predictive performance on par with more complex models. However, most rule models' predictions are undefined or ambiguous when some inputs are missing, forcing users to rely on statistical imputation models or heuristics like zero imputation, undermining the interpretability of the models. In this work, we propose fitting concise yet precise rule models that learn to avoid relying on features with missing values and, therefore, limit their reliance on imputation at test time. We develop MINTY, a method that learns rules in the form of disjunctions between variables that act as replacements for each other when one or more is missing. This results in a sparse linear rule model, regularized to have small dependence on features with missing values, that allows a trade-off between goodness of fit, interpretability, and robustness to missing values at test time. We demonstrate the value of MINTY in experiments using synthetic and real-world data sets and find its predictive performance comparable or favorable to baselines, with smaller reliance on features with missing values.	翻訳日:2023-11-27 23:06:48 公開日:2023-11-23
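To make the missing-value idea concrete, the sketch below evaluates disjunctive rules with three-valued (Kleene) logic. A rule such as "fever OR cough" is defined whenever any observed literal is true, so a missing `cough` does not matter once `fever` is known to be true. The feature names, weights, and the zero fallback for undefined rules are hypothetical illustrations, not MINTY's learning algorithm. MINTY's regularizer aims to make such undefined cases rare.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def eval_disjunction(values: Sequence[Optional[bool]]) -> Optional[bool]:
    """Kleene OR: True if any observed value is True, False if all values are
    observed and False, None (undefined) if the result hinges on a missing value."""
    if any(v is True for v in values):
        return True
    if all(v is False for v in values):
        return False
    return None


def score(rules: List[List[str]], weights: List[float], bias: float,
          x: Dict[str, Optional[bool]]) -> Tuple[float, int]:
    """Linear model over disjunctive rules; returns (score, number of undefined rules).

    Undefined rules contribute 0 here (i.e. zero imputation), purely for illustration.
    Features absent from x are treated as missing.
    """
    total, undefined = bias, 0
    for rule, w in zip(rules, weights):
        value = eval_disjunction([x.get(feature) for feature in rule])
        if value is None:
            undefined += 1
        elif value:
            total += w
    return total, undefined


x = {"fever": True, "cough": None, "rash": None}
print(score([["fever", "cough"], ["rash"]], [1.5, 2.0], -1.0, x))  # (0.5, 1)
```

The count of undefined rules is the quantity a missingness-aware rule model would want to keep small, since each undefined rule is a point where some imputation choice still leaks into the prediction.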
# モード最適化ハイブリッドCPU-GPU密度行列繰り込み群法による2次元量子格子模型 Two dimensional quantum lattice models via mode optimized hybrid CPU-GPU density matrix renormalization group method ( http://arxiv.org/abs/2311.14106v1 ) ライセンス: Link先を確認	Andor Menczer, Kornél Kapás, Miklós Antal Werner, and Örs Legeza	(参考訳) 本稿では、最先端の高性能計算基盤上で、密度行列繰り込み群法の非可換(non-Abelian)ab initio版を用いて、2次元量子格子模型上の量子多体問題をシミュレートするハイブリッド数値手法を提案する。2次元スピンレスフェルミオン模型とトーラス形状上のハバード模型について、最適化された基底で計算を行い、ハイブリッドCPU-マルチGPU並列化を利用することで、合計で計算時間を数桁節約できることを示す。少なくとも1桁の計算量削減はモード最適化によるものであり、さらに1桁の実時間(wall time)の短縮は大規模並列化によって達成される。結果はFLOPと秒で直接測定されている。得られた性能について、行列ランクの関数として、また$12\times 12$格子トポロジーまでの系のサイズの関数として、詳細なスケーリング解析を論じる。 We present a hybrid numerical approach to simulate quantum many body problems on two spatial dimensional quantum lattice models via the non-Abelian ab initio version of the density matrix renormalization group method on state-of-the-art high performance computing infrastructures. We demonstrate for the two dimensional spinless fermion model and for the Hubbard model on torus geometry that altogether several orders of magnitude in computational time can be saved by performing calculations on an optimized basis and by utilizing hybrid CPU-multiGPU parallelization. At least an order of magnitude reduction in computational complexity results from mode optimization, while a further order of reduction in wall time is achieved by massive parallelization. Our results are measured directly in FLOP and seconds. A detailed scaling analysis of the obtained performance as a function of matrix ranks and as a function of system size up to $12\times 12$ lattice topology is discussed.	翻訳日:2023-11-27 23:06:26 公開日:2023-11-23
# カオス系シミュレーションのためのハイブリッド量子古典リザバーコンピューティング Hybrid quantum-classical reservoir computing for simulating chaotic systems ( http://arxiv.org/abs/2311.14105v1 ) ライセンス: Link先を確認	Filip Wudarski, Daniel O'Connor, Shaun Geaney, Ata Akbari Asanjan, Max Wilson, Elena Strbac, P. Aaron Lott, and Davide Venturelli	(参考訳) カオス系の予測は特に複雑なタスクであり、近年、系の時空間情報を抽出するために、固定されたランダムな重みを持つリカレントネットワーク(リザバー)を用いるリザバーコンピューティング(RC)によって、相応の成功を収めている。本研究では、RCのリザバーを量子回路に置き換えたハイブリッド量子リザバーコンピューティング(HQRC)フレームワークを提示する。回路のモジュール構造と測定フィードバックを用いて複雑な系のダイナミクスをリザバー状態に符号化し、そこから古典的な学習を行って将来のダイナミクスを予測する。HQRCのノイズなしシミュレーションは、代表的なカオス系であるLorenz63とダブルスクロールの両方について、最先端の古典RCモデルに匹敵する有効予測時間を示し、予測が真値から外れた後も長くアトラクタのダイナミクスに従う。 Forecasting chaotic systems is a notably complex task, which in recent years has been approached with reasonable success using reservoir computing (RC), a recurrent network with fixed random weights (the reservoir) used to extract the spatio-temporal information of the system. This work presents a hybrid quantum reservoir-computing (HQRC) framework, which replaces the reservoir in RC with a quantum circuit. The modular structure and measurement feedback in the circuit are used to encode the complex system dynamics in the reservoir states, from which classical learning is performed to predict future dynamics. The noiseless simulations of HQRC demonstrate valid prediction times comparable to state-of-the-art classical RC models for both the Lorenz63 and double-scroll chaotic paradigmatic systems and adhere to the attractor dynamics long after the forecasts have deviated from the ground truth.	翻訳日:2023-11-27 23:06:11 公開日:2023-11-23
# 量子鍵配送における単一光子に基づくクロック解析と回復 Single-Photon-Based Clock Analysis and Recovery in Quantum Key Distribution ( http://arxiv.org/abs/2311.14104v1 ) ライセンス: Link先を確認	Mujtaba Zahidy, Domenico Ribezzo, Ronny Müller, Jasper Riebesehl, Alessandro Zavatta, Michael Galili, Leif Katsuo Oxenløwe, Davide Bacco	(参考訳) 量子鍵配送は、市場投入の準備が整った最初の量子技術の一つである。現在の量子通信システムでは、通常、送信者(Alice)と受信者(Bob)の同期にサービスチャネルを利用する。しかし、このサービスチャネルを取り除き、クロック回復手法を利用する可能性は、ファイバーリンクと自由空間リンクの両方における将来の実装にとって興味深い。本稿では、量子通信シナリオでクロックを回復するための基準を調べ、タイムビン量子鍵配送プロトコルにおいて量子信号に基づくクロック回復システムを使用できることを実験的に実証した。量子ビット誤り率と秘密鍵レートの観点で、このクロック回復手法の性能は、クロック共有にサービスチャネルを用いる場合と同等である。 Quantum key distribution is one of the first quantum technologies ready for the market. Current quantum telecommunication systems usually utilize a service channel for synchronizing the transmitter (Alice) and the receiver (Bob). However, the possibility of removing this service channel and exploiting a clock recovery method is intriguing for future implementation, both in fiber and free-space links. In this paper, we investigate criteria to recover the clock in a quantum communication scenario, and experimentally demonstrated the possibility of using a quantum-based clock recovery system in a time-bin quantum key distribution protocol. The performance of the clock recovery technique, in terms of quantum bit error rate and secret key rate, is equivalent to using the service channel for clock sharing.	翻訳日:2023-11-27 23:05:53 公開日:2023-11-23
# サブネットワークアンサンブル Subnetwork Ensembles ( http://arxiv.org/abs/2311.14101v1 ) ライセンス: Link先を確認	Tim Whitaker	(参考訳) ニューラルネットワークのアンサンブルは、独立に訓練された複数のモデルの予測を組み合わせることで汎化性能を向上させるために効果的に用いられてきた。しかし、深層ニューラルネットワークの規模と複雑さが増すにつれて、これらの手法の実装は法外にコストと時間がかかるようになった。低コストのアンサンブル手法は、従来のアンサンブル学習手法がもたらす汎化の利点を保ちつつ、複数のモデルをゼロから訓練する必要性を軽減できるため、ますます重要になっている。本学位論文では、訓練済みの親モデルからサブネットワークをサンプリング、摂動、最適化することで子ネットワークの集合を形成するサブネットワークアンサンブルを構築するための低コストなフレームワークを導入し、定式化する。子ネットワークを生成するためのいくつかの異なる手法を探索し、さまざまなアブレーション研究と確立されたベンチマークを通じてその有効性を評価する。その結果、このアプローチは計算コストを最小限に抑えつつ、訓練効率、パラメータの活用度、汎化性能を大きく向上できることが明らかになった。サブネットワークアンサンブルは、深層ニューラルネットワークの未活用の潜在能力を活かしてより良いシステムを構築する方法を探るための魅力的なフレームワークを提供する。 Neural network ensembles have been effectively used to improve generalization by combining the predictions of multiple independently trained models. However, the growing scale and complexity of deep neural networks have led to these methods becoming prohibitively expensive and time consuming to implement. Low-cost ensemble methods have become increasingly important as they can alleviate the need to train multiple models from scratch while retaining the generalization benefits that traditional ensemble learning methods afford. This dissertation introduces and formalizes a low-cost framework for constructing Subnetwork Ensembles, where a collection of child networks are formed by sampling, perturbing, and optimizing subnetworks from a trained parent model. We explore several distinct methodologies for generating child networks and we evaluate their efficacy through a variety of ablation studies and established benchmarks. Our findings reveal that this approach can greatly improve training efficiency, parametric utilization, and generalization performance while minimizing computational cost. Subnetwork Ensembles offer a compelling framework for exploring how we can build better systems by leveraging the unrealized potential of deep neural networks.	翻訳日:2023-11-27 23:05:41 公開日:2023-11-23
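The sample-and-perturb part of this framework can be sketched on a toy linear softmax classifier. Each child is drawn from the trained parent by randomly masking weights and adding small Gaussian noise, and the ensemble averages the children's class probabilities. This is an assumption-laden illustration in plain Python. The fine-tuning ("optimizing") step described in the abstract is omitted, and `keep_prob` and `noise_std` are hypothetical parameters.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def make_child(parent: Matrix, keep_prob: float, noise_std: float,
               rng: random.Random) -> Matrix:
    """Sample a subnetwork: drop each weight with prob 1 - keep_prob,
    then perturb the surviving weights with Gaussian noise."""
    return [[(w + rng.gauss(0.0, noise_std)) if rng.random() < keep_prob else 0.0
             for w in row] for row in parent]


def predict(weights: Matrix, x: List[float]) -> List[float]:
    """Class probabilities of a linear softmax classifier (one row per class)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return softmax(logits)


def ensemble_predict(children: List[Matrix], x: List[float]) -> List[float]:
    """Average the children's predicted class probabilities."""
    probs = [predict(c, x) for c in children]
    k = len(probs[0])
    return [sum(p[j] for p in probs) / len(probs) for j in range(k)]


parent = [[2.0, 0.0], [0.0, 2.0]]  # toy "trained" parent: 2 classes, 2 features
rng = random.Random(0)
children = [make_child(parent, keep_prob=0.8, noise_std=0.1, rng=rng) for _ in range(5)]
print(ensemble_predict(children, [1.0, 0.0]))
```

Averaging probabilities rather than hard votes keeps the output a valid distribution. The children differ only through the mask and the noise, which is what makes this family of ensembles cheap compared with training independent models from scratch.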
# ACT: 敵対的一貫性モデル ACT: Adversarial Consistency Models ( http://arxiv.org/abs/2311.14097v1 ) ライセンス: Link先を確認	Fei Kong, Jinhao Duan, Lichao Sun, Hao Cheng, Renjing Xu, Hengtao Shen, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu	(参考訳) 拡散モデルは画像生成に優れているが、段階的なデノイジングのため生成速度が遅い。一貫性訓練(consistency training)は単一ステップのサンプリングでこの問題に対処するが、生成品質が低くなりがちで、高い訓練コストを必要とする。本稿では、一貫性訓練の損失を最適化すると、目標分布と生成分布の間のWasserstein距離が最小化されることを示す。タイムステップが増加するにつれて、その上界はそれ以前の一貫性訓練の損失を累積する。そのため、現在の損失と累積された損失の両方を減らすには、より大きなバッチサイズが必要となる。我々は、識別器を用いて各タイムステップにおける分布間のJensen-Shannon(JS)ダイバージェンスを直接最小化するAdversarial Consistency Training(ACT)を提案する。理論的には、ACTは生成品質と収束性を向上させる。識別器を一貫性訓練のフレームワークに組み込むことで、本手法はCIFAR10およびImageNet 64$\times$64でFIDスコアを改善し、ゼロショットの画像インペインティング能力を保持する。さらに、ベースライン手法と比較して、元のバッチサイズの1/6未満、モデルパラメータ数と訓練ステップ数の1/2未満しか用いないため、リソース消費を大幅に削減できる。 Though diffusion models excel in image generation, their step-by-step denoising leads to slow generation speeds. Consistency training addresses this issue with single-step sampling but often produces lower-quality generations and requires high training costs. In this paper, we show that optimizing consistency training loss minimizes the Wasserstein distance between target and generated distributions. As timestep increases, the upper bound accumulates previous consistency training losses. Therefore, larger batch sizes are needed to reduce both current and accumulated losses. We propose Adversarial Consistency Training (ACT), which directly minimizes the Jensen-Shannon (JS) divergence between distributions at each timestep using a discriminator. Theoretically, ACT enhances generation quality, and convergence. By incorporating a discriminator into the consistency training framework, our method achieves improved FID scores on CIFAR10 and ImageNet 64$\times$64, retains zero-shot image inpainting capabilities, and uses less than $1/6$ of the original batch size and fewer than $1/2$ of the model parameters and training steps compared to the baseline method, this leads to a substantial reduction in resource consumption.	翻訳日:2023-11-27 23:05:20 公開日:2023-11-23
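ACT adds a discriminator so that training directly minimizes the Jensen-Shannon (JS) divergence between distributions at each timestep. For reference, the quantity itself is easy to compute for discrete distributions, as the sketch below shows. It uses the natural logarithm, so the maximum value is ln 2. The sketch only illustrates the divergence and is not the ACT training loop. In practice a GAN-style discriminator estimates this divergence implicitly from samples rather than computing it in closed form.

```python
import math
from typing import Sequence


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: average KL of p and q to their mixture m.

    m_i > 0 whenever p_i > 0 or q_i > 0, so the KL terms are always finite.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


print(js_divergence([0.5, 0.5], [0.5, 0.5]))            # 0.0
print(round(js_divergence([1.0, 0.0], [0.0, 1.0]), 4))  # 0.6931 (= ln 2)
```

Unlike KL, JS is symmetric and bounded. This is one reason it appears naturally as the objective implied by an optimal binary discriminator.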
# LLMにおける文化的バイアスの監査と緩和 Auditing and Mitigating Cultural Bias in LLMs ( http://arxiv.org/abs/2311.14096v1 ) ライセンス: Link先を確認	Yan Tao, Olga Viberg, Ryan S. Baker, Rene F. Kizilcec	(参考訳) 文化は人々の推論、行動、コミュニケーションを根本的に形作る。生成人工知能(AI)技術は、支配的な文化への移行を引き起こす可能性がある。人々がさまざまな仕事上および個人的なタスクを迅速化し、さらには自動化するためにAIをますます利用するにつれて、AIモデルに埋め込まれた文化的価値観が本来の表現に偏りをもたらすおそれがある。我々は大規模言語モデルの文化的バイアスを監査し、その応答を全国を代表する調査データと比較するとともに、緩和策として国別のプロンプトを評価した。GPT-4、3.5、3は、英語圏およびプロテスタント系のヨーロッパ諸国に類似した文化的価値観を示すことがわかった。我々の緩和策は最近のモデルでは文化的バイアスを減少させるが、すべての国・地域で有効なわけではない。生成AIにおける文化的バイアスを避けるために、特にリスクの高い文脈では、文化マッチングと継続的な文化監査を併用することを提案する。 Culture fundamentally shapes people's reasoning, behavior, and communication. Generative artificial intelligence (AI) technologies may cause a shift towards a dominant culture. As people increasingly use AI to expedite and even automate various professional and personal tasks, cultural values embedded in AI models may bias authentic expression. We audit large language models for cultural bias, comparing their responses to nationally representative survey data, and evaluate country-specific prompting as a mitigation strategy. We find that GPT-4, 3.5 and 3 exhibit cultural values resembling English-speaking and Protestant European countries. Our mitigation strategy reduces cultural bias in recent models but not for all countries/territories. To avoid cultural bias in generative AI, especially in high-stakes contexts, we suggest using culture matching and ongoing cultural audits.	翻訳日:2023-11-27 23:04:57 公開日:2023-11-23
# GANを用いた映像異常検知 Video Anomaly Detection using GAN ( http://arxiv.org/abs/2311.14095v1 ) ライセンス: Link先を確認	Anikeit Sethi, Krishanu Saini and Sai Mounika Mididoddi	(参考訳) 公共の安全に対する関心の高まりを受け、監視映像における異常事象の自動検知と認識は極めて重要である。その複雑さと有用性から、これは現在も未解決の研究課題である。異常の捉え方は人によって異なるため、異常事象を自動的に識別することは難しい。ある状況では典型的な出来事が、別の状況では異常とみなされることもある。群衆が多い監視映像では、混雑と高いオクルージョンのために、異常の自動識別は特に難しくなる。本論文は、機械学習技術を用いてこのユースケースに対する解決策を提供し、監視システムの記録に異常な活動がないかを人手で見張る必要をなくすことを目指す。我々は、敵対的生成ネットワーク(GAN)に基づく新しい異常検知モデルを開発した。このモデルは、高次元の画像空間の構築と、映像の文脈からの潜在空間の決定を同時に学習するよう訓練される。生成器には、マルチステージのチャネルアテンションに基づくデコーダと、空間データと時間データの両方を扱える2ストリームの深層畳み込みエンコーダから構成される残差オートエンコーダのアーキテクチャを用いる。また、データセット間の転移学習を活用することで、訓練時間を短縮しつつモデルを汎化させるGANモデルの改良手法も提示する。さまざまな評価指標を用いて、4つのベンチマークデータセットで本モデルを現在の最先端手法と比較した。実験結果は、既存手法と比較して、本ネットワークがすべてのデータセットで良好な性能を示すことを示している。 Accounting for the increased concern for public safety, automatic abnormal event detection and recognition in a surveillance scene is crucial. It is a current open study subject because of its intricacy and utility. Automatically identifying aberrant events is a difficult undertaking because everyone's idea of abnormality is different. A typical occurrence in one circumstance could be seen as aberrant in another. Automatic anomaly identification becomes particularly challenging in the surveillance footage with a large crowd due to congestion and high occlusion. With the use of machine learning techniques, this thesis study aims to offer the solution for this use case so that human resources won't be required to keep an eye out for any unusual activity in the surveillance system records. We have developed a novel generative adversarial network (GAN) based anomaly detection model. This model is trained such that it learns together about constructing a high dimensional picture space and determining the latent space from the video's context. The generator uses a residual Autoencoder architecture made up of a multi-stage channel attention-based decoder and a two-stream, deep convolutional encoder that can realise both spatial and temporal data. We have also offered a technique for refining the GAN model that reduces training time while also generalising the model by utilising transfer learning between datasets. Using a variety of assessment measures, we compare our model to the current state-of-the-art techniques on four benchmark datasets. The empirical findings indicate that, in comparison to existing techniques, our network performs favourably on all datasets.	翻訳日:2023-11-27 23:04:44 公開日:2023-11-23
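Reconstruction-based detectors like this one typically score each frame by how poorly the generator reproduces it. One common convention in video anomaly detection is assumed here rather than taken from the paper: per-frame errors are min-max normalized over a video into a "regularity score" in [0, 1], and frames whose score falls below a threshold are flagged. Frames are flattened to lists of pixel values, and the 0.5 threshold is a hypothetical choice.

```python
from typing import List, Sequence


def mse(frame: Sequence[float], reconstruction: Sequence[float]) -> float:
    """Mean squared reconstruction error of one flattened frame."""
    return sum((a - b) ** 2 for a, b in zip(frame, reconstruction)) / len(frame)


def regularity_scores(errors: Sequence[float]) -> List[float]:
    """Min-max normalize errors over a video: 1.0 = most regular, 0.0 = least."""
    lo, hi = min(errors), max(errors)
    if hi == lo:
        return [1.0] * len(errors)
    return [1.0 - (e - lo) / (hi - lo) for e in errors]


def flag_anomalies(errors: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Mark frames whose regularity score falls below the threshold."""
    return [s < threshold for s in regularity_scores(errors)]


frames = [[0.2, 0.4], [0.2, 0.4], [0.9, 0.1]]
recons = [[0.2, 0.5], [0.1, 0.4], [0.2, 0.4]]  # the generator fails on frame 3
errors = [mse(f, r) for f, r in zip(frames, recons)]
print(flag_anomalies(errors))  # [False, False, True]
```

Per-video normalization makes the threshold comparable across clips, at the cost of always ranking some frame as least regular even in a video with no anomaly.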
# 二次情報を用いたロバストな決定集約 Robust Decision Aggregation with Second-order Information ( http://arxiv.org/abs/2311.14094v1 ) ライセンス: Link先を確認	Yuqi Pan, Zhaohua Chen, Yuqing Kong	(参考訳) 未知の二値の世界状態に関する私的なシグナルを観測した後に、それぞれが二値の推薦を行う2人の専門家による決定集約問題を考察する。シグナルと状態の間の同時情報構造を知らないエージェントは、専門家の推薦を見て、行動を真の状態と一致させることを目指す。このシナリオの下で、二次情報(各専門家による相手の推薦の予測)を追加で与えることにより、より良い集約が可能になるかを検討する。同時情報構造を知っている全知のベンチマークと比較することで集約器の性能を評価するミニマックス後悔(regret)の枠組みを採用する。一般の情報構造では、二次情報は何の利益ももたらさないことを示す。常に1人目の専門家の推薦に従う自明な集約器を上回る集約器は存在しない。しかし、世界状態が与えられた下で専門家のシグナルが条件付き独立であると仮定すると、肯定的な結果が得られる。集約器が決定論的な場合、二次情報を活用するロバストな集約器を提示し、それを用いない集約器を大きく上回り得ることを示す。第二に、2人の専門家が同質である場合、シグナルに非退化の仮定を加えることで、二次情報を用いるランダム化集約器が、それを用いない最適な集約器を上回り得ることを示す。残りの設定では、二次情報は有益ではない。また、集約器の効用関数がより一般的な場合にも上記の結果を拡張する。 We consider a decision aggregation problem with two experts who each make a binary recommendation after observing a private signal about an unknown binary world state. An agent, who does not know the joint information structure between signals and states, sees the experts' recommendations and aims to match the action with the true state. Under the scenario, we study whether supplemented additionally with second-order information (each expert's forecast on the other's recommendation) could enable a better aggregation. We adopt a minimax regret framework to evaluate the aggregator's performance, by comparing it to an omniscient benchmark that knows the joint information structure. With general information structures, we show that second-order information provides no benefit. No aggregator can improve over a trivial aggregator, which always follows the first expert's recommendation. However, positive results emerge when we assume experts' signals are conditionally independent given the world state. When the aggregator is deterministic, we present a robust aggregator that leverages second-order information, which can significantly outperform counterparts without it. Second, when two experts are homogeneous, by adding a non-degenerate assumption on the signals, we demonstrate that random aggregators using second-order information can surpass optimal ones without it. In the remaining settings, the second-order information is not beneficial. We also extend the above results to the setting when the aggregator's utility function is more general.	翻訳日:2023-11-27 23:04:21 公開日:2023-11-23
# PortfolioMentor: インタラクティブなデジタルアートポートフォリオの学習と制作のためのマルチモーダル生成AIコンパニオン PortfolioMentor: Multimodal Generative AI Companion for Learning and Crafting Interactive Digital Art Portfolios ( http://arxiv.org/abs/2311.14091v1 ) ライセンス: Link先を確認	Tao Long, Weirui Peng	(参考訳) デジタルアートのポートフォリオは、ビジュアル、音声、インタラクション、ナラティブを織り交ぜ、アーティストが自らのビジョンを伝えるための影響力のある媒体となる。しかし、美術学校には非技術系の学生向けに調整された学術的支援のリソースが不足しており、精神的な負担の大きい制作プロセス全体を通じて導いてくれる包括的なツールもないため、技術的な背景を持たないデザイン系の学生は、創造的なアイデアを具体的なコードやデザインに落とし込むことに苦労することが多い。コード学習における伴走者の役割を認識し、創造的なタスクを支援する生成AIモデルの能力を活用して、IDE向けのコーディングコンパニオンチャットボットであるPortfolioMentorを提示する。このツールは、積極的な提案と責任ある質疑応答を通じて、学習、着想、支援の面で学生を導き、協働する。具体的には、本システムはまずタスクとアーティストのビジョンを理解し、続いて視覚的なイラスト、音声・音楽の提案とファイル、インタラクションのためのクリック・スクロール効果、創造的なビジョンの概念化を共同で作り上げ、最終的にこれらの要素を洗練されたインタラクティブなデジタルポートフォリオに統合する。 Digital art portfolios serve as impactful mediums for artists to convey their visions, weaving together visuals, audio, interactions, and narratives. However, without technical backgrounds, design students often find it challenging to translate creative ideas into tangible codes and designs, given the lack of tailored resources for the non-technical, academic support in art schools, and a comprehensive guiding tool throughout the mentally demanding process. Recognizing the role of companionship in code learning and leveraging generative AI models' capabilities in supporting creative tasks, we present PortfolioMentor, a coding companion chatbot for IDEs. This tool guides and collaborates with students through proactive suggestions and responsible Q&As for learning, inspiration, and support. In detail, the system starts with the understanding of the task and artist's visions, follows the co-creation of visual illustrations, audio or music suggestions and files, click-scroll effects for interactions, and creative vision conceptualization, and finally synthesizes these facets into a polished interactive digital portfolio.	翻訳日:2023-11-27 23:03:57 公開日:2023-11-23
# クラス不確実性: クラス不均衡を緩和するための尺度 Class Uncertainty: A Measure to Mitigate Class Imbalance ( http://arxiv.org/abs/2311.14090v1 ) ライセンス: Link先を確認	Z. S. Baltaci, K. Oksuz, S. Kuzucu, K. Tezoren, B. K. Konar, A. Ozkan, E. Akbas, S. Kalkan	(参考訳) 訓練例のクラスごとの特性は、深層分類器の性能に影響を与える。よく研究されている例は、各クラスの訓練例の数がロングテール分布に従う場合であり、この状況では表現の少ないクラスで最適でない性能となりやすい。このクラス不均衡問題は従来、データの再サンプリングなど、訓練例のクラスごとの数(基数)に依存するアプローチによって対処されてきた。本稿では、クラスの基数のみを考慮するだけでは、クラス不均衡を引き起こすすべての問題をカバーできないことを示す。クラス不均衡を測定するために、訓練例の平均予測不確実性として「クラス不確実性」を提案し、この新しい尺度が基数よりもクラス間の違いをよく捉えることを示す。また、各クラスの訓練例数は等しいが難しさが異なる新しいデータセットとしてSVCI-20を構築した。これにより、基数に依存するアプローチでは対処できない種類のクラス不均衡が生じる。我々の「クラス不確実性」尺度を10種類の多様なクラス不均衡緩和手法に組み込み、ロングテールデータセットおよびSVCI-20におけるその有効性を実証する。コードとデータセットは公開予定である。 Class-wise characteristics of training examples affect the performance of deep classifiers. A well-studied example is when the number of training examples of classes follows a long-tailed distribution, a situation that is likely to yield sub-optimal performance for under-represented classes. This class imbalance problem is conventionally addressed by approaches relying on the class-wise cardinality of training examples, such as data resampling. In this paper, we demonstrate that considering solely the cardinality of classes does not cover all issues causing class imbalance. To measure class imbalance, we propose "Class Uncertainty" as the average predictive uncertainty of the training examples, and we show that this novel measure captures the differences across classes better than cardinality. We also curate SVCI-20 as a novel dataset in which the classes have equal number of training examples but they differ in terms of their hardness; thereby causing a type of class imbalance which cannot be addressed by the approaches relying on cardinality. We incorporate our "Class Uncertainty" measure into a diverse set of ten class imbalance mitigation methods to demonstrate its effectiveness on long-tailed datasets as well as on our SVCI-20. Code and datasets will be made available.	翻訳日:2023-11-27 23:03:35 公開日:2023-11-23
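The measure can be sketched as follows, assuming that predictive uncertainty is the Shannon entropy of each training example's softmax output. The paper may use a different uncertainty estimator. Under that assumption, class uncertainty is the mean entropy over the training examples of each class. The function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def class_uncertainty(labels: Sequence[Hashable],
                      predicted_probs: Sequence[List[float]]) -> Dict[Hashable, float]:
    """Average predictive entropy of the training examples in each class."""
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for y, p in zip(labels, predicted_probs):
        sums[y] += entropy(p)
        counts[y] += 1
    return {c: sums[c] / counts[c] for c in sums}


# Class 0 is predicted confidently and class 1 is maximally uncertain,
# even though cardinality alone would say nothing about hardness.
labels = [0, 0, 1]
probs = [[1.0, 0.0], [1.0, 0.0], [0.5, 0.5]]
u = class_uncertainty(labels, probs)
print({c: round(v, 4) for c, v in u.items()})  # {0: 0.0, 1: 0.6931}
```

A per-class score like this can replace the class count wherever a mitigation method reweights or resamples by class. That is the kind of substitution the abstract describes for its ten methods.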
# 自然言語による質問応答: 時間表現という特別なケース Question Answering in Natural Language: the Special Case of Temporal Expressions ( http://arxiv.org/abs/2311.14087v1 ) ライセンス: Link先を確認	Armand Stricker	(参考訳) 一般的な質問応答は近年よく研究されているが、時間的質問応答はそれほど注目されてこなかったタスクである。本研究の目的は、一般的な質問応答でよく用いられるアプローチである回答抽出を活用し、段落内から時間的な質問に対する答えを見つけることである。モデルを訓練するために、豊富な時間情報を提供するよう特別に設計された、SQuADに着想を得た新しいデータセットを提案する。歴史上の大規模な紛争に関する複数の文書を含むコーパスWikiWarsを適応させることにした。評価の結果、一般的な質問応答でよく用いられるパターンマッチングを行うよう訓練された深層学習モデルは、答えがテキスト内に直接含まれている質問に限定するのであれば、時間的質問応答に適応できることが示された。 Although general question answering has been well explored in recent years, temporal question answering is a task which has not received as much focus. Our work aims to leverage a popular approach used for general question answering, answer extraction, in order to find answers to temporal questions within a paragraph. To train our model, we propose a new dataset, inspired by SQuAD, specifically tailored to provide rich temporal information. We chose to adapt the corpus WikiWars, which contains several documents on history's greatest conflicts. Our evaluation shows that a deep learning model trained to perform pattern matching, often used in general question answering, can be adapted to temporal question answering, if we accept to ask questions whose answers must be directly present within a text.	翻訳日:2023-11-27 23:03:13 公開日:2023-11-23
# 連合学習を用いた脳MRIスクリーニングツール Brain MRI Screening Tool with Federated Learning ( http://arxiv.org/abs/2311.14086v1 ) ライセンス: Link先を確認	Roman Stoklasa, Ioannis Stathopoulos, Efstratios Karavasilis, Efstathios Efstathopoulos, Marek Dostál, Miloš Keřkovský, Michal Kozubek, Luigi Serio	(参考訳) 臨床現場では、重症例であっても、MRI撮像から放射線科医による診断までに大きな遅延が生じることがよくある。場合によっては追加の情報や手がかりが不足していることが原因であり、そのため重症例でさえ診断の順番待ちをしなければならない。この問題は、追加情報を補い、特定の患者が重症例である可能性を放射線科医に警告する自動ソフトウェアツールがあれば回避できる。本稿では、自動脳MRIスクリーニングツールを提示し、腫瘍様の病変を検出する能力を実証する。これは、ロバストな多病変スクリーニングソリューションに向けた最初のバージョンである。本ツールは連合学習をサポートしているため、複数の機関が非公開データを開示することなくモデルの構築に貢献できる。 In clinical practice, we often see significant delays between MRI scans and the diagnosis made by radiologists, even for severe cases. In some cases, this may be caused by the lack of additional information and clues, so even the severe cases need to wait in the queue for diagnosis. This can be avoided if there is an automatic software tool, which would supplement additional information, alerting radiologists that the particular patient may be a severe case. We are presenting an automatic brain MRI Screening Tool and we are demonstrating its capabilities for detecting tumor-like pathologies. It is the first version on the path toward a robust multi-pathology screening solution. The tool supports Federated Learning, so multiple institutions may contribute to the model without disclosing their private data.	翻訳日:2023-11-27 23:03:01 公開日:2023-11-23
# AI生成画像がテキスト画像検索に不可視の関連性バイアスをもたらす AI-Generated Images Introduce Invisible Relevance Bias to Text-Image Retrieval ( http://arxiv.org/abs/2311.14084v1 ) ライセンス: Link先を確認	Shicheng Xu, Danyang Hou, Liang Pang, Jingcheng Deng, Jun Xu, Huawei Shen, Xueqi Cheng	(参考訳) 生成モデルの発展に伴い、AI生成コンテンツ(AIGC: AI-generated content)はより現実的になり、インターネットに溢れている。最近の研究は、この現象がウェブ検索のテキスト検索におけるソースバイアスの問題を深刻化させていることを示唆している。具体的には、ニューラル検索モデルは、人間が書いたテキストよりも生成されたテキストを上位にランク付けする傾向にある。本稿では,このバイアスの研究をクロスモーダル検索に拡張する。まず,バイアスの存在を調べるための適切なベンチマークを構築した。このベンチマークでの大規模な実験により、AI生成画像がテキスト画像検索モデルに不可視の関連性バイアスをもたらすことが明らかになった。具体的には,AI生成画像がクエリに対して実画像以上に視覚的に関連する特徴を示していないにもかかわらず,テキスト画像検索モデルはAI生成画像を実画像よりも上位にランク付けする傾向がある。この不可視の関連性バイアスは、訓練データやアーキテクチャの異なる検索モデルに共通して見られる。さらに, 検索モデルの訓練データにAI生成画像を含めると, 不可視の関連性バイアスが悪化することが明らかとなった。この現象は悪循環を引き起こし、不可視の関連性バイアスをますます深刻にする。不可視の関連性の潜在的な原因を解明し、上記の問題に対処するために、不可視の関連性バイアスを緩和する効果的な訓練手法を提案する。続いて,提案したバイアス除去手法を用いて不可視の関連性の原因を遡って特定し,AI生成画像が画像エンコーダに対して表現へ追加情報を埋め込むよう誘導することを明らかにした。この情報は、意味の異なる生成画像の間である程度の一貫性を示し、検索器に高い関連性スコアを推定させる。 With the advancement of generation models, AI-generated content (AIGC) is becoming more realistic, flooding the Internet. A recent study suggests that this phenomenon has elevated the issue of source bias in text retrieval for web searches. Specifically, neural retrieval models tend to rank generated texts higher than human-written texts. In this paper, we extend the study of this bias to cross-modal retrieval. Firstly, we successfully construct a suitable benchmark to explore the existence of the bias. Subsequent extensive experiments on this benchmark reveal that AI-generated images introduce an invisible relevance bias to text-image retrieval models. Specifically, our experiments show that text-image retrieval models tend to rank the AI-generated images higher than the real images, even though the AI-generated images do not exhibit more visually relevant features to the query than real images. This invisible relevance bias is prevalent across retrieval models with varying training data and architectures. Furthermore, our subsequent exploration reveals that the inclusion of AI-generated images in the training data of the retrieval models exacerbates the invisible relevance bias. The above phenomenon triggers a vicious cycle, which makes the invisible relevance bias become more and more serious. To elucidate the potential causes of invisible relevance and address the aforementioned issues, we introduce an effective training method aimed at alleviating the invisible relevance bias. Subsequently, we apply our proposed debiasing method to retroactively identify the causes of invisible relevance, revealing that the AI-generated images induce the image encoder to embed additional information into their representation. This information exhibits a certain consistency across generated images with different semantics and can make the retriever estimate a higher relevance score.	翻訳日:2023-11-27 23:02:49 公開日:2023-11-23
# ファジィビット The fuzzy bit ( http://arxiv.org/abs/2311.14083v1 ) ライセンス: Link先を確認	Milagrosa Aldana, María A. Lledó	(参考訳) 本稿では,ファジィ論理とファジィ集合の観点から量子力学を定式化することを探究する。(量子)論理(ある性質を持つ束)とファジィ集合のある種の族との間の対応を確立するPykaczの結果を、ヒルベルト空間の射影子の束であるバーコフ・フォン・ノイマン論理(Birkhoff-von Neumann logic)に適用する。3つのケース、すなわち量子ビット、エンタングルした2量子ビット、そしてエンタングルした2量子ビットの内部に「入れ子」になったクトリットを考察する。ファジィ集合のメンバーシップ関数を明示的に計算し、ファジィ集合のすべての論理結合子を、これら特定のメンバーシップ関数に対する演算として解釈する。このようにして、考察した系について、ファジィ集合の観点から標準量子論理の完全な描像が得られる。 In this paper, the formulation of Quantum Mechanics in terms of fuzzy logic and fuzzy sets is explored. A result by Pykacz, that establishes a correspondence between (quantum) logics (lattices with certain properties) and certain families of fuzzy sets, is applied to the Birkhoff-von Neumann logic, the lattice of projectors of a Hilbert space. Three cases are considered: the qubit, two qubits entangled and a qutrit 'nested' inside the two entangled qubits. The membership functions of the fuzzy sets are explicitly computed and all the connectives of the fuzzy sets are interpreted as operations with these particular membership functions. In this way, a complete picture of the standard quantum logic in terms of fuzzy sets is obtained for the systems considered.	翻訳日:2023-11-27 23:02:24 公開日:2023-11-23
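For illustration only, assume that each projector P maps to a fuzzy set on pure states whose membership function is the Born probability <psi|P|psi>. Under that assumption (ours, not a statement of the paper's construction), the qubit case can be checked numerically, and the orthocomplement I - P gets membership 1 - mu. The sketch does not reproduce how meets and joins of non-commuting projectors act on membership functions, which is what the paper works out.

```python
import numpy as np


def membership(projector, psi):
    """Assumed membership degree of state psi in the fuzzy set of projector P: <psi|P|psi>."""
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    return float(np.real(np.vdot(psi, projector @ psi)))


P0 = np.array([[1, 0], [0, 0]], dtype=complex)  # projector onto |0>
I2 = np.eye(2, dtype=complex)
plus = np.array([1, 1]) / np.sqrt(2)              # |+> state

mu = membership(P0, plus)            # |+> is "half in" the set attached to |0>
mu_not = membership(I2 - P0, plus)   # orthocomplement I - P
print(round(mu, 6), round(mu_not, 6))  # 0.5 0.5
```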
# 一度だけ説明する You Only Explain Once ( http://arxiv.org/abs/2311.14081v1 ) ライセンス: Link先を確認	David A. Kelly, Hana Chockler, Daniel Kroening, Nathan Blake, Aditi Ramaswamy, Melane Navaratnarajah, Aaditya Shivakumar	(参考訳) 本稿では,物体検出器の出力を効率的に説明するための新しいブラックボックス説明可能性アルゴリズムおよびツールであるYO-ReXを提案する。新しいアルゴリズムは、画像内で検出されたすべての物体に対する説明を同時に計算する。したがって,ベースラインと比較して,検出物体が10個の場合にはクエリ数を10分の1に削減する。高速化の度合いは物体数の増加に伴いさらに大きくなる。実験の結果,YO-ReXはYOLOの実行時間に対して無視できるオーバーヘッドでYOLOの出力を説明できることがわかった。また、SSDとFaster R-CNNの説明についても同様の結果を示す。この高速化は、積極的な枝刈りと因果分析を組み合わせてバックトラックを回避することで達成される。 In this paper, we propose a new black-box explainability algorithm and tool, YO-ReX, for efficient explanation of the outputs of object detectors. The new algorithm computes explanations for all objects detected in the image simultaneously. Hence, compared to the baseline, the new algorithm reduces the number of queries by a factor of 10X for the case of ten detected objects. The speedup increases further with the number of objects. Our experimental results demonstrate that YO-ReX can explain the outputs of YOLO with a negligible overhead over the running time of YOLO. We also demonstrate similar results for explaining SSD and Faster R-CNN. The speedup is achieved by avoiding backtracking by combining aggressive pruning with a causal analysis.	翻訳日:2023-11-27 23:02:07 公開日:2023-11-23
# GigaPose: 1つの対応による高速かつロバストな新規物体ポーズ推定 GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence ( http://arxiv.org/abs/2311.14155v1 ) ライセンス: Link先を確認	Van Nguyen Nguyen, Thibault Groueix, Mathieu Salzmann, Vincent Lepetit	(参考訳) 本稿では,RGB画像におけるCADベースの新規物体ポーズ推定のための、高速・ロバスト・高精度な手法GigaPoseを提案する。GigaPoseはまず、CADモデルのレンダリング画像である識別的テンプレートを利用して面外回転を復元し、次にパッチ対応を用いて残りの4つのパラメータを推定する。提案手法では、通常の3自由度ではなく2自由度の空間でのみテンプレートをサンプリングし、特徴空間での高速な最近傍探索によって入力画像とテンプレートを照合する。その結果、最先端手法と比較して38倍の高速化を実現する。さらに、GigaPoseはセグメンテーション誤差に対してはるかにロバストである。BOPチャレンジの7つのコアデータセットでの広範な評価により、最先端の精度を達成し、精緻化手法とシームレスに統合できることを示す。さらに,単一画像からの3次元再構成に関する最近の研究で予測された3次元モデルを用いてGigaPoseの可能性を示し、CADモデルの必要性を緩和して6Dの物体ポーズ推定をより便利にする。ソースコードと学習済みモデルはhttps://github.com/nv-nguyen/gigaPoseで公開されている。 We present GigaPose, a fast, robust, and accurate method for CAD-based novel object pose estimation in RGB images. GigaPose first leverages discriminative templates, rendered images of the CAD models, to recover the out-of-plane rotation and then uses patch correspondences to estimate the four remaining parameters. Our approach samples templates in only a two-degrees-of-freedom space instead of the usual three and matches the input image to the templates using fast nearest neighbor search in feature space, resulting in a speedup factor of 38x compared to the state of the art. Moreover, GigaPose is significantly more robust to segmentation errors. Our extensive evaluation on the seven core datasets of the BOP challenge demonstrates that it achieves state-of-the-art accuracy and can be seamlessly integrated with a refinement method. Additionally, we show the potential of GigaPose with 3D models predicted by recent work on 3D reconstruction from a single image, relaxing the need for CAD models and making 6D object pose estimation much more convenient. Our source code and trained models are publicly available at https://github.com/nv-nguyen/gigaPose	翻訳日:2023-11-27 16:42:19 公開日:2023-11-23
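The template-matching step can be pictured as nearest-neighbor search by cosine similarity between a query feature and one feature per rendered template. The sketch below assumes that formulation. The toy feature vectors, the `viewpoints` grid and `nearest_template` are hypothetical. GigaPose itself uses learned features, and a production system would typically use an approximate nearest-neighbor index rather than a dense matrix product.

```python
import numpy as np


def nearest_template(query_feat, template_feats):
    """Return (index, cosine similarity) of the template closest to the query feature."""
    q = query_feat / np.linalg.norm(query_feat)
    t = template_feats / np.linalg.norm(template_feats, axis=1, keepdims=True)
    sims = t @ q                      # cosine similarity with every template
    best = int(np.argmax(sims))
    return best, float(sims[best])


# Hypothetical templates rendered on a 2-DoF viewpoint grid (azimuth, elevation).
viewpoints = [(0, 0), (90, 0), (180, 30)]
templates = np.eye(3)                 # toy feature vectors, one per template
idx, sim = nearest_template(np.array([0.1, 0.9, 0.0]), templates)
print(viewpoints[idx])  # (90, 0)
```

The viewpoint of the matched template provides the out-of-plane rotation. In the abstract, the remaining in-plane rotation, scale and 2D translation come from patch correspondences, which this sketch does not cover.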
# Tube-NeRF: チューブ誘導データ拡張とNeRFを用いたMPCからの視覚運動ポリシーの効率的な模倣学習 Tube-NeRF: Efficient Imitation Learning of Visuomotor Policies from MPC using Tube-Guided Data Augmentation and NeRFs ( http://arxiv.org/abs/2311.14153v1 ) ライセンス: Link先を確認	Andrea Tagliabue, Jonathan P. How	(参考訳) 模倣学習(IL)は、計算資源を大量に消費するモデル予測制御器(MPC)から計算効率の高い感覚運動ポリシーを学習できる。しかし多くのサンプルを必要とすることが多く、長い学習時間やロバスト性の不足につながる。これらの問題に対処するため,ILと、プロセスおよびセンシングの不確実性を考慮したロバストMPCの一種とを組み合わせ,視覚ベースのポリシーの効率的な学習を可能にするデータ拡張(DA)戦略を設計する。Tube-NeRFと名付けた提案DA手法は、ニューラル放射輝度場(NeRF)を利用して新たな合成画像を生成し、ロバストMPCの性質(チューブ)を利用して関連する視点を選択し、対応する行動を効率的に計算する。搭載カメラの画像を水平位置の唯一の情報源として制御行動を生成する視覚運動ポリシーを学習することで、マルチロータでの自己位置推定と軌道追従のタスクに本手法を適合させる。評価により,現行のIL手法と比較してデモンストレーション効率が80倍向上し,学習時間が50%短縮された状態でロバストな視覚運動ポリシーを学習できることを数値的に示す。さらに,本手法のポリシーは実機マルチロータへの移行に成功し,大きな外乱があっても正確な自己位置推定と小さな追従誤差を達成し,機上での推論時間はわずか1.5msであった。 Imitation learning (IL) can train computationally-efficient sensorimotor policies from a resource-intensive Model Predictive Controller (MPC), but it often requires many samples, leading to long training times or limited robustness. To address these issues, we combine IL with a variant of robust MPC that accounts for process and sensing uncertainties, and we design a data augmentation (DA) strategy that enables efficient learning of vision-based policies. The proposed DA method, named Tube-NeRF, leverages Neural Radiance Fields (NeRFs) to generate novel synthetic images, and uses properties of the robust MPC (the tube) to select relevant views and to efficiently compute the corresponding actions. We tailor our approach to the task of localization and trajectory tracking on a multirotor, by learning a visuomotor policy that generates control actions using images from the onboard camera as the only source of horizontal position. Our evaluations numerically demonstrate learning of a robust visuomotor policy with an 80-fold increase in demonstration efficiency and a 50% reduction in training time over current IL methods. Additionally, our policies successfully transfer to a real multirotor, achieving accurate localization and low tracking errors despite large disturbances, with an onboard inference time of only 1.5 ms.	翻訳日:2023-11-27 16:41:53 公開日:2023-11-23
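In tube MPC, an ancillary feedback law keeps the true state inside a tube around the nominal trajectory. The standard form of that law, u = u_nom + K (x - x_nom), gives a cheap action label for every state sampled inside the tube. The sketch below assumes a box-shaped tube and a fixed gain `K`; both are hypothetical, and `augment_from_tube` is not the authors' code. In Tube-NeRF, each sampled state would also get a synthetic onboard-camera image rendered by the NeRF, which is omitted here.

```python
import numpy as np


def augment_from_tube(x_nom, u_nom, K, tube_halfwidth, n_samples, rng):
    """Sample states inside a box-shaped tube around a nominal state and label
    each with the tube-MPC ancillary control law u = u_nom + K (x - x_nom)."""
    dx = rng.uniform(-tube_halfwidth, tube_halfwidth, size=(n_samples, x_nom.size))
    xs = x_nom + dx
    us = u_nom + dx @ K.T
    return xs, us


rng = np.random.default_rng(0)
x_nom = np.array([0.0, 1.0])      # hypothetical nominal state (position, velocity)
u_nom = np.array([0.5])           # nominal action from the robust MPC
K = np.array([[-1.0, -0.5]])      # hypothetical ancillary feedback gain
xs, us = augment_from_tube(x_nom, u_nom, K, np.array([0.1, 0.2]), 5, rng)
print(xs.shape, us.shape)  # (5, 2) (5, 1)
```

The labels come from a closed-form feedback law rather than from re-solving the MPC. This is one plausible reading of how the tube lets the method "efficiently compute the corresponding actions".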
# 時間的立方体PatchGAN(TCuP-GAN)を用いた自動3次元腫瘍セグメンテーション Automated 3D Tumor Segmentation using Temporal Cubic PatchGAN (TCuP-GAN) ( http://arxiv.org/abs/2311.14148v1 ) ライセンス: Link先を確認	Kameswara Bharadwaj Mantha, Ramanakumar Sankar, Lucy Fortson	(参考訳) 最新の深層学習技術を用いた堅牢な汎用3次元セグメンテーションフレームワークの開発は,様々な生物医学領域において活発な研究テーマの1つである。本研究では,3次元セグメンテーションのタスクのために,生成的特徴学習フレームワークの概念と畳み込み長短期記憶ネットワーク(LSTM)とを融合したボリューム間変換モデルであるTemporal Cubic PatchGAN(TCuP-GAN)を紹介する。2023年のBrain Tumor Segmentation(BraTS)Challengeで取り上げられた4つのセグメンテーション課題(成人神経膠腫、髄膜腫、小児腫瘍、サハラ以南アフリカのサブセット)のデータでTCuP-GANの能力を実証し、LesionWiseのDice類似度と95%ハウスドルフ距離の指標を用いてその性能を定量化する。本フレームワークが、すべての課題にわたってロバストなマルチクラスセグメンテーションマスクを予測するよう学習できることを示す。このベンチマーク研究は、電子顕微鏡画像における多細胞小器官セグメンテーションなど、他のマルチクラスタスクへTCuP-GANを将来適用するための足がかりとなる。 Development of robust general-purpose 3D segmentation frameworks using the latest deep learning techniques is one of the active topics in various biomedical domains. In this work, we introduce Temporal Cubic PatchGAN (TCuP-GAN), a volume-to-volume translational model that marries the concepts of a generative feature learning framework with Convolutional Long Short-Term Memory Networks (LSTMs), for the task of 3D segmentation. We demonstrate the capabilities of our TCuP-GAN on the data from four segmentation challenges (Adult Glioma, Meningioma, Pediatric Tumors, and Sub-Saharan Africa subset) featured within the 2023 Brain Tumor Segmentation (BraTS) Challenge and quantify its performance using LesionWise Dice similarity and 95% Hausdorff Distance metrics. We demonstrate that our framework successfully learns to predict robust multi-class segmentation masks across all the challenges. This benchmarking work serves as a stepping stone for future efforts towards applying TCuP-GAN on other multi-class tasks such as multi-organelle segmentation in electron microscopy imaging.	翻訳日:2023-11-27 16:41:25 公開日:2023-11-23
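The two metrics named in this abstract can be sketched on binary 3D masks. This is a simplified version: the BraTS "LesionWise" variants compute scores per connected lesion and penalize missed or spurious lesions, and HD95 is usually computed on boundary voxels. The sketch instead uses plain voxel-wise Dice and a brute-force HD95 over all foreground voxels, which suits only small toy volumes. `dice` and `hd95` are illustrative helpers, and `hd95` assumes both masks are non-empty.

```python
import numpy as np


def dice(pred, gt):
    """Voxel-wise Dice similarity coefficient between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(pred, gt).sum() / denom


def hd95(pred, gt):
    """95th-percentile symmetric Hausdorff distance between foreground voxels (brute force)."""
    a, b = np.argwhere(pred), np.argwhere(gt)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(max(np.percentile(d.min(axis=1), 95), np.percentile(d.min(axis=0), 95)))


pred = np.zeros((5, 5, 5), dtype=bool)
gt = np.zeros((5, 5, 5), dtype=bool)
pred[1:3, 1:3, 1:3] = True   # 2x2x2 cube
gt[2:4, 1:3, 1:3] = True     # same cube shifted by one voxel
print(dice(pred, gt), hd95(pred, gt))  # 0.5 1.0
```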
# アクティブラーニングを用いたドメイン適応型セマンティックセグメンテーションのためのクラスバランス動的獲得 Class Balanced Dynamic Acquisition for Domain Adaptive Semantic Segmentation using Active Learning ( http://arxiv.org/abs/2311.14146v1 ) ライセンス: Link先を確認	Marc Schachtsiek and Simone Rossi and Thomas Hannagan	(参考訳) ドメイン適応型アクティブラーニングは、ニューラルネットワークのラベル効率の高い学習を牽引している。セマンティックセグメンテーションでは、最先端のモデルは不確実性と多様性という2つの基準を併用して訓練用ラベルを選択し、ピクセル単位の獲得戦略と組み合わせている。しかし,このような手法には現在クラス不均衡の問題があり,アクティブラーニングの予算が大きくなると性能が低下することを示す。そこで,特に高予算の設定でこの問題を緩和する新しいアクティブラーニング手法であるCBDA(Class Balanced Dynamic Acquisition)を導入する。よりバランスの取れたラベルによって少数クラスの性能が向上し、その結果モデルは予算5%、10%、20%においてそれぞれ従来のベースラインを0.6、1.7、2.4 mIoU上回る。さらに、少数クラスへの注力により、最小クラス性能がそれぞれ0.5、2.9、4.6 IoU改善する。最も性能の高いモデルは完全教師ありベースラインをも上回り、正解ラベル全体よりもバランスの取れたラベルのほうが有益でありうることを示している。 Domain adaptive active learning is leading the charge in label-efficient training of neural networks. For semantic segmentation, state-of-the-art models jointly use two criteria of uncertainty and diversity to select training labels, combined with a pixel-wise acquisition strategy. However, we show that such methods currently suffer from a class imbalance issue which degrades their performance for larger active learning budgets. We then introduce Class Balanced Dynamic Acquisition (CBDA), a novel active learning method that mitigates this issue, especially in high-budget regimes. The more balanced labels increase minority class performance, which in turn allows the model to outperform the previous baseline by 0.6, 1.7, and 2.4 mIoU for budgets of 5%, 10%, and 20%, respectively. Additionally, the focus on minority classes leads to improvements of the minimum class performance of 0.5, 2.9, and 4.6 IoU respectively. The top-performing model even exceeds the fully supervised baseline, showing that a more balanced label than the entire ground truth can be beneficial.	翻訳日:2023-11-27 16:41:00 公開日:2023-11-23
# 2次元水素原子に対する宇宙ひもの影響と半導体におけるRytova-Keldysh対数近似との関係 Cosmic string influence on a 2D hydrogen atom and its relationship with the Rytova-Keldysh logarithmic approximation in semiconductors ( http://arxiv.org/abs/2311.14144v1 ) ライセンス: Link先を確認	Frankbelson dos S. Azevedo, Izael A. Lima, Gallileu Genesis, Rodolfo Casana, Edilberto O. Silva	(参考訳) 2次元水素原子は、直線状の宇宙ひもの存在下での電子と陽子の量子相互作用を記述するための有望な代替手段を提供する。水素原子を2次元に縮約することで、宇宙ひもに伴う円筒/円錐対称性をより適切に捉えられるようになり、物理系のより適切な記述が得られる。シュレーディンガー方程式を解いた後、位相欠陥の影響下にある対数ポテンシャルをもつ水素原子の固有エネルギー、確率分布関数、期待値を計算する。2次元水素原子の計算は、有限差分法を用いて初めて行われた。系の物理的性質を明らかにするため、結果をグラフ、表、図で示す。計算結果が線形変分法の結果と一致することを確認した。本モデルは、特定の半導体領域内に位置する2次元単層半導体中の励起子との興味深い類似性を導く。この類似性を明らかにするため,いくつかの相互作用ポテンシャルとその励起子固有状態を提示し、文献の結果と比較して議論する。 A two-dimensional hydrogen atom offers a promising alternative for describing the quantum interaction between an electron and a proton in the presence of a straight cosmic string. Reducing the hydrogen atom to two dimensions makes it better suited to capture the cylindrical/conical symmetry associated with the cosmic string, providing a more appropriate description of the physical system. After solving Schrödinger's equation, we calculate the eigenenergies, probability distribution function, and expectation values for the hydrogen atom with logarithmic potential under the influence of the topological defect. The calculations for the 2D hydrogen atom are performed for the first time using the Finite Difference Method. The results are presented through graphics, tables, and diagrams to elucidate the system's physical properties. We have verified that our calculations agree with a linear variational method result. Our model leads to an interesting analogy with excitons in a two-dimensional monolayer semiconductor located within a specific semiconductor region. To elucidate this analogy, we present and discuss some interaction potentials and their exciton eigenstates by comparing them with the results from the literature.	翻訳日:2023-11-27 16:40:42 公開日:2023-11-23
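As a rough illustration of the finite-difference approach mentioned here, the sketch below discretizes a 2D radial Schrödinger equation in a conical space. It rests on several assumptions that are ours and not necessarily the paper's exact model: atomic units, a purely logarithmic potential `lam * ln r`, a straight cosmic string whose conical metric enters only through the effective angular momentum |m|/alpha, and hard-wall boundaries at r = 0 and r = r_max. `radial_levels` and all parameter values are hypothetical.

```python
import numpy as np


def radial_levels(m, alpha, lam=1.0, r_max=30.0, n=800, k=3):
    """Lowest k eigenvalues of an assumed 2D radial problem in a conical space.

    With u(r) = sqrt(r) R(r) (atomic units, hbar = mass = 1):
        -u''/2 + [(l^2 - 1/4) / (2 r^2) + lam * ln r] u = E u,   l = |m| / alpha,
    discretized with central differences and u(0) = u(r_max) = 0.
    """
    h = r_max / (n + 1)
    r = h * np.arange(1, n + 1)                      # interior grid points
    ell = abs(m) / alpha                             # cosmic string rescales |m|
    v_eff = (ell**2 - 0.25) / (2.0 * r**2) + lam * np.log(r)
    off = np.full(n - 1, -0.5 / h**2)                # kinetic off-diagonal
    H = np.diag(1.0 / h**2 + v_eff) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(H)[:k]                 # ascending eigenvalues


flat = radial_levels(m=1, alpha=1.0)     # no cosmic string
string = radial_levels(m=1, alpha=0.5)   # deficit angle: effective l = |m| / alpha
print(flat, string)
```

In this toy model, a smaller alpha enlarges the centrifugal term, so every level with m != 0 moves up, while m = 0 levels do not depend on alpha at all.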
# 粗視化タンパク質折り畳み問題を量子コンピュータで解くアプローチ An approach to solve the coarse-grained Protein folding problem in a Quantum Computer ( http://arxiv.org/abs/2311.14141v1 ) ライセンス: Link先を確認	Jaya Vasavi P, Soham Bopardikar, Avinash D, Ashwini K, Kalyan Dasgupta, Sanjib Senapati	(参考訳) タンパク質の折り畳みは、アミノ酸配列からタンパク質の構造を決定する、半世紀来の生物学の問題である。タンパク質の機能はその構造と相関しているため、生体系内で起こる細胞・分子メカニズムを研究するうえでタンパク質の折り畳みの理解が欠かせない。タンパク質の構造と酵素の理解は、標的ベースの創薬、タンパク質関連疾患のメカニズム解明、新規酵素の開発において重要な役割を担っている。AIベースのタンパク質構造予測法の最近の進歩によりタンパク質の折り畳み問題はある程度解決されたが、配列類似性の低いタンパク質の構造を決定する精度は限られている。古典的手法は広範な立体配座サンプリングの生成に困難を抱えており、そのため量子ベースのアプローチがタンパク質折り畳み問題の解決に有利となる。本研究では,HPモデルを初期の枠組みとして用い,ゲート型量子コンピュータ上で実行して小さなタンパク質配列の構造を予測できる新しいターンベースの符号化アルゴリズムを開発した。このアルゴリズムは将来、より大きく複雑なタンパク質系へと応用を拡張できる。HPモデルは、タンパク質折り畳み現象における主要な段階、すなわち疎水性アミノ酸をタンパク質内部へ運ぶ疎水性崩壊を最もよく表している。折り畳み問題は、直交軸に平行な辺に沿った自由度と、軸平面に平行な対角線に沿った自由度をもつ3次元立方格子上で定式化される。高次項を含む元の定式化はゲート型量子ハードウェアで実行できる一方、QUBO定式化はアニーラやIBM CPLEXを用いる古典ソフトウェアと量子ハードウェアの両方で結果を得ることができる。 Protein folding, which dictates the protein structure from its amino acid sequence, is a half-century-old problem in biology. The function of the protein correlates with its structure, emphasizing the need for understanding protein folding for studying the cellular and molecular mechanisms that occur within biological systems. Understanding protein structures and enzymes plays a critical role in target based drug designing, elucidating protein-related disease mechanisms, and innovating novel enzymes. While recent advancements in AI based protein structure prediction methods have solved the protein folding problem to an extent, their precision in determining the structure of the protein with low sequence similarity is limited. Classical methods face challenges in generating extensive conformational samplings, making quantum-based approaches advantageous for solving protein folding problems. In this work we developed a novel turn based encoding algorithm that can be run on a gate based quantum computer for predicting the structure of smaller protein sequences using the HP model as an initial framework, which can be extrapolated in its application to larger and more intricate protein systems in future. The HP model best represents a major step in protein folding phenomena: the hydrophobic collapse, which brings the hydrophobic amino acids to the interior of a protein. The folding problem is cast in a 3D cubic lattice with degrees of freedom along edges parallel to the orthogonal axes, as well as along diagonals parallel to the axial planes. While the original formulation with higher order terms can be run on gate based quantum hardware, the QUBO formulation can give results on both classical software employing annealers and IBM CPLEX as well as quantum hardware.	翻訳日:2023-11-27 16:40:25 公開日:2023-11-23
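Any encoding of the HP model, QUBO or turn-based, in the end minimizes the same objective: each pair of H residues that touch on the lattice without being chain neighbours contributes -1. The sketch below scores a given conformation under that textbook HP energy. It allows only unit moves along the cubic axes, whereas the paper's lattice also permits diagonal moves, and it does not show the quantum encoding. `hp_energy` and the example folds are hypothetical.

```python
def hp_energy(sequence, coords):
    """HP-model energy: -1 for every pair of H residues that are lattice
    neighbours but not consecutive in the chain (cubic-lattice moves only)."""
    if len(sequence) != len(coords):
        raise ValueError("one coordinate per residue is required")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for a, b in zip(coords, coords[1:]):
        if sum(abs(p - q) for p, q in zip(a, b)) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")
    energy = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):      # skip chain neighbours
            touching = sum(abs(p - q) for p, q in zip(coords[i], coords[j])) == 1
            if sequence[i] == "H" and sequence[j] == "H" and touching:
                energy -= 1
    return energy


square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]   # folded: ends touch
line = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]     # fully extended
print(hp_energy("HPPH", square), hp_energy("HPPH", line))  # -1 0
```

The folded conformation scores lower because its two H residues touch. This is the hydrophobic collapse that the HP model rewards.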
# 医療保険の説明可能な費用予測のための機械学習 Machine Learning For An Explainable Cost Prediction of Medical Insurance ( http://arxiv.org/abs/2311.14139v1 ) ライセンス: Link先を確認	Ugochukwu Orji and Elochukwu Ukwandu	(参考訳) ヘルスケアにおける予測モデリングは、生産性と効率を高めるために機械学習アプローチの可能性を最大限に引き出そうとする保険会社が増えるにつれ、活発なアクチュアリー研究テーマであり続けている。本稿では、医療保険コストを予測するために、Extreme Gradient Boosting、Gradient-boosting Machine、Random Forestの各手法によって決定木のバリエーションを組み合わせた3つの回帰ベースのアンサンブルMLモデルを導入した。説明可能な人工知能手法であるSHapley Additive exPlanationsとIndividual Conditional Expectationプロットを用いて、データセットにおける医療保険料に影響を与える主要な決定要因を発見し説明した。使用したデータセットは986件のレコードで構成され、KAGGLEリポジトリで公開されている。モデルは、決定係数(R-squared)、平均絶対誤差、二乗平均平方根誤差、平均絶対パーセント誤差の4つの性能評価指標を用いて評価した。結果として、すべてのモデルが優れた結果を示した。XGBoostモデルは計算資源をより多く消費したものの全体的により良い性能を達成し、一方RFモデルは予測誤差がより小さく、XGBoostモデルよりもはるかに少ない計算資源で済んだ。さらに,各モデルのPremiumPricesに影響を与えた主要な決定的特徴の特定について両XAI手法の結果を比較した。両者は類似した結果を得たが,ICEプロットは、より高レベルに見えるSHAP分析よりも各変数間の相互作用をより詳細に示すことがわかった。本研究の貢献が、政策立案者、保険会社、医療保険の潜在的購入者が、特定のニーズを満たす適切な保険を選ぶ意思決定の一助となることを著者らは目指している。 Predictive modeling in healthcare continues to be an active actuarial research topic as more insurance companies aim to maximize the potential of Machine Learning approaches to increase their productivity and efficiency. In this paper, the authors deployed three regression-based ensemble ML models that combine variations of decision trees through Extreme Gradient Boosting, Gradient-boosting Machine, and Random Forest methods in predicting medical insurance costs. Explainable Artificial Intelligence methods SHapley Additive exPlanations and Individual Conditional Expectation plots were deployed to discover and explain the key determinant factors that influence medical insurance premium prices in the dataset. The dataset used comprised 986 records and is publicly available in the KAGGLE repository. The models were evaluated using four performance evaluation metrics, including R-squared, Mean Absolute Error, Root Mean Squared Error, and Mean Absolute Percentage Error. The results show that all models produced impressive outcomes; however, the XGBoost model achieved a better overall performance although it also expended more computational resources, while the RF model recorded a lesser prediction error and consumed far fewer computing resources than the XGBoost model. Furthermore, we compared the outcome of both XAI methods in identifying the key determinant features that influenced the PremiumPrices for each model and whereas both XAI methods produced similar outcomes, we found that the ICE plots showed in more detail the interactions between each variable than the SHAP analysis which seemed to be more high-level. It is the aim of the authors that the contributions of this study will help policymakers, insurers, and potential medical insurance buyers in their decision-making process for selecting the right policies that meet their specific needs.	翻訳日:2023-11-27 16:39:54 公開日:2023-11-23
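The four evaluation metrics listed in this abstract have standard definitions, sketched below in plain Python. The sample values are hypothetical and are not taken from the KAGGLE dataset. `regression_metrics` is an illustrative helper; MAPE as written assumes that no true value is zero.

```python
import math


def regression_metrics(y_true, y_pred):
    """R^2, MAE, RMSE and MAPE (in percent) for paired true/predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        "MAPE": 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
    }


# Hypothetical premium prices, not taken from the paper's dataset.
m = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
print({k: round(v, 3) for k, v in m.items()})
```

RMSE squares each error, so it penalizes a few large misses more heavily than MAE does. This is one reason papers often report both metrics.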
# プライバシー保護アルゴリズム的リコース Privacy-Preserving Algorithmic Recourse ( http://arxiv.org/abs/2311.14137v1 ) ライセンス: Link先を確認	Sikha Pentyala, Shubham Sharma, Sanjay Kariyappa, Freddy Lecue, Daniele Magazzeni	(参考訳) 個人が機械学習モデルから不利な結果を受けた場合、肯定的な結果を達成するのに役立つリコースパスを提供することが望ましい。最近の研究により、反実仮想的説明(一段階のリコースの手段として使用できる)がプライバシー上の問題に脆弱であり、個人のプライバシーを危険にさらすことが示されている。リコースのための逐次的な多段階パスを提供すると、このリスクが増幅されうる。さらに、既存の手法で得られたパスに単にノイズを加えるだけでは、エンドユーザーにとってのパスの現実性と実行可能性が損なわれうる。本研究では、インスタンスベースの反実仮想的説明に基づいて現実的なリコースパスを生成する際のプライバシー問題に対処し、現実的なリコースパスを提供できるエンドツーエンドのプライバシー保護パイプラインPrivRecourseを提供する。PrivRecourseは差分プライバシー(DP)を満たすクラスタリングを用いて、プライベートデータセットの重複しない部分集合を表現する。次に、これらのDPクラスタ中心をノードとするグラフを構成してリコースパスを生成し、現実的な(すなわち実現可能かつ実行可能な)リコースパスを生成できるようにする。金融データセットで本手法を実証的に評価し、グラフの生成にデータインスタンスへ単にノイズを加える方法およびDP合成データを用いる方法と比較した。その結果、PrivRecourseはプライベートかつ現実的なパスを提供できることを確認した。 When individuals are subject to adverse outcomes from machine learning models, providing a recourse path to help achieve a positive outcome is desirable. Recent work has shown that counterfactual explanations (which can be used as a means of single-step recourse) are vulnerable to privacy issues, putting an individual's privacy at risk. Providing a sequential multi-step path for recourse can amplify this risk. Furthermore, simply adding noise to recourse paths found from existing methods can impact the realism and actionability of the path for an end-user. In this work, we address privacy issues when generating realistic recourse paths based on instance-based counterfactual explanations, and provide PrivRecourse: an end-to-end privacy preserving pipeline that can provide realistic recourse paths. PrivRecourse uses differentially private (DP) clustering to represent non-overlapping subsets of the private dataset. These DP cluster centers are then used to generate recourse paths by forming a graph with cluster centers as the nodes, so that we can generate realistic (feasible and actionable) recourse paths. We empirically evaluate our approach on finance datasets and compare it to simply adding noise to data instances, and to using DP synthetic data, to generate the graph. We observe that PrivRecourse can provide paths that are private and realistic.	翻訳日:2023-11-27 16:39:24 公開日:2023-11-23
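The graph step described here can be sketched as a shortest-path search over cluster centres. The sketch assumes that the centres have already been released by a differentially private clustering step, which it does not implement. It also assumes that two centres are connected when their distance is at most `eps`, and that the search stops at the first centre the model classifies positively. `recourse_path`, `eps` and the toy data are hypothetical; the search is a plain Dijkstra using Python's `heapq`.

```python
import heapq
import math


def recourse_path(centers, positive, start_point, eps):
    """Dijkstra over a graph whose nodes are (assumed DP) cluster centres.

    Two centres are joined when their Euclidean distance is at most eps, so each
    step stays in a populated region of feature space. The search starts at the
    centre nearest to the individual and stops at the first centre the model
    labels positive. Returns a list of centre indices, or None if unreachable.
    """
    n = len(centers)
    start = min(range(n), key=lambda i: math.dist(centers[i], start_point))
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        if positive[u]:
            path = [u]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return path[::-1]
        for v in range(n):
            w = math.dist(centers[u], centers[v])
            if v != u and w <= eps and d + w < best.get(v, math.inf):
                best[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    return None


# Hypothetical (already privatized) cluster centres and model decisions.
centers = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0)]
positive = [False, False, False, True]
print(recourse_path(centers, positive, (0.1, 0.0), eps=1.1))  # [0, 1, 2, 3]
```

The path visits only cluster centres and never individual records. This is what lets the privacy guarantee of the clustering step carry over to the released path.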
# IoT上での協調機械学習のためのブロックチェーンソリューション A Blockchain Solution for Collaborative Machine Learning over IoT ( http://arxiv.org/abs/2311.14136v1 ) ライセンス: Link先を確認	Carlos Beis-Penedo and Francisco Troncoso-Pastoriza and Rebeca P. Díaz-Redondo and Ana Fernández-Vilas and Manuel Fernández-Veiga and Martín González Soto	(参考訳) IoT(Internet of Things、モノのインターネット)デバイスとアプリケーションの急速な成長により、データのプライバシー、セキュリティ、スケーラビリティに関わる課題に対処できる高度な分析と機械学習技術への需要が高まっている。連合学習(FL)とブロックチェーン技術は、分散したデータソース上で分散型・セキュア・プライバシー保護型のモデル学習を可能にすることで、これらの課題に対処する有望なアプローチとして登場した。本稿では,インクリメンタル学習ベクトル量子化アルゴリズム(XuILVQ)とEthereumブロックチェーン技術を組み合わせ,分散環境におけるセキュアで効率的なデータ共有、モデル学習、プロトタイプの保存を実現する新たなIoTソリューションを提案する。提案アーキテクチャは,データのプライバシーとセキュリティを維持しつつ計算オーバーヘッドと通信オーバーヘッドを削減することで,既存のブロックチェーンベースのFLソリューションの欠点に対処する。一連の実験を通じてシステムの性能を評価し,IoT環境における機械学習タスクの精度と効率を向上させる可能性を示す。 The rapid growth of Internet of Things (IoT) devices and applications has led to an increased demand for advanced analytics and machine learning techniques capable of handling the challenges associated with data privacy, security, and scalability. Federated learning (FL) and blockchain technologies have emerged as promising approaches to address these challenges by enabling decentralized, secure, and privacy-preserving model training on distributed data sources. In this paper, we present a novel IoT solution that combines the incremental learning vector quantization algorithm (XuILVQ) with Ethereum blockchain technology to facilitate secure and efficient data sharing, model training, and prototype storage in a distributed environment. Our proposed architecture addresses the shortcomings of existing blockchain-based FL solutions by reducing computational and communication overheads while maintaining data privacy and security. We assess the performance of our system through a series of experiments, showcasing its potential to enhance the accuracy and efficiency of machine learning tasks in IoT settings.	翻訳日:2023-11-27 16:39:04 公開日:2023-11-23
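XuILVQ is an incremental variant of learning vector quantization (LVQ), and its prototype insertion and removal rules are not described in the abstract. To illustrate prototype-based learning in general, the sketch below instead uses the classic LVQ1 online update. The nearest prototype moves toward a sample when their labels agree and away from it when they differ. `lvq1_update`, `predict` and the data are hypothetical. In the described architecture, the compact prototypes rather than raw data are what get stored in the distributed environment.

```python
import numpy as np


def lvq1_update(prototypes, labels, x, y, lr=0.1):
    """One online LVQ1 step: pull the nearest prototype toward x if its label
    matches y, otherwise push it away. Updates `prototypes` in place."""
    k = int(np.argmin(np.linalg.norm(prototypes - x, axis=1)))
    sign = 1.0 if labels[k] == y else -1.0
    prototypes[k] += sign * lr * (x - prototypes[k])
    return k


def predict(prototypes, labels, x):
    """Label of the nearest prototype."""
    return labels[int(np.argmin(np.linalg.norm(prototypes - x, axis=1)))]


protos = np.array([[0.0, 0.0], [1.0, 1.0]])
labels = [0, 1]
lvq1_update(protos, labels, np.array([0.2, 0.0]), 0)   # correct label: pulled closer
lvq1_update(protos, labels, np.array([0.9, 1.0]), 0)   # wrong label: pushed away
print(protos)
```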
# 準隠れ分子自由度を用いた量子計算のためのロバストな枠組み A robust framework for quantum computation using quasi-hidden molecular degrees of freedom ( http://arxiv.org/abs/2311.14133v1 ) ライセンス: Link先を確認	Martin Zeppenfeld	(参考訳) 本稿では,環境からも分子の他の部分からも隔離された分子の自由度に基づく、分子を用いた量子情報処理の新しいアプローチについて論じる。このような自由度は、ノイズの多い環境でも長期の量子記憶を提供する。また、分子の残りの部分と外部システムとの間で量子操作が行われている間も、独立した保護された量子メモリとして機能する。分子内に準隠れ自由度を実現するいくつかの可能性を示し、そのような自由度を実際に利用するいくつかの例について論じる。準隠れ自由度を用いることで、分子ベースの量子コンピュータの見通しを大幅に改善できる可能性がある。 We discuss a novel approach to quantum information processing with molecules based on molecular degrees of freedom which are isolated from the environment as well as from the rest of the molecule. Such a degree of freedom can provide long-term quantum storage even in a noisy environment, and provides an independent protected quantum memory while quantum operations are performed between the rest of the molecule and external systems. We present several possibilities for realizing a quasi-hidden degree of freedom in a molecule, and discuss a number of examples for using such a degree of freedom in practice. Using quasi-hidden degrees of freedom could substantially improve the prospects for a molecule-based quantum computer.	翻訳日:2023-11-27 16:38:47 公開日:2023-11-23
# 力学系のための厳密に保存的な物理情報付きニューラルネットワークと深層作用素ネットワーク Exactly conservative physics-informed neural networks and deep operator networks for dynamical systems ( http://arxiv.org/abs/2311.14131v1 ) ライセンス: Link先を確認	Elsa Cardoso-Bihlo and Alex Bihlo	(参考訳) 本稿では,力学系に対する厳密に保存的な物理情報付きニューラルネットワークおよび物理情報付き深層作用素ネットワークの学習手法を提案する。この手法は射影ベースの手法を用いる。すなわち、少なくとも1つの第一積分をもつ任意の力学系について、ニューラルネットワークソルバが学習した候補解を不変多様体上へ写す。数理科学に由来するいくつかの実世界の問題において、力学系のための厳密に保存的な物理情報付きニューラルネットワークソルバおよび物理情報付き深層作用素ネットワークが、非保存的な対応手法を大きく上回ることを示す。 We introduce a method for training exactly conservative physics-informed neural networks and physics-informed deep operator networks for dynamical systems. The method employs a projection-based technique that maps a candidate solution learned by the neural network solver for any given dynamical system possessing at least one first integral onto an invariant manifold. We illustrate that exactly conservative physics-informed neural network solvers and physics-informed deep operator networks for dynamical systems vastly outperform their non-conservative counterparts for several real-world problems from the mathematical sciences.	翻訳日:2023-11-27 16:38:37 公開日:2023-11-23
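One simple way to map a candidate state onto the invariant manifold I(x) = I0 of a single first integral is to take Newton steps along the gradient of I. Each step is the minimum-norm correction that fixes the linearized constraint. The sketch below implements this generic projection under our own assumptions, and the paper's exact projection may differ. The harmonic-oscillator energy and the "network output" `candidate` are illustrative.

```python
import numpy as np


def project_to_invariant(x, first_integral, grad, target, iters=20, tol=1e-12):
    """Move a candidate state onto the level set first_integral(x) = target using
    Newton steps along the gradient (minimum-norm correction for one constraint)."""
    x = np.array(x, dtype=float)
    for _ in range(iters):
        residual = first_integral(x) - target
        if abs(residual) < tol:
            break
        g = grad(x)
        x = x - residual * g / np.dot(g, g)
    return x


def energy(z):
    """Harmonic oscillator first integral H(q, p) = (q^2 + p^2) / 2."""
    return 0.5 * float(np.dot(z, z))


def energy_grad(z):
    return np.asarray(z, dtype=float)


candidate = np.array([1.1, 0.3])          # e.g. a network output that drifted
projected = project_to_invariant(candidate, energy, energy_grad, target=0.5)
print(energy(candidate), energy(projected))
```

For this quadratic invariant, every step only rescales the state along its own ray, so the phase of the oscillator is kept while its energy is restored.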
# ビザンチンのロバスト性と部分的参加は同時に達成できる: 勾配差をクリップするだけでよい Byzantine Robustness and Partial Participation Can Be Achieved Simultaneously: Just Clip Gradient Differences ( http://arxiv.org/abs/2311.14127v1 ) ライセンス: Link先を確認	Grigory Malinovsky, Peter Richtárik, Samuel Horváth, Eduard Gorbunov	(参考訳) 分散学習は、大規模機械学習モデルを訓練するための主要なパラダイムとして登場した。しかし、現実のシナリオでは参加者が信頼できない、あるいは悪意をもつ場合があり、訓練されたモデルの完全性と正確性に重大な課題をもたらす。これらの問題に対処するためにビザンチン耐障害性メカニズムが提案されているが、それらは多くの場合、全クライアントの完全な参加を前提としている。これは、一部のクライアントが利用できないことや通信上の制約のため、必ずしも現実的ではない。本研究では、クライアントサンプリングを行い、かつビザンチンワーカーへの耐性を証明可能な初の分散手法を提案する。提案手法の鍵となるアイデアは、再帰的分散低減において確率的勾配の差を制御するために勾配クリッピングを用いることである。これにより、サンプリングされたクライアントがすべてビザンチンである反復においても、ビザンチンワーカーが引き起こしうる害を抑えることができる。さらに、通信効率を高めるために通信圧縮を手法に組み込む。かなり一般的な仮定のもとで、既存の最先端(SOTA)の理論結果と一致する提案手法の収束率を証明する。 Distributed learning has emerged as a leading paradigm for training large machine learning models. However, in real-world scenarios, participants may be unreliable or malicious, posing a significant challenge to the integrity and accuracy of the trained models. Byzantine fault tolerance mechanisms have been proposed to address these issues, but they often assume full participation from all clients, which is not always practical due to the unavailability of some clients or communication constraints. In our work, we propose the first distributed method with client sampling and provable tolerance to Byzantine workers. The key idea behind the developed method is the use of gradient clipping to control stochastic gradient differences in recursive variance reduction. This allows us to bound the potential harm caused by Byzantine workers, even during iterations when all sampled clients are Byzantine. Furthermore, we incorporate communication compression into the method to enhance communication efficiency. Under quite general assumptions, we prove convergence rates for the proposed method that match the existing state-of-the-art (SOTA) theoretical results.	翻訳日:2023-11-27 16:38:28 公開日:2023-11-23
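The key idea in this abstract, clipping gradient differences inside a recursive variance-reduced estimator, can be illustrated in a few lines. In this sketch, a plain average stands in for the robust aggregator and compression that the actual method uses, and `clip`, `update_estimator` and the numbers are hypothetical. The point is the bound: every clipped difference has norm at most `lam`, so one round can shift the estimator by at most `lam`, even if every sampled client is Byzantine.

```python
import numpy as np


def clip(v, lam):
    """Scale v down so that its Euclidean norm is at most lam."""
    norm = np.linalg.norm(v)
    return v if norm <= lam else v * (lam / norm)


def update_estimator(g_prev, grads_new, grads_old, lam):
    """Recursive gradient estimate built from clipped per-client gradient differences:

        g_new = g_prev + average_i clip(grad_i(x_new) - grad_i(x_old), lam)

    A plain average stands in for the robust aggregator (an assumption).
    """
    diffs = [clip(gn - go, lam) for gn, go in zip(grads_new, grads_old)]
    return g_prev + np.mean(diffs, axis=0)


g_prev = np.zeros(2)
honest_new, honest_old = np.array([1.0, 0.0]), np.array([0.9, 0.0])
byz_new, byz_old = np.array([1e6, -1e6]), np.zeros(2)   # Byzantine client's report
g = update_estimator(g_prev, [honest_new, byz_new], [honest_old, byz_old], lam=1.0)
print(np.linalg.norm(g - g_prev))  # at most lam, however large the attack
```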
# 大規模言語モデルの監査に向けて: テキストに基づくステレオタイプ検出の改善 Towards Auditing Large Language Models: Improving Text-based Stereotype Detection ( http://arxiv.org/abs/2311.14126v1 ) ライセンス: Link先を確認	Wu Zekun, Sahan Bulathwela, Adriano Soares Koshiyama	(参考訳) 大規模言語モデル(LLM)は近年大きな進歩を遂げ、人工知能(AI)を活用した人間向けアプリケーションでより主流になっている。しかし、LLMは歴史的データから受け継いだステレオタイプ的な出力を生成することが多く、社会的バイアスを増幅し、倫理的懸念を引き起こしている。本研究では、(i) 性別、人種、職業、宗教に関するステレオタイプテキスト52,751件を含むMulti-Grain Stereotype Dataset、および (ii) 英語テキストのための新しいステレオタイプ分類器を紹介する。この新しいデータセットで訓練した提案モデルを厳密に検証するため、いくつかの実験を設計した。実験の結果、マルチクラス設定でモデルを訓練すると、1対全(one-vs-all)の二値設定よりも優れた性能が得られることが示された。異なるeXplainable AIツールから得られる一貫した特徴重要度の信号は、新しいモデルが関連するテキスト特徴を利用していることを示している。新たに作成したモデルを用いて一般的なGPT系モデルのステレオタイプ的な振る舞いを評価し、時間の経過とともにバイアスが減少していることを観察した。まとめると,本研究はLLMのステレオタイプバイアスを監査・評価するための堅牢で実用的な枠組みを確立する。 Large Language Models (LLM) have made significant advances in the recent past becoming more mainstream in Artificial Intelligence (AI) enabled human-facing applications. However, LLMs often generate stereotypical output inherited from historical data, amplifying societal biases and raising ethical concerns. This work introduces i) the Multi-Grain Stereotype Dataset, which includes 52,751 instances of gender, race, profession and religion stereotypic text and ii) a novel stereotype classifier for English text. We design several experiments to rigorously test the proposed model trained on the novel dataset. Our experiments show that training the model in a multi-class setting can outperform the one-vs-all binary counterpart. Consistent feature importance signals from different eXplainable AI tools demonstrate that the new model exploits relevant text features. We utilise the newly created model to assess the stereotypic behaviour of the popular GPT family of models and observe the reduction of bias over time. In summary, our work establishes a robust and practical framework for auditing and evaluating the stereotypic bias in LLM.	翻訳日:2023-11-27 16:38:10 公開日:2023-11-23
# 二重効率な議論によるスケーラブルなAI安全性 Scalable AI Safety via Doubly-Efficient Debate ( http://arxiv.org/abs/2311.14125v1 ) ライセンス: Link先を確認	Jonah Brown-Cohen, Geoffrey Irving, Georgios Piliouras	(参考訳) 多様で拡大し続ける複雑な領域にわたって強力な能力をもつ事前学習済みAIシステムが出現した。タスクが人間には直接判断できないほど複雑になりうるため、これはAIの安全性にとって重大な課題となっている。Irving et al. [2018] はこの方向性の議論(ディベート)手法を提案した。その目的は、(ミス)アラインメントを特定する問題が扱いやすいサブタスクに分解されるまで、そうしたAIモデルの能力を互いに競わせることである。このアプローチの有望性は明らかだが、元の枠組みは、誠実な戦略が決定論的AIシステムを指数的なステップ数にわたってシミュレートできるという仮定に基づいており、適用性が制限されていた。本稿では、新たな議論プロトコル群を設計することで、これらの課題に対処する方法を示す。このプロトコルでは、誠実な戦略は多項式ステップ数のシミュレーションを用いて常に成功でき、しかも不誠実な戦略が指数的に多くのシミュレーションステップを使うことを許されている場合でも、確率的AIシステムのアラインメントを検証できる。 The emergence of pre-trained AI systems with powerful capabilities across a diverse and ever-increasing set of complex domains has raised a critical challenge for AI safety as tasks can become too complicated for humans to judge directly. Irving et al. [2018] proposed a debate method in this direction with the goal of pitting the power of such AI models against each other until the problem of identifying (mis)-alignment is broken down into a manageable subtask. While the promise of this approach is clear, the original framework was based on the assumption that the honest strategy is able to simulate deterministic AI systems for an exponential number of steps, limiting its applicability. In this paper, we show how to address these challenges by designing a new set of debate protocols where the honest strategy can always succeed using a simulation of a polynomial number of steps, whilst being able to verify the alignment of stochastic AI systems, even when the dishonest strategy is allowed to use exponentially many simulation steps.	翻訳日:2023-11-27 16:37:54 公開日:2023-11-23
# Exponential Quantum Space Advantage for Approximating Maximum Directed Cut in the Streaming Model ( http://arxiv.org/abs/2311.14123v1 ) License: see link	John Kallaugher and Ojas Parekh and Nadezhda Voronova	While the search for quantum advantage typically focuses on speedups in execution time, quantum algorithms also offer the potential for advantage in space complexity. Previous work has shown such advantages for data stream problems, in which elements arrive and must be processed sequentially without random access, but these have been restricted to specially-constructed problems [Le Gall, SPAA '06] or polynomial advantage [Kallaugher, FOCS '21]. We show an exponential quantum space advantage for the maximum directed cut problem. This is the first known exponential quantum space advantage for any natural streaming problem. This also constitutes the first unconditional exponential quantum resource advantage for approximating a discrete optimization problem in any setting. 
Our quantum streaming algorithm $0.4844$-approximates the value of the largest directed cut in a graph stream with $n$ vertices using polylog$(n)$ space, while previous work by Chou, Golovnev, and Velusamy [FOCS '20] implies that obtaining an approximation ratio better than $4/9 \approx 0.4444$ requires $\Omega(\sqrt{n})$ space for any classical streaming algorithm. Our result is based on a recent $\widetilde{\text{O}}(\sqrt{n})$ space classical streaming approach by Saxena, Singer, Sudan, and Velusamy [FOCS '23], with an additional improvement in the approximation ratio due to recent work by Singer [APPROX '23].	Translated: 2023-11-27 16:37:36 Published: 2023-11-23
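The sketch below makes the quantity concrete. A directed cut of a vertex set S counts the edges that leave S. An α-approximation of the maximum is taken here to mean any value between α·OPT and OPT, which is an assumed convention. The code computes OPT by brute force, so it is exponential in n and only usable on toy graphs. It is not the quantum or classical streaming algorithm, which sees each edge once and uses only polylog(n) or √n space.

```python
from itertools import product


def dicut_value(edges, side):
    """Count directed edges (u, v) with u in S (side 1) and v outside S (side 0)."""
    return sum(1 for u, v in edges if side[u] == 1 and side[v] == 0)


def max_dicut(edges, n):
    """Exact maximum directed cut by brute force over all 2^n labelings (toy graphs only)."""
    return max(dicut_value(edges, side) for side in product((0, 1), repeat=n))


def is_alpha_approx(estimate, edges, n, alpha):
    """Assumed convention: an alpha-approximate value lies in [alpha * OPT, OPT]."""
    opt = max_dicut(edges, n)
    return alpha * opt <= estimate <= opt
```

On a directed 3-cycle no two edges can leave the same set, so OPT is 1 and any estimate in [0.4844, 1] passes the 0.4844-approximation check.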
# Enhancing mTBI Diagnosis with Residual Triplet Convolutional Neural Network Using 3D CT ( http://arxiv.org/abs/2311.14197v1 ) License: see link	Hanem Ellethy, Shekhar S. Chandra and Viktor Vegh	Mild Traumatic Brain Injury (mTBI) is a common condition that is challenging to diagnose accurately. Timely and precise diagnosis is essential for effective treatment and improved patient outcomes. Traditional diagnostic methods for mTBI often have limited accuracy and sensitivity. In this study, we introduce an innovative approach to enhance mTBI diagnosis using 3D Computed Tomography (CT) images and a metric learning technique trained with triplet loss. To address these challenges, we propose a Residual Triplet Convolutional Neural Network (RTCNN) model that distinguishes mTBI cases from healthy ones by embedding 3D CT scans into a feature space. The triplet loss function maximizes the margin between similar and dissimilar image pairs, optimizing feature representations. This facilitates better context placement of individual cases, aids informed decision-making, and has the potential to improve patient outcomes. 
Our RTCNN model shows promising performance in mTBI diagnosis, achieving an average accuracy of 94.3%, a sensitivity of 94.1%, and a specificity of 95.2%, as confirmed through five-fold cross-validation. Importantly, when compared to the conventional Residual Convolutional Neural Network (RCNN) model, the RTCNN exhibits a significant improvement: a 22.5% increase in specificity, a 16.2% boost in accuracy, and an 11.3% enhancement in sensitivity. Moreover, RTCNN requires fewer memory resources, making it both highly effective and resource-efficient, minimizing false positives while maximizing its diagnostic accuracy in distinguishing normal CT scans from mTBI cases. The quantitative performance metrics provided, together with occlusion sensitivity maps that visually explain the model's decision-making process, further enhance the interpretability and transparency of our approach.	Translated: 2023-11-27 16:29:45 Published: 2023-11-23
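The sketch below shows the standard triplet margin loss, max(0, d(a, p) − d(a, n) + margin), and how sensitivity, specificity and accuracy follow from confusion counts. The abstract does not state the distance or the margin, so Euclidean distance and `margin=1.0` are assumptions. The 2D vectors stand in for the CNN embeddings of 3D CT scans.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def rates(tp, fn, tn, fp):
    """Sensitivity TP/(TP+FN), specificity TN/(TN+FP), and accuracy from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy
```

A well-separated triplet contributes zero loss. A "hard" negative that sits only slightly farther away than the positive still yields a gradient, which is what pushes mTBI and healthy embeddings apart.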
# Touch Analysis: An Empirical Evaluation of Machine Learning Classification Algorithms on Touch Data ( http://arxiv.org/abs/2311.14195v1 ) License: see link	Melodee Montgomery, Prosenjit Chatterjee, John Jenkins, and Kaushik Roy	Our research aims at classifying individuals based on their unique interactions on touchscreen-based smartphones. In this research, we use the Touch-Analytics datasets, which include 41 subjects and 30 different behavioral features. Furthermore, we derived new features from the raw data to improve the overall authentication performance. Previous research on the Touch-Analytics datasets with state-of-the-art classifiers, including Support Vector Machine (SVM) and k-nearest neighbor (kNN), achieved equal error rates (EERs) between 0% and 4%. Here, we propose a novel Deep Neural Net (DNN) architecture to classify the individuals correctly. The proposed DNN architecture has three dense layers and uses many-to-many mapping techniques. When we combine the new features with the existing ones, SVM and kNN achieve classification accuracies of 94.7% and 94.6%, respectively. This research explored seven other classifiers; of these, the decision tree and our proposed DNN classifier achieved the highest accuracy, 100%. 
The others, Logistic Regression (LR), Linear Discriminant Analysis (LDA), Gaussian Naive Bayes (NB), Neural Network, and VGGNet, achieved accuracy scores of 94.7%, 95.9%, 31.9%, 88.8%, and 96.1%, respectively.	Translated: 2023-11-27 16:29:12 Published: 2023-11-23
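The earlier Touch-Analytics results are reported as equal error rates (EERs). The EER is the operating point where the false acceptance rate (impostors accepted) equals the false rejection rate (genuine users rejected). The sketch below assumes higher scores mean "more likely genuine" and only tries thresholds at observed score values. Practical EER computations often interpolate the ROC curve instead, so treat this as a discrete approximation.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER by scanning thresholds at observed scores (accept if score >= t)."""
    best_gap, best_eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(g < t for g in genuine) / len(genuine)      # genuine users rejected
        far = sum(i >= t for i in impostor) / len(impostor)   # impostors accepted
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer
```

Perfectly separated score sets give an EER of 0%, which is the low end of the 0% to 4% range quoted above. Overlapping scores push the crossing point up.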
# HACD: Hand-Aware Conditional Diffusion for Monocular Hand-Held Object Reconstruction ( http://arxiv.org/abs/2311.14189v1 ) License: see link	Bowen Fu, Yan Di, Chenyangguang Zhang, Gu Wang, Ziqin Huang, Zhiying Leng, Fabian Manhardt, Xiangyang Ji and Federico Tombari	Reconstructing hand-held objects from a single RGB image without known 3D object templates, category priors, or depth information is a vital yet challenging problem in computer vision. In contrast to prior works that utilize deterministic modeling paradigms, which make it hard to account for the uncertainties introduced by hand- and self-occlusion, we employ a probabilistic point cloud denoising diffusion model to tackle this challenge. In this work, we present Hand-Aware Conditional Diffusion for monocular hand-held object reconstruction (HACD), modeling the hand-object interaction in two aspects. First, we introduce hand-aware conditioning to model hand-object interaction from both semantic and geometric perspectives. Specifically, a unified hand-object semantic embedding compensates for the 2D local feature deficiency induced by hand occlusion, and a hand articulation embedding further encodes the relationship between object vertices and hand joints. 
Second, we propose a hand-constrained centroid fixing scheme, which uses hand-vertex priors to restrict the centroid deviation of the partially denoised point cloud during the diffusion and reverse processes. Removing the centroid bias interference allows the diffusion model to focus on reconstructing shape, thus enhancing the stability and precision of local feature projection. Experiments on the synthetic ObMan dataset and two real-world datasets, HO3D and MOW, demonstrate that our approach surpasses all existing methods by a large margin.	Translated: 2023-11-27 16:28:46 Published: 2023-11-23
# Gradient-based bilevel optimization for multi-penalty Ridge regression through matrix differential calculus ( http://arxiv.org/abs/2311.14182v1 ) License: see link	Gabriele Maroni, Loris Cannelli, Dario Piga	Common regularization algorithms for linear regression, such as LASSO and Ridge regression, rely on a regularization hyperparameter that balances the tradeoff between minimizing the fitting error and the norm of the learned model coefficients. As this hyperparameter is scalar, it can be easily selected via random or grid search optimizing a cross-validation criterion. However, using a scalar hyperparameter limits the algorithm's flexibility and potential for better generalization. In this paper, we address the problem of linear regression with l2-regularization, where a different regularization hyperparameter is associated with each input variable. We optimize these hyperparameters using a gradient-based approach, wherein the gradient of a cross-validation criterion with respect to the regularization hyperparameters is computed analytically through matrix differential calculus. Additionally, we introduce two strategies tailored for sparse model learning problems, aimed at reducing the risk of overfitting to the validation data. Numerical examples demonstrate that our multi-hyperparameter regularization approach outperforms LASSO, Ridge, and Elastic Net regression. 
Moreover, the analytical computation of the gradient proves to be more efficient in terms of computational time compared to automatic differentiation, especially when handling a large number of input variables. An application to the identification of over-parameterized Linear Parameter-Varying models is also presented.	Translated: 2023-11-27 16:28:17 Published: 2023-11-23
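The sketch below illustrates the analytic-gradient idea with one penalty per input variable, under simplifying assumptions. The paper's cross-validation criterion is replaced by the squared error on a single held-out split. Write A = XᵀX + diag(λ) and w = A⁻¹Xᵀy. Differentiating A w = Xᵀy gives ∂w/∂λⱼ = −A⁻¹eⱼwⱼ. With residual r = X_v w − y_v this yields ∂L/∂λⱼ = −2wⱼ(A⁻¹X_vᵀr)ⱼ, using that A is symmetric. The data and λ values are made up. The test checks the formula against central finite differences.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gram(X):
    """X^T X for a row-major design matrix."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]


def xt_times(X, v):
    """X^T v."""
    return [sum(row[i] * vi for row, vi in zip(X, v)) for i in range(len(X[0]))]


def matvec(X, w):
    """X w."""
    return [sum(a * b for a, b in zip(row, w)) for row in X]


def ridge_fit(X, y, lams):
    """Multi-penalty ridge: w = (X^T X + diag(lams))^{-1} X^T y."""
    A = gram(X)
    for j, lam in enumerate(lams):
        A[j][j] += lam
    return A, solve(A, xt_times(X, y))


def val_loss_and_grad(X, y, Xv, yv, lams):
    """Hold-out squared error and its analytic gradient w.r.t. each penalty."""
    A, w = ridge_fit(X, y, lams)
    r = [p - t for p, t in zip(matvec(Xv, w), yv)]
    loss = sum(ri * ri for ri in r)
    z = solve(A, xt_times(Xv, r))  # A^{-1} Xv^T r
    grad = [-2.0 * wj * zj for wj, zj in zip(w, z)]
    return loss, grad
```

In an outer loop, one would typically take gradient steps on log λ so that the penalties stay positive. This is a design choice of the sketch, not necessarily the paper's.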
# TCuPGAN: A novel framework developed for optimizing human-machine interactions in citizen science ( http://arxiv.org/abs/2311.14177v1 ) License: see link	Ramanakumar Sankar and Kameswara Mantha and Lucy Fortson and Helen Spiers and Thomas Pengo and Douglas Mashek and Myat Mo and Mark Sanders and Trace Christensen and Jeffrey Salisbury and Laura Trouille	In the era of big data in scientific research, there is a need for techniques that reduce the human effort of labeling and categorizing large datasets by involving sophisticated machine tools. To address this problem, we present a novel, general-purpose model for 3D segmentation that leverages patch-wise adversariality and Long Short-Term Memory to encode sequential information. Using this model alongside citizen science projects that use 3D datasets (image cubes) on the Zooniverse platform, we propose an iterative human-machine optimization framework where only a fraction of the 2D slices from these cubes are seen by volunteers. We leverage the patch-wise discriminator in our model to estimate which slices within these image cubes have poorly generalized feature representations, and correspondingly poor machine performance. 
These images, with the corresponding machine proposals, would be presented to volunteers on Zooniverse for correction, leading to a drastic reduction in volunteer effort on citizen science projects. We trained our model on ~2300 liver tissue 3D electron micrographs. Lipid droplets were segmented within these images through human annotation via the 'Etch A Cell - Fat Checker' citizen science project, hosted on the Zooniverse platform. In this work, we demonstrate this framework and the selection methodology, which resulted in a measured reduction in volunteer effort of more than 60%. We envision this type of joint human-machine partnership being of great use in future Zooniverse projects.	Translated: 2023-11-27 16:27:58 Published: 2023-11-23
# Appearance-based gaze estimation enhanced with synthetic images using deep neural networks ( http://arxiv.org/abs/2311.14175v1 ) License: see link	Dmytro Herashchenko and Igor Farkaš	Human eye gaze estimation is an important cognitive ingredient for successful human-robot interaction, enabling the robot to read and predict human behavior. We approach this problem using artificial neural networks and build a modular system that estimates gaze from separately cropped eyes, taking advantage of existing well-functioning components for face detection (RetinaFace) and head pose estimation (6DRepNet). Our proposed method does not require any special hardware or infrared filters but uses a standard built-in notebook RGB camera, as is common for appearance-based methods. Using the MetaHuman tool, we also generated a large synthetic dataset of more than 57,000 human faces and made it publicly available. Including this dataset (with eye gaze and head pose information) alongside the standard Columbia Gaze dataset when training the model led to better accuracy, with a mean average error below two degrees in eye pitch and yaw directions, which compares favourably to related methods. We also verified the feasibility of our model through preliminary real-world testing using the built-in 4K camera in the eye of the NICO semi-humanoid robot.	Translated: 2023-11-27 16:27:30 Published: 2023-11-23
# Entangling entanglement: coupling frequency and polarization of biphotons on demand ( http://arxiv.org/abs/2311.14173v1 ) License: see link	Arash Riazi, Eric Y. Zhu, Dan Xu, and Li Qian	Quantum information is often carried in the frequency and polarization degrees of freedom (DoFs) of single photons and entangled photons. We demonstrate a new approach to couple and decouple the frequency and polarization DoFs of broadband biphotons. Our approach is based on a common-path nonlinear interferometer (CP-NLI) with a linear dispersive medium and a polarization controller sandwiched between two nonlinear media that generate the interfering biphotons. By adjusting the polarization controller, we can effectively manipulate the two DoFs. When the two DoFs are decoupled, maximally polarization-entangled biphotons are observed in the polarization DoF, while interference fringes are observed in the spectral intensity of the biphotons. When the two DoFs are coupled, however, interference fringes disappear from the spectral intensity and instead appear in the degree of polarization entanglement. The degree of polarization entanglement, quantified by concurrence, can in principle vary from 0 to 1 depending on the signal and idler photon frequencies. Our approach offers a convenient means of tuning the polarization entanglement and can be employed for arbitrary biphoton polarization state generation, with applications in quantum information processing and the study of fundamental physics.	Translated: 2023-11-27 16:27:08 Published: 2023-11-23
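For a pure two-qubit polarization state a|HH⟩ + b|HV⟩ + c|VH⟩ + d|VV⟩, Wootters' concurrence reduces to C = 2|ad − bc|. The sketch below shows that this measure runs from 0 (product state) to 1 (maximally entangled). The family cos θ|HH⟩ + e^{iφ} sin θ|VV⟩ is a hypothetical illustration with C = |sin 2θ|. It does not model the CP-NLI or its frequency dependence. Mixed states would need the full Wootters formula.

```python
import cmath
import math


def concurrence_pure(a, b, c, d):
    """Concurrence of a|HH> + b|HV> + c|VH> + d|VV> (normalized first): C = 2|ad - bc|."""
    norm = math.sqrt(sum(abs(x) ** 2 for x in (a, b, c, d)))
    a, b, c, d = (x / norm for x in (a, b, c, d))
    return 2 * abs(a * d - b * c)


def partially_entangled(theta, phi=0.0):
    """Amplitudes of cos(theta)|HH> + e^{i phi} sin(theta)|VV>; concurrence is |sin(2 theta)|."""
    return (math.cos(theta), 0.0, 0.0, cmath.exp(1j * phi) * math.sin(theta))
```

The relative phase φ does not change the concurrence of a pure state. Loss of polarization entanglement therefore has to come from mixing, for example after tracing over the frequency DoF.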
# Metrological Advantages in Seeded and Lossy Nonlinear Interferometers ( http://arxiv.org/abs/2311.14172v1 ) License: see link	Jasper Kranias, Guillaume Thekkadath, Khabat Heshami, Aaron Z. Goldberg	The quantum Fisher information (QFI) bounds the sensitivity of a quantum measurement, heralding the conditions for quantum advantages. We aim to find conditions under which quantum advantage can be realized in single-parameter phase sensing with nonlinear interferometers. Here, we calculate analytical expressions for the QFI of nonlinear interferometers under lossy conditions and with coherent-state seeding. We normalize the results by the number of photons going through the phase-inducing element, which eliminates some of the previously declared metrological advantages. We analyze the performance of nonlinear interferometers in a variety of geometries, and the robustness of the quantum advantage with respect to internal and external loss, through direct comparison with a linear interferometer. We find the threshold on the internal loss at which the quantum advantage vanishes, specify when and how much coherent-state seeding optimally counters internal loss, and show that a sufficient amount of squeezing makes the quantum advantage robust against external loss and inefficient detection.	Translated: 2023-11-27 16:26:48 Published: 2023-11-23
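The normalization by photon number can be illustrated in the simplest lossless case. For a pure single-mode probe whose phase enters as exp(−iφn̂), the QFI is F_Q = 4 Var(n̂). The sketch below compares a coherent state (F_Q = 4N, shot-noise scaling) with squeezed vacuum (F_Q = 8N(N + 1)) at the same mean photon number N. It ignores loss, seeding and the two-mode structure of a nonlinear interferometer, which are exactly what the paper adds. It is not the paper's QFI expression.

```python
import math


def number_stats(dist):
    """Mean and variance of a photon-number distribution given as (n, p_n) pairs."""
    mean = sum(n * p for n, p in dist)
    var = sum((n - mean) ** 2 * p for n, p in dist)
    return mean, var


def coherent_dist(mean_photons, n_max=200):
    """Poisson photon statistics of a coherent state, truncated at n_max."""
    p, out = math.exp(-mean_photons), []
    for n in range(n_max):
        out.append((n, p))
        p *= mean_photons / (n + 1)
    return out


def squeezed_vacuum_dist(r, m_max=400):
    """P(2m) = (2m)! / (2^m m!)^2 * tanh(r)^(2m) / cosh(r); odd photon numbers never occur."""
    t2, p, out = math.tanh(r) ** 2, 1.0 / math.cosh(r), []
    for m in range(m_max):
        out.append((2 * m, p))
        p *= (2 * m + 1) / (2 * m + 2) * t2
    return out


def pure_state_phase_qfi(dist):
    """Return (mean photon number, QFI) for exp(-i*phi*n) on a pure state: F_Q = 4 Var(n)."""
    mean, var = number_stats(dist)
    return mean, 4 * var
```

At equal mean photon number through the phase element, squeezed vacuum gives roughly 2(N + 1) times the coherent-state QFI. Under lossless, pure-state assumptions this gap is the metrological advantage that loss erodes.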
# Evaluating GPT-4's Vision Capabilities on Brazilian University Admission Exams ( http://arxiv.org/abs/2311.14169v1 ) License: see link	Ramon Pires, Thales Sales Almeida, Hugo Abonizio, Rodrigo Nogueira	Recent advancements in language models have showcased human-comparable performance in academic entrance exams. However, existing studies often overlook questions that require the integration of visual comprehension, thus compromising the full spectrum and complexity inherent in real-world scenarios. To address this gap, we present a comprehensive framework to evaluate language models on entrance exams, which incorporates both textual and visual elements. We evaluate the two most recent editions of Exame Nacional do Ensino Médio (ENEM), the main standardized entrance examination adopted by Brazilian universities. Our study not only reaffirms the capabilities of GPT-4 as the state of the art for handling complex multidisciplinary questions, but also pioneers a realistic assessment of multimodal language models on Portuguese examinations. One of the highlights is that text captions transcribing visual content outperform the direct use of images, suggesting that the vision model has room for improvement. Yet, despite the improvements afforded by images or captions, mathematical questions remain a challenge for these state-of-the-art models. The code and data used in the experiments are available at https://github.com/piresramon/gpt-4-enem.	
Translated: 2023-11-27 16:26:29 Published: 2023-11-23
# Fast Policy Learning for Linear Quadratic Regulator with Entropy Regularization ( http://arxiv.org/abs/2311.14168v1 ) License: see link	Xin Guo, Xinyu Li and Renyuan Xu	This paper proposes and analyzes two new policy learning methods, regularized policy gradient (RPG) and iterative policy optimization (IPO), for a class of discounted linear-quadratic regulator (LQR) problems over an infinite time horizon with entropy regularization. Assuming access to exact policy evaluation, both approaches are proved to converge linearly to the optimal policy of the regularized LQR. Moreover, the IPO method can achieve a super-linear convergence rate once it enters a local region around the optimal policy. Finally, when the optimal policy from a well-understood environment in an RL problem is appropriately transferred as the initial policy to an RL problem with an unknown environment, the IPO method is shown to enable a super-linear convergence rate if the latter is sufficiently close to the former. The performance of these algorithms is supported by numerical examples.	Translated: 2023-11-27 16:26:09 Published: 2023-11-23
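The toy sketch below shows the setting only. It runs plain policy gradient with exact policy evaluation on a scalar, deterministic, discounted LQR, x_{t+1} = a x_t + b u_t with u_t = −K x_t. The entropy regularization, Gaussian policies, and the RPG and IPO updates of the paper are not reproduced. For a stabilizing gain the cost is J(K) = (q + rK²)x₀² / (1 − γ(a − bK)²). Gradient descent on K is checked against the gain from the discounted Riccati equation. The system parameters and step size are arbitrary choices.

```python
def cost(K, a, b, q, r, gamma, x0=1.0):
    """Exact discounted cost of u = -K x: sum_t gamma^t (q x_t^2 + r u_t^2)."""
    c = a - b * K
    denom = 1.0 - gamma * c * c
    if denom <= 0:
        return float("inf")  # closed loop not stable under discounting
    return (q + r * K * K) * x0 * x0 / denom


def grad(K, a, b, q, r, gamma, x0=1.0):
    """dJ/dK by the quotient rule applied to the closed-form cost above."""
    c = a - b * K
    denom = 1.0 - gamma * c * c
    return x0 * x0 * (2 * r * K * denom - (q + r * K * K) * 2 * gamma * b * c) / denom ** 2


def riccati_gain(a, b, q, r, gamma, iters=1000):
    """Value iteration on the discounted scalar Riccati equation; returns (K*, P)."""
    P = q
    for _ in range(iters):
        P = q + gamma * a * a * P - (gamma * a * b * P) ** 2 / (r + gamma * b * b * P)
    return gamma * a * b * P / (r + gamma * b * b * P), P


def policy_gradient(K0, a, b, q, r, gamma, lr=1e-3, steps=20000):
    """Fixed-step gradient descent on the feedback gain, starting from a stabilizing K0."""
    K = K0
    for _ in range(steps):
        K -= lr * grad(K, a, b, q, r, gamma)
    return K
```

With a = b = q = r = 1 and γ = 0.9, both routes reach K* ≈ 0.588. Gradient descent needs thousands of steps to get there, which illustrates why faster (super-linear) schemes such as IPO are of interest.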
# Hybrid Circuit Mapping: Leveraging the Full Spectrum of Computational Capabilities of Neutral Atom Quantum Computers ( http://arxiv.org/abs/2311.14164v1 ) License: see link	Ludwig Schmid, Sunghye Park, Seokhyeong Kang, Robert Wille	Quantum computing based on Neutral Atoms (NAs) provides a wide range of computational capabilities, encompassing high-fidelity long-range interactions with native multi-qubit gates and the ability to shuttle arrays of qubits. While these capabilities have previously been studied individually, we propose the first approach of a fast hybrid compiler that performs circuit mapping and routing based on both high-fidelity gate interactions and qubit shuttling. We delve into the intricacies of the compilation process when combining multiple capabilities and present effective solutions to the resulting challenges. The final compilation strategy is then showcased across various hardware settings, revealing its versatility and highlighting potential fidelity enhancements achieved through the strategic combination of gate- and shuttling-based routing. With additional multi-qubit gate support for both routing capabilities, the proposed approach is able to take advantage of the full spectrum of computational capabilities offered by NAs.	Translated: 2023-11-27 16:25:49 Published: 2023-11-23
# Generalized imaginary units in quantum mechanics ( http://arxiv.org/abs/2311.14162v1 ) License: see link	Sergio Giardino	The generalization of the imaginary unit is examined within complex quantum mechanics ($\mathbb C$QM) and quaternionic quantum mechanics ($\mathbb H$QM). Whereas the complex theory describes non-stationary quantum processes, the quaternionic theory does not admit such an interpretation and associates the generalized imaginary unit with a novel time evolution function. Various possibilities are opened for future research.	Translated: 2023-11-27 16:25:31 Published: 2023-11-23
# Efficient and Robust Jet Tagging at the LHC with Knowledge Distillation ( http://arxiv.org/abs/2311.14160v1 ) License: see link	Ryan Liu, Abhijith Gandrakota, Jennifer Ngadiuba, Maria Spiropulu, Jean-Roch Vlimant	The challenging environment of real-time data processing systems at the Large Hadron Collider (LHC) strictly limits the computational complexity of algorithms that can be deployed. For deep learning models, this implies that only models with low computational complexity that have weak inductive bias are feasible. To address this issue, we utilize knowledge distillation to leverage both the performance of large models and the reduced computational complexity of small ones. In this paper, we present an implementation of knowledge distillation, demonstrating an overall boost in the student models' performance for the task of classifying jets at the LHC. Furthermore, by using a teacher model with a strong inductive bias of Lorentz symmetry, we show that we can induce the same inductive bias in the student model, which leads to better robustness against arbitrary Lorentz boosts.	Translated: 2023-11-27 16:25:22 Published: 2023-11-23
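The abstract does not give the distillation loss. The sketch below uses the widely used formulation of Hinton et al. (2015): a weighted sum of the cross-entropy on the true label and a temperature-softened KL divergence from teacher to student, scaled by T². The values T = 4 and α = 0.5 are assumptions, and the short logit lists stand in for jet-class scores.

```python
import math


def softmax(logits, T=1.0):
    """Numerically stable softmax with temperature T."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.5):
    """alpha * CE(label, student) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    hard = -math.log(softmax(student_logits)[label])
    soft = kl_divergence(softmax(teacher_logits, T), softmax(student_logits, T))
    return alpha * hard + (1 - alpha) * T * T * soft
```

The T² factor keeps the soft-target gradients on roughly the same scale as the hard-label term when T changes. When the student already matches the teacher, only the hard-label term remains.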
# Experimental anonymous quantum conferencing ( http://arxiv.org/abs/2311.14158v1 ) License: see link	Jonathan W. Webb, Joseph Ho, Federico Grasselli, Gláucia Murta, Alexander Pickston, Andrés Ulibarrena and Alessandro Fedrizzi	Anonymous quantum conference key agreement (AQCKA) allows a group of users within a network to establish a shared cryptographic key without revealing their participation. Although this can be achieved using bipartite primitives alone, doing so is costly in the number of network rounds required. Allowing the use of multipartite entanglement yields a substantial efficiency improvement. We experimentally implement the AQCKA task in a six-user quantum network using Greenberger-Horne-Zeilinger (GHZ)-state entanglement and obtain a significant resource cost reduction, in line with theory, compared to a bipartite-only approach. We also demonstrate that the protocol retains an advantage in a four-user scenario with finite-key effects taken into account.	Translated: 2023-11-27 16:25:07 Published: 2023-11-23
# Variational Annealing on Graphs for Combinatorial Optimization ( http://arxiv.org/abs/2311.14156v1 ) License: see link	Sebastian Sanokowski, Wilhelm Berghammer, Sepp Hochreiter, Sebastian Lehner	Several recent unsupervised learning methods use probabilistic approaches to solve combinatorial optimization (CO) problems based on the assumption of statistically independent solution variables. We demonstrate that this assumption imposes performance limitations, in particular on difficult problem instances. Our results corroborate that an autoregressive approach which captures statistical dependencies among solution variables yields superior performance on many popular CO problems. We introduce subgraph tokenization, in which the configuration of a set of solution variables is represented by a single token. This tokenization technique alleviates the drawback of the long sequential sampling procedure inherent to autoregressive methods without sacrificing expressivity. Importantly, we theoretically motivate an annealed entropy regularization and show empirically that it is essential for efficient and stable learning.	Translated: 2023-11-27 16:24:50 Published: 2023-11-23
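The sketch below illustrates the idea behind subgraph tokenization: the joint configuration of k binary solution variables is packed into one token from a vocabulary of 2^k values. Grouping consecutive variables by index is a hypothetical simplification. In the paper each group corresponds to a set of nodes of the graph, and this listing does not describe how those sets are chosen.

```python
def tokenize(bits, k):
    """Pack consecutive groups of k binary variables into integer tokens in [0, 2**k)."""
    tokens = []
    for start in range(0, len(bits), k):
        block = bits[start:start + k]
        tokens.append(sum(bit << i for i, bit in enumerate(block)))
    return tokens


def detokenize(tokens, n, k):
    """Unpack tokens back into n binary variables (the last block may be partial)."""
    bits = []
    for t in tokens:
        bits.extend((t >> i) & 1 for i in range(k))
    return bits[:n]
```

The trade-off is visible in the shapes. An autoregressive sampler now takes ⌈n/k⌉ steps instead of n, but each step chooses among 2^k tokens rather than 2 values, so dependencies inside a block are still represented exactly.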
# Robust and Interpretable COVID-19 Diagnosis on Chest X-ray Images using Adversarial Training ( http://arxiv.org/abs/2311.14227v1 ) License: see link	Karina Yang, Alexis Bennett, Dominique Duncan	The novel 2019 Coronavirus disease (COVID-19) global pandemic is a defining health crisis. Recent efforts have been increasingly directed towards quick and accurate detection of COVID-19 among symptomatic patients to mitigate the intensity and spread of the disease. Artificial intelligence (AI) algorithms applied to chest X-ray (CXR) images have emerged as promising diagnostic tools, and previous work has demonstrated impressive classification performance. However, such methods have faced criticism from physicians due to their black-box reasoning process and unpredictable nature. In contrast to professional radiologist diagnosis, AI systems often lack generalizability, explainability, and robustness in the clinical decision-making process. In our work, we address these issues by first proposing an extensive baseline study, training and evaluating 21 convolutional neural network (CNN) models on a diverse set of 33,000+ CXR images to classify healthy, COVID-19, and non-COVID-19 pneumonia CXRs. 
Our resulting models achieved a 3-way classification accuracy, recall, and precision of up to 97.03\%, 97.97\%, and 99.95\%, respectively. Next, we investigate the effectiveness of adversarial training on model robustness and explainability via Gradient-weighted Class Activation Mapping (Grad-CAM) heatmaps. We find that adversarially trained models not only significantly outperform their standard counterparts on classifying perturbed images, but also yield saliency maps that 1) better specify clinically relevant features, 2) are robust against extraneous artifacts, and 3) agree considerably more with expert radiologist findings.	翻訳日:2023-11-27 16:16:32 公開日:2023-11-23
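Grad-CAM, used above to produce the saliency maps, weights each feature map of a convolutional layer by the spatial average of the class-score gradient. It then keeps the positive part of the weighted sum. The following is a minimal NumPy sketch of that computation. It assumes the activations and gradients for one image have already been obtained from a deep learning framework's autograd; how they are extracted is not shown, and the function name is hypothetical.

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap for one image and one target class.

    activations: (K, H, W) feature maps A^k of the chosen conv layer.
    gradients:   (K, H, W) gradients d y^c / d A^k of the class score y^c.
    Returns an (H, W) map scaled to [0, 1] (all zeros if nothing is positive).
    """
    weights = gradients.mean(axis=(1, 2))             # alpha_k: spatially averaged gradients
    cam = np.tensordot(weights, activations, axes=1)  # sum_k alpha_k * A^k -> (H, W)
    cam = np.maximum(cam, 0.0)                        # ReLU keeps positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

In practice, the map is usually upsampled to the input resolution before being overlaid on the X-ray.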
# Uncovering Gender Stereotypes in Video Game Character Designs: A Multi-Modal Analysis of Honor of Kings ( http://arxiv.org/abs/2311.14226v1 ) License: see link	Bingqing Liu, Kyrie Zhixuan Zhou, Danlei Zhu, Jaihyun Park	In this paper, we conduct a comprehensive analysis of gender stereotypes in the character design of Honor of Kings, a popular multiplayer online battle arena (MOBA) game in China. We probe gender stereotypes through the lens of role assignments, visual designs, spoken lines, and background stories, combining qualitative analysis and text mining based on moral foundations theory. Male heroes are commonly designed as masculine fighters with power, and female heroes as feminine "ornaments" with ideal looks. We contribute a culture-aware and multi-modal understanding of gender stereotypes in games, leveraging text-, visual-, and role-based evidence.	Translated: 2023-11-27 16:15:35 Published: 2023-11-23
# Risk Bounds of Accelerated SGD for Overparameterized Linear Regression ( http://arxiv.org/abs/2311.14222v1 ) License: see link	Xuheng Li and Yihe Deng and Jingfeng Wu and Dongruo Zhou and Quanquan Gu	Accelerated stochastic gradient descent (ASGD) is a workhorse in deep learning and often achieves better generalization performance than SGD. However, existing optimization theory can only explain the faster convergence of ASGD, but cannot explain its better generalization. In this paper, we study the generalization of ASGD for overparameterized linear regression, which is possibly the simplest setting of learning with overparameterization. We establish an instance-dependent excess risk bound for ASGD within each eigen-subspace of the data covariance matrix. Our analysis shows that (i) ASGD outperforms SGD in the subspace of small eigenvalues, exhibiting a faster rate of exponential decay for bias error, while in the subspace of large eigenvalues, its bias error decays slower than SGD; and (ii) the variance error of ASGD is always larger than that of SGD. Our result suggests that ASGD can outperform SGD when the difference between the initialization and the true weight vector is mostly confined to the subspace of small eigenvalues. Additionally, when our analysis is specialized to linear regression in the strongly convex setting, it yields a tighter bound for bias error than the best-known result.	Translated: 2023-11-27 16:15:20 Published: 2023-11-23
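The eigen-subspace behavior of the bias term can be seen in a noise-free toy. The sketch below compares plain gradient descent with a Nesterov-style momentum step along single eigen-directions of a quadratic loss. It is an assumption-laden stand-in, not the paper's ASGD parameterization or its stochastic setting. The step size 0.5, momentum 0.9, and the two eigenvalues are arbitrary illustrative choices, and the helper names are hypothetical.

```python
def bias_trajectory(eigenvalue, lr, momentum, steps, w0=1.0):
    """Noise-free error along one eigen-direction of a quadratic loss.

    momentum == 0.0 is plain gradient descent: w <- (1 - lr * eigenvalue) * w.
    momentum > 0.0 is a Nesterov-style step:
        y = w + momentum * (w - w_prev);  w <- y - lr * eigenvalue * y
    Returns the list of iterates w_0, ..., w_steps.
    """
    w_prev, w = w0, w0
    trajectory = [w]
    for _ in range(steps):
        y = w + momentum * (w - w_prev)
        w_prev, w = w, y - lr * eigenvalue * y
        trajectory.append(w)
    return trajectory

def tail_size(trajectory, window=10):
    """Largest |w| over the last `window` iterates (robust to oscillation)."""
    return max(abs(v) for v in trajectory[-window:])

if __name__ == "__main__":
    for eig in (1.0, 0.01):  # one large and one small eigenvalue
        gd = tail_size(bias_trajectory(eig, lr=0.5, momentum=0.0, steps=50))
        acc = tail_size(bias_trajectory(eig, lr=0.5, momentum=0.9, steps=50))
        print(f"eigenvalue={eig}: GD tail {gd:.3e}, momentum tail {acc:.3e}")
```

With these settings, the momentum iterate shrinks faster than plain descent along the small-eigenvalue direction but more slowly along the large-eigenvalue one. This mirrors the qualitative bias behavior stated in the abstract. The variance term is not modeled because the toy has no gradient noise.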
# Assumption-lean and Data-adaptive Post-Prediction Inference ( http://arxiv.org/abs/2311.14220v1 ) License: see link	Jiacheng Miao, Xinran Miao, Yixuan Wu, Jiwei Zhao, and Qiongshi Lu	A primary challenge facing modern scientific research is the limited availability of gold-standard data, which can be both costly and labor-intensive to obtain. With the rapid development of machine learning (ML), scientists have relied on ML algorithms to predict these gold-standard outcomes with easily obtained covariates. However, these predicted outcomes are often used directly in subsequent statistical analyses, ignoring imprecision and heterogeneity introduced by the prediction procedure. This will likely result in false positive findings and invalid scientific conclusions. In this work, we introduce an assumption-lean and data-adaptive Post-Prediction Inference (POP-Inf) procedure that allows valid and powerful inference based on ML-predicted outcomes. Its "assumption-lean" property guarantees reliable statistical inference without assumptions on the ML prediction, for a wide range of statistical quantities. Its "data-adaptive" feature guarantees an efficiency gain over existing post-prediction inference methods, regardless of the accuracy of the ML prediction. We demonstrate the superiority and applicability of our method through simulations and large-scale genomic data.	Translated: 2023-11-27 16:14:39 Published: 2023-11-23
# A New Benchmark and Model for Challenging Image Manipulation Detection ( http://arxiv.org/abs/2311.14218v1 ) License: see link	Zhenfei Zhang, Mingyang Li and Ming-Ching Chang	The ability to detect manipulation in multimedia data is vital in digital forensics. Existing Image Manipulation Detection (IMD) methods are mainly based on detecting anomalous features arising from image editing or double compression artifacts. All existing IMD techniques encounter challenges when it comes to detecting small tampered regions in a large image. Moreover, compression-based IMD approaches face difficulties in cases of double compression with identical quality factors. To investigate state-of-the-art (SoTA) IMD methods under these challenging conditions, we introduce a new Challenging Image Manipulation Detection (CIMD) benchmark dataset, which consists of two subsets for evaluating editing-based and compression-based IMD methods, respectively. The dataset images were manually captured and tampered with, and are provided with high-quality annotations. In addition, we propose a new two-branch network model based on HRNet that can better detect both image-editing and compression artifacts under these challenging conditions. Extensive experiments on the CIMD benchmark show that our model significantly outperforms SoTA IMD methods on CIMD.	Translated: 2023-11-27 16:14:21 Published: 2023-11-23
# Refinement calculus of quantum programs with projective assertions ( http://arxiv.org/abs/2311.14215v1 ) License: see link	Yuan Feng, Li Zhou, Yingte Xu	Refinement calculus provides a structured framework for the progressive and modular development of programs, ensuring their correctness throughout the refinement process. This paper introduces a refinement calculus tailored for quantum programs. To this end, we first study the partial correctness of nondeterministic programs within a quantum while-language featuring prescription statements. Orthogonal projectors, which are equivalent to subspaces of the state Hilbert space, are taken as assertions for quantum states. In addition to the denotational semantics, where a nondeterministic program is associated with a set of trace-nonincreasing super-operators, we also present their semantics in transforming a postcondition to the weakest liberal precondition and, conversely, transforming a precondition to the strongest postcondition. Subsequently, refinement rules are introduced based on these dual semantics, offering a systematic approach to the incremental development of quantum programs applicable in various contexts. To illustrate the practical application of the refinement calculus, we examine examples such as the implementation of a $Z$-rotation gate, the repetition code, and the quantum-to-quantum Bernoulli factory.
Furthermore, we present Quire, a Python-based interactive prototype tool that provides practical support to programmers engaged in the stepwise development of correct quantum programs.	Translated: 2023-11-27 16:14:03 Published: 2023-11-23
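For intuition about projective assertions, the sketch below checks a Hoare-style triple {P} U {Q} for the simplest case: a single unitary gate, the $Z$-rotation gate mentioned in the abstract. It assumes the standard reading that a projector assertion holds in a state whose support lies in the projector's subspace. For a unitary U, the triple then reduces to checking that the image of P under U lies inside Q. Nondeterministic programs, prescription statements, loops, and the Quire tool are not modeled, and the helper names are hypothetical.

```python
import numpy as np

def projector(*vectors):
    """Orthogonal projector onto the span of the given orthonormal state vectors."""
    cols = np.column_stack([np.asarray(v, dtype=complex) for v in vectors])
    return cols @ cols.conj().T

def rz(theta):
    """Z-rotation gate Rz(theta) = diag(exp(-i theta / 2), exp(i theta / 2))."""
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

def triple_holds(pre, unitary, post, atol=1e-9):
    """Check {pre} U {post} for a unitary program with projector assertions.

    Every state supported in `pre` ends up supported in `post` iff the image
    of pre under U lies inside post, i.e. post @ U @ pre == U @ pre.
    """
    image = unitary @ pre
    return np.allclose(post @ image, image, atol=atol)
```

For example, Rz(pi) maps |+> to |-> up to a global phase, so {|+><+|} Rz(pi) {|-><-|} holds, while Rz(pi/2) maps |+> to a state outside span{|->}, so the same triple fails for it.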
# Extending Variability-Aware Model Selection with Bias Detection in Machine Learning Projects ( http://arxiv.org/abs/2311.14214v1 ) License: see link	Cristina Tavares, Nathalia Nascimento, Paulo Alencar, Donald Cowan	Data science projects often involve various machine learning (ML) methods that depend on data, code, and models. One of the key activities in these projects is the selection of a model or algorithm that is appropriate for the data analysis at hand. ML model selection depends on several factors, which include data-related attributes such as sample size, functional requirements such as the prediction algorithm type, and non-functional requirements such as performance and bias. However, the factors that influence such selection are often not well understood or explicitly represented. This paper describes ongoing work on extending an adaptive variability-aware model selection method with bias detection in ML projects. The method involves: (i) modeling the variability of the factors that affect model selection using feature models based on heuristics proposed in the literature; (ii) instantiating our variability model with added features related to bias (e.g., bias-related metrics); and (iii) conducting experiments that illustrate the approach in a specific case study based on a heart failure prediction project.
The proposed approach aims to advance the state of the art by making explicit the factors that influence model selection, particularly those related to bias, as well as their interactions. The provided representations can transform model selection in ML projects into a non-ad hoc, adaptive, and explainable process.	Translated: 2023-11-27 16:13:40 Published: 2023-11-23
# Annotation Sensitivity: Training Data Collection Methods Affect Model Performance ( http://arxiv.org/abs/2311.14212v1 ) License: see link	Christoph Kern, Stephanie Eckman, Jacob Beck, Rob Chew, Bolei Ma, Frauke Kreuter	When training data are collected from human annotators, the design of the annotation instrument, the instructions given to annotators, the characteristics of the annotators, and their interactions can impact training data. This study demonstrates that design choices made when creating an annotation instrument also impact the models trained on the resulting annotations. We introduce the term annotation sensitivity to refer to the impact of annotation data collection methods on the annotations themselves and on downstream model performance and predictions. We collect annotations of hate speech and offensive language in five experimental conditions of an annotation instrument, randomly assigning annotators to conditions. We then fine-tune BERT models on each of the five resulting datasets and evaluate model performance on a holdout portion of each condition. We find considerable differences between the conditions for 1) the share of hate speech/offensive language annotations, 2) model performance, 3) model predictions, and 4) model learning curves.
Our results emphasize the crucial role played by the annotation instrument, which has received little attention in the machine learning literature. We call for additional research into how and why the instrument impacts the annotations to inform the development of best practices in instrument design.	Translated: 2023-11-27 16:13:22 Published: 2023-11-23
# Energy eigenstates of position-dependent mass particles in a spherical quantum dot ( http://arxiv.org/abs/2311.14211v1 ) License: see link	R. M. Lima and H. R. Christiansen	We obtain the exact energy spectrum of nonuniform-mass particles for a collection of Hamiltonians in a three-dimensional approach to a quantum dot. By considering a set of generalized Schrödinger equations with different orderings between the particle's momentum and mass, the energy bound states are calculated analytically for hard boundary conditions. The present results are of interest in atomic physics and quantum dot theory.	Translated: 2023-11-27 16:13:02 Published: 2023-11-23
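The abstract does not say which momentum and mass orderings are used. A standard way to parameterize them is the von Roos family of Hamiltonians, given here as an assumption for orientation rather than as the paper's exact set:

$$
\hat H = \frac{1}{4}\Big[\, m^{\alpha}(\mathbf r)\,\hat{\mathbf p}\cdot m^{\beta}(\mathbf r)\,\hat{\mathbf p}\, m^{\gamma}(\mathbf r) + m^{\gamma}(\mathbf r)\,\hat{\mathbf p}\cdot m^{\beta}(\mathbf r)\,\hat{\mathbf p}\, m^{\alpha}(\mathbf r) \Big] + V(\mathbf r),
\qquad \alpha + \beta + \gamma = -1 .
$$

Choosing $\alpha=\gamma=0$, $\beta=-1$ gives the BenDaniel-Duke form $\hat H = \tfrac12\,\hat{\mathbf p}\cdot m^{-1}(\mathbf r)\,\hat{\mathbf p} + V(\mathbf r)$. Choosing $\alpha=\gamma=-1/2$, $\beta=0$ gives the Zhu-Kroemer form $\hat H = \tfrac12\, m^{-1/2}(\mathbf r)\,\hat{\mathbf p}^{2}\, m^{-1/2}(\mathbf r) + V(\mathbf r)$. For a hard-wall spherical dot of radius $R$, the boundary condition would typically be $\psi(R)=0$.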
# ECRF: Entropy-Constrained Neural Radiance Fields Compression with Frequency Domain Optimization ( http://arxiv.org/abs/2311.14208v1 ) License: see link	Soonbin Lee, Fangwen Shu, Yago Sanchez, Thomas Schierl, Cornelius Hellge	Explicit feature-grid-based NeRF models have shown promising results in terms of rendering quality and significant speed-up in training. However, these methods often require a significant amount of data to represent a single scene or object. In this work, we present a compression model that aims to minimize the entropy in the frequency domain in order to effectively reduce the data size. First, we propose using the discrete cosine transform (DCT) on the tensorial radiance fields to compress the feature grid. This feature grid is transformed into coefficients, which are then quantized and entropy encoded, following a similar approach to the traditional video coding pipeline. Furthermore, to achieve a higher level of sparsity, we propose using an entropy parameterization technique for the frequency domain, specifically for the DCT coefficients of the feature grid. Since the transformed coefficients are optimized during the training phase, the proposed model does not require any fine-tuning or additional information.
Our model only requires a lightweight compression pipeline for encoding and decoding, making it easier to apply volumetric radiance field methods in real-world applications. Experimental results demonstrate that our proposed frequency-domain entropy model can achieve superior compression performance across various datasets. The source code will be made publicly available.	Translated: 2023-11-27 16:12:55 Published: 2023-11-23
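The transform-coding step described above (DCT of the feature grid, quantization, then entropy coding) can be sketched on a single 2-D slice of a feature grid. This is a generic illustration under several assumptions:

- It uses an orthonormal DCT-II built with NumPy.
- It uses a fixed uniform quantization step.
- The empirical entropy of the quantized symbols stands in for the size an ideal entropy coder would reach.

ECRF's entropy parameterization and its training-time optimization of the coefficients are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis (rows are frequencies, columns are positions)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)
    return mat

def dct2(grid: np.ndarray) -> np.ndarray:
    """Separable 2-D DCT of one H x W feature-grid slice."""
    return dct_matrix(grid.shape[0]) @ grid @ dct_matrix(grid.shape[1]).T

def idct2(coeffs: np.ndarray) -> np.ndarray:
    """Inverse of dct2; the basis is orthonormal, so the inverse is its transpose."""
    return dct_matrix(coeffs.shape[0]).T @ coeffs @ dct_matrix(coeffs.shape[1])

def quantize(coeffs: np.ndarray, step: float) -> np.ndarray:
    """Uniform scalar quantisation of coefficients to integer symbols."""
    return np.rint(coeffs / step).astype(np.int64)

def empirical_entropy_bits(symbols: np.ndarray) -> float:
    """Bits per symbol an ideal entropy coder would need for this histogram."""
    _, counts = np.unique(symbols, return_counts=True)
    probs = counts / counts.sum()
    return float(-(probs * np.log2(probs)).sum())
```

A smooth slice concentrates its energy in a few low-frequency coefficients. Most quantized symbols are therefore zero, and the entropy per symbol is small, which is the kind of sparsity a frequency-domain entropy model can exploit.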
# Data-Driven Risk Modeling for Infrastructure Projects Using Artificial Intelligence Techniques ( http://arxiv.org/abs/2311.14203v1 ) License: see link	Abdolmajid Erfani	Managing project risk is a key part of the successful implementation of any large project and is widely recognized as a best practice for public agencies delivering infrastructure. The conventional method of identifying and evaluating project risks involves getting input from subject matter experts at risk workshops in the early phases of a project. As a project moves through its life cycle, these identified risks and their assessments evolve. Some risks materialize into issues, some are mitigated, and some are retired as no longer important. Despite the value provided by conventional expert-based approaches, several challenges remain due to the time-consuming and expensive processes involved. Moreover, little is known about how risks evolve from ex-ante to ex-post over time. How well does the project team identify and evaluate risks in the initial phase compared to what happens during project execution? Using historical data and artificial intelligence techniques, this study addressed these limitations by introducing a data-driven framework to identify risks automatically and to examine the quality of early risk registers and risk assessments. Risk registers from more than 70 major U.S. transportation projects form the input dataset.	Translated: 2023-11-27 16:12:31 Published: 2023-11-23
# A Systematic Review of Deep Learning-based Research on Radiology Report Generation ( http://arxiv.org/abs/2311.14199v1 ) License: see link	Chang Liu, Yuanhe Tian, Yan Song	Radiology report generation (RRG) aims to automatically generate free-text descriptions from clinical radiographs, e.g., chest X-ray images. RRG plays an essential role in promoting clinical automation and can provide practical assistance to inexperienced doctors and alleviate radiologists' workloads. Given this potential, research on RRG has grown explosively over the past half-decade, especially with the rapid development of deep learning approaches. Existing studies perform RRG from the perspective of enhancing different modalities, provide insights on optimizing the report generation process with elaborated features from both visual and textual information, and further facilitate RRG with the cross-modal interactions among them. In this paper, we present a comprehensive review of deep learning-based RRG from various perspectives.
Specifically, we first cover pivotal RRG approaches based on the task-specific features of radiographs, reports, and the cross-modal relations between them; then illustrate the benchmark datasets conventionally used for this task, along with evaluation metrics; subsequently analyze the performance of different approaches; and finally offer our summary of the challenges and trends in future directions. Overall, the goal of this paper is to serve as a tool for understanding the existing literature and inspiring potentially valuable research in the field of RRG.	Translated: 2023-11-27 16:12:16 Published: 2023-11-23
# Transfer Learning-based Real-time Handgun Detection ( http://arxiv.org/abs/2311.13559v2 ) License: see link	Youssef Elmir, Sid Ahmed Laouar, Larbi Hamdaoui	Traditional surveillance systems rely on human attention, limiting their effectiveness. This study employs convolutional neural networks and transfer learning to develop a real-time computer vision system for automatic handgun detection. A comprehensive analysis of online handgun detection methods is conducted, emphasizing the reduction of false positives and learning time. Transfer learning is demonstrated to be an effective approach. Despite technical challenges, the proposed system achieves a precision rate of 84.74%, demonstrating promising performance comparable to related works and enabling faster learning and accurate automatic handgun detection for enhanced security. This research advances security measures by reducing dependence on human monitoring, showcasing the potential of transfer learning-based approaches for efficient and reliable handgun detection.	Translated: 2023-11-27 12:28:03 Published: 2023-11-23
# LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes ( http://arxiv.org/abs/2311.13384v2 ) License: see link	Jaeyoung Chung, Suyoung Lee, Hyeongjin Nam, Jaerin Lee, Kyoung Mu Lee	With the widespread use of VR devices and content, demand for 3D scene generation techniques has grown. Existing 3D scene generation models, however, limit the target scene to a specific domain, primarily due to their training strategies using 3D scan datasets that are far from the real world. To address this limitation, we propose LucidDreamer, a domain-free scene generation pipeline that fully leverages the power of existing large-scale diffusion-based generative models. Our LucidDreamer alternates between two steps: Dreaming and Alignment. First, to generate multi-view consistent images from inputs, we set the point cloud as a geometric guideline for each image generation. Specifically, we project a portion of the point cloud to the desired view and provide the projection as guidance for inpainting using the generative model. The inpainted images are lifted to 3D space with estimated depth maps, composing new points. Second, to aggregate the new points into the 3D scene, we propose an alignment algorithm that harmoniously integrates the portions of newly generated 3D scenes.
The finally obtained 3D scene serves as the initial points for optimizing Gaussian splats. LucidDreamer produces Gaussian splats that are highly detailed compared to previous 3D scene generation methods, with no constraint on the domain of the target scene. Project page: https://luciddreamer-cvlab.github.io/	Translated: 2023-11-27 12:27:52 Published: 2023-11-23
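The point-cloud guidance described above relies on two standard pinhole-camera operations: projecting existing 3-D points into a new view, and lifting an image back to 3-D with a depth map. The following is a minimal NumPy sketch of both. It assumes a known 3x3 intrinsic matrix K and a 4x4 camera-to-world pose. Inpainting, depth estimation, and the alignment algorithm are not shown, and the function names are hypothetical.

```python
import numpy as np

def unproject(depth: np.ndarray, K: np.ndarray, cam_to_world: np.ndarray) -> np.ndarray:
    """Lift every pixel of an H x W depth map to a world-space point (N x 3).

    Pinhole model: X_cam = depth * K^{-1} [u, v, 1]^T, then the camera-to-world pose.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)], axis=0)  # 3 x N homogeneous pixels
    cam = (np.linalg.inv(K) @ pix) * depth.ravel()                 # 3 x N camera-space points
    cam_h = np.vstack([cam, np.ones((1, h * w))])                   # 4 x N homogeneous points
    return (cam_to_world @ cam_h)[:3].T

def project(points: np.ndarray, K: np.ndarray, cam_to_world: np.ndarray):
    """Project N x 3 world points into a view; returns (N x 2 pixel coords, N depths)."""
    world_to_cam = np.linalg.inv(cam_to_world)
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    cam = (world_to_cam @ pts_h.T)[:3]
    z = cam[2]
    uv = (K @ cam)[:2] / z
    return uv.T, z
```

Points with non-positive depth in the target view would need to be discarded before they are used as inpainting guidance. This sketch does not perform that masking.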
# Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model ( http://arxiv.org/abs/2311.13231v2 ) License: see link	Kai Yang, Jian Tao, Jiafei Lyu, Chunjiang Ge, Jiaxin Chen, Qimai Li, Weihan Shen, Xiaolong Zhu, Xiu Li	Using reinforcement learning with human feedback (RLHF) has shown significant promise in fine-tuning diffusion models. Previous methods start by training a reward model that aligns with human preferences, then leverage RL techniques to fine-tune the underlying models. However, crafting an efficient reward model demands extensive datasets, an optimal architecture, and manual hyperparameter tuning, making the process both time- and cost-intensive. The direct preference optimization (DPO) method, effective in fine-tuning large language models, eliminates the necessity for a reward model. However, the extensive GPU memory requirement of the diffusion model's denoising process hinders the direct application of the DPO method. To address this issue, we introduce the Direct Preference for Denoising Diffusion Policy Optimization (D3PO) method to directly fine-tune diffusion models.
The theoretical analysis demonstrates that although D3PO omits training a reward model, it effectively functions as the optimal reward model trained using human feedback data to guide the learning process. This approach requires no training of a reward model, making it more direct and cost-effective while minimizing computational overhead. In experiments, our method uses the relative scale of objectives as a proxy for human preference, delivering results comparable to methods using ground-truth rewards. Moreover, D3PO demonstrates the ability to reduce image distortion rates and generate safer images, addressing challenges in settings that lack robust reward models. Our code is publicly available at https://github.com/yk7333/D3PO/tree/main.	Translated: 2023-11-27 12:27:31 Published: 2023-11-23
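For reference, the DPO objective that D3PO adapts can be written per preference pair as a logistic loss on the difference of policy-to-reference log-probability ratios. The sketch below computes that standard DPO loss for one pair. How D3PO applies it across denoising steps is described in the paper and is not reproduced here. The default beta is an arbitrary assumption.

```python
import math

def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    loss = -log sigmoid(beta * [(logp_win - ref_logp_win) - (logp_lose - ref_logp_lose)])
    Inputs are log-probabilities of the preferred and rejected samples under the
    trainable policy and under the frozen reference model.
    """
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    # -log(sigmoid(m)) = softplus(-m), evaluated in an overflow-safe way
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference, the margin is zero and the loss is log 2. Raising the preferred sample's log-probability relative to the reference lowers the loss.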
# DRIFu: Differentiable Rendering and Implicit Function-based Single-View 3D Reconstruction ( http://arxiv.org/abs/2311.13199v2 ) License: see link	Zijian Kuang, Lihang Ying, Shi Jin, Li Cheng	The Differentiable Rendering and Implicit Function-based model (DRIFu) draws its roots from the Pixel-aligned Implicit Function (PIFu), a pioneering 3D digitization technique initially designed for clothed human bodies. PIFu excels in capturing nuanced body shape variations within a low-dimensional space and has been extensively trained on human 3D scans. However, the application of PIFu to live animals poses significant challenges, primarily due to the inherent difficulty in obtaining the cooperation of animals for 3D scanning. In response to this challenge, we introduce the DRIFu model, specifically tailored for animal digitization. To train DRIFu, we employ a curated set of synthetic 3D animal models, encompassing diverse shapes and sizes and even accounting for variations such as baby birds. Our innovative alignment tools play a pivotal role in mapping these diverse synthetic animal models onto a unified template, facilitating precise predictions of animal shape and texture.
Crucially, our template alignment strategy establishes a shared shape space, allowing for the seamless sampling of new animal shapes, posing them realistically, animating them, and aligning them with real-world data. This groundbreaking approach revolutionizes our capacity to comprehensively understand and represent avian forms. For further details and access to the project, see the project website at https://github.com/kuangzijian/drifu-for-animals	Translated: 2023-11-27 12:27:04 Published: 2023-11-23
# Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer ( http://arxiv.org/abs/2311.13120v2 ) License: see link	Zhen Zhao, Jingqun Tang, Chunhui Lin, Binghong Wu, Hao Liu, Zhizhong Zhang, Xin Tan, Can Huang, Yuan Xie	Scene text recognition (STR) in the wild frequently encounters challenges when coping with domain variations, font diversity, shape deformations, etc. A straightforward solution is performing model fine-tuning tailored to a specific scenario, but it is computationally intensive and requires multiple model copies for various scenarios. Recent studies indicate that large language models (LLMs) can learn from a few demonstration examples in a training-free manner, termed "In-Context Learning" (ICL). Nevertheless, applying LLMs as a text recognizer is unacceptably resource-consuming. Moreover, our pilot experiments on LLMs show that ICL fails in STR, mainly because of the insufficient incorporation of contextual information from diverse samples in the training stage. To this end, we introduce E$^2$STR, an STR model trained with context-rich scene text sequences, where the sequences are generated via our proposed in-context training strategy. E$^2$STR demonstrates that a regular-sized model is sufficient to achieve effective ICL capabilities in STR.
Extensive experiments show that E$^2$STR exhibits remarkable training-free adaptation in various scenarios and outperforms even the fine-tuned state-of-the-art approaches on public benchmarks.	Translated: 2023-11-27 12:26:02 Published: 2023-11-23
# Applications of Large Scale Foundation Models for Autonomous Driving ( http://arxiv.org/abs/2311.12144v3 ) License: see link	Yu Huang, Yue Chen, Zhu Li	Since the DARPA Grand Challenges (rural) in 2004/05 and the Urban Challenge in 2007, autonomous driving has been the most active field of AI applications. Recently, chat systems powered by large language models (LLMs), such as ChatGPT and PaLM, have emerged and rapidly become a promising direction toward artificial general intelligence (AGI) in natural language processing (NLP). It is natural to consider employing these abilities to reformulate autonomous driving. By combining LLMs with foundation models, it is possible to use human knowledge, common sense, and reasoning to rebuild autonomous driving systems and escape the current long-tailed AI dilemma. In this paper, we investigate the techniques of foundation models and LLMs applied to autonomous driving, categorized into simulation, world models, data annotation, and planning or end-to-end (E2E) solutions.	Translated: 2023-11-27 12:25:11, Published: 2023-11-23
# EdgeFM: Leveraging Foundation Model for Open-set Learning on the Edge ( http://arxiv.org/abs/2311.10986v3 ) License: see link	Bufang Yang, Lixing He, Neiwen Ling, Zhenyu Yan, Guoliang Xing, Xian Shuai, Xiaozhe Ren, Xin Jiang	Deep Learning (DL) models have been widely deployed on IoT devices with the help of advancements in DL algorithms and chips. However, the limited resources of edge devices make it hard for these on-device DL models to generalize to diverse environments and tasks. Although the recently emerged foundation models (FMs) show impressive generalization power, how to effectively leverage the rich knowledge of FMs on resource-limited edge devices remains unexplored. In this paper, we propose EdgeFM, a novel edge-cloud cooperative system with open-set recognition capability. EdgeFM selectively uploads unlabeled data to query the FM on the cloud and customizes the specific knowledge and architectures for edge models. Meanwhile, EdgeFM performs dynamic model switching at run time, taking into account both data uncertainty and dynamic network variations, which ensures that accuracy stays close to that of the original FM. We implement EdgeFM using two FMs on two edge platforms. We evaluate EdgeFM on three public datasets and two self-collected datasets. Results show that EdgeFM can reduce end-to-end latency by up to 3.2x and achieve a 34.3% accuracy increase compared with the baseline.	Translated: 2023-11-27 12:24:55, Published: 2023-11-23
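
The abstract describes EdgeFM's run-time switching only at a high level: it weighs data uncertainty against network conditions. The following is a minimal sketch of such a gate. It assumes softmax entropy as the uncertainty measure and fixed thresholds. The function names and the policy are hypothetical and are not EdgeFM's actual switching rule.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def route(probs, uncertainty_threshold, network_latency_ms, latency_budget_ms):
    """Return "edge" or "cloud" for one sample (hypothetical policy).

    probs: softmax output of the on-device model for this sample.
    Keep the edge prediction when the edge model is confident; otherwise
    escalate to the cloud FM, but only if the current network latency
    fits within the latency budget.
    """
    if entropy(probs) <= uncertainty_threshold:
        return "edge"
    if network_latency_ms <= latency_budget_ms:
        return "cloud"
    return "edge"  # network too slow: fall back to the local model


# A confident prediction stays on the device; an uncertain one is sent
# to the cloud only when the network is fast enough.
print(route([0.97, 0.02, 0.01], 0.5, 80, 100))
print(route([0.4, 0.35, 0.25], 0.5, 80, 100))
```

A real system would also have to calibrate the threshold per dataset and measure latency online. Both of those are outside this sketch.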
# Counterfactual Explanation for Regression via Disentanglement in Latent Space ( http://arxiv.org/abs/2311.08228v3 ) License: see link	Xuan Zhao and Klaus Broelemann and Gjergji Kasneci	Counterfactual Explanations (CEs) help address the question: How can the factors that influence the prediction of a predictive model be changed to achieve a more favorable outcome from a user's perspective? Thus, they bear the potential to guide the user's interaction with AI systems since they represent easy-to-understand explanations. To be applicable, CEs need to be realistic and actionable. In the literature, various methods have been proposed to generate CEs. However, the majority of research on CEs focuses on classification problems, where questions like "What should I do to get my rejected loan approved?" are raised. In practice, questions like "What should I do to increase my salary?" are more regressive in nature. In this paper, we introduce a novel method to generate CEs for a pre-trained regressor by first disentangling the label-relevant from the label-irrelevant dimensions in the latent space. CEs are then generated by combining the label-irrelevant dimensions and the predefined output. The intuition behind this approach is that the ideal counterfactual search should focus on the label-irrelevant characteristics of the input and suggest changes toward target-relevant characteristics. Searching in the latent space could help achieve this goal. We show that our method maintains the characteristics of the query sample during the counterfactual search. In various experiments, we demonstrate that the proposed method is competitive based on different quality measures on image and tabular datasets in regression problem settings. It efficiently returns results closer to the original data manifold compared to three state-of-the-art methods, which is essential for realistic high-dimensional machine learning applications. Our code will be made available as an open-source package upon the publication of this work.	Translated: 2023-11-27 12:24:36, Published: 2023-11-23
# Pose-Graph Attentional Graph Neural Network for Lidar Place Recognition ( http://arxiv.org/abs/2309.00168v3 ) License: see link	Milad Ramezani, Liang Wang, Joshua Knights, Zhibin Li, Pauline Pounds, Peyman Moghadam	This paper proposes a pose-graph attentional graph neural network, called P-GAT, which compares (key)nodes between sequential and non-sequential sub-graphs for place recognition tasks, as opposed to the common frame-to-frame retrieval formulation currently implemented in SOTA place recognition methods. P-GAT uses the maximum spatial and temporal information between neighbour cloud descriptors (generated by an existing encoder), utilising the concept of pose-graph SLAM. Leveraging intra- and inter-attention and a graph neural network, P-GAT relates point clouds captured in nearby locations in Euclidean space and their embeddings in feature space. Experimental results on large-scale publicly available datasets demonstrate the effectiveness of our approach in scenes lacking distinct features and when training and testing environments have different distributions (domain adaptation). Further, an exhaustive comparison with the state-of-the-art shows performance gains. Code is available at https://github.com/csiro-robotics/P-GAT.	Translated: 2023-11-27 12:23:56, Published: 2023-11-23
# Channel and Gradient-Importance Aware Device Scheduling for Over-the-Air Federated Learning ( http://arxiv.org/abs/2305.16854v4 ) License: see link	Yuchang Sun and Zehong Lin and Yuyi Mao and Shi Jin and Jun Zhang	Federated learning (FL) is a popular privacy-preserving distributed training scheme, where multiple devices collaborate to train machine learning models by uploading local model updates. To improve communication efficiency, over-the-air computation (AirComp) has been applied to FL, which leverages analog modulation to harness the superposition property of radio waves such that numerous devices can upload their model updates concurrently for aggregation. However, the uplink channel noise incurs considerable model aggregation distortion, which is critically determined by the device scheduling and compromises the learned model performance. In this paper, we propose a probabilistic device scheduling framework for over-the-air FL, named PO-FL, to mitigate the negative impact of channel noise, where each device is scheduled according to a certain probability and its model update is reweighted using this probability in aggregation. We prove the unbiasedness of this aggregation scheme and demonstrate the convergence of PO-FL on both convex and non-convex loss functions. Our convergence bounds reveal that the device scheduling affects the learning performance through the communication distortion and global update variance. Based on the convergence analysis, we further develop a channel and gradient-importance aware algorithm to optimize the device scheduling probabilities in PO-FL. Extensive simulation results show that the proposed PO-FL framework with channel and gradient-importance awareness achieves faster convergence and produces better models than baseline methods.	Translated: 2023-11-27 12:23:38, Published: 2023-11-23
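
The unbiasedness claim for probabilistic scheduling can be checked numerically. The following is a minimal sketch under these assumptions:

- Each device k is scheduled independently with probability p_k.
- The update of a scheduled device is scaled by w_k / p_k, i.e. inverse-probability reweighting.

Scalar "updates" and noise-free aggregation are simplifications. The actual PO-FL aggregation runs over a noisy AirComp channel, and its exact weighting may differ.

```python
import itertools


def po_aggregate(updates, weights, probs, scheduled):
    """Aggregate the updates of the scheduled devices.

    Each scheduled device k contributes weights[k] / probs[k] * updates[k],
    i.e. its update is reweighted by the inverse of its scheduling probability.
    """
    return sum(weights[k] / probs[k] * updates[k] for k in scheduled)


def expected_aggregate(updates, weights, probs):
    """Exact expectation over all 2^K independent scheduling outcomes."""
    total = 0.0
    for mask in itertools.product([0, 1], repeat=len(updates)):
        p_mask = 1.0
        for k, m in enumerate(mask):
            p_mask *= probs[k] if m else 1.0 - probs[k]
        scheduled = [k for k, m in enumerate(mask) if m]
        total += p_mask * po_aggregate(updates, weights, probs, scheduled)
    return total


updates = [1.0, -2.0, 3.0]
weights = [0.5, 0.3, 0.2]
probs = [0.9, 0.5, 0.2]
# The expectation matches the full-participation aggregate sum(w_k * u_k).
print(expected_aggregate(updates, weights, probs))
```

Rarely scheduled devices get large scale factors (1 / p_k). This is why the variance term in the convergence bound depends on the scheduling probabilities.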

# Adversarial defense based on distribution transfer ( http://arxiv.org/abs/2311.13841v1 )

License: see link

Jiahao Chen, Diqun Yan, Li Dong

The presence of adversarial examples poses a significant threat to deep learning models and their applications. Existing defense methods provide certain resilience against adversarial examples, but often suffer from decreased accuracy and generalization performance, making it challenging to achieve a trade-off between robustness and generalization. To address this, our paper interprets the adversarial example problem from the perspective of sample distribution and proposes a defense method based on distribution shift, leveraging the distribution transfer capability of a diffusion model for adversarial defense. The core idea is to exploit the discrepancy between normal and adversarial sample distributions to achieve adversarial defense using a pretrained diffusion model. Specifically, an adversarial sample undergoes a forward diffusion process, moving away from the source distribution, followed by a reverse process guided by the protected model (victim model) output to map it back to the normal distribution. Experimental evaluations on CIFAR10 and ImageNet30 datasets are conducted, comparing with adversarial training and input preprocessing methods. For $L_\infty$-norm attacks with 8/255 perturbation, accuracy rates of 78.1% and 83.5% are achieved on CIFAR10 and ImageNet30, respectively. For $L_2$-norm attacks with 128/255 perturbation, accuracy rates are 74.3% and 82.5%, respectively. Additional experiments considering perturbation amplitude, diffusion iterations, and adaptive attacks also validate the effectiveness of the proposed method. Results demonstrate that even when the attacker has knowledge of the defense, the proposed distribution-based method effectively withstands adversarial examples. It fills the gaps of traditional approaches, restoring high-quality original samples and showcasing superior performance in model robustness and generalization.
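
The forward step of this defense pushes the (possibly adversarial) input toward Gaussian noise. For a DDPM-style model this has the closed form x_t = sqrt(a_t) x_0 + sqrt(1 - a_t) eps, where a_t is the cumulative product of (1 - beta_s). The following is a minimal sketch assuming the linear beta schedule from the original DDPM paper (beta from 1e-4 to 0.02, T = 1000, 0-indexed here). The paper's actual schedule and diffusion depth are not stated above. The victim-guided reverse process needs a pretrained denoiser and is not sketched.

```python
import math
import random


def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng=random):
    """Sample x_t ~ N(sqrt(a_t) * x_0, (1 - a_t) I) for a flat list of pixel values."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


alpha_bars = linear_alpha_bars()
x_adv = [0.1, -0.3, 0.7]  # hypothetical flattened adversarial input
# A moderate t washes out small perturbations while keeping coarse structure.
x_t = forward_diffuse(x_adv, 100, alpha_bars, random.Random(0))
```

The choice of t trades off robustness against fidelity. A larger t removes more of the perturbation but leaves the reverse process more content to reconstruct.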

Translated: 2024-03-25 13:16:38, Published: 2023-11-23

# StockFormer: A Swing Trading Strategy Based on STL Decomposition and Self-Attention Networks ( http://arxiv.org/abs/2401.06139v1 )

License: see link

Bohan Ma, Yiheng Wang, Yuchao Lu, Tianzixuan Hu, Jinling Xu, Patrick Houlihan

Amidst ongoing market recalibration and increasing investor optimism, the U.S. stock market is experiencing a resurgence, prompting the need for sophisticated tools to protect and grow portfolios. Addressing this, we introduce "Stockformer," a cutting-edge deep learning framework optimized for swing trading, featuring the TopKDropout method for enhanced stock selection. By integrating STL decomposition and self-attention networks, Stockformer utilizes the S&P 500's complex data to refine stock return predictions. Our methodology entailed segmenting data for training and validation (January 2021 to January 2023) and testing (February to June 2023). During testing, Stockformer's predictions outperformed ten industry models, achieving superior precision in key predictive accuracy indicators (MAE, RMSE, MAPE), with a remarkable accuracy rate of 62.39% in detecting market trends. In our backtests, Stockformer's swing trading strategy yielded a cumulative return of 13.19% and an annualized return of 30.80%, significantly surpassing current state-of-the-art models. Stockformer has emerged as a beacon of innovation in these volatile times, offering investors a potent tool for market forecasting. To advance the field and foster community collaboration, we have open-sourced Stockformer, available at https://github.com/Eric991005/Stockformer.

Translated: 2024-01-22 13:03:39, Published: 2023-11-23

# Federated Learning for Short Text Clustering ( http://arxiv.org/abs/2312.07556v1 )

License: see link

Mengling Hu, Chaochao Chen, Weiming Liu, Xinting Liao, and Xiaolin Zheng

Short text clustering has been widely studied for its significance in mining valuable insights from many short texts. In this paper, we focus on the federated short text clustering (FSTC) problem, i.e., clustering short texts that are distributed across different clients, which is a realistic problem under privacy requirements. Compared with the centralized short text clustering problem, where short texts are stored on a central server, the FSTC problem has not been explored yet. To fill this gap, we propose a Federated Robust Short Text Clustering (FSTC) framework. FSTC includes two main modules, i.e., a robust short text clustering module and a federated cluster center aggregation module. The robust short text clustering module aims to train an effective short text clustering model with local data in each client. We combine optimal transport, which generates pseudo-labels, with a Gaussian-uniform mixture model to ensure the reliability of the pseudo-supervised data. The federated cluster center aggregation module aims to exchange knowledge across clients efficiently without sharing local raw data. The server aggregates the local cluster centers from different clients and then sends the global centers back to all clients in each communication round. Our empirical studies on three short text clustering datasets demonstrate that FSTC significantly outperforms the federated short text clustering baselines.
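
Optimal-transport pseudo-labelling is commonly implemented with Sinkhorn-Knopp iterations. These rescale the model's cluster probabilities so that every cluster receives an equal share of the samples, which prevents all texts from collapsing into one cluster. The following is a minimal sketch of that generic technique. It is not FSTC's exact formulation, and the Gaussian-uniform mixture step is not shown. The uniform cluster marginal and the epsilon sharpening parameter are assumptions.

```python
def sinkhorn_pseudo_labels(probs, n_iters=50, epsilon=0.5):
    """Balanced soft assignments via Sinkhorn-Knopp.

    probs: N x K matrix (list of lists) of model-predicted cluster probabilities.
    Returns Q (N x K) whose rows each sum to 1 and whose columns each sum to
    roughly N / K, i.e. every cluster receives about the same total mass.
    Smaller epsilon sharpens the assignments but makes convergence slower.
    """
    n, k = len(probs), len(probs[0])
    q = [[max(p, 1e-6) ** (1.0 / epsilon) for p in row] for row in probs]
    for _ in range(n_iters):
        # Column step: scale each cluster's total mass to n / k.
        col = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] * (n / k) / col[j] for j in range(k)] for i in range(n)]
        # Row step: each sample's assignment distribution sums to 1.
        q = [[v / sum(row) for v in row] for row in q]
    return q


# Every sample prefers cluster 0, but the balanced transport plan hands the two
# least-confident samples to cluster 1.
probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
q = sinkhorn_pseudo_labels(probs)
labels = [max(range(2), key=lambda j: row[j]) for row in q]
```

Hard pseudo-labels are taken as the argmax of each row of Q. Soft rows can also be used directly as training targets.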

Translated: 2024-01-15 14:36:32, Published: 2023-11-23

# Rediscovering Ranganathan: A Prismatic View of His Life through the Knowledge Graph Spectrum ( http://arxiv.org/abs/2401.03343v1 )

License: see link

B. Dutta and S. Arzoo

The present study puts forward a novel biographical knowledge graph (KG) on Prof. S. R. Ranganathan, one of the pioneering figures in the Library and Information Science (LIS) domain. It has been found that most of the relevant facts about Ranganathan exist in a variety of resources (e.g., books, essays, journal articles, websites, blogs, etc.), offering information in a fragmented and piecemeal way. With this dedicated KG (henceforth known as RKG), we hope to furnish a 360-degree view of his life and achievements. To the best of our knowledge, such a dedicated representation is unparalleled in its scope and coverage: using state-of-the-art technology for anyone to openly access, use/re-use, and contribute. Inspired by Ranganathan's theories and ideas, the KG was developed using a "facet-based methodology" at two levels: in the identification of the vital biographical aspects and the development of the ontological model. Finally, with this study, we call for a community-driven effort to enhance the KG and pay homage to the Father of Library Science on the hundredth anniversary of his revitalizing the LIS domain through his enduring participation.

Translated: 2024-01-15 09:19:36, Published: 2023-11-23

# Classifying cow stall numbers using YOLO ( http://arxiv.org/abs/2401.03340v1 )

License: see link

Dheeraj Vajjarapu

This paper introduces the CowStallNumbers dataset, a collection of images extracted from videos focusing on cow teats, designed to advance the field of cow stall number detection. The dataset comprises 1042 training images and 261 test images, featuring stall numbers ranging from 0 to 60. To enhance the dataset, we performed fine-tuning on a YOLO model and applied data augmentation techniques, including random crop, center crop, and random rotation. The experimental outcomes demonstrate a notable 95.4% accuracy in recognizing stall numbers.

Translated: 2024-01-15 09:19:13, Published: 2023-11-23

# Presentation Attack Detection using Convolutional Neural Networks and Local Binary Patterns ( http://arxiv.org/abs/2312.00041v1 )

License: see link

Justin Spencer, Deborah Lawrence, Prosenjit Chatterjee, Kaushik Roy, Albert Esterline, and Jung-Hee Kim

The use of biometrics to authenticate users and control access to secure areas has become extremely popular in recent years, and biometric access control systems are frequently used by both governments and private corporations. However, these systems may represent risks to security when deployed without considering the possibility of biometric presentation attacks (also known as spoofing). Presentation attacks are a serious threat because they do not require significant time, expense, or skill to carry out while remaining effective against many biometric systems in use today. This research compares three different software-based methods for facial and iris presentation attack detection in images. The first method uses Inception-v3, a pre-trained deep Convolutional Neural Network (CNN) made by Google for the ImageNet challenge, which is retrained for this problem. The second uses a shallow CNN based on a modified Spoofnet architecture, which is trained normally. The third is a texture-based method using Local Binary Patterns (LBP). The datasets used are the ATVS-FIr dataset, which contains real and fake iris images, and the CASIA Face Anti-Spoofing Dataset, which contains real images as well as warped photos, cut photos, and video replay presentation attacks. We also present a third set of results, based on cropped versions of the CASIA images.
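
The texture-based method rests on Local Binary Patterns (LBP). Each pixel is encoded by thresholding its 8 neighbours against it, and the histogram of codes becomes the feature vector. The following is a minimal sketch of the basic 8-neighbour, radius-1 operator with no uniform-pattern or rotation-invariant mapping. The exact LBP variant, radius, and downstream classifier used in the paper are not given in the abstract.

```python
def lbp_code(img, r, c):
    """8-neighbour LBP code of pixel (r, c): bit i is 1 if neighbour i >= centre."""
    center = img[r][c]
    # Neighbours clockwise, starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels (the texture feature)."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 6, 3]]
# Neighbours >= 6 are 9 (bit 1), 7 (bit 3) and 6 (bit 5): code = 2 + 8 + 32.
print(lbp_code(patch, 1, 1))
```

Spoofed media such as printed photos or screen replays tend to shift this histogram, for example through moiré or print texture. The histogram can then be passed to any standard classifier.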

Translated: 2023-12-11 03:58:41, Published: 2023-11-23

# Presentation Attack detection using Wavelet Transform and Deep Residual Neural Net ( http://arxiv.org/abs/2312.00040v1 )

License: see link

Prosenjit Chatterjee, Alex Yalchin, Joseph Shelton, Kaushik Roy, Xiaohong Yuan, and Kossi D. Edoh

Biometric authentication is becoming more prevalent in secure authentication systems. However, biometric traits can be spoofed by impostors in several ways. Among impostor attacks, print attacks, mask attacks, and replay attacks fall under the presentation attack category. Biometric images, especially of the iris and face, are vulnerable to different presentation attacks. This research applies deep learning approaches to mitigate presentation attacks in a biometric access control system. Our contribution in this paper is two-fold: First, we applied the wavelet transform to extract features from the biometric images. Second, we modified a deep residual neural net and applied it to the spoof datasets in an attempt to detect presentation attacks. This research applied the proposed approach to biometric spoof datasets, namely ATVS, CASIA two class, and CASIA cropped image sets. The datasets used in this research contain images captured in both controlled and uncontrolled environments, at different resolutions and sizes. We obtained the best accuracy of 93% on the ATVS Iris datasets. For CASIA two class and CASIA cropped datasets, we achieved test accuracies of 91% and 82%, respectively.
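
The abstract does not say which wavelet family or decomposition depth was used for feature extraction. The following is a minimal sketch assuming a single level of the orthonormal 2-D Haar transform, the simplest choice. It splits an image into an approximation sub-band (LL) and three detail sub-bands, which could then be stacked as input channels for the residual network. Sub-band naming (LH vs. HL) varies between libraries.

```python
def haar2d(img):
    """One level of the orthonormal 2-D Haar transform (even height and width).

    Returns the four sub-bands (LL, LH, HL, HH), each half the input size.
    """
    h, w = len(img), len(img[0])
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, h, 2):
        rows = ([], [], [], [])
        for c in range(0, w, 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            rows[0].append((a + b + d + e) / 2)  # LL: scaled local average
            rows[1].append((a + b - d - e) / 2)  # LH: top minus bottom (horizontal edges)
            rows[2].append((a - b + d - e) / 2)  # HL: left minus right (vertical edges)
            rows[3].append((a - b - d + e) / 2)  # HH: diagonal detail
        for band, row in zip((ll, lh, hl, hh), rows):
            band.append(row)
    return ll, lh, hl, hh


bands = haar2d([[1, 2], [3, 4]])
```

Because the transform is orthonormal, total energy (the sum of squared values) is preserved across the four sub-bands. Applying `haar2d` again to LL yields a second decomposition level.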

Translated: 2023-12-11 03:58:01, Published: 2023-11-23

# Acoustic Cybersecurity: Exploiting Voice-Activated Systems ( http://arxiv.org/abs/2312.00039v1 )

License: see link

Forrest McKee and David Noever

In this study, we investigate the emerging threat of inaudible acoustic attacks targeting digital voice assistants, a critical concern given their projected prevalence to exceed the global population by 2024. Our research extends the feasibility of these attacks across various platforms like Amazon's Alexa, Android, iOS, and Cortana, revealing significant vulnerabilities in smart devices. The twelve attack vectors identified include successful manipulation of smart home devices and automotive systems, potential breaches in military communication, and challenges in critical infrastructure security. We quantitatively show that attack success rates hover around 60%, with the ability to activate devices remotely from over 100 feet away. Additionally, these attacks threaten critical infrastructure, emphasizing the need for multifaceted defensive strategies combining acoustic shielding, advanced signal processing, machine learning, and robust user authentication to mitigate these risks.

Translated: 2023-12-11 03:57:32, Published: 2023-11-23

# Continuous Authentication Using Mouse Clickstream Data Analysis ( http://arxiv.org/abs/2312.00802v1 )

License: see link

Sultan Almalki, Prosenjit Chatterjee, and Kaushik Roy

Biometrics is used to authenticate an individual based on physiological or behavioral traits. Mouse dynamics is an example of a behavioral biometric that can be used to perform continuous authentication as protection against security breaches. Recent research on mouse dynamics has shown promising results in identifying users; however, it has not yet reached an acceptable level of accuracy. In this paper, an empirical evaluation of different classification techniques is conducted on a mouse dynamics dataset, the Balabit Mouse Challenge dataset. User identification is carried out using three mouse actions: mouse move, point and click, and drag and drop. Verification and authentication methods are conducted using three machine-learning classifiers: the Decision Tree classifier, the K-Nearest Neighbors classifier, and the Random Forest classifier. The results show that the three classifiers can distinguish between a genuine user and an impostor with a relatively high degree of accuracy. In the verification mode, all the classifiers achieve a perfect accuracy of 100%. In authentication mode, all three classifiers achieved the highest accuracy (ACC) and Area Under Curve (AUC) from scenario B using the point and click action data: (Decision Tree ACC:87.6%, AUC:90.3%), (K-Nearest Neighbors ACC:99.3%, AUC:99.9%), and (Random Forest ACC:89.9%, AUC:92.5%).
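
Of the three classifiers, K-Nearest Neighbors scored highest in authentication mode. The following is a minimal sketch of the k-NN decision on per-action feature vectors. The two features (for example mean speed and mean acceleration of a mouse action), the toy data, and k = 3 are hypothetical. The Balabit feature extraction and the paper's hyperparameters are not described in the abstract.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training vectors (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D features per mouse action for one enrolled user and others.
train_X = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8], [5.0, 5.2], [4.8, 5.1], [5.3, 4.9]]
train_y = ["genuine"] * 3 + ["impostor"] * 3
print(knn_predict(train_X, train_y, [1.1, 0.9]))
```

With real features on different scales (pixels per ms vs. milliseconds), the features should be standardized first, because Euclidean distance is scale-sensitive.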

Translated: 2023-12-11 03:46:06, Published: 2023-11-23

# Cluster trajectory of SOFA score in predicting mortality in sepsis ( http://arxiv.org/abs/2311.17066v1 )

License: see link

Yuhe Ke, Matilda Swee Sun Tang, Celestine Jia Ling Loh, Hairil Rizal Abdullah, Nicholas Brian Shannon

Objective: Sepsis is a life-threatening condition. Sequential Organ Failure Assessment (SOFA) score is commonly used to assess organ dysfunction and predict ICU mortality, but it is taken as a static measurement and fails to capture dynamic changes. This study aims to investigate the relationship between dynamic changes in SOFA scores over the first 72 hours of ICU admission and patient outcomes. Design, setting, and participants: 3,253 patients in the Medical Information Mart for Intensive Care IV database who met the sepsis-3 criteria and were admitted from the emergency department with at least 72 hours of ICU admission and full-active resuscitation status were analysed. Group-based trajectory modelling with dynamic time warping and k-means clustering identified distinct trajectory patterns in dynamic SOFA scores. They were subsequently compared using Python. Main outcome measures: Outcomes including hospital and ICU mortality, length of stay in hospital and ICU, and readmission during hospital stay, were collected. Discharge time from ICU to wards and cut-offs at 7-day and 14-day were taken. Results: Four clusters were identified: A (consistently low SOFA scores), B (rapid increase followed by a decline in SOFA scores), C (higher baseline scores with gradual improvement), and D (persistently elevated scores). Cluster D had the longest ICU and hospital stays, highest ICU and hospital mortality. Discharge rates from ICU were similar for Clusters A and B, while Cluster C had initially comparable rates but a slower transition to ward. Conclusion: Monitoring dynamic changes in SOFA score is valuable for assessing sepsis severity and treatment responsiveness.
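
Dynamic time warping (DTW) lets two SOFA trajectories be compared even when similar rises and declines happen at slightly different hours. The following is a minimal sketch of the DTW distance plus the k-means assignment step (each trajectory goes to its nearest centroid). It uses absolute difference as the local cost. How the authors updated centroids under DTW (for example DTW barycenter averaging), and their time resolution, are not stated, so the centroid update is omitted and the toy trajectories are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def assign_clusters(trajectories, centroids):
    """k-means assignment step: index of the DTW-nearest centroid per trajectory."""
    return [min(range(len(centroids)), key=lambda c: dtw_distance(t, centroids[c]))
            for t in trajectories]


# Hypothetical SOFA scores sampled over the first 72 hours.
low, high = [2, 2, 2, 2], [12, 12, 12, 12]
print(assign_clusters([[1, 2, 2, 3], [11, 13, 12, 12]], [low, high]))
```

Because DTW absorbs time shifts, an early spike and a late spike of the same shape can fall into the same cluster. That behaviour is desirable for this analysis only if timing is not itself prognostic.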

翻訳日:2023-12-03 13:08:34 公開日:2023-11-23
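
The abstract names dynamic time warping (DTW) as the distance used to group SOFA trajectories. The sketch below is a rough illustration only and is not the study's code. It implements classic DTW in plain Python, together with the nearest-prototype assignment step of a k-means-style loop. The toy trajectories and the `assign_to_prototypes` helper are hypothetical. A full DTW k-means would also need a DTW-aware averaging step (for example DBA), which is omitted here.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping with an absolute-difference local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def assign_to_prototypes(trajectories, prototypes):
    """Assignment step of a k-means-style loop: index of the DTW-nearest prototype."""
    return [
        min(range(len(prototypes)), key=lambda k: dtw_distance(t, prototypes[k]))
        for t in trajectories
    ]


# Hypothetical 6-point SOFA trajectories (not data from the study).
low = [2, 2, 3, 2, 2, 2]
spike = [2, 6, 8, 6, 3, 2]
delayed_spike = [2, 2, 6, 8, 6, 3]

print(dtw_distance(spike, delayed_spike))  # 1.0: same shape, shifted in time
print(assign_to_prototypes([delayed_spike, [2, 3, 2, 2, 2, 2]], [low, spike]))  # [1, 0]
```

DTW scores the spike and the delayed spike as nearly identical, which a point-by-point distance would not. That tolerance to time shifts is the usual motivation for using DTW on clinical trajectories.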

# 陽電子結合した分子ジアニオンの多体理論計算

Many-body theory calculations of positronic-bonded molecular dianions ( http://arxiv.org/abs/2311.16318v1 )

ライセンス: Link先を確認

J. P. Cassidy, J. Hofierka, B. Cunningham and D. G. Green

(参考訳) 陽電子-ジアニオン系 [A$^-;e^+;$A$^-$] のエネルギー安定性を多体理論によって研究する。ここで $A^-$ は H$^{-}$, F$^{-}$, Cl$^{-}$ および分子アニオン (CN)$^{-}$ と (NCO)$^{-}$ を含む。具体的には、[J. Hofierka, B. Cunningham, C. M. Rawlins, C. H. Patterson and D. G. Green, \emph{Nature} {\bf 606} 688 (2022)] で構築された陽電子-アニオン自己エネルギーを用いて、2つのアニオンの場における陽電子のダイソン方程式を解き、イオン間距離の関数として系のエネルギーを決定する。この自己エネルギーは、分極、遮蔽、仮想ポジトロニウム形成などの相関を考慮している。計算は H$_{2}^{2-}$, F$_{2}^{2-}$, Cl$_{2}^{2-}$ と相互作用する陽電子に対して行われ、以前の理論とよく一致することがわかった。特に、イオン間距離に関する [H$^-;e^+$;H$^-$] 系のポテンシャルエネルギーに2つの極小が存在することを確認した。1つはイオン間距離 $r\sim3.4$~\AA\phantom{} における陽電子結合した [H$^-;e^+$;H$^-$] の局所極小である。もう1つはより小さなイオン間距離 $r\lesssim1.6$~\AA\phantom{} における大域的極小であり、H$_2$ 分子とポジトロニウム負イオン Ps$^-$ への解離に対して系全体を不安定にする。分子アニオン断片からなるジアニオン、特に (CN)$_{2}^{2-}$ と (NCO)$_{2}^{2-}$ における陽電子結合について、初めての予測を行った。いずれの場合も、陽電子結合の形成によって生じる分子は A$^-$ と $e^+$A$^-$(1つのアニオンに束縛された陽電子)への解離に対して安定であり、結合エネルギーは1~eV程度、結合長は数オングストローム程度である。

The energetic stability of positron di-anion systems [A$^-;e^+;$A$^-$] is studied via many-body theory, where $A^-$ includes H$^{-}$, F$^{-}$, Cl$^{-}$ and the molecular anions (CN)$^{-}$ and (NCO)$^{-}$. Specifically, the energy of the system as a function of ionic separation is determined by solving the Dyson equation for the positron in the field of the two anions, using a positron-anion self energy as constructed in [J. Hofierka, B. Cunningham, C. M. Rawlins, C. H. Patterson and D. G. Green, \emph{Nature} {\bf 606} 688 (2022)] that accounts for correlations including polarization, screening, and virtual-positronium formation. Calculations are performed for a positron interacting with H$_{2}^{2-}$, F$_{2}^{2-}$, and Cl$_{2}^{2-}$, and are found to be in good agreement with previous theory. In particular, we confirm the presence of two minima in the potential energy of the [H$^-;e^+$;H$^-$] system with respect to ionic separation: one a positronically-bonded [H$^-;e^+$;H$^-$] local minimum at ionic separations $r\sim3.4$~\AA\phantom{}, and a global minimum at smaller ionic separations $r\lesssim1.6$~\AA\phantom{} that gives overall instability of the system with respect to dissociation into an H$_2$ molecule and a positronium negative ion, Ps$^-$. The first predictions are made for positronic bonding in dianions consisting of molecular anionic fragments, specifically for (CN)$_{2}^{2-}$, and (NCO)$_{2}^{2-}$. In all cases we find that the molecules formed by the creation of a positronic bond are stable relative to dissociation into A$^-$ and $e^+$A$^-$ (positron bound to a single anion), with bond energies on the order of 1~eV and bond lengths on the order of several ångströms.

翻訳日:2023-12-03 13:07:35 公開日:2023-11-23

# 拡散確率モデルを用いたアンサンブル多様性によるショートカットバイアス軽減

Shortcut Bias Mitigation via Ensemble Diversity Using Diffusion Probabilistic Models ( http://arxiv.org/abs/2311.16176v1 )

ライセンス: Link先を確認

Luca Scimeca, Alexander Rubinstein, Damien Teney, Seong Joon Oh, Armand Mihai Nicolicioiu, Yoshua Bengio

(参考訳) 複数の手がかりがターゲットラベルを予測するようなデータ中の擬似相関は、しばしば単純性バイアス(simplicity bias)と呼ばれる現象につながる。これは、モデルが信頼できる手がかりを無視し、誤った学習しやすい手がかりに依存する現象である。本研究では、拡散確率モデル(DPM)を用いたショートカットバイアス軽減のためのアンサンブル多様化フレームワークを提案する。特定の訓練区間において、DPMは相関した入力特徴を示す画像で訓練された場合でも、新しい特徴の組み合わせを持つ画像を生成できることを示す。この重要な特性を利用して合成反事実を生成し、アンサンブルの不一致によってモデルの多様性を高める。DPMに導かれた多様化は、追加の教師信号を必要とせずに、主要なショートカット手がかりへの依存を取り除くのに十分であることを示す。さらに、複数の多様化目的に対してその有効性を実証的に定量化する。最後に、補助データ収集に依存する先行研究と同等の汎化性能と多様化性能の向上を示す。

Spurious correlations in the data, where multiple cues are predictive of the target labels, often lead to a phenomenon known as simplicity bias, where a model relies on erroneous, easy-to-learn cues while ignoring reliable ones. In this work, we propose an ensemble diversification framework exploiting Diffusion Probabilistic Models (DPMs) for shortcut bias mitigation. We show that at particular training intervals, DPMs can generate images with novel feature combinations, even when trained on images displaying correlated input features. We leverage this crucial property to generate synthetic counterfactuals to increase model diversity via ensemble disagreement. We show that DPM-guided diversification is sufficient to remove dependence on primary shortcut cues, without a need for additional supervised signals. We further empirically quantify its efficacy on several diversification objectives, and finally show improved generalization and diversification performance on par with prior work that relies on auxiliary data collection.

翻訳日:2023-12-03 13:06:29 公開日:2023-11-23
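
The abstract does not spell out its ensemble-disagreement objective. The sketch below is a minimal example under stated assumptions: two binary classifiers and one simple agreement-penalizing form, -log(p1(1-p2) + (1-p1)p2), averaged over the synthetic counterfactual images. The loss is small when the two models assign the positive class with opposite confidence and large when they agree. The function name and the binary setting are illustrative assumptions, not the paper's formulation.

```python
import math


def disagreement_loss(p1, p2, eps=1e-12):
    """Mean of -log(P(models disagree)) for two binary classifiers.

    p1, p2: per-sample probabilities of class 1 from model 1 and model 2,
    evaluated on the same (e.g. synthetic counterfactual) inputs.
    """
    total = 0.0
    for a, b in zip(p1, p2):
        prob_disagree = a * (1.0 - b) + (1.0 - a) * b
        total += -math.log(prob_disagree + eps)
    return total / len(p1)


agree = disagreement_loss([0.9, 0.8], [0.9, 0.8])
disagree = disagreement_loss([0.9, 0.8], [0.1, 0.2])
print(agree > disagree)  # True: agreeing models are penalized more
```

In a diversification setup, this term would typically be added with a weight to each model's ordinary supervised loss on the labeled training data.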

# Point2RBox: 単一点教師によるエンドツーエンドの指向性物体検出のための合成視覚パターンからの知識の組み合わせ

Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision ( http://arxiv.org/abs/2311.14758v1 )

ライセンス: Link先を確認

Yu Yi, Xue Yang, Qingyun Li, Feipeng Da, Junchi Yan, Jifeng Dai, Yu Qiao

(参考訳) 指向性物体検出(OOD)の需要が急速に高まる中、水平ボックス(HBox)から回転ボックス(RBox)を学習する弱教師付き検出器に関する最近の研究が、ますます注目を集めている。本稿では、より困難だがラベル効率の高い設定、すなわち単一点教師付きOODを探求し、Point2RBoxと呼ぶアプローチを提案する。具体的には、2つの原則の活用を提案する。 1) 合成パターン知識の組み合わせ: 画像上の各ラベル付き点の周囲をサンプリングすることで、物体の特徴を既知のバウンディングボックスを持つ合成視覚パターンに転写し、ボックス回帰のための知識を提供する。 2) 変換自己教師: 変換された入力画像(例えば、スケール変更/回転)に対して、出力RBoxが同じ変換に従うように訓練する。これにより、ネットワークが物体間の相対的なサイズ/回転を知覚できるようにする。検出器は、周辺的な問題に対処するためのいくつかの工夫された技術によってさらに強化されている。例えば、点教師の設定では物体のサイズが得られないため、アンカー/層の割り当てが問題となる。我々の知る限り、Point2RBoxは点教師付きOODのための最初のエンドツーエンドの手法である。特に、本手法は軽量なパラダイムを用いながらも、点教師付きの代替手法の中で競争力のある性能を達成し、DOTA/DIOR/HRSCデータセットでそれぞれ41.05%/27.62%/80.01%を記録した。

With the rapidly increasing demand for oriented object detection (OOD), recent research involving weakly-supervised detectors for learning rotated boxes (RBox) from horizontal boxes (HBox) has attracted more and more attention. In this paper, we explore a more challenging yet label-efficient setting, namely single point-supervised OOD, and present our approach called Point2RBox. Specifically, we propose to leverage two principles: 1) Synthetic pattern knowledge combination: By sampling around each labelled point on the image, we transfer the object feature to synthetic visual patterns with the known bounding box to provide the knowledge for box regression. 2) Transform self-supervision: With a transformed input image (e.g. scaled/rotated), the output RBoxes are trained to follow the same transformation so that the network can perceive the relative size/rotation between objects. The detector is further enhanced by a few devised techniques to cope with peripheral issues, e.g. the anchor/layer assignment as the size of the object is not available in our point supervision setting. To the best of our knowledge, Point2RBox is the first end-to-end solution for point-supervised OOD. In particular, our method uses a lightweight paradigm, yet it achieves competitive performance among point-supervised alternatives: 41.05%/27.62%/80.01% on the DOTA/DIOR/HRSC datasets.

翻訳日:2023-11-30 09:52:37 公開日:2023-11-23
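
For the transform self-supervision principle, one way to express "the output RBoxes follow the same transformation" under rotation is an angle-consistency loss. This is a minimal sketch and not Point2RBox's actual loss. It assumes angles in radians and a counterclockwise rotation convention shared by the image and the boxes. It also treats rectangle angles as equivalent modulo π and ignores the width/height swap at π/2.

```python
import math


def angle_diff_mod_pi(a, b):
    """Smallest absolute difference between two box angles, equivalent modulo pi."""
    d = (a - b) % math.pi
    return min(d, math.pi - d)


def rotation_consistency_loss(theta_orig, theta_rot, applied_rotation):
    """Mean angular error between predictions on the rotated view and
    predictions on the original view shifted by the applied rotation."""
    errors = [
        angle_diff_mod_pi(t_rot, t_orig + applied_rotation)
        for t_orig, t_rot in zip(theta_orig, theta_rot)
    ]
    return sum(errors) / len(errors)


# Predictions that follow a 30-degree rotation exactly give zero loss.
rot = math.radians(30)
print(rotation_consistency_loss([0.1, 1.0], [0.1 + rot, 1.0 + rot], rot))  # 0.0
```

The modulo-π wrap matters because a box predicted at 0.01 rad and one at π - 0.01 rad describe almost the same rectangle. A plain absolute difference would wrongly penalize that pair by nearly π.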

# PointOBB: 単一点教師による指向性物体検出の学習

PointOBB: Learning Oriented Object Detection via Single Point Supervision ( http://arxiv.org/abs/2311.14757v1 )

ライセンス: Link先を確認

Junwei Luo, Xue Yang, Yi Yu, Qingyun Li, Junchi Yan, Yansheng Li

(参考訳) 単一点教師付き物体検出は、そのコスト効率の高さから注目を集めている。しかし、既存のアプローチは水平バウンディングボックス(HBB)の生成に焦点を当てており、航空画像中の物体に一般的に用いられる指向性バウンディングボックス(OBB)を無視している。本稿では、指向性物体検出のための最初の単一点ベースのOBB生成手法であるPointOBBを提案する。PointOBBは、元のビュー、リサイズビュー、回転/反転(rot/flp)ビューという3つの異なるビューを協調的に利用することで動作する。元のビューに基づき、リサイズビューとrot/flpビューを利用して、それぞれスケール拡張モジュールと角度取得モジュールを構築する。前者のモジュールでは、物体のスケールを知覚する深層ネットワークの能力を高めるために、Scale-Sensitive Consistency (SSC)損失を設計する。正確な物体角度予測のために、後者のモジュールは自己教師付き学習によって角度を予測する。これを、疎な物体に対応する密な角度を集約するスケール誘導Dense-to-Sparse (DS)マッチング戦略と組み合わせる。リサイズビューとrot/flpビューは、訓練中に段階的なマルチビュー切り替え戦略によって切り替えられ、スケールと角度の連成最適化を実現する。DIOR-RとDOTA-v1.0データセットでの実験結果は、PointOBBが有望な性能を達成し、潜在的な点教師付きベースラインを大幅に上回ることを示している。

Single point-supervised object detection is gaining attention due to its cost-effectiveness. However, existing approaches focus on generating horizontal bounding boxes (HBBs) while ignoring oriented bounding boxes (OBBs) commonly used for objects in aerial images. This paper proposes PointOBB, the first single Point-based OBB generation method, for oriented object detection. PointOBB operates through the collaborative utilization of three distinctive views: an original view, a resized view, and a rotated/flipped (rot/flp) view. Upon the original view, we leverage the resized and rot/flp views to build a scale augmentation module and an angle acquisition module, respectively. In the former module, a Scale-Sensitive Consistency (SSC) loss is designed to enhance the deep network's ability to perceive the object scale. For accurate object angle predictions, the latter module incorporates self-supervised learning to predict angles, which is associated with a scale-guided Dense-to-Sparse (DS) matching strategy for aggregating dense angles corresponding to sparse objects. The resized and rot/flp views are switched using a progressive multi-view switching strategy during training to achieve coupled optimization of scale and angle. Experimental results on the DIOR-R and DOTA-v1.0 datasets demonstrate that PointOBB achieves promising performance, and significantly outperforms potential point-supervised baselines.

翻訳日:2023-11-30 09:52:09 公開日:2023-11-23

# タスク分布的にロバストなデータフリーメタ学習

Task-Distributionally Robust Data-Free Meta-Learning ( http://arxiv.org/abs/2311.14756v1 )

ライセンス: Link先を確認

Zixuan Hu, Li Shen, Zhenyi Wang, Yongxian Wei, Baoyuan Wu, Chun Yuan, Dacheng Tao

(参考訳) データフリーメタ学習(DFML)は、元の訓練データを必要とせずに、複数の事前学習済みモデルを活用することで新しいタスクを効率的に学習することを目的としている。既存の逆変換ベースのDFML手法は、事前学習済みモデルプールから逆生成される学習可能なデータセットから擬似タスクを構築する。本研究では、その実用化を妨げる2つの大きな課題、タスク分布シフト(TDS)とタスク分布破損(TDC)を初めて明らかにする。TDSは、新たに生成されたタスクへとタスク分布が偏ることで、バイアスのかかったメタ学習器をもたらす。TDCは、誤解を招くラベルや低品質を特徴とする信頼できないモデルがタスク分布を汚染する場合に発生する。これらの課題に対処するため、タスク分布的な頑健性を保証する頑健なDFMLフレームワークを導入する。具体的には、コンパクトなタスクメモリバッファ内でのタスク補間によって多様化された擬似タスク分布からメタ学習することを提案する。このアプローチは、より広範な補間メモリタスクにわたって一貫した性能を維持することで、新たに生成されたタスクへのメタ学習器の過度な依存を減らし、未知のタスクへの汎化を確保する。さらに、本フレームワークは自動モデル選択機構をメタ訓練フェーズにシームレスに組み込み、各モデルの信頼性を学習可能な重みとしてパラメータ化する。これは強化学習に着想を得た方策勾配アルゴリズムで最適化され、モデル選択がもたらす微分不可能性の課題に効果的に対処する。様々なデータセットにわたる包括的な実験により、TDSとTDCの緩和における本フレームワークの有効性が示され、実世界のシナリオでDFMLを改善する可能性が強調される。

Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data. Existing inversion-based DFML methods construct pseudo tasks from a learnable dataset, which is inversely generated from the pre-trained model pool. For the first time, we reveal two major challenges hindering their practical deployments: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC). TDS leads to a biased meta-learner because of the skewed task distribution towards newly generated tasks. TDC occurs when untrusted models characterized by misleading labels or poor quality pollute the task distribution. To tackle these issues, we introduce a robust DFML framework that ensures task distributional robustness. We propose to meta-learn from a pseudo task distribution, diversified through task interpolation within a compact task-memory buffer. This approach reduces the meta-learner's overreliance on newly generated tasks by maintaining consistent performance across a broader range of interpolated memory tasks, thus ensuring its generalization for unseen tasks. Additionally, our framework seamlessly incorporates an automated model selection mechanism into the meta-training phase, parameterizing each model's reliability as a learnable weight. This is optimized with a policy gradient algorithm inspired by reinforcement learning, effectively addressing the non-differentiable challenge posed by model selection. Comprehensive experiments across various datasets demonstrate the framework's effectiveness in mitigating TDS and TDC, underscoring its potential to improve DFML in real-world scenarios.

翻訳日:2023-11-30 09:51:41 公開日:2023-11-23
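
The abstract optimizes per-model reliability weights with a policy-gradient algorithm, because choosing which models to use is non-differentiable. Below is a minimal REINFORCE sketch built on several assumptions. Each pre-trained model is kept or dropped by an independent Bernoulli draw with probability sigmoid(w_i). A moving-average baseline reduces variance. A hypothetical `toy_reward` function stands in for meta-learner performance. The paper's actual reward, parameterization and schedule are not given here.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reinforce_model_selection(reward_fn, n_models, steps=2000, lr=0.1, seed=0):
    """Learn per-model selection probabilities with REINFORCE.

    reward_fn: maps a 0/1 selection vector to a scalar reward (hypothetical
    stand-in for meta-learner performance when trained on the selected models).
    """
    rng = random.Random(seed)
    logits = [0.0] * n_models   # learnable reliability weights
    baseline = 0.0              # moving-average baseline reduces gradient variance
    for _ in range(steps):
        probs = [sigmoid(w) for w in logits]
        actions = [1 if rng.random() < p else 0 for p in probs]
        reward = reward_fn(actions)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        for i in range(n_models):
            # d/dw_i log Bernoulli(a_i; sigmoid(w_i)) = a_i - sigmoid(w_i)
            logits[i] += lr * advantage * (actions[i] - probs[i])
    return [sigmoid(w) for w in logits]


# Toy setup: models 0-2 are trustworthy, models 3-4 pollute the task distribution.
def toy_reward(actions):
    return sum(actions[:3]) - sum(actions[3:])


probs = reinforce_model_selection(toy_reward, n_models=5)
print([round(p, 2) for p in probs])
```

Selection probabilities drift toward 1 for models whose inclusion raises the reward and toward 0 for the others. No gradient ever flows through the discrete selection itself.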

# ExCeL: 分布外検出を強化するための極値ロジット情報と集団ロジット情報の組み合わせ

ExCeL : Combined Extreme and Collective Logit Information for Enhancing Out-of-Distribution Detection ( http://arxiv.org/abs/2311.14754v1 )

ライセンス: Link先を確認

Naveen Karunanayake, Suranga Seneviratne, Sanjay Chawla

(参考訳) 深層学習モデルは分布外(OOD)データの予測においてしばしば過信を示しており、予測の信頼性を確保する上でOOD検出が重要な役割を果たすことを浮き彫りにしている。様々なOOD検出手法の中で、ポストホック検出器は主に使いやすさと実装の容易さから大きな人気を集めている。しかし、ほとんどのポストホックOOD検出器は、最大ロジットのような極端な情報か、出力層に埋め込まれた集団的情報(すなわち、クラスや訓練サンプルにまたがる情報)のいずれか一方のみに依存しているため、その有効性は限られている。本稿では、出力層内の極端な情報と集団的情報の両方を組み合わせてOOD検出の精度を高めるExCeLを提案する。最上位予測クラスのロジットを極端な情報(すなわち最大ロジット)として利用する。一方で集団的情報は、様々な訓練サンプルにわたって他のクラスが後続の順位に現れる可能性を評価するという新しいアプローチで導出する。この着想は、分布内(ID)データでは、予測クラス以降のクラスの順位がOODデータに比べてより決定論的であるという観察に基づいている。CIFAR100とImageNet-200データセットでの実験により、near-OODとfar-OODの合同性能(AUROCとFPR95の観点)を考慮した場合、ExCeLは既存の21のポストホックベースラインと比較して一貫して上位5手法に入ることが示された。さらにExCeLは、一方のデータセットでは最良だが他方で性能が低下する他のベースラインとは異なり、両データセットで最高の総合性能を示す。

Deep learning models often exhibit overconfidence in predicting out-of-distribution (OOD) data, underscoring the crucial role of OOD detection in ensuring reliability in predictions. Among various OOD detection approaches, post-hoc detectors have gained significant popularity, primarily due to their ease of use and implementation. However, the effectiveness of most post-hoc OOD detectors has been constrained as they rely solely either on extreme information, such as the maximum logit, or on the collective information (i.e., information spanned across classes or training samples) embedded within the output layer. In this paper, we propose ExCeL that combines both extreme and collective information within the output layer for enhanced accuracy in OOD detection. We leverage the logit of the top predicted class as the extreme information (i.e., the maximum logit), while the collective information is derived in a novel approach that involves assessing the likelihood of other classes appearing in subsequent ranks across various training samples. Our idea is motivated by the observation that, for in-distribution (ID) data, the ranking of classes beyond the predicted class is more deterministic compared to that in OOD data. Experiments conducted on CIFAR100 and ImageNet-200 datasets demonstrate that ExCeL is consistently among the five top-performing methods out of twenty-one existing post-hoc baselines when the joint performance on near-OOD and far-OOD is considered (i.e., in terms of AUROC and FPR95). Furthermore, ExCeL shows the best overall performance across both datasets, unlike other baselines that work best on one dataset but show a performance drop on the other.

翻訳日:2023-11-30 09:51:14 公開日:2023-11-23
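
The ExCeL abstract combines the maximum logit with rank statistics collected over training samples, but it does not give the scoring formula. The sketch below is a hypothetical illustration of that idea, not ExCeL's actual score. It estimates, from training logits, how likely each class is to appear at rank 2 and rank 3 given the top-1 class. It then adds the mean log-probability of a test sample's observed ranks to its maximum logit. The helper names, `n_ranks`, the add-one smoothing and the weight `lam` are all assumptions.

```python
import numpy as np


def fit_rank_table(train_logits, n_ranks=2):
    """Estimate P(class j at rank r+2 | top-1 class c) from training logits.

    Returns an array of shape (n_classes, n_ranks, n_classes); add-one
    smoothing keeps unseen rank patterns from getting probability zero.
    """
    n_classes = train_logits.shape[1]
    counts = np.ones((n_classes, n_ranks, n_classes))
    order = np.argsort(-train_logits, axis=1)  # classes sorted by logit, descending
    for row in order:
        top = row[0]
        for r in range(n_ranks):
            counts[top, r, row[r + 1]] += 1
    return counts / counts.sum(axis=2, keepdims=True)


def combined_score(logits, table, lam=1.0):
    """Max logit (extreme info) plus mean log-probability of the observed
    follow-up ranks (collective info). Higher means more in-distribution."""
    order = np.argsort(-logits, axis=1)
    n_ranks = table.shape[1]
    scores = []
    for z, row in zip(logits, order):
        top = row[0]
        collective = np.mean([np.log(table[top, r, row[r + 1]]) for r in range(n_ranks)])
        scores.append(z[top] + lam * collective)
    return np.array(scores)


# Toy training logits: class 0 is always followed by class 1, then class 2.
train = np.tile([5.0, 3.0, 1.0, 0.0], (10, 1))
table = fit_rank_table(train)
test = np.array([[5.0, 3.0, 1.0, 0.0],    # familiar ranking
                 [5.0, 0.0, 1.0, 3.0]])   # same max logit, unusual ranking
print(combined_score(test, table))
```

Both test samples have the same maximum logit, so a max-logit detector cannot separate them. The rank term gives the sample with the unfamiliar class ordering a lower score.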

# 一般化ゼロショット学習のための属性認識型表現修正

Attribute-Aware Representation Rectification for Generalized Zero-Shot Learning ( http://arxiv.org/abs/2311.14750v1 )

ライセンス: Link先を確認

Zhijie Rao, Jingcai Guo, Xiaocheng Lu, Qihua Zhou, Jie Zhang, Kang Wei, Chenxin Li, Song Guo

(参考訳) 一般化ゼロショット学習(GZSL)は、一連の偏りのない視覚-意味マッピングを設計することで目覚ましい性能を達成しているが、その精度は既知クラスと未知クラスの両方から抽出された視覚特徴の完全性に大きく依存している。しかしながら、GZSLにおける一般的な慣行として、事前訓練された特徴抽出器は、下流のタスク/データセットのドメイン固有の特性を捉えて細粒度の識別特徴を提供することが難しい(すなわちドメインバイアス)。これは、特に未知クラスにおいて全体の認識性能を妨げる。最近の研究では特徴抽出器を微調整することでこの問題に部分的に対処しているが、破滅的忘却や過学習の問題を必然的に招く可能性がある。本稿では、GZSLのための簡潔かつ効果的な属性認識表現修正フレームワーク $\mathbf{(AR)^{2}}$ を提案する。これは、元の有用な特徴を維持しながら新しい特徴を学習するよう、特徴抽出器を適応的に修正するものである。具体的には、本手法はUnseen-Aware Distillation (UAD) とAttribute-Guided Learning (AGL) の2つの主要要素から構成される。訓練中、UADは既知/未知クラスの両方で共有される属性テキストの事前知識を注意機構とともに利用して、未知クラスに敏感な視覚特徴を的を絞って検出・維持する。同時に、AGLは属性誘導表現学習によって、モデルが有用な特徴に焦点を合わせ、既知クラスのノイズ要素への適合を抑制することを目指す。様々なベンチマークデータセットでの大規模な実験により、本手法の有効性が示された。

Generalized Zero-shot Learning (GZSL) has yielded remarkable performance by designing a series of unbiased visual-semantics mappings, wherein the precision relies heavily on the completeness of extracted visual features from both seen and unseen classes. However, as a common practice in GZSL, the pre-trained feature extractor may easily exhibit difficulty in capturing domain-specific traits of the downstream tasks/datasets to provide fine-grained discriminative features, i.e., domain bias, which hinders the overall recognition performance, especially for unseen classes. Recent studies partially address this issue by fine-tuning feature extractors, but may inevitably incur catastrophic forgetting and overfitting issues. In this paper, we propose a simple yet effective Attribute-Aware Representation Rectification framework for GZSL, dubbed $\mathbf{(AR)^{2}}$, to adaptively rectify the feature extractor to learn novel features while keeping original valuable features. Specifically, our method consists of two key components, i.e., Unseen-Aware Distillation (UAD) and Attribute-Guided Learning (AGL). During training, UAD exploits the prior knowledge of attribute texts that are shared by both seen/unseen classes with attention mechanisms to detect and maintain unseen class-sensitive visual features in a targeted manner, and meanwhile, AGL aims to steer the model to focus on valuable features and to suppress its fitting of noisy elements in the seen classes by attribute-guided representation learning. Extensive experiments on various benchmark datasets demonstrate the effectiveness of our method.

翻訳日:2023-11-30 09:50:37 公開日:2023-11-23

# 段階的な言語ベースの観察による合成ゼロショット学習

Compositional Zero-shot Learning via Progressive Language-based Observations ( http://arxiv.org/abs/2311.14749v1 )

ライセンス: Link先を確認

Lin Li, Guikun Chen, Jun Xiao, Long Chen

(参考訳) 合成ゼロショット学習は、訓練中に既知のプリミティブ(状態と物体)を活用することで、未知の状態-物体の組み合わせを認識することを目的としている。しかし、プリミティブ間の相互作用を効果的にモデル化し、新しい組み合わせに知識を汎化することは長年の課題である。これには2つの重要な要因がある。物体条件付き分散と状態条件付き分散、すなわち、状態(または物体)の見た目は、異なる物体(または状態)と組み合わせると大きく変化しうる。例えば、状態"old"は、"car"に対してはヴィンテージのデザインを、"cat"に対しては高齢を表しうる。本稿では、事前に観察されたプリミティブに基づいて組み合わせカテゴリを予測することで、これらの分散を緩和できると主張する。そこで、プリミティブのより良い観察順序を動的に決定できるProgressive Language-based Observations (PLO)を提案する。これらの観察は、モデルが画像の内容を段階的に理解できるようにする一連の概念または言語から構成される。具体的には、PLOは事前学習済みの視覚言語モデル(VLM)を採用して、モデルに観察能力を持たせる。さらに2つの変種を考案する。 1) PLO-VLM: 事前観察分類器が2つのプリミティブの観察順序を動的に決定する2段階の手法。 2) PLO-LLM: 大規模言語モデル(LLM)を用いて、段階的な観察のための組み合わせ固有のプロンプトを作成する多段階の手法。3つの困難なデータセットでの大規模なアブレーションにより、最先端の手法と比較したPLOの優位性が示され、組み合わせ認識におけるその能力が確認された。

Compositional zero-shot learning aims to recognize unseen state-object compositions by leveraging known primitives (state and object) during training. However, effectively modeling interactions between primitives and generalizing knowledge to novel compositions remains a perennial challenge. There are two key factors: object-conditioned and state-conditioned variance, i.e., the appearance of states (or objects) can vary significantly when combined with different objects (or states). For instance, the state "old" can signify a vintage design for a "car" or an advanced age for a "cat". In this paper, we argue that these variances can be mitigated by predicting composition categories based on pre-observed primitives. To this end, we propose Progressive Language-based Observations (PLO), which can dynamically determine a better observation order of primitives. These observations comprise a series of concepts or languages that allow the model to understand image content in a step-by-step manner. Specifically, PLO adopts pre-trained vision-language models (VLMs) to empower the model with observation capabilities. We further devise two variants: 1) PLO-VLM: a two-step method, where a pre-observing classifier dynamically determines the observation order of two primitives. 2) PLO-LLM: a multi-step scheme, which utilizes large language models (LLMs) to craft composition-specific prompts for step-by-step observing. Extensive ablations on three challenging datasets demonstrate the superiority of PLO compared with state-of-the-art methods, affirming its abilities in compositional recognition.

翻訳日:2023-11-30 09:50:04 公開日:2023-11-23

# HOMOE: Hopfield Networkとソフトなエキスパート混合を用いたゼロショット学習のためのメモリベースかつ組み合わせ認識型フレームワーク

HOMOE: A Memory-Based and Composition-Aware Framework for Zero-Shot Learning with Hopfield Network and Soft Mixture of Experts ( http://arxiv.org/abs/2311.14747v1 )

ライセンス: Link先を確認

Do Huu Dat, Po Yuan Mao, Tien Hoang Nguyen, Wray Buntine, Mohammed Bennamoun

(参考訳) 合成ゼロショット学習(CZSL)は、合成的思考を方法論に組み込むことで従来のゼロショット学習の制約を克服することを目指し、機械学習における不可欠なパラダイムとして登場した。従来のゼロショット学習は、事前定義されたクラス埋め込みに依存するため、既知クラスと未知クラスの不慣れな組み合わせを扱うことが難しい。対照的に、合成ゼロショット学習はクラス間に固有の階層と構造的なつながりを利用し、属性、構成要素、その他の意味要素を組み合わせて新しいクラス表現を作成する。本稿では、現代ホップフィールドネットワークとMixture of Experts (HOMOE)を初めて組み合わせ、これまで見たことのない物体の組み合わせを分類する新しいフレームワークを提案する。具体的には、現代ホップフィールドネットワークは、ラベルのプロトタイプを格納し、入力画像に関連するラベルを識別するメモリを作成する。続いて、Mixture of Expertsモデルが画像と適合するプロトタイプを統合して、最終的な組み合わせ分類を生成する。本手法は、MIT-StatesやUT-Zapposを含むいくつかのベンチマークでSOTA性能を達成する。また、各コンポーネントが汎化の向上にどのように寄与するかについても検討する。

Compositional Zero-Shot Learning (CZSL) has emerged as an essential paradigm in machine learning, aiming to overcome the constraints of traditional zero-shot learning by incorporating compositional thinking into its methodology. Conventional zero-shot learning has difficulty managing unfamiliar combinations of seen and unseen classes because it depends on pre-defined class embeddings. In contrast, Compositional Zero-Shot Learning uses the inherent hierarchies and structural connections among classes, creating new class representations by combining attributes, components, or other semantic elements. In our paper, we propose a novel framework that for the first time combines the Modern Hopfield Network with a Mixture of Experts (HOMOE) to classify the compositions of previously unseen objects. Specifically, the Modern Hopfield Network creates a memory that stores label prototypes and identifies relevant labels for a given input image. Following this, the Mixture of Experts model integrates the image with the fitting prototype to produce the final composition classification. Our approach achieves SOTA performance on several benchmarks, including MIT-States and UT-Zappos. We also examine how each component contributes to improved generalization.

翻訳日:2023-11-30 09:49:36 公開日:2023-11-23
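
The Modern Hopfield Network in HOMOE acts as a memory of label prototypes. Retrieval in modern Hopfield networks is commonly written as the softmax update ξ ← Xᵀ softmax(β X ξ), with the stored patterns as the rows of X. The following is a minimal NumPy sketch of that retrieval step. It assumes the label prototypes are already embedded as vectors. β, the one-hot prototypes and the query are illustrative values, and HOMOE's learned projections and expert mixing are not shown.

```python
import numpy as np


def hopfield_retrieve(patterns, query, beta=8.0, steps=1):
    """Modern Hopfield retrieval: xi <- patterns.T @ softmax(beta * patterns @ xi).

    patterns: (n_patterns, d) array of stored vectors (e.g. label prototypes).
    query:    (d,) state vector (e.g. an image embedding).
    Returns the retrieved vector and the attention weights over stored patterns.
    """
    xi = np.asarray(query, dtype=float)
    weights = None
    for _ in range(steps):
        scores = beta * patterns @ xi
        scores -= scores.max()                       # numerical stability
        weights = np.exp(scores) / np.exp(scores).sum()
        xi = weights @ patterns                      # convex combination of patterns
    return xi, weights


prototypes = np.eye(3)                               # three hypothetical label prototypes
retrieved, w = hopfield_retrieve(prototypes, [0.9, 0.2, 0.1])
print(int(np.argmax(w)), np.round(retrieved, 3))
```

A larger β makes the softmax sharper, so retrieval snaps to a single prototype. A smaller β returns a blend of similar prototypes, which is one place where an expert-mixing stage could add value.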

# オールインワン: RGB、RGB-D、RGB-Tの顕著物体検出

All in One: RGB, RGB-D, and RGB-T Salient Object Detection ( http://arxiv.org/abs/2311.14746v1 )

ライセンス: Link先を確認

Xingzhao Jia, Zhongqiu Zhao, Changlei Dongye, and Zhao Zhang

(参考訳) 顕著物体検出(SOD)は、画像内で最も目を引く物体を特定することを目的としている。検出対象のデータの種類に応じて、SODはRGB、RGB-D(深度)、RGB-T(熱)、ライトフィールドSODなど様々な形態に分類できる。これまでの研究は、個々のデータタイプに対する顕著性検出に焦点を当ててきた。RGB-D SODモデルにRGB-Tデータを検出させると、性能は低下する。本研究では、3種類のデータ(RGB、RGB-D、RGB-T)の顕著物体検出タスクに統一的な解決策を提供する革新的なモデルフレームワークを提案する。3種類のデータは、同じ重みパラメータを持つ1つのモデル(オールインワン)で処理できる。本フレームワークでは、3種類のデータを単一の入力バッチ内で順序立てて連結し、トランスフォーマーネットワークを用いて特徴を抽出する。このフレームワークに基づき、任意のRGB、RGB-D、RGB-Tデータを高速に検出できる効率的な軽量SODモデルAiOSODを提案する(RGBデータで780FPS、RGB-DまたはRGB-Tデータで485FPS)。特に、わずか625万(6.25M)のパラメータで、AiOSODはRGB、RGB-D、RGB-Tデータセットにおいて優れた性能を達成する。

Salient object detection (SOD) aims to identify the most attractive objects within an image. Depending on the type of data being detected, SOD can be categorized into various forms, including RGB, RGB-D (Depth), RGB-T (Thermal) and light field SOD. Previous research has focused on saliency detection for individual data types. If an RGB-D SOD model is forced to detect RGB-T data, it performs poorly. We propose an innovative model framework that provides a unified solution for the salient object detection task of three types of data (RGB, RGB-D, and RGB-T). The three types of data can be handled in one model (all in one) with the same weight parameters. In this framework, the three types of data are concatenated in an ordered manner within a single input batch, and features are extracted using a transformer network. Based on this framework, we propose an efficient lightweight SOD model, namely AiOSOD, which can detect any RGB, RGB-D, and RGB-T data with high speed (780FPS for RGB data, 485FPS for RGB-D or RGB-T data). Notably, with only 6.25M parameters, AiOSOD achieves excellent performance on RGB, RGB-D, and RGB-T datasets.

翻訳日:2023-11-30 09:49:16 公開日:2023-11-23

# 第2回海洋コンピュータビジョンワークショップ(MaCVi)2024

The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 ( http://arxiv.org/abs/2311.14762v1 )

ライセンス: Link先を確認

Benjamin Kiefer, Lojze Žust, Matej Kristan, Janez Perš, Matija Teršek, Arnold Wiliem, Martin Messmer, Cheng-Yen Yang, Hsiang-Wei Huang, Zhongyu Jiang, Heng-Cheng Kuo, Jie Mei, Jenq-Neng Hwang, Daniel Stadler, Lars Sommer, Kaer Huang, Aiguo Zheng, Weitu Chong, Kanokphan Lertniphonphan, Jun Xie, Feng Chen, Jian Li, Zhepeng Wang, Luca Zedda, Andrea Loddo, Cecilia Di Ruberto, Tuan-Anh Vu, Hai Nguyen-Truong, Tan-Sang Ha, Quan-Dung Pham, Sai-Kit Yeung, Yuan Feng, Nguyen Thanh Thien, Lixin Tian, Sheng-Yao Kuan, Yuan-Hao Ho, Angel Bueno Rodriguez, Borja Carrillo-Perez, Alexander Klein, Antje Alex, Yannik Steiniger, Felix Sattler, Edgardo Solano-Carrillo, Matej Fabijanić, Magdalena Šumunec, Nadir Kapetanović, Andreas Michel, Wolfgang Gross, Martin Weinmann

(参考訳) 第2回海洋コンピュータビジョンワークショップ(MaCVi)2024は、無人航空機(UAV)および無人水上艇(USV)のための海洋コンピュータビジョンを扱う。3つの課題カテゴリが設けられている: (i) 再識別を伴うUAVベースの海上物体追跡、(ii) USVベースの海上障害物のセグメンテーションと検出、(iii) USVベースの海上ボート追跡。USVベースの海上障害物のセグメンテーションと検出は3つのサブ課題からなり、その中には実世界の組込みデバイス上での効率的な推論に取り組む新たな組込み課題が含まれる。本報告では、これらの課題から得られた知見を包括的に概観する。統計的分析と定性的分析の両方を行い、195件を超える応募から傾向を評価する。すべてのデータセット、評価コード、リーダーボードはhttps://macvi.org/workshop/macvi24で公開されている。

The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV). Three challenge categories are considered: (i) UAV-based Maritime Object Tracking with Re-identification, (ii) USV-based Maritime Obstacle Segmentation and Detection, (iii) USV-based Maritime Boat Tracking. The USV-based Maritime Obstacle Segmentation and Detection features three sub-challenges, including a new embedded challenge addressing efficient inference on real-world embedded devices. This report offers a comprehensive overview of the findings from the challenges. We provide both statistical and qualitative analyses, evaluating trends from over 195 submissions. All datasets, evaluation code, and the leaderboard are available to the public at https://macvi.org/workshop/macvi24.

翻訳日:2023-11-30 09:35:57 公開日:2023-11-23

# SinSR: 単一ステップでの拡散ベース画像超解像

SinSR: Diffusion-Based Image Super-Resolution in a Single Step ( http://arxiv.org/abs/2311.14760v1 )

ライセンス: Link先を確認

Yufei Wang, Wenhan Yang, Xinyuan Chen, Yaohui Wang, Lanqing Guo, Lap-Pui Chau, Ziwei Liu, Yu Qiao, Alex C. Kot, Bihan Wen

(参考訳) 拡散モデルに基づく超解像(SR)手法は有望な結果を示しているが、必要な推論ステップ数が多いため実用化が妨げられている。最近の手法では、初期状態で劣化画像を利用することでマルコフ連鎖を短縮している。しかし、これらの手法は劣化過程の正確な定式化に依存するか、依然として比較的長い生成経路(例えば15回の反復)を必要とする。推論速度を向上させるため、単一ステップのSR生成を実現する簡潔かつ効果的な手法SinSRを提案する。具体的には、まず拡散ベースSRを高速化する最新のSOTA手法から決定論的サンプリング過程を導出する。これにより、入力のランダムノイズと生成される高解像度画像との対応関係を、訓練中に削減された許容可能な推論ステップ数で得ることができる。この決定論的な対応関係を、1回の推論ステップでSRを行う生徒モデルに蒸留できることを示す。さらに、蒸留過程で正解画像を同時に活用する新しい一貫性保存損失を提案する。これにより生徒モデルの性能が教師モデルの特徴多様体のみに縛られないようにし、さらなる性能向上をもたらす。合成データセットおよび実世界データセットを用いた大規模な実験により、提案手法はわずか1回のサンプリングステップで従来のSOTA手法や教師モデルと同等以上の性能を達成でき、推論を最大10倍高速化できることが実証された。コードはhttps://github.com/wyf0912/SinSRで公開予定である。

While super-resolution (SR) methods based on diffusion models exhibit promising results, their practical application is hindered by the substantial number of required inference steps. Recent methods utilize degraded images in the initial state, thereby shortening the Markov chain. Nevertheless, these solutions either rely on a precise formulation of the degradation process or still necessitate a relatively lengthy generation path (e.g., 15 iterations). To enhance inference speed, we propose a simple yet effective method for achieving single-step SR generation, named SinSR. Specifically, we first derive a deterministic sampling process from the most recent state-of-the-art (SOTA) method for accelerating diffusion-based SR. This allows the mapping between the input random noise and the generated high-resolution image to be obtained in a reduced and acceptable number of inference steps during training. We show that this deterministic mapping can be distilled into a student model that performs SR within only one inference step. Additionally, we propose a novel consistency-preserving loss to simultaneously leverage the ground-truth image during the distillation process, ensuring that the performance of the student model is not solely bound by the feature manifold of the teacher model, resulting in further performance improvement. Extensive experiments conducted on synthetic and real-world datasets demonstrate that the proposed method can achieve comparable or even superior performance compared to both previous SOTA methods and the teacher model in just one sampling step, yielding a remarkable inference speedup of up to 10x. Our code will be released at https://github.com/wyf0912/SinSR.

翻訳日:2023-11-30 09:35:43 公開日:2023-11-23

# ディープラーニングによる暗号通貨価格予測 - 金融、ブロックチェーン、テキストデータの統合

Forecasting Cryptocurrency Prices Using Deep Learning: Integrating Financial, Blockchain, and Text Data ( http://arxiv.org/abs/2311.14759v1 )

ライセンス: Link先を確認

Vincent Gurgul, Stefan Lessmann, Wolfgang Karl Härdle

(参考訳) 本稿では、暗号通貨、特にBitcoin(BTC)とEthereum(ETH)の価格予測における機械学習(ML)と自然言語処理(NLP)技術の応用を検討する。主にTwitterとRedditのニュースやソーシャルメディアのデータに焦点を当て、高度な深層学習NLP手法を用いて、世論が暗号通貨の評価に与える影響を分析する。従来の価格回帰に加えて、暗号通貨の価格予測を分類問題としても扱う。これには価格変動(上昇または下降)の予測と局所極値の識別の両方が含まれる。NLPデータ統合の有無による様々なMLモデルの性能を比較した。その結果、NLPデータを取り入れることでモデルの予測性能が大幅に向上することが明らかになった。Twitter-RoBERTaやBART MNLIなどの事前学習済みモデルは市場感情を捉える上で非常に有効であり、大規模言語モデル(LLM)のファインチューニングも大幅な予測改善をもたらすことを発見した。特に、BART MNLIゼロショット分類モデルは、テキストデータから強気・弱気のシグナルを抽出する高い能力を示す。すべてのモデルが様々な検証シナリオで一貫して利益を生み出し、時間の経過に伴う利益の減少やNLPデータの影響の低下は観察されなかった。本研究は、金融予測の改善におけるテキスト分析の可能性を浮き彫りにし、微妙な市場感情を捉える上での様々なNLP手法の有効性を実証する。

This paper explores the application of Machine Learning (ML) and Natural Language Processing (NLP) techniques in cryptocurrency price forecasting, specifically Bitcoin (BTC) and Ethereum (ETH). Focusing on news and social media data, primarily from Twitter and Reddit, we analyse the influence of public sentiment on cryptocurrency valuations using advanced deep learning NLP methods. Alongside conventional price regression, we treat cryptocurrency price forecasting as a classification problem. This includes both the prediction of price movements (up or down) and the identification of local extrema. We compare the performance of various ML models, both with and without NLP data integration. Our findings reveal that incorporating NLP data significantly enhances the forecasting performance of our models. We discover that pre-trained models, such as Twitter-RoBERTa and BART MNLI, are highly effective in capturing market sentiment, and that fine-tuning Large Language Models (LLMs) also yields substantial forecasting improvements. Notably, the BART MNLI zero-shot classification model shows considerable proficiency in extracting bullish and bearish signals from textual data. All of our models consistently generate profit across different validation scenarios, with no observed decline in profits or reduction in the impact of NLP data over time. The study highlights the potential of text analysis in improving financial forecasts and demonstrates the effectiveness of various NLP techniques in capturing nuanced market sentiment.

翻訳日:2023-11-30 09:35:18 公開日:2023-11-23
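
The abstract frames forecasting partly as classification of price direction and of local extrema. The sketch below shows how such labels could be derived from a price series stored as a plain list. It assumes a fixed look-ahead `horizon` and a symmetric neighbourhood `window`. Both parameters and the helper names are hypothetical, since the paper's exact labelling rules are not stated here.

```python
def direction_labels(prices, horizon=1):
    """1 if the price rises over the next `horizon` steps, else 0 (ties count as 0)."""
    return [1 if prices[t + horizon] > prices[t] else 0
            for t in range(len(prices) - horizon)]


def local_extrema_labels(prices, window=1):
    """Label each point 'max', 'min' or 'none' against `window` neighbours per side."""
    labels = []
    for t, p in enumerate(prices):
        if t < window or t + window >= len(prices):
            labels.append("none")  # not enough context at the series edges
            continue
        neighbours = prices[t - window:t] + prices[t + 1:t + window + 1]
        if all(p > q for q in neighbours):
            labels.append("max")
        elif all(p < q for q in neighbours):
            labels.append("min")
        else:
            labels.append("none")
    return labels


prices = [1.0, 3.0, 2.0, 4.0, 1.0]
print(direction_labels(prices))      # [1, 0, 1, 0]
print(local_extrema_labels(prices))  # ['none', 'max', 'min', 'max', 'none']
```

Both labellings use future prices (the next step, or the right-hand neighbours). They are therefore training targets only, and the features fed to a model at time t must exclude anything after t.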

# 呼吸器疾患検出のための音声インスタンスと表現の融合

Fused Audio Instance and Representation for Respiratory Disease Detection ( http://arxiv.org/abs/2204.10581v4 )

ライセンス: Link先を確認

Tuan Truong, Matthias Lenga, Antoine Serrurier, Sadegh Mohammadi

(参考訳) 体音に対する音声ベースの分類手法は、呼吸器疾患の診断を支援するために長年研究されてきた。ほとんどの研究は咳を主要なバイオマーカーとして用いることに焦点を当てているが、他の体音も呼吸器疾患を検出する可能性を持っている。COVID-19に関する最近の研究では、咳に加えて呼吸音や発話音もこの疾患と相関することが示されている。本研究では、呼吸器疾患検出の手法としてFAIR(Fused Audio Instance and Representation)を提案する。FAIRは、波形およびスペクトログラムの形式で表現された様々な体音から結合特徴ベクトルを構築することに基づいている。体音の波形表現とスペクトログラム表現を組み合わせ、COVID-19検出のユースケースで実験を行った。その結果、咳、呼吸、発話音から抽出した特徴を自己注意で組み合わせることで最良の性能が得られ、受信者動作特性曲線下面積(AUC)は0.8658、感度は0.8057、特異度は0.7958であった。スペクトログラムまたは波形のみで訓練したモデルと比較して、両方の表現を用いることでAUCスコアが向上した。これは、スペクトログラムと波形表現の組み合わせが抽出特徴を豊かにし、一方の表現のみを用いるモデルを上回ることを示している。

Audio-based classification techniques on body sounds have long been studied to aid in the diagnosis of respiratory diseases. While most research is centered on the use of cough as the main biomarker, other body sounds also have the potential to detect respiratory diseases. Recent studies on COVID-19 have shown that breath and speech sounds, in addition to cough, correlate with the disease. Our study proposes Fused Audio Instance and Representation (FAIR) as a method for respiratory disease detection. FAIR relies on constructing a joint feature vector from various body sounds represented in waveform and spectrogram form. We conducted experiments on the use case of COVID-19 detection by combining waveform and spectrogram representation of body sounds. Our findings show that the use of self-attention to combine extracted features from cough, breath, and speech sounds leads to the best performance with an Area Under the Receiver Operating Characteristic Curve (AUC) score of 0.8658, a sensitivity of 0.8057, and a specificity of 0.7958. Compared to models trained solely on spectrograms or waveforms, the use of both representations results in an improved AUC score, demonstrating that combining spectrogram and waveform representation helps to enrich the extracted features and outperforms the models that use only one representation.

翻訳日:2023-11-28 05:21:44 公開日:2023-11-23

# ビッグデータクラスタリングにK-meansを使うには?

How to Use K-means for Big Data Clustering? ( http://arxiv.org/abs/2204.07485v3 )

ライセンス: Link先を確認

Rustam Mussabayev, Nenad Mladenovic, Bassem Jarboui, Ravil Mussabayev

(参考訳) K-meansはデータマイニングにおいて重要な役割を担っており、ユークリッド最小二乗和クラスタリング(MSSC)モデルの下で最もシンプルかつ広く使われているアルゴリズムである。しかし、その性能は膨大なデータに適用すると劇的に低下する。したがって、データ、時間、アルゴリズム的要素といった計算資源をできるだけ少なく用いてK-meansをビッグデータにスケールさせ、改善することが重要である。そこで我々は、「真のビッグデータ」アルゴリズムの性質を満たし、解の品質と実行時間の両面で古典的および最新のMSSC手法を上回る、K-meansとK-means++をビッグデータクラスタリングに用いる新しい並列方式を提案する。新しい手法は、追加のメタヒューリスティクスを用いずにMSSC問題を分解することで、大域的探索を自然に実現する。本研究は、データ分解がビッグデータクラスタリング問題を解くための基本的なアプローチであることを示す。新しいアルゴリズムの実証的な成功により、良いクラスタリング解を得るにはより多くのデータが必要であるという一般的な考えに異を唱えることができた。さらに本研究は、より良いクラスタリング解を得るにはより洗練されたハイブリッド手法やアルゴリズムが必要であるという確立された傾向に疑問を投げかける。

K-means plays a vital role in data mining and is the simplest and most widely used algorithm under the Euclidean Minimum Sum-of-Squares Clustering (MSSC) model. However, its performance drastically drops when applied to vast amounts of data. Therefore, it is crucial to improve K-means by scaling it to big data using as few of the following computational resources as possible: data, time, and algorithmic ingredients. We propose a new parallel scheme of using K-means and K-means++ algorithms for big data clustering that satisfies the properties of a "true big data" algorithm and outperforms the classical and recent state-of-the-art MSSC approaches in terms of solution quality and runtime. The new approach naturally implements global search by decomposing the MSSC problem without using additional metaheuristics. This work shows that data decomposition is the basic approach to solve the big data clustering problem. The empirical success of the new algorithm allowed us to challenge the common belief that more data is required to obtain a good clustering solution. Moreover, the present work questions the established trend that more sophisticated hybrid approaches and algorithms are required to obtain a better clustering solution.
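The decomposition idea can be illustrated with a simplified sketch: run Lloyd's K-means on small random samples, warm-start each run from the best centroids found so far, and keep whichever centroids give the lowest sum-of-squares on the sample they were fitted to. This is not the authors' algorithm. The function names, sample size, number of samples, and the 2-D point type are assumptions chosen to keep the example short.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def mssc(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum-of-squares objective: each point pays its squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def kmeans_pp_init(points: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """K-means++ seeding: later centers are drawn with probability proportional to D(x)^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights)[0])
    return centers


def lloyd(points: Sequence[Point], centers: Sequence[Point], iters: int = 20) -> List[Point]:
    """Plain Lloyd iterations; an empty cluster keeps its previous center."""
    centers = list(centers)
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


def decomposed_kmeans(points: List[Point], k: int, sample_size: int,
                      n_samples: int, seed: int = 0) -> List[Point]:
    """Cluster random chunks instead of the full data set, keeping the incumbent
    centers with the lowest objective measured on the chunk they were fitted to."""
    rng = random.Random(seed)
    best: Optional[List[Point]] = None
    best_cost = float("inf")
    for _ in range(n_samples):
        sample = rng.sample(points, sample_size)
        init = best if best is not None else kmeans_pp_init(sample, k, rng)
        centers = lloyd(sample, init)
        cost = mssc(sample, centers)
        if cost < best_cost:
            best, best_cost = centers, cost
    return best
```

Each run touches only `sample_size` points, so the cost per step does not depend on the full data size. Reusing the incumbent centroids is what lets later samples refine earlier solutions.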

翻訳日:2023-11-28 05:21:24 公開日:2023-11-23

# 量子コンピュータ上での重力波マッチドフィルタリング

Gravitational-wave matched filtering on a quantum computer ( http://arxiv.org/abs/2204.04159v3 )

ライセンス: Link先を確認

Doğa Veske, Cenk Tüysüz, Mirko Amico, Nicholas T. Bronn, Olivia T. Lanes, Imre Bartos, Zsuzsa Márka, Sebastian Will, Szabolcs Márka

(参考訳) 最先端の量子コンピュータが正確な計算に適用できる範囲は非常に限られている。本稿では、連星ブラックホールの合体からの重力波信号を検出するための、量子ビットベースのマッチドフィルタリングの初の実験的実証を報告する。ノイズのある超伝導量子ビット上での実装により、連星ブラックホール合体について古典計算で達成可能なものと同程度の信号対雑音比が得られ、実用的なタスクにおける量子ビットの有用性を示す証拠が得られた。この応用のために我々が考案したアルゴリズムは、量子計算と古典計算を併用するモンテカルロアルゴリズムである。このアルゴリズムは時間領域畳み込みに対して、高速フーリエ変換で達成可能なものと同様の準二次的な高速化をもたらす。

State-of-the-art quantum computers have very limited applicability for accurate calculations. Here we report the first experimental demonstration of qubit-based matched filtering for the detection of the gravitational-wave signal from a binary black hole merger. With our implementation on noisy superconducting qubits, we obtained a signal-to-noise ratio for the binary black hole merger similar to that achievable with classical computation, providing evidence for the utility of qubits for practically relevant tasks. The algorithm we invented for this application is a Monte Carlo algorithm which uses quantum and classical computation together. It provides a quasi-quadratic speed-up for time-domain convolution, similar to that achievable with the fast Fourier transform.
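For reference, the classical time-domain matched filter that the quantum algorithm is compared against can be sketched as a sliding, normalised correlation of the data with a template. This is a hypothetical illustration, not the paper's code. It assumes white noise with unit variance. Real gravitational-wave searches use a noise-weighted inner product, usually computed with the FFT. The direct form below costs O(nm) operations for n data samples and an m-sample template.

```python
import math
from typing import List, Sequence


def matched_filter(data: Sequence[float], template: Sequence[float]) -> List[float]:
    """Slide the template over the data and return <d, h> / sqrt(<h, h>) at every lag
    (white-noise assumption); the peak marks the most likely signal position."""
    norm = math.sqrt(sum(h * h for h in template))
    n, m = len(data), len(template)
    return [
        sum(data[lag + i] * template[i] for i in range(m)) / norm
        for lag in range(n - m + 1)
    ]
```

Embedding the template `[1, -1, 1]` at position 4 of an otherwise zero series gives a peak of $\sqrt{3}$ at lag 4.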

翻訳日:2023-11-28 05:21:04 公開日:2023-11-23

# 安全・信頼性の観点からの物体(誤)検出の評価:議論と評価指標

Evaluating Object (mis)Detection from a Safety and Reliability Perspective: Discussion and Measures ( http://arxiv.org/abs/2203.02205v3 )

ライセンス: Link先を確認

Andrea Ceccarelli and Leonardo Montecchi

(参考訳) 安全性が重要な領域の物体検出器は、自律的なアクターの動作に干渉する可能性が最も高い物体の検出を優先すべきであると我々は主張する。これは特に、アクターの安全性と信頼性に影響を与えうる物体に当てはまる。自動運転において物体の(誤)検出が安全性と信頼性に与える影響を定量化するため、最も危険で運転判断に影響を与える可能性が最も高い物体を正しく識別することに報酬を与える、新しい物体検出の評価指標を提案する。そのために、自車両に対する近接度、向き、相対速度に基づいて物体の検出に報酬を与える物体重要度モデルを構築する。次に、このモデルを最近の自動運転データセットnuScenesに適用し、9つの物体検出器を比較する。その結果、いくつかの設定において、nuScenesのランキングで最も性能の良い物体検出器が、安全性と信頼性に焦点を移した場合には好ましいものではないことが示された。

We argue that object detectors in the safety-critical domain should prioritize detection of objects that are most likely to interfere with the actions of the autonomous actor. This applies especially to objects that can impact the actor's safety and reliability. To quantify the impact of object (mis)detection on safety and reliability in the context of autonomous driving, we propose new object detection measures that reward the correct identification of objects that are most dangerous and most likely to affect driving decisions. To achieve this, we build an object criticality model to reward the detection of the objects based on proximity, orientation, and relative velocity with respect to the subject vehicle. Then, we apply our model on the recent autonomous driving dataset nuScenes, and we compare nine object detectors. Results show that, in several settings, object detectors that perform best according to the nuScenes ranking are not the preferable ones when the focus is shifted to safety and reliability.

翻訳日:2023-11-28 05:20:19 公開日:2023-11-23

# スピノダル分解による超高インダクタンス材料

Ultrahigh-inductance materials from spinodal decomposition ( http://arxiv.org/abs/2111.05088v2 )

ライセンス: Link先を確認

Ran Gao, Hsiang-Sheng Ku, Hao Deng, Wenlong Yu, Tian Xia, Feng Wu, Zhijun Song, Xiaohe Miao, Chao Zhang, Yue Lin, Yaoyun Shi, Hui-Hai Zhao, Chunqing Deng

(参考訳) 力学的インダクタンスを持つ不規則超伝導窒化物は、長い間、高インダクタンス量子回路応用の有力な材料候補と見なされてきた。力学的インダクタンスと対応する回路インピーダンスを増大させるために材料寸法を縮小する努力が続けられているが、材料品質を損なうことなくさらなる改善を図ることは根本的な課題となっている。そこで本研究では、マイクロ波損失を低く保ちながら、スピノダル分解によって超伝導材料の力学的インダクタンスを劇的に増大させる手法を提案する。エピタキシャルTi$_{0.48}$Al$_{0.52}$Nをモデル系として用い、スピノダル分解によって材料の不規則性を大幅に高め、絶縁体-超伝導体転移を誘起できることを初めて実証した。測定された力学的インダクタンスは、これまでに報告された最良の不規則超伝導窒化物と比べて2～3桁増大した。本研究は、先進的な超伝導量子回路のためにインダクタンスを大幅に高め、決定論的に制御する道を開くものである。

Disordered superconducting nitrides with kinetic inductance have long been considered a leading material candidate for high-inductance quantum-circuit applications. Despite continuing efforts in reducing material dimensions to increase the kinetic inductance and the corresponding circuit impedance, it becomes a fundamental challenge to improve further without compromising material qualities. To this end, we propose a method to drastically increase the kinetic inductance of superconducting materials via spinodal decomposition while keeping a low microwave loss. We use epitaxial Ti$_{0.48}$Al$_{0.52}$N as a model system, and for the first time demonstrate the utilization of spinodal decomposition to trigger the insulator-to-superconductor transition with a drastically enhanced material disorder. The measured kinetic inductance has increased by 2-3 orders of magnitude compared with all the best reported disordered superconducting nitrides. Our work paves the way for substantially enhancing and deterministically controlling the inductance for advanced superconducting quantum circuits.

翻訳日:2023-11-28 05:18:53 公開日:2023-11-23

# エピタキシャル窒化チタンマイクロ波共振器:構造、化学、電気およびマイクロ波特性

Epitaxial titanium nitride microwave resonators: Structural, chemical, electrical, and microwave properties ( http://arxiv.org/abs/2111.04227v4 )

ライセンス: Link先を確認

Ran Gao, Wenlong Yu, Hao Deng, Hsiang-Sheng Ku, Zhisheng Li, Minghua Wang, Xiaohe Miao, Yue Lin, Chunqing Deng

(参考訳) 窒化チタンは、マイクロ波損失が小さく、表面インダクタンスが高く、化学的に安定であることから、様々な超伝導量子回路応用にとって魅力的な材料である。しかし、その物理的性質とデバイス性能は材料の品質に大きく依存する。本研究では、中間温度(300$^{\circ}$C)でのマグネトロンスパッタリングによりサファイア基板上に堆積した、高結晶性のエピタキシャル窒化チタン薄膜に注目する。構造的、化学的、輸送的性質を十分に理解するため、体系的かつ包括的な材料評価を行う。パターン化したマイクロ波共振器を用いて低温でのマイクロ波損失を調べ、単一光子領域における最良の内部品質係数は$3.3\times 10^6$、高電力領域では$> 1.0\times 10^7$と測定された。共振器の材料充填率で補正したマイクロ波損失正接は、これまでに報告された超伝導共振器の最良値と比べても遜色ない。本研究は、エピタキシャル窒化チタンを低損失超伝導量子回路に用いるための基礎を築くものである。

Titanium nitride is an attractive material for a range of superconducting quantum-circuit applications owing to its low microwave losses, high surface inductance, and chemical stability. The physical properties and device performance, nevertheless, depend strongly on the quality of the materials. Here we focus on the highly crystalline and epitaxial titanium nitride thin films deposited on sapphire substrates using magnetron sputtering at an intermediate temperature (300$^{\circ}$C). We perform a systematic and comprehensive set of material characterizations to thoroughly understand the structural, chemical, and transport properties. Microwave losses at low temperatures are studied using patterned microwave resonators, where the best internal quality factor in the single-photon regime is measured to be $3.3\times 10^6$, and $> 1.0\times 10^7$ in the high-power regime. Adjusted with the material filling factor of the resonators, the microwave loss-tangent here compares well with the previously reported best values for superconducting resonators. This work lays the foundation for using epitaxial titanium nitride in low-loss superconducting quantum circuits.
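The filling-factor adjustment follows the usual participation-ratio picture. In that picture, the internal loss of a resonator is the sum of the loss tangents of its lossy regions, each weighted by the fraction $F_j$ of electric energy stored in that region. Assuming a single dominant lossy material with filling factor $F$ (the paper's actual value is not given in this abstract):

```latex
\frac{1}{Q_i} \approx \sum_j F_j \tan\delta_j
\quad\Longrightarrow\quad
\tan\delta \approx \frac{1}{F\,Q_i}
\qquad \text{(one dominant lossy region with filling factor } F\text{)}
```

If that material loss dominates, then $F \le 1$ implies $\tan\delta \ge 1/Q_i$, so $Q_i = 3.3\times 10^6$ corresponds to $\tan\delta \gtrsim 3.0\times 10^{-7}$. If other channels such as radiation or quasiparticle loss also contribute, $1/(F\,Q_i)$ becomes an upper bound on $\tan\delta$ instead.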

翻訳日:2023-11-28 05:18:33 公開日:2023-11-23

# ニューラルネットワーク検証のための共有証明書

Shared Certificates for Neural Network Verification ( http://arxiv.org/abs/2109.00542v4 )

ライセンス: Link先を確認

Marc Fischer, Christian Sprecher, Dimitar I. Dimitrov, Gagandeep Singh, Martin Vechev

(参考訳) 既存のニューラルネットワーク検証器は、各層で到達可能な値の記号的抽象化を伝播させることで、各入力が所定の摂動の下で正しく扱われることの証明を計算する。このプロセスは各入力(画像など)と摂動(回転など)ごとに独立してゼロから繰り返されるため、データセット全体を扱う場合には全体の証明コストが高くなる。本研究では、異なる入力や摂動に対して中間層で得られる抽象化が互いに重なり合ったり包含し合ったりしうるという重要な知見に基づき、精度を損なうことなくこの検証コストを削減する新しい手法を導入する。この知見を活用して共有証明書という一般的な概念を導入し、複数の入力にわたって証明作業を再利用することで全体の検証コストを削減する。広く用いられるパッチ摂動や幾何学的摂動を含む、画像分類器に対する様々なデータセットと攻撃仕様において、共有証明書が検証コストの削減に有効であることを示すため、大規模な実験的評価を行った。実装は https://github.com/eth-sri/proof-sharing で公開している。

Existing neural network verifiers compute a proof that each input is handled correctly under a given perturbation by propagating a symbolic abstraction of reachable values at each layer. This process is repeated from scratch independently for each input (e.g., image) and perturbation (e.g., rotation), leading to an expensive overall proof effort when handling an entire dataset. In this work, we introduce a new method for reducing this verification cost without losing precision based on a key insight that abstractions obtained at intermediate layers for different inputs and perturbations can overlap or contain each other. Leveraging our insight, we introduce the general concept of shared certificates, enabling proof effort reuse across multiple inputs to reduce overall verification costs. We perform an extensive experimental evaluation to demonstrate the effectiveness of shared certificates in reducing the verification cost on a range of datasets and attack specifications on image classifiers including the popular patch and geometric perturbations. We release our implementation at https://github.com/eth-sri/proof-sharing.
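The core reuse test can be sketched with interval (box) abstractions. If the box that a new input reaches at an intermediate layer lies inside a box that has already been proven safe, the remaining proof can be skipped. The sketch below uses plain interval bound propagation and checks templates only at the first hidden layer. The function names, the choice of layer, and the box domain are all assumptions. The authors' verifier and their template-generation procedure are more elaborate.

```python
from typing import List, Sequence, Tuple

Box = List[Tuple[float, float]]                # one (lower, upper) interval per neuron
Layer = Tuple[List[List[float]], List[float]]  # (W, b) for y = W x + b


def affine(box: Box, W: List[List[float]], b: List[float]) -> Box:
    """Interval bound propagation through y = W x + b."""
    out = []
    for row, bias in zip(W, b):
        lo = hi = bias
        for w, (l, u) in zip(row, box):
            lo += w * l if w >= 0 else w * u
            hi += w * u if w >= 0 else w * l
        out.append((lo, hi))
    return out


def relu(box: Box) -> Box:
    return [(max(0.0, l), max(0.0, u)) for l, u in box]


def contains(outer: Box, inner: Box) -> bool:
    return all(ol <= il and iu <= ou for (ol, ou), (il, iu) in zip(outer, inner))


def certify_from_hidden(h: Box, rest: Sequence[Layer], target: int) -> bool:
    """Propagate a hidden-layer box through the remaining layers (no ReLU after
    the last one) and check that the target logit beats every other logit."""
    for i, (W, b) in enumerate(rest):
        h = affine(h, W, b)
        if i < len(rest) - 1:
            h = relu(h)
    lo_target = h[target][0]
    return all(lo_target > u for j, (_, u) in enumerate(h) if j != target)


def certify(box: Box, layers: Sequence[Layer], target: int,
            templates: Sequence[Box] = ()) -> Tuple[bool, bool]:
    """Return (certified, reused). Assumes at least two layers; `templates` are
    first-hidden-layer boxes already certified with certify_from_hidden."""
    h = relu(affine(box, *layers[0]))
    if any(contains(t, h) for t in templates):
        return True, True  # proof effort reused: the remaining layers are skipped
    return certify_from_hidden(h, layers[1:], target), False
```

In use, a template is certified once with `certify_from_hidden` and then stored. Every later input whose first-layer box falls inside it is certified by a single containment check.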

翻訳日:2023-11-28 05:17:48 公開日:2023-11-23

# 非マルコフ領域における初期系相関を用いたサイドバンド冷却の最適化

Optimized sideband cooling with initial system correlations in non-Markovian regime ( http://arxiv.org/abs/2007.14094v2 )

ライセンス: Link先を確認

Wen-Zhao Zhang, Ting Tan, Jie Zhao, Wenlin Li, and Jiong Cheng

(参考訳) 一般的な機械的非マルコフ貯留層に結合した標準的な光機械系において、初期系相関の存在下での最適化されたサイドバンド冷却を調べる。初期相関の効果をハイゼンベルク方程式の時間依存係数に組み込むことで、フォノン数の時間発展を調べる。冷却速度の概念を導入し、非マルコフ領域におけるサイドバンド冷却効果を記述するために平均フォノン減少関数を定義する。その結果、パラメトリック増幅型またはビームスプリッタ型の初期相関を導入することで、瞬時フォノン数を大幅に削減できることがわかった。また、ビームスプリッタ型の初期相関を強めることで、基底状態冷却速度を高めることができる。系の初期状態を最適化し、Q変調技術を利用することで、非常に短時間で安定した機械的基底状態を得ることができる。我々の最適化冷却プロトコルは、固体系におけるフォノン操作と量子情報処理のための魅力的なプラットフォームを提供する。

Optimized sideband cooling in the presence of initial system correlations is investigated for a standard optomechanical system coupled to a general mechanical non-Markovian reservoir. We study the evolution of phonon number by incorporating the effects of initial correlations into the time-dependent coefficients in the Heisenberg equation. We introduce the concept of cooling rate and define an average phonon reduction function to describe the sideband cooling effect in the non-Markovian regime. Our results show that the instantaneous phonon number can be significantly reduced by introducing either the parametric-amplification type or the beam-splitter type initial correlations. In addition, the ground state cooling rate can be accelerated by enhancing the initial correlation of beam-splitter type. By optimizing the initial state of the system and utilizing Q-modulation technology, a stable mechanical ground state can be obtained in a very short time. Our optimized cooling protocol provides an appealing platform for phonon manipulation and quantum information processing in solid-state systems.

翻訳日:2023-11-28 05:16:32 公開日:2023-11-23

# スマートフォンリアルタイムアプリケーションのための知覚画像強調

Perceptual Image Enhancement for Smartphone Real-Time Applications ( http://arxiv.org/abs/2210.13552v2 )

ライセンス: Link先を確認

Marcos V. Conde, Florin Vasluianu, Javier Vazquez-Corral, Radu Timofte

(参考訳) カメラ設計と画像処理パイプラインの近年の進歩により、スマートフォンで高品質な画像を撮影できるようになった。しかし、スマートフォンカメラの小型化とレンズの制約のため、処理後の画像にはアーティファクトや劣化が一般的に見られる。最もよく見られる不快な効果は、ノイズアーティファクト、回折アーティファクト、ぼけ、HDRの露出過多である。画像復元のための深層学習手法はこれらのアーティファクトをうまく除去できる。しかし、ほとんどの手法は計算量とメモリ要件が大きいため、モバイルデバイス上のリアルタイムアプリケーションには適していない。本稿では、スマートフォンへの展開に重点を置いた、知覚的画像強調のための軽量ネットワークLPIENetを提案する。実験の結果、本モデルははるかに少ないパラメータと演算量で前述のアーティファクトに対処でき、標準ベンチマークにおいて最先端手法と比較して競争力のある性能を達成することがわかった。さらに、本手法の効率性と信頼性を示すため、市販のスマートフォンにモデルを直接展開して性能を評価した。本モデルは、中級の市販スマートフォンで2K解像度の画像を1秒未満で処理できる。

Recent advances in camera designs and imaging pipelines allow us to capture high-quality images using smartphones. However, due to the small size and lens limitations of the smartphone cameras, we commonly find artifacts or degradation in the processed images. The most common unpleasant effects are noise artifacts, diffraction artifacts, blur, and HDR overexposure. Deep learning methods for image restoration can successfully remove these artifacts. However, most approaches are not suitable for real-time applications on mobile devices due to their heavy computation and memory requirements. In this paper, we propose LPIENet, a lightweight network for perceptual image enhancement, with the focus on deploying it on smartphones. Our experiments show that, with much fewer parameters and operations, our model can deal with the mentioned artifacts and achieve competitive performance compared with state-of-the-art methods on standard benchmarks. Moreover, to prove the efficiency and reliability of our approach, we deployed the model directly on commercial smartphones and evaluated its performance. Our model can process 2K resolution images under 1 second in mid-level commercial smartphones.

翻訳日:2023-11-28 05:10:26 公開日:2023-11-23

# 転移学習による病理医のような組織画像の自動スコアリング

Automatically Score Tissue Images Like a Pathologist by Transfer Learning ( http://arxiv.org/abs/2209.05954v4 )

ライセンス: Link先を確認

Iris Yan

(参考訳) がんは世界で2番目に多い死因である。がんを早期に診断することで多くの命を救うことができる。病理医は腫瘍を特定するために組織マイクロアレイ(TMA)画像を手作業で確認しなければならず、これは時間がかかり、一貫性がなく、主観的になりうる。既存の自動アルゴリズムは、病理医の精度に達していないか、相当な人手の関与を必要とする。大きな課題は、形状、大きさ、位置の異なるTMA画像が同じスコアを持ちうることである。TMA画像の染色パターンを学習するには膨大な数の画像が必要だが、医療機関のプライバシーや規制上の懸念から、利用できる画像は大きく制限されている。異なるがん種のTMA画像は特定の共通の特徴を共有しうるが、それらを直接組み合わせると、染色パターンの不均一性のために精度が損なわれる。転移学習は、類似した問題から強みを借りることを可能にする新たな学習パラダイムである。しかし、既存の手法は通常、類似した学習問題からの大規模なサンプルを必要とする一方で、異なるがん種のTMA画像は小さなサンプルサイズでしか利用できないことが多い。さらに、既存のアルゴリズムは1つの類似問題からの転移学習に限られている。本稿では、複数の関連問題から学習できる新しい転移学習アルゴリズムを提案する。各問題は小さなサンプルしか持たず、元の問題とは大きく異なる分布を持ちうる。提案アルゴリズムにより、重要な精度の壁(病理医の精度水準である75%)を破ることが可能になり、スタンフォード組織マイクロアレイデータベース(Stanford Tissue Microarray Database)の乳がんTMA画像で75.9%の精度が報告された。これは、転移学習理論の最近の発展とクラスタリング技術の実証的証拠によって裏付けられている。これにより病理医は、より高い精度で一貫してリアルタイムに腫瘍を認識する自動アルゴリズムを安心して採用できるようになる。

Cancer is the second leading cause of death in the world. Diagnosing cancer early on can save many lives. Pathologists have to look at tissue microarray (TMA) images manually to identify tumors, which can be time-consuming, inconsistent and subjective. Existing automatic algorithms either have not achieved the accuracy level of a pathologist or require substantial human involvement. A major challenge is that TMA images with different shapes, sizes, and locations can have the same score. Learning staining patterns in TMA images requires a huge number of images, which are severely limited due to privacy and regulation concerns in medical organizations. TMA images from different cancer types may share certain common characteristics, but combining them directly harms the accuracy due to heterogeneity in their staining patterns. Transfer learning is an emerging learning paradigm that allows borrowing strength from similar problems. However, existing approaches typically require a large sample from similar learning problems, while TMA images of different cancer types are often available only in small sample sizes; moreover, existing algorithms are limited to transfer learning from a single similar problem. We propose a new transfer learning algorithm that could learn from multiple related problems, where each problem has a small sample and can have a substantially different distribution from the original one. The proposed algorithm has made it possible to break the critical accuracy barrier (the 75% accuracy level of pathologists), with a reported accuracy of 75.9% on breast cancer TMA images from the Stanford Tissue Microarray Database. It is supported by recent developments in transfer learning theory and empirical evidence in clustering technology. This will allow pathologists to confidently adopt automatic algorithms in recognizing tumors consistently with a higher accuracy in real time.

翻訳日:2023-11-28 05:09:12 公開日:2023-11-23

# 量子誤差緩和のためのユニバーサルサンプリング下限

Universal Sampling Lower Bounds for Quantum Error Mitigation ( http://arxiv.org/abs/2208.09178v4 )

ライセンス: Link先を確認

Ryuji Takagi and Hiroyasu Tajima and Mile Gu

(参考訳) 中規模量子デバイスにおけるノイズの影響を抑制する必要性から、数多くの量子誤り緩和プロトコルが提案されている。しかし、その一般的な可能性と限界はいまだ明らかになっていない。特に、量子誤り緩和の究極的な実現可能性を理解するには、基本的なサンプリングコスト、すなわち任意の緩和プロトコルがノイズのある量子デバイスを何回実行しなければならないかを特徴づけることが不可欠である。本稿では、量子誤り緩和が高い確率で所望の精度を達成するためのサンプリングコストに対する普遍的な下限を確立する。我々の下限は、非線形な後処理を含むプロトコルや未発見のプロトコルを含む、一般的な緩和プロトコルに適用できる。この結果は、様々なノイズモデルにおいて、幅広いクラスのプロトコルが誤りを緩和するのに必要なサンプリングコストが回路の深さとともに指数関数的に増大しなければならないことを意味し、有用なノイズのある短期量子デバイスのスケーラビリティに対する根本的な障害を明らかにする。

Numerous quantum error-mitigation protocols have been proposed, motivated by the critical need to suppress noise effects on intermediate-scale quantum devices. Yet, their general potential and limitations remain elusive. In particular, to understand the ultimate feasibility of quantum error mitigation, it is crucial to characterize the fundamental sampling cost -- how many times an arbitrary mitigation protocol must run a noisy quantum device. Here, we establish universal lower bounds on the sampling cost for quantum error mitigation to achieve the desired accuracy with high probability. Our bounds apply to general mitigation protocols, including the ones involving nonlinear postprocessing and those yet-to-be-discovered. The results imply that the sampling cost required for a wide class of protocols to mitigate errors must grow exponentially with the circuit depth for various noise models, revealing the fundamental obstacles in the scalability of useful noisy near-term quantum devices.

翻訳日:2023-11-28 05:08:22 公開日:2023-11-23

# 分割統治: 点ごとの二値化による3次元点群インスタンスセグメンテーション

Divide and Conquer: 3D Point Cloud Instance Segmentation With Point-Wise Binarization ( http://arxiv.org/abs/2207.11209v4 )

ライセンス: Link先を確認

Weiguang Zhao, Yuyao Yan, Chaolong Yang, Jianan Ye, Xi Yang, Kaizhu Huang

(参考訳) 点群上のインスタンスセグメンテーションは、3Dシーン理解にとって極めて重要である。ほとんどのSOTA手法は距離クラスタリングを採用しており、通常は有効であるが、同じセマンティックラベルを持つ隣接オブジェクトの分割(特にそれらが近傍点を共有する場合)ではうまく機能しない。オフセット点の分布が不均一であるため、既存の手法ではすべてのインスタンス点をクラスタリングすることは困難である。そこで本研究では、各点を二値化し、それぞれ別々にクラスタリングしてインスタンスを分割する、PBNetという新しい分割統治戦略を設計する。我々の二値クラスタリングでは、オフセットされたインスタンス点を高密度点と低密度点(HP対LP)の2つのカテゴリに分ける。隣接オブジェクトはLPを除去することで明確に分離でき、その後、近傍投票法でLPを割り当てることで補完・洗練される。潜在的な過剰セグメンテーションを抑制するため、各インスタンスの重みマスクを用いて局所シーンを構築することを提案する。プラグインとして、提案する二値クラスタリングは従来の距離クラスタリングを置き換えることができ、多くの主流ベースラインで一貫した性能向上をもたらす。ScanNetV2とS3DISデータセットでの一連の実験は、我々のモデルの優位性を示している。特にPBNetは、ScanNetV2の公式ベンチマークチャレンジで1位となり、最高のmAPを達成した。コードは https://github.com/weiguangzhao/PBNet で公開予定である。

Instance segmentation on point clouds is crucially important for 3D scene understanding. Most SOTAs adopt distance clustering, which is typically effective but does not perform well in segmenting adjacent objects with the same semantic label (especially when they share neighboring points). Due to the uneven distribution of offset points, these existing methods can hardly cluster all instance points. To this end, we design a novel divide-and-conquer strategy named PBNet that binarizes each point and clusters them separately to segment instances. Our binary clustering divides offset instance points into two categories: high and low density points (HPs vs. LPs). Adjacent objects can be clearly separated by removing LPs, and then be completed and refined by assigning LPs via a neighbor voting method. To suppress potential over-segmentation, we propose to construct local scenes with the weight mask for each instance. As a plug-in, the proposed binary clustering can replace traditional distance clustering and lead to consistent performance gains on many mainstream baselines. A series of experiments on ScanNetV2 and S3DIS datasets indicate the superiority of our model. In particular, PBNet ranks first on the ScanNetV2 official benchmark challenge, achieving the highest mAP. Code will be available publicly at https://github.com/weiguangzhao/PBNet.
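Point-wise binarization followed by neighbour voting can be sketched as follows. Points whose neighbour count within a radius reaches a threshold are treated as high-density points (HPs) and grouped into connected components. Each low-density point (LP) then takes the majority label of its k nearest HPs. The radius, the count threshold, k, the 2-D points, and the function name are assumptions for illustration. PBNet operates on learned 3-D offset points and adds per-instance weight masks, which this sketch omits.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def binary_cluster(points: Sequence[Point], radius: float,
                   min_neighbors: int, k: int = 3) -> List[int]:
    """Split points into high-/low-density sets, cluster the HPs by connectivity,
    then let every LP adopt the majority label of its k nearest HPs."""
    n = len(points)
    near = [[j for j in range(n) if j != i and math.dist(points[i], points[j]) <= radius]
            for i in range(n)]
    is_high = [len(near[i]) >= min_neighbors for i in range(n)]

    labels = [-1] * n
    next_label = 0
    for seed in range(n):  # connected components over high-density points only
        if not is_high[seed] or labels[seed] != -1:
            continue
        labels[seed] = next_label
        stack = [seed]
        while stack:
            cur = stack.pop()
            for j in near[cur]:
                if is_high[j] and labels[j] == -1:  # LPs never bridge two components
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1

    high = [i for i in range(n) if is_high[i]]
    for i in range(n):  # neighbour voting for low-density points
        if is_high[i] or not high:
            continue
        nearest = sorted(high, key=lambda j: math.dist(points[i], points[j]))[:k]
        labels[i] = Counter(labels[j] for j in nearest).most_common(1)[0][0]
    return labels
```

Because an LP lying between two objects is excluded from the connectivity step, it cannot merge their HPs into a single instance. It is only attached to an instance afterwards, by voting.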

翻訳日:2023-11-28 05:07:22 公開日:2023-11-23

# メモリ効率の良い学習型主双対アーキテクチャによる3次元ヘリカルCT再構成

3D helical CT Reconstruction with a Memory Efficient Learned Primal-Dual Architecture ( http://arxiv.org/abs/2205.11952v2 )

ライセンス: Link先を確認

Jevgenija Rudzusika, Buda Bajić, Thomas Koehler, Ozan Öktem

(参考訳) 深層学習に基づくCT(Computed Tomography)再構成は、シミュレーションによる2次元低線量CTデータにおいて顕著な性能を示している。これは特に、CTイメージングのための手作りの物理モデルを組み込んだ、ドメイン適応型ニューラルネットワークに当てはまる。このようなアーキテクチャを用いることで、訓練データの必要量が減り、汎化性能が向上するという実証的な証拠がある。しかし、その訓練には大規模な計算資源が必要であり、医用画像で最も一般的な取得ジオメトリである3次元ヘリカルCTでは、すぐに現実的でなくなる。さらに臨床データには、フラックス測定の誤差、解像度のミスマッチ、そして最も重要な真の正解画像の欠如など、シミュレーションでは考慮されない課題も伴う。計算上実行可能な訓練が必要であることと、これらの課題に対処する必要があることが相まって、臨床3次元ヘリカルCTにおける深層学習ベースの再構成の評価は困難となっている。本稿では、ドメイン適応型ニューラルネットワークアーキテクチャである学習型主双対法(Learned Primal-Dual, LPD)を改良し、この設定での再構成に訓練・適用できるようにする。これは、ヘリカル軌道を区間に分割し、展開(unrolled)したLPD反復をそれらの区間に順次適用することで実現する。我々の知る限り、本研究は、低線量CT画像・投影データセット(LDCT)のようなフルサイズの臨床データの再構成に、展開型深層学習アーキテクチャを適用した初めての研究である。さらに、訓練とテストは24GBのメモリを持つ1枚のGPUカード上で行われる。

Deep learning based computed tomography (CT) reconstruction has demonstrated outstanding performance on simulated 2D low-dose CT data. This applies in particular to domain adapted neural networks, which incorporate a handcrafted physics model for CT imaging. Empirical evidence shows that employing such architectures reduces the demand for training data and improves upon generalisation. However, their training requires large computational resources that quickly become prohibitive in 3D helical CT, which is the most common acquisition geometry used for medical imaging. Furthermore, clinical data also comes with other challenges not accounted for in simulations, like errors in flux measurement, resolution mismatch and, most importantly, the absence of the real ground truth. The necessity to have a computationally feasible training combined with the need to address these issues has made it difficult to evaluate deep learning based reconstruction on clinical 3D helical CT. This paper modifies a domain adapted neural network architecture, the Learned Primal-Dual (LPD), so that it can be trained and applied to reconstruction in this setting. We achieve this by splitting the helical trajectory into sections and applying the unrolled LPD iterations to those sections sequentially. To the best of our knowledge, this work is the first to apply an unrolled deep learning architecture for reconstruction on full-sized clinical data, like those in the Low dose CT image and projection data set (LDCT). Moreover, training and testing are done on a single GPU card with 24GB of memory.

翻訳日:2023-11-28 05:05:50 公開日:2023-11-23

# 低周波・高周波同時ブートストラップによる大規模時系列表現学習

Large Scale Time-Series Representation Learning via Simultaneous Low and High Frequency Feature Bootstrapping ( http://arxiv.org/abs/2204.11291v2 )

ライセンス: Link先を確認

Vandan Gorade, Azad Singh and Deepak Mishra

(参考訳) ラベルなし時系列データからの表現学習は難しい問題である。時系列領域における既存の自己教師あり・教師なし手法の多くは、低周波と高周波の特徴を同時には捉えない。さらに、これらの手法の一部はTransformerのような大規模モデルを採用するか、対照学習のような計算コストの高い手法に依存している。これらの問題に対処するため、低周波と高周波の時変特徴を低コストで効率的に捉える非対照型の自己教師あり学習手法を提案する。本手法は生の時系列データを入力とし、同じ族からランダムに拡張をサンプリングすることで、モデルの2つのブランチに対して2つの異なる拡張ビューを生成する。BYOLの用語に従い、2つのブランチはオンラインネットワークとターゲットネットワークと呼ばれ、潜在表現のブートストラップを可能にする。バックボーンエンコーダの後に多層パーセプトロン(MLP)ヘッドが続くBYOLとは対照的に、提案モデルは追加の時間畳み込みネットワーク(TCN)ヘッドを含む。拡張ビューはエンコーダの大カーネル畳み込みブロックを通過するため、後続のMLPとTCNの組み合わせにより、受容野の違いによって低周波と高周波の時変特徴を効果的に表現できる。2つのモジュール(MLPとTCN)は相補的に働く。オンラインネットワークの各モジュールが、ターゲットネットワークのブランチの対応するモジュールの出力を予測するよう学習する。モデルの頑健性を示すため、5つの実世界時系列データセットで広範な実験とアブレーション研究を行った。本手法は5つの実世界データセットすべてで最先端の性能を達成した。

Learning representations from unlabeled time-series data is a challenging problem. Most existing self-supervised and unsupervised approaches in the time-series domain do not capture low- and high-frequency features at the same time. Further, some of these methods employ large-scale models like transformers or rely on computationally expensive techniques such as contrastive learning. To tackle these problems, we propose a non-contrastive self-supervised learning approach that efficiently captures low- and high-frequency time-varying features in a cost-effective manner. Our method takes raw time-series data as input and creates two different augmented views for the two branches of the model by randomly sampling the augmentations from the same family. Following the terminology of BYOL, the two branches are called the online and target networks, which allow bootstrapping of the latent representation. In contrast to BYOL, where a backbone encoder is followed by multilayer perceptron (MLP) heads, the proposed model contains additional temporal convolutional network (TCN) heads. As the augmented views are passed through large-kernel convolution blocks of the encoder, the subsequent combination of MLP and TCN enables an effective representation of low- as well as high-frequency time-varying features due to the varying receptive fields. The two modules (MLP and TCN) act in a complementary manner. We train an online network where each module learns to predict the outcome of the respective module of the target network branch. To demonstrate the robustness of our model, we performed extensive experiments and ablation studies on five real-world time-series datasets. Our method achieved state-of-the-art performance on all five real-world datasets.
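Two BYOL ingredients that the abstract relies on fit in a few lines. The first is the exponential-moving-average update that keeps the target network a slowly moving copy of the online network. The second is the normalised regression loss between an online prediction and a target projection. Parameters are flattened into plain lists here. Summing one loss for the MLP head and one for the TCN head with equal weight is an assumption about how the two heads are combined.

```python
import math
from typing import List, Sequence


def ema_update(target: Sequence[float], online: Sequence[float],
               tau: float = 0.99) -> List[float]:
    """BYOL target update: theta_target <- tau * theta_target + (1 - tau) * theta_online."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]


def byol_loss(p: Sequence[float], z: Sequence[float]) -> float:
    """Squared distance between L2-normalised vectors, i.e. 2 - 2 * cos(p, z)."""
    dot = sum(a * b for a, b in zip(p, z))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_z = math.sqrt(sum(b * b for b in z))
    return 2.0 - 2.0 * dot / (norm_p * norm_z)


def two_head_loss(p_mlp: Sequence[float], z_mlp: Sequence[float],
                  p_tcn: Sequence[float], z_tcn: Sequence[float]) -> float:
    """Hypothetical combination: each online head predicts its matching target head."""
    return byol_loss(p_mlp, z_mlp) + byol_loss(p_tcn, z_tcn)
```

Only the online network receives gradients from the loss. The target network changes solely through `ema_update`, and that asymmetry is what lets BYOL avoid collapse without negative pairs.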

翻訳日:2023-11-28 05:05:18 公開日:2023-11-23

# 高速だがノイズのある操作を用いた高性能繰り返し猫符号

High-performance repetition cat code using fast noisy operations ( http://arxiv.org/abs/2212.11927v4 )

ライセンス: Link先を確認

Francois-Marie Le Régent, Camille Berdou, Zaki Leghtas, Jérémie Guillaud and Mazyar Mirrahimi

(参考訳) 2光子駆動散逸によって安定化されたボソニック猫量子ビットは、ビット反転誤りの指数関数的な抑制と、この保護を保つ広範なゲート群の恩恵を受ける。これらの性質により、猫量子ビットはハードウェア効率が高くフォールトトレラントな量子プロセッサの有望な構成要素となる。本稿では、スタビライザ測定に高速だがノイズの多いCNOTゲートを用いる、繰り返し猫符号アーキテクチャの性能最適化を提案する。この最適化により、ボソニックモードの固有の単一光子損失率と人工的に設計された2光子損失率の比として与えられる物理的性能指数に対して高いしきい値が得られる。さらに、所望の論理誤り率に達するために必要なオーバーヘッドの、しきい値以下でのスケーリングも非常に良好になる。猫量子ビット操作に特有の誤りモデルに基づき、この最適化は、高速化された低忠実度CNOTゲートと高速なアンシラのパリティ検査量子ビットを組み合わせた高速パリティ測定を利用する。性能の大幅な向上は、次の2点によって説明される。1) 猫量子ビットCNOTゲートの誤りモデルが高度に非対称であり、主な成分が制御(アンシラ)量子ビット上にあること。2) 高速操作によって誘起されるリークが存在しても誤り訂正性能が頑健であること。これらの性能を示すため、猫量子ビット状態のリークも考慮した回路レベルのノイズ下で繰り返し符号をサンプリングする手法を開発した。

Bosonic cat qubits stabilized by two-photon driven dissipation benefit from exponential suppression of bit-flip errors and an extensive set of gates preserving this protection. These properties make them promising building blocks of a hardware-efficient and fault-tolerant quantum processor. In this paper, we propose a performance optimization of the repetition cat code architecture using fast but noisy CNOT gates for stabilizer measurements. This optimization leads to high thresholds for the physical figure of merit, given as the ratio between intrinsic single-photon loss rate of the bosonic mode and the engineered two-photon loss rate, as well as a very interesting scaling below threshold of the required overhead, to reach an expected level of logical error rate. Relying on the specific error models for cat qubit operations, this optimization exploits fast parity measurements, using accelerated low-fidelity CNOT gates, combined with fast ancilla parity-check qubits. The significant enhancement in the performance is explained by: 1- the highly asymmetric error model of cat qubit CNOT gates with a major component on control (ancilla) qubits, and 2- the robustness of the error correction performance in presence of the leakage induced by fast operations. In order to demonstrate these performances, we develop a method to sample the repetition code under circuit-level noise that also takes into account cat qubit state leakage.
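A repetition code is enough for cat qubits because bit flips are already exponentially suppressed, so only phase flips need correcting. A deliberately simplified model shows the below-threshold scaling. Take an independent phase-flip probability p per data qubit and majority-vote decoding on an odd distance d. A logical error then requires more than half of the qubits to flip:

$P_L = \sum_{k=(d+1)/2}^{d} \binom{d}{k} p^k (1-p)^{d-k}$

This code-capacity model ignores the circuit-level noise, measurement errors and leakage that the paper actually simulates. The function names are hypothetical.

```python
from math import comb
from typing import Sequence


def majority_decode(bits: Sequence[int]) -> int:
    """Decode an odd-length repetition codeword by majority vote."""
    return int(sum(bits) > len(bits) // 2)


def logical_error_rate(d: int, p: float) -> float:
    """Probability that more than half of d independent qubits suffer a phase flip,
    i.e. that majority voting returns the wrong logical value (code-capacity model)."""
    if d % 2 == 0:
        raise ValueError("use an odd distance so that majority vote is well defined")
    return sum(comb(d, k) * p**k * (1 - p) ** (d - k) for k in range(d // 2 + 1, d + 1))
```

For p = 0.1, the logical error rate falls from 0.028 at d = 3 to about 0.0086 at d = 5. This decrease with distance is the below-threshold behaviour that the overhead scaling in the abstract refers to.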

翻訳日:2023-11-28 04:56:52 公開日:2023-11-23

# 単一サーバキューシステムにおける学習型最適アドミッション制御

Learning-based Optimal Admission Control in a Single Server Queuing System ( http://arxiv.org/abs/2212.11316v2 )

ライセンス: Link先を確認

Asaf Cohen, Vijay G. Subramanian, Yili Zhang

(参考訳) サービス率と到着率が未知のM/M/1待ち行列システムにおいて、長期平均利益を最大化する入場制御問題を考える。サービス完了時に一定の報酬が得られ、待ち行列で待つ顧客に単位時間あたりのコストが課される状況で、ディスパッチャは到着のたびに、システムの待ち行列長の観測履歴全体に基づいて到着客を受け入れるかどうかを決定する。Naor (1969, Econometrica) は、モデルのパラメータがすべて既知であれば、静的なしきい値方策(待ち行列長が所定のしきい値未満なら受け入れ、そうでなければ拒否する)を用いるのが最適であることを示した。本研究では、学習に基づくディスパッチアルゴリズムを提案し、Naor (1969) の完全情報モデルに対する最適ディスパッチ方策を基準としたリグレットを特徴づける。完全情報下の最適しきい値がすべて非ゼロの場合、このアルゴリズムは$O(1)$のリグレットを達成し、完全情報下の最適しきい値が$0$である場合(すなわち、すべての到着を拒否するのが最適方策である場合)には、任意に指定した$\epsilon>0$に対して$O(\ln^{1+\epsilon}(N))$のリグレットを達成することを示す。ここで$N$は到着数である。

We consider a long-term average profit maximizing admission control problem in an M/M/1 queuing system with unknown service and arrival rates. With a fixed reward collected upon service completion and a cost per unit of time enforced on customers waiting in the queue, a dispatcher decides upon arrivals whether to admit the arriving customer or not based on the full history of observations of the queue-length of the system. Naor (1969, Econometrica) showed that if all the parameters of the model are known, then it is optimal to use a static threshold policy: admit if the queue-length is less than a predetermined threshold and otherwise not. We propose a learning-based dispatching algorithm and characterize its regret with respect to optimal dispatch policies for the full information model of Naor (1969). We show that the algorithm achieves an $O(1)$ regret when all optimal thresholds with full information are non-zero, and achieves an $O(\ln^{1+\epsilon}(N))$ regret for any specified $\epsilon>0$, in the case that an optimal threshold with full information is $0$ (i.e., an optimal policy is to reject all arrivals), where $N$ is the number of arrivals.
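To make the threshold policy concrete, the sketch below estimates the long-run average profit of "admit if the number in system is below K" by simulating an M/M/1 queue with known rates. It charges the holding cost C for every customer in the system, including the one in service. That cost convention, the function name, and the parameter names are assumptions. The paper's learning algorithm, which has to estimate the unknown rates, is not reproduced here.

```python
import random


def average_profit(lam: float, mu: float, R: float, C: float,
                   threshold: int, horizon: float, seed: int = 0) -> float:
    """Simulate an M/M/1 queue under 'admit iff number in system < threshold'
    and return (rewards - holding costs) per unit time over [0, horizon]."""
    rng = random.Random(seed)
    t, n, profit = 0.0, 0, 0.0
    while True:
        rate = lam + (mu if n > 0 else 0.0)  # total event rate in state n
        dt = rng.expovariate(rate)
        if t + dt >= horizon:
            profit -= C * n * (horizon - t)  # holding cost up to the horizon
            break
        profit -= C * n * dt
        t += dt
        if rng.random() < lam / rate:  # next event is an arrival
            if n < threshold:
                n += 1
        else:  # next event is a service completion
            n -= 1
            profit += R
    return profit / horizon
```

As a sanity check, take λ = 1, μ = 2, R = 5, C = 1 and threshold 1. The system is then busy a fraction 1/3 of the time and completes 2/3 services per unit time, so the long-run profit is 5·(2/3) − 1·(1/3) = 3, which a long simulation should approach. With known rates, a good K can be found by scanning small thresholds.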

翻訳日:2023-11-28 04:56:27 公開日:2023-11-23

# 物体検出のための適応的自己学習

Adaptive Self-Training for Object Detection ( http://arxiv.org/abs/2212.05911v2 )

ライセンス: Link先を確認

Renaud Vandeghen and Gilles Louppe and Marc Van Droogenbroeck

(参考訳) ディープラーニングは画像中の物体検出タスクを解決する効果的な手段として登場したが、大規模なラベル付きデータセットを必要とするというコストを伴う。このコストを軽減するために、豊富なラベルなしデータを活用する半教師付き物体検出手法が提案され、すでに印象的な結果を上げている。しかし、これらの手法の多くは、しきい値処理によって擬似ラベルを正解オブジェクトに対応付ける必要がある。従来研究では、このしきい値は通常経験的に決定されており、時間がかかるうえ、単一のデータ分布に対してしか行われない。ドメイン、すなわちデータ分布が変化すると、新たにコストのかかるパラメータ探索が必要となる。本稿では、シンプルかつ効果的な教師・生徒手法である物体検出のための適応型自己学習(ASTOD)を提案する。ASTODは、スコアヒストグラムの基底値に直接基づいて、追加コストなしにしきい値を決定する。また、教師の予測の質を向上させるために、新しい擬似ラベル付け手順を提案する。擬似ラベル付けの段階では、ラベルなし画像の異なるビューを用いることで予測漏れを減らし、より良い候補ラベルを得る。教師と生徒は別々に学習され、教師を生徒で置き換えることで本手法を反復的に適用できる。MS-COCOデータセットにおいて、本手法はしきい値パラメータを必要としない最先端手法に対して一貫して良好な性能を示し、パラメータの網羅的探索を必要とする手法とも競合する結果を示す。衛星画像を含むDIORデータセット上での教師ありベースラインに対する追加実験でも同様の結論が得られた。これにより、データ分布によらず、自己学習におけるスコアしきい値を自動的に適応させられることが示された。コードはhttps://github.com/rvandeghen/ASTODで公開されている。

Deep learning has emerged as an effective solution to object detection in images, but at the cost of requiring large labeled datasets. To mitigate this cost, semi-supervised object detection methods, which leverage abundant unlabeled data, have been proposed and have already shown impressive results. However, most of these methods require linking a pseudo-label to a ground-truth object by thresholding. In previous works, this threshold value is usually determined empirically, which is time-consuming and is done only for a single data distribution. When the domain, and thus the data distribution, changes, a new and costly parameter search is necessary. In this work, we introduce Adaptive Self-Training for Object Detection (ASTOD), a simple yet effective teacher-student method. ASTOD determines a threshold value at no extra cost, directly from the ground value of the score histogram. To improve the quality of the teacher predictions, we also propose a novel pseudo-labeling procedure: we use different views of the unlabeled images during the pseudo-labeling step to reduce the number of missed predictions and thus obtain better candidate labels. Our teacher and our student are trained separately, and our method can be applied iteratively by replacing the teacher with the student. On the MS-COCO dataset, our method consistently performs favorably against state-of-the-art methods that do not require a threshold parameter, and shows competitive results with methods that require a parameter sweep. Additional experiments with respect to a supervised baseline on the DIOR dataset, which contains satellite images, lead to similar conclusions. They show that the score threshold can be adapted automatically in self-training, regardless of the data distribution. The code is available at https://github.com/rvandeghen/ASTOD
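The abstract does not define the "ground value" of the score histogram precisely. One plausible reading is the low point of the histogram to the right of its dominant mode, which is where low-confidence background detections thin out. The sketch below implements that hypothetical rule and the pseudo-label filtering it feeds. The function names and the valley rule are assumptions for illustration, not ASTOD's published procedure.

```python
import numpy as np


def histogram_valley_threshold(scores, bins=20):
    """Hypothetical rule: threshold at the emptiest histogram bin after the mode.

    The most populated bin is usually the mass of low-confidence detections;
    the centre of the least populated bin at or after it is returned.
    """
    counts, edges = np.histogram(np.asarray(scores, dtype=float), bins=bins, range=(0.0, 1.0))
    mode = int(np.argmax(counts))
    valley = mode + int(np.argmin(counts[mode:]))  # first minimum at or after the mode
    return 0.5 * (edges[valley] + edges[valley + 1])


def filter_pseudo_labels(boxes, scores, threshold):
    """Keep only the teacher detections whose score clears the threshold."""
    keep = np.asarray(scores) >= threshold
    return np.asarray(boxes)[keep], np.asarray(scores)[keep]
```

Because the threshold is read off the teacher's own score distribution, it would move with the data distribution without a parameter sweep, which is the property the abstract emphasizes.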

翻訳日:2023-11-28 04:56:04 公開日:2023-11-23

# 雑音量子回路シミュレーションのための近似アルゴリズム

Approximation Algorithm for Noisy Quantum Circuit Simulation ( http://arxiv.org/abs/2211.17028v2 )

ライセンス: Link先を確認

Mingyu Huang, Ji Guan, Wang Fang and Mingsheng Ying

(参考訳) ノイズのある量子回路のシミュレーションは、量子ノイズが避けられない現在のNISQ(ノイズのある中規模量子)時代において、量子アルゴリズムの設計と検証に不可欠である。しかし、量子状態爆発問題(状態空間の次元が量子ビット数に対して指数関数的に増大する)と、ノイズの複雑な(非ユニタリな)表現のため、古典的な対応物よりもはるかに非効率である。その結果、約50量子ビットまでのノイズのある回路しか良好な精度で近似シミュレートできない。本稿では、シミュレーション可能な回路のスケーラビリティを向上させるため、ノイズの影響が小さい場合にノイズのある量子回路をシミュレートする新しい近似アルゴリズムを提案する。このアルゴリズムは、ノイズシミュレーションのための新しいテンソルネットワーク図に基づいており、特異値分解を用いて図中の量子ノイズのテンソルを近似する。テンソルネットワーク図の縮約はGoogleのTensorNetwork上に実装されている。アルゴリズムの有効性と実用性は、現実的な超伝導ノイズモデルを持つ一連の実用的な量子回路での実験によって実証される。その結果、本アルゴリズムは最大225量子ビット、20個のノイズを含む量子回路を(約1.8時間以内で)近似的にシミュレートできる。特に、本手法は広く用いられている近似(サンプリング)アルゴリズムである量子軌道法に対して高速化をもたらす。さらに、ノイズ率が十分に小さい場合、本手法は量子軌道法におけるサンプル数を大幅に削減できる。

Simulating noisy quantum circuits is vital for designing and verifying quantum algorithms in the current NISQ (Noisy Intermediate-Scale Quantum) era, where quantum noise is unavoidable. However, such simulation is far less efficient than its classical counterpart, for two reasons: the quantum state explosion problem (the dimension of the state space is exponential in the number of qubits) and the complex (non-unitary) representation of noise. Consequently, only noisy circuits with up to about 50 qubits can be simulated to a good approximation. To improve the scalability of the circuits that can be simulated, this paper introduces a novel approximation algorithm for simulating noisy quantum circuits when the effect of noise is small. The algorithm is based on a new tensor network diagram for noisy simulation and uses the singular value decomposition to approximate the tensors of quantum noises in the diagram. The contraction of the tensor network diagram is implemented on Google's TensorNetwork. The effectiveness and utility of the algorithm are demonstrated by experiments on a series of practical quantum circuits with realistic superconducting noise models. As a result, our algorithm can approximately simulate quantum circuits with up to 225 qubits and 20 noise operations (within about 1.8 hours). In particular, our method offers a speedup over the commonly used approximation (sampling) algorithm, the quantum trajectories method. Furthermore, our approach can significantly reduce the number of samples in the quantum trajectories method when the noise rate is small enough.
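Truncating the SVD of a noise tensor pays off when noise is weak because the tensor is then dominated by a single singular value. The sketch below illustrates this on the Choi matrix of a single-qubit depolarizing channel with error probability `p`, whose unnormalised singular values are 2(1 − p) and 2p/3 (three times). The depolarizing model and the Choi-matrix view are assumptions chosen for illustration. The paper's own tensor network diagram and its choice of which tensor to decompose are not reproduced here.

```python
import numpy as np

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def depolarizing_choi(p):
    """Unnormalised Choi matrix sum_k vec(K_k) vec(K_k)^dagger of a depolarizing channel."""
    kraus = [np.sqrt(1 - p) * PAULIS["I"]] + [np.sqrt(p / 3) * PAULIS[s] for s in "XYZ"]
    vecs = [k.reshape(-1) for k in kraus]
    return sum(np.outer(v, v.conj()) for v in vecs)


def truncate(matrix, rank):
    """Best rank-`rank` approximation in Frobenius norm (Eckart-Young) via the SVD."""
    u, s, vh = np.linalg.svd(matrix)
    return (u[:, :rank] * s[:rank]) @ vh[:rank], s


if __name__ == "__main__":
    p = 0.01
    choi = depolarizing_choi(p)
    approx, s = truncate(choi, rank=1)
    print("singular values:", np.round(s, 6))
    print("rank-1 error (Frobenius):", np.linalg.norm(choi - approx))
```

The rank-1 truncation error is 2p/√3, so it shrinks linearly with the noise rate. This is consistent with the abstract's claim that the approximation is most useful when the effect of noise is small.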

翻訳日:2023-11-28 04:54:51 公開日:2023-11-23

# MECCH:メタパスコンテキスト畳み込みに基づく異種グラフニューラルネットワーク

MECCH: Metapath Context Convolution-based Heterogeneous Graph Neural Networks ( http://arxiv.org/abs/2211.12792v2 )

ライセンス: Link先を確認

Xinyu Fu, Irwin King

(参考訳) 異種グラフニューラルネットワーク(HGNN)は、複数の種類のノードとエッジを持つ構造化データ上の表現学習のために提案された。HGNNを深くしたときの性能劣化問題に対処するため、研究者はメタパスをHGNNに組み込み、意味的には密接に関連しているがグラフ上では離れているノードを関連付けている。しかし、既存のメタパスベースのモデルは、情報損失か高い計算コストのいずれかに悩まされている。これらの問題を解決するために、メタパスコンテキスト畳み込みに基づく新しい異種グラフニューラルネットワーク(MECCH)を提案する。MECCHは、冗長性を避けつつ損失のないノード情報の集約を可能にする新しい種類のグラフ構造であるメタパスコンテキストを活用する。具体的には、MECCHは特徴前処理の後に3つの新しい構成要素、すなわち(1)メタパスコンテキストの構築、(2)メタパスコンテキストエンコーダ、(3)畳み込みメタパス融合を適用し、入力グラフから包括的な情報を効率的に抽出する。ノード分類とリンク予測に関する5つの実世界の異種グラフデータセットでの実験により、MECCHは最先端のベースラインと比較して、計算効率を改善しつつ優れた予測精度を達成することが示された。

Heterogeneous graph neural networks (HGNNs) were proposed for representation learning on structured data with multiple types of nodes and edges. To deal with the performance degradation that occurs when HGNNs become deep, researchers combine metapaths with HGNNs to associate nodes that are closely related in semantics but far apart in the graph. However, existing metapath-based models suffer from either information loss or high computation costs. To address these problems, we present a novel Metapath Context Convolution-based Heterogeneous Graph Neural Network (MECCH). MECCH leverages metapath contexts, a new kind of graph structure that facilitates lossless node information aggregation while avoiding any redundancy. Specifically, after feature preprocessing, MECCH applies three novel components to extract comprehensive information from the input graph efficiently: (1) metapath context construction, (2) a metapath context encoder, and (3) convolutional metapath fusion. Experiments on five real-world heterogeneous graph datasets for node classification and link prediction show that MECCH achieves superior prediction accuracy compared with state-of-the-art baselines, with improved computational efficiency.
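To make "metapath context" concrete, consider a toy bipartite author-paper graph and the metapath Author-Paper-Author (APA). The sketch collects every node lying on an APA instance that starts at a target author: the intermediate papers and the authors reached through them. It then averages their features. Keeping the intermediate papers, rather than only the end-point neighbours, is what avoids information loss. The mean aggregator, the helper names, and the assumption that author and paper features share a dimension are all hypothetical simplifications. MECCH's actual context encoder and fusion layer are not reproduced.

```python
import numpy as np


def metapath_adjacency(ap):
    """APA metapath neighbours: authors sharing at least one paper (boolean matrix)."""
    return (ap @ ap.T) > 0


def metapath_context(ap, author):
    """Nodes on all Author-Paper-Author instances that start at `author`.

    ap: binary author-by-paper incidence matrix. Returns (papers, authors) index
    arrays: the intermediate papers and the authors reached through them.
    """
    papers = np.flatnonzero(ap[author])
    authors = np.flatnonzero(ap[:, papers].sum(axis=1) > 0)
    return papers, authors


def mean_context_embedding(ap, author, author_feats, paper_feats):
    """Hypothetical encoder: average the features of every node in the context.

    Assumes author and paper features share a dimension and that the target
    author has at least one paper (otherwise the context is empty).
    """
    papers, authors = metapath_context(ap, author)
    feats = np.vstack([paper_feats[papers], author_feats[authors]])
    return feats.mean(axis=0)
```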

翻訳日:2023-11-28 04:54:32 公開日:2023-11-23

# 部分空間間の量子コヒーレンス:状態変換、コヒーレンスパワー、$k$コヒーレンスおよびその他の性質

Quantum coherence between subspaces: State transformation, Cohering Power, $k$-coherence and other properties ( http://arxiv.org/abs/2302.13148v3 )

ライセンス: Link先を確認

Azam Mani, Fatemeh Rezazadeh, Vahid Karimipour

(参考訳) [1]で最初に導入され[2,3]で発展したブロックコヒーレンスの概念は、個々の原子に対して任意に精密な測定を行えるほど実験能力が精緻でない場合を包含する。我々は、この資源理論をいくつかの観点からさらに研究しやすくする枠組みを構築する。この枠組みを用いて、非コヒーレント操作による状態変換の問題を調べる。クラウス作用素の明示的な形を示すことにより、ブロックコヒーレンスの文脈における状態変換のためのメジャー化に類似した十分条件を導出する。また、他のすべての状態とすべてのユニタリゲートを非コヒーレント操作によって構成できる最大コヒーレント状態の形を決定する。その後、量子チャネルのブロックコヒーレンスパワーおよびブロックデコヒーレンスパワーの概念を定義し、いくつかの種類のチャネルについてこれらのパワーを決定する。最後に、ブロックコヒーレンスと、$k$-コヒーレンスとして知られる従来のコヒーレンスの拡張との関係を探る。

The concept of block coherence, first introduced in [1] and developed in [2,3], encompasses the case where experimental capabilities are not refined enough to perform arbitrarily fine-grained measurements on individual atoms. We develop a framework that facilitates further investigation of this resource theory in several respects. Using this framework, we investigate the problem of state conversion by incoherent operations. By presenting the explicit form of the Kraus operators, we derive a majorization-like sufficient condition for state conversion within the context of block coherence. We also determine the form of the maximally coherent state from which all other states and all unitary gates can be constructed by incoherent operations. We then define the block-cohering and block-decohering powers of quantum channels and determine these powers for several types of channels. Finally, we explore the relation between block coherence and a previous extension of coherence known as $k$-coherence.
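Block coherence is measured relative to a partition of the Hilbert space into orthogonal subspaces, not relative to individual basis states. The sketch below uses one standard quantifier from this resource theory as an illustration. It applies the block-dephasing map $\Delta(\rho)=\sum_i P_i\rho P_i$ and computes the relative entropy of block coherence $S(\Delta(\rho)) - S(\rho)$. This quantifier is not necessarily the one used in the paper, and the helper names are hypothetical. Blocks are given as lists of basis indices.

```python
import numpy as np


def von_neumann_entropy(rho):
    """Entropy in bits, ignoring numerically zero eigenvalues."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))


def block_dephase(rho, blocks):
    """Apply Delta(rho) = sum_i P_i rho P_i for projectors onto the given index blocks."""
    out = np.zeros_like(rho)
    for idx in blocks:
        diag = np.zeros(rho.shape[0])
        diag[list(idx)] = 1.0
        P = np.diag(diag)
        out += P @ rho @ P
    return out


def relative_entropy_block_coherence(rho, blocks):
    """C(rho) = S(Delta(rho)) - S(rho); zero iff rho is block-diagonal."""
    return von_neumann_entropy(block_dephase(rho, blocks)) - von_neumann_entropy(rho)
```

With one-dimensional blocks, this reduces to the usual relative entropy of coherence: |+⟩ then carries 1 bit. Merging both basis states into a single block makes the same state block-incoherent (0 bits). This reflects a coarse-grained measurement that cannot resolve the individual levels.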

翻訳日:2023-11-28 04:44:41 公開日:2023-11-23

# 相互作用するカオス的少数体量子系における普遍的スペクトル相関

Universal spectral correlations in interacting chaotic few-body quantum systems ( http://arxiv.org/abs/2302.09955v3 )

ライセンス: Link先を確認

Felix Fritzsch and Maximilian F. I. Kieler

(参考訳) 相互作用する量子系におけるランダム行列的なスペクトル相関の出現は、量子カオスを特徴づける性質である。我々は、適切なランダム行列アンサンブルでモデル化した、相互作用するカオス的な少数体系および多体系において、スペクトル形状因子とそのモーメントを用いてこのような相関を調べる。大きなヒルベルト空間次元に対してスペクトル形状因子を厳密に求める。これらの結果を有限のヒルベルト空間次元に外挿すると、非相互作用の場合から強相互作用の場合への普遍的な遷移が見つかり、これは両極限の単純な組み合わせとして記述できる。この遷移は単一のスケーリングパラメータによって支配される。二分割系の場合には、スペクトル形状因子のすべてのモーメントについても同様の結果を導く。広範な数値計算によって結果を確認し、量子化された一対のキックローターで与えられるより現実的な系にも適用できることを示す。最後に、小さな結合領域をカバーする摂動論的アプローチによって解析を補完する。

The emergence of random matrix spectral correlations in interacting quantum systems is a defining feature of quantum chaos. We study such correlations in terms of the spectral form factor and its moments in interacting chaotic few- and many-body systems, modeled by suitable random-matrix ensembles. We obtain the spectral form factor exactly for large Hilbert space dimension. Extrapolating these results to finite Hilbert space dimension, we find a universal transition from the non-interacting to the strongly interacting case, which can be described as a simple combination of these two limits. This transition is governed by a single scaling parameter. In the bipartite case, we derive similar results for all moments of the spectral form factor. We confirm our results by extensive numerical studies and demonstrate that they also apply to more realistic systems, such as a pair of quantized kicked rotors. Finally, we complement our analysis with a perturbative approach covering the small-coupling regime.
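The spectral form factor can be estimated numerically as $K(t)=\langle|\sum_n e^{-iE_n t}|^2\rangle$, averaged over an ensemble of spectra. The sketch below does this for GUE matrices, taken as a stand-in for a single chaotic subsystem; the normalisation is an assumption, and the paper's coupled few-body ensembles are not reproduced. It also illustrates the non-interacting limit. For two uncoupled subsystems, the joint spectrum consists of sums $E_a+E_b$, so the phase sum factorizes and $K_{AB}(t)=K_A(t)\,K_B(t)$ holds for each realization.

```python
import numpy as np


def gue_spectrum(n, rng):
    """Eigenvalues of an n x n GUE-type matrix H = (A + A^dagger)/2 (scale not normalised)."""
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return np.linalg.eigvalsh((a + a.conj().T) / 2)


def spectral_form_factor(spectra, times):
    """Ensemble-averaged K(t) = <|sum_n exp(-i E_n t)|^2> for each t in `times`."""
    spectra = np.atleast_2d(spectra)                    # shape (samples, n)
    phases = np.exp(-1j * spectra[:, :, None] * times)  # shape (samples, n, len(times))
    return np.mean(np.abs(phases.sum(axis=1)) ** 2, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spectra = np.array([gue_spectrum(64, rng) for _ in range(200)])
    times = np.linspace(0.0, 5.0, 6)
    print(spectral_form_factor(spectra, times))
```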

翻訳日:2023-11-28 04:44:21 公開日:2023-11-23

# 集団測定による混合量子ビット状態の識別

Discriminating mixed qubit states with collective measurements ( http://arxiv.org/abs/2302.08882v2 )

ライセンス: Link先を確認

Lorcan O. Conlon, Falk Eilenberger, Ping Koy Lam and Syed M. Assad

(参考訳) 非直交状態を完全には区別できないことは、量子力学の中心的な事実である。この性質は量子鍵配送の安全性を保証する。したがって、量子状態を最適に区別する戦略を設計・実装することは、量子通信における重要な課題である。一般に、量子状態の複数のコピーにアクセスできる場合、最適な測定は集団測定となる。しかし、これまで集団測定は量子状態識別の性能向上に用いられてこなかった。その主な理由の一つは、事前確率が等しい通常の状態識別の設定では、分離可能な測定を上回るために少なくとも3つのコピーを集団的に測定する必要があることである。これは実験的に非常に難しい。本研究では、事前確率が等しくない場合を考えることで、単一量子ビット状態の2つのコピーを集団測定によって識別するプロトコルを提案し、実験的に実証する。このプロトコルは、いかなる非エンタングリング測定よりも低い誤り確率を達成する。測定は超伝導量子プロセッサであるIBM Q System One上で実装した。さらに、未知状態の3コピーおよび4コピーに対しても集団測定を実装したが、性能は低かった。

It is a central fact in quantum mechanics that non-orthogonal states cannot be distinguished perfectly. This property ensures the security of quantum key distribution. It is therefore an important task in quantum communication to design and implement strategies to optimally distinguish quantum states. In general, when we have access to multiple copies of quantum states, the optimal measurement will be a collective measurement. However, to date, collective measurements have not been used to enhance quantum state discrimination. One of the main reasons is that, in the usual state discrimination setting with equal prior probabilities, at least three copies of a quantum state must be measured collectively to outperform separable measurements. This is very challenging experimentally. In this work, by considering unequal prior probabilities, we propose and experimentally demonstrate a protocol for distinguishing two copies of single-qubit states using collective measurements. The protocol achieves a lower probability of error than any non-entangling measurement can achieve. We implement our measurements on an IBM Q System One device, a superconducting quantum processor. Additionally, we implement collective measurements on three and four copies of the unknown state and find that they perform poorly.
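The optimum that any measurement on the available copies can reach is given by the Helstrom bound, $P_{\mathrm{err}}=\tfrac12\bigl(1-\lVert p_0\rho_0-p_1\rho_1\rVert_1\bigr)$. Applied to $\rho^{\otimes n}$, the bound is attained by a collective measurement in general. The sketch below evaluates it for one and two copies of mixed qubit states with unequal priors. The best non-entangling strategy, which the paper compares against, requires a separate optimization that is not shown. The helper names and Bloch-vector parametrization are illustrative choices.

```python
import numpy as np


def helstrom_error(rho0, rho1, p0):
    """Minimum error probability for discriminating rho0 (prior p0) from rho1 (prior 1 - p0)."""
    gamma = p0 * rho0 - (1 - p0) * rho1
    trace_norm = np.sum(np.abs(np.linalg.eigvalsh(gamma)))  # gamma is Hermitian
    return 0.5 * (1 - trace_norm)


def n_copies(rho, n):
    """Tensor power rho^{(x) n}: the joint state of n identical copies."""
    out = rho
    for _ in range(n - 1):
        out = np.kron(out, rho)
    return out


def mixed_qubit(bloch):
    """Qubit density matrix with Bloch vector (x, y, z), |r| <= 1."""
    x, y, z = bloch
    return 0.5 * np.array([[1 + z, x - 1j * y], [x + 1j * y, 1 - z]])


if __name__ == "__main__":
    r0, r1 = mixed_qubit((0.6, 0.0, 0.3)), mixed_qubit((-0.6, 0.0, 0.3))
    for n in (1, 2):
        print(n, helstrom_error(n_copies(r0, n), n_copies(r1, n), p0=0.7))
```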

翻訳日:2023-11-28 04:44:07 公開日:2023-11-23

# 確率的表現によるPDE学習のためのモンテカルロニューラルPDE解法

Monte Carlo Neural PDE Solver for Learning PDEs via Probabilistic Representation ( http://arxiv.org/abs/2302.05104v2 )

ライセンス: Link先を確認

Rui Zhang, Qi Meng, Rongchan Zhu, Yue Wang, Wenlei Shi, Shihua Zhang, Zhi-Ming Ma, Tie-Yan Liu

(参考訳) 利用可能なデータや高品質なデータが限られている状況では、関数から関数への写像を学習するニューラルPDEソルバを教師なしで訓練することが不可欠である。しかし、既存手法の効率と精度は、訓練段階に組み込まれる差分法や擬スペクトル法といった数値アルゴリズムの性質によって制約されている。これらの手法は妥当な精度を得るために慎重な時空間離散化を必要とし、特に時空間変動が大きい場合には、大きな計算負荷と不正確なシミュレーションを招く。これらの制限に対処するため、我々はPDEの確率表現を介して教師なしニューラルソルバを訓練するモンテカルロ・ニューラルPDEソルバ(MCNP Solver)を提案する。この確率表現は、巨視的な現象をランダムな粒子のアンサンブルとみなすものである。他の教師なし手法と比較して、MCNP Solverはモンテカルロ法の利点を自然に受け継いでおり、時空間変動に対して頑健で、粗いステップサイズを許容できる。粒子のランダムウォークをシミュレートする際、移流過程にはHeun法を用い、拡散過程では近傍格子点の確率密度関数を用いて期待値を計算する。これらの技術により精度が向上するとともに、モンテカルロサンプリングに伴う計算メモリと時間の問題が回避され、従来のモンテカルロ法よりも改善される。移流拡散方程式、Allen-Cahn方程式、Navier-Stokes方程式に関する数値実験により、他の教師なしベースラインと比較して精度と効率が大幅に向上することを示す。ソースコードは https://github.com/optray/MCNP で公開される予定である。

In scenarios with limited available or high-quality data, training the function-to-function neural PDE solver in an unsupervised manner is essential. However, the efficiency and accuracy of existing methods are constrained by the properties of numerical algorithms, such as finite difference and pseudo-spectral methods, integrated during the training stage. These methods necessitate careful spatiotemporal discretization to achieve reasonable accuracy, leading to significant computational challenges and inaccurate simulations, particularly in cases with substantial spatiotemporal variations. To address these limitations, we propose the Monte Carlo Neural PDE Solver (MCNP Solver) for training unsupervised neural solvers via the PDEs' probabilistic representation, which regards macroscopic phenomena as ensembles of random particles. Compared to other unsupervised methods, MCNP Solver naturally inherits the advantages of the Monte Carlo method, which is robust against spatiotemporal variations and can tolerate coarse step size. In simulating the random walk of particles, we employ Heun's method for the convection process and calculate the expectation via the probability density function of neighbouring grid points during the diffusion process. These techniques enhance accuracy and circumvent the computational memory and time issues associated with Monte Carlo sampling, offering an improvement over traditional Monte Carlo methods. Our numerical experiments on convection-diffusion, Allen-Cahn, and Navier-Stokes equations demonstrate significant improvements in accuracy and efficiency compared to other unsupervised baselines. The source code will be publicly available at: https://github.com/optray/MCNP.
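The probabilistic representation referred to above is the Feynman-Kac formula. For $u_t + b(x)\,u_x = \nu u_{xx}$, the solution is $u(x,t)=\mathbb{E}[u_0(X_t)]$, where the particle follows $dX=-b(X)\,ds+\sqrt{2\nu}\,dW$ from $X_0=x$. The sketch below estimates this expectation by plain sampling, integrating the drift with a stochastic Heun (predictor-corrector) step. It assumes a 1D problem, and the function name and defaults are hypothetical. MCNP's trick of computing the diffusion expectation from neighbouring grid points, and its neural-network training loop, are not reproduced.

```python
import numpy as np


def mc_convection_diffusion(u0, velocity, nu, x, t, n_particles=100_000, n_steps=50, seed=0):
    """Monte Carlo estimate of u(x, t) for u_t + b(x) u_x = nu u_xx with u(., 0) = u0.

    Particles start at x and follow dX = -b(X) ds + sqrt(2 nu) dW; the drift is
    integrated with a stochastic Heun step (the noise is additive, so it is shared
    between predictor and corrector). Returns the sample mean of u0(X_t).
    """
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    X = np.full(n_particles, float(x))
    for _ in range(n_steps):
        noise = np.sqrt(2.0 * nu) * rng.normal(scale=np.sqrt(dt), size=n_particles)
        drift0 = -velocity(X)
        X_pred = X + drift0 * dt + noise                      # Euler predictor
        X = X + 0.5 * (drift0 - velocity(X_pred)) * dt + noise  # trapezoidal corrector
    return float(np.mean(u0(X)))


if __name__ == "__main__":
    beta, nu, x, t = 1.0, 0.1, 0.3, 0.5
    estimate = mc_convection_diffusion(np.sin, lambda X: beta + 0.0 * X, nu, x, t)
    exact = np.exp(-nu * t) * np.sin(x - beta * t)  # closed form for u0 = sin, constant b
    print(estimate, exact)
```

The particle cloud needs no spatial grid, so the time step is limited by how well Heun's method follows the drift rather than by a CFL-type condition. This is one way to read the abstract's remark that Monte Carlo representations tolerate coarse step sizes.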

翻訳日:2023-11-28 04:43:24 公開日:2023-11-23

# 回路量子電磁力学の非分散領域におけるポラリトン状態の特徴付け

Characterising Polariton States in Non-Dispersive Regime of Circuit Quantum Electrodynamics ( http://arxiv.org/abs/2302.04523v2 )

ライセンス: Link先を確認

Arvind Mamgain, Samarth Hawaldar, Athreya Shankar and Baladitya Suri

(参考訳) 読み出し共振器に結合した超伝導量子ビットは、現在、多くの量子コンピューティングおよび量子光学実験の基本構成要素となっている。典型的な量子ビット-共振器系は分散領域で結合されており、量子ビットと共振器の離調はそれらの結合よりもはるかに大きい。本研究では、非分散領域にある超伝導トランズモン-共振器系を作製し、測定した。裸の量子ビット状態と共振器状態の混合によって形成されるドレスト状態は、量子ビットに駆動を印加することでさらに混合され、ポラリトン状態が形成される。様々な駆動強度と駆動周波数におけるポラリトン状態間の遷移を実験的に調べる。そして、量子ビット-共振器系の高い準位の非分散結合が、ポラリトン固有状態と対応する遷移周波数をどのように変化させるかを示す。また、分散領域を超えた駆動Jaynes-Cummingsモデルから得られる数値結果とよく一致することを報告する。

A superconducting qubit coupled to a readout resonator is currently the building block of many quantum computing and quantum optics experiments. A typical qubit-resonator system is coupled in the dispersive regime, where the detuning between the qubit and the resonator is much greater than the coupling between them. In this work, we fabricated and measured a superconducting transmon-resonator system in the non-dispersive regime. The dressed states formed by the mixing of the bare qubit and resonator states can be further mixed by applying a drive on the qubit, leading to the formation of polariton states. We report experimental studies of transitions between polariton states at varying drive powers and frequencies. We show how the non-dispersive coupling of the higher levels of the qubit-resonator system modifies the polariton eigenstates and the corresponding transition frequencies. We also report close agreement with numerical results obtained from a driven Jaynes-Cummings model beyond the dispersive regime.
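The difference between the dispersive and non-dispersive regimes shows up already in the undriven Jaynes-Cummings spectrum. The sketch below diagonalizes $H=\omega_r a^\dagger a+\omega_q|e\rangle\langle e|+g(a^\dagger\sigma_-+a\sigma_+)$ in a truncated Fock space ($\hbar=1$). In the single-excitation manifold, the exact levels $\tfrac{\omega_r+\omega_q}{2}\pm\sqrt{(\Delta/2)^2+g^2}$ deviate from the dispersive estimate $\pm g^2/\Delta$ once $g$ is not small compared with $\Delta$. The two-level qubit is a simplification made here: a transmon has additional levels. The qubit drive that creates the paper's polariton states is also omitted.

```python
import numpy as np


def jaynes_cummings_levels(omega_r, omega_q, g, n_max=10):
    """Eigenenergies of the Jaynes-Cummings Hamiltonian, resonator truncated at n_max photons.

    Basis ordering: resonator (x) qubit, with qubit basis (g, e).
    """
    a = np.diag(np.sqrt(np.arange(1, n_max + 1)), k=1)  # resonator lowering operator
    sm = np.array([[0.0, 1.0], [0.0, 0.0]])             # qubit lowering |g><e|
    ir, iq = np.eye(n_max + 1), np.eye(2)
    H = (omega_r * np.kron(a.T @ a, iq)
         + omega_q * np.kron(ir, sm.T @ sm)
         + g * (np.kron(a.T, sm) + np.kron(a, sm.T)))
    return np.linalg.eigvalsh(H)


if __name__ == "__main__":
    omega_r, omega_q = 5.0, 6.0
    delta = omega_q - omega_r
    for g in (0.05, 0.3):
        exact_shift = jaynes_cummings_levels(omega_r, omega_q, g)[2] - omega_q
        print(f"g={g}: exact shift {exact_shift:.4f}, dispersive g^2/Delta {g**2 / delta:.4f}")
```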

翻訳日:2023-11-28 04:42:56 公開日:2023-11-23

# ベイズ階層モデルの比較のための深層学習法

A Deep Learning Method for Comparing Bayesian Hierarchical Models ( http://arxiv.org/abs/2301.11873v4 )

ライセンス: Link先を確認

Lasse Elsem\"uller, Martin Schnuerch, Paul-Christian B\"urkner, Stefan T. Radev

(参考訳) ベイズモデル比較(BMC)は、競合する計算モデルの相対的な長所を評価し、不確実性をモデル選択の判断に伝播させる原理的なアプローチを提供する。しかし、BMCは高次元の入れ子構造のパラメータを持つため、広く用いられている階層モデルのクラスではしばしば計算困難となる。この困難に対処するため、確率的プログラムとして具体化できる任意の階層モデルの集合に対してBMCを実行する深層学習手法を提案する。本手法は償却推論を可能にするため、実データへの適用に先立って、事後モデル確率の効率的な再推定と高速な性能検証を行うことができる。一連の広範な検証研究において、本手法の性能を最先端のブリッジサンプリング法と比較し、すべてのBMC設定において優れた償却推論を示す。次に、尤度が部分的に陰的であるために従来BMCでは扱えないとされてきた4つの階層的証拠蓄積モデルを比較することで、本手法を実演する。さらに、転移学習を活用して訓練効率を向上させる方法を示す。すべての解析について再現可能なコードと、本手法のオープンソース実装を提供する。

Bayesian model comparison (BMC) offers a principled approach for assessing the relative merits of competing computational models and propagating uncertainty into model selection decisions. However, BMC is often intractable for the popular class of hierarchical models due to their high-dimensional nested parameter structure. To address this intractability, we propose a deep learning method for performing BMC on any set of hierarchical models which can be instantiated as probabilistic programs. Since our method enables amortized inference, it allows efficient re-estimation of posterior model probabilities and fast performance validation prior to any real-data application. In a series of extensive validation studies, we benchmark the performance of our method against the state-of-the-art bridge sampling method and demonstrate excellent amortized inference across all BMC settings. We then showcase our method by comparing four hierarchical evidence accumulation models that have previously been deemed intractable for BMC due to partly implicit likelihoods. Additionally, we demonstrate how transfer learning can be leveraged to enhance training efficiency. We provide reproducible code for all analyses and an open-source implementation of our method.
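The amortization principle behind such methods can be shown on a toy problem. A probabilistic classifier trained with cross-entropy on (simulated dataset, model index) pairs, with equal prior model probabilities, approximates the posterior model probability p(M | data). Once trained, it scores any new dataset with a single forward pass. In the sketch below, logistic regression on a hand-picked summary statistic stands in for the paper's deep network with learned summaries. The two Gaussian models and all names are assumptions for illustration; the hierarchical structure of the models is also not reproduced.

```python
import numpy as np


def simulate(model, n_datasets, n_obs, rng):
    """Toy generative models (assumed for illustration): M0 has mean 0, M1 has mean 1."""
    mean = 0.0 if model == 0 else 1.0
    return rng.normal(mean, 1.0, size=(n_datasets, n_obs))


def features(data):
    """Hand-picked summary statistic with a bias column (the paper learns summaries)."""
    return np.column_stack([np.ones(len(data)), data.mean(axis=1)])


def train_classifier(X, y, iters=25):
    """Logistic regression fitted by Newton's method (IRLS)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        w += np.linalg.solve(hess, grad)
    return w


def posterior_prob_m1(w, data):
    """Amortized inference: one cheap evaluation per new dataset, no refitting."""
    return 1.0 / (1.0 + np.exp(-features(data) @ w))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_obs = 20
    sims = np.vstack([simulate(0, 2000, n_obs, rng), simulate(1, 2000, n_obs, rng)])
    labels = np.repeat([0.0, 1.0], 2000)  # equal numbers, i.e. equal prior model probabilities
    w = train_classifier(features(sims), labels)
    print(posterior_prob_m1(w, rng.normal(1.0, 1.0, size=(1, n_obs))))
```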

翻訳日:2023-11-28 04:41:48 公開日:2023-11-23

# 原子スケールにおける量子崩壊モデルの指紋としてのキャンセル効果

Cancellation effects as a fingerprint of quantum collapse models at atomic scale ( http://arxiv.org/abs/2301.09920v2 )

ライセンス: Link先を確認

Kristian Piscicchia, Sandro Donadi, Simone Manti, Angelo Bassi, Maaneli Derakhshani, Lajos Diosi and Catalina Curceanu

(参考訳) 本研究では、動的な波動関数の崩壊によって誘起される原子系からの自発的な電磁放射を、X線領域で調べる。放射が完全にコヒーレント(同じ原子核内の陽子)か非コヒーレント(電子)であるという、これまで文献で考慮されてきた単純な場合からの大きな逸脱が明らかになる。この低エネルギー領域では、自発放射率は調べる原子種に強く依存する。さらに初めて、放射率が特定の崩壊モデルに依存することが見出された。

In this work, the spontaneous electromagnetic radiation from atomic systems induced by dynamical wave-function collapse is investigated in the X-ray domain. We find strong departures from the simple cases considered in the literature so far, in which the emission is either perfectly coherent (protons in the same nucleus) or incoherent (electrons). In this low-energy regime, the spontaneous radiation rate depends strongly on the atomic species under investigation and, for the first time, is found to depend on the specific collapse model.

翻訳日:2023-11-28 04:41:09 公開日:2023-11-23

# 分割弦とホログラフィ

Segmented strings and holography ( http://arxiv.org/abs/2304.10389v2 )

ライセンス: Link先を確認

Bercel Boldis, P\'eter L\'evay

(参考訳) 本稿では、$AdS_{d+1}$ を伝播する分割弦と、ミンコフスキー時空における $CFT_d$ の部分系との対応を確立する。この対応は、真空状態に対して計算される量子情報理論的な量によって特徴づけられる。AdS側の弦セグメントの世界面の面積が、CFT側のフィデリティ感受率(量子幾何テンソルの実部)と結びつけられることを示す。この量は、無限小だけ離れた状態間の計算複雑性という別の解釈も持つ。これらの状態は、運動学的空間の計量に従って空間的に変位した因果ダイヤモンドに対応する。これらの変位した因果ダイヤモンドは、弦の世界面セグメントをホログラフィックに一意に再構成するための情報を符号化している。双対的に、バルクのセグメントは、連続した境界事象の因果的に順序づけられた集合を表しており、その事象はブーストされた慣性系、あるいは一定加速度で運動する非慣性系におけるものである。$AdS_3$ の特別な場合には、$4GL$ ($G$ はニュートン定数、$L$ はAdS長) を単位とする分割弦の面積を、ブーストされた空間的区間 $A$, $B$, $C$ から生じる台形配置に対して計算される条件付き相互情報量 $I(A,C\vert B)$ とみなすこともできる。この特別な場合、離散化された南部・後藤作用の変分は、境界理論におけるエンタングルメントエントロピーに対する戸田方程式の形の方程式を導く。任意の $d$ に対して、弦の世界面パッチはエンタングルメント・ウェッジのモジュラースライス内に存在する。それらはエンタングルメント・ウェッジのある種のトモグラフィーを与えているようである。そこでは各パッチが、補間仮定、すなわち南部・後藤作用の運動方程式の離散版によって互いに結びつけられている。

In this paper, we establish a connection between segmented strings propagating in $AdS_{d+1}$ and $CFT_d$ subsystems in Minkowski spacetime. This connection is characterized by quantum information theoretic quantities calculated for the vacuum state. We show that the area of the world sheet of a string segment on the AdS side can be connected to the fidelity susceptibility (the real part of the quantum geometric tensor) on the CFT side. This quantity has another interpretation as the computational complexity for infinitesimally separated states corresponding to causal diamonds that are displaced in a spacelike manner according to the metric of kinematic space. These displaced causal diamonds encode information for a unique reconstruction of the string world sheet segments in a holographic manner. Dually, the bulk segments represent causally ordered sets of consecutive boundary events in boosted inertial frames or in noninertial ones proceeding with constant acceleration. For the special case of $AdS_3$, one can also view the segmented stringy area in units of $4GL$ ($G$ is Newton's constant and $L$ is the AdS length) as the conditional mutual information $I(A,C\vert B)$. This quantity is calculated for a trapezoid configuration arising from boosted spacelike intervals $A$, $B$, and $C$. In this special case, the variation of the discretized Nambu-Goto action leads to an equation for entanglement entropies in the boundary theory that takes the form of a Toda equation. For arbitrary $d$, the string world sheet patches live in the modular slices of the entanglement wedge. They seem to provide a kind of tomography of the entanglement wedge, in which the patches are linked together by the interpolation ansatz, i.e., the discretized version of the equations of motion for the Nambu-Goto action.
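For reference, the conditional mutual information mentioned above has the standard definition given in the first line below. The second line evaluates it for a simplified configuration chosen purely for illustration: three adjacent intervals of lengths $\ell_A,\ell_B,\ell_C$ on a single time slice of the $CFT_2$ vacuum. The evaluation uses the single-interval entropy $S(\ell)=\frac{c}{3}\log(\ell/\epsilon)$ with UV cutoff $\epsilon$ and the Brown-Henneaux central charge $c=3L/(2G)$. The paper's boosted trapezoid configuration is not reproduced here.

$$
I(A,C\vert B) = S(AB) + S(BC) - S(B) - S(ABC)
$$

$$
I(A,C\vert B) = \frac{c}{3}\,\log\frac{(\ell_A+\ell_B)(\ell_B+\ell_C)}{\ell_B\,(\ell_A+\ell_B+\ell_C)}, \qquad c = \frac{3L}{2G}
$$

The cutoff $\epsilon$ cancels between the four entropies, so $I(A,C\vert B)$ is finite and cutoff-independent, as a quantity identified with a finite bulk area should be.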

翻訳日:2023-11-28 04:33:02 公開日:2023-11-23

# HRS-Bench: テキスト-画像モデルのためのホリスティックで信頼性が高くスケーラブルなベンチマーク

HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models ( http://arxiv.org/abs/2304.05390v2 )

ライセンス: Link先を確認

Eslam Mohamed Bakr, Pengzhan Sun, Xiaoqian Shen, Faizan Farooq Khan, Li Erran Li, Mohamed Elhoseiny

(参考訳) 近年、テキストから画像を生成する(T2I)モデルが盛んに研究されており、特にT2I合成タスクで最先端の結果を達成する拡散モデルの登場以降その傾向が強い。しかし、既存のベンチマークは主観的な人間の評価に大きく依存しており、モデルの能力を全体的に評価する能力が制限されている。さらに、新しいT2Iアーキテクチャの開発と評価の取り組みとの間には大きなギャップがある。そこで本研究では、T2Iモデルのための具体的な評価ベンチマークであるHRS-Benchを提案する。HRS-Benchは、全体的(Holistic)で、信頼性が高く(Reliable)、スケーラブル(Scalable)である。限られた側面に焦点を当てた既存のベンチマークとは異なり、HRS-Benchは、正確性、頑健性、汎化、公平性、バイアスの5つの主要カテゴリに分類できる13のスキルを測定する。さらに、HRS-Benchはファッション、動物、交通、食べ物、衣服を含む50のシナリオをカバーする。幅広いスキルをカバーする指標を用いて、最近の大規模T2Iモデル9つを評価した。HRS-Benchの有効性を検証するために人手評価を行い、平均して我々の評価の95%と一致した。実験により、既存のモデルは、望ましい数の物体、視覚的なテキスト、あるいは接地された感情を含む画像の生成にしばしば苦労することが示された。我々のベンチマークが今後のテキストからの画像生成研究の助けとなることを願う。コードとデータはhttps://eslambakr.github.io/hrsbench.github.ioで入手できる。

In recent years, Text-to-Image (T2I) models have been extensively studied, especially since the emergence of diffusion models that achieve state-of-the-art results on T2I synthesis tasks. However, existing benchmarks rely heavily on subjective human evaluation, limiting their ability to holistically assess model capabilities. Furthermore, there is a significant gap between efforts to develop new T2I architectures and efforts to evaluate them. To address this, we introduce HRS-Bench, a concrete evaluation benchmark for T2I models that is Holistic, Reliable, and Scalable. Unlike existing benchmarks that focus on limited aspects, HRS-Bench measures 13 skills that can be categorized into five major categories: accuracy, robustness, generalization, fairness, and bias. In addition, HRS-Bench covers 50 scenarios, including fashion, animals, transportation, food, and clothes. We evaluate nine recent large-scale T2I models using metrics that cover a wide range of skills. To probe the effectiveness of HRS-Bench, we conducted a human evaluation, which on average agreed with 95% of our evaluations. Our experiments demonstrate that existing models often struggle to generate images with the desired count of objects, visual text, or grounded emotions. We hope that our benchmark helps facilitate future text-to-image generation research. The code and data are available at https://eslambakr.github.io/hrsbench.github.io

翻訳日:2023-11-28 04:31:57 公開日:2023-11-23

# フェデレーション学習を用いた映画推薦のためのプライバシー保護システム

A Privacy Preserving System for Movie Recommendations Using Federated Learning ( http://arxiv.org/abs/2303.04689v3 )

ライセンス: Link先を確認

David Neumann, Andreas Lutz, Karsten M\"uller, Wojciech Samek

(参考訳) 過去数年間で、推薦システムはユビキタスになった。推薦システムは多くのユーザーが直面する「選択の専制」の問題を解決し、多くのオンラインビジネスでエンゲージメントと売上を高めるために利用されている。ソーシャルネットワーク内でフィルターバブルを生み出すといった他の批判に加え、推薦システムは大量の個人データを収集することでしばしば非難される。しかし、推薦をパーソナライズするには、個人情報が本質的に必要である。フェデレーテッドラーニングと呼ばれる最近の分散学習方式により、個人のユーザーデータを中央に収集することなく学習できるようになった。そこで我々は、複数のレベルでプライバシー、ひいては信頼性を提供する映画推薦システムを提示する。まず何よりも、このシステムはフェデレーテッドラーニングを用いて訓練されるため、その性質上プライバシーを保護しつつ、ユーザーはグローバルな知見の恩恵を受けられる。さらに、FedQと呼ばれる新しいフェデレーテッドラーニング方式を採用する。FedQは非i.i.d.性や小規模なローカルデータセットの問題に対処するだけでなく、クライアントの更新を早期に集約することで入力データ再構成攻撃を防ぐ。最後に、通信オーバーヘッドを削減するために圧縮を適用し、交換されるニューラルネットワークのパラメータを元のサイズのごく一部にまで大幅に圧縮する。この圧縮は、その損失のある量子化段階を通じてデータのプライバシーも向上させる可能性があると推測する。

Recommender systems have become ubiquitous in recent years. They solve the tyranny-of-choice problem faced by many users and are used by many online businesses to drive engagement and sales. Besides other criticisms, such as creating filter bubbles within social networks, recommender systems are often criticized for collecting considerable amounts of personal data. However, personalizing recommendations fundamentally requires personal information. A recent distributed learning scheme called federated learning has made it possible to learn from personal user data without collecting it centrally. Consequently, we present a movie recommender system that provides privacy, and thus trustworthiness, on multiple levels. First and foremost, it is trained using federated learning and is thus privacy-preserving by its very nature, while still enabling users to benefit from global insights. Furthermore, it employs a novel federated learning scheme called FedQ. FedQ not only addresses the problem of non-i.i.d. data and small local datasets, but also prevents input data reconstruction attacks by aggregating client updates early. Finally, to reduce the communication overhead, compression is applied, which shrinks the exchanged neural network parametrizations to a fraction of their original size. We conjecture that this may also improve data privacy through its lossy quantization stage.
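Two ingredients mentioned above can be sketched generically. The first is server-side aggregation of client parameters weighted by local dataset size, in the style of FedAvg. The second is lossy uniform quantization of an update before it is sent, which is what shrinks the communicated payload. FedQ's early-aggregation scheme and the paper's actual compression pipeline are not reproduced. The function names, the 4-bit default, and the per-vector min/max scaling are assumptions for illustration.

```python
import numpy as np


def federated_average(client_params, client_sizes):
    """FedAvg-style aggregation: average client parameter vectors weighted by dataset size."""
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()
    return np.tensordot(weights, np.stack(client_params), axes=1)


def quantize(update, n_bits=4):
    """Uniform lossy quantization to 2**n_bits levels over the update's range (n_bits <= 8)."""
    lo, hi = float(update.min()), float(update.max())
    levels = 2**n_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((update - lo) / scale).astype(np.uint8)
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Reconstruct an approximate update; the error per entry is at most scale / 2."""
    return lo + codes.astype(float) * scale
```

Sending 4-bit codes plus two floats instead of 32-bit floats cuts the payload roughly eightfold for large updates. The rounding step discards information, which is the "lossy quantization stage" the abstract conjectures may also help privacy.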

翻訳日:2023-11-28 04:29:07 公開日:2023-11-23

# インコンテキスト強化学習のための構造化状態空間モデル

Structured State Space Models for In-Context Reinforcement Learning ( http://arxiv.org/abs/2303.03982v3 )

ライセンス: Link先を確認

Chris Lu, Yannick Schroecker, Albert Gu, Emilio Parisotto, Jakob Foerster, Satinder Singh, Feryal Behbahani

(参考訳) 構造化状態空間系列(S4)モデルは、最近、長距離系列モデリングタスクで最先端の性能を達成している。これらのモデルは高速な推論と並列化可能な訓練も備えており、多くの強化学習の設定で有用となる可能性がある。本研究では、隠れ状態を並列に初期化・リセットできるようにするS4の変種への修正を提案し、これにより強化学習タスクに取り組めるようにする。修正したアーキテクチャは系列長に関してTransformerよりも漸近的に高速に動作し、単純なメモリベースのタスクでRNNより優れた性能を示す。修正したアーキテクチャを部分観測可能な環境群で評価したところ、実際に我々のモデルはRNNを上回る性能を示しつつ、5倍以上高速に動作した。次に、モデルが長距離系列を扱える能力を活かし、困難なメタ学習タスクで高い性能を達成する。このタスクでは、ランダムにサンプリングされた連続制御環境と、その環境の観測と行動に対するランダムにサンプリングされた線形射影の組み合わせがエージェントに与えられる。さらに、得られたモデルが分布外のホールドアウトタスクに適応できることを示す。全体として、本論文の結果は、構造化状態空間モデルがインコンテキスト強化学習タスクにおいて高速かつ高性能であることを示している。コードは https://github.com/luchris429/popjaxrl で提供している。

Structured state space sequence (S4) models have recently achieved state-of-the-art performance on long-range sequence modeling tasks. These models also have fast inference speeds and parallelisable training, making them potentially useful in many reinforcement learning settings. We propose a modification to a variant of S4 that enables us to initialise and reset the hidden state in parallel, allowing us to tackle reinforcement learning tasks. We show that our modified architecture runs asymptotically faster than Transformers in sequence length and performs better than RNNs on a simple memory-based task. We evaluate our modified architecture on a set of partially observable environments and find that, in practice, our model outperforms RNNs while also running over five times faster. Then, by leveraging the model's ability to handle long-range sequences, we achieve strong performance on a challenging meta-learning task. In this task, the agent is given a randomly sampled continuous control environment, combined with a randomly sampled linear projection of the environment's observations and actions. Furthermore, we show that the resulting model can adapt to out-of-distribution held-out tasks. Overall, the results presented in this paper show that structured state space models are fast and performant for in-context reinforcement learning tasks. We provide code at https://github.com/luchris429/popjaxrl.
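The key trick, resetting the hidden state at episode boundaries without giving up parallel training, can be illustrated on a scalar linear recurrence $h_t = a_t h_{t-1} + b_t$. Composing two steps gives $(a_1 a_2,\; a_2 b_1 + b_2)$, which is associative, so all prefixes can be computed with a parallel scan in $O(\log T)$ vectorised rounds. A reset at step $t$ is encoded by setting $a_t = 0$, which cuts the dependence on everything before $t$. The sketch below checks a Hillis-Steele scan against a sequential loop. The scalar diagonal recurrence is a stand-in assumed for illustration. The paper's S4-variant state matrices and its exact reset operator are not reproduced.

```python
import numpy as np


def sequential_scan(a, b, reset):
    """Reference loop: h_t = a_t * h_{t-1} + b_t, with the history dropped when reset[t]."""
    h, out = 0.0, []
    for a_t, b_t, r_t in zip(a, b, reset):
        h = (0.0 if r_t else a_t) * h + b_t
        out.append(h)
    return np.array(out)


def parallel_scan(a, b, reset):
    """Same recurrence via a Hillis-Steele inclusive prefix scan over pairs (a, b)."""
    A = np.where(reset, 0.0, np.asarray(a, dtype=float))  # a reset zeroes the carry-in
    B = np.asarray(b, dtype=float).copy()
    shift = 1
    while shift < len(A):
        A_new, B_new = A.copy(), B.copy()
        # Combine the segment ending at i - shift (earlier) with the one ending at i (later).
        A_new[shift:] = A[:-shift] * A[shift:]
        B_new[shift:] = A[shift:] * B[:-shift] + B[shift:]
        A, B = A_new, B_new
        shift *= 2
    return B  # with h_{-1} = 0, the accumulated offset equals h_t
```

Because each round is a whole-array operation, the loop runs about log2(T) times instead of T. This is the source of the speed advantage over RNNs that the abstract reports.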

翻訳日:2023-11-28 04:28:44 公開日:2023-11-23

# ニューラルネットワークにおける重要度推定器の信頼性評価のための特徴摂動増強

Feature Perturbation Augmentation for Reliable Evaluation of Importance Estimators in Neural Networks ( http://arxiv.org/abs/2303.01538v2 )

ライセンス: Link先を確認

Lennart Brocki and Neo Christopher Chung

(参考訳) 事後説明手法は、ディープニューラルネットワークの内部動作をより解釈しやすくしようとするものである。しかし、一般に正解が存在しないため、入力特徴に重要度スコアを割り当てる局所的な事後解釈手法を評価することは難しい。最も一般的な評価フレームワークの一つは、解釈手法によって重要とみなされた特徴に摂動を加え、予測精度の変化を測定することである。直感的には、予測精度が大きく低下すれば、説明が予測結果(例えばロジット)に対する特徴の重要度を正しく定量化したことを示す。しかし、テストデータセット中の摂動サンプルは訓練データセットと比べて分布外(OOD)であり、予期しない形でモデルを乱す可能性がある。そのため、予測結果の変化は摂動によるアーティファクトに起因するかもしれない。この課題を克服するため、モデルの訓練中に摂動画像を生成して追加する特徴摂動拡張(FPA)を提案する。広範な計算実験を通じて、FPAによってディープニューラルネットワーク(DNN)が摂動に対してより頑健になることを示す。さらに、FPAを用いてDNNを訓練すると、重要度スコアの符号が、これまで想定されていたよりも有意義にモデルを説明する可能性があることが示される。全体として、FPAは事後解釈手法の評価を改善する直感的なデータ拡張手法である。

Post-hoc explanation methods attempt to make the inner workings of deep neural networks more interpretable. However, since ground truth is generally lacking, local post-hoc interpretability methods, which assign importance scores to input features, are challenging to evaluate. One of the most popular evaluation frameworks is to perturb features deemed important by an interpretability method and to measure the change in prediction accuracy. Intuitively, a large decrease in prediction accuracy would indicate that the explanation has correctly quantified the importance of features with respect to the prediction outcome (e.g., logits). However, the change in the prediction outcome may stem from perturbation artifacts. Perturbed samples in the test dataset are out of distribution (OOD) compared to the training dataset and can therefore disturb the model in unexpected ways. To overcome this challenge, we propose feature perturbation augmentation (FPA), which creates and adds perturbed images during model training. Through extensive computational experiments, we demonstrate that FPA makes deep neural networks (DNNs) more robust against perturbations. Furthermore, training DNNs with FPA demonstrates that the sign of importance scores may explain the model more meaningfully than has previously been assumed. Overall, FPA is an intuitive data augmentation technique that improves the evaluation of post-hoc interpretability methods.
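The idea of training on images that look like the perturbed evaluation inputs can be sketched as a simple augmentation. In the sketch below, a random fraction of pixel positions is overwritten across all channels, so the perturbed test images used to score an explanation are no longer out of distribution for the model. The fill value, the per-image fraction range, and the function names are hypothetical choices. The abstract does not specify FPA's exact perturbation scheme.

```python
import numpy as np


def feature_perturbation_augment(image, fraction, rng, fill_value=0.0):
    """Hypothetical FPA-style augmentation: overwrite a random fraction of pixels.

    image: (H, W, C) array. A random subset of spatial positions (all channels)
    is replaced by `fill_value`; the input array is left unchanged.
    """
    h, w = image.shape[:2]
    n_perturb = int(round(fraction * h * w))
    flat_idx = rng.choice(h * w, size=n_perturb, replace=False)
    rows, cols = np.unravel_index(flat_idx, (h, w))
    out = image.copy()
    out[rows, cols] = fill_value
    return out


def augment_batch(batch, rng, max_fraction=0.5):
    """Draw a fresh perturbation fraction per image, as one might during training."""
    return np.stack([
        feature_perturbation_augment(img, rng.uniform(0.0, max_fraction), rng) for img in batch
    ])
```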

翻訳日:2023-11-28 04:28:22 公開日:2023-11-23

# 嗜好ベース強化学習におけるクエリ・ポリシーのミスアライメント

Query-Policy Misalignment in Preference-Based Reinforcement Learning ( http://arxiv.org/abs/2305.17400v2 )

ライセンス: Link先を確認

Xiao Hu, Jianxiong Li, Xianyuan Zhan, Qing-Shan Jia, Ya-Qin Zhang

(参考訳) 嗜好ベース強化学習(PbRL)は、RLエージェントの振る舞いを人間が望む結果と整合させる自然な方法を提供するが、人間によるフィードバックのコストの高さに制約されることが多い。フィードバック効率を向上させるため、既存のPbRL手法の多くは、報酬モデル全体の品質を最大限に向上させるクエリの選択に注力している。しかし直感に反して、これが必ずしも性能向上につながらないことを我々は見出した。この謎を解くために、既存のPbRL研究のクエリ選択スキームにおいて長年見過ごされてきた問題、すなわちクエリ・ポリシーのミスアライメント(Query-Policy Misalignment)を特定する。報酬モデル全体の品質を改善するために選ばれた一見有益なクエリは、実際にはRLエージェントの関心と一致せず、方策学習にほとんど役立たない。その結果、フィードバック効率が低下することを示す。この問題は、ニアオンポリシーなクエリと特別に設計されたハイブリッド経験リプレイによって効果的に解決でき、両者が合わさって双方向のクエリ・ポリシーのアライメントを実現することを示す。本手法はシンプルかつエレガントであり、数行のコードを変更するだけで既存のアプローチに容易に組み込むことができる。包括的な実験により、本手法が人間のフィードバック効率とRLのサンプル効率の両面で大幅な向上を達成することを示し、PbRLタスクにおいてクエリ・ポリシーのミスアライメントに対処することの重要性を実証する。

Preference-based reinforcement learning (PbRL) provides a natural way to align RL agents' behavior with human desired outcomes, but is often restrained by costly human feedback. To improve feedback efficiency, most existing PbRL methods focus on selecting queries to maximally improve the overall quality of the reward model, but counter-intuitively, we find that this may not necessarily lead to improved performance. To unravel this mystery, we identify a long-neglected issue in the query selection schemes of existing PbRL studies: Query-Policy Misalignment. We show that the seemingly informative queries selected to improve the overall quality of reward model actually may not align with RL agents' interests, thus offering little help on policy learning and eventually resulting in poor feedback efficiency. We show that this issue can be effectively addressed via near on-policy query and a specially designed hybrid experience replay, which together enforce the bidirectional query-policy alignment. Simple yet elegant, our method can be easily incorporated into existing approaches by changing only a few lines of code. We showcase in comprehensive experiments that our method achieves substantial gains in both human feedback and RL sample efficiency, demonstrating the importance of addressing query-policy misalignment in PbRL tasks.
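
One possible reading of "near on-policy query" and "hybrid experience replay" is sketched below. Queries are drawn only from the most recent trajectories, and replay batches mix recent transitions with uniformly sampled older ones. The helper names and the `window`, `recent_size`, and `recent_ratio` parameters are hypothetical. This is an illustration under those assumptions, not the paper's exact procedure.

```python
import random

def near_on_policy_queries(trajectories, num_queries, window, seg_len, rng):
    """Sample segment pairs only from the `window` most recent trajectories, so the
    queried behavior is close to what the current policy actually visits.
    Assumes every trajectory has at least `seg_len` steps."""
    recent = trajectories[-window:]
    queries = []
    for _ in range(num_queries):
        pair = []
        for _ in range(2):
            traj = rng.choice(recent)
            start = rng.randrange(len(traj) - seg_len + 1)
            pair.append(traj[start:start + seg_len])
        queries.append((pair[0], pair[1]))
    return queries

def hybrid_replay_sample(buffer, recent_size, batch_size, recent_ratio, rng):
    """Mix transitions drawn from the newest `recent_size` buffer entries with
    transitions drawn uniformly from the whole buffer."""
    n_recent = int(batch_size * recent_ratio)
    recent = buffer[-recent_size:]
    batch = [rng.choice(recent) for _ in range(n_recent)]
    batch += [rng.choice(buffer) for _ in range(batch_size - n_recent)]
    return batch
```

The segment pairs would then be shown to a human (or scripted) labeler, and the mixed batches would feed the RL update. The design intent is that both the reward-learning and policy-learning signals concentrate on recently visited behavior.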

翻訳日:2023-11-28 04:19:39 公開日:2023-11-23

# 1次元の相互作用する非エルミートスターク系におけるエルゴード性から多体局在へ

From Ergodicity to Many-Body Localization in a One-Dimensional Interacting Non-Hermitian Stark System ( http://arxiv.org/abs/2305.13636v3 )

ライセンス: Link先を確認

Jinghu Liu and Zhihao Xu

(参考訳) 非エルミート量子系における無秩序誘起多体局在(MBL)の研究が大きな注目を集めている。しかし、無秩序のない非エルミートMBLについては、まだ解明が必要である。本研究では、時間反転対称性をもつ非相反ホッピングを備えた1次元の相互作用するスタークモデルを考える。その性質は境界条件に依存する。周期境界条件(PBC)の下では、このモデルは3種類の相転移を示す。すなわち、固有エネルギーの実-複素転移、トポロジカル相転移、および非エルミートスタークMBL転移である。実-複素転移とトポロジカル相転移は熱力学的極限で同じ点で起こるが、非エルミートスタークMBL転移とは一致しない。これは非エルミート無秩序系の場合とは大きく異なる。準位統計によれば、線形傾斜ポテンシャルの強さを増すにつれて、系はジニブルアンサンブル(GE)からガウス直交アンサンブル(GOE)へ、さらにポアソンアンサンブルへと遷移する。固有値の実-複素転移は、エルゴード領域におけるGEからGOEへの遷移を伴う。さらに、準位統計の2番目の遷移は非エルミートスタークMBLの発生に対応する。非エルミートスタークMBLは頑健であり、無秩序誘起MBLと多くの類似点をもつことを示す。これはスペクトル統計および固有状態の性質に関する既存の特徴量のいくつかによって確認できる。エンタングルメントエントロピーと密度不均衡の動的発展により、実-複素転移とスタークMBL転移を区別できる。最後に、開境界条件下の系では実-複素転移が存在せず、非エルミートスタークMBL転移はPBCの場合と同じであることを見出した。

Recent studies on disorder-induced many-body localization (MBL) in non-Hermitian quantum systems have attracted great interest. However, the non-Hermitian disorder-free MBL still needs to be clarified. We consider a one-dimensional interacting Stark model with nonreciprocal hoppings having time-reversal symmetry, the properties of which are boundary dependent. Under periodic boundary conditions (PBCs), such a model exhibits three types of phase transitions: the real-complex transition of eigenenergies, the topological phase transition, and the non-Hermitian Stark MBL transition. The real-complex and topological phase transitions occur at the same point in the thermodynamic limit but do not coincide with the non-Hermitian Stark MBL transition, which is quite different from the non-Hermitian disordered cases. By the level statistics, the system transitions from the Ginibre ensemble (GE) to the Gaussian orthogonal ensemble (GOE) to the Poisson ensemble with the increase of the linear tilt potential's strength. The real-complex transition of the eigenvalues is accompanied by the GE-to-GOE transition in the ergodic regime. Moreover, the second transition of the level statistics corresponds to the occurrence of non-Hermitian Stark MBL. We demonstrate that the non-Hermitian Stark MBL is robust and shares many similarities with disorder-induced MBL, as confirmed by several existing characteristic quantities of the spectral statistics and eigenstate properties. The dynamical evolutions of the entanglement entropy and the density imbalance can distinguish the real-complex and Stark MBL transitions. Finally, we find that our system under open boundary conditions lacks a real-complex transition, and the transition of non-Hermitian Stark MBL is the same as that under PBCs.
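
Level-statistics crossovers such as the GOE-to-Poisson transition mentioned above are commonly diagnosed with the mean ratio of consecutive level spacings, which needs no unfolding. The sketch below handles real spectra only; the complex-spectrum GE regime calls for a complex spacing ratio instead. It uses the standard reference values 2 ln 2 - 1 ≈ 0.386 for Poisson and about 0.5307 for GOE. It is a generic diagnostic, not the authors' code. In practice, levels from a single symmetry sector would be used.

```python
import math
import random

POISSON_R = 2 * math.log(2) - 1  # about 0.386: uncorrelated levels (localized side)
GOE_R = 0.5307                   # large-matrix GOE value (ergodic side, real spectrum)

def mean_spacing_ratio(levels):
    """Mean of r_n = min(s_n, s_{n+1}) / max(s_n, s_{n+1}) for a real spectrum,
    where s_n are consecutive spacings of the sorted levels."""
    e = sorted(levels)
    s = [b - a for a, b in zip(e, e[1:])]
    ratios = [min(a, b) / max(a, b) for a, b in zip(s, s[1:]) if max(a, b) > 0]
    return sum(ratios) / len(ratios)

def poisson_levels(n, seed=0):
    """Reference spectrum with independent exponential spacings."""
    rng = random.Random(seed)
    levels = [0.0]
    for _ in range(n - 1):
        levels.append(levels[-1] + rng.expovariate(1.0))
    return levels

# The two printed values should be close to each other.
print(mean_spacing_ratio(poisson_levels(20000)), POISSON_R)
```

A value drifting from near `GOE_R` toward `POISSON_R` as the tilt strength increases is the usual signature of the ergodic-to-localized crossover.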

翻訳日:2023-11-28 04:18:58 公開日:2023-11-23

# Clembench: チャット最適化言語モデルを会話エージェントとして評価するためにゲームプレイを使用する

Clembench: Using Game Play to Evaluate Chat-Optimized Language Models as Conversational Agents ( http://arxiv.org/abs/2305.13455v3 )

ライセンス: Link先を確認

Kranti Chalamalasetti and Jana Götze and Sherzod Hakimov and Brielen Madureira and Philipp Sadler and David Schlangen

(参考訳) 近年、豊かな言語的・非言語的文脈の中で行動するエージェントである「状況に根ざした言語理解エージェント(Situated Language Understanding Agents)」を、注意深く構築された対話的環境でテストすることにより体系的に評価する手法が提案されている。また別の最近の研究は、適切に設定すれば、大規模言語モデル(LLM)はそのようなエージェント(のシミュレータ)として理解できると主張している。ここから一つの結びつきが示唆され、本論文はそれを探究する。特定の能力に挑戦するよう構築された制約付きのゲーム的な設定にLLMをさらすことで、LLMを有意義に評価できるだろうか? 概念実証として、本論文では5つのインタラクション設定を調査し、現在のチャット最適化LLMがある程度ゲームプレイの指示に従えることを示す。この能力と、各ゲームの目的がどの程度満たされたかで測定されるゲームプレイの品質はともに開発サイクルに沿っており、新しいモデルほど高い性能を示す。比較的単純な例のゲームでさえ指標は飽和にはほど遠く、提案する評価手段が今後も診断的価値を持ち続けることを示唆している。LLMを用いたゲームの実装と評価のための汎用フレームワークは、https://github.com/clembench で公開されている。

Recent work has proposed a methodology for the systematic evaluation of "Situated Language Understanding Agents" (agents that operate in rich linguistic and non-linguistic contexts) through testing them in carefully constructed interactive settings. Other recent work has argued that Large Language Models (LLMs), if suitably set up, can be understood as (simulators of) such agents. A connection suggests itself, which this paper explores: Can LLMs be evaluated meaningfully by exposing them to constrained game-like settings that are built to challenge specific capabilities? As a proof of concept, this paper investigates five interaction settings, showing that current chat-optimised LLMs are, to an extent, capable of following game-play instructions. Both this capability and the quality of the game play, measured by how well the objectives of the different games are met, follow the development cycle, with newer models performing better. The metrics even for the comparatively simple example games are far from being saturated, suggesting that the proposed instrument will retain its diagnostic value. Our general framework for implementing and evaluating games with LLMs is available at https://github.com/clembench .

翻訳日:2023-11-28 04:18:31 公開日:2023-11-23

# 大規模言語モデルによる複合的な視覚的手がかりを用いたゼロショット視覚関係検出

Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models ( http://arxiv.org/abs/2305.12476v3 )

ライセンス: Link先を確認

Lin Li, Jun Xiao, Guikun Chen, Jian Shao, Yueting Zhuang, Long Chen

(参考訳) CLIPのような事前学習済み視覚言語モデルは強力な汎化能力を示しており、ゼロショット視覚認識の分野で有望なツールとなっている。視覚関係検出(VRD)は、画像内の物体ペア間の関係(または相互作用)の種類を特定する代表的なタスクである。しかし、一般的なクラスベースのプロンプトでCLIPを素朴にゼロショットVRDに用いると、いくつかの弱点がある。例えば、異なる細粒度の関係タイプを区別しにくいことや、2つの物体の本質的な空間情報を無視することである。そこで本研究では、複合記述プロンプト(COmposite DEscription prompts)によって関係検出(RElation detection)を解く、ゼロショットVRDのための新手法RECODEを提案する。具体的には、RECODEはまず各述語カテゴリを主語、目的語、空間の各構成要素に分解する。次に、大規模言語モデル(LLM)を活用して、各構成要素に対する記述ベースのプロンプト(視覚的手がかり)を生成する。異なる視覚的手がかりは、異なる観点から類似した関係カテゴリの識別性を高め、VRDの性能を大幅に向上させる。さらに、異なる手がかりを動的に融合するため、LLMに各視覚的手がかりに対する妥当な重みを生成させるChain-of-Thought手法を導入する。4つのVRDベンチマークでの大規模な実験により、RECODEの有効性と解釈可能性が示された。

Pretrained vision-language models, such as CLIP, have demonstrated strong generalization capabilities, making them promising tools in the realm of zero-shot visual recognition. Visual relation detection (VRD) is a typical task that identifies relationship (or interaction) types between object pairs within an image. However, naively utilizing CLIP with prevalent class-based prompts for zero-shot VRD has several weaknesses, e.g., it struggles to distinguish between different fine-grained relation types and it neglects essential spatial information of two objects. To this end, we propose a novel method for zero-shot VRD: RECODE, which solves RElation detection via COmposite DEscription prompts. Specifically, RECODE first decomposes each predicate category into subject, object, and spatial components. Then, it leverages large language models (LLMs) to generate description-based prompts (or visual cues) for each component. Different visual cues enhance the discriminability of similar relation categories from different perspectives, which significantly boosts performance in VRD. To dynamically fuse different cues, we further introduce a chain-of-thought method that prompts LLMs to generate reasonable weights for different visual cues. Extensive experiments on four VRD benchmarks have demonstrated the effectiveness and interpretability of RECODE.
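
The cue-fusion step can be illustrated with toy vectors standing in for CLIP embeddings. Each relation has cue embeddings per component (subject, object, spatial). Similarities are averaged per component, and the components are combined with per-relation weights of the kind an LLM might propose. The vectors, weights, and function names are hypothetical, and real RECODE scores use CLIP image and text features of the relevant regions. The sketch only shows the arithmetic of weighted cue fusion.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def relation_score(region_feats, cue_feats, weights):
    """Weighted sum over components of the mean cosine similarity between the
    visual feature for that component and its cue (text) embeddings."""
    total = 0.0
    for component, w in weights.items():
        cues = cue_feats[component]
        sims = [cosine(region_feats[component], c) for c in cues]
        total += w * sum(sims) / len(sims)
    return total

def predict_relation(region_feats, cues_by_relation, weights_by_relation):
    """Return the relation whose fused cue score is highest."""
    return max(
        cues_by_relation,
        key=lambda rel: relation_score(
            region_feats, cues_by_relation[rel], weights_by_relation[rel]
        ),
    )
```

Here `region_feats` maps each component to a visual feature: the subject crop, the object crop, and a spatial or union view. `cues_by_relation` maps each predicate to lists of cue embeddings per component.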

翻訳日:2023-11-28 04:17:28 公開日:2023-11-23

# 高次Annealed Langevin拡散を用いた線形逆問題の解法

Solving Linear Inverse Problems using Higher-Order Annealed Langevin Diffusion ( http://arxiv.org/abs/2305.05014v3 )

ライセンス: Link先を確認

Nicolas Zilberstein, Ashutosh Sabharwal, Santiago Segarra

(参考訳) 高次ランジュバン拡散に基づく線形逆問題の解法を提案する。より正確には、前処理付きの二次および三次ランジュバンダイナミクスを提案する。これらは関心のある未知変数の事後分布から証明可能な形でサンプリングでき、一次の対応手法や両ダイナミクスの前処理なし版よりも計算効率が高い。さらに、前処理付きダイナミクスはいずれも適切に定義され、前処理なしの場合と同じ一意な不変分布を持つことを証明する。また、アニーリング手順も取り入れた。これには、アルゴリズムの収束をさらに加速し、未知変数が離散的な場合にも対応できるという二重の利点がある。通信における2つのタスク(MIMOシンボル検出とチャネル推定)と画像に関する3つのタスクでの数値実験は、本手法の汎用性を示す。あわせて、同等以下の計算複雑性で、競合手法(学習ベースのものを含む)に比べて高い性能を達成することも示している。

We propose a solution for linear inverse problems based on higher-order Langevin diffusion. More precisely, we propose pre-conditioned second-order and third-order Langevin dynamics that provably sample from the posterior distribution of our unknown variables of interest while being computationally more efficient than their first-order counterpart and the non-conditioned versions of both dynamics. Moreover, we prove that both pre-conditioned dynamics are well-defined and have the same unique invariant distributions as the non-conditioned cases. We also incorporate an annealing procedure that has the double benefit of further accelerating the convergence of the algorithm and allowing us to accommodate the case where the unknown variables are discrete. Numerical experiments in two different tasks in communications (MIMO symbol detection and channel estimation) and in three tasks for images showcase the generality of our method and illustrate the high performance achieved relative to competing approaches (including learning-based ones) while having comparable or lower computational complexity.
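
To make "second-order Langevin dynamics" concrete, the sketch below runs plain underdamped Langevin dynamics, discretized with Euler-Maruyama, on a scalar Gaussian linear inverse problem. The prior is x ~ N(0, 1) and the observation is y = a·x + n with n ~ N(0, σ²). The exact posterior is therefore Gaussian with mean (a·y/σ²)/(1 + a²/σ²) and variance 1/(1 + a²/σ²). The sketch omits the paper's preconditioning, third-order variant, and annealing schedule, so treat it as an illustration of the underlying dynamics only.

```python
import math
import random

def posterior_grad(x, a, y, sigma):
    """grad log p(x | y) for prior x ~ N(0, 1) and y = a*x + n, n ~ N(0, sigma^2)."""
    return -x + a * (y - a * x) / sigma ** 2

def underdamped_langevin(grad, steps, dt, gamma, rng, x0=0.0):
    """Euler-Maruyama discretization of second-order (underdamped) Langevin dynamics
    with unit mass and temperature:
        dx = v dt,   dv = (-gamma * v + grad(x)) dt + sqrt(2 * gamma) dW.
    Returns the trajectory of x (one sample per step)."""
    x, v = x0, 0.0
    noise_scale = math.sqrt(2.0 * gamma * dt)
    samples = []
    for _ in range(steps):
        v += dt * (-gamma * v + grad(x)) + noise_scale * rng.gauss(0.0, 1.0)
        x += dt * v  # update position with the freshly updated velocity
        samples.append(x)
    return samples
```

With a = 1, y = 2, σ = 1, the exact posterior has mean 1 and variance 0.5. Long-run averages of the samples, after discarding a burn-in, should approach those values. `gamma` sets the friction, and `dt` must stay small for the discretization to be accurate.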

翻訳日:2023-11-28 04:16:42 公開日:2023-11-23

# 多目的進化強化学習によるロードバランサによる金融クラウドサービスのアイドルネスの低減

Reducing Idleness in Financial Cloud Services via Multi-objective Evolutionary Reinforcement Learning based Load Balancer ( http://arxiv.org/abs/2305.03463v2 )

ライセンス: Link先を確認

Peng Yang, Laoming Zhang, Haifeng Liu, Guiying Li

(参考訳) 近年、さまざまな企業が、自社のデータサービスを従来型のデータセンタからクラウドへ移行し始めている。主な動機の1つは、クラウドの弾力性によって運用コストを節約することである。本稿では、金融サービスにおける新たなニーズについて論じる。それは、ごく少数のユーザ接続しか保持しないアイドル状態のサーバの発生を、サーバ側から接続を切断することなく減らしたいというものである。本稿では、このニーズを2目的のオンライン負荷分散問題として捉える。必要な弾力性を実現するため、ユーザ要求を可変数のサーバにルーティングする、ニューラルネットワークベースのスケーラブルなポリシーを設計する。ポリシーの重みを最適化するために、進化的多目的学習フレームワークを提案する。新たな目的であるアイドルネスが従来の産業ソリューションよりも130%以上削減されるだけでなく、本来の負荷分散目標自体もわずかに改善される。合成データと実世界のデータの両方を用いた広範なシミュレーションにより、金融サービスにおけるアイドルネス削減という新たな問題に対する提案手法の適用可能性が詳細に明らかになる。

In recent years, various companies have started to shift their data services from traditional data centers to the cloud. One of the major motivations is to save on operational costs with the aid of cloud elasticity. This paper discusses an emerging need from financial services to reduce the incidence of idle servers retaining very few user connections, without disconnecting them from the server side. This paper considers this need as a bi-objective online load balancing problem. A neural network based scalable policy is designed to route user requests to varied numbers of servers for the required elasticity. An evolutionary multi-objective training framework is proposed to optimize the weights of the policy. Not only is the new objective of idleness reduced by over 130% more than traditional industrial solutions, but the original load balancing objective itself is also slightly improved. Extensive simulations with both synthetic and real-world data help reveal the detailed applicability of the proposed method to the emergent problem of reducing idleness in financial services.
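
The bi-objective framing can be made concrete with two cost functions over the number of connections per server. A Pareto-dominance test, of the kind an evolutionary multi-objective optimizer uses to rank candidate policies, then compares candidates. The objective definitions here are assumptions for illustration, not the paper's formulas. Load imbalance is the standard deviation of connection counts. Idleness is the count of servers holding between 1 and a hypothetical `idle_threshold` connections.

```python
def objectives(connections, idle_threshold):
    """Both objectives are minimized. Hypothetical definitions:
    imbalance = population standard deviation of per-server connection counts,
    idleness  = number of servers holding between 1 and idle_threshold connections."""
    n = len(connections)
    mean = sum(connections) / n
    imbalance = (sum((c - mean) ** 2 for c in connections) / n) ** 0.5
    idleness = sum(1 for c in connections if 0 < c <= idle_threshold)
    return imbalance, idleness

def dominates(f, g):
    """Pareto dominance for minimization: f is no worse in every objective
    and strictly better in at least one."""
    return all(a <= b for a, b in zip(f, g)) and any(a < b for a, b in zip(f, g))

def pareto_front(points):
    """Keep the objective vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

In an evolutionary loop, each candidate policy's routing decisions would be simulated, scored with `objectives`, and the non-dominated candidates kept as parents for the next generation.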

翻訳日:2023-11-28 04:16:24 公開日:2023-11-23

# Pick-a-Pic: テキスト対画像生成のためのユーザ嗜好のオープンデータセット

Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation ( http://arxiv.org/abs/2305.01569v2 )

ライセンス: Link先を確認

Yuval Kirstain and Adam Polyak and Uriel Singer and Shahbuland Matiana and Joe Penna and Omer Levy

(参考訳) テキスト・ツー・イメージのユーザから人間の嗜好の大規模なデータセットを収集できるのは通常は企業に限られており、そのようなデータセットは一般には公開されていない。この問題に対処するため、テキスト・ツー・イメージのユーザが画像を生成し、嗜好を指定できるWebアプリを開発した。このWebアプリを用いて、大規模なオープンデータセットPick-a-Picを構築する。これは、テキスト・ツー・イメージのプロンプトと、生成画像に対する実ユーザの嗜好からなる。このデータセットを活用して、CLIPベースのスコアリング関数PickScoreを学習させたところ、人間の嗜好を予測するタスクで超人的な性能を示した。次に、PickScoreによるモデル評価の能力を検証し、他の自動評価指標よりも人間によるランキングとの相関が高いことを確認した。そこで、将来のテキスト・ツー・イメージ生成モデルの評価にPickScoreを用いること、またMS-COCOよりも適切なデータセットとしてPick-a-Picのプロンプトを用いることを推奨する。最後に、PickScoreがランキングを通じて既存のテキスト・ツー・イメージモデルをどのように強化できるかを示す。

The ability to collect a large dataset of human preferences from text-to-image users is usually limited to companies, making such datasets inaccessible to the public. To address this issue, we create a web app that enables text-to-image users to generate images and specify their preferences. Using this web app we build Pick-a-Pic, a large, open dataset of text-to-image prompts and real users' preferences over generated images. We leverage this dataset to train a CLIP-based scoring function, PickScore, which exhibits superhuman performance on the task of predicting human preferences. Then, we test PickScore's ability to perform model evaluation and observe that it correlates better with human rankings than other automatic evaluation metrics. Therefore, we recommend using PickScore for evaluating future text-to-image generation models, and using Pick-a-Pic prompts as a more relevant dataset than MS-COCO. Finally, we demonstrate how PickScore can enhance existing text-to-image models via ranking.
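
PickScore-style preference modelling can be sketched as follows. The sketch assumes each (prompt, image) pair already has a scalar score. For a CLIP-based scorer, this is typically a scaled dot product of normalized text and image embeddings. A softmax over the two scores gives preference probabilities, and cross-entropy against the human label trains the scorer. Sorting by score gives the best-of-n ranking that can be used to improve generation. The function names are hypothetical, and the encoders are omitted.

```python
import math

def preference_probs(score_a, score_b):
    """Softmax over the two images' scores: (P(a preferred), P(b preferred))."""
    m = max(score_a, score_b)  # subtract the max for numerical stability
    ea, eb = math.exp(score_a - m), math.exp(score_b - m)
    return ea / (ea + eb), eb / (ea + eb)

def preference_loss(score_a, score_b, label):
    """Cross-entropy against a target distribution: (1, 0) if a was preferred,
    (0, 1) if b was preferred, (0.5, 0.5) for a tie."""
    pa, pb = preference_probs(score_a, score_b)
    return -(label[0] * math.log(pa) + label[1] * math.log(pb))

def rank_images(scores):
    """Indices of candidate images from highest to lowest score (best-of-n)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

For example, `rank_images([0.2, 0.9, 0.5])` orders the candidates as `[1, 2, 0]`. A generator could then return image 1 from a batch of three samples.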

翻訳日:2023-11-28 04:15:49 公開日:2023-11-23

# LeCo: シリアル相関学習による軽量圧縮

LeCo: Lightweight Compression via Learning Serial Correlations ( http://arxiv.org/abs/2306.15374v3 )

ライセンス: Link先を確認

Yihao Liu, Xinyu Zeng, Huanchen Zhang

(参考訳) 軽量データ圧縮は、カラムストアが分析クエリで優れた性能を発揮するための鍵となる技術である。シャノンのエントロピーに近づくための辞書ベースのエンコーディングについては包括的な研究がある。しかし、列内の系列相関を体系的に圧縮に利用した先行研究はほとんどない。本稿では、フレームワークLeCo(すなわちLearned Compression)を提案する。LeCoは機械学習を用いて値の系列に含まれる系列的な冗長性を自動的に除去し、優れた圧縮率と伸張性能を同時に達成する。LeCoはこの目的に対する一般的なアプローチを提供し、既存の(アドホックな)アルゴリズムを本フレームワークの特殊ケースとして位置づける。例えば、Frame-of-Reference(FOR)、デルタ符号化(Delta Encoding)、ランレングス符号化(RLE)がこれに当たる。3つの合成データセットと6つの実世界データセットを用いたマイクロベンチマークにより、LeCoのプロトタイプが既存の手法に対して圧縮率とランダムアクセス速度の両方でパレート改善を達成することが示された。LeCoを広く使われているアプリケーションに統合したところ、Arrowの列指向実行エンジンにおけるデータ分析クエリで最大5.2倍の高速化、RocksDBのスループットで16%の向上が見られた。

Lightweight data compression is a key technique that allows column stores to exhibit superior performance for analytical queries. Despite comprehensive studies on dictionary-based encodings to approach Shannon's entropy, few prior works have systematically exploited the serial correlation in a column for compression. In this paper, we propose LeCo (i.e., Learned Compression), a framework that uses machine learning to remove the serial redundancy in a value sequence automatically to achieve an outstanding compression ratio and decompression performance simultaneously. LeCo presents a general approach to this end, making existing (ad-hoc) algorithms such as Frame-of-Reference (FOR), Delta Encoding, and Run-Length Encoding (RLE) special cases under our framework. Our microbenchmark with three synthetic and six real-world data sets shows that a prototype of LeCo achieves a Pareto improvement on both compression ratio and random access speed over the existing solutions. When integrating LeCo into widely-used applications, we observe up to 5.2x speedup in a data analytical query in the Arrow columnar execution engine and a 16% increase in RocksDB's throughput.
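
The core LeCo idea can be sketched for a single block: predict each value from a learned model and store only small residuals. Here the model is a least-squares line. With `model="constant"` and the base set to the block minimum, the same code reduces to Frame-of-Reference, which illustrates how such schemes become special cases. Partitioning, other regression models, and actual bit-packing are omitted, and the function names are hypothetical.

```python
def fit_model(values, model="linear"):
    """Return (slope, intercept). 'constant' mimics Frame-of-Reference (base = min)."""
    n = len(values)
    if model == "constant" or n == 1:
        return 0.0, float(min(values))
    mean_i = (n - 1) / 2
    mean_v = sum(values) / n
    cov = sum((i - mean_i) * (v - mean_v) for i, v in enumerate(values))
    var = sum((i - mean_i) ** 2 for i in range(n))
    slope = cov / var
    return slope, mean_v - slope * mean_i

def predict(slope, intercept, i):
    return round(intercept + slope * i)

def encode(values, model="linear"):
    """Store model parameters plus non-negative residuals of a fixed bit width."""
    slope, intercept = fit_model(values, model)
    residuals = [v - predict(slope, intercept, i) for i, v in enumerate(values)]
    offset = min(residuals)
    shifted = [r - offset for r in residuals]
    width = max(shifted).bit_length()  # bits per residual if the block were bit-packed
    return {"slope": slope, "intercept": intercept, "offset": offset,
            "width": width, "residuals": shifted}

def decode_at(block, i):
    """Random access: recompute the prediction and add back the stored residual."""
    return (predict(block["slope"], block["intercept"], i)
            + block["offset"] + block["residuals"][i])
```

Consider the sequence `[1000 + 7*i + (i % 3) for i in range(30)]`. Frame-of-Reference needs 8 bits per value, while the linear model needs at most 2. In both cases, `decode_at` recovers any element without scanning its predecessors.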

翻訳日:2023-11-28 04:08:09 公開日:2023-11-23

# ブロックチェーンによるフェデレーション学習 - リファレンスアーキテクチャ設計、実装、検証

Blockchain-Enabled Federated Learning: A Reference Architecture Design, Implementation, and Verification ( http://arxiv.org/abs/2306.10841v3 )

ライセンス: Link先を確認

Eunsu Goh, Dae-Yeol Kim, Kwangkee Lee, Suyeong Oh, Jong-Eui Chae, Do-Yup Kim

(参考訳) 本稿では、ブロックチェーン対応連合学習(BCFL)のための新たなリファレンスアーキテクチャを提案する。BCFLは、連合学習とブロックチェーン技術の強みを融合させた最先端のアプローチである。スマートコントラクトの機能、ステークホルダーとその役割、および惑星間ファイルシステム(IPFS)の利用をBCFLの主要コンポーネントとして定義し、包括的な分析を行う。従来の集中型連合学習では、各ラウンドにおけるローカルノードの選択と学習結果の収集は、中央サーバの管理下でまとめて行われる。対照的にBCFLでは、これらのプロセスはすべてスマートコントラクトを通じて監視・管理される。さらに、クロスデバイスとクロスサイロの両方の連合学習シナリオをサポートする拡張アーキテクチャを提案する。加えて、実際のEthereum開発環境においてアーキテクチャを実装し、検証する。本BCFLリファレンスアーキテクチャは高い柔軟性と拡張性を備え、特定の要件やユースケースに応じてさまざまな追加要素を統合できる。そのため、幅広いBCFLアプリケーションに適応可能なソリューションとなる。拡張性の代表例として、BCFLにおける実用的な利用を導入するための認証手法として分散型識別子(DID)を採用した。本研究は、研究と実運用の間の重要なギャップを埋めるだけでなく、BCFL分野における今後の探究の確かな基盤を築くものである。本研究の重要な貢献は、現実的なBCFLリファレンスアーキテクチャの実装と検証に成功したことである。近いうちにソースコードを公開し、コミュニティ内でのさらなる発展と応用を促進する予定である。

This paper presents a novel reference architecture for blockchain-enabled federated learning (BCFL), a state-of-the-art approach that amalgamates the strengths of federated learning and blockchain technology. We define smart contract functions, stakeholders and their roles, and the use of the InterPlanetary File System (IPFS) as key components of BCFL and conduct a comprehensive analysis. In traditional centralized federated learning, the selection of local nodes and the collection of learning results for each round are merged under the control of a central server. In contrast, in BCFL, all these processes are monitored and managed via smart contracts. Additionally, we propose an extension architecture to support both cross-device and cross-silo federated learning scenarios. Furthermore, we implement and verify the architecture in a practical real-world Ethereum development environment. Our BCFL reference architecture provides significant flexibility and extensibility, accommodating the integration of various additional elements, as per specific requirements and use cases, thereby rendering it an adaptable solution for a wide range of BCFL applications. As a prominent example of extensibility, decentralized identifiers (DIDs) have been employed as an authentication method to introduce practical utilization within BCFL. This study not only bridges a crucial gap between research and practical deployment but also lays a solid foundation for future explorations in the realm of BCFL. The pivotal contribution of this study is the successful implementation and verification of a realistic BCFL reference architecture. We intend to make the source code publicly accessible shortly, fostering further advancements and adaptations within the community.
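
The division of labor described above can be simulated in plain Python before writing any on-chain code. The contract tracks registration, rounds, and submitted model references, while model files live in IPFS. The class below is an in-memory stand-in, not Solidity and not the paper's contract. The content IDs are placeholder strings, and FedAvg (a sample-weighted average) is assumed as the off-chain aggregation rule.

```python
class FLRoundContract:
    """In-memory stand-in for round bookkeeping a smart contract could perform:
    only registered clients may submit, and a round closes once enough
    update references (e.g. IPFS content IDs) have arrived."""

    def __init__(self, min_updates):
        self.min_updates = min_updates
        self.registered = set()
        self.round = 0
        self.submissions = {}  # client -> (content_id, num_samples)

    def register(self, client):
        self.registered.add(client)

    def submit(self, client, content_id, num_samples):
        if client not in self.registered:
            raise PermissionError(f"{client} is not registered")
        self.submissions[client] = (content_id, num_samples)

    def close_round(self):
        if len(self.submissions) < self.min_updates:
            raise RuntimeError("not enough updates to close the round")
        closed, self.submissions = self.submissions, {}
        self.round += 1
        return closed

def fed_avg(updates):
    """Sample-weighted average of parameter vectors given as [(params, num_samples), ...]."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(p[j] * n for p, n in updates) / total for j in range(dim)]
```

An aggregator would read the closed round's content IDs, fetch the corresponding parameter files from IPFS, combine them with `fed_avg`, and publish the new global model's ID for the next round.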

翻訳日:2023-11-28 04:06:47 公開日:2023-11-23

# MARBLE:ユニバーサル評価のための音楽オーディオ表現ベンチマーク

MARBLE: Music Audio Representation Benchmark for Universal Evaluation ( http://arxiv.org/abs/2306.10548v4 )

ライセンス: Link先を確認

Ruibin Yuan, Yinghao Ma, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Le Zhuo, Yiqi Liu, Jiawen Huang, Zeyue Tian, Binyue Deng, Ningzhi Wang, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Roger Dannenberg, Wenhu Chen, Gus Xia, Wei Xue, Si Liu, Shi Wang, Ruibo Liu, Yike Guo, Jie Fu

(参考訳) 画像生成やフィクションの共創など、芸術と人工知能(AI)が広く交わる時代にあって、音楽のためのAIは、特に音楽理解において比較的初期段階にある。このことは、深層音楽表現に関する研究の少なさ、大規模データセットの不足、普遍的でコミュニティ主導のベンチマークの欠如に表れている。この問題に対処するため、普遍的評価のための音楽オーディオ表現ベンチマーク(Music Audio Representation Benchmark for universaL Evaluation)、略してMARBLEを導入する。MARBLEは、音響、演奏、楽譜、高レベル記述の4つの階層レベルからなる包括的な分類体系を定義する。これにより、さまざまな音楽情報検索(MIR)タスクのベンチマークを提供することを目指す。次に、8つの公開データセット上の14タスクに基づく統一プロトコルを確立する。このプロトコルにより、音楽録音で開発されたすべてのオープンソース事前学習モデルの表現をベースラインとして公平かつ標準的に評価する。さらにMARBLEは、データセットの著作権問題に関する明確な声明とともに、使いやすく、拡張可能で、再現可能なスイートをコミュニティに提供する。結果から、近年提案された大規模事前学習音楽言語モデルがほとんどのタスクで最良の性能を示す一方、さらなる改善の余地があることが示唆される。リーダーボードとツールキットのリポジトリは、今後の音楽AI研究を促進するため https://marble-bm.shef.ac.uk で公開されている。

In the era of extensive intersection between art and Artificial Intelligence (AI), such as image generation and fiction co-creation, AI for music remains relatively nascent, particularly in music understanding. This is evident in the limited work on deep music representations, the scarcity of large-scale datasets, and the absence of a universal and community-driven benchmark. To address this issue, we introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE. It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description. We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standard assessment of representations of all open-sourced pre-trained models developed on music recordings as baselines. Besides, MARBLE offers an easy-to-use, extendable, and reproducible suite for the community, with a clear statement on copyright issues on datasets. Results suggest that recently proposed large-scale pre-trained musical language models perform the best in most tasks, with room for further improvement. The leaderboard and toolkit repository are published at https://marble-bm.shef.ac.uk to promote future music AI research.

翻訳日:2023-11-28 04:06:22 公開日:2023-11-23

# 「平均化」による不均一時系列予測の改善と食料需要予測への応用

Improving Forecasts for Heterogeneous Time Series by "Averaging", with Application to Food Demand Forecast ( http://arxiv.org/abs/2306.07119v2 )

ライセンス: Link先を確認

Lukas Neubauer, Peter Filzmoser

(参考訳) 実世界の応用における一般的な予測設定では、同一ドメインに属する、不均一である可能性のある時系列の集合を扱う。長さなど各時系列の性質が異なるため、個々の時系列に対して素直な方法で予測を得ることは難しい。本稿では、動的時間伸縮法(Dynamic Time Warping)における類似度尺度を用いる一般的な枠組みを提案する。この枠組みでは、類似した時系列を探してk近傍法の要領で近傍を構築し、平均化によって(単純なものでもよい)モデルの予測を改善する。平均化を行ういくつかの方法を提案し、理論的な議論により平均化が予測に有用であることを裏付ける。さらに、手順を深く理解するための診断ツールを提案する。

A common forecasting setting in real-world applications considers a set of possibly heterogeneous time series of the same domain. Due to different properties of each time series such as length, obtaining forecasts for each individual time series in a straightforward way is challenging. This paper proposes a general framework utilizing a similarity measure in Dynamic Time Warping to find similar time series to build neighborhoods in a k-Nearest Neighbor fashion, and improve forecasts of possibly simple models by averaging. Several ways of performing the averaging are suggested, and theoretical arguments underline the usefulness of averaging for forecasting. Additionally, diagnostic tools are proposed allowing a deep understanding of the procedure.
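
The neighborhood-averaging idea can be written compactly in three steps. First, compute DTW distances (with absolute-difference cost) from the target series to a pool of candidate series. Next, take the k nearest. Finally, average the forecasts that a simple model produces for the target and its neighbors. The naive last-value model and equal weights are assumptions chosen for brevity. The paper proposes several averaging schemes, and in practice series of different scales would usually be normalized first.

```python
def dtw(a, b):
    """Dynamic Time Warping distance with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def naive_forecast(series, horizon):
    """Hypothetical simple model: repeat the last observed value."""
    return [series[-1]] * horizon

def knn_averaged_forecast(target, pool, k, horizon, model=naive_forecast):
    """Equal-weight average of the forecasts for the target and its k DTW-nearest series."""
    neighbors = sorted(pool, key=lambda s: dtw(target, s))[:k]
    forecasts = [model(s, horizon) for s in [target] + neighbors]
    return [sum(f[h] for f in forecasts) / len(forecasts) for h in range(horizon)]
```

Because DTW aligns series of different lengths, a short target such as `[1, 2, 3]` can still be matched to a longer neighbor like `[1, 2, 3, 3]` at zero distance.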

翻訳日:2023-11-28 04:05:31 公開日:2023-11-23

# 駆動と熱浴の相互作用の解読:超伝導を増強するための指針

Decoding the drive-bath interplay: A guideline to enhance superconductivity ( http://arxiv.org/abs/2306.02861v2 )

ライセンス: Link先を確認

Rui Lin, Aline Ramires, R. Chitra

(参考訳) 駆動散逸系の物理は量子光学の中核にある。しかし、駆動された量子多体系とその環境との完全な相互作用は、固体物理の分野では比較的未解明である。本研究では、駆動された超伝導体という具体例に基づき、一般に用いられるストロボスコピックなハミルトニアン描像を超えてこの相互作用を調べる。Shirley-Floquet形式とKeldysh形式、および超伝導フィットネスの概念を駆動系へ一般化したものを用いる。これにより、超伝導ギャップ演算子と反交換する駆動が、熱浴の観点から見たスペクトル関数に一般に特異な粒子-正孔構造を誘起することを示す。このスペクトル構造は、基礎となる相互作用の固有カットオフ周波数とほぼ共鳴する駆動周波数と組み合わせることで、超伝導転移温度を高めるために利用できる。本研究は、固体系における物質のエキゾチックな相の駆動散逸工学に関するさらなる研究への道を開く。

Driven-dissipative physics lies at the core of quantum optics. However, the full interplay between a driven quantum many-body system and its environment remains relatively unexplored in the solid state realm. In this work, we inspect this interplay beyond the commonly employed stroboscopic Hamiltonian picture based on the specific example of a driven superconductor. Using the Shirley-Floquet and Keldysh formalisms as well as a generalization of the notion of superconducting fitness to the driven case, we show how a drive which anti-commutes with the superconducting gap operator generically induces an unusual particle-hole structure in the spectral functions from the perspective of the thermal bath. Concomitant with a driving frequency which is near resonant with the intrinsic cutoff frequency of the underlying interaction, this spectral structure can be harnessed to enhance the superconducting transition temperature. Our work paves the way for further studies of driven-dissipative engineering of exotic phases of matter in solid-state systems.

翻訳日:2023-11-28 04:04:33 公開日:2023-11-23

# 量子力学による固有エネルギー推定--統一ノイズレジリエント測定駆動アプローチ

Estimating Eigenenergies from Quantum Dynamics: A Unified Noise-Resilient Measurement-Driven Approach ( http://arxiv.org/abs/2306.01858v3 )

ライセンス: Link先を確認

Yizhi Shen, Daan Camps, Aaron Szasz, Siva Darbha, Katherine Klymko, David B. Williams--Young, Norm M. Tubman, Roel Van Beeumen

(参考訳) 物理、化学、材料科学における基底状態エネルギーの推定は、量子コンピューティングの最も有望な応用の1つである。本稿では、固有エネルギーを求める新しいハイブリッド手法を提案する。この手法は実時間測定値を収集し、それを動的モード分解(DMD)の手法で後処理する。量子ダイナミクスの観点から、このアプローチは量子多体系から得られる可観測量の関数空間上の安定な変分法として形式的に理解できることを示す。また、大きな摂動ノイズの存在下でも本手法が急速に収束することを示す強力な理論的・数値的証拠を示す。さらに、本手法がさまざまな科学コミュニティで独立に開発されたロバストな行列分解法と同型であることを示す。スピン系および分子系での数値ベンチマークにより、最先端アルゴリズムに比べて収束の加速と有利な資源削減が示された。DMD中心の戦略はノイズを系統的に緩和でき、有力なハイブリッド量子古典固有値ソルバーとして際立っている。

Ground state energy estimation in physical, chemical, and materials sciences is one of the most promising applications of quantum computing. In this work, we introduce a new hybrid approach that finds the eigenenergies by collecting real-time measurements and post-processing them using the machinery of dynamic mode decomposition (DMD). From the perspective of quantum dynamics, we establish that our approach can be formally understood as a stable variational method on the function space of observables available from a quantum many-body system. We also provide strong theoretical and numerical evidence that our method converges rapidly even in the presence of a large degree of perturbative noise, and show that the method bears an isomorphism to robust matrix factorization methods developed independently across various scientific communities. Our numerical benchmarks on spin and molecular systems demonstrate an accelerated convergence and a favorable resource reduction over state-of-the-art algorithms. The DMD-centric strategy can systematically mitigate noise and stands out as a leading hybrid quantum-classical eigensolver.
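
The measurement-driven idea can be illustrated on a noiseless two-level toy signal s(t) = Σ_j |c_j|² e^{-iE_j t}, with ħ = 1. Fit a linear recurrence by least squares to equally spaced samples. Its characteristic roots are z_j = e^{-iE_j Δt}, so E_j = -arg(z_j)/Δt. This is a Prony-style linear-prediction fit, closely related to DMD on a time-delay embedding but not the paper's algorithm. It assumes exactly two dominant modes and |E_j Δt| < π, so that the phases do not alias.

```python
import cmath

def toy_signal(energies, weights, dt, n):
    """s_k = sum_j w_j * exp(-i E_j k dt), i.e. <psi|exp(-iHt)|psi> with w_j = |c_j|^2."""
    return [sum(w * cmath.exp(-1j * e * k * dt) for e, w in zip(energies, weights))
            for k in range(n)]

def two_mode_energies(s, dt):
    """Fit s[k+2] = a1*s[k+1] + a0*s[k] by least squares (normal equations),
    then read energies off the roots of z^2 - a1*z - a0 = 0."""
    g11 = g12 = g22 = h1 = h2 = 0j
    for k in range(len(s) - 2):
        u, v, b = s[k + 1], s[k], s[k + 2]
        g11 += u.conjugate() * u
        g12 += u.conjugate() * v
        g22 += v.conjugate() * v
        h1 += u.conjugate() * b
        h2 += v.conjugate() * b
    g21 = g12.conjugate()
    det = g11 * g22 - g12 * g21
    a1 = (h1 * g22 - g12 * h2) / det  # Cramer's rule on the 2x2 normal equations
    a0 = (g11 * h2 - g21 * h1) / det
    disc = cmath.sqrt(a1 * a1 + 4 * a0)
    roots = [(a1 + disc) / 2, (a1 - disc) / 2]
    # z = exp(-i E dt)  =>  E = -arg(z) / dt; the lowest value estimates the ground state
    return sorted(-cmath.phase(z) / dt for z in roots)
```

With measurement noise, using more samples than the minimum makes the least-squares fit average out some of the noise. The robust, higher-dimensional version of this step is where DMD-style methods come in.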

翻訳日:2023-11-28 04:04:17 公開日:2023-11-23

# XTransCT:画像誘導放射線治療のための、Transformerネットワークによる2方向直交X線投影からの超高速ボリュームCT再構成

XTransCT: Ultra-Fast Volumetric CT Reconstruction using Two Orthogonal X-Ray Projections for Image-guided Radiation Therapy via a Transformer Network ( http://arxiv.org/abs/2305.19621v2 )

ライセンス: Link先を確認

Chulong Zhang, Lin Liu, Jingjing Dai, Xuan Liu, Wenfeng He, Yinping Chan, Yaoqin Xie, Feng Chi, and Xiaokun Liang

(参考訳) CTスキャンは、患者の内臓の詳細な三次元表現を提供する。しかし、従来のCT再構成技術では、体の完全な回転スキャンによって数百から数千のX線投影を取得する必要があり、手術中のナビゲーションや位置決めには使えない。画像誘導放射線治療において、超疎なX線投影からCT画像を再構成する手法があれば、放射線量を大幅に削減できる。同時に、位置決めやナビゲーションのための機器負担を最小限に抑えることもできる。本研究では、2次元X線画像からのCT画像のリアルタイム再構成を可能にする、XTransCTと呼ぶ新しいTransformerアーキテクチャを提案する。画像品質と構造的信頼性の観点から本手法を評価するため、2つのデータセットを用いる。1つは病院から提供された50人の患者のデータセット、もう1つは数千人の患者を含むより大規模な公開データセットLIDC-IDRIである。さらに、LNDbデータセットでアルゴリズムの汎化性を検証した。その結果、本アルゴリズムは画像品質、構造的精度、汎化性において他の手法を上回ることが示された。さらに、従来の3次元畳み込みベースの手法と比較して約300%の高速化を達成し、3次元画像1件あたり44msで再構成できる。

Computed tomography (CT) scans offer a detailed, three-dimensional representation of patients' internal organs. However, conventional CT reconstruction techniques necessitate acquiring hundreds or thousands of x-ray projections through a complete rotational scan of the body, making navigation or positioning during surgery infeasible. In image-guided radiation therapy, a method that reconstructs CT images from ultra-sparse X-ray projections can substantially reduce the radiation dose and minimize the equipment burden for localization and navigation. In this study, we introduce a novel Transformer architecture, termed XTransCT, devised to facilitate real-time reconstruction of CT images from two-dimensional X-ray images. We assess our approach regarding image quality and structural reliability using a dataset of fifty patients, supplied by a hospital, as well as the larger public dataset LIDC-IDRI, which encompasses thousands of patients. Additionally, we validated our algorithm's generalizability on the LNDb dataset. Our findings indicate that our algorithm surpasses other methods in image quality, structural precision, and generalizability. Moreover, in comparison to previous 3D convolution-based approaches, we note a substantial speed increase of approximately 300%, achieving 44 ms per 3D image reconstruction.

翻訳日:2023-11-28 04:04:00 公開日:2023-11-23

# 無損失可視化を用いた分類・混合データの説明可能な機械学習

Explainable Machine Learning for Categorical and Mixed Data with Lossless Visualization ( http://arxiv.org/abs/2305.18437v3 )

ライセンス: Link先を確認

Boris Kovalerchuk, Elijah McCoy

(参考訳) 不均一/混合データに対して正確で解釈可能な機械学習(ML)モデルを構築することは、数値データ向けに設計されたアルゴリズムにとって長年の課題である。本研究は、正確で説明可能なMLモデルを支えるために、次の3点に焦点を当てる。第1に、MLアルゴリズム向けの非数値属性の数値符号化スキーム。第2に、n次元の非数値カテゴリデータのロスレス(無損失)な可視化手法と、その可視化上での視覚的ルール発見。第3に、カテゴリデータに対する正確で説明可能なMLモデルの開発である。本研究では、混合データ型の分類を提案し、機械学習におけるその重要な役割を分析する。また、混合データに対する視覚的データ探索により、混合データ上のMLアルゴリズムのすべての内部操作の解釈可能性を担保するツールキットを提示する。カテゴリデータを用いた説明可能なルール生成のための新しい逐次ルール生成(SRG)アルゴリズムを提案し、複数の計算実験で良好な評価を得た。本研究は、混合データのための包括的なMLアルゴリズムに向けたステップの1つである。このアルゴリズムは、平行座標を超えた一般線座標(General Line Coordinates)におけるn次元データのロスレス可視化に支えられている。

Building accurate and interpretable Machine Learning (ML) models for heterogeneous/mixed data is a long-standing challenge for algorithms designed for numeric data. This work focuses on developing numeric coding schemes for non-numeric attributes for ML algorithms to support accurate and explainable ML models, methods for lossless visualization of n-D non-numeric categorical data with visual rule discovery in these visualizations, and accurate and explainable ML models for categorical data. This study proposes a classification of mixed data types and analyzes their important role in Machine Learning. It presents a toolkit for enforcing interpretability of all internal operations of ML algorithms on mixed data with a visual data exploration on mixed data. A new Sequential Rule Generation (SRG) algorithm for explainable rule generation with categorical data is proposed and successfully evaluated in multiple computational experiments. This work is one of the steps to the full scope ML algorithms for mixed data supported by lossless visualization of n-D data in General Line Coordinates beyond Parallel Coordinates.
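
As a point of reference for the numeric coding schemes discussed above, the sketch shows one-hot encoding. It is a common baseline and is lossless in the sense that the original category can always be read back, unlike ordinal codes that impose an artificial order. It is not the coding scheme proposed in the paper. Rows are assumed to be dictionaries, and the helper name is hypothetical.

```python
def one_hot_encode(rows, column):
    """Replace a categorical column with 0/1 indicator columns, one per category
    (categories in sorted order). Returns the new rows and the category list."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded, categories
```

The trade-off is dimensionality: a column with c categories becomes c numeric columns. This is one motivation for visualization methods that stay lossless without such blow-up.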

翻訳日:2023-11-28 04:02:48 公開日:2023-11-23

# 多成分ボソニック系の記述における正準アンサンブル対大正準アンサンブル

Canonical Ensemble vs. Grand Canonical Ensemble in the Description of Multicomponent Bosonic Systems ( http://arxiv.org/abs/2307.08518v2 )

ライセンス: Link先を確認

D. Anchishkin, V. Gnatovskyy, D. Zhuravel, V. Karpenko, I. Mishustin, and H. Stoecker

(参考訳) ボース・アインシュタイン凝縮体が存在する場合の、相互作用するボソン粒子と反粒子からなる系の熱力学を、スキルム型(Skyrme-like)平均場モデルの枠組みで研究する。全電荷密度(アイソスピン密度)はすべての温度で保存されると仮定する。系のアイソスピン電荷がゼロの場合と非ゼロの場合の2つを明示的に考察する。正準アンサンブルと大正準アンサンブルを用いて比較分析を行う。大正準アンサンブルは、凝縮体が存在する場合の粒子と反粒子からなるボソン系の記述には適さないことが示される。

The thermodynamics of a system of interacting bosonic particles and antiparticles in the presence of the Bose-Einstein condensate is studied in the framework of the Skyrme-like mean-field model. It is assumed that the total charge density (isospin density) is conserved at all temperatures. Two cases are explicitly considered: zero and nonzero isospin charge of the system. A comparative analysis is carried out using Canonical Ensemble and Grand Canonical Ensemble. It is shown that the Grand Canonical Ensemble is not suitable for describing bosonic systems of particles and antiparticles in the presence of condensate.
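
For orientation, the thermal densities such comparisons start from can be computed for an ideal relativistic Bose gas (ħ = c = k_B = 1, one internal state). Particles sit at chemical potential +μ and antiparticles at -μ, so the net isospin density is n(μ) - n(-μ). Bose statistics require |μ| < m. At |μ| = m the thermal part saturates, and at fixed charge density any excess charge must reside in a condensate. The sketch omits the Skyrme-like mean field used in the paper, and its function names are hypothetical.

```python
import math

def thermal_density(m, T, mu, steps=20000):
    """Ideal Bose gas number density n = (1 / 2 pi^2) * int p^2 dp / (exp((E - mu)/T) - 1),
    E = sqrt(p^2 + m^2), evaluated with the midpoint rule up to p_max = m + 50 T.
    Requires mu < m (below the onset of Bose-Einstein condensation)."""
    if mu >= m:
        raise ValueError("need mu < m; mu = m marks the onset of condensation")
    p_max = m + 50.0 * T
    dp = p_max / steps
    total = 0.0
    for k in range(steps):
        p = (k + 0.5) * dp
        e = math.sqrt(p * p + m * m)
        total += p * p / math.expm1((e - mu) / T)
    return total * dp / (2.0 * math.pi ** 2)

def net_isospin_density(m, T, mu):
    """Particles carry +1 (chemical potential +mu), antiparticles -1 (-mu)."""
    return thermal_density(m, T, mu) - thermal_density(m, T, -mu)
```

As a sanity check, the nearly massless limit at μ = 0 should reproduce the textbook result n = ζ(3) T³/π² ≈ 0.1218 T³. The net density vanishes at μ = 0 and grows with μ up to its bound at μ → m.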

翻訳日:2023-11-28 03:53:19 公開日:2023-11-23

# SocialAI School: 発達心理学から人工社会文化エージェントへの洞察

The SocialAI School: Insights from Developmental Psychology Towards Artificial Socio-Cultural Agents ( http://arxiv.org/abs/2307.07871v2 )

ライセンス: Link先を確認

Grgur Kovač, Rémy Portelas, Peter Ford Dominey, Pierre-Yves Oudeyer

(参考訳) 発達心理学者は、人間の知性における社会認知能力の重要性を古くから確立してきた。これらの能力によって、私たちは人間の文化に入り、参加し、その恩恵を受けることができる。社会的対話エージェントに関するAI研究は、主にマルチエージェント環境における文化の出現を扱っている(多くの場合、発達心理学に強く根ざしてはいない)。我々は、AI研究も心理学から知見を得て、文化への参入を可能にする社会認知能力を研究すべきだと主張する。Michael TomaselloとJerome Brunerの理論を議論して彼らの概念のいくつかをAIに導入し、重要な概念と社会認知能力の概要を示す。The SocialAI Schoolは、手続き的に生成される環境からなるカスタマイズ可能なパラメータ化されたスイートを含むツールであり、これらの概念に関する実験を容易にする。RLエージェントと大規模言語モデルを用いたそのような実験の例を示す。本研究の主な動機は、発達心理学に基づく社会的知能の問題にAIコミュニティを巻き込むこと、そしてこの方向への第一歩を容易にするツールを提供することである。コードと追加情報についてはプロジェクトのWebサイトを参照のこと。

Developmental psychologists have long established the importance of socio-cognitive abilities in human intelligence. These abilities enable us to enter, participate in, and benefit from human culture. AI research on social interactive agents mostly concerns the emergence of culture in a multi-agent setting (often without a strong grounding in developmental psychology). We argue that AI research should also be informed by psychology and study the socio-cognitive abilities that enable entry into a culture. We discuss the theories of Michael Tomasello and Jerome Bruner to introduce some of their concepts to AI and outline key concepts and socio-cognitive abilities. We present The SocialAI School, a tool including a customizable parameterized suite of procedurally generated environments, which simplifies conducting experiments regarding those concepts. We show examples of such experiments with RL agents and Large Language Models. The main motivation of this work is to engage the AI community around the problem of social intelligence informed by developmental psychology, and to provide a tool to simplify first steps in this direction. Refer to the project website for code and additional information: https://sites.google.com/view/socialai-school.

翻訳日:2023-11-28 03:53:09 公開日:2023-11-23

# 大規模言語モデルの包括的概要

A Comprehensive Overview of Large Language Models ( http://arxiv.org/abs/2307.06435v6 )

ライセンス: Link先を確認

Humza Naveed, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Naveed Akhtar, Nick Barnes, Ajmal Mian

(参考訳) 大規模言語モデル(LLM)は、最近、自然言語処理タスクやそれ以外の領域で顕著な能力を示している。LLMの成功は、この方向への多くの研究貢献をもたらした。これらの研究は、アーキテクチャの革新、より良い訓練戦略、コンテキスト長の改善、微調整、マルチモーダルLLM、ロボティクス、データセット、ベンチマーク、効率など、多様なトピックを扱っている。LLM研究における技術の急速な発展と定期的なブレークスルーにより、この方向の進歩の全体像を把握することはかなり難しくなっている。LLMに関する文献が急速に増えていることを考えると、研究コミュニティがこの分野の最近の発展についての簡潔かつ包括的な概観から恩恵を受けられることが不可欠である。本稿では、LLM関連の幅広い概念について既存の文献を概観する。この自己完結した包括的な概観は、関連する背景概念を説明するとともに、LLM研究の最前線にある高度なトピックも扱う。このレビュー記事は、体系的なサーベイを提供するだけでなく、研究者や実践者が既存研究の広範で有益な要約から洞察を得てLLM研究を前進させるための、手早く参照できる包括的なリファレンスとなることも意図している。

Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks and beyond. This success of LLMs has led to a large influx of research contributions in this direction. These works encompass diverse topics such as architectural innovations, better training strategies, context length improvements, fine-tuning, multi-modal LLMs, robotics, datasets, benchmarking, efficiency, and more. With the rapid development of techniques and regular breakthroughs in LLM research, it has become considerably challenging to perceive the bigger picture of the advances in this direction. Considering the rapidly emerging plethora of literature on LLMs, it is imperative that the research community is able to benefit from a concise yet comprehensive overview of the recent developments in this field. This article provides an overview of the existing literature on a broad range of LLM-related concepts. Our self-contained comprehensive overview of LLMs discusses relevant background concepts along with covering the advanced topics at the frontier of research in LLMs. This review article is intended to not only provide a systematic survey but also a quick comprehensive reference for the researchers and practitioners to draw insights from extensive informative summaries of the existing works to advance the LLM research.

翻訳日:2023-11-28 03:52:22 公開日:2023-11-23

# 平衡から外れた位相的近藤模型

The topological Kondo model out of equilibrium ( http://arxiv.org/abs/2307.03773v2 )

ライセンス: Link先を確認

Matteo M. Wauters, Chia-Min Chung, Lorenzo Maffi, Michele Burrello

(参考訳) トポロジカル近藤効果は、マヨラナモードの非局所性の真の現れである。4つのトポロジカルモードを保持し、それぞれが金属リードに接続されたクーパー対ボックスを持つモデルにおいて、その非平衡シグネチャを調べる。超伝導体のダイナミクスを調べるために調整された高度な行列積状態アプローチにより、マヨラナ磁化の緩和をシミュレートして関連する近藤温度を決定し、リード電圧の量子クエンチ後の電気輸送の立ち上がりを解析する。我々の結果は、二重ナノワイヤデバイスで作製されたマヨラナ・クーパー対ボックスに適用でき、弱結合状態から強相関のトポロジカル近藤領域へのクロスオーバーの非摂動的な証拠を与える。後者は超伝導体の電荷縮退点で支配的となり、期待される普遍的な分数ゼロバイアスコンダクタンスを示す。

The topological Kondo effect is a genuine manifestation of the nonlocality of Majorana modes. We investigate its out-of-equilibrium signatures in a model with a Cooper-pair box hosting four of these topological modes, each connected to a metallic lead. Through an advanced matrix-product-state approach tailored to study the dynamics of superconductors, we simulate the relaxation of the Majorana magnetization, which allows us to determine the related Kondo temperature, and we analyze the onset of electric transport after a quantum quench of a lead voltage. Our results apply to Majorana Cooper-pair boxes fabricated in double nanowire devices and provide nonperturbative evidence of the crossover from weak-coupling states to the strongly correlated topological Kondo regime. The latter dominates at the superconductor charge degeneracy points and displays the expected universal fractional zero-bias conductance.

翻訳日:2023-11-28 03:51:55 公開日:2023-11-23

# 等変フローマッチング

Equivariant flow matching ( http://arxiv.org/abs/2306.15030v2 )

ライセンス: Link先を確認

Leon Klein, Andreas Krämer, Frank Noé

(参考訳) 正規化フロー(英: normalizing flow)は深層生成モデルの一種であり、物理学における確率分布のモデル化において特に興味深い。フローの正確な尤度により、既知の目標エネルギー関数への再重み付けや不偏な観測量の計算が可能になるからである。例えば、ボルツマン生成器は、小分子やタンパク質のような多体系の平衡サンプルを生成するようにフローを訓練することで、統計物理学における長年のサンプリング問題に取り組む。このような系に対して効果的なモデルを構築するには、目標エネルギーの対称性をモデルに組み込むことが重要であり、これは同変連続正規化フロー(CNF)によって実現できる。しかし、CNFは訓練やサンプル生成の計算コストが高くなりうるため、スケーラビリティや実用的応用が妨げられてきた。本稿では、最近提案された最適輸送フローマッチングに基づく、同変CNFの新しい訓練目的である同変フローマッチングを提案する。同変フローマッチングは、目標エネルギーの物理的対称性を利用して、同変CNFを効率的かつシミュレーションなしで訓練する。回転および置換不変な多粒子系と小分子アラニンジペプチドに対してフローマッチングの有効性を実証し、後者では、特別に設計された内部座標による特徴量化に頼らずに、高いサンプリング効率を持つボルツマン生成器を初めて得た。結果は、同変フローマッチングの目的関数が、既存手法と比べて積分経路が短く、サンプリング効率とスケーラビリティが向上したフローを与えることを示している。

Normalizing flows are a class of deep generative models that are especially interesting for modeling probability distributions in physics, where the exact likelihood of flows allows reweighting to known target energy functions and computing unbiased observables. For instance, Boltzmann generators tackle the long-standing sampling problem in statistical physics by training flows to produce equilibrium samples of many-body systems such as small molecules and proteins. To build effective models for such systems, it is crucial to incorporate the symmetries of the target energy into the model, which can be achieved by equivariant continuous normalizing flows (CNFs). However, CNFs can be computationally expensive to train and generate samples from, which has hampered their scalability and practical application. In this paper, we introduce equivariant flow matching, a new training objective for equivariant CNFs that is based on the recently proposed optimal transport flow matching. Equivariant flow matching exploits the physical symmetries of the target energy for efficient, simulation-free training of equivariant CNFs. We demonstrate the effectiveness of flow matching on rotation and permutation invariant many-particle systems and a small molecule, alanine dipeptide, where for the first time we obtain a Boltzmann generator with significant sampling efficiency without relying on tailored internal coordinate featurization. Our results show that the equivariant flow matching objective yields flows with shorter integration paths, improved sampling efficiency, and higher scalability compared to existing methods.
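
The training signal behind (optimal transport) flow matching can be sketched on a toy problem. Each prior/data pair is joined by the straight path $x_t = (1-t)x_0 + t x_1$, and the model velocity is regressed onto the constant target $x_1 - x_0$. The sketch makes several simplifying assumptions. The particles are identical and live on a line. The prior sample $x_0$ is supplied by hand. A brute-force permutation alignment of each single pair stands in for the equivariant OT step, which in the actual method also covers rotations and works over batches. `flow_matching_loss` and `zero_field` are hypothetical names, not the authors' code.

```python
import itertools
from typing import Callable, Tuple

Config = Tuple[float, ...]  # positions of n identical particles on a line (toy setting)


def sq_dist(a: Config, b: Config) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def align_permutation(x0: Config, x1: Config) -> Config:
    """Relabel the identical particles of x1 to minimise squared distance to x0.

    Brute force over all permutations, so this is only usable for very small n.
    """
    candidates = (tuple(x1[i] for i in perm) for perm in itertools.permutations(range(len(x1))))
    return min(candidates, key=lambda c: sq_dist(x0, c))


def flow_matching_loss(v: Callable[[Config, float], Config], x0: Config, x1: Config, t: float) -> float:
    """Conditional flow-matching loss for one (prior, data) pair on a straight-line path."""
    x1a = align_permutation(x0, x1)
    xt = tuple((1 - t) * a + t * b for a, b in zip(x0, x1a))  # point on the straight path
    target = tuple(b - a for a, b in zip(x0, x1a))              # constant velocity of that path
    pred = v(xt, t)
    return sq_dist(pred, target) / len(x0)


def zero_field(x: Config, t: float) -> Config:
    """Placeholder for a learned (equivariant) vector field."""
    return tuple(0.0 for _ in x)


print(align_permutation((0.0, 1.0), (1.1, 0.1)))                                # (0.1, 1.1)
print(round(flow_matching_loss(zero_field, (0.0, 1.0), (1.1, 0.1), t=0.5), 6))  # 0.01
```

Without the alignment, the same pair would be joined by a path crossing the whole box (target velocity $(1.1, -0.9)$ instead of $(0.1, 0.1)$). This is the toy analogue of the shorter integration paths reported in the abstract.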

翻訳日:2023-11-28 03:51:38 公開日:2023-11-23

# ディック状態のエントロピー円錐と絡み合い進化

Entropy Cones and Entanglement Evolution for Dicke States ( http://arxiv.org/abs/2306.13146v2 )

ライセンス: Link先を確認

William Munizzi, Howard J. Schnitzer

(参考訳) ハミング重み $k$ の $N$ 量子ビットのディッケ状態 $|D^N_k\rangle$ は、量子アルゴリズムの最適化において重要な役割を果たす絡み合った状態のクラスである。ディッケ状態における絡み合いエントロピーの一般的な計算を示し、それを用いて $|D^N_k\rangle$ のエントロピー円錐を記述する。すべての $|D^N_k\rangle$ のエントロピーベクトルが対称化された形で現れることを示し、これを用いて $|D^N_k\rangle$ のエントロピーベクトルを実現するスターグラフ上のmin-cutプロトコルを定義する。$N$ 量子ビットのパウリ群と2量子ビットのクリフォード群の作用の下で、すべての $|D^N_k\rangle$ に対する安定化群を特定し、それを用いて $|D^N_k\rangle$ の到達可能性グラフを構成する。これらの到達可能性グラフを用いて、クリフォード回路における $|D^N_k\rangle$ エントロピーベクトルの進化を解析し、制約を与える。

The $N$-qubit Dicke states $|D^N_k\rangle$, of Hamming-weight $k$, are a class of entangled states which play an important role in quantum algorithm optimization. We present a general calculation of entanglement entropy in Dicke states, which we use to describe the $|D^N_k\rangle$ entropy cone. We demonstrate that all $|D^N_k\rangle$ entropy vectors emerge symmetrized, and use this to define a min-cut protocol on star graphs which realizes $|D^N_k\rangle$ entropy vectors. We identify the stabilizer group for all $|D^N_k\rangle$, under the action of the $N$-qubit Pauli group and two-qubit Clifford group, which we use to construct $|D^N_k\rangle$ reachability graphs. We use these reachability graphs to analyze and bound the evolution of $|D^N_k\rangle$ entropy vectors in Clifford circuits.
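
The entanglement entropy of a Dicke state follows from its standard Schmidt decomposition. Splitting the $k$ excitations as $j$ on an $\ell$-qubit subsystem and $k-j$ on the rest gives squared Schmidt coefficients $\binom{\ell}{j}\binom{N-\ell}{k-j}/\binom{N}{k}$. The sketch below assumes each party is a single qubit. It is an illustration, not the paper's general calculation. It shows that the entropy depends only on the subsystem size $\ell$, consistent with the symmetrized entropy vectors mentioned above. The function names are hypothetical.

```python
import math
from typing import List


def dicke_schmidt_weights(n: int, k: int, ell: int) -> List[float]:
    """Squared Schmidt coefficients of |D^n_k> across a cut of ell qubits vs. n - ell qubits.

    j excitations on the first ell qubits and k - j on the rest give weight
    C(ell, j) * C(n - ell, k - j) / C(n, k); by Vandermonde's identity they sum to 1.
    """
    total = math.comb(n, k)
    lo, hi = max(0, k - (n - ell)), min(k, ell)
    return [math.comb(ell, j) * math.comb(n - ell, k - j) / total for j in range(lo, hi + 1)]


def dicke_entropy(n: int, k: int, ell: int) -> float:
    """Von Neumann entanglement entropy (in bits) of any ell-qubit subsystem of |D^n_k>."""
    return -sum(p * math.log2(p) for p in dicke_schmidt_weights(n, k, ell) if p > 0)


def entropy_vector(n: int, k: int) -> List[float]:
    """Entropies for subsystem sizes 1..n-1; by permutation symmetry only the size matters."""
    return [dicke_entropy(n, k, ell) for ell in range(1, n)]


print(round(dicke_entropy(2, 1, 1), 6))             # 1.0  (the state (|01> + |10>)/sqrt(2))
print([round(s, 4) for s in entropy_vector(4, 2)])  # [1.0, 1.2516, 1.0]
```

The symmetry $S(\ell) = S(N-\ell)$ visible in the output is the usual purity property of a bipartition of a pure state.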

翻訳日:2023-11-28 03:50:28 公開日:2023-11-23

# Karasu: ビッグデータ分析のための効率的なクラスタ構成のためのコラボレーションアプローチ

Karasu: A Collaborative Approach to Efficient Cluster Configuration for Big Data Analytics ( http://arxiv.org/abs/2308.11792v2 )

ライセンス: Link先を確認

Dominik Scheinert, Philipp Wiesner, Thorsten Wittkopp, Lauritz Thamsen, Jonathan Will, and Odej Kao

(参考訳) マシンタイプやクラスタサイズなど設定オプションが多岐にわたるため、ビッグデータ分析ジョブに適したリソースを選択することは難しい。選択を誤るとリソース効率、コスト、エネルギー使用量に大きな影響が出るため、自動化アプローチが注目を集めている。既存手法の多くは、繰り返し実行されるワークロードをプロファイリングし、時間をかけて最適に近い解を見つけることに依存している。コールドスタート問題のため、これはしばしば長くコストのかかるプロファイリング段階につながる。しかし、ユーザ間のビッグデータ分析ジョブは多くの共通点を持ちうる。これらはしばしば類似のインフラ上で、類似のフレームワークに実装された類似のアルゴリズムを用いて動作する。集約されたプロファイリング実行を共有して協調的にコールドスタート問題に対処する可能性は、ほとんど検討されていない。本稿では、類似のインフラ、フレームワーク、アルゴリズム、データセットを扱うユーザ間でのデータ共有を促進する、より効率的なリソース構成プロファイリングのアプローチであるKarasuを提案する。Karasuは、協力者から集約された実行時情報を用いて軽量な性能モデルを訓練し、それらをアンサンブル手法に組み合わせることで、構成探索空間に関する固有の知識を活用する。さらに、Karasuでは複数の目的を同時に最適化できる。評価は、パブリッククラウド環境における多様なワークロード実行の性能データに基づく。対象ジョブと部分的な共通特性しか持たない比較可能なプロファイリング実行が少数しかない場合でも、Karasuは性能、探索時間、コストの観点で既存手法を大幅に向上できることを示す。

Selecting the right resources for big data analytics jobs is hard because of the wide variety of configuration options like machine type and cluster size. As poor choices can have a significant impact on resource efficiency, cost, and energy usage, automated approaches are gaining popularity. Most existing methods rely on profiling recurring workloads to find near-optimal solutions over time. Due to the cold-start problem, this often leads to lengthy and costly profiling phases. However, big data analytics jobs across users can share many common properties: they often operate on similar infrastructure, using similar algorithms implemented in similar frameworks. The potential of sharing aggregated profiling runs to collaboratively address the cold-start problem is largely unexplored. We present Karasu, an approach to more efficient resource configuration profiling that promotes data sharing among users working with similar infrastructures, frameworks, algorithms, or datasets. Karasu trains lightweight performance models using aggregated runtime information of collaborators and combines them into an ensemble method to exploit inherent knowledge of the configuration search space. Moreover, Karasu allows the optimization of multiple objectives simultaneously. Our evaluation is based on performance data from diverse workload executions in a public cloud environment. We show that Karasu is able to significantly boost existing methods in terms of performance, search time, and cost, even when few comparable profiling runs are available that share only partial common characteristics with the target job.
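
To make the idea of an ensemble of collaborators' models concrete, here is a minimal sketch in which everything is an assumption for illustration. Each shared model is a plain function from cluster size to runtime. The ensemble weights each model by its inverse error on the target job's few profiling runs. Cost is approximated as node-seconds. Karasu's actual model types, weighting scheme, and multi-objective search are not reproduced here, and all names are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

RuntimeModel = Callable[[int], float]  # scale-out (number of nodes) -> predicted runtime in seconds


def ensemble_predict(models: Sequence[RuntimeModel], target_runs: Sequence[Tuple[int, float]],
                     scale_out: int, eps: float = 1e-9) -> float:
    """Weight each collaborator's model by inverse mean absolute error on the target job's runs."""
    weights = []
    for model in models:
        err = sum(abs(model(s) - r) for s, r in target_runs) / len(target_runs)
        weights.append(1.0 / (err + eps))
    total = sum(weights)
    return sum(w * model(scale_out) for w, model in zip(weights, models)) / total


def cheapest_within_deadline(models: Sequence[RuntimeModel], target_runs: Sequence[Tuple[int, float]],
                             candidates: Sequence[int], deadline_s: float) -> Optional[int]:
    """Pick the scale-out with the lowest node-seconds whose predicted runtime meets the deadline."""
    feasible = []
    for s in candidates:
        runtime = ensemble_predict(models, target_runs, s)
        if runtime <= deadline_s:
            feasible.append((runtime * s, s))  # cost proxy: node-seconds
    return min(feasible)[1] if feasible else None


def shared_a(s: int) -> float:  # collaborator model that happens to fit the target job
    return 100.0 / s + 10.0


def shared_b(s: int) -> float:  # collaborator model from a dissimilar job
    return 50.0


runs = [(2, 60.0), (4, 35.0)]  # two profiling runs of the target job
print(round(ensemble_predict([shared_a, shared_b], runs, 8), 3))                          # 22.5
print(cheapest_within_deadline([shared_a, shared_b], runs, [2, 4, 8], deadline_s=40.0))  # 4
```

The two profiling runs are enough to push almost all weight onto the collaborator model that matches the target job. This is the intuition behind reusing shared runs to shorten the cold-start phase.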

翻訳日:2023-11-28 03:43:11 公開日:2023-11-23

# 変分量子ビット効率maxcutヒューリスティックアルゴリズム

A Variational Qubit-Efficient MaxCut Heuristic Algorithm ( http://arxiv.org/abs/2308.10383v2 )

ライセンス: Link先を確認

Yovav Tene-Cohen, Tomer Kelman, Ohad Lev, and Adi Makmal

(参考訳) MaxCutは、イジングモデルやチップ設計を含む広範な理論的・産業的応用を持つ、重要なNP困難組合せ最適化グラフ問題である。量子コンピューティングは、こうした組合せ問題に対して古典的手法より優れている可能性のある新しい解法を提供し、量子近似最適化アルゴリズム(QAOA)はその最先端の例であるが、その性能は現在、ハードウェアノイズと限られた量子ビット数によって妨げられている。本稿では、グラフサイズに対して対数個の量子ビットしか必要とせず、QAOAと比べて量子ビット数を指数的に削減する、新しい変分量子ビット効率MaxCut (QEMC)アルゴリズムを提案する。実際の超伝導ハードウェア上で最大32ノード(5量子ビット)のグラフインスタンスについて、またノイズなしシミュレーションで最大2048ノード(11量子ビット)のグラフについて最先端の性能を示し、Goemans and Williamson (GW) の確立された古典アルゴリズムを上回る。QEMCアルゴリズムの革新的な符号化方式は、一方では高いノイズ耐性をもたらすが、他方ではその効率的な古典シミュレーションも可能にするため、明確な量子優位性は見えにくくなる。それでも、量子優位性がない場合でも、QEMCアルゴリズムは量子に着想を得たアルゴリズムとなる可能性があり、QAOAにとって挑戦的なベンチマークを提供し、他の量子・古典アルゴリズムにも応用しうる新しい符号化パラダイムを提示する。

MaxCut is a key NP-Hard combinatorial optimization graph problem with extensive theoretical and industrial applications, including the Ising model and chip design. While quantum computing offers new solutions for such combinatorial challenges which are potentially better than classical schemes, with the Quantum Approximate Optimization Algorithm (QAOA) being a state-of-the-art example, its performance is currently hindered by hardware noise and limited qubit number. Here, we present a new variational Qubit-Efficient MaxCut (QEMC) algorithm that requires a logarithmic number of qubits with respect to the graph size, an exponential reduction compared to QAOA. We demonstrate cutting-edge performance for graph instances consisting of up to 32 nodes (5 qubits) on real superconducting hardware, and for graphs with up to 2048 nodes (11 qubits) using noiseless simulations, outperforming the established classical algorithm of Goemans and Williamson (GW). The QEMC algorithm's innovative encoding scheme empowers it with great noise-resiliency on the one hand, but also enables its efficient classical simulation on the other, thus obscuring a distinct quantum advantage. Nevertheless, even in the absence of quantum advantage, the QEMC algorithm serves as a potential quantum-inspired algorithm, provides a challenging benchmark for QAOA, and presents a novel encoding paradigm with potential applications extending to other quantum and classical algorithms.
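
The qubit counts quoted above (32 nodes on 5 qubits, 2048 nodes on 11) are consistent with assigning each node one computational basis state, so $N$ nodes need $\lceil \log_2 N \rceil$ qubits. The sketch below illustrates only that classical bookkeeping, under stated assumptions. The probabilities are made up. The threshold rule for reading a partition off the measured distribution is a hypothetical stand-in for the paper's decoding step. No variational circuit or optimizer is included.

```python
from typing import Iterable, Sequence, Set, Tuple


def qubits_needed(num_nodes: int) -> int:
    """Qubits for a one-basis-state-per-node encoding: ceil(log2(N)), at least 1."""
    return max(1, (num_nodes - 1).bit_length())


def partition_from_probs(probs: Sequence[float], threshold: float) -> Set[int]:
    """Hypothetical decoding rule: node i joins one side if its basis-state probability exceeds threshold."""
    return {i for i, p in enumerate(probs) if p > threshold}


def cut_size(edges: Iterable[Tuple[int, int]], side: Set[int]) -> int:
    """Number of edges whose endpoints fall on different sides of the partition."""
    return sum(1 for u, v in edges if (u in side) != (v in side))


# 4-cycle 0-1-2-3-0; these probabilities stand in for measured outcomes of a trained circuit.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
probs = [0.5, 0.0, 0.5, 0.0]  # 4 nodes -> 2 qubits, 4 basis states
side = partition_from_probs(probs, threshold=0.25)
print(qubits_needed(32), qubits_needed(2048))  # 5 11
print(sorted(side), cut_size(edges, side))     # [0, 2] 4
```

Because the circuit only has to shape a probability vector over $N$ basis states, its output can also be simulated classically for moderate $N$. This is the trade-off the abstract notes regarding quantum advantage.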

翻訳日:2023-11-28 03:42:49 公開日:2023-11-23

# EDDense-Net: オプティカルカップとディスクの同時分割のための完全高密度エンコーダデコーダネットワーク

EDDense-Net: Fully Dense Encoder Decoder Network for Joint Segmentation of Optic Cup and Disc ( http://arxiv.org/abs/2308.10192v2 )

ライセンス: Link先を確認

Mehwish Mehmood, Khuram Naveed, Khursheed Aurangzeb, Haroon Ahmed Khan, Musaed Alhussein, Syed Saud Naqvi

(参考訳) 緑内障は視神経に損傷を与える眼疾患であり、視力低下や永久的な失明につながりうる。したがって、永久的な失明を避けるには緑内障の早期発見が重要である。緑内障の診断には、視神経乳頭(OD)の検査におけるカップ・ディスク比(CDR)の推定が用いられる。本稿では、視神経乳頭陥凹(OC)とODを同時にセグメンテーションするためのEDDense-Netセグメンテーションネットワークを提案する。このネットワークのエンコーダとデコーダは、各ブロックにグループ化畳み込み層を持つ密ブロックで構成されており、ネットワークの複雑さを抑えつつ、画像から空間情報を取得・伝達できる。空間情報の損失を減らすため、すべての畳み込み層で最適な数のフィルタを用いた。セマンティックセグメンテーションでは、クラス不均衡の問題を緩和するためにデコーダでDiceピクセル分類を用いる。提案ネットワークは2つの公開データセットで評価され、精度と効率の両面で既存の最先端手法を上回った。本手法は、緑内障の診断と解析において眼科医を支援するセカンドオピニオンシステムとして利用できる。

Glaucoma is an eye disease that damages the optic nerve, which can lead to visual loss and permanent blindness. Early glaucoma detection is therefore critical to avoid permanent blindness. The cup-to-disc ratio (CDR), estimated during an examination of the optic disc (OD), is used for the diagnosis of glaucoma. In this paper, we present the EDDense-Net segmentation network for the joint segmentation of the optic cup (OC) and OD. The encoder and decoder in this network are made up of dense blocks with a grouped convolutional layer in each block, allowing the network to acquire and convey spatial information from the image while simultaneously reducing the network's complexity. To reduce spatial information loss, the optimal number of filters was used in all convolution layers. For semantic segmentation, Dice pixel classification is employed in the decoder to alleviate the problem of class imbalance. The proposed network was evaluated on two publicly available datasets, where it outperformed existing state-of-the-art methods in terms of accuracy and efficiency. For the diagnosis and analysis of glaucoma, this method can be used as a second-opinion system to assist ophthalmologists.
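
The paragraph above names two quantities that are easy to sketch: the cup-to-disc ratio computed from OC and OD segmentations, and a Dice-based loss against class imbalance. Below, the vertical CDR (ratio of vertical diameters) is assumed as the definition. This is a common clinical choice but not necessarily the one used in the paper. The soft Dice loss is a generic formulation rather than the exact Dice pixel classification layer of EDDense-Net. The helper names are hypothetical.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # binary mask, rows x cols; 1 marks the structure


def vertical_extent(mask: Mask) -> int:
    """Height in pixels between the first and last row that contain the structure."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0


def vertical_cdr(cup: Mask, disc: Mask) -> float:
    """Vertical cup-to-disc ratio from optic cup (OC) and optic disc (OD) masks."""
    disc_h = vertical_extent(disc)
    if disc_h == 0:
        raise ValueError("empty optic disc mask")
    return vertical_extent(cup) / disc_h


def soft_dice_loss(pred: Sequence[float], target: Sequence[int], eps: float = 1e-6) -> float:
    """1 - soft Dice on flattened per-pixel probabilities; true-negative background pixels earn no reward."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)


disc = [[1 if 1 <= i <= 8 and 2 <= j <= 7 else 0 for j in range(10)] for i in range(10)]
cup = [[1 if 3 <= i <= 6 and 4 <= j <= 5 else 0 for j in range(10)] for i in range(10)]
print(vertical_cdr(cup, disc))               # 0.5
print(soft_dice_loss([1, 0, 1], [1, 0, 1]))  # 0.0
```

The Dice loss ignores correctly predicted background pixels, so the large background of a fundus image cannot dominate it. This is why Dice-style objectives are a common remedy for class imbalance.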

翻訳日:2023-11-28 03:42:20 公開日:2023-11-23

# 確率フリー仮説テストのためのカーネルベーステスト

Kernel-Based Tests for Likelihood-Free Hypothesis Testing ( http://arxiv.org/abs/2308.09043v2 )

ライセンス: Link先を確認

Patrik Róbert Gerber, Tianze Jiang, Yury Polyanskiy, Rui Sun

(参考訳) バランスの取れた2つのクラスからの $n$ 個の観測が与えられたとき、すべてがこれら2つのクラスの同じ一方に属することが分かっている追加の $m$ 個の入力にラベルを付けるタスクを考える。この問題の特殊なケースはよく知られている。クラス分布が完全に既知($n=\infty$)の場合は尤度比検定によって最適に解かれ、$m=1$ は二値分類に対応し、$m\approx n$ は2標本検定と等価である。中間的な設定は尤度フリー推論の分野で現れ、そこではラベル付きサンプルは順方向シミュレーションによって得られ、ラベルなしサンプルは実験的に収集される。最近の研究で、$m$ と $n$ の間に基本的なトレードオフがあること、すなわちデータサンプル $m$ を増やすと必要な訓練/シミュレーションデータ量 $n$ が減ることが発見された。本研究では、(a) ラベルなしサンプルが2つのクラスの混合から得られるという、実際によく見られるケースへの一般化を導入し、(b) 最大平均不一致(MMD)による分離の下での非パラメトリックな密度クラスに対するミニマックスサンプル複雑性を調べ、(c) ヒッグス粒子の検出と、CIFAR-10画像中に紛れ込ませたDDPM生成画像の検出という2つのタスクで、ニューラルネットワークでパラメータ化されたカーネルの経験的性能を調べる。どちらの問題でも、理論的に予測された非対称な $m$ 対 $n$ のトレードオフの存在を確認する。

Given $n$ observations from two balanced classes, consider the task of labeling an additional $m$ inputs that are known to all belong to the same one of the two classes. Special cases of this problem are well-known: with complete knowledge of class distributions ($n=\infty$) the problem is solved optimally by the likelihood-ratio test; when $m=1$ it corresponds to binary classification; and when $m\approx n$ it is equivalent to two-sample testing. The intermediate settings occur in the field of likelihood-free inference, where labeled samples are obtained by running forward simulations and the unlabeled sample is collected experimentally. In recent work it was discovered that there is a fundamental trade-off between $m$ and $n$: increasing the data sample $m$ reduces the amount $n$ of training/simulation data needed. In this work we (a) introduce a generalization where unlabeled samples come from a mixture of the two classes, a case often encountered in practice; (b) study the minimax sample complexity for non-parametric classes of densities under maximum mean discrepancy (MMD) separation; and (c) investigate the empirical performance of kernels parameterized by neural networks on two tasks: detection of the Higgs boson and detection of DDPM-generated images planted amidst CIFAR-10 images. For both problems we confirm the existence of the theoretically predicted asymmetric $m$ vs $n$ trade-off.
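
The sketch below shows a minimal MMD-based rule for the setting above. It rests on three assumptions. The data are 1-D. The kernel is a fixed Gaussian with a hand-picked bandwidth, whereas the paper studies kernels parameterized by neural networks. The decision simply picks the class whose sample is closer in MMD. The helper names are hypothetical, and the statistic is not necessarily the one analyzed in the paper. The estimator is the standard unbiased $\widehat{\mathrm{MMD}}^2_u$, which drops the diagonal terms of the within-sample kernel sums.

```python
import math
from typing import Sequence


def gaussian_kernel(a: float, b: float, bandwidth: float = 1.0) -> float:
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs: Sequence[float], ys: Sequence[float], bandwidth: float = 1.0) -> float:
    """Unbiased estimate of MMD^2 between two 1-D samples under a Gaussian kernel."""
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("each sample needs at least two points")
    kxx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
              for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    kyy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
              for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    kxy = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys) / (m * n)
    return kxx + kyy - 2.0 * kxy


def label_batch(sim0: Sequence[float], sim1: Sequence[float],
                unlabeled: Sequence[float], bandwidth: float = 1.0) -> int:
    """Assign the whole unlabeled batch to the class whose simulated sample is closer in MMD."""
    d0 = mmd2_unbiased(sim0, unlabeled, bandwidth)
    d1 = mmd2_unbiased(sim1, unlabeled, bandwidth)
    return 0 if d0 < d1 else 1


sim0 = [0.0, 0.2, -0.1, 0.1]  # n simulated draws from class 0
sim1 = [3.0, 3.1, 2.9, 3.2]   # n simulated draws from class 1
print(label_batch(sim0, sim1, [0.05, -0.05, 0.15]))  # 0
print(label_batch(sim0, sim1, [3.05, 2.95, 3.15]))   # 1
```

Both samples enter the statistic symmetrically, which is why $m$ (unlabeled) and $n$ (simulated) can trade off against each other.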

翻訳日:2023-11-28 03:41:40 公開日:2023-11-23

# lindblad以外の量子ビットダイナミクス:非マルコフ性と回転波近似

Qubit Dynamics beyond Lindblad: Non-Markovianity versus Rotating Wave Approximation ( http://arxiv.org/abs/2308.06029v2 )

ライセンス: Link先を確認

Kiyoto Nakamura, Joachim Ankerhold

(参考訳) 実際の量子ビットデバイスの性能が向上するにつれて、量子ビットと環境自由度の間の相互作用における微妙な効果も次第に重要になり、実験的に見えるようになってきている。これは特に、量子ビット操作の数値シミュレーションに最もよく使われる基盤、すなわち従来のリンドブラッド・マスター方程式(LE)の前提となっている時間スケールの分離、つまりマルコフ近似と回転波近似(RWA)に当てはまる。本稿では、(i) これらの時間スケール分離のいずれかの破れをどの程度実験的に監視できるか、(ii) 関連するパラメータ範囲において(近似的な)数値スキームで高精度な予測を得るうえで、どちらが最も深刻な制約となるか、という問いを明らかにする。この目的のため、精度が段階的に高くなる3つの縮約密度行列のシミュレーション手法を比較する。特に、オーミックおよびサブオーミックなスペクトル密度を持つ貯水池の存在下での量子ビット系の緩和とデコヒーレンスの予測を調べ、ラムゼー実験に基づく適切なプロトコルを用いて、非マルコフ性とRWAの役割を明らかにする。今後の実験への影響と、近似的でありながら正確な数値的アプローチの設計について議論する。

With the increasing performance of actual qubit devices, even subtle effects in the interaction between qubits and environmental degrees of freedom become progressively relevant and experimentally visible. This applies particularly to the timescale separations that underlie the most commonly used numerical simulation platform for qubit operations, namely the conventional Lindblad master equation (LE): the Markov approximation and the rotating wave approximation (RWA). In this contribution we shed light on the questions (i) to what extent violations of either of these timescale separations can be monitored experimentally and (ii) which of them most severely limits highly accurate predictions within (approximate) numerical schemes in relevant parameter ranges. For this purpose, we compare three simulation methods for the reduced density matrix with progressively growing accuracy. In particular, predictions for relaxation and decoherence of a qubit system in the presence of reservoirs with Ohmic and sub-Ohmic spectral densities are explored and, with the aid of proper protocols based on Ramsey experiments, the roles of non-Markovianity and the RWA are revealed. We discuss potential implications for future experiments and the design of approximate yet accurate numerical approaches.
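
For reference, the conventional Lindblad description can be sketched in a few lines of pure Python. The sketch assumes a single qubit with amplitude damping only: one jump operator $\sqrt{\gamma}\,\sigma_-$, $\hbar = 1$, and a fixed-step Runge-Kutta integrator. This is the Markovian, RWA-based setting that the more accurate methods are compared against; those methods are not shown. In this model the excited-state population decays as $e^{-\gamma t}$, and a Ramsey-type coherence decays as $e^{-\gamma t/2}$.

```python
import math
from typing import List, Sequence

Matrix = List[List[complex]]  # 2x2 density matrix in the basis (|g>, |e>)


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def axpy(a: Matrix, b: Matrix, s: complex = 1.0) -> Matrix:
    """Return a + s * b."""
    return [[a[i][j] + s * b[i][j] for j in range(2)] for i in range(2)]


def dagger(a: Matrix) -> Matrix:
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def lindblad_rhs(rho: Matrix, h: Matrix, jumps: Sequence[Matrix]) -> Matrix:
    """d(rho)/dt = -i[H, rho] + sum_L (L rho L^dag - 1/2 {L^dag L, rho}), with hbar = 1."""
    hr, rh = matmul(h, rho), matmul(rho, h)
    out = [[-1j * (hr[i][j] - rh[i][j]) for j in range(2)] for i in range(2)]
    for l in jumps:
        ld = dagger(l)
        ldl = matmul(ld, l)
        jump = matmul(matmul(l, rho), ld)
        anti = axpy(matmul(ldl, rho), matmul(rho, ldl))
        out = axpy(out, axpy(jump, anti, -0.5))
    return out


def evolve(rho: Matrix, h: Matrix, jumps: Sequence[Matrix], t: float, steps: int) -> Matrix:
    """Fixed-step fourth-order Runge-Kutta integration of the Lindblad equation."""
    dt = t / steps
    for _ in range(steps):
        k1 = lindblad_rhs(rho, h, jumps)
        k2 = lindblad_rhs(axpy(rho, k1, dt / 2), h, jumps)
        k3 = lindblad_rhs(axpy(rho, k2, dt / 2), h, jumps)
        k4 = lindblad_rhs(axpy(rho, k3, dt), h, jumps)
        incr = axpy(axpy(k1, k2, 2.0), axpy(k4, k3, 2.0))  # k1 + 2 k2 + 2 k3 + k4
        rho = axpy(rho, incr, dt / 6)
    return rho


gamma, omega = 1.0, 2.0
H = [[-omega / 2, 0.0], [0.0, omega / 2]]  # H = (omega/2)(|e><e| - |g><g|)
L = [[0.0, math.sqrt(gamma)], [0.0, 0.0]]  # sqrt(gamma) * sigma_minus: |e> -> |g>
excited = [[0.0, 0.0], [0.0, 1.0]]
plus = [[0.5, 0.5], [0.5, 0.5]]            # Ramsey-type superposition (|g> + |e>)/sqrt(2)
print(round(evolve(excited, H, [L], 1.0, 200)[1][1].real, 6))  # exp(-1) ~ 0.367879
print(round(abs(evolve(plus, H, [L], 1.0, 200)[0][1]), 6))     # exp(-1/2)/2 ~ 0.303265
```

The fixed ratio between the population and coherence decay rates is a feature of this Markovian, RWA-based model. Deviations from it in Ramsey-type protocols are the kind of signature the paper uses to expose its limits.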

翻訳日:2023-11-28 03:40:44 公開日:2023-11-23

# 非マルコフ量子ゲートセットトモグラフィ

Non-Markovian Quantum Gate Set Tomography ( http://arxiv.org/abs/2307.14696v3 )

ライセンス: Link先を確認

Ze-Tong Li, Cong-Cong Zheng, Fan-Xu Meng, Han Zeng, Tian Luan, Zai-Chen Zhang, Xu-Tao Yu

(参考訳) 量子デバイスの工学には、量子ビット、量子操作(計器とも呼ばれる)、量子ノイズを含む量子系の信頼性の高いキャラクタリゼーションが必要である。近年、量子ゲートセットトモグラフィ(GST)は、量子状態、ゲート、測定を自己無撞着に記述する強力な手法として登場した。しかし、量子系と環境の間の非マルコフ相関はGSTの信頼性に影響を与える。これに対処するため、非マルコフGSTのための、計器セットトモグラフィ(IST)と呼ばれる自己無撞着な操作的枠組みを提案する。確率的量子過程に基づき、計器セットは計器と系-環境(SE)相関を記述する。物理的制約なしに計器とSE相関を記述する線形反転IST(LIST)を導入する。計器間の線形関係の不整合も検出される。さらに、マルコフ次数に関してパラメータ数が多項式となる、ISTのための最尤推定(MLE-IST)に基づく物理的制約付きの統計手法を提案する。MLE-ISTは、モデルと制約を調整することで、ノイズのある中規模量子(NISQ)デバイスなど様々な種類のデバイスに適応できる高い柔軟性を示す。実験結果は、計器とSE相関を同時に記述することの有効性と必要性を示している。具体的には、不完全な実装のシミュレーションにおいて、LISTとMLE-ISTは、それぞれの比較手法と比べて、平均二乗誤差をそれぞれ-23.77および-6.21のオーダーで大きく削減する。その結果、ISTは、計器セットの観点から量子デバイスを特徴づけ、ベンチマークし、開発するための本質的かつ自己無撞着な枠組みを提供する。

Engineering quantum devices requires reliable characterization of the quantum system, including qubits, quantum operations (also known as instruments) and the quantum noise. Recently, quantum gate set tomography (GST) has emerged as a powerful technique for self-consistently describing quantum states, gates, and measurements. However, non-Markovian correlations between the quantum system and environment impact the reliability of GST. To address this, we propose a self-consistent operational framework called instrument set tomography (IST) for non-Markovian GST. Based on the stochastic quantum process, the instrument set describes instruments and system-environment (SE) correlations. We introduce a linear inversion IST (LIST) to describe instruments and SE correlations without physical constraints. Inconsistencies in the linear relationships between instruments are detected. Furthermore, we propose a physically constrained statistical method based on maximum likelihood estimation for IST (MLE-IST), with a number of parameters polynomial in the Markovian order. MLE-IST shows significant flexibility in adapting to different types of devices, such as noisy intermediate-scale quantum (NISQ) devices, by adjusting the model and constraints. Experimental results demonstrate the effectiveness and necessity of simultaneously describing instruments and SE correlations. Specifically, in simulations with imperfect implementations, LIST and MLE-IST significantly reduce the average squared error compared with their respective baseline methods, by orders of -23.77 and -6.21, respectively. Consequently, IST provides an essential and self-consistent framework for characterizing, benchmarking, and developing quantum devices in terms of the instrument set.

翻訳日:2023-11-28 03:38:19 公開日:2023-11-23

# 第1回社会ロボットパーソナライゼーションワークショップの開催にあたって

Proceeding of the 1st Workshop on Social Robots Personalisation At the crossroads between engineering and humanities (CONCATENATE) ( http://arxiv.org/abs/2307.12777v2 )

ライセンス: Link先を確認

Imene Tarakli, Georgios Angelopoulos, Mehdi Hellou, Camille Vindolet, Boris Abramovic, Rocco Limongelli, Dimitri Lacroix, Andrea Bertolini, Silvia Rossi, Alessandro Di Nuovo, Angelo Cangelosi, Gordon Cheng

(参考訳) 現在、ロボットはより物理的、認知的、社会的に人と対話することが期待されている。彼らは様々な行動を持つ個人と一緒に予測不能な状況に適応すべきである。そのため、個人化は、特定のユーザのニーズや好みに応じて行動し、人間にとって自然で透明なロボット行動を達成することができるため、社会ロボットにとって貴重な属性である。正しく実装されれば、パーソナライズがソーシャルロボティクスの大規模採用の鍵となるかもしれない。しかし、様々な分野の専門知識を活用してロボット工学の境界を広げる必要があるため、パーソナライゼーションの達成は困難である。実際、パーソナライズされたロボットは、適応プロセスへの関与を考慮してユーザーインタラクションを分析し、モデル化する必要がある。また、個人化されたHRIの倫理的・社会的側面に対処し、包括的かつ多様な相互作用を達成し、ユーザとの対話において詐欺や誤った信頼を避ける必要がある。同時に、政策立案者は短期的かつ長期的適応的HRIの観点から規制を確保する必要がある。本ワークショップは,ロボットのパーソナライゼーションに関する学際的な議論を提起することを目的とする。異なる分野の研究者をまとめてパーソナライズのためのガイドラインを提案し、どのように定義するか、どのように達成するか、法的および倫理的要件に合うようにガイドするかという問題に対処することを目的としている。

Nowadays, robots are expected to interact more physically, cognitively, and socially with people. They should adapt to unpredictable contexts alongside individuals with various behaviours. For this reason, personalisation is a valuable attribute for social robots as it allows them to act according to a specific user's needs and preferences and achieve natural and transparent robot behaviours for humans. If correctly implemented, personalisation could also be the key to the large-scale adoption of social robotics. However, achieving personalisation is arduous as it requires us to expand the boundaries of robotics by taking advantage of the expertise of various domains. Indeed, personalised robots need to analyse and model user interactions while considering their involvement in the adaptive process. It also requires us to address ethical and socio-cultural aspects of personalised HRI to achieve inclusive and diverse interaction and avoid deception and misplaced trust when interacting with the users. At the same time, policymakers need to ensure regulations in view of possible short-term and long-term adaptive HRI. This workshop aims to raise an interdisciplinary discussion on personalisation in robotics. It aims at bringing researchers from different fields together to propose guidelines for personalisation while addressing the following questions: how to define it, how to achieve it, and how it should be guided to fit legal and ethical requirements.

翻訳日:2023-11-28 03:37:40 公開日:2023-11-23

# より効果的な体系的レビューのための自然言語クエリの生成

Generating Natural Language Queries for More Effective Systematic Review Screening Prioritisation ( http://arxiv.org/abs/2309.05238v3 )

ライセンス: Link先を確認

Shuai Wang, Harrisen Scells, Martin Potthast, Bevan Koopman, Guido Zuccon

(参考訳) 医学系システマティックレビューにおけるスクリーニングの優先順位付けは、複雑なBooleanクエリによって検索された文書集合をランク付けすることを目的とする。最も重要な文書を優先することで、その後のレビュー手順をより効率的かつ効果的に行うことができる。現在の最先端手法は、レビューの最終タイトルをクエリとして用い、BERTベースのニューラルランカーで文書をランク付けする。しかし、最終タイトルはレビュープロセスの最後にしか作成されないため、このアプローチは事後情報に依存しており実用的ではない。スクリーニングの時点では大まかな仮タイトルしかなく、それを用いるとBERTベースのランカーの性能は最終タイトルの場合より大幅に悪くなる。本稿では、スクリーニング対象の文書を検索するために使われたBooleanクエリや、ChatGPTやAlpacaのような指示ベースの生成型大規模言語モデルによって生成されたクエリなど、スクリーニングの優先順位付けのためのクエリの代替ソースを検討する。我々の最良のアプローチは、スクリーニング時に利用可能な情報に基づいて実現可能であるだけでなく、最終タイトルと同等の有効性を持つ。

Screening prioritisation in medical systematic reviews aims to rank the set of documents retrieved by complex Boolean queries. Prioritising the most important documents ensures that subsequent review steps can be carried out more efficiently and effectively. The current state of the art uses the final title of the review as a query to rank the documents using BERT-based neural rankers. However, the final title is only formulated at the end of the review process, which makes this approach impractical as it relies on ex post facto information. At the time of screening, only a rough working title is available, with which the BERT-based ranker performs significantly worse than with the final title. In this paper, we explore alternative sources of queries for prioritising screening, such as the Boolean query used to retrieve the documents to be screened and queries generated by instruction-based generative large-scale language models such as ChatGPT and Alpaca. Our best approach is not only viable based on the information available at the time of screening, but also has similar effectiveness to the final title.
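
One of the query sources discussed above is the Boolean query itself. The sketch below shows one way such a query could be turned into a plain keyword query and used to order candidate documents. The keyword-overlap scorer is a deliberately crude stand-in for the BERT-based neural rankers used in the paper. The conversion rules (dropping field tags and operators, treating a trailing `*` as a prefix) and the function names are assumptions.

```python
import re
from typing import List, Sequence, Set

_BOOLEAN_OPS = {"and", "or", "not"}


def boolean_to_keywords(query: str) -> List[str]:
    """Turn a Boolean search string into a de-duplicated keyword list.

    Hypothetical simplification: field tags like [tiab] or [Mesh] are dropped, operators
    and punctuation are removed, and a trailing '*' is kept to mark a prefix (truncation) term.
    """
    without_fields = re.sub(r"\[[^\]]*\]", " ", query)
    keywords: List[str] = []
    for tok in re.findall(r"[a-z0-9]+\*?", without_fields.lower()):
        if tok not in _BOOLEAN_OPS and tok not in keywords:
            keywords.append(tok)
    return keywords


def _matches(term: str, words: Set[str]) -> bool:
    if term.endswith("*"):
        return any(w.startswith(term[:-1]) for w in words)
    return term in words


def rank_documents(keywords: Sequence[str], docs: Sequence[str]) -> List[int]:
    """Order document indices by how many distinct keywords each contains (stable for ties)."""
    scores = []
    for doc in docs:
        words = set(re.findall(r"[a-z0-9]+", doc.lower()))
        scores.append(sum(_matches(k, words) for k in keywords))
    return sorted(range(len(docs)), key=lambda i: -scores[i])


query = '(diabet* OR "insulin resistance"[tiab]) AND exercise[Mesh]'
docs = ["Exercise therapy for diabetes",
        "Insulin resistance and exercise in diabetic adults",
        "Cardiac imaging review"]
terms = boolean_to_keywords(query)
print(terms)                        # ['diabet*', 'insulin', 'resistance', 'exercise']
print(rank_documents(terms, docs))  # [1, 0, 2]
```

In the paper's setting, the keyword list (or an LLM-generated query) would instead be fed to a neural ranker. The point here is only that the Boolean query is available at screening time, unlike the final title.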

翻訳日:2023-11-28 03:28:50 公開日:2023-11-23

# SAM3D: ボリューム医療画像におけるセグメンテーションモデル

SAM3D: Segment Anything Model in Volumetric Medical Images ( http://arxiv.org/abs/2309.03493v3 )

ライセンス: Link先を確認

Nhat-Tan Bui and Dinh-Hieu Hoang and Minh-Triet Tran and Gianfranco Doretto and Donald Adjeroh and Brijesh Patel and Arabinda Choudhary and Ngan Le

(参考訳) 画像セグメンテーションは医用画像解析において依然として重要な要素であり、正確な診断のための重要な情報の抽出を支援する。深層学習の登場により、画像の自動セグメンテーション手法が台頭し、医用画像の処理において卓越した能力を示している。2次元自然画像のセグメンテーションにおける顕著な精度と堅牢な汎化能力で知られる基盤モデル、Segment Anything Model (SAM) に動機づけられ、3次元ボリューム医用画像解析向けに調整した革新的な適応であるSAM3Dを提案する。ボリュームを個別の2次元スライスに変換して個別に解析する現在のSAMベース手法とは異なり、SAM3Dモデルは3次元ボリューム画像全体を統一的なアプローチで処理する。複数の医用画像データセットで大規模な実験を行い、パラメータ数の面で大幅に効率的でありながら、3次元医用セグメンテーションタスクにおいて他の最先端手法と競合する結果が得られることを示す。コードとチェックポイントはhttps://github.com/UARK-AICV/SAM3Dで入手できる。

Image segmentation remains a pivotal component in medical image analysis, aiding in the extraction of critical information for precise diagnostic practices. With the advent of deep learning, automated image segmentation methods have risen to prominence, showcasing exceptional proficiency in processing medical imagery. Motivated by the Segment Anything Model (SAM), a foundational model renowned for its remarkable precision and robust generalization capabilities in segmenting 2D natural images, we introduce SAM3D, an innovative adaptation tailored for 3D volumetric medical image analysis. Unlike current SAM-based methods that segment volumetric data by converting the volume into separate 2D slices for individual analysis, our SAM3D model processes the entire 3D volume image in a unified approach. Extensive experiments are conducted on multiple medical image datasets to demonstrate that our network attains competitive results compared with other state-of-the-art methods in 3D medical segmentation tasks while being significantly more efficient in terms of parameters. Code and checkpoints are available at https://github.com/UARK-AICV/SAM3D.

翻訳日:2023-11-28 03:28:10 公開日:2023-11-23

# 事前学習モデルがジャストインタイム欠陥予測に及ぼす影響に関する研究

A study on the impact of pre-trained model on Just-In-Time defect prediction ( http://arxiv.org/abs/2309.02317v2 )

ライセンス: Link先を確認

Yuxiang Guo, Xiaopeng Gao, Zhenyu Zhang, W.K.Chan and Bo Jiang

(参考訳) JIT(Just-In-Time)欠陥予測タスクに取り組んだ従来の研究者は、主に個々の事前学習済みモデルの性能に注目しており、バックボーンとしての異なる事前学習済みモデル間の関係は検討してこなかった。本研究では、それぞれ異なる事前学習済みモデルをバックボーンとする6つのモデル、RoBERTaJIT、CodeBERTJIT、BARTJIT、PLBARTJIT、GPT2JIT、CodeGPTJITを構築する。これらのモデルの違いと関係を体系的に調べる。具体的には、コミットコードとコミットメッセージを入力として用いた場合のモデルの性能と、6つのモデル間の訓練効率とモデル分布の関係を調べる。さらに、入力に対する各モデルの感度を調べるためにアブレーション実験を行う。また、ゼロショットおよび少数ショットのシナリオでモデルがどのように振る舞うかを調べる。結果は、異なるバックボーンに基づく各モデルがいずれも改善を示すこと、そしてバックボーンの事前学習モデルが類似している場合には、必要な訓練リソースもかなり近くなることを示している。また、コミットコードが欠陥検出において重要な役割を果たすこと、そして少数ショットのシナリオでは、様々な事前学習済みモデルがバランスの取れたデータセットでより良い欠陥検出能力を示すことも観察した。これらの結果は、事前学習済みモデルを用いたJIT欠陥予測タスクの最適化に新たな洞察を与え、こうしたモデルを構築する際により注意を払うべき要因を明らかにする。さらに、訓練サンプル2000件の条件で、CodeGPTJITとGPT2JITは2つのデータセットにおいてそれぞれDeepJITとCC2Vecより優れた性能を達成した。これらの結果は、JIT欠陥予測タスク、特に訓練データが限られたシナリオにおいて、Transformerベースの事前学習済みモデルが有効であることを強調している。

Previous researchers conducting Just-In-Time (JIT) defect prediction tasks have primarily focused on the performance of individual pre-trained models, without exploring the relationship between different pre-trained models as backbones. In this study, we build six models: RoBERTaJIT, CodeBERTJIT, BARTJIT, PLBARTJIT, GPT2JIT, and CodeGPTJIT, each with a distinct pre-trained model as its backbone. We systematically explore the differences and connections between these models. Specifically, we investigate the performance of the models when using Commit code and Commit message as inputs, as well as the relationship between training efficiency and model distribution among these six models. Additionally, we conduct an ablation experiment to explore the sensitivity of each model to inputs. Furthermore, we investigate how the models perform in zero-shot and few-shot scenarios. Our findings indicate that each model based on a different backbone shows improvements, and that when the backbones' pre-trained models are similar, the training resources they consume are much closer. We also observe that Commit code plays a significant role in defect detection, and that different pre-trained models demonstrate better defect detection ability with a balanced dataset under few-shot scenarios. These results provide new insights for optimizing JIT defect prediction tasks using pre-trained models and highlight the factors that require more attention when constructing such models. Additionally, with 2000 training samples, CodeGPTJIT and GPT2JIT achieved better performance than DeepJIT and CC2Vec on the two datasets, respectively. These findings emphasize the effectiveness of transformer-based pre-trained models in JIT defect prediction tasks, especially in scenarios with limited training data.
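
The few-shot finding above relies on a class-balanced training subset. The sketch below shows a minimal way to build one, assuming binary labels (1 = defect-inducing commit, 0 = clean) and a fixed random seed. The abstract does not specify the paper's sampling procedure, so this is illustrative only, and `balanced_few_shot` is a hypothetical name.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def balanced_few_shot(samples: Sequence[str], labels: Sequence[int], k: int,
                      seed: int = 0) -> List[Tuple[str, int]]:
    """Draw exactly k examples per label (e.g. k defect-inducing and k clean commits)."""
    rng = random.Random(seed)  # fixed seed -> reproducible few-shot splits
    by_label: Dict[int, List[str]] = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    subset: List[Tuple[str, int]] = []
    for label in sorted(by_label):
        pool = by_label[label]
        if len(pool) < k:
            raise ValueError(f"label {label!r} has only {len(pool)} examples, need {k}")
        subset.extend((s, label) for s in rng.sample(pool, k))
    rng.shuffle(subset)  # avoid presenting all examples of one class first
    return subset


commits = [f"commit {i}" for i in range(100)]
labels = [1 if i % 10 == 0 else 0 for i in range(100)]  # 10% defect-inducing, as in imbalanced JIT data
subset = balanced_few_shot(commits, labels, k=5)
print(len(subset), sum(y for _, y in subset))  # 10 5
```

Sampling per label, rather than uniformly from the pool, prevents the majority "clean" class from crowding out the few defect-inducing commits in a tiny training set.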

Translated: 2023-11-28 03:27:51 Published: 2023-11-23

# From Kubernetes to Knactor: A Data-Centric Rethink of Service Composition

From Kubernetes to Knactor: A Data-Centric Rethink of Service Composition ( http://arxiv.org/abs/2309.01805v2 )

License: see link

Silvery Fu, Hong Zhang, Ryan Teoh, Taras Priadka, Sylvia Ratnasamy

Microservices are increasingly used in modern applications, leading to a growing need for effective service composition solutions. However, we argue that traditional API-centric composition mechanisms (e.g., RPC, REST, and Pub/Sub) hamper the modularity of microservices. These mechanisms introduce rigid code-level coupling, scatter composition logic, and hinder visibility into cross-service data exchanges. Ultimately, these limitations complicate the maintenance and evolution of microservice-based applications. In response, we propose a rethinking of service composition and present Knactor, a new data-centric composition framework to restore the modularity that microservices were intended to offer. Knactor decouples service composition from service development, allowing composition to be implemented as explicit data exchanges among multiple services. Our initial case study suggests that Knactor simplifies service composition and creates new opportunities for optimizations.

Translated: 2023-11-28 03:26:57 Published: 2023-11-23

# Matter relative to quantum hypersurfaces

Matter relative to quantum hypersurfaces ( http://arxiv.org/abs/2308.12912v2 )

License: see link

Philipp A. Hoehn, Andrea Russo, and Alexander R. H. Smith

We explore the canonical description of a scalar field as a parameterized field theory on an extended phase space that includes additional embedding fields that characterize spacetime hypersurfaces $\mathsf{X}$ relative to which the scalar field is described. This theory is quantized via the Dirac prescription and physical states of the theory are used to define conditional wave functionals $|\psi_\phi[\mathsf{X}]\rangle$ interpreted as the state of the field relative to the hypersurface $\mathsf{X}$, thereby extending the Page-Wootters formalism to quantum field theory. It is shown that this conditional wave functional satisfies the Tomonaga-Schwinger equation, thus demonstrating the formal equivalence between this extended Page-Wootters formalism and standard quantum field theory. We also construct relational Dirac observables and define a quantum deparameterization of the physical Hilbert space leading to a relational Heisenberg picture, which are both shown to be unitarily equivalent to the Page-Wootters formalism. Moreover, by treating hypersurfaces as quantum reference frames, we extend recently developed quantum frame transformations to changes between classical and nonclassical hypersurfaces. This allows us to exhibit the transformation properties of a quantum field under a larger class of transformations, which leads to a frame-dependent particle creation effect.
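
For orientation, the Tomonaga-Schwinger equation referred to above generalizes the Schrödinger equation by labelling the state with a spacelike hypersurface $\sigma$ instead of a time $t$. Its standard interaction-picture form is shown below (units with $\hbar = 1$). The notation is the textbook one, and the form derived in the paper for $|\psi_\phi[\mathsf{X}]\rangle$ may differ in detail:

$$
i\,\frac{\delta\,|\Psi[\sigma]\rangle}{\delta\sigma(x)} = \mathcal{H}_{\mathrm{int}}(x)\,|\Psi[\sigma]\rangle,
\qquad
\big[\mathcal{H}_{\mathrm{int}}(x),\,\mathcal{H}_{\mathrm{int}}(y)\big] = 0 \ \text{for spacelike-separated } x, y \in \sigma .
$$

The commutation condition is the integrability requirement: it makes the evolved state independent of the path of intermediate hypersurfaces used to deform one surface into another.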

Translated: 2023-11-28 03:26:07 Published: 2023-11-23

# Fewer is More: Trojan Attacks on Parameter-Efficient Fine-Tuning

Fewer is More: Trojan Attacks on Parameter-Efficient Fine-Tuning ( http://arxiv.org/abs/2310.00648v3 )

License: see link

Lauren Hong, Ting Wang

Parameter-efficient fine-tuning (PEFT) enables efficient adaptation of pre-trained language models (PLMs) to specific tasks. By tuning only a minimal set of (extra) parameters, PEFT achieves performance comparable to full fine-tuning. However, despite its prevalent use, the security implications of PEFT remain largely unexplored. In this paper, we conduct a pilot study revealing that PEFT exhibits unique vulnerability to trojan attacks. Specifically, we present PETA, a novel attack that accounts for downstream adaptation through bilevel optimization: the upper-level objective embeds the backdoor into a PLM while the lower-level objective simulates PEFT to retain the PLM's task-specific performance. With extensive evaluation across a variety of downstream tasks and trigger designs, we demonstrate PETA's effectiveness in terms of both attack success rate and unaffected clean accuracy, even after the victim user performs PEFT over the backdoored PLM using untainted data. Moreover, we empirically provide possible explanations for PETA's efficacy: the bilevel optimization inherently 'orthogonalizes' the backdoor and PEFT modules, thereby retaining the backdoor throughout PEFT. Based on this insight, we explore a simple defense that omits PEFT in selected layers of the backdoored PLM and unfreezes a subset of these layers' parameters, which is shown to effectively neutralize PETA.

Translated: 2023-11-28 03:17:28 Published: 2023-11-23

# City Foundation Models for Learning General Purpose Representations from OpenStreetMap

City Foundation Models for Learning General Purpose Representations from OpenStreetMap ( http://arxiv.org/abs/2310.00583v2 )

License: see link

Pasquale Balsebre, Weiming Huang, Gao Cong, Yi Li

Pre-trained Foundation Models (PFMs) have ushered in a paradigm shift in Artificial Intelligence, due to their ability to learn general-purpose representations that can be readily employed in a wide range of downstream tasks. While PFMs have been successfully adopted in various fields such as Natural Language Processing and Computer Vision, their capacity in handling geospatial data and answering urban questions remains limited. This can be attributed to the intrinsic heterogeneity of geospatial data, which encompasses different data types, including points, segments and regions, as well as multiple information modalities, such as spatial position, visual characteristics and textual annotations. The proliferation of Volunteered Geographic Information initiatives and the ever-increasing availability of open geospatial data sources, like OpenStreetMap, which is freely accessible globally, unveil a promising opportunity to bridge this gap. In this paper, we present CityFM, a self-supervised framework to train a foundation model within a selected geographical area of interest, such as a city. CityFM relies solely on open data from OSM, and produces multimodal representations of entities of different types, incorporating spatial, visual, and textual information. We analyse the entity representations generated using our foundation models from a qualitative perspective, and conduct quantitative experiments on road, building, and region-level downstream tasks. We compare its results to algorithms tailored specifically for the respective applications. In all the experiments, CityFM achieves performance superior to, or on par with, the baselines.

Translated: 2023-11-28 03:17:05 Published: 2023-11-23

# YFlows: Systematic Dataflow Exploration and Code Generation for Efficient Neural Network Inference using SIMD Architectures on CPUs

YFlows: Systematic Dataflow Exploration and Code Generation for Efficient Neural Network Inference using SIMD Architectures on CPUs ( http://arxiv.org/abs/2310.00574v3 )

License: see link

Cyrus Zhou, Zack Hassman, Ruize Xu, Dhirpal Shah, Vaugnn Richard, Yanjing Li

We address the challenges associated with deploying neural networks on CPUs, with a particular focus on minimizing inference time while maintaining accuracy. Our novel approach is to use the dataflow (i.e., computation order) of a neural network to explore data reuse opportunities using heuristic-guided analysis and a code generation framework, which enables exploration of various Single Instruction, Multiple Data (SIMD) implementations to achieve optimized neural network execution. Our results demonstrate that the dataflow that keeps outputs in SIMD registers while also maximizing both input and weight reuse consistently yields the best performance for a wide variety of inference workloads, achieving up to 3x speedup for 8-bit neural networks, and up to 4.8x speedup for binary neural networks, respectively, over the optimized implementations of neural networks today.
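
For the binary neural networks mentioned above, a common way to exploit SIMD hardware is to pack ±1 activations and weights into bit vectors, so that a dot product reduces to an XOR followed by a population count. The sketch below assumes that standard encoding (a set bit means +1) and uses a Python integer as a stand-in for a SIMD register; it illustrates the arithmetic only and is not the code YFlows generates. `pack_signs` and `binary_dot` are hypothetical helper names.

```python
from typing import Sequence


def pack_signs(values: Sequence[int]) -> int:
    """Pack +1/-1 values into an integer bit mask: bit i is set iff values[i] == +1."""
    mask = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("binary networks use +1/-1 values")
        if v == 1:
            mask |= 1 << i
    return mask


def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two +1/-1 vectors of length n, computed from their bit masks.

    Matching positions contribute +1 and differing positions -1, so with
    d = popcount(a XOR b) the dot product is (n - d) - d = n - 2 * d.
    """
    differing = bin(a_bits ^ b_bits).count("1")  # popcount of the XOR
    return n - 2 * differing
```

For example, `binary_dot(pack_signs([1, -1, 1, 1]), pack_signs([1, 1, -1, 1]), 4)` gives 0, the same as the elementwise sum 1 - 1 - 1 + 1. On real hardware, one XOR plus one popcount instruction would replace many multiply-accumulates.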

Translated: 2023-11-28 03:16:40 Published: 2023-11-23

# Prompt-based test-time real image dehazing: a novel pipeline

Prompt-based test-time real image dehazing: a novel pipeline ( http://arxiv.org/abs/2309.17389v3 )

License: see link

Zixuan Chen, Zewei He, Ziqian Lu, Xuecheng Sun, Zhe-Ming Lu

Existing methods attempt to improve models' generalization ability on real-world hazy images by exploring well-designed training schemes (e.g., CycleGAN, prior loss). However, most of them need very complicated training procedures to achieve satisfactory results. In this work, we present a totally novel testing pipeline called Prompt-based Test-Time Dehazing (PTTD) to help generate visually pleasing results of real-captured hazy images during the inference phase. We experimentally find that, given a dehazing model trained on synthetic data, fine-tuning the statistics (i.e., mean and standard deviation) of its encoding features allows PTTD to narrow the domain gap and boost the performance of real image dehazing. Accordingly, we first apply a prompt generation module (PGM) to generate a visual prompt, which is the source of appropriate statistical perturbations for the mean and standard deviation. Then, we incorporate the feature adaptation module (FAM) into existing dehazing models to adjust the original statistics under the guidance of the generated prompt. Note that PTTD is model-agnostic and can be equipped with various state-of-the-art dehazing models trained on synthetic hazy-clean pairs. Extensive experimental results demonstrate that PTTD is flexible and achieves superior performance against state-of-the-art dehazing methods in real-world scenarios. The source code of our PTTD will be made available at https://github.com/cecret3350/PTTD-Dehazing.
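
The core operation described above, changing the mean and standard deviation of encoder features, can be illustrated with an AdaIN-style re-normalization of a single feature channel. In the sketch below, `target_mean` and `target_std` are placeholders for statistics that PTTD would derive from the visual prompt; the PGM and FAM modules themselves are not reproduced, and all function names are hypothetical.

```python
import math
from typing import List, Tuple


def channel_stats(values: List[float]) -> Tuple[float, float]:
    """Return the mean and population standard deviation of one feature channel."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


def adjust_statistics(values: List[float], target_mean: float, target_std: float,
                      eps: float = 1e-5) -> List[float]:
    """Normalize a channel to zero mean and unit std, then re-scale it to the target statistics.

    eps guards against division by zero for constant channels.
    """
    mean, std = channel_stats(values)
    return [(v - mean) / (std + eps) * target_std + target_mean for v in values]
```

Because the transform is affine and order-preserving, the spatial pattern of the channel is kept and only its first two moments change. A downstream decoder therefore sees the same structure with shifted statistics.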

Translated: 2023-11-28 03:15:50 Published: 2023-11-23

# An evaluation of GPT models for phenotype concept recognition

An evaluation of GPT models for phenotype concept recognition ( http://arxiv.org/abs/2309.17169v2 )

License: see link

Tudor Groza, Harry Caufield, Dylan Gration, Gareth Baynam, Melissa A Haendel, Peter N Robinson, Christopher J Mungall and Justin T Reese

Objective: Clinical deep phenotyping and phenotype annotation play a critical role both in the diagnosis of patients with rare disorders and in building computationally tractable knowledge in the rare disorders field. These processes rely on using ontology concepts, often from the Human Phenotype Ontology, in conjunction with a phenotype concept recognition task (usually supported by machine learning methods) to curate patient profiles or existing scientific literature. With the significant shift toward using large language models (LLMs) for most NLP tasks, we examine the performance of the latest Generative Pre-trained Transformer (GPT) models underpinning ChatGPT as a foundation for the tasks of clinical phenotyping and phenotype annotation. Materials and Methods: The experimental setup of the study included seven prompts of various levels of specificity, two GPT models (gpt-3.5-turbo and gpt-4.0) and two established gold standard corpora for phenotype recognition, one consisting of publication abstracts and the other of clinical observations. Results: Our results show that, with an appropriate setup, these models can achieve state-of-the-art performance. The best run, using few-shot learning, achieved a 0.58 macro F1 score on publication abstracts and a 0.75 macro F1 score on clinical observations; the former is comparable with the state of the art, while the latter surpasses the current best-in-class tool. Conclusion: While the results are promising, the non-deterministic nature of the outcomes, the high cost and the lack of concordance between different runs using the same prompt and input make the use of these LLMs challenging for this particular task.
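
Macro F1, as reported above, averages an F1 score computed per unit rather than pooling all predictions. The sketch below assumes one common convention for concept recognition: precision and recall are computed per document over the sets of predicted and gold-standard ontology term IDs, and the per-document F1 scores are averaged. The paper's exact averaging may differ, and the term IDs in the example are illustrative only.

```python
from typing import Iterable, List, Sequence


def document_f1(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """F1 between predicted and gold concept IDs for one document, using set semantics."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0  # convention chosen here: nothing to find and nothing predicted
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def macro_f1(predictions: Sequence[Iterable[str]], golds: Sequence[Iterable[str]]) -> float:
    """Unweighted mean of per-document F1 scores."""
    if len(predictions) != len(golds) or not golds:
        raise ValueError("need the same, non-zero number of predicted and gold documents")
    scores: List[float] = [document_f1(p, g) for p, g in zip(predictions, golds)]
    return sum(scores) / len(scores)
```

For two documents, one with predictions {HP:0001250, HP:0001263} against gold {HP:0001250, HP:0000252} (F1 = 0.5) and one matched exactly (F1 = 1.0), the macro F1 is 0.75. Every document counts equally, regardless of how many concepts it contains.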

Translated: 2023-11-28 03:14:54 Published: 2023-11-23

# Designing and evaluating an online reinforcement learning agent for physical exercise recommendations in N-of-1 trials

Designing and evaluating an online reinforcement learning agent for physical exercise recommendations in N-of-1 trials ( http://arxiv.org/abs/2309.14156v2 )

License: see link

Dominik Meier, Ipek Ensari, Stefan Konigorski

Personalized adaptive interventions offer the opportunity to increase patient benefits; however, there are challenges in their planning and implementation. Once such interventions are implemented, an important question is whether they are indeed clinically more effective than a fixed gold-standard intervention. In this paper, we present an innovative N-of-1 trial study design testing whether implementing a personalized intervention by an online reinforcement learning agent is feasible and effective. Throughout, we use a new study on physical exercise recommendations to reduce pain in endometriosis for illustration. We describe the design of a contextual bandit recommendation agent and evaluate the agent in simulation studies. The results show that, first, implementing a personalized intervention by an online reinforcement learning agent is feasible. Second, such adaptive interventions have the potential to improve patients' benefits even if only a few observations are available. As one challenge, they add complexity to the design and implementation process. In order to quantify the expected benefit, data from previous interventional studies are required. We expect our approach to be transferable to other interventions and clinical interventions.
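
A contextual bandit agent of the kind described above chooses an action (here, an exercise recommendation) given the current context and then updates its beliefs from the observed outcome. The sketch below uses Beta-Bernoulli Thompson sampling with one posterior per discrete context. This is one common choice and an assumption on our part: the paper's actual agent, reward definition, and context features are not reproduced, and the action names, contexts, and success probabilities are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Tuple


class BetaThompsonAgent:
    """Per-context Beta-Bernoulli Thompson sampling over a fixed set of actions."""

    def __init__(self, actions: List[str], seed: int = 0) -> None:
        self.actions = list(actions)
        self.rng = random.Random(seed)
        # [alpha, beta] per (context, action), starting from a uniform Beta(1, 1) prior.
        self.params: Dict[Tuple[Hashable, str], List[float]] = {}

    def _posterior(self, context: Hashable, action: str) -> List[float]:
        return self.params.setdefault((context, action), [1.0, 1.0])

    def recommend(self, context: Hashable) -> str:
        """Sample a plausible success rate for each action and pick the largest."""
        draws = {a: self.rng.betavariate(*self._posterior(context, a)) for a in self.actions}
        return max(draws, key=draws.get)

    def update(self, context: Hashable, action: str, reward: int) -> None:
        """reward is 1 for a good outcome (e.g. reduced pain) and 0 otherwise."""
        posterior = self._posterior(context, action)
        posterior[0 if reward else 1] += 1


def simulate(success_prob: Dict[Tuple[str, str], float], contexts: List[str],
             days: int = 1000, seed: int = 0) -> Dict[Tuple[str, str], int]:
    """Run the agent against a simulated patient and count how often each action is chosen."""
    env = random.Random(seed + 1)
    agent = BetaThompsonAgent(sorted({a for _, a in success_prob}), seed=seed)
    counts: Dict[Tuple[str, str], int] = {}
    for day in range(days):
        context = contexts[day % len(contexts)]
        action = agent.recommend(context)
        reward = 1 if env.random() < success_prob[(context, action)] else 0
        agent.update(context, action, reward)
        counts[(context, action)] = counts.get((context, action), 0) + 1
    return counts
```

While observations are scarce, the Beta posteriors stay wide and the agent keeps exploring; as evidence accumulates, its recommendations concentrate on the action that works best in each context.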

Translated: 2023-11-28 03:13:27 Published: 2023-11-23

# Associative Transformer Is A Sparse Representation Learner

Associative Transformer Is A Sparse Representation Learner ( http://arxiv.org/abs/2309.12862v2 )

License: see link

Yuwei Sun, Hideya Ochiai, Zhirong Wu, Stephen Lin, Ryota Kanai

Emerging from the monolithic pairwise attention mechanism in conventional Transformer models, there is a growing interest in leveraging sparse interactions that align more closely with biological principles. Approaches including the Set Transformer and the Perceiver employ cross-attention consolidated with a latent space that forms an attention bottleneck with limited capacity. Building upon recent neuroscience studies of Global Workspace Theory and associative memory, we propose the Associative Transformer (AiT). AiT induces low-rank explicit memory that serves as both priors to guide bottleneck attention in the shared workspace and attractors within associative memory of a Hopfield network. Through joint end-to-end training, these priors naturally develop module specialization, each contributing a distinct inductive bias to form attention bottlenecks. A bottleneck can foster competition among inputs for writing information into the memory. We show that AiT is a sparse representation learner, learning distinct priors through the bottlenecks that are complexity-invariant to input quantities and dimensions. AiT demonstrates its superiority over methods such as the Set Transformer, Vision Transformer, and Coordination in various vision tasks.
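
The associative memory mentioned above is a Hopfield network. To illustrate how stored patterns act as attractors, the sketch below implements the continuous ("modern") Hopfield update, in which a query is replaced by a softmax-weighted average of the stored patterns. The choice of variant is an assumption: AiT's actual memory, priors, and bottleneck attention are not reproduced, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_retrieve(patterns: Sequence[Sequence[float]], query: Sequence[float],
                      beta: float = 4.0, steps: int = 1) -> List[float]:
    """Modern Hopfield update: xi <- sum_i softmax(beta * <x_i, xi>)_i * x_i."""
    xi = list(query)
    for _ in range(steps):
        weights = softmax([beta * sum(p_j * q_j for p_j, q_j in zip(p, xi)) for p in patterns])
        xi = [sum(w * p[j] for w, p in zip(weights, patterns)) for j in range(len(xi))]
    return xi


def sylvester_hadamard(n: int) -> List[List[int]]:
    """Rows of an n x n Hadamard matrix (n a power of two): mutually orthogonal +1/-1 patterns."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h
```

With eight orthogonal ±1 patterns and a query that has one of its eight signs flipped, a single update with `beta=4.0` essentially recovers the stored pattern. Larger `beta` makes retrieval sharper, and smaller `beta` blends nearby patterns.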

Translated: 2023-11-28 03:12:48 Published: 2023-11-23

# A Critical Survey on Fairness Benefits of XAI

A Critical Survey on Fairness Benefits of XAI ( http://arxiv.org/abs/2310.13007v4 )

License: see link

Luca Deck, Jakob Schoeffer, Maria De-Arteaga, Niklas Kühl

In this critical survey, we analyze typical claims on the relationship between explainable AI (XAI) and fairness to disentangle the multidimensional relationship between these two concepts. Based on a systematic literature review and a subsequent qualitative content analysis, we identify seven archetypal claims from 175 papers on the alleged fairness benefits of XAI. We present crucial caveats with respect to these claims and provide an entry point for future discussions around the potentials and limitations of XAI for specific fairness desiderata. While the literature often suggests XAI to be an enabler for several fairness desiderata, we notice a divide between these desiderata and the capabilities of XAI. We encourage researchers to conceive of XAI as one of many tools for approaching the multidimensional, sociotechnical challenge of algorithmic fairness, and to be more specific about exactly what kind of XAI method enables whom to address which fairness desideratum.

Translated: 2023-11-28 03:06:13 Published: 2023-11-23

# Sensitivity-Aware Amortized Bayesian Inference

Sensitivity-Aware Amortized Bayesian Inference ( http://arxiv.org/abs/2310.11122v3 )

License: see link

Lasse Elsemüller, Hans Olischläger, Marvin Schmitt, Paul-Christian Bürkner, Ullrich Köthe, Stefan T. Radev

Bayesian inference is a powerful framework for making probabilistic inferences and decisions under uncertainty. Fundamental choices in modern Bayesian workflows concern the specification of the likelihood function and prior distributions, the posterior approximator, and the data. Each choice can significantly influence model-based inference and subsequent decisions, thereby necessitating sensitivity analysis. In this work, we propose a multifaceted approach to integrate sensitivity analyses into amortized Bayesian inference (ABI, i.e., simulation-based inference with neural networks). First, we utilize weight sharing to encode the structural similarities between alternative likelihood and prior specifications in the training process with minimal computational overhead. Second, we leverage the rapid inference of neural networks to assess sensitivity to various data perturbations or pre-processing procedures. In contrast to most other Bayesian approaches, both steps circumvent the costly bottleneck of refitting the model(s) for each choice of likelihood, prior, or dataset. Finally, we propose to use neural network ensembles to evaluate variation in results induced by unreliable approximation on unseen data. We demonstrate the effectiveness of our method in applied modeling problems, ranging from the estimation of disease outbreak dynamics and global warming thresholds to the comparison of human decision-making models. Our experiments showcase how our approach enables practitioners to effectively unveil hidden relationships between modeling choices and inferential conclusions.

Translated: 2023-11-28 03:05:57 Published: 2023-11-23

# Deep Learning based Spatially Dependent Acoustical Properties Recovery

Deep Learning based Spatially Dependent Acoustical Properties Recovery ( http://arxiv.org/abs/2310.10970v2 )

License: see link

Ruixian Liu, Peter Gerstoft

The physics-informed neural network (PINN) is capable of recovering partial differential equation (PDE) coefficients that remain constant throughout the spatial domain directly from physical measurements. In this work, we propose a spatially dependent physics-informed neural network (SD-PINN), which enables the recovery of coefficients in spatially dependent PDEs using a single neural network, eliminating the requirement for domain-specific physical expertise. We apply the SD-PINN to the recovery of spatially dependent wave-equation coefficients to reveal the spatial distribution of acoustical properties in an inhomogeneous medium. The proposed method is robust to noise owing to the incorporation of a loss function for the physical constraint that the assumed PDE must be satisfied. For coefficient recovery in spatially two-dimensional PDEs, we store the PDE coefficients at all locations in the 2D region of interest in a matrix and incorporate a low-rank assumption on this matrix to recover the coefficients at locations without available measurements.
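
The physics-constraint loss described above penalizes how far a candidate field violates the assumed PDE. The sketch below evaluates that residual for a 1D wave equation $u_{tt} = c(x)^2 u_{xx}$ with a spatially varying speed $c(x)$; the exact PDE form is our assumption. It uses central finite differences on a known analytic field for illustration. In SD-PINN, the field and coefficients would be network outputs and the derivatives would come from automatic differentiation, and the low-rank coefficient matrix is not shown. Function names are hypothetical.

```python
import math
from typing import Callable, Iterable, Tuple

Field = Callable[[float, float], float]
Speed = Callable[[float], float]


def wave_residual(u: Field, c: Speed, x: float, t: float, h: float = 1e-3) -> float:
    """Residual u_tt - c(x)^2 * u_xx at (x, t), using central second differences."""
    u_tt = (u(x, t + h) - 2.0 * u(x, t) + u(x, t - h)) / h ** 2
    u_xx = (u(x + h, t) - 2.0 * u(x, t) + u(x - h, t)) / h ** 2
    return u_tt - c(x) ** 2 * u_xx


def physics_loss(u: Field, c: Speed, points: Iterable[Tuple[float, float]], h: float = 1e-3) -> float:
    """Mean squared PDE residual over collocation points (the physics term added to a data loss)."""
    residuals = [wave_residual(u, c, x, t, h) for x, t in points]
    return sum(r * r for r in residuals) / len(residuals)


def travelling_wave(x: float, t: float) -> float:
    """Example field: a wave moving with speed 1.5, which satisfies the PDE only for c(x) = 1.5."""
    return math.sin(x - 1.5 * t)


# Collocation grid over x in [0, 1.9] and t in [0, 0.45].
GRID = [(0.1 * i, 0.05 * j) for i in range(20) for j in range(10)]
```

Evaluating `physics_loss(travelling_wave, lambda x: 1.5, GRID)` gives a value near zero, while a wrong speed such as `lambda x: 1.0` gives a clearly positive loss. During training, this gap is the signal that pushes the recovered coefficient toward the true one.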

Translated: 2023-11-28 03:05:35 Published: 2023-11-23

# Exploration with Principles for Diverse AI Supervision

Exploration with Principles for Diverse AI Supervision ( http://arxiv.org/abs/2310.08899v2 )

License: see link

Hao Liu, Matei Zaharia, Pieter Abbeel

Training large transformers using next-token prediction has given rise to groundbreaking advancements in AI. While this generative AI approach has produced impressive results, it heavily leans on human supervision. Even state-of-the-art AI models like ChatGPT depend on fine-tuning through human demonstrations, demanding extensive human input and domain expertise. This strong reliance on human oversight poses a significant hurdle to the advancement of AI innovation. To address this limitation, we propose a novel paradigm termed Exploratory AI (EAI) aimed at autonomously generating high-quality training data. Drawing inspiration from unsupervised reinforcement learning (RL) pretraining, EAI achieves exploration within the natural language space. We accomplish this by harnessing large language models to assess the novelty of generated content. Our approach employs two key components: an actor that generates novel content following exploration principles and a critic that evaluates the generated content, offering critiques to guide the actor. Empirical evaluations demonstrate that EAI significantly boosts model performance on complex reasoning tasks, addressing the limitations of human-intensive supervision.

Translated: 2023-11-28 03:04:55 Published: 2023-11-23

# Unraveling the role of disorderness in superconducting materials on qubit coherence

Unraveling the role of disorderness in superconducting materials on qubit coherence ( http://arxiv.org/abs/2310.06621v2 )

License: see link

Ran Gao, Feng Wu, Hantao Sun, Jianjun Chen, Hao Deng, Xizheng Ma, Xiaohe Miao, Zhijun Song, Xin Wan, Fei Wang, Tian Xia, Make Ying, Chao Zhang, Yaoyun Shi, Hui-Hai Zhao, Chunqing Deng

Introducing disorder into superconducting materials has been considered promising for enhancing the electromagnetic impedance and realizing noise-resilient superconducting qubits. Despite a number of pioneering implementations, the understanding of the correlation between material disorder and qubit coherence is still developing. Here, we demonstrate the first systematic characterization of fluxonium qubits with superinductors made from titanium-aluminum-nitride with varied disorder. From qubit noise spectroscopy, the flux noise and the dielectric loss are extracted as a measure of the coherence properties. Our results reveal that the $1/f$ flux noise dominates the qubit decoherence around the flux-frustration point and is strongly correlated with the material disorder, while the dielectric loss remains low under a wide range of material properties. From the flux-noise amplitudes, the areal density ($\sigma$) of the phenomenological spin defects and the material disorder are found to be approximately correlated by $\sigma \propto \rho_{xx}^3$, or effectively $(k_F l)^{-3}$. This work provides new insights into the origin of decoherence channels within superconductors and could serve as a useful guideline for material design and optimization.
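
The two forms of the scaling law above are linked by a simple free-electron (Drude) estimate. Assuming a spherical Fermi surface and a Fermi wavevector $k_F$ that stays roughly fixed across samples (our assumptions, not statements from the abstract), the resistivity is inversely proportional to $k_F l$, where $l$ is the mean free path and $\tau$ the scattering time:

$$
\rho_{xx} = \frac{m}{n e^{2}\tau},\qquad
l = v_F\,\tau = \frac{\hbar k_F}{m}\,\tau,\qquad
n = \frac{k_F^{3}}{3\pi^{2}}
\quad\Longrightarrow\quad
\rho_{xx} = \frac{3\pi^{2}\hbar}{e^{2}k_F}\cdot\frac{1}{k_F l}.
$$

Hence, at fixed $k_F$, $\sigma \propto \rho_{xx}^{3} \propto (k_F l)^{-3}$. For example, doubling $\rho_{xx}$ (halving $k_F l$) would raise the spin-defect density by a factor of $2^3 = 8$.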

Translated: 2023-11-28 03:04:07 Published: 2023-11-23

# CoT3DRef: Chain-of-Thoughts Data-Efficient 3D Visual Grounding

CoT3DRef: Chain-of-Thoughts Data-Efficient 3D Visual Grounding ( http://arxiv.org/abs/2310.06214v2 )

License: see link

Eslam Mohamed Bakr, Mohamed Ayman, Mahmoud Ahmed, Habib Slim, Mohamed Elhoseiny

3D visual grounding is the ability to localize objects in 3D scenes conditioned on utterances. Most existing methods devote the referring head to localizing the referred object directly, causing failure in complex scenarios. In addition, they do not illustrate how and why the network reaches its final decision. In this paper, we address the question: can we design an interpretable 3D visual grounding framework that has the potential to mimic the human perception system? To this end, we formulate the 3D visual grounding problem as a sequence-to-sequence task by first predicting a chain of anchors and then the final target. Interpretability not only improves the overall performance but also helps us identify failure cases. Following the chain-of-thoughts approach enables us to decompose the referring task into interpretable intermediate steps, boosting the performance and making our framework extremely data-efficient. Moreover, our proposed framework can be easily integrated into any existing architecture. We validate our approach through comprehensive experiments on the Nr3D, Sr3D, and ScanRefer benchmarks and show consistent performance gains compared to existing methods without requiring manually annotated data. Furthermore, our proposed framework, dubbed CoT3DRef, is significantly data-efficient: on the Sr3D dataset, when trained on only 10% of the data, it matches the SOTA performance of models trained on the entire dataset.

Translated: 2023-11-28 03:03:42 Published: 2023-11-23

# Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts

Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts ( http://arxiv.org/abs/2310.05898v3 )

License: see link

Lizhang Chen, Bo Liu, Kaizhao Liang, Qiang Liu

Lion (Evolved Sign Momentum), a new optimizer discovered through program search, has shown promising results in training large AI models. It performs comparably or favorably to AdamW but with greater memory efficiency. As one might expect from the output of a random program search, Lion incorporates elements from several existing algorithms, including signed momentum, decoupled weight decay, and Polyak and Nesterov momentum, but does not fit into any existing category of theoretically grounded optimizers. Thus, even though Lion appears to perform well as a general-purpose optimizer for a wide range of tasks, its theoretical basis remains uncertain. This lack of theoretical clarity limits opportunities to further enhance and expand Lion's efficacy. This work aims to demystify Lion. Based on both continuous-time and discrete-time analysis, we demonstrate that Lion is a theoretically novel and principled approach for minimizing a general loss function $f(x)$ while enforcing a bound constraint $\|x\|_\infty \leq 1/\lambda$. Lion achieves this through the incorporation of decoupled weight decay, where $\lambda$ represents the weight decay coefficient. Our analysis is made possible by the development of a new Lyapunov function for the Lion updates. It applies to a broader family of Lion-$\kappa$ algorithms, where the $\text{sign}(\cdot)$ operator in Lion is replaced by the subgradient of a convex function $\kappa$, leading to the solution of a general composite optimization problem of $\min_x f(x) + \kappa^*(x)$. Our findings provide valuable insights into the dynamics of Lion and pave the way for further improvements and extensions of Lion-related algorithms.
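The Lion update and the box constraint $\|x\|_\infty \leq 1/\lambda$ can be checked numerically on a toy problem. The NumPy sketch below assumes the update rule as published with the original optimizer: interpolate the momentum with $\beta_1$, take the sign, apply decoupled weight decay $\lambda$, then update the momentum with $\beta_2$. The helper names `lion_step`/`run_lion`, the hyperparameter values and the quadratic objective are illustrative choices. It also shows why the box is invariant: when $\eta\lambda \leq 1$, each step gives $|(1-\eta\lambda)x - \eta\,\text{sign}(c)| \leq (1-\eta\lambda)/\lambda + \eta = 1/\lambda$.

```python
import numpy as np

def lion_step(x, m, grad, lr=1e-4, beta1=0.9, beta2=0.99, wd=1.0):
    """One Lion update; `wd` plays the role of the weight decay coefficient lambda."""
    c = beta1 * m + (1.0 - beta1) * grad      # interpolate momentum and current gradient
    x = x - lr * (np.sign(c) + wd * x)        # sign step plus decoupled weight decay
    m = beta2 * m + (1.0 - beta2) * grad      # momentum update uses the slower beta2
    return x, m

def run_lion(grad_fn, x0, steps, **kwargs):
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)
    for _ in range(steps):
        x, m = lion_step(x, m, grad_fn(x), **kwargs)
    return x

# Toy objective f(x) = ||x - a||^2; the first target lies outside the box |x_i| <= 1/lambda = 1.
a = np.array([3.0, -0.5])
x_final = run_lion(lambda x: 2.0 * (x - a), x0=[0.0, 0.0], steps=2000, lr=0.01, wd=1.0)
print(x_final)  # first coordinate ends up at the box boundary 1/lambda
```

The first coordinate settles at the boundary $1/\lambda = 1$ rather than at its unconstrained minimizer 3, which fits the constrained-optimization reading. The second coordinate stays inside the box but may oscillate around $-0.5$, because sign updates have a fixed magnitude.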

翻訳日:2023-11-28 03:02:37 公開日:2023-11-23

# FlexTrain: 異種デバイス環境のための動的トレーニングフレームワーク

FlexTrain: A Dynamic Training Framework for Heterogeneous Devices Environments ( http://arxiv.org/abs/2310.20457v2 )

ライセンス: Link先を確認

Mert Unsal, Ali Maatouk, Antonio De Domenico, Nicola Piovesan, Fadhel Ayed

(参考訳) ディープラーニングモデルが大きくなるにつれて、異種デバイス環境において大きな課題が生じる。ディープラーニングモデルのサイズは、低消費電力またはリソース制約のデバイスにそれらをデプロイすることを難しくし、長い推論時間と高エネルギー消費をもたらす。これらの課題に対処するため、トレーニング期間中に異なるデバイスで利用可能な多様なストレージと計算資源に対応するフレームワークFlexTrainを提案する。 FlexTrainは、デバイス制約を尊重し、通信コストを最小化し、多様なデバイスとのシームレスな統合を確保しながら、ディープラーニングモデルの効率的なデプロイを可能にする。 flextrainをトレーニングした単一のグローバルモデルをヘテロジニアスデバイスに簡単にデプロイでき、トレーニング時間とエネルギー消費を節約できるcifar-100データセット上でflextrainの有効性を実証する。また、FlexTrainをフェデレーション学習環境に拡張し、CIFAR-10およびCIFAR-100データセットの標準フェデレーション学習ベンチマークよりも優れていることを示す。

As deep learning models become increasingly large, they pose significant challenges in heterogeneous device environments. The size of deep learning models makes it difficult to deploy them on low-power or resource-constrained devices, leading to long inference times and high energy consumption. To address these challenges, we propose FlexTrain, a framework that accommodates the diverse storage and computational resources available on different devices during the training phase. FlexTrain enables efficient deployment of deep learning models while respecting device constraints, minimizing communication costs, and ensuring seamless integration with diverse devices. We demonstrate the effectiveness of FlexTrain on the CIFAR-100 dataset, where a single global model trained with FlexTrain can be easily deployed on heterogeneous devices, saving training time and energy consumption. We also extend FlexTrain to the federated learning setting, showing that our approach outperforms standard federated learning benchmarks on both CIFAR-10 and CIFAR-100 datasets.

翻訳日:2023-11-28 02:53:13 公開日:2023-11-23

# ChatGPTはソフトウェアテストインテリジェンスを前進させることができるか? 変成試験の経験報告

Can ChatGPT advance software testing intelligence? An experience report on metamorphic testing ( http://arxiv.org/abs/2310.19204v2 )

ライセンス: Link先を確認

Quang-Hung Luu, Huai Liu, and Tsong Yueh Chen

(参考訳) ChatGPTは人間の質問に答えるために使われている人工知能チャットボットとしてよく知られているが、ソフトウェアテストの進歩の可能性を見出したいかもしれない。本稿では,最新のソフトウェアテスト技術であるメタモルフィックテスト(MT)のケーススタディを通じて,ソフトウェアテストのインテリジェンス向上におけるChatGPTの有効性を検討する。私たちはchatgptに、基本的にはオブジェクトプログラムに必要な特性であり、伝統的に人間の知性を必要とするメタモーフィックリレーション(mrs)の候補を生成するように依頼します。これらのMR候補は、ドメインの専門家による正確性の観点から評価される。複数のソフトウェアシステムをテストするために、chatgptを新しい正しいmrsを生成するために使用できることを示す。とはいえ、MR候補の大多数は曖昧に定義されているか、正しく定義されていないか、特にMTでテストされたことのないシステムで定義されている。ChatGPTは、後にテストを実施するために採用されるMR候補を提案することで、ソフトウェアテストインテリジェンスを促進するために使用できる。

While ChatGPT is a well-known artificial intelligence chatbot used to answer people's questions, one may wish to explore its potential for advancing software testing. We examine the capability of ChatGPT to advance the intelligence of software testing through a case study on metamorphic testing (MT), a state-of-the-art software testing technique. We ask ChatGPT to generate candidate metamorphic relations (MRs), which are essentially necessary properties of the object program and which traditionally require human intelligence to identify. These MR candidates are then evaluated for correctness by domain experts. We show that ChatGPT can be used to generate new, correct MRs to test several software systems. That said, the majority of MR candidates are either vaguely defined or incorrect, especially for systems that have never been tested with MT. ChatGPT can be used to advance software testing intelligence by proposing MR candidates that can later be adopted for implementing tests, but human intelligence is still needed to justify and rectify their correctness.
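A metamorphic relation checks how outputs must relate across related inputs, so no exact expected output (test oracle) is needed. The sketch below encodes two textbook MRs of the kind ChatGPT was asked to propose: $\sin(\pi - x) = \sin(x)$, and "sorting is invariant to shuffling the input". The functions and the deliberately faulty `buggy_sin` are hypothetical examples, not MRs taken from the study.

```python
import math
import random

def mr_supplementary_angle(sin_impl, x, tol=1e-9):
    """MR: for any x, sin(pi - x) must equal sin(x)."""
    return abs(sin_impl(math.pi - x) - sin_impl(x)) <= tol

def mr_permutation_invariance(sort_impl, xs):
    """MR: sorting a shuffled copy of the input must give the same output."""
    shuffled = list(xs)
    random.shuffle(shuffled)
    return sort_impl(shuffled) == sort_impl(xs)

def buggy_sin(x):
    # Hypothetical faulty implementation: a truncated Taylor series.
    return x - x ** 3 / 6

random.seed(0)
angles = [random.uniform(-10, 10) for _ in range(1000)]
lists = [[random.randint(0, 50) for _ in range(20)] for _ in range(100)]

print(all(mr_supplementary_angle(math.sin, x) for x in angles))
print(all(mr_permutation_invariance(sorted, xs) for xs in lists))
print(mr_supplementary_angle(buggy_sin, 1.0))
```

The faulty implementation violates the first MR at $x = 1$ even though no reference value of $\sin(1)$ is ever consulted. This is the property that makes MRs useful when an oracle is unavailable.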

翻訳日:2023-11-28 02:52:28 公開日:2023-11-23

# 加法的混合事前分布を用いたベイズ予後共変量調整

Bayesian Prognostic Covariate Adjustment With Additive Mixture Priors ( http://arxiv.org/abs/2310.18027v3 )

ライセンス: Link先を確認

Alyssa M. Vanderbeek and Arman Sabbaghi and Jon R. Walsh and Charles K. Fisher

(参考訳) ランダム化比較試験(RCT)から効果的かつ迅速に意思決定を行うには、偏りがなく精度の高い治療効果の推論が必要である。この要求に応える2つの戦略は、結果と強く相関する共変量で調整すること、およびベイズの定理を通じて過去の対照群の情報を活用することである。我々は、これら2つの戦略を組み合わせた新たなベイズ予後共変量調整手法であるベイジアンPROCOVAを提案する。ベイジアンPROCOVAにおける共変量調整は、RCT参加者のためのデジタルツイン生成器(DTG)を構築する生成人工知能(AI)アルゴリズムに基づいている。DTGは過去の対照データで訓練され、対照治療下での各RCT参加者の結果についてデジタルツイン(DT)確率分布を生成する。予後スコアと呼ばれるDT分布の期待値が、調整に用いる共変量を定義する。過去の対照情報は、2つの成分、すなわち過去の対照データに基づいて指定される情報的事前確率分布と、弱情報的事前分布からなる加法的混合事前分布を通じて活用される。混合重みは、事後推論が弱情報的成分に対して情報的成分からどの程度引き出されるかを決定する。この重みにも事前分布が与えられるため、加法的混合事前分布全体をRCTの情報を一切用いずに完全に事前指定できる。我々は、ベイジアンPROCOVAにおいて事後分布からサンプリングするための効率的なGibbsアルゴリズムを確立し、重みを条件とした治療効果パラメータの事後平均と事後分散の閉形式表現を導出する。さまざまな乖離を含むシミュレーション研究において、頻度論的PROCOVAと比較したベイジアンPROCOVAの効率向上を、バイアス制御と分散低減の観点から評価した。これらの向上は、より小規模なRCTにつながる。

Effective and rapid decision-making from randomized controlled trials (RCTs) requires unbiased and precise treatment effect inferences. Two strategies to address this requirement are to adjust for covariates that are highly correlated with the outcome, and to leverage historical control information via Bayes' theorem. We propose a new Bayesian prognostic covariate adjustment methodology, referred to as Bayesian PROCOVA, that combines these two strategies. Covariate adjustment in Bayesian PROCOVA is based on generative artificial intelligence (AI) algorithms that construct a digital twin generator (DTG) for RCT participants. The DTG is trained on historical control data and yields a digital twin (DT) probability distribution for each RCT participant's outcome under the control treatment. The expectation of the DT distribution, referred to as the prognostic score, defines the covariate for adjustment. Historical control information is leveraged via an additive mixture prior with two components: an informative prior probability distribution specified based on historical control data, and a weakly informative prior distribution. The mixture weight determines the extent to which posterior inferences are drawn from the informative component, versus the weakly informative component. This weight has a prior distribution as well, and so the entire additive mixture prior is completely pre-specifiable without involving any RCT information. We establish an efficient Gibbs algorithm for sampling from the posterior distribution, and derive closed-form expressions for the posterior mean and variance of the treatment effect parameter conditional on the weight, in Bayesian PROCOVA. We evaluate efficiency gains of Bayesian PROCOVA via its bias control and variance reduction compared to frequentist PROCOVA in simulation studies that encompass different discrepancies. These gains translate to smaller RCTs.
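How an additive mixture prior shifts weight between its informative and weakly informative components can be seen in the simplest conjugate setting. The sketch below is a simplified illustration, not Bayesian PROCOVA itself. It assumes a single normal mean with known variance, a fixed mixture weight $\omega$ (the paper instead places a prior on the weight and uses a Gibbs sampler), and no prognostic-score regression. All names and numbers are hypothetical. Each posterior component is a standard normal-normal update, and its weight is proportional to $\omega_k$ times that component's marginal likelihood of the observed mean.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_posterior(ybar, n, sigma2, omega, mu_h, tau2_h, mu_w, tau2_w):
    """Posterior of a normal mean under the prior
    omega * N(mu_h, tau2_h) + (1 - omega) * N(mu_w, tau2_w), with known variance sigma2.
    Returns a list of (posterior weight, posterior mean, posterior variance) per component."""
    se2 = sigma2 / n                                     # sampling variance of the sample mean
    comps = []
    for w, mu, tau2 in ((omega, mu_h, tau2_h), (1.0 - omega, mu_w, tau2_w)):
        prec = 1.0 / tau2 + 1.0 / se2                    # conjugate normal-normal update
        mean = (mu / tau2 + ybar / se2) / prec
        evidence = w * normal_pdf(ybar, mu, tau2 + se2)  # prior weight times marginal likelihood
        comps.append((evidence, mean, 1.0 / prec))
    total = sum(e for e, _, _ in comps)
    return [(e / total, m, v) for e, m, v in comps]

# Informative component centred on historical data (mean 0, tight); weak component is diffuse.
agree = mixture_posterior(ybar=0.0, n=100, sigma2=1.0, omega=0.5,
                          mu_h=0.0, tau2_h=0.01, mu_w=0.0, tau2_w=100.0)
clash = mixture_posterior(ybar=2.0, n=100, sigma2=1.0, omega=0.5,
                          mu_h=0.0, tau2_h=0.01, mu_w=0.0, tau2_w=100.0)
print(agree[0][0], clash[0][0])
```

When the trial data agree with the historical control information, most posterior weight moves to the informative component. When they are discrepant, it moves to the weakly informative one, which is the protective behaviour a mixture prior is meant to provide.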

翻訳日:2023-11-28 02:50:47 公開日:2023-11-23

# データ駆動交通シミュレーション:総括的レビュー

Data-driven Traffic Simulation: A Comprehensive Review ( http://arxiv.org/abs/2310.15975v2 )

ライセンス: Link先を確認

Di Chen, Meixin Zhu, Hao Yang, Xuesong Wang, Yinhai Wang

(参考訳) 自動運転車(avs)は安全で効率的な交通手段を提供することで社会を大きく変革する可能性を秘めている。近年、自律運転の認識と予測において顕著な進歩が見られるが、AVの性能を検証するという課題はほとんど解決されていない。データ駆動型微視的交通シミュレーションは自動運転テストにとって重要なツールとなった 1) 高忠実度交通データの提供 2)大規模テストとシナリオ再現性の実現のメリット 3)反応的かつ現実的な交通シミュレーションの可能性。しかし、現在このトピックに関する包括的なレビューは欠落している。本稿では,このギャップを埋めるために関連する研究を要約する。本稿の目的は,現在の研究成果を概観し,この分野の今後の発展に資する未来的視点を提供することである。データ駆動トラフィックシミュレーションの一般的な問題を紹介し、重要な概念と用語を概説する。トラヒックシミュレーションを概観した後、一般的に使用される様々なデータセットと評価メトリクスをレビューする。そこで本研究では,模倣学習,強化学習,深層生成学習,深層学習を総合的に評価し,それぞれを要約し,その利点と欠点を詳細に分析する。さらに、最先端、既存の課題、そして将来の研究方向性を評価する。

Autonomous vehicles (AVs) have the potential to revolutionize society by providing a safe and efficient mode of transportation. Recent years have witnessed notable advancements in autonomous driving perception and prediction, but the challenge of validating the performance of AVs remains largely unresolved. Data-driven microscopic traffic simulation has become an important tool for autonomous driving testing due to 1) the availability of high-fidelity traffic data; 2) its ability to enable large-scale testing and scenario reproducibility; and 3) its potential for reactive and realistic traffic simulation. However, a comprehensive review of this topic is currently lacking. This paper aims to fill this gap by summarizing relevant studies. The primary objective of this paper is to review current research efforts and provide a forward-looking perspective that will benefit future developments in the field. It introduces the general issues of data-driven traffic simulation and outlines key concepts and terms. After an overview of traffic simulation, the commonly used datasets and evaluation metrics are reviewed. The paper then offers a comprehensive evaluation of imitation learning, reinforcement learning, deep generative, and deep learning methods, summarizing each and analyzing their advantages and disadvantages in detail. Moreover, it evaluates the state of the art, existing challenges, and future research directions.

翻訳日:2023-11-28 02:49:37 公開日:2023-11-23

# Koopman表現の軌道修正

Course Correcting Koopman Representations ( http://arxiv.org/abs/2310.15386v2 )

ライセンス: Link先を確認

Mahan Fathi and Clement Gehring and Jonathan Pilault and David Kanaa and Pierre-Luc Bacon and Ross Goroshin

(参考訳) クープマン表現は、潜在空間における線形力学をもたらす非線形力学系(NLDS)の特徴を学習することを目的としている。理論的には、これらの特徴はNLDSのモデリングと制御における多くの問題を単純化するために使用できる。本研究では, この問題のオートエンコーダの定式化と, ダイナミックスをモデル化するための様々な方法, 特に長期水平線上での将来の状態予測について検討する。我々は、潜在空間における将来の状態を予測するいくつかの制限を発見し、長期的ダイナミクスを忠実に捉えるために、周期的再符号化と呼ばれる推論時間機構を提案する。我々は,低次元および高次元NLDSの実験を通して解析的および経験的にこの手法を正当化する。

Koopman representations aim to learn features of nonlinear dynamical systems (NLDS) that lead to linear dynamics in the latent space. Theoretically, such features can be used to simplify many problems in the modeling and control of NLDS. In this work, we study autoencoder formulations of this problem and the different ways they can be used to model dynamics, specifically for predicting future states over long horizons. We identify several limitations of predicting future states in the latent space and propose an inference-time mechanism, which we call Periodic Reencoding, for faithfully capturing long-term dynamics. We justify this method both analytically and empirically via experiments on low- and high-dimensional NLDS.
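A toy model illustrates why re-encoding during a long latent rollout can help. The sketch below uses the classic system $x' = a x$, $y' = b y + c x^2$, whose observables $(x, y, x^2)$ span a finite-dimensional Koopman-invariant subspace. It replaces the learned autoencoder with this hand-picked encoder, and it perturbs one entry of the Koopman matrix to mimic a learned-model error. Periodic reencoding is implemented here as "decode, then encode again every `reencode_every` steps", which is our reading of the idea rather than the authors' implementation; all names are hypothetical.

```python
import numpy as np

A, B, C = 0.9, 0.5, 1.0

def true_step(s):
    x, y = s
    return np.array([A * x, B * y + C * x ** 2])

def encode(s):
    x, y = s
    return np.array([x, y, x ** 2])         # hand-picked Koopman observables

def decode(z):
    return z[:2]                             # the state is read off the first two observables

# Exact Koopman matrix for these observables, plus a deliberately mis-estimated copy.
K_exact = np.array([[A, 0.0, 0.0], [0.0, B, C], [0.0, 0.0, A ** 2]])
K_learned = K_exact.copy()
K_learned[2, 2] += 0.05                      # small error, as a learned model might have

def rollout(K, s0, steps, reencode_every=None):
    z, states = encode(s0), []
    for t in range(1, steps + 1):
        z = K @ z                            # linear evolution in latent space
        if reencode_every and t % reencode_every == 0:
            z = encode(decode(z))            # project back onto the encoder's image
        states.append(decode(z))
    return np.array(states)

s0 = np.array([1.0, 0.0])
truth = [s0]
for _ in range(50):
    truth.append(true_step(truth[-1]))
truth = np.array(truth[1:])

err_latent = np.abs(rollout(K_learned, s0, 50) - truth).sum()
err_reenc = np.abs(rollout(K_learned, s0, 50, reencode_every=5) - truth).sum()
print(err_latent, err_reenc)
```

With the exact matrix both rollouts reproduce the true trajectory. With the perturbed matrix, the $x^2$ coordinate drifts away from the square of the decoded $x$. Re-encoding resets that drift every few steps, so the accumulated error in $y$ is smaller.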

翻訳日:2023-11-28 02:49:21 公開日:2023-11-23

# 重み付き結合最大平均不一致によるマルチソース・マルチターゲット教師なしドメイン適応故障診断

Weighted Joint Maximum Mean Discrepancy Enabled Multi-Source-Multi-Target Unsupervised Domain Adaptation Fault Diagnosis ( http://arxiv.org/abs/2310.14790v2 )

ライセンス: Link先を確認

Zixuan Wang, Haoran Tang, Haibo Wang, Bo Qin, Mark D. Butala, Weiming Shen, Hongwei Wang

(参考訳) データ駆動型インテリジェント障害診断技術によって達成される顕著な結果にもかかわらず、トレーニングデータとテストデータの同じ分布と十分なラベル付きデータを想定している。様々な運用状態が実用的なシナリオにしばしば存在し、障害診断の有効性を妨げるドメインシフトの問題に繋がる。最近の教師なしドメイン適応法はクロスドメイン障害診断を可能にするが、複数のソースドメインからの情報を効果的に活用し、複数のターゲットドメインにおいて効果的な診断障害を同時に達成することは困難である。本稿では,障害診断の分野では,マルチソース・マルチターゲットシナリオ下でドメイン適応を実現するマルチソース・マルチターゲット・非教師なしドメイン適応(wjmmd-mda)を実現するために,重み付きジョイント最大平均偏差を提案する。提案手法では,複数のラベル付きソースドメインから十分な情報を抽出し,重み付き距離損失を改善することにより,ソースドメインとターゲットドメインのドメインアライメントを実現する。その結果、複数のソースとターゲットドメイン間のドメイン不変性と識別的特徴は、クロスドメイン障害診断により学習される。提案手法の性能を3つのデータセットの総合的な比較実験で評価し,本手法の優位性を実証した。

Despite the remarkable results that can be achieved by data-driven intelligent fault diagnosis techniques, they presuppose that training and test data follow the same distribution and that sufficient labeled data are available. Various operating states often exist in practical scenarios, leading to the problem of domain shift that hinders the effectiveness of fault diagnosis. While recent unsupervised domain adaptation methods enable cross-domain fault diagnosis, they struggle to effectively utilize information from multiple source domains and to achieve effective fault diagnosis in multiple target domains simultaneously. In this paper, we propose a weighted joint maximum mean discrepancy enabled multi-source-multi-target unsupervised domain adaptation method (WJMMD-MDA), which, for the first time in the field of fault diagnosis, realizes domain adaptation under multi-source-multi-target scenarios. The proposed method extracts sufficient information from multiple labeled source domains and achieves domain alignment between source and target domains through an improved weighted distance loss. As a result, domain-invariant and discriminative features across multiple source and target domains are learned, enabling cross-domain fault diagnosis. The performance of the proposed method is evaluated in comprehensive comparative experiments on three datasets, and the experimental results demonstrate the superiority of this method.
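The distance loss in WJMMD-MDA builds on the maximum mean discrepancy, $\mathrm{MMD}^2 = \mathbb{E}\,k(s,s') + \mathbb{E}\,k(t,t') - 2\,\mathbb{E}\,k(s,t)$. The sketch below computes a plain Gaussian-kernel MMD² estimate and a weighted sum over all source-target pairs. It assumes a fixed bandwidth and takes the pair weights as given inputs; the paper's joint MMD and its weighting scheme are not reproduced, and `weighted_multi_domain_mmd` is a hypothetical helper.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def mmd2(xs, xt, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    kss = gaussian_kernel(xs, xs, bandwidth).mean()
    ktt = gaussian_kernel(xt, xt, bandwidth).mean()
    kst = gaussian_kernel(xs, xt, bandwidth).mean()
    return kss + ktt - 2.0 * kst

def weighted_multi_domain_mmd(sources, targets, weights):
    """Weighted sum of MMD^2 over every (source, target) pair; weights[i][j] are placeholders."""
    return sum(weights[i][j] * mmd2(s, t)
               for i, s in enumerate(sources)
               for j, t in enumerate(targets))

rng = np.random.default_rng(0)
src = [rng.normal(0.0, 1.0, (200, 2)), rng.normal(0.2, 1.0, (200, 2))]
tgt = [rng.normal(0.0, 1.0, (200, 2)), rng.normal(3.0, 1.0, (200, 2))]
print(mmd2(src[0], tgt[0]), mmd2(src[0], tgt[1]))
print(weighted_multi_domain_mmd(src, tgt, [[0.5, 0.5], [0.5, 0.5]]))
```

Minimizing such a loss over a shared feature extractor pulls the feature distributions of different domains together. The weights let some source-target pairs count more than others.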

翻訳日:2023-11-28 02:48:54 公開日:2023-11-23

# プロサッカー選手の市場価値識別のための説明可能な人工知能モデル

Explainable artificial intelligence model for identifying Market Value in Professional Soccer Players ( http://arxiv.org/abs/2311.04599v2 )

ライセンス: Link先を確認

Chunyang Huang, Shaoliang Zhang

(参考訳) 本研究では,サッカー選手の市場価値を予測するための高度な機械学習手法を提案し,アンサンブルモデルとShapley Additive Explanations (SHAP)を組み合わせて解釈可能とした。 sofifaの約12,000人のプレイヤーのデータを利用して、borutaアルゴリズムは特徴の選択を合理化した。グラディエントブースティング決定木(GBDT)モデルは予測精度に優れ、R-squaredは0.901、Root Mean Squared Error(RMSE)は3,221,632.175である。プレイヤーのスキル、フィットネス、認知領域の属性は市場価値に大きく影響した。これらの洞察は選手の評価においてスポーツ業界のステークホルダーを助ける。しかし、この研究には、スーパースタープレーヤーの値を過小評価したり、より大きなデータセットを必要とするような制限がある。今後の研究の方向性には、モデルの適用性の向上と、さまざまなコンテキストにおける価値予測の探索が含まれる。

This study introduces an advanced machine learning method for predicting soccer players' market values, combining ensemble models with Shapley Additive Explanations (SHAP) for interpretability. Using data on about 12,000 players from Sofifa, feature selection was streamlined with the Boruta algorithm. The Gradient Boosting Decision Tree (GBDT) model excelled in predictive accuracy, with an R-squared of 0.901 and a Root Mean Squared Error (RMSE) of 3,221,632.175. Player attributes in skills, fitness, and cognitive areas significantly influenced market value. These insights can aid sports industry stakeholders in player valuation. However, the study has limitations, such as underestimating superstar players' values and the need for larger datasets. Future research directions include enhancing the model's applicability and exploring value prediction in various contexts.
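SHAP attributes a prediction to individual features through Shapley values. For a handful of features these can be computed exactly by enumerating feature subsets. The minimal sketch below assumes that a feature absent from a coalition takes a fixed baseline value; SHAP's explainers typically average over background data instead. The toy model, feature values and baseline are hypothetical and unrelated to the study's GBDT.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; 'absent' features are set to their baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

# Hypothetical model: an additive effect of feature 0 plus an interaction of features 1 and 2.
model = lambda v: 2 * v[0] + v[1] * v[2]
phi = shapley_values(model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(phi)
```

The additive term is credited entirely to feature 0, and the interaction $v_1 v_2 = 6$ is split equally between features 1 and 2, giving $[2, 3, 3]$. The attributions sum to $f(x) - f(\text{baseline}) = 8$. Exhaustive enumeration grows exponentially with the number of features, which is why SHAP relies on model-specific algorithms or approximations.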

翻訳日:2023-11-28 02:40:32 公開日:2023-11-23

# Big-Meansアルゴリズムの並列化戦略: 効果的なビッグデータクラスタリングのための総合的チュートリアル

Strategies for Parallelizing the Big-Means Algorithm: A Comprehensive Tutorial for Effective Big Data Clustering ( http://arxiv.org/abs/2311.04517v2 )

ライセンス: Link先を確認

Ravil Mussabayev and Rustam Mussabayev

(参考訳) 本研究では,大規模データセットをクラスタリングするためのBig-meansアルゴリズムの最適化に注目し,4つの異なる並列化戦略を探索する。各アプローチの計算効率,スケーラビリティ,クラスタリング性能を評価し,そのメリットと限界を明らかにするため,広範な実験を行った。また,計算効率とクラスタリング品質のトレードオフについても検討し,各種要因の影響について検討した。今回の知見は,利用可能なリソースとデータセット特性に基づく最良並列化戦略の選択に関する実践的ガイダンスを提供し,big-meansアルゴリズムの並列化手法のより深い理解に寄与する。

This study focuses on the optimization of the Big-means algorithm for clustering large-scale datasets, exploring four distinct parallelization strategies. We conducted extensive experiments to assess the computational efficiency, scalability, and clustering performance of each approach, revealing their benefits and limitations. The paper also delves into the trade-offs between computational efficiency and clustering quality, examining the impacts of various factors. Our insights provide practical guidance on selecting the best parallelization strategy based on available resources and dataset characteristics, contributing to a deeper understanding of parallelization techniques for the Big-means algorithm.
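For context on what is being parallelized, here is a sequential sketch of the Big-means loop as we understand its published description. Each iteration draws a random sample and initializes centroids from the incumbent (k-means++ on the first pass). It then runs k-means on the sample and keeps the result if its objective on that sample beats the incumbent's. One simplification applies: an empty cluster keeps its previous center instead of being re-seeded with k-means++. The four parallelization strategies studied in the paper are not shown. One conceivable variant, which may not match any of them, runs independent copies of this loop with different seeds and keeps the best.

```python
import numpy as np

def kmeans_pp(x, k, rng):
    """k-means++ seeding: sample each new center with probability proportional to d^2."""
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d2 = ((x[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(1)
        centers.append(x[rng.choice(len(x), p=d2 / d2.sum())])
    return np.array(centers)

def lloyd(x, centers, iters=50):
    """Plain Lloyd iterations; an empty cluster keeps its previous center (simplification)."""
    for _ in range(iters):
        labels = ((x[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        new = np.array([x[labels == j].mean(0) if np.any(labels == j) else centers[j]
                        for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return centers

def sse(x, centers):
    return ((x[:, None, :] - centers[None]) ** 2).sum(-1).min(1).sum()

def big_means(x, k, sample_size, n_samples, seed=0):
    rng = np.random.default_rng(seed)
    best, best_obj = None, np.inf
    for _ in range(n_samples):
        s = x[rng.choice(len(x), sample_size, replace=False)]     # random chunk of the data
        init = kmeans_pp(s, k, rng) if best is None else best      # warm start from incumbent
        c = lloyd(s, init.copy())
        obj = sse(s, c)                                            # objective on this sample
        if obj < best_obj:
            best, best_obj = c, obj
    return best

rng = np.random.default_rng(1)
true_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
data = np.concatenate([c + 0.5 * rng.standard_normal((300, 2)) for c in true_centers])
centers = big_means(data, k=3, sample_size=100, n_samples=20)
print(np.round(centers, 2))
```

Each iteration touches only `sample_size` points, so the cost per step does not depend on the dataset size. That property is what makes the algorithm attractive for big data and leaves room for different ways of splitting the work across cores.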

翻訳日:2023-11-28 02:40:18 公開日:2023-11-23

# 企業データガバナンスの基盤としての組織知識のセマンティックモデリング 4.0 --統一臨床データモデルへの応用

Semantic Modelling of Organizational Knowledge as a Basis for Enterprise Data Governance 4.0 -- Application to a Unified Clinical Data Model ( http://arxiv.org/abs/2311.02082v3 )

ライセンス: Link先を確認

Miguel AP Oliveira, Stephane Manara, Bruno Mol\'e, Thomas Muller, Aur\'elien Guillouche, Lysann Hesske, Bruce Jordan, Gilles Hubert, Chinmay Kulkarni, Pralipta Jagdev and Cedric R. Berger

(参考訳) 個人や組織は常に増加するデータ量に対応し、その内容や形式は異質である。データの品質とライフサイクルの制御をもたらす適切なデータ管理プロセスは、このデータから価値を取り出し、複数の利用に関する固有のリスクを最小化するための前提条件である。一般的なデータガバナンスフレームワークは、データの圧倒的な複雑さに欠ける人々、ポリシー、プロセスに依存しています。しかし、高品質な標準を達成するためには、この複雑さを活用する必要がある。後者は、このデータに基づいてトレーニングされた生成人工知能を含む、ダウンストリームのデータ使用結果を条件とする。本稿では,メタデータ駆動,アジャイル,(準)自動データガバナンス(すなわちデータガバナンス 4.0)を実現する,シンプルでコスト効率のよいフレームワークを構築した具体的経験を報告する。本稿では,25年間の臨床研究データを企業規模で完全に生産的な環境で統合する方法について説明する。このフレームワークはセマンティックウェブの原則を利用する方法論と技術の両方を含んでいる。ガバナンスの原則を含む、ビジネスコンテキストにおけるデータ資産のアバターを記述する知識グラフを構築しました。エンタープライズ上のオントロジーによって記述された複数のオントロジーは、FAIRification、ライフサイクル管理、役割と責任の定義、トランスフォーメーション間の血統、ソースコードからの証明といった重要なガバナンスのアクションを可能にします。このメタデータモデルは、ビジネスコンテキストをアジャイルな方法で考慮し、各ユースケースにガバナンスの制約を適用し、ビジネスの変化に基づいて動的に調整する、半自動的なデータ管理プロセスであるdata governance 4.0の鍵となるものです。

Individuals and organizations cope with an ever-growing amount of data that is heterogeneous in its contents and formats. An adequate data management process yielding data quality and control over its lifecycle is a prerequisite to getting value out of this data and minimizing inherent risks related to multiple usages. Common data governance frameworks rely on people, policies, and processes that fall short of the overwhelming complexity of data. Yet harnessing this complexity is necessary to achieve high-quality standards. Data quality in turn conditions the outcome of any downstream data usage, including generative artificial intelligence trained on this data. In this paper, we report our concrete experience establishing a simple, cost-efficient framework that enables metadata-driven, agile and (semi-)automated data governance (i.e., Data Governance 4.0). We explain how we implement and use this framework to integrate 25 years of clinical study data at an enterprise scale in a fully productive environment. The framework encompasses both methodologies and technologies leveraging semantic web principles. We built a knowledge graph describing avatars of data assets in their business context, including governance principles. Multiple ontologies articulated by an enterprise upper ontology enable key governance actions such as FAIRification, lifecycle management, definition of roles and responsibilities, lineage across transformations and provenance from source systems. This metadata model is the keystone of Data Governance 4.0: a semi-automated data management process that considers the business context in an agile manner to adapt governance constraints to each use case and dynamically tune them based on business changes.

翻訳日:2023-11-28 02:37:27 公開日:2023-11-23

# VCISR:ビデオ圧縮合成データを用いたBlind Single Image Super-Resolution

VCISR: Blind Single Image Super-Resolution with Video Compression Synthetic Data ( http://arxiv.org/abs/2311.00996v2 )

ライセンス: Link先を確認

Boyang Wang, Bowen Liu, Shiyu Liu, Fengyu Yang

(参考訳) ブラインド・シングル・イメージ・スーパーレゾリューション(SISR)タスクでは、画像レベルの未知の劣化の回復に成功している。しかし、単一のビデオフレームが入力となると、これらの作業は通常、蚊の音、鳴き声、ブロック性、階段の音などのビデオ圧縮による劣化に対処できない。本稿では,まず,映像圧縮に基づく劣化モデルを用いて,ブラインドsisrタスクにおける低分解能画像データを合成する。提案手法は既存の画像データセットに広く適用可能であり,映像圧縮アルゴリズムの損失による歪みを1つの劣化画像に含めることができる。これにより、ビデオデータの機能の多様性の漏洩が克服され、トレーニング効率が維持される。 SISR分解モデルにビデオ符号化アーティファクトを導入することで、ニューラルネットワークは、ビデオ圧縮の劣化を回復し、画像圧縮による一般的な歪みを回復するためのより良い結果を得ることができる。提案手法は, sotaノーリファレンス画像品質評価において優れた性能を達成し, 各種データセットの視覚品質を向上させる。さらに,ビデオスーパーレゾリューション(vsr)データセットの分解モデルを用いてトレーニングしたsisrニューラルネットワークを評価する。 VSR用に特別に設計されたアーキテクチャと比較して、ビデオベースの劣化を注入する提案された戦略は、時間的手がかりがなくても、より複雑な圧縮アーティファクトに対処するために一般化可能である。

In the blind single image super-resolution (SISR) task, existing works have been successful in restoring image-level unknown degradations. However, when a single video frame becomes the input, these works usually fail to address degradations caused by video compression, such as mosquito noise, ringing, blockiness, and staircase noise. In this work, we present, for the first time, a video compression-based degradation model to synthesize low-resolution image data in the blind SISR task. Our proposed image synthesizing method is widely applicable to existing image datasets, so that a single degraded image can contain distortions caused by lossy video compression algorithms. This overcomes the lack of feature diversity in video data and thus retains training efficiency. By introducing video coding artifacts into SISR degradation models, neural networks can super-resolve images with the ability to restore video compression degradations, and they also achieve better results in restoring generic distortions caused by image compression. Our proposed approach achieves superior performance on SOTA no-reference Image Quality Assessment metrics and shows better visual quality on various datasets. In addition, we evaluate the SISR neural network trained with our degradation model on video super-resolution (VSR) datasets. Compared to architectures specifically designed for VSR, our method exhibits similar or better performance, evidence that the presented strategy of infusing video-based degradation generalizes to more complicated compression artifacts even without temporal cues.

翻訳日:2023-11-28 02:36:35 公開日:2023-11-23

# ゼロコーディネートシフト:物理インフォームド演算子学習のためのWhetted Automatic Differentiation

Zero Coordinate Shift: Whetted Automatic Differentiation for Physics-informed Operator Learning ( http://arxiv.org/abs/2311.00860v2 )

ライセンス: Link先を確認

Kuangdai Leng, Mallikarjun Shankar, Jeyan Thiyagalingam

(参考訳) 自動微分(AD)は、ネットワーク出力w.r.t.座標の高次微分を計算するために必要となる物理インフォームド機械学習における重要なステップである。本稿では,ゼロ座標シフト (zcs) のトリックと呼ばれる,物理に変形した演算子学習のためのadを行う新しい軽量アルゴリズムを提案する。すべてのサンプル座標をリーフ変数にするのではなく、zcsは空間的または時間的次元ごとにスカラー値のリーフ変数を1つだけ導入し、望んでいた微分を"many-roots-many-leaves"から"one-root-many-leaves"へと単純化した。これは関数の次元(物理パラメータ)に沿って計算グラフの重複を避けることによって、優れた性能の飛躍をもたらした。 ZCSは現在のディープラーニングライブラリで簡単に実装できますが、私たちの独自の実装はDeepXDEパッケージを拡張して実現しています。我々は、データなしで偏微分方程式(PDE)を解くために、総合的なベンチマーク分析といくつかのケーススタディを行い、物理情報を用いたDeepONetsを訓練する。以上の結果から,ZCSはGPUメモリ使用量とトレーニングのウォール時間を桁違いに削減し,その削減係数は関数数に比例して拡大した。低レベルの最適化手法として、ZCSはデータ、物理(PDE)、ネットワークアーキテクチャに制限を課さず、あらゆる面からトレーニング結果を妥協しない。

Automatic differentiation (AD) is a critical step in physics-informed machine learning, required for computing the high-order derivatives of the network output with respect to the coordinates of collocation points. In this paper, we present a novel and lightweight algorithm to conduct AD for physics-informed operator learning, which we call the trick of Zero Coordinate Shift (ZCS). Instead of treating all sampled coordinates as leaf variables, ZCS introduces only one scalar-valued leaf variable for each spatial or temporal dimension, simplifying the wanted derivatives from "many-roots-many-leaves" to "one-root-many-leaves", whereby reverse-mode AD becomes directly usable. This yields a substantial performance gain by avoiding the duplication of the computational graph along the dimension of functions (physical parameters). ZCS is easy to implement with current deep learning libraries; our own implementation extends the DeepXDE package. We carry out a comprehensive benchmark analysis and several case studies, training physics-informed DeepONets to solve partial differential equations (PDEs) without data. The results show that ZCS consistently reduces GPU memory consumption and training wall time by an order of magnitude, and that the reduction factor grows with the number of functions. As a low-level optimisation technique, ZCS imposes no restrictions on data, physics (PDE) or network architecture and does not compromise training results in any respect.
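The identity behind the trick can be written out in a few lines. Take a network output $u$ evaluated at collocation points $x_i$, with one spatial dimension shown. The shared scalar shift $z$ and the dummy weights $a_i$ are our notation, chosen to match the description above as we understand it, and may differ from the paper's:

$$
\begin{aligned}
u_i(z) &= u(x_i + z), \qquad z \in \mathbb{R} \text{ a single shared leaf, evaluated at } z = 0,\\
\left.\frac{\partial^k u_i}{\partial z^k}\right|_{z=0} &= \frac{\partial^k u}{\partial x^k}(x_i) \quad \text{for every } i,\\
\omega(z, \mathbf{a}) &= \sum_i a_i\, u(x_i + z), \qquad
\left.\frac{\partial}{\partial a_i}\,\frac{\partial^k \omega}{\partial z^k}\right|_{z=0} = \frac{\partial^k u}{\partial x^k}(x_i).
\end{aligned}
$$

Because $\omega$ is a scalar root, reverse-mode AD computes $\partial^k \omega / \partial z^k$ by repeated backward passes through a single graph. One final backward pass with respect to $\mathbf{a}$ then returns all per-point derivatives at once. $\omega$ is linear in $\mathbf{a}$, so any value, for example $\mathbf{a} = \mathbf{1}$, can be used.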

翻訳日:2023-11-28 02:36:09 公開日:2023-11-23

# AGRAMPLIFIER: 局所更新の増幅による中毒攻撃に対する連合学習の防御

AGRAMPLIFIER: Defending Federated Learning Against Poisoning Attacks Through Local Update Amplification ( http://arxiv.org/abs/2311.06996v2 )

ライセンス: Link先を確認

Zirui Gong, Liyue Shen, Yanjun Zhang, Leo Yu Zhang, Jingwei Wang, Guangdong Bai, and Yong Xiang

(参考訳) 連合学習の協調性(fl)は、ビザンチン中毒攻撃として知られる局所的なトレーニングデータと局所的な更新を操作するという形で大きな脅威となる。この問題に対処するために、ビザンチン参加者がアップロードした不審なローカルアップデートをフィルタリングまたは緩和するために、多くのビザンチン・ロバスト集約ルール(agr)が提案されている。本稿では,既存のAGRの堅牢性,忠実性,効率性を同時に向上することを目的とした,AGRAMPLIFIERと呼ばれる新しいアプローチを提案する。 AGRAMPLIFIERの中核となる考え方は、各勾配更新の最も抑圧的な特徴を特定して、ローカル更新の「道徳」を増幅することであり、悪意のある更新と良心的な更新を明確に区別し、その結果、検出効果を改善することである。この目的を達成するために、AGRMPとAGRXAIという2つのアプローチを提案する。 AGRMPはパッチへのローカルアップデートを整理し、各パッチから最大の値を抽出する一方、AGRXAIは説明可能なAIメソッドを活用して、最もアクティブな機能の勾配を抽出する。 AGRAMPLIFIERに既存のビザンチン・ロバスト機構を組み込むことで、モデルの堅牢性を向上し、その忠実性を維持し、全体的な効率を向上する。 AGRAMPLIFIERは、既存のビザンチン・ロバスト機構と普遍的に互換性がある。本報告では, 主要なAGR機構に組み込むことにより, 有効性を示す。 7つの代表的な毒殺攻撃に対する多様なドメインから7つのデータセットに対して行われた広範な評価では、ロバスト性、忠実性、効率性が一貫して向上し、それぞれ40.08%、39.18%、10.68%の値が得られた。

The collaborative nature of federated learning (FL) poses a major threat in the form of manipulation of local training data and local updates, known as the Byzantine poisoning attack. To address this issue, many Byzantine-robust aggregation rules (AGRs) have been proposed to filter out or moderate suspicious local updates uploaded by Byzantine participants. This paper introduces a novel approach called AGRAMPLIFIER, aiming to simultaneously improve the robustness, fidelity, and efficiency of the existing AGRs. The core idea of AGRAMPLIFIER is to amplify the "morality" of local updates by identifying the most repressive features of each gradient update, which provides a clearer distinction between malicious and benign updates, consequently improving the detection effect. To achieve this objective, two approaches, namely AGRMP and AGRXAI, are proposed. AGRMP organizes local updates into patches and extracts the largest value from each patch, while AGRXAI leverages explainable AI methods to extract the gradient of the most activated features. By equipping AGRAMPLIFIER with the existing Byzantine-robust mechanisms, we successfully enhance the model's robustness, maintaining its fidelity and improving overall efficiency. AGRAMPLIFIER is universally compatible with the existing Byzantine-robust mechanisms. The paper demonstrates its effectiveness by integrating it with all mainstream AGR mechanisms. Extensive evaluations conducted on seven datasets from diverse domains against seven representative poisoning attacks consistently show enhancements in robustness, fidelity, and efficiency, with average gains of 40.08%, 39.18%, and 10.68%, respectively.
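AGRMP, as described above, splits each local update into patches and keeps the largest value of each patch. The sketch below implements that max-pooling step under stated assumptions: flattened updates, 1-D patches of a fixed size, and padding of the last patch. It then passes the amplified features to a toy distance-to-median filter. That filter is a stand-in for a real Byzantine-robust AGR, not any specific published rule. The function names, patch size and threshold are hypothetical.

```python
import numpy as np

def max_pool_patches(update, patch_size):
    """Split a flattened update into fixed-size patches and keep each patch's largest value."""
    flat = np.asarray(update, dtype=float).ravel()
    pad = (-len(flat)) % patch_size
    flat = np.concatenate([flat, np.full(pad, -np.inf)])   # padding never wins the max
    return flat.reshape(-1, patch_size).max(axis=1)

def flag_by_median_distance(features, threshold):
    """Toy filter: flag clients whose amplified features lie far from the coordinate-wise median."""
    median = np.median(features, axis=0)
    dists = np.linalg.norm(features - median, axis=1)
    return dists > threshold * np.median(dists)

rng = np.random.default_rng(0)
benign = [rng.normal(0.0, 0.1, 1000) for _ in range(9)]
malicious = rng.normal(0.0, 0.1, 1000) + 0.5              # hypothetical shifted (poisoned) update
features = np.array([max_pool_patches(u, 10) for u in benign + [malicious]])
flags = flag_by_median_distance(features, threshold=3.0)
print(flags)
```

Pooling shrinks each update from 1,000 to 100 values while keeping the extreme entries, where a systematic shift shows up most clearly. The downstream rule therefore compares compact features instead of raw gradients.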

翻訳日:2023-11-28 02:28:32 公開日:2023-11-23

# 生物学的神経ダイナミクスからの因果関係発見のためのアテンション

Attention for Causal Relationship Discovery from Biological Neural Dynamics ( http://arxiv.org/abs/2311.06928v3 )

ライセンス: Link先を確認

Ziyu Lu, Anika Tabassum, Shruti Kulkarni, Lu Mi, J. Nathan Kutz, Eric Shea-Brown, Seung-Hwan Lim

(参考訳) 本稿では,神経生物学的および生体物理ネットワークのように,各ノードに複雑な非線形ダイナミクスを持つネットワークにおけるグランガー因果関係を学習するためのトランスフォーマーモデルの可能性について検討する。本研究は主に、基礎となる接続マトリックスを介して基底的因果関係が知られているシミュレーションニューラルネットワークに基づく概念実証研究に焦点をあてた。神経集団動態を予測するために訓練されたトランスフォーマーモデルに対し、クロスアテンションモジュールはニューロン間の因果関係を効果的に捉え、最も一般的なグランガー因果解析法と同等かそれ以上の精度で得ることを示した。現実の神経生物学のデータは、動的接続性や観測されていない変動性など、さらなる課題をもたらすことを認めていますが、この研究は、神経科学における因果表現学習のためのトランスフォーマーモデルの有用性について、前向きな予見を与えてくれます。

This paper explores the potential of transformer models for learning Granger causality in networks with complex nonlinear dynamics at every node, as in neurobiological and biophysical networks. Our study primarily focuses on a proof-of-concept investigation based on simulated neural dynamics, for which the ground-truth causality is known through the underlying connectivity matrix. For transformer models trained to forecast neuronal population dynamics, we show that the cross-attention module effectively captures the causal relationships among neurons, with an accuracy equal or superior to that of the most popular Granger causality analysis method. While we acknowledge that real-world neurobiology data will bring further challenges, including dynamic connectivity and unobserved variability, this research offers an encouraging preliminary glimpse into the utility of transformer models for causal representation learning in neuroscience.
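For reference, the classical Granger idea that the transformer is compared against asks whether adding the past of $x$ improves a least-squares prediction of $y$. The sketch below is a linear, pairwise, fixed-lag version that scores the log ratio of residual sums of squares. It is one common baseline; whether it matches the specific Granger method used in the paper is not stated. The two-unit simulated system, where $x$ drives $y$ but not vice versa, is a hypothetical toy.

```python
import numpy as np

def granger_score(x, y, lag=1):
    """log(RSS without x's past / RSS with x's past) for predicting y; larger means x -> y."""
    target = y[lag:]
    own = np.column_stack([np.ones(len(target))] + [y[lag - k:-k] for k in range(1, lag + 1)])
    full = np.column_stack([own] + [x[lag - k:-k] for k in range(1, lag + 1)])
    rss_own = np.sum((target - own @ np.linalg.lstsq(own, target, rcond=None)[0]) ** 2)
    rss_full = np.sum((target - full @ np.linalg.lstsq(full, target, rcond=None)[0]) ** 2)
    return np.log(rss_own / rss_full)

rng = np.random.default_rng(0)
n = 2000
x, y = np.zeros(n), np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.standard_normal()   # x drives y, not vice versa

print(granger_score(x, y), granger_score(y, x))
```

Only the true direction receives a clearly positive score. A linear model like this cannot capture the nonlinear node dynamics that motivate using a transformer instead.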

翻訳日:2023-11-28 02:27:59 公開日:2023-11-23

# TrainerAgent: LLM搭載マルチエージェントシステムによるカスタマイズ可能かつ効率的なモデルトレーニング

TrainerAgent: Customizable and Efficient Model Training through LLM-Powered Multi-Agent System ( http://arxiv.org/abs/2311.06622v2 )

ライセンス: Link先を確認

Haoyuan Li, Hao Jiang, Tianke Zhang, Zhelun Yu, Aoxiong Yin, Hao Cheng, Siming Fu, Yuhao Zhang, Wanggui He

(参考訳) AIモデルのトレーニングは、特にパーソナライズされたサービスを提供するカスタムモデルが必要な場合、常に困難だった。アルゴリズムエンジニアは、特定のビジネス要件に合わせて反復的にモデルを開発するための長いプロセスに直面します。高品質で効率的なモデル開発の探求は、大規模言語モデル(llm)エージェントの出現とともに、業界において重要な焦点となっている。 LLMの強力な分析,計画,意思決定機能を活用し,タスク,データ,モデル,サーバエージェントを含むマルチエージェントフレームワークからなるTranerAgentシステムを提案する。これらのエージェントは、ユーザ定義のタスク、入力データ、要求(例えば、精度、速度)を分析し、データとモデルの両方の観点から包括的な最適化を行い、満足なモデルを取得し、最終的にこれらのモデルをオンラインサービスとしてデプロイする。コンピュータビジョンおよび自然言語処理領域における古典的識別的・生成的タスクに関する実験的評価は,我々のシステムが所望の基準を満たすモデルを一貫して生成していることを示す。さらに、システムは、ファンタスティックなシナリオや非倫理的な要求など、達成不可能なタスクを批判的に識別し、拒否する能力を示し、堅牢性と安全性を確保する。本研究は, LLMを用いた分析, 意思決定, 実行能力の統合, および4つのエージェント間の協調により, 従来のモデル開発と比較して, 効率と品質が向上した望ましいモデルの実現において, 大幅な進歩を示すものである。我々は,AI分野におけるモデル開発の新たなパラダイムとして,学術および産業コミュニティにおけるTranerAgentの研究の進展に,我々の研究が貢献することを期待している。

Training AI models has always been challenging, especially when custom models are needed to provide personalized services. Algorithm engineers often face a lengthy process to iteratively develop models tailored to specific business requirements, making it even more difficult for non-experts. The quest for high-quality and efficient model development, along with the emergence of Large Language Model (LLM) Agents, has become a key focus in the industry. Leveraging the powerful analytical, planning, and decision-making capabilities of LLMs, we propose TrainerAgent, a system built on a multi-agent framework comprising Task, Data, Model and Server agents. These agents analyze user-defined tasks, input data, and requirements (e.g., accuracy, speed), optimize them comprehensively from both data and model perspectives to obtain satisfactory models, and finally deploy these models as online services. Experimental evaluations on classical discriminative and generative tasks in computer vision and natural language processing domains demonstrate that our system consistently produces models that meet the desired criteria. Furthermore, the system exhibits the ability to critically identify and reject unattainable tasks, such as fantastical scenarios or unethical requests, ensuring robustness and safety. This research presents a significant advancement in achieving desired models with increased efficiency and quality compared to traditional model development, facilitated by the integration of LLM-powered analysis, decision-making, and execution capabilities, as well as the collaboration among four agents. We anticipate that our work will contribute to the advancement of research on TrainerAgent in both academic and industry communities, potentially establishing it as a new paradigm for model development in the field of AI.

Translated: 2023-11-28 02:27:42 Published: 2023-11-23

# Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model

Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model ( http://arxiv.org/abs/2311.06214v2 )

License: see the link

Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, Sai Bi

Text-to-3D with diffusion models has achieved remarkable progress in recent years. However, existing methods either rely on score distillation-based optimization, which suffers from slow inference, low diversity, and Janus problems, or are feed-forward methods that generate low-quality results due to the scarcity of 3D training data. In this paper, we propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner. We adopt a two-stage paradigm, which first generates a sparse set of four structured and consistent views from text in one shot with a fine-tuned 2D text-to-image diffusion model, and then directly regresses the NeRF from the generated images with a novel transformer-based sparse-view reconstructor. Through extensive experiments, we demonstrate that our method can generate diverse 3D assets of high visual quality within 20 seconds, which is two orders of magnitude faster than previous optimization-based methods that can take 1 to 10 hours. Our project webpage: https://jiahao.ai/instant3d/.

Translated: 2023-11-28 02:26:09 Published: 2023-11-23

# ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences

ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences ( http://arxiv.org/abs/2311.06025v2 )

License: see the link

Yuanhe Tian, Ruyi Gan, Yan Song, Jiaxing Zhang, Yongdong Zhang

Recently, the increasing demand for superior medical services has highlighted discrepancies in medical infrastructure. With big data, especially text, forming the foundation of medical services, there is an exigent need for effective natural language processing (NLP) solutions tailored to the healthcare domain. Conventional approaches leveraging pre-trained models present promising results in this domain, and current large language models (LLMs) offer an advanced foundation for medical text processing. However, most medical LLMs are trained only with supervised fine-tuning (SFT), which efficiently enables them to understand and respond to medical instructions but is ineffective for learning domain knowledge and aligning with human preferences. Another engineering barrier that prevents current medical LLMs from better text processing is their restricted context length (e.g., 2,048 tokens), which makes it hard for them to process the long contexts frequently required in the medical domain. In this work, we propose ChiMed-GPT, a new benchmark LLM designed explicitly for the Chinese medical domain, with its context length enlarged to 4,096 tokens and trained under a comprehensive regime of pre-training, SFT, and RLHF. Evaluations on real-world tasks including information extraction, question answering, and dialogue generation demonstrate ChiMed-GPT's superior performance over general-domain LLMs. Furthermore, we analyze possible biases by prompting ChiMed-GPT to complete attitude scales regarding discrimination against patients, so as to contribute to the further responsible development of LLMs in the medical domain. The code and model are released at https://github.com/synlp/ChiMed-GPT.

Translated: 2023-11-28 02:25:32 Published: 2023-11-23

# Structured Transforms Across Spaces with Cost-Regularized Optimal Transport

Structured Transforms Across Spaces with Cost-Regularized Optimal Transport ( http://arxiv.org/abs/2311.05788v2 )

License: see the link

Othmane Sebbouh, Marco Cuturi, Gabriel Peyré

Matching a source to a target probability measure is often solved by instantiating a linear optimal transport (OT) problem, parameterized by a ground cost function that quantifies discrepancy between points. When these measures live in the same metric space, the ground cost often defaults to its distance. When instantiated across two different spaces, however, choosing that cost in the absence of aligned data is a conundrum. As a result, practitioners often resort to solving a quadratic Gromov-Wasserstein (GW) problem instead. In this work, we exploit a parallel between GW and cost-regularized OT, the regularized minimization of a linear OT objective parameterized by a ground cost. We use this cost-regularized formulation to match measures across two different Euclidean spaces, where the cost is evaluated between transformed source points and target points. We show that several quadratic OT problems fall into this category, and consider enforcing structure in the linear transform (e.g., sparsity) by introducing structure-inducing regularizers. We provide a proximal algorithm to extract such transforms from unaligned data, and demonstrate its applicability to single-cell spatial transcriptomics/multiomics matching tasks.
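
To make "cost evaluated between transformed source points and target points" concrete, the sketch below fixes a hypothetical linear map `A` from a 3-D source space to a 2-D target space, builds the squared Euclidean cost between `A x` and `y`, and solves the resulting linear OT problem with entropic Sinkhorn iterations. This is an illustration under stated assumptions only: Sinkhorn is a common stand-in solver, not the proximal algorithm described in the abstract, and in the cost-regularized formulation `A` would itself be optimized (with a structure-inducing regularizer), whereas here it is held fixed. The names `transform`, `sinkhorn`, and the value of `eps` are illustrative.

```python
import math

def transform(A, x):
    """Apply a linear map A (given as a list of rows) to a source point x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def sinkhorn(a, b, C, eps=0.05, n_iter=500):
    """Entropic OT plan between source weights a and target weights b for cost C."""
    n, m = len(a), len(b)
    K = [[math.exp(-C[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns so the plan matches both marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Source points in R^3, target points in R^2 (toy data).
xs = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
ys = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # hypothetical map: drops the third coordinate

# Ground cost between transformed source points and target points.
C = [[sum((p - q) ** 2 for p, q in zip(transform(A, x), y)) for y in ys] for x in xs]
a = b = [1 / 3] * 3
P = sinkhorn(a, b, C)
```

With this `A`, each transformed source point coincides with one target point, so the plan concentrates its mass on those three pairs.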

Translated: 2023-11-28 02:24:47 Published: 2023-11-23

# Are "Hierarchical" Visual Representations Hierarchical?

Are "Hierarchical" Visual Representations Hierarchical? ( http://arxiv.org/abs/2311.05784v2 )

License: see the link

Ethan Shen, Ali Farhadi, Aditya Kusupati

Learned visual representations often capture large amounts of semantic information for accurate downstream applications. Human understanding of the world is fundamentally grounded in hierarchy. To mimic this and further improve representation capabilities, the community has explored "hierarchical" visual representations that aim to model the underlying hierarchy of the visual world. In this work, we set out to investigate whether hierarchical visual representations truly capture the human-perceived hierarchy better than standard learned representations. To this end, we create HierNet, a suite of 12 datasets spanning 3 kinds of hierarchy from the BREEDs subset of ImageNet. After extensive evaluation of Hyperbolic and Matryoshka Representations across training setups, we conclude that they do not capture hierarchy any better than the standard representations but can assist in other aspects such as search efficiency and interpretability. Our benchmark and the datasets are open-sourced at https://github.com/ethanlshen/HierNet.

Translated: 2023-11-28 02:24:26 Published: 2023-11-23

# Causal Inference from Text: Unveiling Interactions between Variables

Causal Inference from Text: Unveiling Interactions between Variables ( http://arxiv.org/abs/2311.05286v2 )

License: see the link

Yuxiang Zhou, Yulan He

Adjusting for latent covariates is crucial for estimating causal effects from observational textual data. Most existing methods only account for confounding covariates that affect both treatment and outcome, potentially leading to biased causal effects. This bias arises from insufficient consideration of non-confounding covariates, which are relevant only to either the treatment or the outcome. In this work, we aim to mitigate the bias by unveiling interactions between different variables to disentangle the non-confounding covariates when estimating causal effects from text. The disentangling process ensures covariates only contribute to their respective objectives, enabling independence between variables. Additionally, we impose a constraint to balance representations from the treatment group and control group to alleviate selection bias. We conduct experiments on two different treatment factors under various scenarios, and the proposed model significantly outperforms recent strong baselines. Furthermore, our thorough analysis on earnings call transcripts demonstrates that our model can effectively disentangle the variables, and further investigations into real-world scenarios provide guidance for investors to make informed decisions.
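
The balancing constraint can be illustrated with a simple discrepancy penalty between the mean learned representations of the treatment and control groups. This is a minimal sketch assuming a linear (mean-embedding) MMD penalty; the abstract does not say which balancing term the authors use, and the representation vectors below are hypothetical.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def linear_mmd(treated, control):
    """Squared Euclidean distance between the two groups' mean representations.

    A value of 0 means identical mean embeddings; adding this term to the
    training loss pushes the encoder toward balanced group representations.
    """
    mu_t, mu_c = mean_vector(treated), mean_vector(control)
    return sum((p - q) ** 2 for p, q in zip(mu_t, mu_c))

# Hypothetical 2-D representations of documents in each group.
treated = [[1.0, 2.0], [3.0, 4.0]]
control = [[0.0, 2.0], [2.0, 2.0]]
penalty = linear_mmd(treated, control)  # means (2, 3) vs (1, 2) -> 1 + 1 = 2.0
```

In training, such a penalty would typically be added to the treatment and outcome losses with a weighting coefficient, a design choice the abstract does not specify.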

Translated: 2023-11-28 02:24:11 Published: 2023-11-23

# Towards Open-world Cross-Domain Sequential Recommendation: A Model-Agnostic Contrastive Denoising Approach

Towards Open-world Cross-Domain Sequential Recommendation: A Model-Agnostic Contrastive Denoising Approach ( http://arxiv.org/abs/2311.04760v2 )

License: see the link

Wujiang Xu, Xuying Ning, Wenfang Lin, Mingming Ha, Qiongxu Ma, Qianqiao Liang, Xuewen Tao, Linxun Chen, Bing Han, Minnan Luo

Cross-domain sequential recommendation (CDSR) aims to address the data sparsity problems that exist in traditional sequential recommendation (SR) systems. Existing approaches design specific cross-domain units that transfer and propagate information across multiple domains by relying on overlapping users with abundant behaviors. However, in real-world recommender systems, CDSR scenarios usually consist of a majority of long-tailed users with sparse behaviors and cold-start users who only exist in one domain. This degrades the performance of existing CDSR methods on real-world industrial platforms. Therefore, improving the consistency and effectiveness of models in open-world CDSR scenarios is crucial for constructing CDSR models (1st CH). Recently, some SR approaches have utilized auxiliary behaviors to complement the information for long-tailed users. However, these multi-behavior SR methods cannot deliver promising performance in CDSR, as they overlook the semantic gap between target and auxiliary behaviors, as well as user interest deviation across domains (2nd CH).

Translated: 2023-11-28 02:23:48 Published: 2023-11-23

# ProAgent: From Robotic Process Automation to Agentic Process Automation

ProAgent: From Robotic Process Automation to Agentic Process Automation ( http://arxiv.org/abs/2311.10751v2 )

License: see the link

Yining Ye, Xin Cong, Shizuo Tian, Jiannan Cao, Hao Wang, Yujia Qin, Yaxi Lu, Heyang Yu, Huadong Wang, Yankai Lin, Zhiyuan Liu, Maosong Sun

From ancient water wheels to robotic process automation (RPA), automation technology has evolved throughout history to liberate human beings from arduous tasks. Yet RPA struggles with tasks that need human-like intelligence, especially the elaborate design of workflow construction and dynamic decision-making in workflow execution. As Large Language Models (LLMs) have exhibited human-like intelligence, this paper introduces Agentic Process Automation (APA), a groundbreaking automation paradigm that uses LLM-based agents for advanced automation by offloading the human labor associated with workflow construction and execution to agents. We then instantiate ProAgent, an LLM-based agent designed to craft workflows from human instructions and make intricate decisions by coordinating specialized agents. We conduct empirical experiments that detail its workflow construction and execution procedures, showcasing the feasibility of APA and unveiling the possibility of a new paradigm of automation driven by agents. Our code is public at https://github.com/OpenBMB/ProAgent.

Translated: 2023-11-28 02:15:27 Published: 2023-11-23

# Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers

Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers ( http://arxiv.org/abs/2311.10642v2 )

License: see the link

Vukasin Bozic, Danilo Dordevic, Daniele Coppola, Joseph Thommes, Sidak Pal Singh

This work presents an analysis of the effectiveness of using standard shallow feed-forward networks to mimic the behavior of the attention mechanism in the original Transformer model, a state-of-the-art architecture for sequence-to-sequence tasks. We substitute key elements of the attention mechanism in the Transformer with simple feed-forward networks, trained using the original components via knowledge distillation. Our experiments, conducted on the IWSLT2017 dataset, reveal the capacity of these "attentionless Transformers" to rival the performance of the original architecture. Through rigorous ablation studies, and experimenting with various replacement network types and sizes, we offer insights that support the viability of our approach. This not only sheds light on the adaptability of shallow feed-forward networks in emulating attention mechanisms but also underscores their potential to streamline complex architectures for sequence-to-sequence tasks.
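
The substitution-plus-distillation idea can be sketched in a few lines: a teacher attention block produces outputs, a shallow feed-forward student maps the same input sequence to outputs of the same shape, and the distillation loss measures the gap. Several simplifications here are assumptions, not the paper's setup: the teacher is a single-head self-attention with identity Q/K/V projections, the student reads the whole sequence flattened to a fixed length, the loss is mean squared error, and only the forward pass and loss are shown (gradient updates are omitted).

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Teacher: single-head self-attention with identity Q/K/V projections."""
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out

def shallow_ffn(X, W1, b1, W2, b2):
    """Student: one ReLU hidden layer over the flattened, fixed-length sequence."""
    d = len(X[0])
    flat = [x for row in X for x in row]
    hidden = [max(0.0, sum(w * x for w, x in zip(row, flat)) + b) for row, b in zip(W1, b1)]
    out_flat = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]
    return [out_flat[i:i + d] for i in range(0, len(out_flat), d)]

def distillation_loss(student_out, teacher_out):
    """Mean squared error between student and teacher token outputs."""
    diffs = [(s - t) ** 2
             for s_row, t_row in zip(student_out, teacher_out)
             for s, t in zip(s_row, t_row)]
    return sum(diffs) / len(diffs)

rng = random.Random(0)
seq_len, d_model, n_hidden = 4, 3, 16
X = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(seq_len)]
W1 = [[rng.gauss(0, 0.1) for _ in range(seq_len * d_model)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
W2 = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(seq_len * d_model)]
b2 = [0.0] * (seq_len * d_model)

teacher = self_attention(X)
student = shallow_ffn(X, W1, b1, W2, b2)
loss = distillation_loss(student, teacher)
```

Minimizing `loss` over many input sequences with respect to `W1`, `b1`, `W2`, `b2` would be the distillation step. Because the student's input width is `seq_len * d_model`, this particular sketch only handles sequences of one fixed (padded) length.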

Translated: 2023-11-28 02:15:07 Published: 2023-11-23

# Joint covariance property under geometric image transformations for spatio-temporal receptive fields according to the generalized Gaussian derivative model for visual receptive fields

Joint covariance property under geometric image transformations for spatio-temporal receptive fields according to the generalized Gaussian derivative model for visual receptive fields ( http://arxiv.org/abs/2311.10543v3 )

License: see the link

Tony Lindeberg

The influence of natural image transformations on receptive field responses is crucial for modelling visual operations in computer vision and biological vision. In this regard, covariance properties with respect to geometric image transformations in the earliest layers of the visual hierarchy are essential for expressing robust image operations and for formulating invariant visual operations at higher levels. This paper defines and proves a joint covariance property under compositions of spatial scaling transformations, spatial affine transformations, Galilean transformations and temporal scaling transformations, which makes it possible to characterize how different types of image transformations interact with each other. Specifically, the derived relations show how the receptive field parameters need to be transformed, in order to match the output from spatio-temporal receptive fields with the underlying spatio-temporal image transformations.
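
As a worked illustration of what matching the receptive field parameters means, the relations below use our own notation rather than necessarily the paper's. They assume the zero-order (undifferentiated) kernel $T(x, t; s, \Sigma, \tau, v) = g(x - v t;\, s\Sigma)\, h(t;\, \tau)$, with a Gaussian spatial kernel $g$ of covariance $s\Sigma$ and a Gaussian temporal kernel $h$ of variance $\tau$. For an image transformation composing a Galilean motion $u$, an affine map $A$, a spatial scaling $S_x$ and a temporal scaling $S_t$, a change of variables in the convolution integral gives the parameter mapping below; the third line is the identity that makes the velocity term consistent. Derivative-based receptive fields need additional scale-normalization factors, and time-causal temporal kernels may need separate treatment; both are outside this sketch.

```latex
\begin{aligned}
x' &= S_x A\,(x + u\,t), \qquad t' = S_t\,t,\\
s' &= S_x^2\, s, \qquad \Sigma' = A\,\Sigma\,A^{\mathsf T}, \qquad \tau' = S_t^2\,\tau, \qquad v' = \frac{S_x}{S_t}\,A\,(v + u),\\
x' - v'\,t' &= S_x A\,(x - v\,t),\\
L'(x', t';\, s', \Sigma', \tau', v') &= L(x, t;\, s, \Sigma, \tau, v).
\end{aligned}
```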

Translated: 2023-11-28 02:14:47 Published: 2023-11-23

# Inherently Interpretable Time Series Classification via Multiple Instance Learning

Inherently Interpretable Time Series Classification via Multiple Instance Learning ( http://arxiv.org/abs/2311.10049v2 )

License: see the link

Joseph Early, Gavin KC Cheung, Kurt Cutajar, Hanting Xie, Jas Kandola, Niall Twomey

Conventional Time Series Classification (TSC) methods are often black boxes that obscure inherent interpretation of their decision-making processes. In this work, we leverage Multiple Instance Learning (MIL) to overcome this issue, and propose a new framework called MILLET: Multiple Instance Learning for Locally Explainable Time series classification. We apply MILLET to existing deep learning TSC models and show how they become inherently interpretable without compromising (and in some cases, even improving) predictive performance. We evaluate MILLET on 85 UCR TSC datasets and also present a novel synthetic dataset that is specially designed to facilitate interpretability evaluation. On these datasets, we show MILLET produces sparse explanations quickly that are of higher quality than other well-known interpretability methods. To the best of our knowledge, our work with MILLET, which is available on GitHub (https://github.com/JAEarly/MILTimeSeriesClassification), is the first to develop general MIL methods for TSC and apply them to an extensive variety of domains.
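
The MIL view can be made concrete with a toy example: each time step is treated as an instance with its own class scores, the whole series is the bag, the bag prediction is pooled from the instance scores, and the per-time-step scores for the predicted class serve as the explanation. The sketch assumes simple mean pooling over instance predictions; the abstract does not say which pooling MILLET uses, and the scores below are hypothetical outputs of a backbone.

```python
def classify_series(instance_scores):
    """Mean-pool per-time-step class scores into a bag-level prediction.

    instance_scores[t][c] is the score a (hypothetical) backbone assigns to
    class c at time step t. Returns the predicted class, the bag scores, and
    the per-time-step contributions to the predicted class (the explanation).
    """
    n_steps = len(instance_scores)
    n_classes = len(instance_scores[0])
    bag = [sum(step[c] for step in instance_scores) / n_steps for c in range(n_classes)]
    predicted = max(range(n_classes), key=lambda c: bag[c])
    explanation = [step[predicted] for step in instance_scores]
    return predicted, bag, explanation

scores = [
    [0.1, 0.0],  # t = 0
    [0.2, 0.1],  # t = 1
    [0.0, 2.5],  # t = 2: a strongly class-1 pattern
    [0.1, 0.0],  # t = 3
]
pred, bag, expl = classify_series(scores)
# pred == 1, and expl peaks at t = 2, pointing to the discriminative region.
```

Because the bag score in this sketch is an average of instance scores, each time step's contribution can be read off directly rather than estimated post hoc.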

Translated: 2023-11-28 02:13:28 Published: 2023-11-23

# Redefining Super-Resolution: Fine-mesh PDE predictions without classical simulations

Redefining Super-Resolution: Fine-mesh PDE predictions without classical simulations ( http://arxiv.org/abs/2311.09740v2 )

License: see the link

Rajat Kumar Sarkar, Ritam Majumdar, Vishal Jadhav, Sagar Srinivas Sakhinana, Venkataramana Runkana

In Computational Fluid Dynamics (CFD), coarse mesh simulations offer computational efficiency but often lack precision. Applying conventional super-resolution to these simulations poses a significant challenge due to the fundamental contrast between downsampling high-resolution images and authentically emulating low-resolution physics. The former method conserves more of the underlying physics, surpassing the usual constraints of real-world scenarios. We propose a novel definition of super-resolution tailored for PDE-based problems. Instead of simply downsampling from a high-resolution dataset, we use coarse-grid simulated data as our input and predict fine-grid simulated outcomes. Employing a physics-infused UNet upscaling method, we demonstrate its efficacy across various 2D CFD problems such as discontinuity detection in Burgers' equation, methane combustion, and fouling in industrial heat exchangers. Our method enables the generation of fine-mesh solutions bypassing traditional simulation, ensuring considerable computational savings and fidelity to the original ground-truth outcomes. Through diverse boundary conditions during training, we further establish the robustness of our method, paving the way for its broad applications in engineering and scientific CFD solvers.
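
The difference between the two kinds of input (downsampled fine-grid data versus a genuine coarse-grid simulation) already shows up on a toy problem. The sketch below uses the 1D periodic heat equation, a stand-in chosen for brevity rather than one of the paper's test cases, solved with an explicit finite-difference scheme on 64-point and 16-point grids. The initial condition contains a mode the coarse grid cannot resolve, so the coarse simulation (the model input under the proposed definition) differs noticeably from the downsampled fine simulation (the input under conventional super-resolution). All parameter values are illustrative.

```python
import math

def initial(x):
    # Smooth mode plus a high-frequency mode that a 16-point grid cannot resolve.
    return math.sin(2 * math.pi * x) + 0.5 * math.sin(2 * math.pi * 12 * x)

def simulate_heat(n, nu=0.01, t_end=0.1, r_max=0.4):
    """Explicit FTCS scheme for u_t = nu * u_xx on [0, 1) with periodic boundaries."""
    dx = 1.0 / n
    steps = math.ceil(t_end / (r_max * dx * dx / nu))  # keeps nu*dt/dx^2 <= r_max (stable)
    dt = t_end / steps
    r = nu * dt / (dx * dx)
    u = [initial(i * dx) for i in range(n)]
    for _ in range(steps):
        # u[i - 1] wraps around to u[-1] at i = 0, giving the periodic boundary.
        u = [u[i] + r * (u[(i + 1) % n] - 2 * u[i] + u[i - 1]) for i in range(n)]
    return u

fine = simulate_heat(64)
coarse_sim = simulate_heat(16)   # input under the PDE-specific definition
downsampled = fine[::4]          # input under conventional super-resolution
gap = max(abs(a - b) for a, b in zip(coarse_sim, downsampled))
```

On the fine grid the high-frequency mode decays almost completely, while on the coarse grid it aliases to a lower mode that decays much more slowly, so `gap` stays well above zero.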

Translated: 2023-11-28 02:13:09 Published: 2023-11-23

# An Extensive Study on Adversarial Attack against Pre-trained Models of Code

An Extensive Study on Adversarial Attack against Pre-trained Models of Code ( http://arxiv.org/abs/2311.07553v2 )

License: see the link

Xiaohu Du, Ming Wen, Zichao Wei, Shangwen Wang, Hai Jin

Transformer-based pre-trained models of code (PTMC) have been widely utilized and have achieved state-of-the-art performance in many mission-critical applications. However, they can be vulnerable to adversarial attacks through identifier substitution or coding style transformation, which can significantly degrade accuracy and may further incur security concerns. Although several approaches have been proposed to generate adversarial examples for PTMC, the effectiveness and efficiency of such approaches, especially on different code intelligence tasks, are not well understood. To bridge this gap, this study systematically analyzes five state-of-the-art adversarial attack approaches from three perspectives: effectiveness, efficiency, and the quality of generated examples. The results show that none of the five approaches balances all these perspectives. In particular, approaches with a high attack success rate tend to be time-consuming, and the adversarial code they generate often lacks naturalness, and vice versa. To address this limitation, we explore the impact of perturbing identifiers under different contexts and find that identifier substitution within `for` and `if` statements is the most effective. Based on these findings, we propose a new approach that prioritizes different types of statements for various tasks and further utilizes beam search to generate adversarial examples. Evaluation results show that it outperforms the state-of-the-art ALERT in terms of both effectiveness and efficiency while preserving the naturalness of the generated adversarial examples.
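
The combination of statement prioritization and beam search can be sketched on a toy snippet. In this illustration, identifiers that appear in `for`/`if` statements are tried first, each step renames one identifier to one of a few candidate names, and the beam keeps the variants with the lowest victim confidence. Everything model-related is hypothetical: `toy_victim` stands in for a real PTMC's confidence in the true label, the candidate names are arbitrary, and a fixed priority order replaces the paper's task-specific prioritization.

```python
import re

def uses_identifier(text, name):
    """True if name occurs in text as a whole word."""
    return re.search(rf"\b{re.escape(name)}\b", text) is not None

def statement_priority(code, name):
    """0 if the identifier appears in a for/if statement, else 1 (lower = tried first)."""
    for line in code.splitlines():
        stripped = line.strip()
        if stripped.startswith(("for ", "if ")) and uses_identifier(stripped, name):
            return 0
    return 1

def rename(code, old, new):
    """Replace whole-word occurrences of identifier old with new."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

def beam_attack(code, identifiers, candidates, victim_confidence, beam_width=2):
    """Beam search over identifier substitutions that minimizes victim confidence."""
    order = sorted(identifiers, key=lambda name: statement_priority(code, name))
    beam = [(victim_confidence(code), code)]
    for name in order:
        expanded = list(beam)  # leaving an identifier unchanged is also allowed
        for _, snippet in beam:
            for new in candidates:
                if uses_identifier(snippet, new):  # skip names that would collide
                    continue
                variant = rename(snippet, name, new)
                expanded.append((victim_confidence(variant), variant))
        beam = sorted(expanded, key=lambda pair: pair[0])[:beam_width]
    return beam[0]

def toy_victim(snippet):
    """Hypothetical victim: confident in the true label while 'total' is present."""
    return 0.9 if uses_identifier(snippet, "total") else 0.2

source = "total = 0\nfor item in items:\n    total += item\n"
best_conf, best_code = beam_attack(source, ["total", "item"], ["acc", "val"], toy_victim)
```

Here `item` is explored first because it appears in the `for` statement; the collision check keeps renamed programs semantically equivalent.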

Translated: 2023-11-28 02:11:10 Published: 2023-11-23

# Mobile-Seed: Joint Semantic Segmentation and Boundary Detection for Mobile Robots

Mobile-Seed: Joint Semantic Segmentation and Boundary Detection for Mobile Robots ( http://arxiv.org/abs/2311.12651v2 )

License: see the link

Youqi Liao, Shuhao Kang, Jianping Li, Yang Liu, Yun Liu, Zhen Dong, Bisheng Yang, Xieyuanli Chen

Precise and rapid delineation of sharp boundaries and robust semantics is essential for numerous downstream robotic tasks, such as robot grasping and manipulation, real-time semantic mapping, and online sensor calibration performed on edge computing units. Although boundary detection and semantic segmentation are complementary tasks, most studies focus on lightweight models for semantic segmentation but overlook the critical role of boundary detection. In this work, we introduce Mobile-Seed, a lightweight, dual-task framework tailored for simultaneous semantic segmentation and boundary detection. Our framework features a two-stream encoder, an active fusion decoder (AFD) and a dual-task regularization approach. The encoder is divided into two pathways: one captures category-aware semantic information, while the other discerns boundaries from multi-scale features. The AFD module dynamically adapts the fusion of semantic and boundary information by learning channel-wise relationships, allowing for precise weight assignment of each channel. Furthermore, we introduce a regularization loss to mitigate the conflicts in dual-task learning and deep diversity supervision. Compared to existing methods, the proposed Mobile-Seed offers a lightweight framework to simultaneously improve semantic segmentation performance and accurately locate object boundaries. Experiments on the Cityscapes dataset have shown that Mobile-Seed achieves notable improvement over the state-of-the-art (SOTA) baseline by 2.2 percentage points (pp) in mIoU and 4.2 pp in mF-score, while maintaining an online inference speed of 23.9 frames-per-second (FPS) with 1024x2048 resolution input on an RTX 2080 Ti GPU. Additional experiments on CamVid and PASCAL Context datasets confirm our method's generalizability. Code and additional results are publicly available at https://whu-usi3dv.github.io/Mobile-Seed/.
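
Learning channel-wise fusion weights, as the AFD module is described to do, can be illustrated as a gating step: pool each channel of the semantic and boundary feature maps, turn the pooled statistics into a per-channel weight in (0, 1) with a sigmoid, and blend the two streams with that weight. This is an assumption-laden illustration, not Mobile-Seed's actual decoder; the gate parameters `w_sem`, `w_bnd`, and `bias` are hypothetical and would be learned in practice.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def global_avg_pool(channel):
    """Average value of a 2-D feature map given as a list of rows."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)

def channel_gated_fusion(sem, bnd, w_sem, w_bnd, bias):
    """Fuse semantic and boundary features channel by channel.

    sem, bnd: lists of C channels, each an H x W list of rows. For channel c,
    g_c = sigmoid(w_sem[c] * pool(sem[c]) + w_bnd[c] * pool(bnd[c]) + bias[c]),
    and the fused map is g_c * sem[c] + (1 - g_c) * bnd[c].
    """
    fused, gates = [], []
    for c in range(len(sem)):
        g = sigmoid(w_sem[c] * global_avg_pool(sem[c])
                    + w_bnd[c] * global_avg_pool(bnd[c]) + bias[c])
        gates.append(g)
        fused.append([[g * s + (1 - g) * b for s, b in zip(row_s, row_b)]
                      for row_s, row_b in zip(sem[c], bnd[c])])
    return fused, gates

sem = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
bnd = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]
fused, gates = channel_gated_fusion(sem, bnd, w_sem=[1.0, 0.0], w_bnd=[0.0, 0.0], bias=[0.0, 0.0])
```

With these toy parameters, channel 0 leans toward the semantic stream (gate about 0.73) and channel 1 weights both streams equally (gate 0.5).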

Translated: 2023-11-28 02:03:23 Published: 2023-11-23

# Summary of the DISPLACE Challenge 2023 -- DIarization of SPeaker and LAnguage in Conversational Environments

Summary of the DISPLACE Challenge 2023 -- DIarization of SPeaker and LAnguage in Conversational Environments ( http://arxiv.org/abs/2311.12564v2 )

License: see the link

Shikha Baghel, Shreyas Ramoji, Somil Jain, Pratik Roy Chowdhuri, Prachi Singh, Deepu Vijayasenan, Sriram Ganapathy

In multilingual societies, where multiple languages are spoken within a small geographic area, informal conversations often involve a mix of languages. Existing speech technologies may be inefficient at extracting information from such conversations, where the speech data is rich in diversity, with multiple languages and speakers. The DISPLACE (DIarization of SPeaker and LAnguage in Conversational Environments) challenge constitutes an open call for evaluating and benchmarking speaker and language diarization technologies under this challenging condition. The challenge entailed two tracks: Track-1 focused on speaker diarization (SD) in multilingual situations, while Track-2 addressed language diarization (LD) in a multi-speaker scenario. Both tracks were evaluated using the same underlying audio data. To facilitate this evaluation, a real-world dataset featuring multilingual, multi-speaker conversational far-field speech was recorded and distributed. Furthermore, a baseline system mimicking the state of the art was made available for both the SD and LD tasks. The challenge garnered a total of 42 worldwide registrations and received a total of 19 combined submissions for Track-1 and Track-2. This paper describes the challenge, the datasets, the tasks, and the baseline system. Additionally, the paper provides a concise overview of the submitted systems in both tracks, with an emphasis on the top-performing systems. The paper also presents insights and future perspectives for the SD and LD tasks, focusing on the key challenges that systems need to overcome before widespread commercial deployment on such conversations.

Translated: 2023-11-28 02:02:26 Published: 2023-11-23

# "HoVer-UNet": Accelerating HoVerNet with UNet-based multi-class nuclei segmentation via knowledge distillation

"HoVer-UNet": Accelerating HoVerNet with UNet-based multi-class nuclei segmentation via knowledge distillation ( http://arxiv.org/abs/2311.12553v2 )

License: see the link

Cristian Tommasino, Cristiano Russo, Antonio Maria Rinaldi, Francesco Ciompi

We present "HoVer-UNet", an approach to distill the knowledge of the multi-branch HoVerNet framework for nuclei instance segmentation and classification in histopathology. We propose a compact, streamlined single UNet network with a Mix Vision Transformer backbone, and equip it with a custom loss function to optimally encode the distilled knowledge of HoVerNet, reducing computational requirements without compromising performance. We show that our model achieves results comparable to HoVerNet on the public PanNuke and CoNSeP datasets with a three-fold reduction in inference time. We make the code of our model publicly available at https://github.com/DIAGNijmegen/HoVer-UNet.
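
The abstract mentions a custom loss for encoding HoVerNet's knowledge but does not give its form. As a generic stand-in, the sketch below shows the standard soft-target distillation loss for a single pixel's class distribution: a weighted sum of cross-entropy with the ground-truth label and temperature-scaled KL divergence from the teacher's softened distribution. The weights `alpha` and `temperature` and the three-class logits are hypothetical; this is not the HoVer-UNet loss.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    hard = -math.log(softmax(student_logits)[label])
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft

# One pixel, three hypothetical nucleus classes; the teacher favors class 0.
loss_match = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0], label=0)
loss_mismatch = distillation_loss([-1.0, 0.5, 2.0], [2.0, 0.5, -1.0], label=0)
```

For a segmentation network, a per-pixel term like this would typically be averaged over all pixels of the output map.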

Translated: 2023-11-28 02:01:56 Published: 2023-11-23

# PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction

PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction ( http://arxiv.org/abs/2311.12024v2 )

License: see the link

Peng Wang, Hao Tan, Sai Bi, Yinghao Xu, Fujun Luan, Kalyan Sunkavalli, Wenping Wang, Zexiang Xu, Kai Zhang

We propose a Pose-Free Large Reconstruction Model (PF-LRM) for reconstructing a 3D object from a few unposed images even with little visual overlap, while simultaneously estimating the relative camera poses in ~1.3 seconds on a single A100 GPU. PF-LRM is a highly scalable method utilizing the self-attention blocks to exchange information between 3D object tokens and 2D image tokens; we predict a coarse point cloud for each view, and then use a differentiable Perspective-n-Point (PnP) solver to obtain camera poses. When trained on a huge amount of multi-view posed data of ~1M objects, PF-LRM shows strong cross-dataset generalization ability, and outperforms baseline methods by a large margin in terms of pose prediction accuracy and 3D reconstruction quality on various unseen evaluation datasets. We also demonstrate our model's applicability in downstream text/image-to-3D tasks with fast feed-forward inference. Our project website is at: https://totoro97.github.io/pf-lrm .
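The pose step solves a Perspective-n-Point problem: recover the camera rotation and translation from 3D points and their 2D projections. The sketch below is a plain, non-differentiable Direct Linear Transform (DLT) solver in NumPy that shows what that problem looks like. It assumes normalized image coordinates (identity intrinsics) and noise-free correspondences. PF-LRM's differentiable PnP solver is a different, more robust method, and `dlt_pnp` is a hypothetical name.

```python
import numpy as np

def dlt_pnp(points_3d, points_2d):
    """Recover (R, t) with x ~ [R | t] X from >= 6 non-coplanar correspondences.

    points_3d: (N, 3) world points; points_2d: (N, 2) normalized image points.
    """
    rows = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        Xh = [X, Y, Z, 1.0]
        # Each correspondence gives two linear equations in the 12 entries of P
        rows.append(Xh + [0.0] * 4 + [-u * c for c in Xh])
        rows.append([0.0] * 4 + Xh + [-v * c for c in Xh])
    A = np.asarray(rows)
    # P (up to scale) is the right singular vector with the smallest singular value
    _, _, vt = np.linalg.svd(A)
    P = vt[-1].reshape(3, 4)
    M = P[:, :3]
    # P = s [R | t] with det(R) = 1, so s is the real cube root of det(M)
    s = np.cbrt(np.linalg.det(M))
    # Project M / s onto the nearest rotation matrix
    U, _, Vt = np.linalg.svd(M / s)
    R = U @ Vt
    t = P[:, 3] / s
    return R, t
```

With noisy correspondences, one would normally normalize the coordinates first and then refine the DLT estimate by nonlinear least squares. A learned pipeline would also need the whole step to be differentiable.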

翻訳日:2023-11-28 02:00:44 公開日:2023-11-23

# ダイヤモンドスズ空孔中心の電荷状態と光遷移周波数のヘラルド付き初期化

Heralded initialization of charge state and optical transition frequency of diamond tin-vacancy centers ( http://arxiv.org/abs/2311.11962v3 )

ライセンス: Link先を確認

Julia M. Brevoord, Lorenzo De Santis, Takashi Yamamoto, Matteo Pasini, Nina Codreanu, Tim Turan, Hans K. C. Beukers, Christopher Waas, Ronald Hanson

(参考訳) ダイヤモンド中のスズ空孔中心は,量子情報科学・技術の有望なプラットフォームとして登場した。より複雑な量子実験やスケーラブルな応用で利用するうえでの重要な課題は,中心を所望の電荷状態に,かつ光遷移があらかじめ定めた周波数にある状態に準備する能力である。本稿では,レーザー励起,光子検出,リアルタイム論理を組み合わせることで,このような準備の成功をヘラルド(予告)する手法について報告する。まず,最適化された共鳴プローブパルス中に収集した蛍光光子数が,その後の電荷状態および光遷移周波数と強く相関することを示し,閾値光子計数によって所望の状態をリアルタイムにヘラルドできることを示す。次に,このヘラルド手法を実装し,光ルミネッセンス励起測定,コヒーレントな光駆動,および光ラムゼー実験に適用したところ,閾値を高くするほど光コヒーレンスが大きく改善することを見出した。最後に,準備された光遷移周波数が不均一線幅の範囲でプローブレーザーに追従することを実証し,複数の均一線幅にわたる遷移周波数のチューニングを可能にする。

Diamond Tin-Vacancy centers have emerged as a promising platform for quantum information science and technology. A key challenge for their use in more complex quantum experiments and scalable applications is the ability to prepare the center in the desired charge state with the optical transition at a pre-defined frequency. Here we report on heralding such successful preparation using a combination of laser excitation, photon detection, and real-time logic. We first show that fluorescence photon counts collected during an optimized resonant probe pulse strongly correlate with the subsequent charge state and optical transition frequency, enabling real-time heralding of the desired state through threshold photon counting. We then implement and apply this heralding technique to photoluminescence excitation measurements, coherent optical driving, and an optical Ramsey experiment, finding strongly improved optical coherence with increasing threshold. Finally, we demonstrate that the prepared optical frequency follows the probe laser across the inhomogeneous linewidth, enabling tuning of the transition frequency over multiple homogeneous linewidths.
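A toy model shows the trade-off behind threshold photon counting. Suppose the photon counts during the probe pulse are Poissonian, with a high mean when the center is in the desired charge state and on resonance, and a low mean otherwise. Accepting only runs with at least `k` counts then raises the probability that a heralded center is in the desired state, but fewer runs are accepted. The prior and the mean counts below are made-up numbers, not values from the paper, and `herald_stats` is a hypothetical helper.

```python
import math

def poisson_tail(mean, k):
    """P(N >= k) for N ~ Poisson(mean)."""
    return 1.0 - sum(math.exp(-mean) * mean ** j / math.factorial(j) for j in range(k))

def herald_stats(k, p_good=0.5, mean_good=3.0, mean_bad=0.2):
    """Success probability and heralded fidelity for photon-count threshold k.

    p_good: hypothetical prior probability of the desired state.
    mean_good / mean_bad: hypothetical mean photon counts per probe pulse.
    """
    accept_good = p_good * poisson_tail(mean_good, k)
    accept_bad = (1 - p_good) * poisson_tail(mean_bad, k)
    success = accept_good + accept_bad
    fidelity = accept_good / success
    return success, fidelity

for k in range(1, 5):
    success, fidelity = herald_stats(k)
    print(f"threshold={k}: success={success:.3f}, fidelity={fidelity:.5f}")
```

Raising the threshold trades success rate for fidelity. This mirrors, qualitatively, the improvement in optical coherence with increasing threshold that the abstract reports.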

翻訳日:2023-11-28 02:00:09 公開日:2023-11-23

# MUVO: 幾何表現を用いた自動運転のためのマルチモーダル生成世界モデル

MUVO: A Multimodal Generative World Model for Autonomous Driving with Geometric Representations ( http://arxiv.org/abs/2311.11762v2 )

ライセンス: Link先を確認

Daniel Bogdoll, Yitian Yang, J. Marius Zöllner

(参考訳) 自動運転のための教師なし世界モデルの学習は、今日のシステムの推論能力を大幅に向上させる可能性がある。しかし、ほとんどの研究は世界の物理的特性を無視し、センサーデータのみに焦点を当てている。本稿では、この課題に対処するため、幾何学的ボクセル表現を持つマルチモーダル世界モデルであるMUVOを提案する。生のカメラデータとLiDARデータを用いて、計画などの下流タスクで直接利用できる、センサーに依存しない世界の幾何学的表現を学習する。マルチモーダルな将来予測を実証し、この幾何学的表現によってカメラ画像とLiDAR点群の両方の予測品質が向上することを示す。

Learning unsupervised world models for autonomous driving has the potential to improve the reasoning capabilities of today's systems dramatically. However, most work neglects the physical attributes of the world and focuses on sensor data alone. We propose MUVO, a MUltimodal World Model with Geometric VOxel Representations to address this challenge. We utilize raw camera and lidar data to learn a sensor-agnostic geometric representation of the world, which can directly be used by downstream tasks, such as planning. We demonstrate multimodal future predictions and show that our geometric representation improves the prediction quality of both camera images and lidar point clouds.
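A geometric voxel representation can be made concrete by binning the points of a lidar sweep into a regular 3D grid. The sketch below returns the set of occupied voxel indices, which is a sparse occupancy grid. The grid origin, voxel size, and dimensions are arbitrary assumptions. MUVO learns its voxel representation from camera and lidar inputs and does not compute it this way; the code only illustrates the data structure.

```python
def voxelize(points, origin=(-10.0, -10.0, -2.0), voxel_size=0.5, dims=(40, 40, 8)):
    """Return the set of occupied voxel indices (i, j, k) for a point cloud.

    points: iterable of (x, y, z) in metres; points outside the grid are dropped.
    origin, voxel_size, dims: hypothetical grid parameters.
    """
    occupied = set()
    for x, y, z in points:
        # Integer voxel coordinates relative to the grid origin
        idx = tuple(int((c - o) // voxel_size) for c, o in zip((x, y, z), origin))
        if all(0 <= i < d for i, d in zip(idx, dims)):
            occupied.add(idx)
    return occupied

# Two nearby points share a voxel; the far-away point falls outside the grid
print(voxelize([(0.1, 0.2, 0.0), (0.3, 0.4, 0.1), (100.0, 0.0, 0.0)]))
```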

翻訳日:2023-11-28 01:59:34 公開日:2023-11-23

# 抽出型対話要約のためのLLM支援半教師あり学習

LLM aided semi-supervision for Extractive Dialog Summarization ( http://arxiv.org/abs/2311.11462v2 )

ライセンス: Link先を確認

Nishant Mishra, Gaurav Sahu, Iacer Calixto, Ameen Abu-Hanna, Issam H. Laradji

(参考訳) チャット対話の高品質な要約を生成するには、多くの場合、大規模なラベル付きデータセットが必要となる。本研究では、カスタマーとエージェント間の対話の抽出型要約において、ラベルなしデータを効率的に活用する手法を提案する。本手法では、要約を質問応答問題として定式化し、最先端の大規模言語モデル(LLM)を用いて対話の擬似ラベルを生成する。次に、これらの擬似ラベルを用いてチャット要約モデルをファインチューニングし、大規模なLLMの知識をより小さな特化モデルに効果的に転移する。TweetSummデータセットで本手法を評価したところ、元のラベル付きデータセットの10%のみを用いて65.9/57.0/61.0 ROUGE-1/-2/-Lを達成した。一方、学習データセット全体で学習した現在の最先端手法は65.16/55.81/64.37 ROUGE-1/-2/-Lである。言い換えれば、最悪の場合(ROUGE-L)でも、データの10%のみを使用しながら性能の94.7%を維持している。

Generating high-quality summaries for chat dialogs often requires large labeled datasets. We propose a method to efficiently use unlabeled data for extractive summarization of customer-agent dialogs. In our method, we frame summarization as a question-answering problem and use state-of-the-art large language models (LLMs) to generate pseudo-labels for a dialog. We then use these pseudo-labels to fine-tune a chat summarization model, effectively transferring knowledge from the large LLM into a smaller specialized model. We demonstrate our method on the TweetSumm dataset, and show that using 10% of the original labelled data set we can achieve 65.9/57.0/61.0 ROUGE-1/-2/-L, whereas the current state-of-the-art trained on the entire training data set obtains 65.16/55.81/64.37 ROUGE-1/-2/-L. In other words, in the worst case (i.e., ROUGE-L) we still effectively retain 94.7% of the performance while using only 10% of the data.
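The 94.7% figure is the ratio between the worst-case metric of the 10%-data model and that of the full-data state of the art. The short check below recomputes the per-metric ratios from the numbers quoted in the abstract. ROUGE-1 and ROUGE-2 actually exceed the baseline. The unrounded ROUGE-L ratio is about 94.76%, which the abstract reports as 94.7%.

```python
# ROUGE-1 / ROUGE-2 / ROUGE-L as quoted in the abstract
ours_10pct = {"ROUGE-1": 65.9, "ROUGE-2": 57.0, "ROUGE-L": 61.0}
sota_full = {"ROUGE-1": 65.16, "ROUGE-2": 55.81, "ROUGE-L": 64.37}

# Fraction of the full-data score retained by the 10%-data model, per metric
retained = {m: ours_10pct[m] / sota_full[m] for m in ours_10pct}
worst_metric = min(retained, key=retained.get)

for metric, ratio in retained.items():
    print(f"{metric}: {ratio:.2%}")
print("worst case:", worst_metric, f"{retained[worst_metric]:.2%}")
```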

翻訳日:2023-11-28 01:59:23 公開日:2023-11-23

# BrainZ-BP: 脳バイオインピーダンスと心電図を活用した非侵襲的カフレス血圧推定法

BrainZ-BP: A Non-invasive Cuff-less Blood Pressure Estimation Approach Leveraging Brain Bio-impedance and Electrocardiogram ( http://arxiv.org/abs/2311.10996v2 )

ライセンス: Link先を確認

Bufang Yang, Le Liu, Wenxuan Wu, Mengliang Zhou, Hongxing Liu, Xinbao Ning

(参考訳) 正確かつ連続的な血圧(BP)モニタリングは,心血管疾患の早期予防に不可欠である。近年,非侵襲的かつカフレスなBP推定アルゴリズムが大きな注目を集めている。これまでの研究では,脳バイオインピーダンス(BIOZ)が非侵襲的な頭蓋内圧(ICP)モニタリングの有望な技術であることが示されている。臨床的には,外傷性脳損傷(TBI)患者の治療にはICPとBPを同時にモニタリングする必要がある。脳BIOZから直接BPを推定できれば,患者に装着するセンサーの数を減らし,快適性を向上させることができる。そこで本研究では,脳BIOZをBP推定に活用する実現可能性を検討し,BrainZ-BPと呼ぶ新しいカフレスBP推定手法を提案する。脳BIOZ測定のため,2つの電極を前後方向に額と後頭骨に配置する。脈波伝播時間や脳BIOZの形態的特徴などの様々な特徴を抽出し,BP推定のための4つの回帰モデルに入力した。その結果,ランダムフォレスト回帰モデルの平均絶対誤差,二乗平均平方根誤差,相関係数は,収縮期血圧の推定で2.17 mmHg,3.91 mmHg,0.90,拡張期血圧の推定で1.71 mmHg,3.02 mmHg,0.89であった。提案するBrainZ-BPは,脳BIOZに基づくICPモニタリングのシナリオに適用し,BPを同時にモニタリングすることができる。

Accurate and continuous blood pressure (BP) monitoring is essential to the early prevention of cardiovascular diseases. Non-invasive and cuff-less BP estimation algorithms have gained much attention in recent years. Previous studies have demonstrated that brain bio-impedance (BIOZ) is a promising technique for non-invasive intracranial pressure (ICP) monitoring. Clinically, treatment for patients with traumatic brain injuries (TBI) requires monitoring the ICP and BP of patients simultaneously. Estimating BP by brain BIOZ directly can reduce the number of sensors attached to the patients, thus improving their comfort. To address these issues, in this study, we explore the feasibility of leveraging brain BIOZ for BP estimation and propose a novel cuff-less BP estimation approach called BrainZ-BP. Two electrodes are placed on the forehead and occipital bone of the head in the anterior-posterior direction for brain BIOZ measurement. Various features, including pulse transit time and morphological features of brain BIOZ, are extracted and fed into four regression models for BP estimation. Results show that the mean absolute error, root mean square error, and correlation coefficient of the random forest regression model are 2.17 mmHg, 3.91 mmHg, and 0.90 for systolic pressure estimation, and 1.71 mmHg, 3.02 mmHg, and 0.89 for diastolic pressure estimation. The presented BrainZ-BP can be applied in brain BIOZ-based ICP monitoring scenarios to monitor BP simultaneously.
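The abstract reports three evaluation metrics: mean absolute error, root mean square error, and the Pearson correlation coefficient. The sketch below computes them with the standard library only. The blood-pressure values in the usage example are made up for illustration and are not data from the paper.

```python
import math

def bp_metrics(y_true, y_pred):
    """Return (MAE, RMSE, Pearson r) for paired BP readings in mmHg."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    r = cov / math.sqrt(var_t * var_p)
    return mae, rmse, r

# Hypothetical systolic readings: reference cuff vs. model estimate
mae, rmse, r = bp_metrics([120, 130, 110, 140], [122, 128, 113, 137])
print(f"MAE={mae:.2f} mmHg, RMSE={rmse:.2f} mmHg, r={r:.2f}")
```

RMSE is always at least as large as MAE. This matches the reported pairs (2.17 vs. 3.91 and 1.71 vs. 3.02 mmHg), where the gap reflects occasional larger errors.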

翻訳日:2023-11-28 01:59:03 公開日:2023-11-23

# サル痘検出のための深層学習技術の進歩に関する最近のサーベイ

A Recent Survey of the Advancements in Deep Learning Techniques for Monkeypox Disease Detection ( http://arxiv.org/abs/2311.10754v2 )

ライセンス: Link先を確認

Saddam Hussain Khan, Rashid Iqbal, Saeeda Naz (Artificial Intelligence Lab, Department of Computer Systems Engineering, University of Engineering and Applied Science (UEAS), Swat, Pakistan)

(参考訳) サル痘(Monkeypox、MPox)は、ポックスウイルス科オルソポックスウイルス属に属するMPoxウイルスによって引き起こされる人獣共通感染症であり、アフリカで最初に発見され、2022年半ばに流行地域外で症例が報告されたことで世界的な注目を集めた。症状には頭痛、悪寒、発熱、および天然痘・麻疹・水痘に似た皮膚症状があり、2022年7月にWHOはMPoxを「国際的に懸念される公衆衛生上の緊急事態」として公式に宣言した。従来、WHOは皮膚病変のPCR検査を一次診断の基準としており、治療は対症療法が中心で、重症例にはテコビリマットなどの抗ウイルス薬が用いられる。しかし、病院内での手作業による分析には、医療従事者への大きな負担、限られた施設、医師の不足と疲労、公衆衛生上の緊急事態におけるヒューマンエラーなど、大きな課題がある。そこで本サーベイ論文では、皮膚病変画像からのMPox自動検出のための深層学習(DL)手法を広範かつ効率的に分析する。これらのDL手法は、深層CNN、深層CNNアンサンブル、深層ハイブリッド学習、新たに開発された手法、およびVision Transformerといったカテゴリに大別される。さらに、本研究はDL手法の発展の経緯を体系的に調査し、従来手法の限界を特定して対処するとともに、価値ある貢献と革新を明らかにする。加えて、様々な信頼できる情報源から収集されたベンチマークデータセット、前処理技術、評価指標についても扱う。また、新たな概念を簡潔に取り上げ、研究のギャップ、限界、応用を特定し、診断プロセスにおける課題を概説する。本サーベイは、DLの革新的なアイデアが期待される領域について貴重な洞察を提供し、研究者の道しるべとなることが期待される。

Monkeypox (MPox) is a zoonotic infectious disease caused by the MPox virus, a member of the Orthopoxvirus genus of the Poxviridae family. It was initially discovered in Africa and gained global attention in mid-2022, when cases were reported outside endemic areas. Symptoms include headaches, chills, fever, and smallpox-, measles-, and chickenpox-like skin manifestations, and in July 2022 the WHO officially declared MPox a Public Health Emergency of International Concern. Traditionally, the WHO considers PCR testing of skin lesions the benchmark for primary diagnosis, with symptom management as the primary treatment and antiviral drugs such as tecovirimat for severe cases. However, manual analysis within hospitals poses substantial challenges, including the heavy burden on healthcare professionals, limited facilities, the availability and fatigue of doctors, and human error during public health emergencies. Therefore, this survey paper provides an extensive and efficient analysis of deep learning (DL) methods for the automatic detection of MPox in skin lesion images. These DL techniques are broadly grouped into categories including deep CNNs, deep CNN ensembles, deep hybrid learning, newly developed methods, and Vision Transformers for diagnosing MPox. Moreover, this study offers a systematic exploration of the evolutionary progression of DL techniques, identifies and addresses limitations in previous methods, and highlights their valuable contributions and innovations. Additionally, the paper covers benchmark datasets and their collection from various authentic sources, pre-processing techniques, and evaluation metrics. The survey also briefly delves into emerging concepts, identifies research gaps, limitations, and applications, and outlines challenges in the diagnosis process. This survey furnishes valuable insights into promising directions for innovative DL ideas and is anticipated to serve as a guide for researchers.

翻訳日:2023-11-28 01:58:19 公開日:2023-11-23

# プロンプト誘導マルチモーダルTransformerによる結晶材料の状態密度予測

Density of States Prediction of Crystalline Materials via Prompt-guided Multi-Modal Transformer ( http://arxiv.org/abs/2311.12856v2 )

ライセンス: Link先を確認

Namkyeong Lee, Heewoong Noh, Sungwon Kim, Dongmin Hyun, Gyoung S. Na, Chanyoung Park

(参考訳) 状態密度(DOS)は結晶材料のスペクトル特性であり、材料の様々な特性に関する基本的な知見を与える。従来の研究は主にDOS予測のために結晶材料の高品質な表現を得ることに注力してきたが、我々はDOSの性質、すなわちDOSがエネルギーの関数として状態の全体的な分布を決定するという性質を反映して、得られた表現からDOSを予測することに注力する。つまり、DOSは結晶材料のみによって決まるのではなくエネルギー準位にも依存するが、この点は従来研究では見過ごされてきた。本稿では、結晶材料とエネルギーから得られる異種情報をマルチモーダルTransformerによって統合し、結晶材料中の原子と様々なエネルギー準位との複雑な関係をモデル化してDOSを予測することを提案する。さらに、プロンプトを活用して、結晶構造系に固有の、結晶材料とエネルギー間の相互作用をモデルに学習させることを提案する。フォノンDOSと電子DOSの2種類のDOSについて、様々な実世界のシナリオで行った大規模な実験により、DOSTransformerの優位性を実証する。DOSTransformerのソースコードはhttps://github.com/HeewoongNoh/DOSTransformer で公開されている。

The density of states (DOS) is a spectral property of crystalline materials, which provides fundamental insights into various characteristics of the materials. While previous works mainly focus on obtaining high-quality representations of crystalline materials for DOS prediction, we focus on predicting the DOS from the obtained representations by reflecting the nature of DOS: DOS determines the general distribution of states as a function of energy. That is, DOS is not solely determined by the crystalline material but also by the energy levels, which has been neglected in previous works. In this paper, we propose to integrate heterogeneous information obtained from the crystalline materials and the energies via a multi-modal transformer, thereby modeling the complex relationships between the atoms in the crystalline materials and various energy levels for DOS prediction. Moreover, we propose to utilize prompts to guide the model to learn the crystal structural system-specific interactions between crystalline materials and energies. Extensive experiments on two types of DOS, i.e., Phonon DOS and Electron DOS, with various real-world scenarios demonstrate the superiority of DOSTransformer. The source code for DOSTransformer is available at https://github.com/HeewoongNoh/DOSTransformer.
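Cross-attention is one way to let each energy level "look at" the atoms of a crystal. Energy-level embeddings act as queries, and atom embeddings act as keys and values. The single-head NumPy sketch below shows only that idea. The dimensions, the random projection matrices, and the linear read-out to one DOS value per energy level are all assumptions. DOSTransformer's actual architecture, including its prompts, may differ.

```python
import numpy as np

def cross_attention(energy_emb, atom_emb, Wq, Wk, Wv):
    """Single-head cross-attention.

    energy_emb: (E, d) one embedding per energy level (queries)
    atom_emb:   (A, d) one embedding per atom (keys and values)
    Returns (E, d) context vectors and the (E, A) attention weights.
    """
    q, k, v = energy_emb @ Wq, atom_emb @ Wk, atom_emb @ Wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
d, n_energies, n_atoms = 16, 50, 8
energies = rng.normal(size=(n_energies, d))   # hypothetical energy-level embeddings
atoms = rng.normal(size=(n_atoms, d))         # hypothetical atom embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
context, attn = cross_attention(energies, atoms, Wq, Wk, Wv)
w_out = rng.normal(size=d) / np.sqrt(d)       # hypothetical linear read-out
dos = context @ w_out                         # one predicted DOS value per energy level
```

Because energies are queries rather than fixed output slots, the same material representation can be evaluated on any grid of energy levels. This is the dependence on energy that the abstract emphasizes.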

翻訳日:2023-11-28 01:46:13 公開日:2023-11-23

# minimax: JAX における Autocurricula の効率的なベースライン

minimax: Efficient Baselines for Autocurricula in JAX ( http://arxiv.org/abs/2311.12716v2 )

ライセンス: Link先を確認

Minqi Jiang, Michael Dennis, Edward Grefenstette, Tim Rocktäschel

(参考訳) 教師なし環境設計(UED)は、未知の環境へゼロショット転移できるロバストな意思決定エージェントを訓練するための自動カリキュラム学習の一形態である。このようなオートカリキュラムはRLコミュニティから大きな関心を集めている。しかし、CPUでのロールアウトとGPUでのモデル更新に基づくUED実験では、数週間の学習を要することが多かった。この計算要求は、この分野の迅速なイノベーションにとって大きな障害である。本研究では、アクセラレータ上でUEDを学習するためのminimaxライブラリを紹介する。JAXを用いて完全にテンソル化された環境とオートカリキュラムアルゴリズムを実装することで、minimaxは学習ループ全体をハードウェアアクセラレーション向けにコンパイルできる。迅速な実験のための「ペトリ皿」として、minimaxは手続き的に生成される環境でオートカリキュラムを実行するための再利用可能な抽象化に加えて、MiniGridに基づくテンソル化グリッドワールドを含む。これらのコンポーネントにより、minimaxは新たな並列化版を含む強力なUEDベースラインを提供し、同一のバッチサイズで学習した場合に既存実装と比べて実時間で120倍以上の高速化を達成する。minimaxライブラリはApache 2.0ライセンスの下でhttps://github.com/facebookresearch/minimax から入手できる。

Unsupervised environment design (UED) is a form of automatic curriculum learning for training robust decision-making agents to zero-shot transfer into unseen environments. Such autocurricula have received much interest from the RL community. However, UED experiments, based on CPU rollouts and GPU model updates, have often required several weeks of training. This compute requirement is a major obstacle to rapid innovation for the field. This work introduces the minimax library for UED training on accelerated hardware. Using JAX to implement fully-tensorized environments and autocurriculum algorithms, minimax allows the entire training loop to be compiled for hardware acceleration. To provide a petri dish for rapid experimentation, minimax includes a tensorized grid-world based on MiniGrid, in addition to reusable abstractions for conducting autocurricula in procedurally-generated environments. With these components, minimax provides strong UED baselines, including new parallelized variants, which achieve over 120$\times$ speedups in wall time compared to previous implementations when training with equal batch sizes. The minimax library is available under the Apache 2.0 license at https://github.com/facebookresearch/minimax.
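Much of the speedup comes from writing environments as array operations over a whole batch, so that a compiler can fuse the training loop. The NumPy sketch below steps a batch of grid-world agents without any Python loop over environments. It uses only functional updates, so a version written with `jax.numpy` should translate almost line for line and could be wrapped in `jax.jit`. The action encoding, the wall layout, and `batched_step` are assumptions and are not minimax's environment API.

```python
import numpy as np

# Hypothetical action encoding: 0=up, 1=right, 2=down, 3=left (row, col deltas)
DELTAS = np.array([[-1, 0], [0, 1], [1, 0], [0, -1]])

def batched_step(positions, actions, walls):
    """Move every agent in a batch one step.

    positions: (B, 2) int row/col positions
    actions:   (B,) int actions in [0, 4)
    walls:     (B, H, W) bool, True where a cell is blocked
    """
    B, H, W = walls.shape
    # Propose a move for every agent at once and keep it inside the grid
    proposed = np.clip(positions + DELTAS[actions], 0, np.array([H - 1, W - 1]))
    # Agents that would walk into a wall stay where they are
    blocked = walls[np.arange(B), proposed[:, 0], proposed[:, 1]]
    return np.where(blocked[:, None], positions, proposed)
```

Each environment in the batch can have its own wall layout, which is what procedurally generated UED levels require.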

翻訳日:2023-11-28 01:45:40 公開日:2023-11-23

# オーストラリアの建設サプライチェーンリスク管理におけるTransformerベースの固有表現抽出

Transformer-based Named Entity Recognition in Construction Supply Chain Risk Management in Australia ( http://arxiv.org/abs/2311.13755v1 )

ライセンス: Link先を確認

Milad Baghalzadeh Shishehgarkhaneh, Robert C. Moehler, Yihai Fang, Amer A. Hijazi, Hamed Aboutorab

(参考訳) オーストラリアの建設産業は、複雑なサプライチェーンと無数のリスクに対する脆弱性を特徴とする。そのため、効果的なサプライチェーンリスク管理(SCRM)が不可欠となる。本稿では、複数のTransformerモデルを用い、オーストラリアの建設SCRMの文脈で固有表現抽出(NER)のための学習を行う。NERを活用することで、Transformerモデルはニュース記事中の特定のリスク関連エンティティを識別・分類し、サプライチェーンの脆弱性に関する詳細な洞察を提供する。複数のTransformerモデルでニュース記事を分析することで、オーストラリアの建設分野という環境(milieu)に固有のリスク分類に関連するエンティティと洞察を抽出できる。本研究は、地理・メディア固有の文脈における建設SCRMを変革するうえで、TransformerモデルのようなNLP駆動型ソリューションが持つ可能性を強調する。

The construction industry in Australia is characterized by its intricate supply chains and vulnerability to myriad risks. As such, effective supply chain risk management (SCRM) becomes imperative. This paper employs different transformer models and trains them for Named Entity Recognition (NER) in the context of Australian construction SCRM. Utilizing NER, transformer models identify and classify specific risk-associated entities in news articles, offering a detailed insight into supply chain vulnerabilities. By analysing news articles through different transformer models, we can extract relevant entities and insights related to specific risk taxonomies local to the Australian construction landscape (its milieu). This research emphasises the potential of NLP-driven solutions, like transformer models, in revolutionising SCRM for construction in geo-media specific contexts.
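Fine-tuned transformer NER models like these typically emit one BIO tag per token (B- begins an entity, I- continues it, O is outside any entity). The tags are then decoded into entity spans. The sketch below shows that decoding step. The tag types (`RISK`, `LOC`, `ORG`) and the example sentence are hypothetical and are not the paper's label set.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity starts (a stray I- tag is treated as a start)
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:  # "O" closes any open entity
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type:
        spans.append((current_type, " ".join(current_tokens)))
    return spans

tokens = ["Steel", "price", "surge", "hits", "Sydney", "builders"]
tags = ["B-RISK", "I-RISK", "I-RISK", "O", "B-LOC", "O"]
print(bio_to_spans(tokens, tags))
```

Treating a stray `I-` tag as the start of a new entity is a lenient design choice. A stricter decoder might discard such tokens instead.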

翻訳日:2023-11-28 00:59:55 公開日:2023-11-23

# 3D-MIR: 放射線医学における3D医用画像検索のベンチマークと実証的研究

3D-MIR: A Benchmark and Empirical Study on 3D Medical Image Retrieval in Radiology ( http://arxiv.org/abs/2311.13752v1 )

ライセンス: Link先を確認

Asma Ben Abacha, Alberto Santamaria-Pang, Ho Hin Lee, Jameson Merkow, Qin Cai, Surya Teja Devarakonda, Abdullah Islam, Julia Gong, Matthew P. Lungren, Thomas Lin, Noel C Codella, Ivan Tarapov

(参考訳) 医療現場における医用画像の利用の増加は、放射線科医の作業負荷の増大という大きな課題をもたらす一方で、効果的に活用すれば医療成果を高める機会も提供する。3D画像検索は、臨床医が診断的に類似した症例やその他の関連症例を効率的に検索できるようにすることで放射線科医の作業負荷を軽減し、より迅速かつ正確な診断につながる可能性がある。しかし、3D医用画像検索の分野はまだ発展途上であり、確立された評価ベンチマーク、包括的なデータセット、徹底的な研究が不足している。本論文では、CTで撮像された4つの異なる解剖学的部位を対象とする3D医用画像検索(3D-MIR)の新たなベンチマークを導入することで、このギャップを埋めることを試みる。このベンチマークを用いて、集約した2Dスライス、3Dボリューム、および一般的なマルチモーダル基盤モデルによるマルチモーダル埋め込みをクエリとして用いる多様な検索戦略を検討する。各手法の定量的・定性的評価を、今後の研究への洞察を与える詳細な議論とともに示す。この分野の進展を促すため、ベンチマーク、データセット、コードを公開する。

The increasing use of medical imaging in healthcare settings presents a significant challenge due to the increasing workload for radiologists, yet it also offers opportunity for enhancing healthcare outcomes if effectively leveraged. 3D image retrieval holds potential to reduce radiologist workloads by enabling clinicians to efficiently search through diagnostically similar or otherwise relevant cases, resulting in faster and more precise diagnoses. However, the field of 3D medical image retrieval is still emerging, lacking established evaluation benchmarks, comprehensive datasets, and thorough studies. This paper attempts to bridge this gap by introducing a novel benchmark for 3D Medical Image Retrieval (3D-MIR) that encompasses four different anatomies imaged with computed tomography. Using this benchmark, we explore a diverse set of search strategies that use aggregated 2D slices, 3D volumes, and multi-modal embeddings from popular multi-modal foundation models as queries. Quantitative and qualitative assessments of each approach are provided alongside an in-depth discussion that offers insight for future research. To promote the advancement of this field, our benchmark, dataset, and code are made publicly available.
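One strategy the abstract mentions, querying with aggregated 2D slices, can be sketched in three steps. First, embed every slice with some 2D encoder. Second, average the slice embeddings into one volume embedding. Third, rank the database volumes by cosine similarity. In the sketch below the embeddings are random stand-ins, and mean pooling is an assumed aggregation choice. The benchmark itself compares several strategies and foundation models.

```python
import numpy as np

def volume_embedding(slice_embeddings):
    """Mean-pool (num_slices, d) slice embeddings and L2-normalize the result."""
    v = slice_embeddings.mean(axis=0)
    return v / np.linalg.norm(v)

def retrieve(query_slices, database, top_k=3):
    """Return indices of the top_k database volumes by cosine similarity.

    database: list of (num_slices_i, d) arrays, one per stored CT volume.
    """
    q = volume_embedding(query_slices)
    db = np.stack([volume_embedding(vol) for vol in database])
    scores = db @ q
    return np.argsort(-scores)[:top_k]

rng = np.random.default_rng(0)
# Volumes may have different numbers of slices; embeddings are random stand-ins
database = [rng.normal(size=(rng.integers(20, 40), 64)) for _ in range(10)]
# A query that is a noisy copy of volume 7 should retrieve volume 7 first
query = database[7] + 0.01 * rng.normal(size=database[7].shape)
print(retrieve(query, database))
```

Mean pooling makes volumes with different slice counts comparable. The cost is that it discards slice order, which a 3D-volume encoder would preserve.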

翻訳日:2023-11-28 00:59:40 公開日:2023-11-23

# 自律性のための伝達可能なマルチモーダル知覚表現学習に向けて:NeRF-Supervised Masked AutoEncoder

Towards Transferable Multi-modal Perception Representation Learning for Autonomy: NeRF-Supervised Masked AutoEncoder ( http://arxiv.org/abs/2311.13750v1 )

ライセンス: Link先を確認

Xiaohao Xu

(参考訳) 本研究では、Neural Radiance Field(NeRF)におけるマスク付きマルチモーダル再構成による、転移可能なマルチモーダル知覚表現学習のための統一的な自己教師あり事前学習フレームワーク、NeRF-Supervised Masked AutoEncoder(NS-MAE)を提案する。具体的には、特定の視線方向と位置を条件として、破損したマルチモーダル入力信号(LiDAR点群と画像)から抽出したマルチモーダル埋め込みを、ニューラルレンダリングによって投影されたマルチモーダル特徴マップにレンダリングする。そして、元のマルチモーダル信号をレンダリングされた特徴マップの再構成ターゲットとすることで、自己教師あり表現学習を可能にする。大規模な実験により、NS-MAEで学習した表現が、様々な量のラベル付きファインチューニングデータを用いた多様な3D知覚下流タスク(3D物体検出とBEVマップセグメンテーション)において、マルチモーダルおよびシングルモーダル(カメラのみ・LiDARのみ)の知覚モデルに対して有望な転移性を示すことがわかった。さらに、NS-MAEがマスク付きオートエンコーダとニューラル放射輝度場の両方のメカニズムの相乗効果を享受していることを経験的に見出した。コードは採録後に公開予定である。

This work proposes a unified self-supervised pre-training framework for transferable multi-modal perception representation learning via masked multi-modal reconstruction in Neural Radiance Field (NeRF), namely NeRF-Supervised Masked AutoEncoder (NS-MAE). Specifically, conditioned on certain view directions and locations, multi-modal embeddings extracted from corrupted multi-modal input signals, i.e., Lidar point clouds and images, are rendered into projected multi-modal feature maps via neural rendering. Then, original multi-modal signals serve as reconstruction targets for the rendered multi-modal feature maps to enable self-supervised representation learning. Extensive experiments show that the representation learned via NS-MAE shows promising transferability for diverse multi-modal and single-modal (camera-only and Lidar-only) perception models on diverse 3D perception downstream tasks (3D object detection and BEV map segmentation) with diverse amounts of fine-tuning labeled data. Moreover, we empirically find that NS-MAE enjoys the synergy of both the mechanism of masked autoencoder and neural radiance field. Our code shall be released upon acceptance.
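Rendering feature maps "via neural rendering" usually follows NeRF-style volume rendering. Per-sample densities along a camera ray become compositing weights, and the rendered feature is the weighted sum of the per-sample features. The sketch below shows this compositing step for one ray. The sample spacing, densities, and features are made up, and NS-MAE's exact rendering setup may differ.

```python
import numpy as np

def render_ray_features(densities, features, deltas):
    """Alpha-composite per-sample features along one ray.

    densities: (S,) non-negative volume densities sigma_i
    features:  (S, C) per-sample feature vectors
    deltas:    (S,) distances between consecutive samples
    """
    alpha = 1.0 - np.exp(-densities * deltas)  # opacity of each sample
    # Transmittance T_i = prod_{j<i} (1 - alpha_j): fraction of the ray reaching sample i
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    weights = transmittance * alpha
    return weights @ features, weights
```

The compositing weights sum to at most one. A very dense sample near the camera hides everything behind it, and a ray through empty space renders to zero.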

翻訳日:2023-11-28 00:59:16 公開日:2023-11-23

# 創発的組織化の原理について

On Principles of Emergent Organization ( http://arxiv.org/abs/2311.13749v1 )

ライセンス: Link先を確認

Adam T. Rupe and James P. Crutchfield

(参考訳) 1世紀以上にわたる協調的な努力にもかかわらず、物理学は依然として自発的な自己組織化の基本原理を欠いている。その理由を理解するため、まず問題を述べ、歴史的なアプローチを概観し、自己組織化の物理学の現状を調査する。これにより、数学的な扱いにくさとそれに伴う計算的アプローチの必要性から生じる課題、そして構造を定義できない状態が慢性的に続いていることから生じる課題が明確になる。次に、組織化に関する2つの現代的な数学的定式化(内在的計算と発展演算子)を概観し、これらの課題を克服する方法を示す。両者を合わせた視点により、平衡から任意に離れた系の統計力学を通じて、構造化された状態の出現を説明する方法が示される。その結果、構造の数学的同定に基づく組織化原理への建設的な道筋が得られる。

After more than a century of concerted effort, physics still lacks basic principles of spontaneous self-organization. To appreciate why, we first state the problem, outline historical approaches, and survey the present state of the physics of self-organization. This frames the particular challenges arising from mathematical intractability and the resulting need for computational approaches, as well as those arising from a chronic failure to define structure. Then, an overview of two modern mathematical formulations of organization -- intrinsic computation and evolution operators -- lays out a way to overcome these challenges. Together, the vantage point they afford shows how to account for the emergence of structured states via a statistical mechanics of systems arbitrarily far from equilibrium. The result is a constructive path forward to principles of organization that builds on mathematical identification of structure.

翻訳日:2023-11-28 00:58:46 公開日:2023-11-23

# 拡散モデルのためのサンプル効率の良い学習

Sample-Efficient Training for Diffusion ( http://arxiv.org/abs/2311.13745v1 )

ライセンス: Link先を確認

Shivam Gupta, Aditya Parulekar, Eric Price, Zhiyang Xun

(参考訳) スコアベース拡散モデルは,主にその経験的な性能と信頼性により,画像の深層生成モデリングで最も一般的なアプローチとなっている。近年,いくつかの理論研究が,$L^2$精度のスコア推定を仮定すれば拡散モデルが効率的にサンプリングできることを示した。スコアマッチングの目的関数は自然に$L^2$で真のスコアを近似するが,既存の上界のサンプル複雑性はデータ半径と所望のWasserstein精度に多項式的に依存する。対照的に,サンプリングの時間計算量はこれらのパラメータに対して対数的でしかない。我々は,$L^2$でスコアを推定するにはこの多項式依存性が必要であるが,Wasserstein精度に対して多重対数的にスケールするサンプル数で,サンプリングには実際に十分であることを示す。さらに,多重対数個のサンプルを用いれば,スコアマッチング目的関数のERMは真の分布のうち確率$\delta$の部分を除くすべてで$L^2$精度を持ち,このより弱い保証が効率的なサンプリングに十分であることを示す。

Score-based diffusion models have become the most popular approach to deep generative modeling of images, largely due to their empirical performance and reliability. Recently, a number of theoretical works \citep{chen2022, Chen2022ImprovedAO, Chenetal23flowode, benton2023linear} have shown that diffusion models can efficiently sample, assuming $L^2$-accurate score estimates. The score-matching objective naturally approximates the true score in $L^2$, but the sample complexity of existing bounds depends \emph{polynomially} on the data radius and desired Wasserstein accuracy. By contrast, the time complexity of sampling is only logarithmic in these parameters. We show that estimating the score in $L^2$ \emph{requires} this polynomial dependence, but that a number of samples that scales polylogarithmically in the Wasserstein accuracy actually do suffice for sampling. We show that with a polylogarithmic number of samples, the ERM of the score-matching objective is $L^2$ accurate on all but a probability $\delta$ fraction of the true distribution, and that this weaker guarantee is sufficient for efficient sampling.
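A toy example makes "the ERM of the score-matching objective" concrete. The sketch below fits a linear score model s(x) = a·x by denoising score matching on samples from a 1D standard Gaussian at noise level σ. In this case the noised distribution is N(0, 1 + σ²), whose score is -x/(1 + σ²), so the empirical minimizer should approach a = -1/(1 + σ²). The Gaussian data and the linear model are illustrative assumptions. The paper's analysis concerns general distributions and function classes.

```python
import numpy as np

def fit_linear_score(n_samples, sigma, seed=0):
    """ERM for denoising score matching with a linear model s(x) = a * x.

    Data x0 ~ N(0, 1); noised x = x0 + sigma * eps. The DSM regression target
    is -eps / sigma, the score of the Gaussian noising kernel at x.
    """
    rng = np.random.default_rng(seed)
    x0 = rng.normal(size=n_samples)
    eps = rng.normal(size=n_samples)
    x = x0 + sigma * eps
    target = -eps / sigma
    # Closed-form least squares: minimize mean((a * x - target)^2) over a
    return (x @ target) / (x @ x)

sigma = 1.0
a_hat = fit_linear_score(200_000, sigma)
print(a_hat, -1 / (1 + sigma ** 2))
```

The DSM target never uses the unknown data score directly. It regresses onto the known score of the noising kernel, and its minimizer in expectation coincides with the true score of the noised distribution.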

翻訳日:2023-11-28 00:58:33 公開日:2023-11-23

# ディープラーニングモデルにおけるセキュリティとプライバシの課題

Security and Privacy Challenges in Deep Learning Models ( http://arxiv.org/abs/2311.13744v1 )

ライセンス: Link先を確認

Gopichandh Golla

(参考訳) 近年、深層学習モデルは自動運転から医療診断まで、多くの分野で大きな成功を収めている。これらのモデルは、以前は解決が非常に困難だった複雑な問題に優れた解決策を提供することで、人工知能の能力を拡大してきた。様々な分野での前例のない成功にもかかわらず、研究により、深層学習モデルはモデルのセキュリティやデータプライバシを損なう様々な攻撃を受けうることが明らかになっている。深層学習モデルは、そのライフサイクルの様々な段階で攻撃を受ける可能性がある。テスト段階では、攻撃者はモデル抽出攻撃、モデル反転攻撃、敵対的攻撃など、様々な種類の攻撃によって脆弱性を悪用できる。モデル抽出攻撃は学習済みの深層学習モデルをリバースエンジニアリングすることを狙うもので、主な目的はそのアーキテクチャとパラメータを明らかにすることである。モデル反転攻撃は、深層学習モデルで使用されるデータのプライバシを侵害することを目的とする。これらの攻撃は、モデルの予測から機密性の高い学習データを推定することで、モデルの機密性を損なう。攻撃者はモデルの応答を分析して機密情報を再構築しようとし、これによりモデルのデータプライバシが侵害される。主にコンピュータビジョンモデルに対して用いられる敵対的攻撃は、悪意のあるテストデータによって、モデルに自信を持って誤った予測をさせる。これらの攻撃は入力データをわずかに改変し、見た目は正常でありながら深層学習モデルを誤った判断へと誘導する。このような攻撃は、モデルの評価段階と学習段階の両方で起こりうる。データポイズニング攻撃は学習セットに有害なデータを加えて学習プロセスを妨害し、深層学習モデルの信頼性を低下させる。

These days, deep learning models have achieved great success in multiple fields, from autonomous driving to medical diagnosis. These models have expanded the abilities of artificial intelligence by offering great solutions to complex problems that were very difficult to solve earlier. In spite of their unprecedented success in various fields, research has shown that deep learning models can be subjected to various attacks that compromise the model security and data privacy of Deep Neural Network models. Deep learning models can be subjected to various attacks at different stages of their lifecycle. During the testing phase, attackers can exploit vulnerabilities through different kinds of attacks such as Model Extraction Attacks, Model Inversion attacks, and Adversarial attacks. Model Extraction Attacks are aimed at reverse-engineering a trained deep learning model, with the primary objective of revealing its architecture and parameters. Model inversion attacks aim to compromise the privacy of the data used in the Deep learning model. These attacks compromise the confidentiality of the model by inferring sensitive training data from the model's predictions. By analyzing the model's responses, attackers aim to reconstruct sensitive information. In this way, the model's data privacy is compromised. Adversarial attacks, mainly employed on computer vision models, are made to corrupt models into confidently making incorrect predictions through malicious testing data. These attacks subtly alter the input data, making it look normal but misleading deep learning models to make incorrect decisions. Such attacks can happen during both the model's evaluation and training phases. Data Poisoning Attacks add harmful data to the training set, disrupting the learning process and reducing the reliability of the deep learning model.
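The Fast Gradient Sign Method (FGSM) is a well-known adversarial attack that illustrates the category described above, although the abstract does not name it. FGSM nudges each input feature by ε in the direction that increases the model's loss. The sketch below attacks a hand-set logistic-regression model, for which the input gradient has a closed form. The weights, the input, and ε are arbitrary illustrative values.

```python
import numpy as np

def fgsm_logistic(x, y, w, b, epsilon):
    """FGSM against a logistic-regression classifier.

    x: (d,) input, y: label in {-1, +1}, w: (d,) weights, b: bias.
    Loss = log(1 + exp(-y * (w @ x + b))); its gradient w.r.t. x is
    -y * sigmoid(-y * (w @ x + b)) * w.
    """
    margin = y * (w @ x + b)
    grad_x = -y * (1.0 / (1.0 + np.exp(margin))) * w
    return x + epsilon * np.sign(grad_x)

w = np.array([1.0, -2.0, 0.5])
x = np.array([0.2, -0.1, 0.4])
x_adv = fgsm_logistic(x, 1, w, 0.0, 0.2)
print(w @ x, w @ x_adv)  # the sign of the decision score flips
```

Each feature moves by at most ε (an L-infinity budget), yet the decision score drops by ε·||w||₁. This is why small, visually imperceptible perturbations can flip predictions.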

翻訳日:2023-11-28 00:58:11 公開日:2023-11-23

# FinMe: 階層型メモリとキャラクター設計を備えた高性能な大規模言語モデル取引エージェント

FinMe: A Performance-Enhanced Large Language Model Trading Agent with Layered Memory and Character Design ( http://arxiv.org/abs/2311.13743v1 )

ライセンス: Link先を確認

Yangyang Yu, Haohang Li, Zhi Chen, Yuechen Jiang, Yang Li, Denghui Zhang, Rong Liu, Jordan W. Suchow, Khaldoun Khashanah

(参考訳) 大規模言語モデル(LLM)の最近の進歩は、様々な領域にわたる質問応答(QA)タスクで顕著な有効性を示している。広範なWeb知識を統合する能力の高さから、LLM自律エージェントの開発への関心が高まっている。LLMは人間の指示を解釈し、過去の入力を全体的に処理して解を導出することには長けているが、目的駆動型エージェントへ移行するには、多様な情報源の情報を処理し、推論の連鎖を確立し、重要なタスクを優先付けるための補助的な合理的アーキテクチャが必要となる。そこで我々は、金融意思決定のために考案された新たなLLMベースのエージェントフレームワークであるFinMeを提案する。FinMeは、エージェントの特性を規定するプロファイリング、階層的な処理によってエージェントが現実的な階層型金融データを取り込むのを助けるメモリ、そしてメモリから得た洞察を投資判断に変換する意思決定という3つの中核モジュールから構成される。特にFinMeのメモリモジュールは人間のトレーダーの認知構造とよく対応しており、高い解釈可能性とリアルタイムな調整を可能にする。調整可能な認知スパンにより、人間の知覚限界を超えて重要な情報を保持でき、取引成果が向上する。このフレームワークにより、エージェントは専門知識を自律的に進化させ、新たな投資シグナルに機敏に反応し、変動の激しい金融環境において取引判断を継続的に改善できる。まず、FinMeをスケーラブルな実世界の金融データセット上で様々なアルゴリズムエージェントと比較し、株式とファンドにおける優れた取引性能を示す。次に、エージェントの知覚スパンを調整することで、取引性能を大きく向上させる。総じて、FinMeは自動取引のための最先端のLLMエージェントフレームワークであり、累積投資リターンを高める。

Recent advancements in Large Language Models (LLMs) have exhibited notable efficacy in question-answering (QA) tasks across diverse domains. Their prowess in integrating extensive web knowledge has fueled interest in developing LLM autonomous agents. While LLMs are efficient in decoding human instructions and deriving solutions by holistically processing historical inputs, transitioning to purpose-driven agents requires a supplementary rational architecture to process multi-source information, establish reasoning chains, and prioritize critical tasks. Addressing this, we introduce FinMe, a novel LLM-based agent framework devised for financial decision-making, encompassing three core modules: Profiling, to outline the agent's characteristics; Memory, with layered processing, to aid the agent in assimilating realistic hierarchical financial data; and Decision-making, to convert insights gained from memories into investment decisions. Notably, FinMe's memory module aligns closely with the cognitive structure of human traders, offering robust interpretability and real-time tuning. Its adjustable cognitive span allows for the retention of critical information beyond human perceptual limits, thereby enhancing trading outcomes. This framework enables the agent to self-evolve its professional knowledge, react agilely to new investment cues, and continuously refine trading decisions in the volatile financial environment. We first compare FinMe with various algorithmic agents on a scalable real-world financial dataset, underscoring its leading trading performance in stocks and funds. We then fine-tune the agent's perceptual spans to achieve significant trading performance gains. Collectively, FinMe presents a cutting-edge LLM agent framework for automated trading, boosting cumulative investment returns.
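A layered memory can retain information for different lengths of time per layer, in the spirit of the adjustable cognitive span described above. The sketch below implements this with exponentially decaying recency scores. The layer names, the decay constants, the importance values, and the scoring rule (importance times recency) are all hypothetical and are not FinMe's actual design.

```python
import math
from dataclasses import dataclass, field

# Hypothetical stability constants (in days): deeper layers forget more slowly
LAYER_STABILITY = {"shallow": 2.0, "intermediate": 30.0, "deep": 365.0}

@dataclass
class MemoryItem:
    text: str
    layer: str
    importance: float  # hypothetical score in [0, 1]
    created_day: int

@dataclass
class LayeredMemory:
    items: list = field(default_factory=list)

    def add(self, item):
        self.items.append(item)

    def score(self, item, today):
        # Recency decays exponentially with a layer-specific time constant
        age = today - item.created_day
        recency = math.exp(-age / LAYER_STABILITY[item.layer])
        return item.importance * recency

    def retrieve(self, today, top_k=2):
        ranked = sorted(self.items, key=lambda it: self.score(it, today), reverse=True)
        return [it.text for it in ranked[:top_k]]

mem = LayeredMemory()
mem.add(MemoryItem("daily news", "shallow", 0.9, 0))
mem.add(MemoryItem("quarterly report", "intermediate", 0.6, 0))
mem.add(MemoryItem("long-term thesis", "deep", 0.5, 0))
print(mem.retrieve(today=0), mem.retrieve(today=10))
```

In this toy example, fresh news dominates retrieval at first. After ten days the slower-decaying layers take over, which is the kind of behavior a per-layer span lets one tune.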

翻訳日:2023-11-28 00:57:46 公開日:2023-11-23

# OASIS: 連合学習における能動的再構成攻撃への対抗

OASIS: Offsetting Active Reconstruction Attacks in Federated Learning ( http://arxiv.org/abs/2311.13739v1 )

ライセンス: Link先を確認

Tre' R. Jeter, Truc Nguyen, Raed Alharbi, My T. Thai

(参考訳) 連合学習(FL)は、モデル学習の効率を高めつつユーザのプライバシを保護できる可能性から大きな注目を集めている。しかし近年の研究により、FLプロトコルは不誠実なサーバが実行する能動的再構成攻撃によって容易に侵害されうることが示された。これらの攻撃ではグローバルモデルのパラメータを悪意をもって改変することで、サーバがユーザの勾配更新を反転させ、ユーザのプライベートデータをそのままの形で取得できる。脅威モデルが強力であるため、この種の攻撃への対処は依然として重要な課題である。本稿では、モデル性能を維持しつつ能動的再構成攻撃に効果的に対抗する、画像拡張に基づく防御機構OASISを提案する。まず、これらの攻撃を可能にする勾配反転の核心原理を明らかにし、攻撃戦略によらず防御が頑健となる主な条件を理論的に特定する。次に、画像拡張によってOASISを構築し、それが攻撃原理を損なうことを示す。包括的な評価によりOASISの有効性を実証し、解決策としての実現可能性を示す。

Federated Learning (FL) has garnered significant attention for its potential to protect user privacy while enhancing model training efficiency. However, recent research has demonstrated that FL protocols can be easily compromised by active reconstruction attacks executed by dishonest servers. These attacks involve the malicious modification of global model parameters, allowing the server to obtain a verbatim copy of users' private data by inverting their gradient updates. Tackling this class of attack remains a crucial challenge due to the strong threat model. In this paper, we propose OASIS, a defense mechanism based on image augmentation that effectively counteracts active reconstruction attacks while preserving model performance. We first uncover the core principle of gradient inversion that enables these attacks and theoretically identify the main conditions by which the defense can be robust regardless of the attack strategies. We then construct OASIS with image augmentation showing that it can undermine the attack principle. Comprehensive evaluations demonstrate the efficacy of OASIS, highlighting its feasibility as a solution.
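A single fully connected layer shows the basic fact these reconstruction attacks exploit. For logits z = Wx + b, the gradients satisfy ∂L/∂W = (∂L/∂b) xᵀ. So any row i with a non-zero bias gradient returns x exactly as (∂L/∂W)_i / (∂L/∂b)_i. The sketch also shows, informally, why adding augmented copies to the update interferes with this: with several inputs, the recovered row becomes a weighted mix of them. The layer sizes, the vector flip used as a stand-in for image augmentation, and all names are illustrative assumptions. This is not the OASIS implementation or its formal argument.

```python
import numpy as np

def linear_layer_grads(W, b, inputs, labels):
    """Gradients of mean softmax cross-entropy w.r.t. W and b for z = W x + b."""
    grad_W, grad_b = np.zeros_like(W), np.zeros_like(b)
    for x, y in zip(inputs, labels):
        z = W @ x + b
        p = np.exp(z - z.max())
        p /= p.sum()
        g = p.copy()
        g[y] -= 1.0  # dL/dz = softmax(z) - onehot(y)
        grad_W += np.outer(g, x)
        grad_b += g
    return grad_W / len(inputs), grad_b / len(inputs)

def invert(grad_W, grad_b):
    """Reconstruct an input from the row with the largest bias gradient."""
    i = np.argmax(np.abs(grad_b))
    return grad_W[i] / grad_b[i]

rng = np.random.default_rng(0)
W, b = rng.normal(size=(5, 8)) * 0.1, np.zeros(5)
x = rng.normal(size=8)

# One private sample: the server recovers it exactly
x_rec = invert(*linear_layer_grads(W, b, [x], [2]))
# The sample plus a flipped copy: the recovered row is a mixture of both
x_rec_aug = invert(*linear_layer_grads(W, b, [x, x[::-1]], [2, 2]))
```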

Translated: 2023-11-28 00:57:09 | Published: 2023-11-23

# Enhancing Intrusion Detection in Internet of Vehicles Through Federated Learning

Enhancing Intrusion Detection In Internet Of Vehicles Through Federated Learning ( http://arxiv.org/abs/2311.13800v1 )

License: see link

Abhishek Sebastian, Pragna R, Sudhakaran G, Renjith P N and Leela Karthikeyan H

Federated learning is a decentralized machine learning technique that allows multiple parties to collaborate and learn a shared model without sharing their raw data. Our paper proposes a federated learning framework for intrusion detection in the Internet of Vehicles (IoV) using the CIC-IDS 2017 dataset. The proposed framework employs SMOTE to handle class imbalance, outlier detection to identify and remove abnormal observations, and hyperparameter tuning to optimize the model's performance. The authors evaluated the proposed framework using various performance metrics and demonstrated its effectiveness in detecting intrusions on other datasets (KDD-Cup 99 and UNSW-NB15) and in comparison with conventional classifiers. Furthermore, the proposed framework can protect sensitive data while achieving high intrusion detection performance.
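
The abstract does not state how client updates are combined. A common choice, assumed here for illustration, is FedAvg-style weighted averaging. Each client's locally trained parameters are weighted by its number of training samples, and only those parameters, never raw records, reach the server.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of client parameter vectors (FedAvg-style aggregation).

    client_weights: list of 1-D arrays, one per client, all the same shape.
    client_sizes: number of local training samples held by each client.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)          # shape (num_clients, num_params)
    coeffs = sizes / sizes.sum()                # each client's share of the data
    return coeffs @ stacked                     # weighted sum over clients

# Three hypothetical vehicles (clients) report locally trained parameters.
w = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
n = [100, 100, 200]
print(fedavg(w, n))  # [0.75 0.75]
```

In a real system the averaged vector would be sent back to the vehicles as the next global model. The function name and the toy numbers are hypothetical.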

Translated: 2023-11-28 00:47:33 | Published: 2023-11-23

# Evidential Active Recognition: Intelligent and Prudent Open-World Embodied Perception

Evidential Active Recognition: Intelligent and Prudent Open-World Embodied Perception ( http://arxiv.org/abs/2311.13793v1 )

License: see link

Lei Fan, Mingfu Liang, Yunxuan Li, Gang Hua and Ying Wu

Active recognition enables robots to intelligently explore novel observations, thereby acquiring more information while circumventing undesired viewing conditions. Recent approaches favor learning policies from simulated or collected data, wherein appropriate actions are more frequently selected when the recognition is accurate. However, most recognition modules are developed under the closed-world assumption, which makes them ill-equipped to handle unexpected inputs, such as the absence of the target object in the current observation. To address this issue, we propose treating active recognition as a sequential evidence-gathering process, providing step-by-step uncertainty quantification and reliable prediction under evidence combination theory. Additionally, the reward function developed in this paper effectively characterizes the merit of actions when operating in open-world environments. To evaluate performance, we collect a dataset from an indoor simulator that encompasses various recognition challenges such as distance, occlusion level, and visibility. Through a series of experiments on recognition and robustness analysis, we demonstrate the necessity of introducing uncertainty into active recognition and the superior performance of the proposed method.
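
One widely used way to get per-step uncertainty from evidence is subjective logic over a Dirichlet distribution:

- non-negative evidence $e_k$ for $K$ classes gives $\alpha_k = e_k + 1$ and $S = \sum_k \alpha_k$;
- beliefs are $b_k = e_k / S$ and uncertainty is $u = K / S$;
- cumulative fusion of observations adds their evidence.

The sketch below assumes that formulation and uses made-up evidence values. The paper's exact combination rule and reward function may differ.

```python
import numpy as np

def opinion(evidence):
    """Turn non-negative class evidence into subjective-logic beliefs and uncertainty."""
    evidence = np.asarray(evidence, dtype=float)
    num_classes = evidence.size
    strength = evidence.sum() + num_classes      # S = sum of Dirichlet alphas (evidence + 1)
    belief = evidence / strength                 # b_k = e_k / S
    uncertainty = num_classes / strength         # u = K / S; beliefs + u sum to 1
    return belief, uncertainty

# Hypothetical evidence from three successive views over three classes.
views = [np.array([4.0, 1.0, 0.0]),
         np.array([0.5, 0.0, 0.0]),              # a nearly uninformative (e.g. occluded) view
         np.array([6.0, 0.0, 1.0])]
total = np.zeros(3)
history = []
for e in views:
    total = total + e                            # cumulative fusion: evidence accumulates
    belief, u = opinion(total)
    history.append(u)
    print(np.round(belief, 3), round(float(u), 3))
```

Uncertainty shrinks only as real evidence arrives, so a weak view barely changes it.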

Translated: 2023-11-28 00:47:20 | Published: 2023-11-23

# Lee-Yang Zeros of a Bosonic System Associated with a Single Trapped Ion

Lee-Yang Zeros of a Bosonic system associated with a single trapped ion ( http://arxiv.org/abs/2311.13790v1 )

License: see link

Wenjie Shao, Yulian Chen, Ren-Liu, Yiheng Lin

Zeros of partition functions in the complex plane, in particular Lee-Yang zeros, provide important information for understanding phase transitions. A recent discovery of the equivalence between the coherence of a central quantum system and the partition function of its environment in the complex plane has enabled the experimental study of Lee-Yang zeros, with several pioneering experiments on spin systems. However, Lee-Yang zeros have not yet been observed in bosonic systems. Here we propose an experimental scheme to demonstrate Lee-Yang zeros in bosonic systems associated with a single trapped ion by introducing strong coupling between the spin and motional degrees of freedom, i.e., beyond the weak-coupling Lamb-Dicke regime. Our scheme provides new possibilities for quantum simulation of the thermodynamics of bosonic systems in the complex plane.
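
For intuition about where Lee-Yang zeros lie, consider a toy case that is unrelated to the trapped-ion scheme itself: two Ising spins with $H = -J s_1 s_2 - h(s_1 + s_2)$. With fugacity $z = e^{2\beta h}$, the partition function satisfies $z\,Z = e^{\beta J} z^2 + 2e^{-\beta J} z + e^{\beta J}$. For ferromagnetic $J > 0$, the Lee-Yang circle theorem places both zeros on $|z| = 1$, i.e. at purely imaginary magnetic field. The NumPy sketch below checks this numerically.

```python
import numpy as np

def two_spin_zeros(beta_j):
    """Zeros in z = exp(2*beta*h) of the two-spin Ising partition function.

    beta_j is the dimensionless coupling beta * J; z * Z(z) is the quadratic
    e^{bJ} z^2 + 2 e^{-bJ} z + e^{bJ}, whose roots np.roots returns.
    """
    coeffs = [np.exp(beta_j), 2.0 * np.exp(-beta_j), np.exp(beta_j)]
    return np.roots(coeffs)

for beta_j in (0.1, 0.5, 2.0):
    zeros = two_spin_zeros(beta_j)
    # Ferromagnetic coupling: both zeros sit on the unit circle |z| = 1.
    print(beta_j, np.round(zeros, 4), np.round(np.abs(zeros), 6))
```

Because $z$ is positive for any real field, these zeros are never reached at physical parameters. This is why access to the complex plane, as the abstract describes, is needed to observe them.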

Translated: 2023-11-28 00:47:00 | Published: 2023-11-23

# Knowledge Distillation Based Semantic Communications for Multiple Users

Knowledge Distillation Based Semantic Communications For Multiple Users ( http://arxiv.org/abs/2311.13789v1 )

License: see link

Chenguang Liu, Yuxin Zhou, Yunfei Chen and Shuang-Hua Yang

Deep learning (DL) has shown great potential to revolutionize traditional communication systems. Many applications in communications have adopted DL techniques because of their powerful representation ability. However, learning-based methods can depend heavily on the training dataset and perform worse under unseen interference owing to limited model generalizability and complexity. In this paper, we consider a semantic communication (SemCom) system with multiple users, where the number of training samples is limited and unexpected interference is present. To improve the model's generalization ability and reduce its size, we propose a knowledge distillation (KD) based system in which a Transformer-based encoder-decoder serves as the semantic encoder-decoder and fully connected neural networks serve as the channel encoder-decoder. Specifically, four types of knowledge transfer and model compression are analyzed. Important system and model parameters are considered, including the level of noise and interference, the number of interfering users, and the size of the encoder and decoder. Numerical results demonstrate that KD significantly improves robustness and generalization ability under unexpected interference, and that it reduces the performance loss when the model size is compressed.
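
The abstract analyzes four types of knowledge transfer without spelling them out. For orientation only, here is the classic soft-target (response-based) distillation loss; it is assumed rather than taken from the paper. The student matches the teacher's temperature-softened output distribution, and the KL divergence is scaled by $T^2$.

```python
import numpy as np

def softened_softmax(logits, temperature):
    """Softmax of logits / temperature along the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Soft-target distillation loss: T^2 * KL(teacher || student), batch-averaged."""
    p_t = softened_softmax(teacher_logits, temperature)
    p_s = softened_softmax(student_logits, temperature)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)
    return temperature ** 2 * kl.mean()

teacher = np.array([[2.0, 1.0, 0.1]])              # hypothetical teacher logits
student = np.array([[0.5, 0.5, 0.5]])              # an untrained, uniform student
print(kd_loss(student, teacher))                   # positive: the student still disagrees
print(kd_loss(teacher, teacher))                   # zero when the outputs match
```

In typical use this term is added to the ordinary task loss on hard labels. The temperature value and the toy logits are illustrative.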

Translated: 2023-11-28 00:46:45 | Published: 2023-11-23

# DaG LLM ver 1.0: Pioneering Instruction-Tuned Language Modeling for Korean NLP

DaG LLM ver 1.0: Pioneering Instruction-Tuned Language Modeling for Korean NLP ( http://arxiv.org/abs/2311.13784v1 )

License: see link

Dongjun Jang, Sangah Lee, Sungjoo Byun, Jinwoong Kim, Jean Seo, Minseok Kim, Soyeon Kim, Chaeyoung Oh, Jaeyoon Kim, Hyemi Jo, Hyopil Shin

This paper presents the DaG LLM (David and Goliath Large Language Model), a language model specialized for Korean and fine-tuned through Instruction Tuning across 41 tasks within 13 distinct categories.

Translated: 2023-11-28 00:46:25 | Published: 2023-11-23

# Scalable AI Generative Content for Vehicular Network Semantic Communication

Scalable AI Generative Content for Vehicular Network Semantic Communication ( http://arxiv.org/abs/2311.13782v1 )

License: see link

Hao Feng, Yi Yang, Zhu Han

Perceiving vehicles in a driver's blind spot is vital for safe driving. The detection of potentially dangerous vehicles in these blind spots can benefit from vehicular network semantic communication technology. However, efficient semantic communication involves a trade-off between accuracy and delay, especially in bandwidth-limited situations. This paper presents a scalable Artificial Intelligence Generated Content (AIGC) system that leverages an encoder-decoder architecture. The system converts images into textual representations and reconstructs them into images of acceptable quality, optimizing transmission for vehicular network semantic communication. Moreover, when bandwidth allows, auxiliary information is also integrated. The encoder-decoder aims to maintain semantic equivalence with the original images across various tasks. The proposed approach then employs reinforcement learning to enhance the reliability of the generated content. Experimental results suggest that the proposed method surpasses the baseline in perceiving vehicles in blind spots and effectively compresses communication data. Although the method is specifically designed for driving scenarios, the encoder-decoder architecture also holds potential for wide use across various semantic communication scenarios.

Translated: 2023-11-28 00:46:19 | Published: 2023-11-23

# Dynamic Compositional Graph Convolutional Network for Efficient Composite Human Motion Prediction

Dynamic Compositional Graph Convolutional Network for Efficient Composite Human Motion Prediction ( http://arxiv.org/abs/2311.13781v1 )

License: see link

Wanying Zhang, Shen Zhao, Fanyang Meng, Songtao Wu, Mengyuan Liu

With potential applications in fields including intelligent surveillance and human-robot interaction, human motion prediction has become a hot research topic and has achieved considerable success, especially with recent Graph Convolutional Networks (GCNs). Current human motion prediction work usually focuses on predicting human motion for atomic actions. Observing that atomic actions can occur at the same time and thereby form composite actions, we propose the composite human motion prediction task. To handle this task, we first present a Composite Action Generation (CAG) module that generates synthetic composite actions for training, thus avoiding the laborious work of collecting composite action samples. Moreover, by presenting a Dynamic Compositional Graph Convolutional Network (DC-GCN), we alleviate the demand for a more complicated model that composite actions would otherwise impose. Extensive experiments on the Human3.6M dataset and our newly collected CHAMP dataset consistently verify the efficiency of our DC-GCN method, which achieves state-of-the-art motion prediction accuracy while requiring little extra computational cost compared with traditional GCN-based human motion methods.

Translated: 2023-11-28 00:46:03 | Published: 2023-11-23

# Detection and Identification Accuracy of PCA-Accelerated Real-Time Processing of Hyperspectral Imagery

Detection and Identification Accuracy of PCA-Accelerated Real-Time Processing of Hyperspectral Imagery ( http://arxiv.org/abs/2311.13779v1 )

License: see link

Abigail Basener and Meagan Herald

Real-time or near-real-time hyperspectral detection and identification are extremely useful and needed in many fields. These datasets can be quite large, and the algorithms can require numerous computations that slow the process down. A common way to speed up the process is to use principal component analysis (PCA) for dimension reduction. In the reduced-dimensional space spanned by a subset of the principal components, fewer computations are needed to process the data, resulting in a faster run time. In this paper, we propose a way to further decrease the time required when using PCA by investigating how many principal components can be omitted with minimal impact on the detection rate. Using ACE to perform detection, and probability and spectral fit for identification, we find that the number of principal components can be reduced substantially before a noticeable change in detection rates appears.
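
A minimal NumPy sketch of the pipeline the abstract describes, under stated assumptions:

- PCA via SVD keeps the top $k$ components.
- Detection uses the standard ACE statistic $D(x) = (s^\top \Sigma^{-1} x)^2 / \big((s^\top \Sigma^{-1} s)(x^\top \Sigma^{-1} x)\big)$, computed in the reduced space on mean-removed data.

The synthetic scene and implanted target are made up, and the paper's identification step is not shown. Lowering `k` shrinks every matrix product, which is where the speed-up comes from.

```python
import numpy as np

def fit_reduced_model(pixels, k):
    """Mean, top-k PCA basis and inverse covariance of an (N, bands) scene."""
    mean = pixels.mean(axis=0)
    _, _, vt = np.linalg.svd(pixels - mean, full_matrices=False)
    basis = vt[:k].T                                   # (bands, k)
    reduced = (pixels - mean) @ basis                  # (N, k) scene in PCA space
    cov_inv = np.linalg.inv(np.atleast_2d(np.cov(reduced, rowvar=False)))
    return mean, basis, cov_inv

def ace_scores(pixels, target, mean, basis, cov_inv):
    """ACE detection statistic for each pixel, computed in the k-dim PCA space."""
    x = (np.atleast_2d(pixels) - mean) @ basis         # reduced, mean-removed pixels
    s = (target - mean) @ basis                        # reduced target signature
    num = (x @ cov_inv @ s) ** 2
    den = (s @ cov_inv @ s) * np.einsum("ij,jk,ik->i", x, cov_inv, x)
    return num / den                                   # values lie in [0, 1]

rng = np.random.default_rng(0)
bands = 50
target = 3.0 * rng.normal(size=bands)                  # hypothetical target spectrum
pixels = rng.normal(size=(500, bands))                 # synthetic background
pixels[:5] += target                                   # implant the target in 5 pixels
mean, basis, cov_inv = fit_reduced_model(pixels, k=10)
scores = ace_scores(pixels, target, mean, basis, cov_inv)
print(np.argsort(scores)[-5:])                         # indices of the 5 highest scores
```

Sweeping `k` and recording how the detections change would mirror, in miniature, the experiment the abstract reports.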

Translated: 2023-11-28 00:45:42 | Published: 2023-11-23

# GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence

GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence ( http://arxiv.org/abs/2311.13777v1 )

License: see link

Pengyuan Wang, Takuya Ikeda, Robert Lee, Koichi Nishiwaki

Category-level pose estimation is a challenging task with many potential applications in computer vision and robotics. Recently, deep-learning-based approaches have made great progress, but they are typically hindered by the need for large datasets of either pose-labelled real images or carefully tuned photorealistic simulators. This can be avoided by using only geometric inputs such as depth images to reduce the domain gap, but these approaches suffer from a lack of semantic information, which can be vital in the pose estimation problem. To resolve this conflict, we propose to utilize both geometric and semantic features obtained from a pre-trained foundation model. Our approach projects 2D features from this foundation model into 3D for a single object model per category, and then uses a trained matching network to match new single-view observations of unseen object instances against this model. This requires significantly less training data than prior methods, since the semantic features are robust to object texture and appearance. We demonstrate this with a rich evaluation, showing improved performance over prior methods with a fraction of the data required.
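
The abstract does not say exactly how 2D features are lifted into 3D. A common approach, assumed here, is pinhole back-projection: each valid depth pixel is lifted using camera intrinsics $f_x, f_y, c_x, c_y$, and it carries its feature vector along. The sketch covers only this lifting step, not the trained matching network.

```python
import numpy as np

def backproject_features(depth, features, fx, fy, cx, cy):
    """Lift per-pixel features to 3-D points using a pinhole camera model.

    depth: (H, W) metric depth, 0 where invalid.
    features: (H, W, C) per-pixel descriptors (e.g. upsampled foundation-model features).
    Returns (N, 3) camera-frame points and their (N, C) features, in row-major pixel order.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]                  # row (v) and column (u) pixel indices
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    points = np.stack([x, y, z], axis=1)
    return points, features[valid]

depth = np.full((2, 2), 2.0)                               # toy depth map
feats = np.arange(2 * 2 * 4, dtype=float).reshape(2, 2, 4)  # toy 4-dim features
points, point_feats = backproject_features(depth, feats, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
print(points)
```

Matching a new observation would then compare `point_feats` against the per-category model's features. The function name and intrinsics are illustrative.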

Translated: 2023-11-28 00:45:29 | Published: 2023-11-23

# Mesoscopic Ultrafast Nonlinear Optics: The Emergence of Multimode Quantum Non-Gaussian Physics

Mesoscopic ultrafast nonlinear optics -- The emergence of multimode quantum non-Gaussian physics ( http://arxiv.org/abs/2311.13775v1 )

License: see link

Ryotatsu Yanagimoto, Edwin Ng, Marc Jankowski, Rajveer Nehra, Timothy P. McKenna, Tatsuhiro Onodera, Logan G. Wright, Ryan Hamerly, Alireza Marandi, M. M. Fejer, Hideo Mabuchi

Over the last few decades, nonlinear optics has become significantly more nonlinear, traversing nearly a billionfold improvement in energy efficiency, with ultrafast nonlinear nanophotonics in particular emerging as a frontier for combining both spatial and temporal engineering. At present, cutting-edge experiments in nonlinear nanophotonics place us just above the mesoscopic regime, where a few hundred photons suffice to trigger nonlinear saturation. In contrast to classical or deep-quantum optics, the mesoscale is characterized by dynamical interactions between mean-field, Gaussian, and non-Gaussian quantum features, all within a close hierarchy of scales. When combined with the inherent multimode complexity of optical fields, such hybrid quantum-classical dynamics present theoretical, experimental, and engineering challenges to the contemporary framework of quantum optics. In this review, we highlight the unique physics that emerges in multimode nonlinear optics at the mesoscale and outline key principles for exploiting both classical and quantum features to engineer novel functionalities. We briefly survey the experimental landscape and draw attention to outstanding technical challenges in materials, dispersion engineering, and device design for accessing mesoscopic operation. Finally, we speculate on how these capabilities might usher in some new paradigms in quantum photonics, from quantum-augmented information processing to nonclassical-light-driven dynamics and phenomena to all-optical non-Gaussian measurement and sensing. The physics unlocked at the mesoscale present significant challenges and opportunities in theory and experiment alike, and this review is intended to serve as a guidepost as we begin to navigate this new frontier in ultrafast quantum nonlinear optics.

Translated: 2023-11-28 00:45:07 | Published: 2023-11-23

# Learning Hierarchical Polynomials with Three-Layer Neural Networks

Learning Hierarchical Polynomials with Three-Layer Neural Networks ( http://arxiv.org/abs/2311.13774v1 )

License: see link

Zihao Wang, Eshaan Nichani, Jason D. Lee

We study the problem of learning hierarchical polynomials over the standard Gaussian distribution with three-layer neural networks. We specifically consider target functions of the form $h = g \circ p$ where $p : \mathbb{R}^d \rightarrow \mathbb{R}$ is a degree $k$ polynomial and $g: \mathbb{R} \rightarrow \mathbb{R}$ is a degree $q$ polynomial. This function class generalizes the single-index model, which corresponds to $k=1$, and is a natural class of functions possessing an underlying hierarchical structure. Our main result shows that for a large subclass of degree $k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error in $\widetilde{\mathcal{O}}(d^k)$ samples and polynomial time. This is a strict improvement over kernel methods, which require $\widetilde{\Theta}(d^{kq})$ samples, as well as existing guarantees for two-layer networks, which require the target function to be low-rank. Our result also generalizes prior works on three-layer neural networks, which were restricted to the case of $p$ being a quadratic. When $p$ is indeed a quadratic, we achieve the information-theoretically optimal sample complexity $\widetilde{\mathcal{O}}(d^2)$, which is an improvement over prior work (Nichani et al., 2023) requiring a sample size of $\widetilde{\Theta}(d^4)$. Our proof proceeds by showing that during the initial stage of training the network performs feature learning to recover the feature $p$ with $\widetilde{\mathcal{O}}(d^k)$ samples. This work demonstrates the ability of three-layer neural networks to learn complex features and, as a result, to learn a broad class of hierarchical functions.

Translated: 2023-11-28 00:44:34 | Published: 2023-11-23

# Archiving Body Movements: Collective Generation of Chinese Calligraphy

Archiving Body Movements: Collective Generation of Chinese Calligraphy ( http://arxiv.org/abs/2311.13770v1 )

License: see link

Aven Le Zhou, Jiayi Ye, Tianchen Liu, Kang Zhang

As a communication channel, body movements have been widely explored in behavioral studies and kinesics. Performing and visual arts share the same interests but focus on documenting and representing human body movements, such as for dance notation and visual work creation. This paper investigates body movements in oriental calligraphy and how to apply calligraphy principles to stimulate and archive body movements. Through an artwork (Wushu), the authors experiment with an interactive and generative approach to engage the audience's bodily participation and archive the body movements as a compendium of generated calligraphy. The audience assumes the role of both writers and readers; creating ("writing") and appreciating ("reading") the generated calligraphy becomes a cyclical process within this infinite "Book," which can motivate further attention and discussions concerning Chinese characters and calligraphy.

Translated: 2023-11-28 00:43:52 | Published: 2023-11-23

# A Unified Framework for Fair Spectral Clustering with Effective Graph Learning

A Unified Framework for Fair Spectral Clustering With Effective Graph Learning ( http://arxiv.org/abs/2311.13766v1 )

License: see link

Xiang Zhang, Qiao Wang

We consider the problem of spectral clustering under group fairness constraints, where samples from each sensitive group are approximately proportionally represented in each cluster. Traditional fair spectral clustering (FSC) methods consist of two consecutive stages, i.e., performing fair spectral embedding on a given graph and conducting $k$-means to obtain discrete cluster labels. However, in practice, the graph is usually unknown, and we need to construct the underlying graph from potentially noisy data, the quality of which inevitably affects subsequent fair clustering performance. Furthermore, performing FSC through separate steps breaks the connections among these steps, leading to suboptimal results. To this end, we first theoretically analyze the effect of the constructed graph on FSC. Motivated by the analysis, we propose a novel graph construction method with a node-adaptive graph filter to learn graphs from noisy data. Then, all independent stages of conventional FSC are integrated into a single objective function, forming an end-to-end framework that inputs raw data and outputs discrete cluster labels. An algorithm is developed to jointly and alternately update the variables in each stage. Finally, we conduct extensive experiments on synthetic, benchmark, and real data, which show that our model is superior to state-of-the-art fair clustering methods.
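
For reference, the conventional two-stage pipeline that the abstract contrasts with can be sketched, assuming a given affinity matrix and one common fairness formulation. The spectral embedding is restricted to the null space of $F^\top$, where each column of $F$ is a group-indicator vector minus that group's overall proportion. This is the spectral relaxation of requiring every cluster to contain each group in its overall proportion.

The NumPy sketch below covers only that baseline embedding stage, not the paper's end-to-end method. In the conventional pipeline, $k$-means would then run on the rows of the returned embedding.

```python
import numpy as np

def fair_spectral_embedding(W, groups, k):
    """Fairness-constrained spectral embedding of a graph with affinity matrix W.

    groups: length-n array of sensitive-group labels. Returns an (n, k) embedding H
    with F.T @ H = 0, plus the group-balance matrix F used to build the constraint.
    """
    L = np.diag(W.sum(axis=1)) - W                       # unnormalized graph Laplacian
    labels = np.unique(groups)
    # One column per group except the last: indicator minus the group's proportion.
    F = np.stack([(groups == g).astype(float) - np.mean(groups == g)
                  for g in labels[:-1]], axis=1)
    _, sing, vt = np.linalg.svd(F.T)                     # full SVD of the (h-1, n) matrix
    rank = int(np.sum(sing > 1e-10))
    Z = vt[rank:].T                                      # orthonormal basis of null(F.T)
    _, evecs = np.linalg.eigh(Z.T @ L @ Z)               # eigenvalues in ascending order
    return Z @ evecs[:, :k], F

rng = np.random.default_rng(0)
A = rng.random((8, 8))
W = (A + A.T) / 2                                        # symmetric toy affinities
np.fill_diagonal(W, 0.0)
groups = np.array([0, 1] * 4)                            # two hypothetical sensitive groups
H, F = fair_spectral_embedding(W, groups, k=2)
print(np.abs(F.T @ H).max())                             # ~0: fairness constraint holds
```

Because the embedding and the graph are fixed before $k$-means runs, errors in the graph propagate unchecked. This is the decoupling the unified framework is designed to remove.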

Translated: 2023-11-28 00:43:37 | Published: 2023-11-23

# Learning Optimal and Fair Policies for Online Allocation of Scarce Societal Resources from Data Collected in Deployment

Learning Optimal and Fair Policies for Online Allocation of Scarce Societal Resources from Data Collected in Deployment ( http://arxiv.org/abs/2311.13765v1 )

License: see link

Bill Tang, \c{C}a\u{g}{\i}l Ko\c{c}yi\u{g}it, Eric Rice, Phebe Vayanos

We study the problem of allocating scarce societal resources of different types (e.g., permanent housing, deceased donor kidneys for transplantation, ventilators) to heterogeneous allocatees on a waitlist (e.g., people experiencing homelessness, individuals suffering from end-stage renal disease, Covid-19 patients) based on their observed covariates. We leverage administrative data collected in deployment to design an online policy that maximizes expected outcomes while satisfying budget constraints, in the long run. Our proposed policy waitlists each individual for the resource maximizing the difference between their estimated mean treatment outcome and the estimated resource dual-price or, roughly, the opportunity cost of using the resource. Resources are then allocated as they arrive, in a first-come first-serve fashion. We demonstrate that our data-driven policy almost surely asymptotically achieves the expected outcome of the optimal out-of-sample policy under mild technical assumptions. We extend our framework to incorporate various fairness constraints. We evaluate the performance of our approach on the problem of designing policies for allocating scarce housing resources to people experiencing homelessness in Los Angeles based on data from the homeless management information system. In particular, we show that using our policies improves rates of exit from homelessness by 1.9% and that policies that are fair in either allocation or outcomes by race come at a very low price of fairness.
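
The allocation rule described above can be sketched in a few lines. Everything numeric here is hypothetical: the estimated mean outcomes and dual prices would in practice come from models fitted on administrative data, which this sketch does not attempt.

- Each arrival joins the waitlist of the resource that maximizes estimated outcome minus dual price.
- When a unit of a resource becomes available, it goes to the person who has waited longest on that resource's list.

```python
from collections import deque

def choose_waitlist(est_outcome, dual_price):
    """Resource maximizing estimated mean outcome minus its estimated dual price."""
    return max(est_outcome, key=lambda r: est_outcome[r] - dual_price[r])

def allocate(queues, resource):
    """Give an arriving unit of `resource` to the earliest person waiting for it."""
    return queues[resource].popleft() if queues[resource] else None

dual_price = {"housing": 0.20, "voucher": 0.05}           # hypothetical opportunity costs
arrivals = {                                               # hypothetical outcome estimates
    "A": {"housing": 0.60, "voucher": 0.40},
    "B": {"housing": 0.55, "voucher": 0.50},
    "C": {"housing": 0.70, "voucher": 0.30},
}
queues = {r: deque() for r in dual_price}
for person, est in arrivals.items():                       # dicts keep insertion order
    queues[choose_waitlist(est, dual_price)].append(person)

print(allocate(queues, "housing"))   # A: waiting longest among those queued for housing
```

Person B ends up on the voucher list even though B's raw estimate is higher for housing. Housing's higher dual price makes the voucher the better use of scarce supply.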

Translated: 2023-11-28 00:43:16 | Published: 2023-11-23

# Extraction of n = 0 Pick-Up by Locked Mode Detectors Based on Neural Networks in J-TEXT

Extraction of n = 0 pick-up by locked mode detectors based on neural networks in J-TEXT ( http://arxiv.org/abs/2311.13763v1 )

License: see link

Chengshuo Shen, Jianchao Li, Yonghua Ding, Jiaolong Dong, Nengchao Wang, Dongliang Han, Feiyue Mao, Da Li, Zhipeng Chen, Zhoujun Yang, Zhongyong Chen, Yuan Pan and J-Text Team

Measurement of the locked mode (LM) is important for physical research on magnetohydrodynamic (MHD) instabilities and plasma disruptions. The $n = 0$ pick-up needs to be extracted and subtracted to calculate the amplitude and phase of the LM. A new method to extract this pick-up has been developed in J-TEXT by using neural networks (NNs) to predict the $n = 0$ pick-up $b_r^{n=0}$ on the LM detectors. An approach called Power Multiple Time Scale (PMTS) has been developed, with outstanding regression performance across multiple frequency ranges. Three models have been developed based on PMTS NNs. PMTS can fit $b_r^{n=0}$ on the LM detectors with small errors in both the time domain and the frequency domain. The $n > 0$ pick-up $b_r^{n>0}$ generated by resonant magnetic perturbations (RMPs) can then be obtained by subtracting the extracted $b_r^{n=0}$. This new method uses only one LM detector instead of four to extract $b_r^{n=0}$. Therefore, the distribution of the LM detectors can also be optimized based on this new method.
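
For context, the conventional way to separate these components uses several detectors at different toroidal angles $\phi_j$. One fits $b_j = b_0 + A\cos(\phi_j - \phi_0)$ by least squares, where $b_0$ is the $n = 0$ part and $A$, $\phi_0$ are the $n = 1$ amplitude and phase. The abstract reports that the NN approach needs only one detector for the $n = 0$ extraction instead of four. The NumPy sketch below illustrates only the conventional multi-detector fit, with assumed detector angles and $n \ge 2$ ignored.

```python
import numpy as np

def fit_n0_n1(signals, phi):
    """Least-squares split of toroidal-array signals into n = 0 and n = 1 parts.

    Model: b_j = b0 + amp * cos(phi_j - phase); n >= 2 components are ignored.
    Needs at least three detectors at distinct toroidal angles.
    """
    design = np.column_stack([np.ones_like(phi), np.cos(phi), np.sin(phi)])
    b0, c, s = np.linalg.lstsq(design, signals, rcond=None)[0]
    return b0, np.hypot(c, s), np.arctan2(s, c)        # offset, amplitude, phase

phi = np.deg2rad([0.0, 90.0, 180.0, 270.0])            # assumed detector positions
signals = 0.3 + 1.2 * np.cos(phi - 0.7)                # synthetic readings
b0, amp, phase = fit_n0_n1(signals, phi)
print(round(b0, 3), round(amp, 3), round(phase, 3))
residual = signals - b0                                # readings with the n = 0 pick-up removed
```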

Translated: 2023-11-28 00:42:50 | Published: 2023-11-23

# Progressive Learning with Visual Prompt Tuning for Variable-Rate Image Compression

Progressive Learning with Visual Prompt Tuning for Variable-Rate Image Compression ( http://arxiv.org/abs/2311.13846v1 )

License: see link

Shiyu Qin, Yimin Zhou, Jinpeng Wang, Bin Chen, Baoyi An, Tao Dai, Shu-Tao Xia

In this paper, we propose a progressive learning paradigm for transformer-based variable-rate image compression. Our approach covers a wide range of compression rates with the assistance of the Layer-adaptive Prompt Module (LPM). Inspired by visual prompt tuning, we use LPM to extract prompts for input images at the encoder side and for hidden features at the decoder side. These prompts are fed as additional information into the Swin Transformer layers of a pre-trained transformer-based image compression model, where they affect the allocation of attention regions and bits and thereby change the model's target compression ratio. To keep the network lightweight, we integrate prompt networks with fewer convolutional layers. Extensive experiments show that, compared with methods based on multiple models optimized separately for different target rates, the proposed method achieves the same performance with 80% savings in parameter storage and 90% savings in datasets. Meanwhile, our model outperforms all current variable-bitrate image compression methods in terms of rate-distortion performance and approaches state-of-the-art fixed-bitrate image compression methods trained from scratch.

Translated: 2023-11-28 00:36:03 | Published: 2023-11-23

# Touring Sampling with Pushforward Maps

Touring sampling with pushforward maps ( http://arxiv.org/abs/2311.13845v1 )

License: see link

Vivien Cabannes, Charles Arnal

The number of sampling methods can be daunting for a practitioner looking to apply powerful machine learning methods to their specific problem. This paper takes a theoretical stance to review and organize many sampling approaches in the "generative modeling" setting, where one wants to generate new data that are similar to some training examples. By revealing links between existing methods, it may help overcome some of the current challenges in sampling with diffusion models, such as long inference time due to diffusion simulation, or the lack of diversity in generated samples.
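
A pushforward map in its simplest form works as follows. If $u \sim \mathrm{Uniform}(0, 1)$ and $T = F^{-1}$ is the inverse CDF of a target distribution, then $T(u)$ follows that target. Generative models learn far richer maps (or, like diffusion models, simulate them), but the principle is the same. Here is a minimal NumPy sketch for the exponential distribution, where $F^{-1}(u) = -\ln(1 - u)/\lambda$.

```python
import numpy as np

def pushforward_exponential(n, rate, rng):
    """Sample Exp(rate) by pushing Uniform(0, 1) samples through the inverse CDF."""
    u = rng.uniform(size=n)                    # samples from the base distribution
    return -np.log1p(-u) / rate                # T(u) = F^{-1}(u) = -ln(1 - u) / rate

rng = np.random.default_rng(0)
x = pushforward_exponential(100_000, rate=2.0, rng=rng)
print(x.mean())                                # close to 1 / rate = 0.5
```

Sampling here costs one function evaluation per draw. By contrast, diffusion sampling must simulate many steps, which is the inference-time cost the abstract mentions.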

翻訳日:2023-11-28 00:35:26 公開日:2023-11-23

# Exact Combinatorial Optimization with Temporo-Attentional Graph Neural Networks

http://arxiv.org/abs/2311.13843v1

License: see the linked page

Mehdi Seyfi, Amin Banitalebi-Dehkordi, Zirui Zhou, and Yong Zhang

Combinatorial optimization finds an optimal solution within a discrete set of variables and constraints. The field has seen tremendous progress both in research and industry. With the success of deep learning in the past decade, a recent trend in combinatorial optimization has been to improve state-of-the-art combinatorial optimization solvers by replacing key heuristic components with machine learning (ML) models. In this paper, we investigate two essential aspects of machine learning algorithms for combinatorial optimization: temporal characteristics and attention. We argue that for the task of variable selection in the branch-and-bound (B&B) algorithm, incorporating the temporal information as well as the bipartite graph attention improves the solver's performance. We support our claims with intuitions and numerical results over several standard datasets used in the literature and competitions. Code is available at: https://developer.huaweicloud.com/develop/aigallery/notebook/detail?id=047c6cf2-8463-40d7-b92f-7b2ca998e935

Translated: 2023-11-28 00:35:10; published: 2023-11-23

# Lego: Learning to Disentangle and Invert Concepts Beyond Object Appearance in Text-to-Image Diffusion Models

http://arxiv.org/abs/2311.13833v1

License: see the linked page

Saman Motamed and Danda Pani Paudel and Luc Van Gool

Diffusion models have revolutionized generative content creation. Text-to-image (T2I) diffusion models in particular have increased users' creative freedom by allowing scene synthesis from natural language. T2I models excel at synthesizing concepts such as nouns, appearances, and styles. Methods such as Textual Inversion and DreamBooth support customized content creation from a few example images of a concept: they invert the desired concept so that it can be synthesized in new scenes. However, inverting more general concepts that go beyond object appearance and style (adjectives and verbs) through natural language remains a challenge. Two key characteristics of these concepts contribute to the limitations of current inversion methods. 1) Adjectives and verbs are entangled with nouns (the subject), which can hinder appearance-based inversion methods because the subject's appearance leaks into the concept embedding. 2) Describing such concepts often requires more than a single word embedding (being frozen in ice, walking on a tightrope, etc.), which current methods do not handle. In this study, we introduce Lego, a textual inversion method designed to invert subject-entangled concepts from a few example images. Lego disentangles concepts from their associated subjects using a simple yet effective Subject Separation step. It also employs a Context Loss that guides the inversion of single- and multi-embedding concepts. In a thorough user study, Lego-generated concepts were preferred over the baseline more than 70% of the time. Additionally, visual question answering using a large language model suggested that Lego-generated concepts are better aligned with the text description of the concept.

Translated: 2023-11-28 00:34:45; published: 2023-11-23

# Posterior Distillation Sampling

http://arxiv.org/abs/2311.13831v1

License: see the linked page

Juil Koo, Chanho Park, Minhyuk Sung

We introduce Posterior Distillation Sampling (PDS), a novel optimization method for parametric image editing based on diffusion models. Existing optimization-based methods, which leverage the powerful 2D prior of diffusion models to handle various parametric images, have mainly focused on generation. Unlike generation, editing requires a balance between conforming to the target attribute and preserving the identity of the source content. Recent 2D image editing methods have achieved this balance by leveraging the stochastic latent encoded in the generative process of diffusion models. To extend the editing capabilities of diffusion models shown in pixel space to parameter space, we reformulate the 2D image editing method into an optimization form named PDS. PDS matches the stochastic latents of the source and the target, enabling the sampling of targets in diverse parameter spaces that align with a desired attribute while maintaining the source's identity. We demonstrate that this optimization resembles running a generative process with the target attribute, but aligning this process with the trajectory of the source's generative process. Extensive editing results in Neural Radiance Fields and Scalable Vector Graphics representations demonstrate that PDS is capable of sampling targets to fulfill the aforementioned balance across various parameter spaces.

Translated: 2023-11-28 00:33:59; published: 2023-11-23

# Stability and L2-penalty in Model Averaging

http://arxiv.org/abs/2311.13827v1

License: see the linked page

Hengkun Zhu, Guohua Zou

Model averaging integrates available information by averaging over potential models, and it has received much attention in the past two decades. Various model averaging methods have been developed. However, there is little literature on their theoretical properties from the perspective of stability, and most of these methods constrain the model weights to a simplex. The aim of this paper is to introduce stability from statistical learning theory into model averaging. To this end, we define stability, asymptotic empirical risk minimizer, generalization, and consistency for model averaging, and we study the relationships among them. Our results indicate that, under reasonable conditions, stability ensures good generalization performance and consistency of model averaging. Here, consistency means that the model averaging estimator asymptotically minimizes the mean squared prediction error. We also propose an L2-penalty model averaging method that does not restrict the model weights, and we prove that it is stable and consistent. To reduce the impact of tuning parameter selection, we use 10-fold cross-validation to select a candidate set of tuning parameters. We then take a weighted average of the resulting model-weight estimators, with weights based on their estimation errors. A Monte Carlo simulation and an illustrative application demonstrate the usefulness of the proposed method.
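The sketch below shows why dropping the simplex constraint makes L2-penalized weights easy to compute. It assumes the criterion $\|y - Pw\|^2 + \lambda\|w\|^2$, where column $k$ of $P$ holds the predictions of candidate model $k$. Under that assumption the minimizer has the ridge-type closed form $w = (P^\top P + \lambda I)^{-1} P^\top y$. The paper's exact criterion and its cross-validated averaging over $\lambda$ are not reproduced, and the helper names are hypothetical.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def l2_penalty_weights(preds: list[list[float]], y: list[float], lam: float) -> list[float]:
    """Weights w minimizing ||y - P w||^2 + lam * ||w||^2 (no simplex constraint).

    preds[k] holds the predictions of candidate model k, i.e. column k of P.
    """
    k = len(preds)
    gram = [[sum(p * q for p, q in zip(preds[i], preds[j])) + (lam if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [sum(p * t for p, t in zip(preds[i], y)) for i in range(k)]
    return solve(gram, rhs)

preds = [[1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]]
y = [1.0, 2.0, 3.0, 4.0]
print(l2_penalty_weights(preds, y, lam=10.0))
```

With `lam=0` the first (exact) model receives weight 1. With `lam=10` both weights shrink and no longer sum to one, which a simplex-constrained method would not allow.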

Translated: 2023-11-28 00:33:21; published: 2023-11-23

# The non-condensed fraction of a homogeneous dilute Bose gas within the improved Hartree-Fock approximation at finite temperature

http://arxiv.org/abs/2311.13822v1

License: see the linked page

Nguyen Van Thu

By means of the Cornwall-Jackiw-Tomboulis effective action, we investigate a dilute, weakly interacting Bose gas at finite temperature. The shift of the critical temperature is obtained in the universal form $\Delta T_C/T_C^{(0)} = c\,n_0^{1/3}a_s$. The non-condensed fraction is expressed as a sum of three terms, corresponding to quantum fluctuations, thermal fluctuations, and the combined effect of both.
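For reference, $T_C^{(0)}$ in the formula above is the textbook transition temperature of the ideal (non-interacting) Bose gas. The equation below writes the density as $n_0$ to match the abstract's notation. This is an assumption: at $T_C$, $n_0$ must be the total particle density. The value of the coefficient $c$ obtained in the paper is not reproduced here.

$$
k_B T_C^{(0)} = \frac{2\pi\hbar^2}{m}\left(\frac{n_0}{\zeta(3/2)}\right)^{2/3},
\qquad
T_C = T_C^{(0)} + \Delta T_C = T_C^{(0)}\left(1 + c\,n_0^{1/3}a_s\right),
$$

Here $m$ is the boson mass and $\zeta(3/2) \approx 2.612$. In the dilute regime the gas parameter satisfies $n_0^{1/3}a_s \ll 1$, so the interaction-induced shift is a small correction to $T_C^{(0)}$.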

Translated: 2023-11-28 00:32:55; published: 2023-11-23

# HypUC: Hyperfine Uncertainty Calibration with Gradient-boosted Corrections for Reliable Regression on Imbalanced Electrocardiograms

http://arxiv.org/abs/2311.13821v1

License: see the linked page

Uddeshya Upadhyay, Sairam Bade, Arjun Puranik, Shahir Asfahan, Melwin Babu, Francisco Lopez-Jimenez, Samuel J. Asirvatham, Ashim Prasad, Ajit Rajasekharan, Samir Awasthi, Rakesh Barve

The automated analysis of medical time series, such as the electrocardiogram (ECG), electroencephalogram (EEG), pulse oximetry, etc, has the potential to serve as a valuable tool for diagnostic decisions, allowing for remote monitoring of patients and more efficient use of expensive and time-consuming medical procedures. Deep neural networks (DNNs) have been demonstrated to process such signals effectively. However, previous research has primarily focused on classifying medical time series rather than attempting to regress the continuous-valued physiological parameters central to diagnosis. One significant challenge in this regard is the imbalanced nature of the dataset, as a low prevalence of abnormal conditions can lead to heavily skewed data that results in inaccurate predictions and a lack of certainty in such predictions when deployed. To address these challenges, we propose HypUC, a framework for imbalanced probabilistic regression in medical time series, making several contributions. (i) We introduce a simple kernel density-based technique to tackle the imbalanced regression problem with medical time series. (ii) Moreover, we employ a probabilistic regression framework that allows uncertainty estimation for the predicted continuous values. (iii) We also present a new approach to calibrate the predicted uncertainty further. (iv) Finally, we demonstrate a technique to use calibrated uncertainty estimates to improve the predicted continuous value and show the efficacy of the calibrated uncertainty estimates to flag unreliable predictions. HypUC is evaluated on a large, diverse, real-world dataset of ECGs collected from millions of patients, outperforming several conventional baselines on various diagnostic tasks, suggesting a potential use-case for the reliable clinical deployment of deep learning models.
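One common way to handle imbalanced regression targets with kernel densities works in two steps. First, estimate the target density with a Gaussian KDE. Second, weight each training sample inversely to that density, so that rare (often abnormal) values count more in the loss. The sketch below is minimal and assumes exactly that scheme. HypUC's actual weighting, bandwidth, and normalization are not given in the abstract, and the `bandwidth` value and function names here are hypothetical.

```python
import math

def gaussian_kde(values: list[float], bandwidth: float):
    """Return a function estimating the density of `values` with a Gaussian kernel."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x: float) -> float:
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density

def inverse_density_weights(targets: list[float], bandwidth: float = 1.0) -> list[float]:
    """Per-sample loss weights proportional to 1 / p(target), rescaled to mean 1."""
    density = gaussian_kde(targets, bandwidth)
    raw = [1.0 / density(t) for t in targets]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]

def weighted_mse(preds: list[float], targets: list[float], weights: list[float]) -> float:
    """Mean squared error in which each sample's squared error is scaled by its weight."""
    return sum(w * (p - t) ** 2 for p, t, w in zip(preds, targets, weights)) / len(targets)

# Illustrative values: many targets clustered near 60 and one rare target at 120.
targets = [58.0, 60.0, 61.0, 59.0, 62.0, 60.5, 120.0]
weights = inverse_density_weights(targets, bandwidth=2.0)
print([round(w, 2) for w in weights])  # the rare target receives the largest weight
```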

Translated: 2023-11-28 00:32:44; published: 2023-11-23

# Molecular Identification and Peak Assignment: Leveraging Multi-Level Multimodal Alignment on NMR

http://arxiv.org/abs/2311.13817v1

License: see the linked page

Hao Xu, Zhengyang Zhou, Pengyu Hong

Nuclear magnetic resonance (NMR) spectroscopy plays an essential role across various scientific disciplines, providing valuable insights into molecular dynamics and interactions. Despite the promise of AI-enhanced NMR prediction models, challenges persist in the interpretation of spectra for tasks such as molecular retrieval, isomer recognition, and peak assignment. In response, this paper introduces Multi-Level Multimodal Alignment with Knowledge-Guided Instance-Wise Discrimination (K-M3AID) to establish meaningful correspondences between two heterogeneous modalities: molecular graphs (structures) and NMR spectra. In particular, K-M3AID employs a dual-coordinated contrastive learning architecture, and incorporates a graph-level alignment module, a node-level alignment module, and a communication channel. Notably, the framework introduces knowledge-guided instance-wise discrimination into contrastive learning within the node-level alignment module, significantly enhancing accuracy in cross-modal alignment. Additionally, K-M3AID showcases its capability of meta-learning by demonstrating that skills acquired during node-level alignment positively impact graph-level alignment. Empirical validation underscores K-M3AID's effectiveness in addressing multiple zero-shot tasks, offering a promising solution to bridge the gap between structural information and spectral data in complex NMR scenarios.
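Contrastive alignment of two modalities is often trained with a symmetric InfoNCE-style loss. Matching (graph, spectrum) embedding pairs are pulled together, and mismatched pairs within the batch are pushed apart. The sketch below is minimal and assumes cosine similarity and a temperature `tau`. It is a generic contrastive loss, not K-M3AID's published objective, and the embeddings are placeholders.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def cross_entropy_diag(logits: list[list[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum - row[i]
    return total / len(logits)

def info_nce(graph_emb: list[list[float]], spec_emb: list[list[float]], tau: float = 0.1) -> float:
    """Symmetric InfoNCE: row i of each list forms a positive pair; other rows are negatives."""
    n = len(graph_emb)
    sim = [[cosine(g, s) / tau for s in spec_emb] for g in graph_emb]
    sim_t = [[sim[j][i] for j in range(n)] for i in range(n)]  # spectrum -> graph direction
    return 0.5 * (cross_entropy_diag(sim) + cross_entropy_diag(sim_t))
```

Averaging the two directions treats graph-to-spectrum and spectrum-to-graph retrieval symmetrically, which matters when both directions are used as downstream tasks.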

Translated: 2023-11-28 00:32:12; published: 2023-11-23

# Fairness-Aware Domain Generalization under Covariate and Dependence Shifts

http://arxiv.org/abs/2311.13816v1

License: see the linked page

Chen Zhao, Kai Jiang, Xintao Wu, Haoliang Wang, Latifur Khan, Christan Grant, Feng Chen

Generalizing an invariant classifier from source domains to shifted target domains while also ensuring model fairness is a substantial and complex challenge in machine learning. Existing domain generalization research typically attributes domain shifts to two sources. Concept shift relates to alterations in class labels, and covariate shift pertains to variations in data styles. In this paper, we introduce another form of distribution shift, called dependence shift, which involves variations in fair dependence patterns across domains. We then propose a novel domain generalization approach that addresses domain shifts by considering both covariate and dependence shifts. We assume there is an underlying transformation model that can transform data from one domain to another. By generating data in synthetic domains through this model, we learn a fairness-aware invariant classifier that enforces both model accuracy and fairness in unseen domains. Extensive empirical studies on four benchmark datasets demonstrate that our approach surpasses state-of-the-art methods.

Translated: 2023-11-28 00:31:45; published: 2023-11-23

# Mechanical Characterization and Inverse Design of Stochastic Architected Metamaterials Using Neural Operators

http://arxiv.org/abs/2311.13812v1

License: see the linked page

Hanxun Jin, Enrui Zhang, Boyu Zhang, Sridhar Krishnaswamy, George Em Karniadakis, Horacio D. Espinosa

Machine learning (ML) is emerging as a transformative tool for the design of architected materials, offering properties that far surpass those achievable through lab-based trial-and-error methods. However, a major challenge in current inverse design strategies is their reliance on extensive computational and/or experimental datasets, which becomes particularly problematic for designing micro-scale stochastic architected materials that exhibit nonlinear mechanical behaviors. Here, we introduce a new end-to-end scientific ML framework, leveraging deep neural operators (DeepONet), to directly learn the relationship between the complete microstructure and mechanical response of architected metamaterials from sparse but high-quality in situ experimental data. The approach facilitates the inverse design of structures tailored to specific nonlinear mechanical behaviors. Results obtained from spinodal microstructures, printed using two-photon lithography, reveal that the prediction error for mechanical responses is within a range of 5 - 10%. Our work underscores that by employing neural operators with advanced micro-mechanics experimental techniques, the design of complex micro-architected materials with desired properties becomes feasible, even in scenarios constrained by data scarcity. Our work marks a significant advancement in the field of materials-by-design, potentially heralding a new era in the discovery and development of next-generation metamaterials with unparalleled mechanical characteristics derived directly from experimental insights.

Translated: 2023-11-28 00:31:29; published: 2023-11-23

# Education distillation:getting student models to learn in shcools

http://arxiv.org/abs/2311.13811v1

License: see the linked page

Ling Feng, Danyang Li, Tianhao Wu, Xuliang Duan

Knowledge distillation is one of the methods for model compression. Existing knowledge distillation techniques focus on improving the distillation algorithm to enhance distillation efficiency. This paper introduces dynamic incremental learning into knowledge distillation and proposes a distillation strategy called education distillation. Specifically, the full student model is divided into fragmented student models, which are treated as lower-grade models. As the grade level rises, the fragmented student models deepen in conjunction with designed teaching reference layers, while learning from more teacher models through distillation. Moving from lower to higher grades, the fragmented student models are gradually integrated into a complete target student model, and the student model's performance improves grade by grade. Education distillation strategies combined with distillation algorithms outperform single distillation algorithms on the public datasets CIFAR100, Caltech256, and Food-101.

Translated: 2023-11-28 00:31:03; published: 2023-11-23

# Bridging Classical and Quantum Machine Learning: Knowledge Transfer From Classical to Quantum Neural Networks Using Knowledge Distillation

http://arxiv.org/abs/2311.13810v1

License: see the linked page

Mohammad Junayed Hasan and M.R.C.Mahdy

Very recently, studies have shown that quantum neural networks surpass classical neural networks in tasks like image classification when a similar number of learnable parameters is used. However, issues such as qubit instability and limited qubit availability currently hinder the development and optimization of quantum models, leading to error-prone systems with weak performance. In contrast, classical models can achieve high performance because resources are widely available. As a result, more studies have focused on hybrid classical-quantum integration. One line of research focuses particularly on transfer learning through classical-quantum integration or quantum-quantum approaches. Unlike previous studies, this paper introduces a new method that uses knowledge distillation to transfer knowledge from classical to quantum neural networks, bridging classical machine learning and emergent quantum computing techniques. We adapt classical convolutional neural network (CNN) architectures such as LeNet and AlexNet to serve as teacher networks. These teachers help train student quantum models by providing supervisory signals through a KL-divergence term during backpropagation. The approach yields significant performance improvements for the quantum models while relying solely on classical CNNs as teachers. The quantum models achieve an average accuracy improvement of 0.80% on the MNIST dataset and 5.40% on the more complex Fashion MNIST dataset. This technique removes the need for cumbersome training of huge quantum models for transfer learning in resource-constrained settings. It also makes it possible to reuse existing pre-trained classical models to improve performance. Thus, this study paves the way for future research in quantum machine learning (QML) by positioning knowledge distillation as a core technique for advancing QML applications.
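A KL-divergence distillation signal is commonly implemented in the standard soft-target form. The loss is KL(teacher || student) between temperature-softened output distributions, multiplied by $T^2$. The sketch below is minimal and works on plain logit lists. In the paper, the student is a quantum model and the teacher a classical CNN such as LeNet or AlexNet; neither model is included here, and `temperature` is an assumed hyperparameter.

```python
import math

def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Temperature-softened softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits: list[float], teacher_logits: list[float],
            temperature: float = 4.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0.0)
    return temperature ** 2 * kl
```

In training, this term is usually combined with the ordinary cross-entropy on the hard labels. The $T^2$ factor keeps the gradient scale of the soft-target term roughly independent of the chosen temperature.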

Translated: 2023-11-28 00:30:49; published: 2023-11-23

# AdaTyper: Adaptive Semantic Column Type Detection

http://arxiv.org/abs/2311.13806v1

License: see the linked page

Madelon Hulsebos and Paul Groth and Çağatay Demiralp

Understanding the semantics of relational tables is instrumental for automation in data exploration and preparation systems. A key source for understanding a table is the semantics of its columns. With the rise of deep learning, learned table representations are now available. They can be applied to semantic type detection and achieve good performance on benchmarks. Nevertheless, we observe a gap between this performance and its applicability in practice. In this paper, we propose AdaTyper to address one of the most critical deployment challenges: adaptation. AdaTyper uses weak supervision and minimal human feedback to adapt a hybrid type predictor at inference time, both to new semantic types and to shifted data distributions. The hybrid type predictor of AdaTyper combines rule-based methods and a lightweight machine learning model for semantic column type detection. We evaluate AdaTyper's adaptation performance on real-world database tables whose columns were hand-annotated with semantic types through crowdsourcing. We find that the F1-score improves for both new and existing types. AdaTyper approaches an average precision of 0.6 after seeing only 5 examples, significantly outperforming existing adaptation methods based on human-provided regular expressions or dictionaries.

Translated: 2023-11-28 00:30:21; published: 2023-11-23

# Quantum Computing Dataset of Maximum Independent Set Problem on King's Lattice of over Hundred Rydberg Atoms

http://arxiv.org/abs/2311.13803v1

License: see the linked page

Kangheun Kim, Minhyuk Kim, Juyoung Park, Andrew Byun, Jaewook Ahn

Finding the maximum independent set (MIS) of a large graph is a nondeterministic polynomial-time (NP)-complete problem, and no efficient classical algorithm for it is known. Here, we present a set of quantum adiabatic computing data from Rydberg-atom experiments that solve the MIS problem for up to 141 atoms randomly arranged on the King's lattice. A total of 582,916 Rydberg-atom measurement events were collected for experimental MIS solutions of 733,853 different graphs. We provide the raw image data, together with the complete binary determinations of the measured many-body ground states and the classified graph data. The dataset supports benchmark testing and advanced data-driven analyses, both for validating the performance of the Rydberg-atom approach and for guiding system improvements.
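On a King's lattice, each site is adjacent to its eight neighbors (horizontal, vertical, and diagonal). An independent set therefore cannot contain two occupied sites at Chebyshev distance 1. The sketch below checks independence and finds a maximum independent set by brute force, which is feasible only for a handful of sites. The helper names are hypothetical, and the code illustrates the benchmarked problem, not the Rydberg-atom procedure.

```python
from itertools import combinations

Site = tuple[int, int]

def king_adjacent(a: Site, b: Site) -> bool:
    """Distinct sites are adjacent on a King's lattice if their Chebyshev distance is 1."""
    return a != b and max(abs(a[0] - b[0]), abs(a[1] - b[1])) == 1

def is_independent(chosen: list[Site]) -> bool:
    """True if no two chosen sites are adjacent."""
    return all(not king_adjacent(a, b) for a, b in combinations(chosen, 2))

def brute_force_mis(sites: list[Site]) -> list[Site]:
    """Exhaustive search from the largest subset size downward (exponential time)."""
    for size in range(len(sites), 0, -1):
        for subset in combinations(sites, size):
            if is_independent(list(subset)):
                return list(subset)
    return []

# A fully occupied 3x3 block of the King's lattice: its MIS is the four corners.
block = [(r, c) for r in range(3) for c in range(3)]
mis = brute_force_mis(block)
print(len(mis))  # 4
```

Exhaustive search grows as $2^N$ in the number of sites, which is why instances with around a hundred atoms call for heuristic or physical solvers.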

Translated: 2023-11-28 00:29:59; published: 2023-11-23

# Leveraging Optimal Transport via Projections on Subspaces for Machine Learning Applications

http://arxiv.org/abs/2311.13883v1

License: see the linked page

Clément Bonet

Optimal Transport has received much attention in Machine Learning because it allows one to compare probability distributions by exploiting the geometry of the underlying space. However, in its original formulation, solving this problem carries a significant computational burden. A meaningful line of work therefore proposes alternatives that reduce this burden while retaining its properties. In this thesis, we focus on alternatives that use projections on subspaces. The main such alternative is the Sliced-Wasserstein distance. We first extend it to Riemannian manifolds, for use in Machine Learning applications where such spaces have proven beneficial in recent years. We also study sliced distances between positive measures in the so-called unbalanced OT problem. Returning to the original Euclidean Sliced-Wasserstein distance between probability measures, we study the dynamics of gradient flows when the space is endowed with this distance instead of the usual Wasserstein distance. Then, we investigate the use of the Busemann function, a generalization of the inner product in metric spaces, in the space of probability measures. Finally, we extend the subspace detour approach to incomparable spaces using the Gromov-Wasserstein distance.
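The Sliced-Wasserstein distance replaces one high-dimensional transport problem with many one-dimensional ones. Both samples are projected onto random directions, and in one dimension optimal transport reduces to matching sorted values. The sketch below is a minimal Monte Carlo estimate for the Euclidean case. It assumes two point clouds of equal size with uniform weights and does not cover the Riemannian or unbalanced extensions studied in the thesis.

```python
import math
import random

def sliced_wasserstein(x: list[list[float]], y: list[list[float]],
                       n_projections: int = 200, p: int = 2, seed: int = 0) -> float:
    """Monte Carlo estimate of SW_p between two equal-size empirical measures."""
    assert len(x) == len(y), "this sketch assumes equally sized samples"
    rng = random.Random(seed)
    dim = len(x[0])
    total = 0.0
    for _ in range(n_projections):
        # Uniform random direction on the unit sphere (normalized Gaussian vector).
        theta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        # In 1D the optimal coupling matches sorted projections.
        px = sorted(sum(a * t for a, t in zip(pt, theta)) for pt in x)
        py = sorted(sum(a * t for a, t in zip(pt, theta)) for pt in y)
        total += sum(abs(a - b) ** p for a, b in zip(px, py)) / len(px)
    return (total / n_projections) ** (1.0 / p)
```

Each projection costs a sort, $O(n \log n)$, instead of solving a full transport problem. For a translation by a vector $v$ in $\mathbb{R}^d$, $SW_2 = \|v\|/\sqrt{d}$, which gives a convenient sanity check.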

Translated: 2023-11-28 00:22:33; published: 2023-11-23

# A Multi-solution Study on GDPR AI-enabled Completeness Checking of DPAs

http://arxiv.org/abs/2311.13881v1

License: see the linked page

Muhammad Ilyas Azeem and Sallam Abualhaija

Specifying legal requirements that ensure software systems comply with the applicable regulations is a major concern in requirements engineering (RE). Personal data collected by an organization is often shared with other organizations to perform certain processing activities. In such cases, the General Data Protection Regulation (GDPR) requires issuing a data processing agreement (DPA), which regulates the processing and ensures that personal data remains protected. Violating the GDPR can lead to huge fines reaching billions of euros. Software systems involving personal data processing must adhere to the legal obligations stipulated in the GDPR and outlined in DPAs. Requirements engineers can elicit legal requirements from DPAs for regulating the data processing activities in software systems. Checking a DPA's completeness against the GDPR provisions is therefore an essential prerequisite for ensuring that the elicited requirements are complete. Analyzing DPAs entirely manually is time-consuming and requires adequate legal expertise. In this paper, we propose an automation strategy for checking the completeness of DPAs against the GDPR. Specifically, we pursue ten alternative solutions enabled by different technologies: traditional machine learning, deep learning, language modeling, and few-shot learning. The goal of our work is to empirically examine how these different technologies fare in the legal domain. We computed the F2 score on a set of 30 real DPAs. Our evaluation shows that the best-performing solutions, with F2 scores of 86.7% and 89.7%, are based on the pre-trained BERT and RoBERTa language models. Our analysis further shows that other solutions, based on deep learning (e.g., BiLSTM) and few-shot learning (e.g., SetFit), can achieve comparable accuracy while being more efficient to develop.
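The F2 score is the $F_\beta$ measure with $\beta = 2$, defined as $F_\beta = (1+\beta^2)\,PR / (\beta^2 P + R)$ for precision $P$ and recall $R$. It weights recall more heavily than precision, which suits a task where missing an incomplete DPA is costlier than a false alarm. The counts in the example below are made up for illustration and do not come from the paper.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from confusion counts; beta = 2 favors recall over precision."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical counts: 80 true positives, 20 false positives, 10 false negatives.
print(round(f_beta(80, 20, 10), 4))  # 0.8696
```

Equivalently, $F_2 = 5\,TP / (5\,TP + 4\,FN + FP)$, which gives $400/460$ for these counts.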

翻訳日:2023-11-28 00:22:17 公開日:2023-11-23

# PointPCA+: PointPCA客観的品質評価指標の拡張

PointPCA+: Extending PointPCA objective quality assessment metric ( http://arxiv.org/abs/2311.13880v1 )

ライセンス: Link先を確認

Xuemei Zhou, Evangelos Alexiou, Irene Viola, Pablo Cesar

(参考訳) 本稿では、PointPCAの拡張であり、計算が簡略化され、より豊富な記述子を備えた点群品質評価(PCQA)指標であるPointPCA+を提案する。PointPCAは、フルリファレンスPCQAのために、点群の幾何データとテクスチャデータの両方に適用されるPCA分解に基づく、知覚的に関連する記述子のセットを提案した。PointPCA+は幾何データにのみPCAを用いる一方で、既存の幾何記述子とテクスチャ記述子を拡充し、それらをより効率的に計算する。PointPCAと同様に、局所的な形状特性と外観特性をそれぞれ捉える幾何記述子とテクスチャ記述子からの個々の予測を学習ベースで融合することにより、総合品質スコアを得る。特徴融合の前には、提案する特徴の上位集合から最も有効な特徴を選択するための特徴選択モジュールを導入している。実験結果から、PointPCA+は公開データセットから得られた主観的な正解スコアに対して高い予測性能を達成することが示された。コードは https://github.com/cwi-dis/pointpca_suite/ で入手できる。

A computationally-simplified and descriptor-richer Point Cloud Quality Assessment (PCQA) metric, namely PointPCA+, is proposed in this paper, which is an extension of PointPCA. PointPCA proposed a set of perceptually-relevant descriptors based on PCA decomposition that were applied to both the geometry and texture data of point clouds for full reference PCQA. PointPCA+ employs PCA only on the geometry data while enriching existing geometry and texture descriptors, that are computed more efficiently. Similarly to PointPCA, a total quality score is obtained through a learning-based fusion of individual predictions from geometry and texture descriptors that capture local shape and appearance properties, respectively. Before feature fusion, a feature selection module is introduced to choose the most effective features from a proposed super-set. Experimental results show that PointPCA+ achieves high predictive performance against subjective ground truth scores obtained from publicly available datasets. The code is available at https://github.com/cwi-dis/pointpca_suite/.
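
The pure-Python sketch below makes "PCA on the geometry data" concrete. It computes the covariance of a local neighbourhood and its eigenvalues $\lambda_1 \ge \lambda_2 \ge \lambda_3$, then derives the common eigenvalue features linearity, planarity and sphericity. Using these particular features is an assumption for illustration; the exact PointPCA+ descriptor set is defined in the paper and the linked repository. The closed-form eigenvalue routine keeps the example dependency-free. With NumPy, `numpy.linalg.eigh` would be the usual choice.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def covariance(points: Sequence[Point]) -> List[List[float]]:
    """Population covariance matrix (3x3) of a local neighbourhood."""
    n = len(points)
    mean = [sum(p[k] for p in points) / n for k in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[k] - mean[k] for k in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j] / n
    return cov


def sym3_eigenvalues(a: List[List[float]]) -> List[float]:
    """Eigenvalues of a symmetric 3x3 matrix (trigonometric closed form), descending."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # matrix is already diagonal
        return sorted((a[0][0], a[1][1], a[2][2]), reverse=True)
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)  # p > 0 because p2 >= 2 * p1 > 0
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp rounding error before acos
    phi = math.acos(r) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    e2 = 3.0 * q - e1 - e3  # trace = e1 + e2 + e3
    return [e1, e2, e3]


def shape_descriptors(points: Sequence[Point]) -> Dict[str, float]:
    """Linearity, planarity and sphericity from the local PCA eigenvalues."""
    l1, l2, l3 = sym3_eigenvalues(covariance(points))
    if l1 <= 0.0:  # degenerate neighbourhood (all points identical)
        return {"linearity": 0.0, "planarity": 0.0, "sphericity": 0.0}
    return {
        "linearity": (l1 - l2) / l1,
        "planarity": (l2 - l3) / l1,
        "sphericity": l3 / l1,
    }


# A tilted line segment should score close to 1 on linearity.
print(shape_descriptors([(float(t), float(t), 0.0) for t in range(4)]))
```

In a full metric, such per-point features would be computed over neighbourhoods in both the reference and the distorted cloud and then compared. That comparison step is not shown here.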

翻訳日:2023-11-28 00:21:50 公開日:2023-11-23

# 時空オントロジーの相対性: 空間における相関が時間における相関項になるとき

Relativity of spacetime ontology: When correlations in space become correlata in time ( http://arxiv.org/abs/2311.13879v1 )

ライセンス: Link先を確認

Marek Czachor and Marcin Nowakowski

(参考訳) 「相関は物理的実在を持つが、相関づけられるもの(相関項)は実在を持たない」というマーミンの見解に異を唱え、我々は相関と相関項が根本的に異なるものではないと主張する。これらは、部分系を定義するテンソル積分解に依存する双対的な概念である。同じ量子状態が、あるテンソル積構造に関してはもつれ状態であり、別のテンソル積構造に関しては分離可能状態でありうるため、ある文脈における空間的相関が別の文脈では時間的な相関項となりうるし、その逆も成り立つ。その結果、$V\otimes V$ の下で不変な2量子ビット状態は、もつれ状態にも非もつれ状態にもなりうる。これは一重項状態に関するよく知られた一意性定理と矛盾するものであり、量子測定理論に影響を及ぼす可能性がある。

Challenging Mermin's perspective that "correlations have physical reality; that which they correlate does not", we argue that correlations and correlata are not fundamentally distinct. These are dual concepts that depend on the tensor product decomposition defining the subsystems. Since the same quantum state may be entangled with respect to one tensor product structure and separable with respect to another, a spatial correlation in one context can become a temporal correlatum in another, and vice versa. In consequence, 2-qubit states invariant under $V\otimes V$ can be either entangled or unentangled, in conflict with the well-known uniqueness theorem about the singlet state, a fact with possible implications for quantum measurement theory.
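
For context, the uniqueness theorem mentioned above concerns the standard identity that, for any $2\times 2$ unitary $V$, the singlet $|\psi^-\rangle = (|01\rangle - |10\rangle)/\sqrt{2}$ satisfies $(V\otimes V)|\psi^-\rangle = \det(V)\,|\psi^-\rangle$. In other words, the singlet is invariant up to a global phase under the usual tensor product structure. The short plain-Python check below verifies this identity numerically, using the Hadamard gate as an example $V$. It does not reproduce the paper's argument about alternative tensor product structures.

```python
import math
from typing import List

Matrix2 = List[List[complex]]


def kron_apply(v: Matrix2, state: List[complex]) -> List[complex]:
    """Apply V (x) V to a 2-qubit state in the basis |00>, |01>, |10>, |11>."""
    out = [0j] * 4
    for i in range(2):
        for j in range(2):
            for k in range(2):
                for m in range(2):
                    out[2 * i + j] += v[i][k] * v[j][m] * state[2 * k + m]
    return out


def det2(v: Matrix2) -> complex:
    """Determinant of a 2x2 matrix."""
    return v[0][0] * v[1][1] - v[0][1] * v[1][0]


s = 1 / math.sqrt(2)
singlet = [0j, s + 0j, -s + 0j, 0j]              # (|01> - |10>)/sqrt(2)
hadamard = [[s + 0j, s + 0j], [s + 0j, -s + 0j]]

rotated = kron_apply(hadamard, singlet)
phase = det2(hadamard)                            # -1 for the Hadamard gate
print(all(abs(a - phase * b) < 1e-12 for a, b in zip(rotated, singlet)))
```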

翻訳日:2023-11-28 00:21:32 公開日:2023-11-23

# 大規模言語モデルにおける事実の不整合と幻覚の最小化

Minimizing Factual Inconsistency and Hallucination in Large Language Models ( http://arxiv.org/abs/2311.13878v1 )

ライセンス: Link先を確認

Muneeswaran I, Shreya Saxena, Siva Prasad, M V Sai Prakash, Advaith Shankar, Varun V, Vishal Vaddina, Saisubramaniam Gopalakrishnan

(参考訳) 大規模言語モデル(LLM)は、さまざまな言語関連タスクにおける卓越した能力から、医療、教育、金融といった重要な分野で広く使われている。しかし、LLMは事実として誤った応答、すなわち「幻覚」を生成しやすく、ユーザーからの信頼性や信用を失うことにつながりうる。この問題に対処するため、我々は、まず根拠を生成し、誤った根拠を検証・修正したうえで、それらを裏付けとなる参照情報として用いて回答を生成する多段階フレームワークを提案する。生成された根拠は回答の透明性を高め、本フレームワークは、この根拠とコンテキストへの参照を用いることで、モデルがどのようにしてその回答に至ったかについての洞察を提供する。本稿では、ライフサイエンス業界における医薬品関連の問い合わせに対する回答の品質向上における有効性を示す。本フレームワークは従来の検索拡張生成(RAG)を改善し、2つのデータセットにおいてOpenAI GPT-3.5-turboの忠実性を14～25%、精度を16～22%向上させた。さらに、本フレームワークに基づくサンプルでファインチューニングを行うことで、より小規模なオープンアクセスLLMの精度が33～42%向上し、商用モデルでのRAGに匹敵する性能を示した。

Large Language Models (LLMs) are widely used in critical fields such as healthcare, education, and finance due to their remarkable proficiency in various language-related tasks. However, LLMs are prone to generating factually incorrect responses or "hallucinations," which can lead to a loss of credibility and trust among users. To address this issue, we propose a multi-stage framework that first generates a rationale, verifies and refines incorrect rationales, and then uses them as supporting references to generate the answer. The generated rationale enhances the transparency of the answer, and by using this rationale and the references to the context, our framework provides insights into how the model arrived at its answer. In this paper, we demonstrate its effectiveness in improving the quality of responses to drug-related inquiries in the life sciences industry. Our framework improves on traditional Retrieval Augmented Generation (RAG), enabling OpenAI GPT-3.5-turbo to be 14-25% more faithful and 16-22% more accurate on two datasets. Furthermore, fine-tuning on samples based on our framework improves the accuracy of smaller open-access LLMs by 33-42%, making them competitive with RAG on commercial models.

翻訳日:2023-11-28 00:21:17 公開日:2023-11-23

# 動的ステップサイズスケジューリングのための局所最適降下

Locally Optimal Descent for Dynamic Stepsize Scheduling ( http://arxiv.org/abs/2311.13877v1 )

ライセンス: Link先を確認

Gilad Yehudai, Alon Cohen, Amit Daniely, Yoel Drori, Tomer Koren, Mariano Schain

(参考訳) 本稿では、実践におけるスケジュールの手作業による時間のかかるチューニングを簡略化することを目的として、理論に裏付けられた新しい動的学習率スケジューリング手法を提案する。本手法は、局所的に最適なステップサイズを推定することに基づいており、現在のステップにおける確率的勾配の方向に沿った最大の降下を保証する。まず、滑らかな非凸確率的最適化の文脈において本手法の理論的収束境界を確立し、滑らかさパラメータの知識のみを仮定しつつ最先端の境界に一致することを示す。次に、本アルゴリズムの実用的な実装を示し、多様なデータセットと最適化アルゴリズムにわたる体系的な実験を行い、既存の最先端の学習率スケジューラと比較する。その結果、本手法は既存手法と比較して最小限のチューニングしか必要とせず、補助的な手動スケジュールやウォームアップフェーズを不要にし、パラメータチューニングを大幅に削減しながら同等の性能を達成することが示された。

We introduce a novel dynamic learning-rate scheduling scheme grounded in theory with the goal of simplifying the manual and time-consuming tuning of schedules in practice. Our approach is based on estimating the locally-optimal stepsize, guaranteeing maximal descent in the direction of the stochastic gradient of the current step. We first establish theoretical convergence bounds for our method within the context of smooth non-convex stochastic optimization, matching state-of-the-art bounds while only assuming knowledge of the smoothness parameter. We then present a practical implementation of our algorithm and conduct systematic experiments across diverse datasets and optimization algorithms, comparing our scheme with existing state-of-the-art learning-rate schedulers. Our findings indicate that our method needs minimal tuning when compared to existing approaches, removing the need for auxiliary manual schedules and warm-up phases and achieving comparable performance with drastically reduced parameter tuning.
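
As background for the "locally optimal stepsize" idea, consider an $L$-smooth objective. Its descent guarantee $f(x - \eta g) \le f(x) - \eta\|g\|^2 + \tfrac{L}{2}\eta^2\|g\|^2$ is maximized at $\eta = 1/L$. A local refinement replaces the global constant $L$ with the curvature measured along the current gradient direction. The sketch below estimates that curvature with a finite-difference Hessian-vector product and takes $\eta = \|g\|^2/(g^\top H g)$, which is the exact line minimizer on a quadratic. It is a generic illustration on a deterministic problem. The authors' estimator works with stochastic gradients and is not reproduced here, and the function names are hypothetical.

```python
from typing import Callable, List

Vector = List[float]
GradFn = Callable[[Vector], Vector]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def local_step(grad: GradFn, x: Vector, eps: float = 1e-6) -> float:
    """Step size minimizing a local quadratic model of f along -grad(x).

    The curvature along g = grad(x) is estimated with a finite-difference
    Hessian-vector product, H g ~ (grad(x + eps * g) - grad(x)) / eps,
    and the returned step is ||g||^2 / (g^T H g).
    """
    g = grad(x)
    g_norm2 = dot(g, g)
    if g_norm2 == 0.0:
        return 0.0                                # already at a stationary point
    x_probe = [xi + eps * gi for xi, gi in zip(x, g)]
    hg = [(a - b) / eps for a, b in zip(grad(x_probe), g)]
    curvature = dot(g, hg)                        # g^T H g
    if curvature <= 0.0:
        # A practical scheduler would need a safeguarded fallback here.
        raise ValueError("non-positive curvature along the gradient")
    return g_norm2 / curvature


def gradient_descent(grad: GradFn, x: Vector, steps: int = 20) -> Vector:
    for _ in range(steps):
        eta = local_step(grad, x)
        g = grad(x)                               # recomputed for readability
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x


# Example: ill-conditioned quadratic f(x) = 0.5 * (x0^2 + 10 * x1^2), minimum at 0.
def quad_f(x: Vector) -> float:
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def quad_grad(x: Vector) -> Vector:
    return [x[0], 10.0 * x[1]]


x_end = gradient_descent(quad_grad, [1.0, 1.0])
print(quad_f([1.0, 1.0]), quad_f(x_end))
```

No learning-rate schedule is specified anywhere in the loop. The step size comes entirely from the local curvature estimate, which is the property the abstract emphasizes.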

翻訳日:2023-11-28 00:20:55 公開日:2023-11-23

# デノイジングオートエンコーダを用いた失読症検出のための脳波接続性解析

EEG Connectivity Analysis Using Denoising Autoencoders for the Detection of Dyslexia ( http://arxiv.org/abs/2311.13876v1 )

ライセンス: Link先を確認

Francisco Jesus Martinez-Murcia, Andrés Ortiz, Juan Manuel Górriz, Javier Ramírez, Pedro Javier Lopez-Perez, Miguel López-Zamora, Juan Luis Luque

(参考訳) 時間的サンプリングフレームワーク(TSF)は、失読症に特徴的な音韻的困難が、1つ以上の時間的レートにおける非定型的な振動サンプリングによって引き起こされると理論づけている。LEEDUCA研究では、ゆっくりとしたリズムの韻律(0.5-1 Hz)、音節(4-8 Hz)、音素(12-40 Hz)のレートで振幅変調(AM)されたノイズを聴く子どもを対象に一連の脳波(EEG)実験を行い、失読症と関連しうる振動サンプリングの知覚の違いを検出することを目指した。本研究の目的は、これらの違いが存在するかどうか、そしてそれらが失読症の検出に一般的に用いられるさまざまな言語課題・認知課題における子どもの成績とどのように関連するかを確認することである。そのために、時間領域およびスペクトル領域でのチャネル間EEG接続性を推定し、接続行列の低次元表現を学習するようにデノイジングオートエンコーダ(DAE)を訓練した。この表現を相関分析と分類分析によって検討したところ、0.8を超える精度、約0.7のバランス精度で失読症の被験者を検出できることが明らかになった。DAE表現のいくつかの特徴は、音韻意識や高速シンボル呼称といった音韻仮説カテゴリーの言語・認知課題、さらに読字効率や読解における子どもの成績と有意に相関していた($p<0.005$)。最後に、隣接行列をより詳細に解析したところ、DD(発達性失読症)の被験者では側頭葉(おおよそ一次聴覚野)の電極間の両側性の接続が減少し、おおよそブローカ野の上に位置するF7電極の接続性が増加していることが明らかになった。これらの結果は、EEGのようなより客観的な手法を用いた失読症の補完的評価への道を開くものである。

The Temporal Sampling Framework (TSF) theorizes that the characteristic phonological difficulties of dyslexia are caused by atypical oscillatory sampling at one or more temporal rates. The LEEDUCA study conducted a series of electroencephalography (EEG) experiments on children listening to amplitude-modulated (AM) noise at slow-rhythmic prosodic (0.5-1 Hz), syllabic (4-8 Hz), or phonemic (12-40 Hz) rates, aimed at detecting differences in the perception of oscillatory sampling that could be associated with dyslexia. The purpose of this work is to check whether these differences exist and how they relate to children's performance in different language and cognitive tasks commonly used to detect dyslexia. To this end, temporal and spectral inter-channel EEG connectivity was estimated, and a denoising autoencoder (DAE) was trained to learn a low-dimensional representation of the connectivity matrices. This representation was studied via correlation and classification analysis, which revealed the ability to detect dyslexic subjects with an accuracy higher than 0.8 and a balanced accuracy of around 0.7. Some features of the DAE representation were significantly correlated ($p<0.005$) with children's performance in language and cognitive tasks of the phonological hypothesis category, such as phonological awareness and rapid symbolic naming, as well as with reading efficiency and reading comprehension. Finally, a deeper analysis of the adjacency matrix revealed reduced bilateral connectivity between electrodes of the temporal lobe (roughly the primary auditory cortex) in DD subjects, as well as increased connectivity of the F7 electrode, placed roughly over Broca's area. These results pave the way for a complementary assessment of dyslexia using more objective methodologies such as EEG.
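
One simple way to obtain a temporal inter-channel connectivity matrix is to take the absolute Pearson correlation between every pair of channel time series. The flattened upper triangle of such a matrix could then serve as the input vector of an autoencoder. The abstract does not specify the paper's connectivity estimators. The correlation measure, function names and toy signals below are therefore assumptions for illustration only.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (0.0 if either signal is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def connectivity_matrix(channels: Sequence[Sequence[float]]) -> List[List[float]]:
    """Symmetric |Pearson r| adjacency matrix with zeros on the diagonal."""
    c = len(channels)
    adj = [[0.0] * c for _ in range(c)]
    for i in range(c):
        for j in range(i + 1, c):
            adj[i][j] = adj[j][i] = abs(pearson(channels[i], channels[j]))
    return adj


def upper_triangle(adj: List[List[float]]) -> List[float]:
    """Flatten the strictly upper triangle, e.g. as an autoencoder input vector."""
    c = len(adj)
    return [adj[i][j] for i in range(c) for j in range(i + 1, c)]


# Toy 3-channel signal sampled at 100 Hz for 2 s (hypothetical data).
t = [k / 100 for k in range(200)]
ch0 = [math.sin(2 * math.pi * 6 * s) for s in t]   # 6 Hz, within the syllabic band
ch1 = [0.5 * v for v in ch0]                       # scaled copy of ch0
ch2 = [math.sin(2 * math.pi * 1 * s) for s in t]   # 1 Hz, within the prosodic band
print(upper_triangle(connectivity_matrix([ch0, ch1, ch2])))
```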

翻訳日:2023-11-28 00:20:38 公開日:2023-11-23

# 法的要件分析

Legal Requirements Analysis ( http://arxiv.org/abs/2311.13871v1 )

ライセンス: Link先を確認

Sallam Abualhaija and Marcello Ceci and Lionel Briand

(参考訳) 現代のソフトウェアは、多くの分野やアプリケーションの文脈において日常的な活動に不可欠な部分となっている。人工知能(AI)を活用したインテリジェントな自動化の導入は、多くの分野でブレークスルーをもたらした。AIの有効性はいくつかの要因に帰することができ、その中にはデータの利用可能性の増大がある。個人データの保護を確保するために、欧州連合(EU)の一般データ保護規則(GDPR)などの規制が導入されている。個人データを収集、処理、または共有するソフトウェアシステムは、そうした規制への準拠が求められる。規制に準拠したソフトウェアの開発は、適用される規制に規定された法的要件への対処に大きく依存しており、これはソフトウェア開発プロセスの要件エンジニアリング(RE)フェーズにおける中心的な活動である。REは、法的要件を含む、開発予定のシステムの要件を特定し維持することに関わる。組織が個人データを処理するために実施する方針を記述した法的合意は、規制を補完し、法的要件を抽出するための追加の情報源となりうる。本章では、法的要件を分析するためのさまざまな方法を探り、GDPRを例にそれらを示す。具体的には、規制から機械で分析可能な表現を作成するための選択肢を述べ、規制に対するコンプライアンス検証を可能にする既存の自動化手段を概観し、さらに法的要件分析における現在の課題について考察する。

Modern software has become an integral part of everyday activities in many disciplines and application contexts. Introducing intelligent automation by leveraging artificial intelligence (AI) has led to breakthroughs in many fields. The effectiveness of AI can be attributed to several factors, among which is the increasing availability of data. Regulations such as the General Data Protection Regulation (GDPR) in the European Union (EU) have been introduced to ensure the protection of personal data. Software systems that collect, process, or share personal data are subject to compliance with such regulations. Developing compliant software depends heavily on addressing the legal requirements stipulated in applicable regulations, a central activity in the requirements engineering (RE) phase of the software development process. RE is concerned with specifying and maintaining the requirements of a system-to-be, including legal requirements. Legal agreements that describe the policies organizations implement for processing personal data can complement regulations as an additional source for eliciting legal requirements. In this chapter, we explore a variety of methods for analyzing legal requirements and exemplify them on GDPR. Specifically, we describe possible alternatives for creating machine-analyzable representations from regulations, survey the existing automated means for enabling compliance verification against regulations, and further reflect on the current challenges of legal requirements analysis.

翻訳日:2023-11-28 00:20:00 公開日:2023-11-23

# L(M)V-IQL: 動物行動の特徴づけのための複数意図逆強化学習

L(M)V-IQL: Multiple Intention Inverse Reinforcement Learning for Animal Behavior Characterization ( http://arxiv.org/abs/2311.13870v1 )

ライセンス: Link先を確認

Hao Zhu, Brice De La Crompe, Gabriel Kalweit, Artur Schneider, Maria Kalweit, Ilka Diester, Joschka Boedecker

(参考訳) 意思決定過程の理解を深めるうえで、数理モデル、特に逆強化学習(IRL)は、複雑な行動の中から動物の複数の意図を再構成するのに役立つことが証明されている。近年、連続時間の多意図IRLフレームワークが開発されたことを受け、多意図IRLアプローチによって離散的な時間変化報酬関数を推定することについて、引き続き研究が求められている。この課題に取り組むため、我々は、離散的な内在的報酬を扱うように設計された新しいIRLフレームワークである潜在(マルコフ)変数逆Q学習(L(M)V-IQL)アルゴリズムを導入する。期待値最大化(EM)アプローチを活用して、観測された軌跡を異なる意図にクラスタリングし、それぞれについて独立にIRL問題を解く。シミュレーション実験と、複数の実際のマウス行動データセットへの適用を通じてL(M)V-IQLの有効性を実証し、本手法が動物行動予測において現行のベンチマークを上回り、解釈可能な報酬関数を生成することを示す。この進展は神経科学と心理学にとって有望であり、動物の意思決定をより深く理解し、その根底にある脳のメカニズムを明らかにすることに貢献する。

In advancing the understanding of decision-making processes, mathematical models, particularly Inverse Reinforcement Learning (IRL), have proven instrumental in reconstructing animals' multiple intentions amid complex behaviors. Given the recent development of a continuous-time multi-intention IRL framework, there has been persistent inquiry into inferring discrete time-varying reward functions with multiple-intention IRL approaches. To tackle this challenge, we introduce the Latent (Markov) Variable Inverse Q-learning (L(M)V-IQL) algorithms, a novel IRL framework tailored for accommodating discrete intrinsic rewards. Leveraging an Expectation-Maximization approach, we cluster observed trajectories into distinct intentions and independently solve the IRL problem for each. We demonstrate the efficacy of L(M)V-IQL through simulated experiments and its application to different real mouse behavior datasets, where our approach surpasses current benchmarks in animal behavior prediction while producing interpretable reward functions. This advancement holds promise for neuroscience and psychology, contributing to a deeper understanding of animal decision-making and uncovering underlying brain mechanisms.

翻訳日:2023-11-28 00:19:43 公開日:2023-11-23

# 言語誘導による少数ショット意味セグメンテーション

Language-guided Few-shot Semantic Segmentation ( http://arxiv.org/abs/2311.13865v1 )

ライセンス: Link先を確認

Jing Wang, Yuang Liu, Qiang Zhou, Fan Wang

(参考訳) 少数ショット学習は、少数の十分にラベル付けされたサポートセットの指導のもとで、新しいカテゴリへの適応におけるラベルコストを削減する有望な方法である。しかし、少数ショット意味セグメンテーションでは、サポート画像のピクセルレベルのアノテーションは依然として高価である。本稿では、言語情報、すなわち画像レベルのテキストラベルのみを用いて少数ショット意味セグメンテーションの課題に取り組む革新的な解決策を提案する。本手法は、視覚言語事前学習(VLP)モデルとマスクリファイナを含む視覚言語駆動型マスク蒸留方式を用いて、テキストプロンプトから高品質な疑似意味マスクを生成する。さらに、サポート画像とクエリ画像の間の正確な意味関係を探索するようにモデルを導くため、分散プロトタイプ監督手法と相補的相関マッチングモジュールを導入する。2つのベンチマークデータセットでの実験により、本手法が言語誘導型少数ショット意味セグメンテーションの新たなベースラインを確立し、最近の視覚誘導型手法に匹敵する結果を達成することが示された。

Few-shot learning is a promising way to reduce the labeling cost of adapting to new categories, guided by a small, well-labeled support set. But for few-shot semantic segmentation, the pixel-level annotations of support images are still expensive. In this paper, we propose an innovative solution to tackle the challenge of few-shot semantic segmentation using only language information, i.e., image-level text labels. Our approach involves a vision-language-driven mask distillation scheme, which contains a vision-language pretraining (VLP) model and a mask refiner, to generate high-quality pseudo-semantic masks from text prompts. We additionally introduce a distributed prototype supervision method and a complementary correlation matching module to guide the model in mining precise semantic relations between support and query images. Experiments on two benchmark datasets demonstrate that our method establishes a new baseline for language-guided few-shot semantic segmentation and achieves results competitive with recent vision-guided methods.

翻訳日:2023-11-28 00:19:20 公開日:2023-11-23

# ファンド投資の意思決定において最も重要なのは何か？ 多粒度グラフ分離表現学習フレームワーク

Which Matters Most in Making Fund Investment Decisions? A Multi-granularity Graph Disentangled Learning Framework ( http://arxiv.org/abs/2311.13864v1 )

ライセンス: Link先を確認

Chunjing Gan, Binbin Hu, Bo Huang, Tianyu Zhao, Yingru Lin, Wenliang Zhong, Zhiqiang Zhang, Jun Zhou, Chuan Shi

(参考訳) 本稿では、ファンド投資の意思決定においては個人的な関心を超えて、同調性とリスク選好の両方が重要であることを強調し、これらの側面を分離された形で共同に特徴づけることを目指す。そこで、ファンド投資商品のインテリジェントなマッチングを効果的に行うため、MGDLと呼ばれる新しい多粒度グラフ分離表現学習(Multi-granularity Graph Disentangled Learning)フレームワークを開発する。十分に構築されたファンドグラフとアテンションモジュールを活用し、過去の行動から多粒度のユーザー表現を導出して、個人的関心、同調性、リスク選好をきめ細かく別々に表現する。特定の意味を持つ、より強く分離された表現を得るため、MGDLはファンドタイプに基づく対比とファンドの人気度という2つの自己教師信号を明示的に取り入れる。オフラインおよびオンライン環境での大規模な実験により、MGDLの有効性が検証された。

In this paper, we highlight that both conformity and risk preference matter in making fund investment decisions beyond personal interest, and we seek to jointly characterize these aspects in a disentangled manner. Consequently, we develop a novel Multi-granularity Graph Disentangled Learning framework named MGDL to effectively perform intelligent matching of fund investment products. Benefiting from the well-established fund graph and the attention module, multi-granularity user representations are derived from historical behaviors to separately express personal interest, conformity, and risk preference in a fine-grained way. To attain stronger disentangled representations with specific semantics, MGDL explicitly involves two self-supervised signals, i.e., fund-type-based contrasts and fund popularity. Extensive experiments in offline and online environments verify the effectiveness of MGDL.

翻訳日:2023-11-28 00:19:03 公開日:2023-11-23

# メンタルヘルスカウンセリングにおける大規模言語モデルの課題

Challenges of Large Language Models for Mental Health Counseling ( http://arxiv.org/abs/2311.13857v1 )

ライセンス: Link先を確認

Neo Christopher Chung, George Dyer, Lennart Brocki

(参考訳) 精神障害の急速な増加、限られた資源、治療を求めることへの社会的スティグマにより、世界的なメンタルヘルス危機が迫っている。近年、人工知能(AI)の分野は著しい進歩を遂げており、人間のような文章を理解・生成できる大規模言語モデル(LLM)が、心理カウンセリングの支援や提供に用いられる可能性がある。しかし、メンタルヘルス領域へのLLMの応用は、提供される情報の正確性、有効性、信頼性に関する懸念を引き起こす。本稿では、モデルの幻覚、解釈可能性、バイアス、プライバシー、臨床的有効性など、心理カウンセリング向けLLMの開発に伴う主要な課題を調査する。我々は、現在のAIのパラダイムにおいて実用的かつ適用可能な、これらの課題への潜在的な解決策を探る。メンタルヘルス向けLLMの開発と導入の経験から、LLMの落とし穴を慎重に乗り越えることができれば、AIはメンタルヘルスケアの改善に大きな可能性を持つと考える。

The global mental health crisis is looming with a rapid increase in mental disorders, limited resources, and the social stigma of seeking treatment. As the field of artificial intelligence (AI) has witnessed significant advancements in recent years, large language models (LLMs) capable of understanding and generating human-like text may be used to support or provide psychological counseling. However, the application of LLMs in the mental health domain raises concerns regarding the accuracy, effectiveness, and reliability of the information provided. This paper investigates the major challenges associated with the development of LLMs for psychological counseling, including model hallucination, interpretability, bias, privacy, and clinical effectiveness. We explore potential solutions to these challenges that are practical and applicable to the current paradigm of AI. From our experience in developing and deploying LLMs for mental health, AI holds great promise for improving mental health care, if we can carefully navigate and overcome the pitfalls of LLMs.

翻訳日:2023-11-28 00:18:47 公開日:2023-11-23

# うつ病診療ガイドラインを用いた診断説明可能性へのクロスアテンションアプローチ

A Cross Attention Approach to Diagnostic Explainability using Clinical Practice Guidelines for Depression ( http://arxiv.org/abs/2311.13852v1 )

ライセンス: Link先を確認

Sumit Dalal, Deepa Tilwani, Manas Gaur, Sarika Jain, Valerie Shalin, and Amit Seth

(参考訳) 関連する臨床知識を用いた説明可能性の欠如は、非構造化臨床対話を人工知能で分析する手法の採用を妨げている。オンラインコミュニティには、関連性が高くまだ活用されていないメンタルヘルス(MH)データが豊富に存在しており、オンラインとオフラインの両方の用途におけるスクリーニングツールとして大きな影響を与えうる形で、説明可能性の問題に取り組む機会を提供している。我々は、外部の臨床知識を組み込むことで、一般的なTransformerモデルのアテンションを強化し、分類に対して臨床医が理解できる説明を生成する手法を開発する。臨床医が患者と対話する際に専門知識に頼る方法に着想を得て、関連する臨床知識を活用して患者の入力をモデル化し、分類に対して意味のある説明を提供する。これにより、手作業によるレビューの時間が削減され、信頼が醸成される。我々は、世界的に懸念される精神障害であるうつ病の診断のための臨床診療ガイドライン(CPG)を用いて、MHの文脈でこのようなシステムを開発する。アテンションの計算時にCPGを組み込む、プロセス知識注入型クロスアテンション(PSAT)と呼ばれるアプリケーション固有の言語モデルを提案する。うつ病に関連する、専門家がキュレートした3つのデータセットでの厳密な評価を通じて、PSATのアプリケーションに即した説明可能性を実証する。PSATは9つのベースラインモデルの性能を上回り、他のベースラインでは不十分な場合にも説明を提供できる。我々は、患者健康質問票(例: PHQ-9)や関連する質問など、うつ病に焦点を当てたCPGリソースを、SNOMED-CTを用いて機械可読なオントロジーに変換する。このリソースにより、PSATはGPT-3.5のようなモデルがアプリケーションに即した説明を生成する能力を高める。

The lack of explainability using relevant clinical knowledge hinders the adoption of Artificial Intelligence-powered analysis of unstructured clinical dialogue. A wealth of relevant, untapped Mental Health (MH) data is available in online communities, providing the opportunity to address the explainability problem with substantial potential impact as a screening tool for both online and offline applications. We develop a method to enhance attention in popular transformer models and generate clinician-understandable explanations for classification by incorporating external clinical knowledge. Inspired by how clinicians rely on their expertise when interacting with patients, we leverage relevant clinical knowledge to model patient inputs, providing meaningful explanations for classification. This will save manual review time and engender trust. We develop such a system in the context of MH using clinical practice guidelines (CPG) for diagnosing depression, a mental health disorder of global concern. We propose an application-specific language model called ProcesS knowledge-infused cross ATtention (PSAT), which incorporates CPGs when computing attention. Through rigorous evaluation on three expert-curated datasets related to depression, we demonstrate application-relevant explainability of PSAT. PSAT also surpasses the performance of nine baseline models and can provide explanations where other baselines fall short. We transform a CPG resource focused on depression, such as the Patient Health Questionnaire (e.g. PHQ-9) and related questions, into a machine-readable ontology using SNOMED-CT. With this resource, PSAT enhances the ability of models like GPT-3.5 to generate application-relevant explanations.

翻訳日:2023-11-28 00:18:30 公開日:2023-11-23

# 混合粒度重み付き学習による文法的誤り訂正

Grammatical Error Correction via Mixed-Grained Weighted Training ( http://arxiv.org/abs/2311.13848v1 )

ライセンス: Link先を確認

Jiahao Li, Quan Wang, Chiwei Zhu, Zhendong Mao, Yongdong Zhang

(参考訳) 文法的誤り訂正(GEC)タスクは、自然なテキスト中の文法的誤りを自動的に訂正することを目的とする。これまでのほぼすべての研究はアノテーション付きの訓練データを均等に扱っており、データに内在する不一致は無視されてきた。本稿では、内在する不一致が、データアノテーションの正確さと潜在的なアノテーションの多様性という2つの側面に現れるとする。そこで我々は、データアノテーションの正確さと潜在的な多様性における内在的な不一致に基づいて、それぞれトークンレベルと文レベルの訓練重みを設計し、混合粒度重み付き学習を行うことでGECの学習効果を向上させるMainGECを提案する。実験による評価から、Seq2Seq方式でもSeq2Edit方式でも、MainGECは2つのベンチマークデータセットで一貫した有意な性能向上を達成することが示され、混合粒度重み付き学習の有効性と優位性が実証された。さらにアブレーション実験により、MainGECにおける両粒度の重み設計の有効性が検証された。

The task of Grammatical Error Correction (GEC) aims to automatically correct grammatical errors in natural texts. Almost all previous works treat annotated training data equally, neglecting the inherent discrepancies in the data. In this paper, these inherent discrepancies are manifested in two aspects, namely, the accuracy of data annotation and the diversity of potential annotations. To this end, we propose MainGEC, which designs token-level and sentence-level training weights based on inherent discrepancies in the accuracy and potential diversity of data annotation, respectively, and then conducts mixed-grained weighted training to improve the training effect for GEC. Empirical evaluation shows that, whether in the Seq2Seq or Seq2Edit manner, MainGEC achieves consistent and significant performance improvements on two benchmark datasets, demonstrating the effectiveness and superiority of the mixed-grained weighted training. Further ablation experiments verify the effectiveness of the designed weights at both granularities in MainGEC.
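
Mixed-grained weighting can be pictured as a negative log-likelihood in which each target token carries its own weight and each sentence carries an additional weight. The sketch below shows only that generic combination. How MainGEC actually derives the token-level weights (from annotation accuracy) and the sentence-level weights (from potential annotation diversity) is described in the paper. The weight values here are placeholders, and the function name is hypothetical.

```python
import math
from typing import Sequence


def weighted_nll(
    token_probs: Sequence[Sequence[float]],    # model probability of each target token, per sentence
    token_weights: Sequence[Sequence[float]],  # token-level weights (placeholder values)
    sentence_weights: Sequence[float],         # sentence-level weights (placeholder values)
) -> float:
    """Mixed-grained weighted negative log-likelihood, averaged over sentences."""
    total = 0.0
    for probs, tws, sw in zip(token_probs, token_weights, sentence_weights):
        sentence_loss = sum(-w * math.log(p) for p, w in zip(probs, tws))
        total += sw * sentence_loss
    return total / len(sentence_weights)


# Placeholder weights: the second token is down-weighted and the sentence is up-weighted.
print(weighted_nll([[0.9, 0.6, 0.8]], [[1.0, 0.5, 1.0]], [1.2]))
```

With all weights set to 1, this reduces to the ordinary token-summed cross-entropy averaged over sentences. Whether to normalize by token count instead is a design choice that the sketch leaves open.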

翻訳日:2023-11-28 00:18:03 公開日:2023-11-23

# 協調的クロスモーダル側情報を用いた知覚的画像圧縮

Perceptual Image Compression with Cooperative Cross-Modal Side Information ( http://arxiv.org/abs/2311.13847v1 )

ライセンス: Link先を確認

Shiyu Qin, Bin Chen, Yujun Huang, Baoyi An, Tao Dai, Shu-Tao Via

(参考訳) データの爆発的増加により、画像とともに送信される関連テキストはますます増えている。分散情報源符号化に着想を得た多くの研究が、画像の側情報を利用して画像圧縮を強化している。しかし、マルチモーダルな相乗効果の利点は研究で広く実証されているにもかかわらず、既存の手法は一般に、画像の知覚的圧縮を強化するためにテキストを側情報として用いることを考慮していない。ここから次の問いが生じる。デコーダでのみ利用可能なテキストレベルの意味的依存関係を、画像圧縮の支援に効果的に活用するにはどうすればよいか。本研究では、より優れたレート・知覚・歪みのトレードオフを達成するため、テキスト誘導型の側情報を用いた新しい深層画像圧縮手法を提案する。具体的には、CLIPテキストエンコーダと効果的なSemantic-Spatial Awareブロックを用いて、テキストと画像の特徴を融合する。これは、ピクセルレベルで学習されたテキスト適応アフィン変換を導く意味マスクを予測することで実現される。さらに、再構成画像の知覚品質を向上させるため、テキスト条件付き敵対的生成ネットワークを設計する。4つのデータセットと10の画像品質評価指標を用いた大規模な実験により、提案手法がレート・知覚のトレードオフと意味的歪みの点で優れた結果を達成することが示された。

The explosion of data has resulted in more and more associated text being transmitted along with images. Inspired by distributed source coding, many works utilize image side information to enhance image compression. However, existing methods generally do not consider using text as side information to enhance the perceptual compression of images, even though the benefits of multimodal synergy have been widely demonstrated in research. This raises the following question: how can we effectively transfer text-level semantic dependencies, which are available only to the decoder, to aid image compression? In this work, we propose a novel deep image compression method with text-guided side information to achieve a better rate-perception-distortion tradeoff. Specifically, we employ the CLIP text encoder and an effective Semantic-Spatial Aware block to fuse the text and image features. This is done by predicting a semantic mask to guide the learned text-adaptive affine transformation at the pixel level. Furthermore, we design a text-conditional generative adversarial network to improve the perceptual quality of reconstructed images. Extensive experiments involving four datasets and ten image quality assessment metrics demonstrate that the proposed approach achieves superior results in terms of rate-perception trade-off and semantic distortion.

翻訳日:2023-11-28 00:17:47 公開日:2023-11-23

# MetaFBP: パーソナライズされた顔の美しさ予測のための高次予測器のメタ学習

MetaFBP: Learning to Learn High-Order Predictor for Personalized Facial Beauty Prediction ( http://arxiv.org/abs/2311.13929v1 )

ライセンス: Link先を確認

Luojun Lin, Zhifeng Shen, Jia-Li Yin, Qipeng Liu, Yuanlong Yu, Weijie Chen

(参考訳) 個人の美的嗜好を予測することは、人間社会にとって重要な実用的応用と学術的意義を持つ。しかし、既存の研究は主に顔の魅力の共通性を学習・予測することに焦点を当てており、パーソナライズされた顔の美しさ予測(PFBP)にはほとんど注意が払われていない。PFBPは、各ユーザーが評価したわずかな画像だけで個人の美的嗜好に適応できる機械の開発を目指す。本稿では、各ユーザーが1つのメタタスクに対応するというメタ学習の観点からこのタスクを定式化する。このPFBPタスクに対処するため、社会における視覚的美学がガウス分布に従うという人間の美的メカニズムから着想を得て、ユーザーの嗜好を共通性の部分と個性の部分に分離する。そのために、新しいMetaFBPフレームワークを提案する。このフレームワークでは、美的共通性を捉える汎用特徴抽出器を設計し、メタ学習メカニズムを通じて予測器の決定境界をシフトさせることで美的個性に適応するよう最適化する。適応の遅さや小さなサポートセットへの過学習に苦しむ可能性のある従来のメタ学習手法とは異なり、高速な適応のために高次予測器を最適化する新しいアプローチを提案する。提案手法の性能を検証するため、多数のユーザーが評価した既存の顔の美しさ予測データセットを用いて、いくつかのPFBPベンチマークを構築した。これらのベンチマークでの大規模な実験により、提案するMetaFBP手法の有効性が実証された。

Predicting individual aesthetic preferences holds significant practical applications and academic implications for human society. However, existing studies mainly focus on learning and predicting the commonality of facial attractiveness, with little attention given to Personalized Facial Beauty Prediction (PFBP). PFBP aims to develop a machine that can adapt to individual aesthetic preferences with only a few images rated by each user. In this paper, we formulate this task from a meta-learning perspective in which each user corresponds to a meta-task. To address this PFBP task, we draw inspiration from the human aesthetic mechanism whereby visual aesthetics in society follow a Gaussian distribution, which motivates us to disentangle user preferences into a commonality part and an individuality part. To this end, we propose a novel MetaFBP framework, in which we devise a universal feature extractor to capture the aesthetic commonality and then optimize the model to adapt to the aesthetic individuality by shifting the decision boundary of the predictor via a meta-learning mechanism. Unlike conventional meta-learning methods that may struggle with slow adaptation or overfitting to tiny support sets, we propose a novel approach that optimizes a high-order predictor for fast adaptation. To validate the performance of the proposed method, we build several PFBP benchmarks using existing facial beauty prediction datasets rated by numerous users. Extensive experiments on these benchmarks demonstrate the effectiveness of the proposed MetaFBP method.

翻訳日:2023-11-28 00:11:11 公開日:2023-11-23

# ロバストな動的ドメイン一般化のためのパラメータ交換

Parameter Exchange for Robust Dynamic Domain Generalization ( http://arxiv.org/abs/2311.13928v1 )

ライセンス: Link先を確認

Luojun Lin, Zhifeng Shen, Zhishu Sun, Yuanlong Yu, Lei Zhang, Weijie Chen

(参考訳) 予測できないドメインシフトは、未知のターゲットドメインにおけるモデル性能低下の主な原因であり、ドメイン汎化(DG)の開発が急務となっている。DGにおける最近の進歩では、動的ネットワークを用いて未知のターゲットドメインへの訓練不要な適応を実現しており、これは動的ドメイン汎化(DDG)と呼ばれ、固定された重みを持つ静的モデルにおける自己適応性の欠如を補う。動的ネットワークのパラメータは、それぞれドメイン不変な特徴とドメイン固有の特徴を学習するように設計された静的成分と動的成分に分離できる。本研究では、既存の手法に基づき、最適化の観点から静的成分と動的成分をより徹底的に分離することで、DDGの限界を押し広げることを試みる。我々の主な考えは、ドメイン固有の情報を拡張することで、静的成分がより包括的にドメイン不変な特徴を学習できるようになるというものである。その結果、静的成分が学習したより包括的なドメイン不変特徴によって、動的成分は適応的なドメイン固有特徴の学習により集中するようになる。そのために、静的成分と動的成分の組み合わせに摂動を与える、シンプルかつ効果的なパラメータ交換(PE)手法を提案する。摂動を与えた順伝播と与えない順伝播の両方からの勾配を用いてモデルを共同で最適化し、前述の分離を暗黙的に実現する。このようにして2つの成分は相互に有益な形で最適化され、予測できないドメインシフトに抵抗し、未知のターゲットドメインにおける自己適応性を向上させることができる。大規模な実験により、PEは既存の動的ネットワークに容易に組み込むことができ、特別な工夫なしに汎化能力を向上できることが示された。

Agnostic domain shift is the main reason for model degradation on unknown target domains, which creates an urgent need to develop Domain Generalization (DG). Recent advances in DG use dynamic networks to achieve training-free adaptation on unknown target domains, termed Dynamic Domain Generalization (DDG), which compensates for the lack of self-adaptability in static models with fixed weights. The parameters of dynamic networks can be decoupled into a static and a dynamic component, which are designed to learn domain-invariant and domain-specific features, respectively. Building on existing work, in this paper we try to push the limits of DDG by disentangling the static and dynamic components more thoroughly from an optimization perspective. Our main consideration is that we can enable the static component to learn domain-invariant features more comprehensively by augmenting the domain-specific information. As a result, the more comprehensive domain-invariant features learned by the static component can then force the dynamic component to focus more on learning adaptive domain-specific features. To this end, we propose a simple yet effective Parameter Exchange (PE) method to perturb the combination between the static and dynamic components. We optimize the model jointly using the gradients from both the perturbed and non-perturbed feed-forward passes to implicitly achieve the aforementioned disentanglement. In this way, the two components can be optimized in a mutually beneficial manner, which can resist agnostic domain shifts and improve self-adaptability on the unknown target domain. Extensive experiments show that PE can be easily plugged into existing dynamic networks to improve their generalization ability without bells and whistles.

翻訳日:2023-11-28 00:10:46 公開日:2023-11-23

# 機械学習分類アルゴリズムを用いた臨床データとRT-PCRによるCOVID-19患者の回復・死亡予測

Predicting Recovery or Decease of COVID-19 Patients with Clinical and RT-PCR Using Machine Learning Classification Algorithms ( http://arxiv.org/abs/2311.13925v1 )

ライセンス: Link先を確認

Mohammad Dehghani, Zahra Yazdanparast

(参考訳) COVID-19のパンデミックは、世界経済と人々の日常生活を前例のない形で混乱させている。適切な判断を下すためには、COVID-19を迅速かつ正確に診断する必要がある。臨床上の意思決定は、患者から収集されたデータに影響される。人工知能の助けを借りて、症状、ポリメラーゼ連鎖反応(PCR)、CTスキャン、胸部X線、定期的な血液検査、さらには咳の音を分析することで、COVID-19は迅速に診断されてきた。さらに、これらのデータは患者の死亡を予測するためにも利用できるが、どのデータが最も正確な予測をもたらすかについては疑問が残る。そこで、本研究は2つの部分から構成される。第1の目的は、データセットに含まれる特徴に基づいて、機械学習アルゴリズムがCOVID-19症例の転帰(回復か死亡か)を予測できるかどうかを調べることである。第2の部分では、回復と死亡の予測に対する臨床データとRT-PCRの影響を調べ、どちらがより信頼できるかを明らかにした。異なる特徴セットを持つ4つの段階を定義し、6つの機械学習手法を用いて予測モデルを構築した。ランダムフォレストは78.7%の精度で、患者の死亡と回復の予測について有望な結果を示した。このことから、機械学習を用いて患者の回復と死亡を予測できると考えられる。第2の目的については、AdaBoostアルゴリズムで訓練した臨床データのみ(RT-PCRを使用しない)のモデルが、82.1%の精度で最も正確であることが示された。本研究は、COVID-19と同様の危機やアウトブレイクが発生した際に、医療専門家に指針を提供しうる。

The COVID-19 pandemic has disrupted the global economy and people's daily lives in unprecedented ways. To make appropriate decisions, it is necessary to diagnose COVID-19 rapidly and accurately. Clinical decision making is influenced by data collected from patients. With the aid of artificial intelligence, COVID-19 has been diagnosed quickly by analyzing symptoms, polymerase chain reaction (PCR), computed tomography scans, chest X-rays, routine laboratory blood tests and even cough sounds. Furthermore, these data can be used to predict a patient's mortality, although it remains unclear which data yield the most accurate predictions. Therefore, this study consists of two parts. Our first objective is to examine whether machine learning algorithms can predict the outcome of COVID-19 cases (recovery or death) based on the features present in the dataset. In the second part of the research, we investigated the impact of clinical data and RT-PCR on the prediction of recovery and death to determine which is more reliable. We defined four stages with different feature sets and used six machine learning methods to build prediction models. With an accuracy of 78.7%, random forest showed promising results for predicting the death and recovery of patients. This suggests that the recovery and death of patients are predictable using machine learning. For the second objective, the results indicate that a model using clinical data alone (without RT-PCR), trained with the AdaBoost algorithm, is the most accurate, with an accuracy of 82.1%. This study can provide guidance for medical professionals in the event of a crisis or outbreak similar to COVID-19.

Translation date: 2023-11-28 00:10:16; publication date: 2023-11-23

# Czech Semantic Embedding Models for Industry Applications

Some Like It Small: Czech Semantic Embedding Models for Industry Applications ( http://arxiv.org/abs/2311.13921v1 )

License: see link

Jiří Bednář, Jakub Náplava, Petra Barančíková, Ondřej Lisický

This article focuses on the development and evaluation of small-sized Czech sentence embedding models. Small models are important components for real-time industry applications in resource-constrained environments. Given the limited availability of labeled Czech data, alternative approaches, including pre-training, knowledge distillation, and unsupervised contrastive fine-tuning, are investigated. Comprehensive intrinsic and extrinsic analyses are conducted, showcasing the competitive performance of our models compared to significantly larger counterparts, with approximately 8 times smaller size and 5 times faster speed than conventional Base-sized models. To promote cooperation and reproducibility, both the models and the evaluation pipeline are made publicly accessible. Ultimately, this article presents practical applications of the developed sentence embedding models in Seznam.cz, the Czech search engine. These models have effectively replaced previous counterparts, enhancing the overall search experience, for instance in organic search, featured snippets, and image search. This transition has yielded improved performance.

Translation date: 2023-11-28 00:09:50; publication date: 2023-11-23

# Hollow-core fiber loading of nanoparticles into ultra-high vacuum

Hollow-core fiber loading of nanoparticles into ultra-high vacuum ( http://arxiv.org/abs/2311.13920v1 )

License: see link

Stefan Lindner and Paul Juschitz and Jakob Rieser and Yaakov Y. Fein and Mario Ciampini and Markus Aspelmeyer and Nikolai Kiesel

Many experiments in the field of optical levitation with nanoparticles today are limited by the available technologies for particle loading. Here we introduce a new particle loading method that solves the main challenges, namely deterministic positioning of the particles and clean delivery at ultra-high vacuum levels as required for quantum experiments. We demonstrate the efficient loading, positioning, and repositioning of nanoparticles in the range of $100-755\,\mathrm{nm}$ diameter into different lattice sites of a standing wave optical trap, as well as direct loading of nanoparticles at an unprecedented pressure below $10^{-9}\,\mathrm{mbar}$. Our method relies on the transport of nanoparticles within a hollow-core photonic crystal fiber using an optical conveyor belt, which can be precisely positioned with respect to the target trap. Our work opens the path for increasing nanoparticle numbers in the study of multiparticle dynamics and high turn-around times for exploiting the quantum regime of levitated solids in ultra-high vacuum.

Translation date: 2023-11-28 00:09:33; publication date: 2023-11-23

# Effect of dephasing on the current through a periodically driven quantum point contact

Effect of dephasing on the current through a periodically driven quantum point contact ( http://arxiv.org/abs/2311.13918v1 )

License: see link

Igor Ermakov, Oleg Lychkovskiy

We consider two one-dimensional quantum $XX$ magnets linked by a periodically driven quantum point contact (QPC). If the magnets are initially polarized in opposite directions, one expects a spin current to be established through the QPC. It has been shown recently [Phys. Rev. B 103, L041405 (2021)] that, in fact, when the driving frequency exceeds a critical value, the current halts completely, the QPC being effectively insulating. Here we enquire how this picture is affected by quantum dephasing. Our findings reveal that any non-zero dephasing restores the current.

Translation date: 2023-11-28 00:09:14; publication date: 2023-11-23

# Exploring the impact of social stress on the adaptive dynamics of COVID-19: Typing the behavior of naïve populations faced with epidemics

Exploring the impact of social stress on the adaptive dynamics of COVID-19: Typing the behavior of naïve populations faced with epidemics ( http://arxiv.org/abs/2311.13917v1 )

License: see link

Innokentiy Kastalskiy, Andrei Zinovyev, Evgeny Mirkes, Victor Kazantsev and Alexander N. Gorban

In the context of natural disasters, human responses inevitably intertwine with natural factors. The COVID-19 pandemic, as a significant stress factor, has brought to light profound variations among different countries in terms of their adaptive dynamics in addressing the spread of infection outbreaks across different regions. This emphasizes the crucial role of cultural characteristics in natural disaster analysis. The theoretical understanding of large-scale epidemics primarily relies on mean-field kinetic models. However, conventional SIR-like models failed to fully explain the observed phenomena at the onset of the COVID-19 outbreak. These phenomena included the unexpected cessation of exponential growth, the reaching of plateaus, and the occurrence of multi-wave dynamics. In situations where an outbreak of a highly virulent and unfamiliar infection arises, it becomes crucial to respond swiftly at a non-medical level to mitigate the negative socio-economic impact. Here we present a theoretical examination of the first wave of the epidemic based on a simple SIRSS model (SIR with Social Stress). We conduct an analysis of the socio-cultural features of naïve population behaviors across various countries worldwide. The unique characteristics of each country/territory are encapsulated in only a few constants within our model, derived from the fitted COVID-19 statistics. These constants also reflect the societal response dynamics to the external stress factor, underscoring the importance of studying the mutual behavior of humanity and natural factors during global social disasters. Based on these distinctive characteristics of specific regions, local authorities can optimize their strategies to effectively combat epidemics until vaccines are developed.
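
The sketch below gives context for the "conventional SIR-like" baseline the abstract refers to. It integrates the classic SIR model with forward Euler steps. It is not the authors' SIRSS model, because the abstract does not give the social-stress equations. The rates, initial fractions and step size are arbitrary assumptions. With constant transmission and recovery rates, the infected fraction rises and then falls exactly once. This illustrates why such a model cannot, on its own, produce the plateaus and multi-wave dynamics mentioned above.

```python
def simulate_sir(beta, gamma, s0, i0, r0=0.0, dt=0.1, steps=2000):
    """Forward-Euler integration of the classic SIR model on population fractions.

    beta: transmission rate, gamma: recovery rate (both assumed constant).
    Returns the list of (S, I, R) states, including the initial one.
    """
    s, i, r = s0, i0, r0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


def count_peaks(series):
    """Number of strict local maxima in a sequence."""
    return sum(1 for a, b, c in zip(series, series[1:], series[2:]) if a < b > c)


# Illustrative parameters only (basic reproduction number beta / gamma = 3).
history = simulate_sir(beta=0.3, gamma=0.1, s0=0.999, i0=0.001)
infected = [i for _, i, _ in history]
print(count_peaks(infected))  # 1: a single epidemic wave
```

Reproducing a plateau or a second wave requires time-varying behavior, for example a transmission rate that responds to social stress. That is the kind of mechanism the SIRSS model adds, but its exact form is not shown here.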

Translation date: 2023-11-28 00:09:05; publication date: 2023-11-23

# Expanding the Deep-Learning Model for LVNC Diagnosis: Limitations and Trade-offs

Expanding the deep-learning model to diagnosis LVNC: Limitations and trade-offs ( http://arxiv.org/abs/2311.13912v1 )

License: see link

Gregorio Bernabé and Pilar González-Férez and José M. García and Guillem Casas and Josefa González-Carrillo

Hyper-trabeculation or non-compaction in the left ventricle of the myocardium (LVNC) is a recently classified form of cardiomyopathy. Several methods have been proposed to quantify the trabeculae accurately in the left ventricle, but there is no general agreement in the medical community on which approach to use. In previous work, we proposed DL-LVTQ, a deep learning approach for left ventricular trabecular quantification based on a U-Net CNN architecture. DL-LVTQ was an automatic diagnosis tool developed from a dataset of patients with the same cardiomyopathy (hypertrophic cardiomyopathy). In this work, we have extended and adapted DL-LVTQ to cope with patients with different cardiomyopathies. The dataset consists of 379 patients in three groups with different particularities and cardiomyopathies. Patient images were taken from different scanners and hospitals. We have modified and adapted the U-Net convolutional neural network to account for the different particularities of a heterogeneous group of patients with various unclassifiable or mixed and inherited cardiomyopathies. The inclusion of new groups of patients has increased the accuracy, specificity and kappa values while maintaining the sensitivity of the proposed automatic deep learning method. Therefore, a better-prepared diagnosis tool is ready for various cardiomyopathies with different characteristics. Cardiologists considered 98.9% of the evaluated outputs to be clinically verified for diagnosis. Therefore, the high precision in segmenting the different cardiac structures allows us to build a robust, objective, and faster diagnostic system, reducing human error and time spent.

Translation date: 2023-11-28 00:08:22; publication date: 2023-11-23

# Dialogue Quality and Emotion Annotations for Customer Support Conversations

Dialogue Quality and Emotion Annotations for Customer Support Conversations ( http://arxiv.org/abs/2311.13910v1 )

License: see link

John Mendonça and Patrícia Pereira and Miguel Menezes and Vera Cabarrão and Ana C. Farinha and Helena Moniz and João Paulo Carvalho and Alon Lavie and Isabel Trancoso

Task-oriented conversational datasets often lack topic variability and linguistic diversity. However, with the advent of Large Language Models (LLMs) pretrained on extensive, multilingual and diverse text data, these limitations seem to have been overcome. Nevertheless, their generalisability to different languages and domains in dialogue applications remains uncertain without benchmark datasets. This paper presents a holistic annotation approach for emotion and conversational quality in the context of bilingual customer support conversations. By performing annotations that take into consideration the complete instances that compose a conversation, one can form a broader perspective of the dialogue as a whole. Furthermore, it provides a unique and valuable resource for the development of text classification models. To this end, we present benchmarks for Emotion Recognition and Dialogue Quality Estimation and show that further research is needed to leverage these models in a production setting.

Translation date: 2023-11-28 00:07:44; publication date: 2023-11-23

# A DRL solution to help reduce the cost in waiting time of securing a traffic light for cyclists

A DRL solution to help reduce the cost in waiting time of securing a traffic light for cyclists ( http://arxiv.org/abs/2311.13905v1 )

License: see link

Lucas Magnana (AGORA), Hervé Rivano (AGORA), Nicolas Chiabaut

Cyclists prefer to use infrastructure that separates them from motorized traffic. Using a traffic light to segregate car and bike flows, with the addition of bike-specific green phases, is a lightweight and cheap solution that can be deployed dynamically to assess the opportunity of a heavier infrastructure such as a separate bike lane. To compensate for the increased waiting time induced by these new phases, we introduce in this paper a deep reinforcement learning solution that adapts the green phase cycle of a traffic light to the traffic. Vehicle counter data are used to compare the DRL approach with the actuated traffic light control algorithm over whole days. Results show that DRL achieves better minimization of vehicle waiting time at almost all hours. Our DRL approach is also robust to moderate changes in bike traffic. The code of this paper is available at https://github.com/LucasMagnana/A-DRL-solution-to-help-reduce-the-cost-in-waiting-time-of-securing-a-traffic-light-for-cyclists.

Translation date: 2023-11-28 00:07:22; publication date: 2023-11-23

# Query by Activity Video in the Wild

Query by Activity Video in the Wild ( http://arxiv.org/abs/2311.13895v1 )

License: see link

Tao Hu, William Thong, Pascal Mettes, Cees G.M. Snoek

This paper focuses on activity retrieval from a video query in an imbalanced scenario. In the current query-by-activity-video literature, a common assumption is that all activities have sufficient labelled examples when learning an embedding. In practice, however, this assumption does not hold: only a portion of activities have many examples, while others are described by only a few. In this paper, we propose a visual-semantic embedding network that explicitly deals with the imbalanced scenario for activity retrieval. Our network contains two novel modules. The visual alignment module performs a global alignment between the input video and fixed-sized visual bank representations for all activities. The semantic module performs an alignment between the input video and fixed-sized semantic activity representations. By matching videos with both visual and semantic activity representations that are of equal size over all activities, we no longer ignore infrequent activities during retrieval. Experiments on a new imbalanced activity retrieval benchmark show the effectiveness of our approach for all types of activities.

Translation date: 2023-11-28 00:06:42; publication date: 2023-11-23

# General Phrase Debiaser: Debiasing Masked Language Models at a Multi-Token Level

General Phrase Debiaser: Debiasing Masked Language Models at a Multi-Token Level ( http://arxiv.org/abs/2311.13892v1 )

License: see link

Bingkang Shi, Xiaodan Zhang, Dehan Kong, Yulei Wu, Zongzhen Liu, Honglei Lyu, Longtao Huang

The social biases and unwelcome stereotypes revealed by pretrained language models are becoming obstacles to their application. Compared to numerous debiasing methods targeting the word level, there has been relatively little attention on biases present at the phrase level, limiting the performance of debiasing in discipline domains. In this paper, we propose an automatic multi-token debiasing pipeline called **General Phrase Debiaser**, which is capable of mitigating phrase-level biases in masked language models. Specifically, our method consists of a *phrase filter stage* that generates stereotypical phrases from Wikipedia pages as well as a *model debias stage* that can debias models at the multi-token level to tackle bias challenges on phrases. The latter searches for prompts that trigger the model's bias, and then uses them for debiasing. State-of-the-art results on standard datasets and metrics show that our approach can significantly reduce gender biases in both careers and multiple disciplines, across models with varying parameter sizes.

Translation date: 2023-11-28 00:06:22; publication date: 2023-11-23

# Unsupervised Learning for Topological Classification of Transportation Networks

Unsupervised Learning for Topological Classification of Transportation Networks ( http://arxiv.org/abs/2311.13887v1 )

License: see link

Sina Sabzekar, Mohammad Reza Valipour Malakshah, Zahra Amini

With increasing urbanization, transportation plays an increasingly critical role in city development. The number of studies on modeling, optimization, simulation, and data analysis of transportation systems is on the rise. Many of these studies utilize transportation test networks to represent real-world transportation systems in urban areas, examining the efficacy of their proposed approaches. Each of these networks exhibits unique characteristics in their topology, making their applications distinct for various study objectives. Despite their widespread use in research, there is a lack of comprehensive study addressing the classification of these networks based on their topological characteristics. This study aims to fill this gap by employing unsupervised learning methods, particularly clustering. We present a comprehensive framework for evaluating various topological network characteristics. Additionally, we employ two dimensionality reduction techniques, namely Principal Component Analysis (PCA) and Isometric Feature Mapping (ISOMAP), to reduce overlaps of highly correlated features and enhance the interpretability of the subsequent classification results. We then utilize two clustering algorithms, K-means and HDBSCAN, to classify 14 transportation networks. The PCA method, followed by the K-means clustering approach, outperforms other alternatives with a Silhouette score of $0.510$, enabling the classification of transportation networks into five clusters. We also provide a detailed discussion on the resulting classification.
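
The pipeline described above has three steps. First, PCA reduces correlated features. Second, K-means clusters the result. Third, the silhouette coefficient scores the partition. All three can be sketched in NumPy as follows.

This is an illustrative sketch, not the authors' code:
- The feature matrix is synthetic and hypothetical: three well-separated groups of ten "networks," each with six made-up features.
- The farthest-first initialisation is a simplification chosen here to keep the example deterministic.
- ISOMAP and HDBSCAN are not shown.

```python
import numpy as np


def standardize(x):
    """Z-score each column (assumes no column is constant)."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def pca(x, n_components):
    """Project centred data onto its top principal axes, computed via SVD."""
    centred = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


def farthest_first_centres(x, k):
    """Deterministic init: start at x[0], then add the point farthest from all chosen centres."""
    idx = [0]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(x[:, None, :] - x[idx][None, :, :], axis=2), axis=1)
        idx.append(int(d.argmax()))
    return x[idx].copy()


def kmeans(x, k, n_iter=100):
    centres = farthest_first_centres(x, k)
    for _ in range(n_iter):
        # Assign each point to its nearest centre.
        dists = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centre to the mean of its points (keep it if the cluster is empty).
        new_centres = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres


def silhouette(x, labels):
    """Mean silhouette s = (b - a) / max(a, b); singleton clusters score 0."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    scores = []
    for i in range(len(x)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)
            continue
        # The sum includes d[i, i] = 0, so divide by (n - 1) to average over the other members.
        a = d[i, same].sum() / (same.sum() - 1)
        b = min(d[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


# Hypothetical data: 3 groups x 10 networks, 6 topological features each.
rng = np.random.default_rng(42)
features = np.vstack([rng.normal(loc=c, scale=0.3, size=(10, 6)) for c in (0.0, 3.0, 6.0)])
embedded = pca(standardize(features), n_components=2)
labels, _ = kmeans(embedded, k=3)
print(round(silhouette(embedded, labels), 3))
```

A silhouette close to 1 indicates compact, well-separated clusters. Comparing this score across dimensionality-reduction and clustering combinations is how one would choose between alternatives such as PCA with K-means versus ISOMAP with HDBSCAN.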

Translation date: 2023-11-28 00:05:43; publication date: 2023-11-23

# Can Physics Informed Neural Operators Self Improve?

Can Physics Informed Neural Operators Self Improve? ( http://arxiv.org/abs/2311.13885v1 )

License: see link

Ritam Majumdar, Amey Varhade, Shirish Karande, Lovekesh Vig

Self-training techniques have shown remarkable value across many deep learning models and tasks. However, such techniques remain largely unexplored in the context of learning fast solvers for systems of partial differential equations (e.g., Neural Operators). In this work, we explore the use of self-training for Fourier Neural Operators (FNO). Neural Operators emerged as a data-driven technique; however, data from experiments or traditional solvers are not always readily available. Physics Informed Neural Operators (PINO) overcome this constraint by utilizing a physics loss for training; however, the accuracy of PINO trained without data does not match the performance obtained by training with data. In this work we show that self-training can be used to close this gap in performance. We examine canonical examples, namely the 1D-Burgers and 2D-Darcy PDEs, to showcase the efficacy of self-training. Specifically, FNOs, when trained exclusively with physics loss through self-training, approach 1.07x for Burgers and 1.02x for Darcy, compared to FNOs trained with both data and physics loss. Furthermore, we discover that pseudo-labels can be used for self-training without necessarily training to convergence in each iteration. A consequence of this is that we are able to discover self-training schedules that improve upon the baseline performance of PINO in terms of accuracy as well as time.

Translation date: 2023-11-28 00:05:23; publication date: 2023-11-23

# Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach

Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach ( http://arxiv.org/abs/2311.13884v1 )

License: see link

Bin Zhang, Hangyu Mao, Jingqing Ruan, Ying Wen, Yang Li, Shao Zhang, Zhiwei Xu, Dapeng Li, Ziyue Li, Rui Zhao, Lijuan Li, Guoliang Fan

The significant advancements in large language models (LLMs) have presented novel opportunities for tackling planning and decision-making within multi-agent systems. However, as the number of agents increases, the issues of hallucination in LLMs and coordination in multi-agent systems (MAS) have become increasingly pronounced. Additionally, the efficient utilization of tokens becomes a critical consideration when employing LLMs to facilitate the interactions of large numbers of agents. In this paper, we present a novel framework aimed at enhancing coordination and decision-making capabilities of LLMs within large-scale multi-agent environments. Our approach draws inspiration from the actor-critic framework employed in multi-agent reinforcement learning, and we develop a modular and token-efficient solution that effectively addresses challenges presented by LLMs and MAS. Through evaluations conducted in experiments involving system resource allocation and robot grid transportation, we demonstrate the considerable advantages afforded by our proposed approach.

Translation date: 2023-11-28 00:04:57; publication date: 2023-11-23

# Deep Interactive Segmentation of Medical Images: A Systematic Review and Taxonomy

Deep Interactive Segmentation of Medical Images: A Systematic Review and Taxonomy ( http://arxiv.org/abs/2311.13964v1 )

License: see link

Zdravko Marinov, Paul F. Jäger, Jan Egger, Jens Kleesiek, Rainer Stiefelhagen

Interactive segmentation is a crucial research area in medical image analysis aiming to boost the efficiency of costly annotations by incorporating human feedback. This feedback takes the form of clicks, scribbles, or masks and allows for iterative refinement of the model output so as to efficiently guide the system towards the desired behavior. In recent years, deep learning-based approaches have propelled results to a new level causing a rapid growth in the field with 121 methods proposed in the medical imaging domain alone. In this review, we provide a structured overview of this emerging field featuring a comprehensive taxonomy, a systematic review of existing methods, and an in-depth analysis of current practices. Based on these contributions, we discuss the challenges and opportunities in the field. For instance, we find that there is a severe lack of comparison across methods which needs to be tackled by standardized baselines and benchmarks.

Translation date: 2023-11-27 23:58:21; publication date: 2023-11-23

# Investigating the use of publicly available natural videos to learn Dynamic MR image reconstruction

Investigating the use of publicly available natural videos to learn Dynamic MR image reconstruction ( http://arxiv.org/abs/2311.13963v1 )

License: see link

Olivier Jaubert, Michele Pascale, Javier Montalt-Tordera, Julius Akesson, Ruta Virsinskaite, Daniel Knight, Simon Arridge, Jennifer Steeden, Vivek Muthurangu

Purpose: To develop and assess a deep learning (DL) pipeline to learn dynamic MR image reconstruction from publicly available natural videos (Inter4K). Materials and Methods: Learning was performed for a range of DL architectures (VarNet, 3D UNet, FastDVDNet) and corresponding sampling patterns (Cartesian, radial, spiral) either from true multi-coil cardiac MR data (N=692) or from pseudo-MR data simulated from Inter4K natural videos (N=692). Real-time undersampled dynamic MR images were reconstructed using DL networks trained with cardiac data and natural videos, and compressed sensing (CS). Differences were assessed in simulations (N=104 datasets) in terms of MSE, PSNR, and SSIM and prospectively for cardiac (short axis, four chambers, N=20) and speech (N=10) data in terms of subjective image quality ranking, SNR and edge sharpness. Friedman chi-square tests with post-hoc Nemenyi analysis were performed to assess statistical significance. Results: For all simulation metrics, DL networks trained with cardiac data outperformed DL networks trained with natural videos, which outperformed CS (p<0.05). However, in prospective experiments DL reconstructions using both training datasets were ranked similarly (and higher than CS) and presented no statistical differences in SNR and edge sharpness for most conditions. Additionally, high SSIM was measured between the DL methods with cardiac data and natural videos (SSIM>0.85). Conclusion: The developed pipeline enabled learning dynamic MR reconstruction from natural videos, preserving DL reconstruction advantages such as high-quality fast and ultra-fast reconstructions while overcoming some limitations (data scarcity or sharing). The natural video dataset, code and pre-trained networks are made readily available on GitHub. Key Words: real-time; dynamic MRI; deep learning; image reconstruction; machine learning
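
The Friedman test mentioned above can be illustrated with a small, dependency-free computation of its statistic. In each block, for example one subject, the k methods are ranked against each other.

This is a sketch under stated assumptions:
- The scores are hypothetical, not data from the study.
- Ties receive average ranks, but the statistic omits the tie correction that statistical packages usually apply.
- The post-hoc Nemenyi step is not shown.

Under the null hypothesis the statistic is approximately χ² with k − 1 degrees of freedom. For k = 3 methods, such as DL trained on cardiac data, DL trained on natural videos, and CS, the χ² survival function reduces to exp(−χ²/2).

```python
import math


def average_ranks(values):
    """1-based ranks of `values`; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def friedman_statistic(scores):
    """scores[b][j]: score of method j in block b. No tie correction is applied."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def p_value_three_methods(stat):
    """Chi-square survival function with 2 degrees of freedom (valid only for k = 3)."""
    return math.exp(-stat / 2.0)


# Hypothetical image-quality scores: 4 subjects x 3 methods.
scores = [[3.1, 4.0, 4.2], [2.8, 3.9, 4.5], [3.0, 3.7, 4.1], [2.5, 3.6, 3.9]]
stat = friedman_statistic(scores)
print(stat, p_value_three_methods(stat))  # statistic 8.0, p = exp(-4) ≈ 0.0183
```

Because every hypothetical subject orders the methods the same way, the rank sums are maximally spread (4, 8, 12), which gives the largest possible statistic for n = 4 and k = 3.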

Translation date: 2023-11-27 23:58:06; publication date: 2023-11-23

# Human Machine Co-Creation. A Complementary Cognitive Approach to Creative Character Design Process Using GANs

Human Machine Co-Creation. A Complementary Cognitive Approach to Creative Character Design Process Using GANs ( http://arxiv.org/abs/2311.13960v1 )

License: see link

Mohammad Lataifeh, Xavier A. Carrasco, Ashraf M. Elnagar, Naveed Ahmed, Imran Junejo

Recent advances in Generative Adversarial Network (GAN) applications continue to attract the attention of researchers in different fields. In such a framework, two neural networks compete adversarially to generate new visual content indistinguishable from the original dataset. The objective of this research is to create a complementary co-design process between humans and machines to augment character designers' abilities in visualizing and creating new characters for multimedia projects such as games and animation. Driven by design cognitive scaffolding, the proposed approach aims to inform the process of perceiving, knowing, and making. The machine-generated concepts are used as a launching platform for character designers to conceptualize new characters. A labelled dataset of 22,000 characters was developed for this work and deployed using different GANs to evaluate which is most suited to the context, followed by a mixed-methods evaluation of the machine output and human derivations. The discussed results substantiate the value of the proposed co-creation framework and elucidate how the generated concepts are used as cognitive substances that interact with designers' competencies in a versatile manner to influence the creative processes of conceptualizing novel characters.

Translation date: 2023-11-27 23:57:29; publication date: 2023-11-23

# RankFeat&RankWeight: Rank-1 Feature/Weight Removal for Out-of-distribution Detection

RankFeat&RankWeight: Rank-1 Feature/Weight Removal for Out-of-distribution Detection ( http://arxiv.org/abs/2311.13959v1 )

License: see link

Yue Song, Nicu Sebe, Wei Wang

The task of out-of-distribution (OOD) detection is crucial for deploying machine learning models in real-world settings. In this paper, we observe that the singular value distributions of the in-distribution (ID) and OOD features are quite different: the OOD feature matrix tends to have a larger dominant singular value than the ID feature, and the class predictions of OOD samples are largely determined by it. This observation motivates us to propose \texttt{RankFeat}, a simple yet effective \emph{post hoc} approach for OOD detection that removes the rank-1 matrix composed of the largest singular value and the associated singular vectors from the high-level feature. \texttt{RankFeat} achieves \emph{state-of-the-art} performance and reduces the average false positive rate (FPR95) by 17.90\% compared with the previous best method. The success of \texttt{RankFeat} motivates us to investigate whether a similar phenomenon exists in the parameter matrices of neural networks. We thus propose \texttt{RankWeight}, which removes the rank-1 weight from the parameter matrices of a single deep layer. Our \texttt{RankWeight} is also \emph{post hoc} and only requires computing the rank-1 matrix once. As a standalone approach, \texttt{RankWeight} is highly competitive with other methods across various backbones. Moreover, \texttt{RankWeight} is flexibly compatible with a wide range of OOD detection methods. The combination of \texttt{RankWeight} and \texttt{RankFeat} sets a new \emph{state-of-the-art}, lowering the FPR95 to 16.13\% on the ImageNet-1k benchmark. Extensive ablation studies and comprehensive theoretical analyses are presented to support the empirical results.
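The following minimal NumPy sketch illustrates the rank-1 removal step described above. It is not the authors' implementation and rests on three assumptions:

- A single sample's high-level feature is reshaped into a channels-by-spatial-positions matrix.
- The OOD score is an energy (log-sum-exp) score over the logits.
- `weight`/`bias` are a hypothetical linear classifier head applied after global average pooling.

```python
import numpy as np

def remove_rank1(feature):
    """Subtract s1 * u1 v1^T (the dominant singular component) from a C x N feature matrix."""
    u, s, vt = np.linalg.svd(feature, full_matrices=False)
    # NumPy returns singular values in descending order, so index 0 is the dominant one
    return feature - s[0] * np.outer(u[:, 0], vt[0])

def energy_score(logits):
    # Numerically stable log-sum-exp over class logits (assumed scoring rule)
    m = np.max(logits)
    return float(m + np.log(np.sum(np.exp(logits - m))))

def rankfeat_score(feature, weight, bias):
    # Hypothetical head: average-pool the cleaned feature over positions, then classify
    pooled = remove_rank1(feature).mean(axis=1)
    return energy_score(weight @ pooled + bias)

rng = np.random.default_rng(0)
feat = rng.normal(size=(64, 49))  # e.g. 64 channels on a 7x7 spatial map
W, b = rng.normal(size=(10, 64)), np.zeros(10)
print(rankfeat_score(feat, W, b))
```

Subtracting $s_1 u_1 v_1^\top$ leaves the other singular values untouched. The largest singular value of the cleaned feature therefore equals the second singular value of the original.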

Translated: 2023-11-27 23:57:08 Published: 2023-11-23

# High-Order Tensor Recovery with A Tensor $U_1$ Norm

High-Order Tensor Recovery with A Tensor $U_1$ Norm ( http://arxiv.org/abs/2311.13958v1 )

License: see link

Jingjing Zheng, Wenzhe Wang, Xiaoqin Zhang, Yankai Cao, Xianta Jiang

(参考訳) 近年,多くのテンソルSVD(t-SVD)ベースのテンソルリカバリ手法が登場し,視覚データ処理の可能性を示唆している。しかし、これらの手法は、非滑らかな変化を示す高次テンソルデータに直面すると、しばしば性能劣化に悩まされるが、従来のt-SVD法では無視される。本研究の目的は, テンソルデータの非滑らかな変化を効果的に処理し, 様々な次元にわたる高次テンソルデータの相関を, 多数の変数や重みを導入することなく効率的に探索することである。この目的のために、新しいテンソル分解とテンソル $u_1$ ノルムと呼ばれる新しいテンソルノルムを導入する。これらの手法を高階テンソル補完問題の解法に利用し,結果のテンソル補完モデルの厳密な回復のための理論的保証を提供する。近似アルゴリズムと乗算器の交互方向法を組み合わせることにより, 結果のテンソル完備化モデルを反復的に解く最適化アルゴリズムを提案する。理論的解析により最適化問題のKKT点へのアルゴリズムの収束が示された。数値実験により,高次テンソル補完法,特に非スムース変化を有するテンソルデータにおいて,提案手法の有効性が示された。

Recently, numerous tensor SVD (t-SVD)-based tensor recovery methods have emerged, showing promise in processing visual data. However, these methods often suffer from performance degradation when confronted with high-order tensor data exhibiting non-smooth changes, which are common in real-world scenarios but ignored by traditional t-SVD-based methods. Our objective in this study is to provide an effective tensor recovery technique that handles non-smooth changes in tensor data and efficiently explores the correlations of high-order tensor data across its various dimensions without introducing numerous variables and weights. To this end, we introduce a new tensor decomposition and a new tensor norm called the Tensor $U_1$ norm. We use these techniques to solve the high-order tensor completion problem and provide theoretical guarantees for the exact recovery of the resulting tensor completion models. An optimization algorithm that combines the proximal algorithm with the Alternating Direction Method of Multipliers is proposed to solve the resulting tensor completion model iteratively. Theoretical analysis shows that the algorithm converges to the Karush-Kuhn-Tucker (KKT) point of the optimization problem. Numerical experiments demonstrate the effectiveness of the proposed method in high-order tensor completion, especially for tensor data with non-smooth changes.

Translated: 2023-11-27 23:56:27 Published: 2023-11-23

# Efficient Trigger Word Insertion

Efficient Trigger Word Insertion ( http://arxiv.org/abs/2311.13957v1 )

License: see link

Yueqi Zeng, Ziqiang Li, Pengfei Xia, Lei Liu, Bin Li

(参考訳) 近年、自然言語処理(NLP)分野のブームにより、バックドア攻撃はディープニューラルネットワークモデルに対して重大な脅威となる。しかし、前回の研究は中毒率の影響をほとんど考慮していない。本研究の目的は,テキストバックドア攻撃における適切な攻撃成功率(asr)を保ちながら,被毒サンプル数を削減することである。そこで本研究では,トリガーワードの最適化と有毒サンプル選択の観点から,効率的なトリガーワード挿入戦略を提案する。異なるデータセットとモデルに関する広範囲な実験により,提案手法がテキスト分類タスクにおける攻撃効率を著しく向上できることが証明された。また,本手法は汚れラベル設定では10個の有毒試料のみで90%以上を達成でき,クリーンラベル設定ではトレーニングデータの1.5%しか必要としない。

With the rapid growth of the natural language processing (NLP) field in recent years, backdoor attacks pose immense threats to deep neural network models. However, previous work has rarely considered the effect of the poisoning rate. In this paper, our main objective is to reduce the number of poisoned samples while still achieving a satisfactory Attack Success Rate (ASR) in text backdoor attacks. To accomplish this, we propose an efficient trigger word insertion strategy based on trigger word optimization and poisoned sample selection. Extensive experiments on different datasets and models demonstrate that our proposed method can significantly improve attack effectiveness in text classification tasks. Remarkably, our approach achieves an ASR of over 90% with only 10 poisoned samples in the dirty-label setting and requires merely 1.5% of the training data in the clean-label setting.

Translated: 2023-11-27 23:56:06 Published: 2023-11-23

# Electric Network Frequency Optical Sensing Devices

Electric Network Frequency Optical Sensing Devices ( http://arxiv.org/abs/2311.13954v1 )

License: see link

Christos Moysiadis, Georgios Karantaidis, Constantine Kotropoulos

(参考訳) ENF(Electric Network Frequency)は、マルチメディア法医学の応用において指紋として機能する。屋内環境では、ENFの変動は主電源に接続された光源の強度に影響を与える。これにより、センサ装置が捉えた光強度変動を利用してENFを推定することができる。光ダイオードに基づく第1の光センシング装置は、室内照明環境におけるENF変動を捉えるために開発された。また、電源メインから直接ENFを捕捉する装置を実装する。この装置は、真理ENFコレクターとして機能する。カメラが捉えたビデオ記録もENFを推定するために使われる。カメラは第2の光学センサーとして機能する。 ENF推定に影響を及ぼす要因について検討した。 2つの光学センサで推定されるenfとパワーメインから直接推定されるenfとの最大相関係数を用いて推定精度を測定する。論文の主な貢献は、白壁を捕獲する静的なものから人間の活動を含む非静的なものまで、enf推定に関する広範な実験的な証拠の開示である。

Electric Network Frequency (ENF) acts as a fingerprint in multimedia forensics applications. In indoor environments, ENF variations affect the intensity of light sources connected to power mains. Accordingly, the light intensity variations captured by sensing devices can be exploited to estimate the ENF. A first optical sensing device based on a photodiode is developed for capturing ENF variations in indoor lighting environments. In addition, a device that captures the ENF directly from power mains is implemented. This device serves as a ground truth ENF collector. Video recordings captured by a camera are also employed to estimate the ENF. The camera serves as a second optical sensor. The factors affecting the ENF estimation are thoroughly studied. The maximum correlation coefficient between the ENF estimated by the two optical sensors and that estimated directly from power mains is used to measure the estimation accuracy. The paper's major contribution is in the disclosure of extensive experimental evidence on ENF estimation in scenes ranging from static ones capturing a white wall to non-static ones, including human activity.
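The accuracy measure described above is the maximum correlation coefficient between an optically estimated ENF series and the mains reference series. The sketch below shows one way to compute it. It assumes that the maximum is taken over a range of integer time lags to absorb misalignment between the two recordings; the abstract does not state this. The synthetic series are purely illustrative.

```python
import numpy as np

def max_correlation(enf_est, enf_ref, max_lag):
    """Return (best Pearson r, lag) over integer lags in [-max_lag, max_lag].

    Assumption: the 'maximum' is over time alignments of the two ENF series.
    """
    a = np.asarray(enf_est, dtype=float)
    b = np.asarray(enf_ref, dtype=float)
    best_r, best_lag = -np.inf, 0
    for lag in range(-max_lag, max_lag + 1):
        # A positive lag compares a[t + lag] with b[t]
        if lag >= 0:
            x, y = a[lag:], b[:len(b) - lag]
        else:
            x, y = a[:len(a) + lag], b[-lag:]
        n = min(len(x), len(y))
        if n < 2:
            continue
        r = np.corrcoef(x[:n], y[:n])[0, 1]
        if r > best_r:
            best_r, best_lag = r, lag
    return best_r, best_lag

# Synthetic mains ENF (a slow random walk around 50 Hz) and an "optical" copy delayed by 3 samples
rng = np.random.default_rng(1)
mains = 50.0 + np.cumsum(rng.normal(scale=0.002, size=300))
optical = np.roll(mains, 3)
r, lag = max_correlation(optical, mains, max_lag=10)
print(f"max r = {r:.4f} at lag {lag}")
```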

Translated: 2023-11-27 23:55:51 Published: 2023-11-23

# Learning Uniform Clusters on Hypersphere for Deep Graph-level Clustering

Learning Uniform Clusters on Hypersphere for Deep Graph-level Clustering ( http://arxiv.org/abs/2311.13953v1 )

License: see link

Mengling Hu, Chaochao Chen, Weiming Liu, Xinyi Zhang, Xinting Liao, and Xiaolin Zheng

(参考訳) 近年,グラフクラスタリングが広く研究されている。しかし、既存のグラフクラスタリング手法のほとんどは、単一のグラフ内のノードをクラスタにグループ化するノードレベルのクラスタリングに焦点を当てている。対照的に、複数のグラフをクラスタにグループ化するグラフレベルのクラスタリングは、ほとんど未調査のままである。グラフレベルのクラスタリングは、分子の特性予測やソーシャルネットワークにおけるコミュニティ分析など、様々な実世界のアプリケーションにおいて重要である。しかし,グラフレベルでのクラスタリングは,グラフレベルでの表現の識別性が不十分であることや,クラスタのクラスタリングが不十分であることから,解の縮退(クラスタ崩壊)が困難である。そこで本研究では,Uniform Deep Graph Clustering (UDGC) と呼ばれるグラフレベルのクラスタリング手法を提案する。 UDGCはインスタンスを異なるクラスタに均等に割り当て、次にこれらのクラスタをユニットハイパースフィア上に分散させ、より均一なクラスタレベルの分散と、より小さなクラスタ崩壊につながる。具体的には,クラスタ分割のための均一に分散された信頼性の高い擬似ラベルを生成するためのAugmentation-Consensus Optimal Transport (ACOT)を提案する。そして、これらのクラスタを分散するために、対比学習を採用します。さらに,より優れたパラメータを学習するためのモデルを導くために,Center Alignment Optimal Transport (CAOT)を提案する。 8つのよく知られたデータセットに関する実証研究は、UDGCが最先端のモデルを大幅に上回っていることを示している。

Graph clustering has been widely studied in recent years. However, most existing graph clustering methods focus on node-level clustering, i.e., grouping nodes in a single graph into clusters. In contrast, graph-level clustering, i.e., grouping multiple graphs into clusters, remains largely unexplored. Graph-level clustering is critical in a variety of real-world applications, such as molecular property prediction and community analysis in social networks. However, graph-level clustering is challenging because graph-level representations are insufficiently discriminative, which makes deep clustering more likely to reach degenerate solutions (cluster collapse). To address this issue, we propose a novel deep graph-level clustering method called Uniform Deep Graph Clustering (UDGC). UDGC assigns instances evenly to different clusters and then scatters those clusters on the unit hypersphere, leading to a more uniform cluster-level distribution and milder cluster collapse. Specifically, we first propose Augmentation-Consensus Optimal Transport (ACOT) for generating uniformly distributed and reliable pseudo labels for partitioning clusters. Then we adopt contrastive learning to scatter those clusters. Besides, we propose Center Alignment Optimal Transport (CAOT) to guide the model toward better parameters, which further improves clustering performance. Our empirical study on eight well-known datasets demonstrates that UDGC significantly outperforms state-of-the-art models.
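ACOT builds on optimal transport to produce evenly distributed pseudo labels, but the abstract does not give its exact formulation. The sketch below therefore shows a swapped-in, standard technique: the generic Sinkhorn-Knopp step that equal-partition pseudo-labelling methods commonly use. It is not UDGC's own ACOT. It rescales an instance-by-cluster score matrix until every cluster receives the same total mass. `epsilon` and `n_iters` are illustrative choices: a smaller `epsilon` gives sharper assignments but converges more slowly.

```python
import numpy as np

def sinkhorn_pseudo_labels(scores, epsilon=0.5, n_iters=100):
    """Balanced soft assignments from an (N instances x K clusters) score matrix.

    Generic Sinkhorn-Knopp (not UDGC's exact ACOT): alternately normalizes
    columns to mass 1/K and rows to mass 1/N, so each cluster ends up with
    approximately N/K instances' worth of assignment.
    """
    n, k = scores.shape
    q = np.exp((scores - scores.max()) / epsilon)  # subtract max for numerical stability
    q /= q.sum()
    for _ in range(n_iters):
        q /= q.sum(axis=0, keepdims=True) * k  # each cluster column sums to 1/K
        q /= q.sum(axis=1, keepdims=True) * n  # each instance row sums to 1/N
    return q * n  # rows become probability distributions over clusters

# Toy scores: cosine similarities between unit-norm embeddings and prototypes,
# with an artificial bias toward cluster 0 to mimic an imminent cluster collapse.
rng = np.random.default_rng(0)
emb = rng.normal(size=(12, 8))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
protos = rng.normal(size=(3, 8))
protos /= np.linalg.norm(protos, axis=1, keepdims=True)
scores = emb @ protos.T
scores[:, 0] += 0.5
q = sinkhorn_pseudo_labels(scores, epsilon=1.0)
print(q.sum(axis=0))  # close to [4, 4, 4]
pseudo_labels = q.argmax(axis=1)
```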

Translated: 2023-11-27 23:55:40 Published: 2023-11-23

# MLLM-Bench: Evaluating Multi-modal LLMs using GPT-4V

MLLM-Bench, Evaluating Multi-modal LLMs using GPT-4V ( http://arxiv.org/abs/2311.13951v1 )

License: see link

Wentao Ge, Shunian Chen, Guiming Chen, Junying Chen, Zhihong Chen, Shuo Yan, Chenghao Zhu, Ziyue Lin, Wenya Xie, Xidong Wang, Anningzhe Gao, Zhiyi Zhang, Jianquan Li, Xiang Wan, Benyou Wang

(参考訳) AI(Artificial General Intelligence)の追求において、言語モデルにおけるビジョンの統合は重要なマイルストーンとなった。 GPT-4Vのような視覚言語モデル(MLLM)の出現は、人間の脳のマルチモーダル能力に合わせて、AIアプリケーションを拡張した。しかし、MLLMの有効性を評価することは、不十分な回答を欠くタスクの主観的な性質のために大きな課題となる。既存のマルチモーダルな大規模言語モデルの自動評価手法は、創造的で連想的なマルチモーダルタスクのニュアンスに不適切に対処する、標準回答を持つ客観的クエリに依存している。これに対処するため、我々はmllm-benchを紹介する。これはvicunaに触発された革新的なベンチマークで、認識、理解、適用、分析、評価、創造を含む様々なシナリオにまたがる。 MLLM-Benchは、ユーザエクスペリエンスをより正確に反映し、モデルパフォーマンスのより包括的な評価を提供するように設計されている。比較評価は、既存のオープンソースモデルとgpt-4vの大幅な性能差を示している。我々は,MLLM-Benchがオープンソースコミュニティの進展をきっかけに,現実世界の幅広いアプリケーションに対応するユーザ中心の視覚言語モデルを開発することを仮定する。 online leaderboard in \url{https://mllm-bench.llmzoo.com} を参照。

In the pursuit of Artificial General Intelligence (AGI), the integration of vision in language models has marked a significant milestone. The advent of vision-language models (MLLMs) like GPT-4V has expanded AI applications, aligning with the multi-modal capabilities of the human brain. However, evaluating the efficacy of MLLMs poses a substantial challenge due to the subjective nature of tasks that lack definitive answers. Existing automatic evaluation methodologies for multi-modal large language models rely on objective queries that have standard answers, inadequately addressing the nuances of creative and associative multi-modal tasks. To address this, we introduce MLLM-Bench, an innovative benchmark inspired by Vicuna, spanning a diverse array of scenarios, including Perception, Understanding, Applying, Analyzing, Evaluating, and Creation, along with ethical considerations. MLLM-Bench is designed to reflect user experience more accurately and provide a more holistic assessment of model performance. Comparative evaluations indicate a significant performance gap between existing open-source models and GPT-4V. We posit that MLLM-Bench will catalyze progress in the open-source community toward developing user-centric vision-language models that meet a broad spectrum of real-world applications. See the online leaderboard at \url{https://mllm-bench.llmzoo.com}.

Translated: 2023-11-27 23:55:15 Published: 2023-11-23

# Object Location Prediction in Real-time using LSTM Neural Network and Polynomial Regression

Object Location Prediction in Real-time using LSTM Neural Network and Polynomial Regression ( http://arxiv.org/abs/2311.13950v1 )

License: see link

Petar Stojković, Predrag Tadić

(参考訳) 本稿では,物体位置座標の予測と補間を行うシステムの設計と実装について述べる。本手法は,Long Short-Term Memory(LSTM)ニューラルネットワークと多項式回帰による慣性測定とグローバル位置決めシステムデータに基づく。 LSTMは、データシーケンスの処理と長期依存性の問題を回避するのに特に適した、リカレントニューラルネットワークの一種である。実世界の車両とGPS(グローバル測位システム)センサーのデータを応用した。様々なセンサ周波数とGPSの時間ステップとドロップアウトに対応するために、重要な前処理ステップが開発された。 LSTMベースのシステムの性能はカルマンフィルタと比較された。システムは低レイテンシと高精度でリアルタイムに動作するように調整された。我々は, 加速, 旋回, 減速, 直線経路など, 様々な運転条件下での走行試験を行った。提案手法の精度と推定時間を検証し,リアルタイムに実現可能であることを示した。従来のカルマンフィルタ法と比較して誤差は76\%減少し, 平均誤差は0.46mであり, 推定時間はlstm法と同程度であった。

This paper details the design and implementation of a system for predicting and interpolating object location coordinates. Our solution is based on processing inertial measurements and global positioning system data through a Long Short-Term Memory (LSTM) neural network and polynomial regression. LSTM is a type of recurrent neural network (RNN) particularly suited for processing data sequences and avoiding the long-term dependency problem. We employed data from real-world vehicles and their global positioning system (GPS) sensors. A critical pre-processing step was developed to address varying sensor frequencies, inconsistent GPS time steps, and GPS dropouts. The LSTM-based system's performance was compared with that of a Kalman filter. The system was tuned to work in real time with low latency and high precision. We tested our system on roads under various driving conditions, including acceleration, turns, deceleration, and straight paths. We evaluated the accuracy and inference time of our proposed solution and showed that it can run in real time. Our LSTM-based system yielded an average error of 0.11 meters with an inference time of 2 ms. This represents a 76\% reduction in error compared to the traditional Kalman filter method, which has an average error of 0.46 meters with an inference time similar to that of the LSTM-based system.
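The polynomial-regression half of the pipeline fills in positions between sparse GPS fixes, and the sketch below shows it with NumPy's `polyfit`/`polyval`. It omits the LSTM component. The quadratic degree, the local metric coordinates, and the toy trajectory are hypothetical choices made for illustration. The last line checks the reported figures: 1 - 0.11/0.46 ≈ 0.76, which matches the stated 76% error reduction relative to the Kalman filter.

```python
import numpy as np

def interpolate_track(times, positions, query_times, degree=2):
    """Fit one polynomial per coordinate to GPS fixes and evaluate it at query_times.

    times: (N,) seconds; positions: (N, D) local coordinates in metres; query_times: (M,)
    The quadratic default is a hypothetical choice; the LSTM part is not shown.
    """
    times = np.asarray(times, dtype=float)
    positions = np.asarray(positions, dtype=float)
    columns = []
    for d in range(positions.shape[1]):
        coeffs = np.polyfit(times, positions[:, d], deg=degree)
        columns.append(np.polyval(coeffs, query_times))
    return np.stack(columns, axis=1)  # shape (M, D)

# Toy accelerating vehicle: 1 Hz fixes, x = 10 t + 0.75 t^2, constant y
t_fix = np.arange(6.0)
fixes = np.stack([10 * t_fix + 0.75 * t_fix**2, np.full_like(t_fix, 3.0)], axis=1)
t_query = np.arange(0.0, 5.01, 0.1)  # 10 Hz query grid
track = interpolate_track(t_fix, fixes, t_query)

# Sanity check of the reported figures: (0.46 - 0.11) / 0.46 is about 0.76
error_reduction = 1 - 0.11 / 0.46
```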

Translated: 2023-11-27 23:54:51 Published: 2023-11-23

# Optimal Power Flow in Highly Renewable Power System Based on Attention Neural Networks

Optimal Power Flow in Highly Renewable Power System Based on Attention Neural Networks ( http://arxiv.org/abs/2311.13949v1 )

License: see link

Chen Li, Alexander Kies, Kai Zhou, Markus Schlott, Omar El Sayed, Mariia Bilousova and Horst Stoecker

(参考訳) 最適電力フロー(opf)問題は、物理的および工学的な制約に固執しながら、最小コストで需要を満たすために発電機出力と電力分布を誘導する電力系統運用において重要な問題である。しかし、風や太陽といった再生可能エネルギー源の統合は、その固有の変動性のために問題となる。この変動性は、主に気象条件の変化によって引き起こされ、電源設定の頻繁な再調整を必要とする。このタスクは、特に広範囲の電力システムにおいて、従来の数値的手法を駆使している。本稿では,模倣学習と欧州の過去の気象データを用いて学習を行う,最先端の物理学を応用した機械学習手法を提案する。提案手法は電力需要と気象パターンを電力供給・発電と直接相関させ,従来のOPF解決器の繰り返し要求を回避する。これにより、リアルタイムアプリケーションに適応するより効率的なソリューションが提供される。集約欧州電力システムにおける厳密な評価は、opf解決における既存のデータ駆動技術よりも優れていることを検証している。高速で堅牢で効率的なソリューションを提示することにより、この研究は、再生可能エネルギー時代においてより回復力のある電力システムを実現するために、リアルタイムOPF解決の新しい標準を確立する。

The Optimal Power Flow (OPF) problem is pivotal for power system operations, guiding generator output and power distribution to meet demand at minimized costs, while adhering to physical and engineering constraints. The integration of renewable energy sources, like wind and solar, however, poses challenges due to their inherent variability. This variability, driven largely by changing weather conditions, demands frequent recalibrations of power settings, thus necessitating recurrent OPF resolutions. This task is daunting using traditional numerical methods, particularly for extensive power systems. In this work, we present a cutting-edge, physics-informed machine learning methodology, trained using imitation learning and historical European weather datasets. Our approach directly correlates electricity demand and weather patterns with power dispatch and generation, circumventing the iterative requirements of traditional OPF solvers. This offers a more expedient solution apt for real-time applications. Rigorous evaluations on aggregated European power systems validate our method's superiority over existing data-driven techniques in OPF solving. By presenting a quick, robust, and efficient solution, this research sets a new standard in real-time OPF resolution, paving the way for more resilient power systems in the era of renewable energy.

Translated: 2023-11-27 23:54:32 Published: 2023-11-23

# Quantum Network-Entanglement Measures

Quantum network-entanglement measures ( http://arxiv.org/abs/2311.13945v1 )

License: see link

Zhen-Peng Xu, Julio I. de Vicente, Liang-Liang Sun and Sixia Yu

(参考訳) 量子ネットワークは近年注目されており、量子インターネットは長い間考えられてきた。ネットワーク絡み合いは、ネットワーク絡み合いの概念をネットワークシナリオに適用し、ネットワーク絡み合い状態は、与えられたネットワーク構造の制限を克服するためのリソースであると考えられる。本研究では,量子資源理論の一般的な枠組みにおいてよく定義された量子ネットワークの絡み合いの尺度を導入すると同時に,与えられたネットワーク内でターゲットとする量子状態を作成するために必要な余剰資源を特徴付ける明確な操作解釈を行う。特に,ネットワーク通信コストとネットワークラウンド複雑性を定義し,グラフ理論パラメータと密接に関連していることがわかった。デバイスに依存しない,デバイスに依存しない手法も提案する。

Quantum networks are of high interest nowadays and a quantum internet has been long envisioned. Network-entanglement adapts the notion of entanglement to the network scenario and network-entangled states are considered to be a resource to overcome the limitations of a given network structure. In this work we introduce measures of quantum network-entanglement that are well-defined within the general framework of quantum resource theories, which at the same time have a clear operational interpretation characterizing the extra resources necessary to prepare a targeted quantum state within a given network. In particular, we define the network communication cost and the network round complexity, which turn out to be intimately related to graph-theoretic parameters. We also provide device-dependent and device-independent methods to estimate these measures.

Translated: 2023-11-27 23:54:10 Published: 2023-11-23

# Exploring Methods for Cross-lingual Text Style Transfer: The Case of Text Detoxification

Exploring Methods for Cross-lingual Text Style Transfer: The Case of Text Detoxification ( http://arxiv.org/abs/2311.13937v1 )

License: see link

Daryna Dementieva, Daniil Moskovskiy, David Dale and Alexander Panchenko

(参考訳) テキストデトックス化(text detoxification)は、テキストのスタイルを有害から中立に移す作業である。例えば(Dale et al., 2021; Hallinan et al., 2022)、単言語的なセットアップにおいて有望な結果をもたらすアプローチがあるが、このタスクに対する言語間移動は難しい問題のままである(Moskovskiy et al., 2022)。本研究では,ある言語に対して平行なデトックス化コーパスを与えられた言語間テキストのデトックス化戦略を大規模に検討し,その目的は,そのようなコーパスを持たない他の言語にデトックス化能力を伝達することである。さらに,テキスト翻訳と非翻訳を同時に行う新しいタスクを初めて検討し,このタスクに強力なベースラインをいくつか提供した。最後に,従来のベンチマークよりも高い相関率を持つ新しい自動解毒評価指標を提案する。手動のマークアップによる最も有望なアプローチの評価を行い、言語間でテキストデトキシフィケーションの知識を伝達する最善の戦略の答えを決定する。

Text detoxification is the task of transferring the style of text from toxic to neutral. While there are approaches yielding promising results in a monolingual setup, e.g., (Dale et al., 2021; Hallinan et al., 2022), cross-lingual transfer for this task remains a challenging open problem (Moskovskiy et al., 2022). In this work, we present a large-scale study of strategies for cross-lingual text detoxification: given a parallel detoxification corpus for one language, the goal is to transfer detoxification ability to another language for which we do not have such a corpus. Moreover, we are the first to explore a new task where text translation and detoxification are performed simultaneously, providing several strong baselines for this task. Finally, we introduce new automatic detoxification evaluation metrics that correlate more highly with human judgments than previous benchmarks. We also assess the most promising approaches with manual markup, determining the best strategy for transferring the knowledge of text detoxification between languages.

Translated: 2023-11-27 23:53:57 Published: 2023-11-23

# Robustness-Reinforced Knowledge Distillation with Correlation Distance and Network Pruning

Robustness-Reinforced Knowledge Distillation with Correlation Distance and Network Pruning ( http://arxiv.org/abs/2311.13934v1 )

License: see link

Seonghak Kim, Gyeongdo Ham, Yucheol Cho, and Daeshik Kim

(参考訳) 効率的で軽量なモデル(すなわち学生モデル)の性能の向上は、より複雑なモデル(すなわち教師モデル)から知識を伝達する知識蒸留(KD)によって達成される。しかし、既存のKD技術のほとんどは、特定の制限を持つKL(Kullback-Leibler)の発散に依存している。まず、教師分布がエントロピーが高い場合、kl発散のモード平均化の性質は、十分なターゲット情報の転送を妨げる。第二に、教師の分布が低エントロピーである場合、KL分散は特定のモードに過度に集中する傾向にあり、学生に十分な量の貴重な知識を伝達できない。結果として、多くの難解なサンプルを含むデータセットを扱う場合、学生モデルは十分な知識を得るのに苦労し、結果として性能が劣る可能性がある。さらに,これまでのkdアプローチでは,モデルの一般化を促進する技術であるデータ拡張が悪影響を及ぼす可能性があることを観察した。そこで我々は,相関距離とネットワークプルーニングを利用したロバストネス強化知識蒸留(R2KD)を提案する。このアプローチにより、KDはパフォーマンス改善のためにデータ拡張を効果的に組み込むことができる。 cifar-100、fgvr、tinyimagenet、imagenetなど、さまざまなデータセットに関する広範な実験は、現在の最先端の方法よりも優れた方法を示している。

The performance of efficient and lightweight models (i.e., the student model) can be improved through knowledge distillation (KD), which transfers knowledge from more complex models (i.e., the teacher model). However, most existing KD techniques rely on Kullback-Leibler (KL) divergence, which has certain limitations. First, if the teacher distribution has high entropy, the mode-averaging nature of the KL divergence hinders the transfer of sufficient target information. Second, when the teacher distribution has low entropy, the KL divergence tends to focus excessively on specific modes, which fails to convey much of the valuable knowledge to the student. Consequently, when dealing with datasets that contain numerous confounding or challenging samples, student models may struggle to acquire sufficient knowledge, resulting in subpar performance. Furthermore, in previous KD approaches, we observed that data augmentation, a technique aimed at enhancing a model's generalization, can have an adverse impact. Therefore, we propose Robustness-Reinforced Knowledge Distillation (R2KD), which leverages correlation distance and network pruning. This approach enables KD to effectively incorporate data augmentation for performance improvement. Extensive experiments on various datasets, including CIFAR-100, FGVR, TinyImageNet, and ImageNet, demonstrate our method's superiority over current state-of-the-art methods.
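One reading of the correlation-distance objective is a batch-averaged 1 - Pearson correlation between the student's and teacher's logit vectors. The NumPy sketch below implements that reading. It is an illustrative assumption, not R2KD's exact loss: the abstract does not define the loss, and network pruning is not shown. The sketch makes one property visible. KL divergence on softmax outputs is sensitive to the scale (temperature) of the logits. This distance is unchanged by per-sample shifts and positive rescaling, so it compares the shape of a prediction rather than its sharpness.

```python
import numpy as np

def correlation_distance(student_logits, teacher_logits, eps=1e-8):
    """Batch mean of 1 - Pearson correlation between per-sample logit vectors (B x C).

    Illustrative distillation term; not necessarily R2KD's exact objective.
    """
    s = student_logits - student_logits.mean(axis=1, keepdims=True)
    t = teacher_logits - teacher_logits.mean(axis=1, keepdims=True)
    corr = (s * t).sum(axis=1) / (
        np.linalg.norm(s, axis=1) * np.linalg.norm(t, axis=1) + eps
    )
    return float(np.mean(1.0 - corr))  # 0 = perfectly correlated, 2 = anti-correlated

rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 10))
print(correlation_distance(2.0 * teacher + 3.0, teacher))  # shifted and scaled copy: ~0
print(correlation_distance(-teacher, teacher))              # negated copy: ~2
```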

Translated: 2023-11-27 23:53:35 Published: 2023-11-23

# Periodically Exchange Teacher-Student for Source-Free Object Detection

Periodically Exchange Teacher-Student for Source-Free Object Detection ( http://arxiv.org/abs/2311.13930v1 )

License: see link

Qipeng Liu, Luojun Lin, Zhifeng Shen, Zhifeng Yang

(参考訳) source-free object detection (sfod) は、ソースドメインデータがない場合、未ラベルのターゲットドメインデータにソース検出器を適用することを目的としている。ほとんどのSFOD法は、学生モデルを1つの教師モデルのみで指導する平均教師(MT)フレームワークを用いて、同じ自己学習パラダイムに従っている。しかし、このようなパラダイムは、ドメインシフトによって教師モデルが制御不能に崩壊すると、生徒モデルも劇的なパフォーマンス低下に苦しむようなトレーニング不安定問題に容易に陥る可能性がある。そこで,本稿では,静的教師,動的教師,学生モデルからなるマルチ・ティーチャー・フレームワークを導入するための,単純かつ斬新な手法であるpets法を提案する。学習段階では,静的教師と生徒モデルの重み付けを定期的に交換する。そして,静的教師によって既に交換されている学生モデルの移動平均を用いて,動的教師を更新する。このようにして、動的教師は過去の知識を統合し、エラーの蓄積を効果的に削減し、mtベースのフレームワーク内でより安定したトレーニングプロセスを可能にする。さらに,2つの教師モデルの予測を融合し,生徒モデルに高品質な擬似ラベルを提供するコンセンサス機構を開発した。複数のSFODベンチマークにおいて,提案手法が他の手法と比較して最先端性能を実現し,SFODタスクにおける本手法の有効性と優位性を実証した。

Source-free object detection (SFOD) aims to adapt a source detector to unlabeled target domain data in the absence of source domain data. Most SFOD methods follow the same self-training paradigm using the mean-teacher (MT) framework, where the student model is guided by only a single teacher model. However, such a paradigm is prone to training instability: when the teacher model collapses uncontrollably due to the domain shift, the student model also suffers drastic performance degradation. To address this issue, we propose the Periodically Exchange Teacher-Student (PETS) method, a simple yet novel approach that introduces a multiple-teacher framework consisting of a static teacher, a dynamic teacher, and a student model. During the training phase, we periodically exchange the weights between the static teacher and the student model. Then, we update the dynamic teacher using the moving average of the student model after its weights have been exchanged with the static teacher. In this way, the dynamic teacher can integrate knowledge from past periods, effectively reducing error accumulation and enabling a more stable training process within the MT-based framework. Further, we develop a consensus mechanism that merges the predictions of the two teacher models to provide higher-quality pseudo labels for the student model. Extensive experiments on multiple SFOD benchmarks show that the proposed method achieves state-of-the-art performance compared with other related methods, demonstrating the effectiveness and superiority of our method on the SFOD task.
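The exchange schedule described above can be sketched with plain parameter dictionaries. In this minimal, hypothetical version:

- `train_step` stands in for one pseudo-label training step of the student.
- The static teacher and the student swap weights every `period` steps.
- The dynamic teacher is an exponential moving average (EMA) of the student, taken after any swap.

Updating the EMA at every step and the momentum value are assumptions. The consensus mechanism that merges the two teachers' predictions is not shown.

```python
import numpy as np

def ema_update(dynamic, student, momentum):
    # dynamic <- momentum * dynamic + (1 - momentum) * student, per parameter
    return {k: momentum * dynamic[k] + (1.0 - momentum) * student[k] for k in dynamic}

def pets_schedule(student, static_teacher, dynamic_teacher, train_step,
                  n_steps, period, momentum=0.999):
    """Hypothetical PETS-style loop; train_step is a placeholder for real training."""
    for step in range(1, n_steps + 1):
        student = train_step(student)
        if step % period == 0:
            # Periodic exchange: the student and the static teacher swap weights
            student, static_teacher = static_teacher, student
        # The dynamic teacher averages the (possibly just exchanged) student
        dynamic_teacher = ema_update(dynamic_teacher, student, momentum)
    return student, static_teacher, dynamic_teacher

def toy_train_step(params):
    # Hypothetical stand-in for an optimizer step: add 1 to every weight
    return {k: v + 1.0 for k, v in params.items()}

student, static_t, dynamic_t = pets_schedule(
    {"w": np.array(0.0)}, {"w": np.array(100.0)}, {"w": np.array(0.0)},
    toy_train_step, n_steps=4, period=2, momentum=0.5)
print(student["w"], static_t["w"], dynamic_t["w"])  # 2.0 102.0 38.8125
```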

Translated: 2023-11-27 23:53:11 Published: 2023-11-23

# Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandit

Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandit ( http://arxiv.org/abs/2311.14003v1 )

License: see link

Tian Huang, Ke Li

(参考訳) 最適化問題は、単目的シナリオと多目的シナリオの両方で広く用いられる。実践的なアプリケーションでは、ユーザはParetoフロント(PF)に沿って関心領域(ROI)に収束するソリューションを志しています。従来のアプローチでは,適合度関数や客観的関数を近似してユーザの好みを反映するが,本論文では代替手段を検討する。具体的には、人間のフィードバックのみに頼って、フィットネス関数の計算を補助的に行う方法を見つけることを目的とする。提案手法は,アクティブなデュリングバンディットアルゴリズムによって直接選好学習が容易になることを示す。実験段階は3つのセッションに分けられる。まず,我々のアクティブデュエルバンディットアルゴリズムの性能を評価する。次に,多目的進化アルゴリズム(MOEA)の文脈内で提案手法を実装した。最後に,タンパク質構造予測(PSP)において本手法を実用上の問題に展開する。本研究は,従来の手法の限界に対処するだけでなく,最適化問題に対する新たな可能性を明らかにする,インタラクティブな嗜好ベースのMOEAフレームワークを提案する。

Optimization problems arise widely in both single-objective and multi-objective scenarios. In practical applications, users seek solutions that converge to the region of interest (ROI) along the Pareto front (PF). While the conventional approach involves approximating a fitness function or an objective function to reflect user preferences, this paper explores an alternative avenue. Specifically, we aim to discover a method that sidesteps the need for calculating the fitness function, relying solely on human feedback. Our proposed approach conducts direct preference learning facilitated by an active dueling bandit algorithm. The experimental phase is structured into three sessions. First, we assess the performance of our active dueling bandit algorithm. Second, we implement our proposed method within the context of Multi-objective Evolutionary Algorithms (MOEAs). Finally, we deploy our method on a practical problem, namely protein structure prediction (PSP). This research presents a novel interactive preference-based MOEA framework that not only addresses the limitations of traditional techniques but also unveils new possibilities for optimization problems.

Translated: 2023-11-27 23:45:54 Published: 2023-11-23

# A Quantum Otto-type Heat Engine with Fixed Frequency

A quantum Otto-type heat engine with fixed frequency ( http://arxiv.org/abs/2311.13999v1 )

License: see link

Richard Q. Matos, Rogerio J. de Assis, and Norton G. de Almeida

(参考訳) 本研究では,量子調和振動子(qho)からなる動作物質を用いてオットー型サイクルを解析する。 qhoの周波数を変化させて、圧縮した貯水池で熱化させることで作業抽出を行う他の研究とは異なり、ここでは、qhoをスクイーズパラメータで制御されたパラメトリックポンプに送信し、熱貯水池で熱化させる。次に,パラメトリックポンプを用いたオットー型エンジンにおけるスクイーズパラメータの役割について検討し,スキューズパラメータを任意に増加させることでカルノー限界に到達可能であることを示す。特に、あるスクイーズパラメータ$r$、例えば$r=0.4$の場合、準静的オットー極限は非ゼロパワーでも到達できる。また, ユニタリストローク時の効率挙動におけるエントロピー生成の役割について検討し, エンジン効率の正の(負の)変化は, 予想通り増加(低下)に対応することを示した。さらに, 熱貯留下では, 系のハミルトニアンによって導入された量子資源によらず, カーノーエンジンよりも効率の良い作業抽出プロセスは不可能であることを示す。

In this work, we analyze an Otto-type cycle operating with a working substance composed of a quantum harmonic oscillator (QHO). Unlike other studies in which the work extraction is done by varying the frequency of the QHO and letting it thermalize with a squeezed reservoir, here we submit the QHO to a parametric pumping controlled by the squeezing parameter and let it thermalize with a thermal reservoir. We then investigate the role of the squeezing parameter in our Otto-type engine powered by parametric pumping and show that it is possible to reach the Carnot limit by arbitrarily increasing the squeezing parameter. Notably, for certain squeezing parameters $r$, e.g. $r=0.4$, the quasi-static Otto limit can be reached even at non-zero power. We also investigated the role of entropy production in the efficiency behavior during the unitary strokes, showing that positive (negative) changes in entropy production correspond to increases (decreases) in engine efficiency, as expected. Furthermore, we show that under thermal reservoirs a work extraction process that is more efficient than the Carnot engine is impossible, regardless of the quantum resource introduced via the Hamiltonian of the system.

Translated: 2023-11-27 23:45:38 Published: 2023-11-23

# GRJointNET: Synergistic Completion and Part Segmentation on 3D Incomplete Point Clouds

GRJointNET: Synergistic Completion and Part Segmentation on 3D Incomplete Point Clouds ( http://arxiv.org/abs/2311.13997v1 )

License: see link

Yigit Gurses, Melisa Taspinar, Mahmut Yurt, Sedat Ozer

(参考訳) 三次元3次元点雲の分割は自律システムにとって重要な課題である。しかしながら、セグメンテーションアルゴリズムの成功は、基礎となるポイントクラウド(解像度、完全性など)の品質に大きく依存する。特に不完全点雲は下流モデルの性能を低下させる可能性がある。 grnetは、完全ポイントクラウドに対する新しいディープラーニングソリューションとして提案されているが、部分セグメンテーションはできない。一方,提案手法であるGRJointNetは,GRNetの後継として,ポイントクラウド上で共同補完とセグメンテーションを行うことができるアーキテクチャである。 2つのタスクのために抽出された特徴も、全体的なパフォーマンスを高めるために互いに利用されます。提案したネットワークをShapeNet-Partデータセット上で評価し,その性能をGRNetと比較した。 GRJointNet は GRNet より優れていることを示す。 GRJointNetはセグメンテーションができないが、GRJointNetはセグメンテーションができない。この研究1は、自律システムの3Dビジョンにおける点雲の実用性と実用性を高めることを約束している。

Segmentation of three-dimensional (3D) point clouds is an important task for autonomous systems. However, the success of segmentation algorithms depends greatly on the quality of the underlying point clouds (resolution, completeness, etc.). In particular, incomplete point clouds might reduce a downstream model's performance. GRNet was proposed as a novel, recent deep learning solution for completing point clouds, but it is not capable of part segmentation. On the other hand, our proposed solution, GRJointNet, is an architecture that can perform joint completion and segmentation on point clouds as a successor of GRNet. The features extracted for each task are also used by the other to increase the overall performance. We evaluated our proposed network on the ShapeNet-Part dataset and compared its performance to GRNet. Our results demonstrate that GRJointNet can outperform GRNet on point completion. It should also be noted that GRNet is not capable of segmentation while GRJointNet is. This study, therefore, holds promise for enhancing the practicality and utility of point clouds in 3D vision for autonomous systems.

Translated: 2023-11-27 23:45:15 Published: 2023-11-23

# EIGEN: Expert-Informed Joint Learning Aggregation for High-Fidelity Information Extraction from Document Images

EIGEN: Expert-Informed Joint Learning Aggregation for High-Fidelity Information Extraction from Document Images ( http://arxiv.org/abs/2311.13993v1 )

License: see link

Abhishek Singh, Venkatapathy Subramanian, Ayush Maheshwari, Pradeep Narayan, Devi Prasad Shetty, Ganesh Ramakrishnan

(参考訳) 文書画像からの情報抽出(IE)は,レイアウトフォーマットの多様性が高いため困難である。 LayoutLMやBROSのような深層モデルはこの問題に対処するために提案されており、有望な結果を示している。しかし、これらのモデルのトレーニングには、まだ大量のフィールドレベルのアノテーションが必要です。幾何学的位置やフィールドの種類といった形式のレイアウトやセマンティクスの理解に基づいて、ルールベースの手法を用いた他のアプローチも提案されている。本研究では,ルールベース手法とディープラーニングモデルを組み合わせて,大量のトレーニングデータのアノテーション要件を回避するための新しい手法であるeigen(expert-informed joint learning aggreation)を提案する。具体的には、eigenは複数のヒューリスティックから引き起こされる弱いラベルを生成モデルを通じて統合し、少数の注釈付きラベルと共に使用して深層モデルを訓練する。本稿では,文脈情報を組み込んだラベル付け機能を用いて,単語の視覚的・言語的コンテキストを正確に分類する手法を提案する。 EIGENフレームワークは,ラベル付きデータインスタンスをほとんど使用せずに,最先端のディープモデルの性能を大幅に向上させることができることを実証的に示す。ソースコードはhttps://github.com/ayushayush591/EIGEN-High-Fidelity-Extraction-Document-Imagesで公開されている。

Information Extraction (IE) from document images is challenging due to the high variability of layout formats. Deep models such as LayoutLM and BROS have been proposed to address this problem and have shown promising results. However, they still require a large amount of field-level annotations for training. Other approaches using rule-based methods have also been proposed, based on an understanding of the layout and semantics of a form, such as the geometric position or type of the fields. In this work, we propose a novel approach, EIGEN (Expert-Informed Joint Learning aGgrEatioN), which combines rule-based methods with deep learning models using data programming approaches to circumvent the need to annotate large amounts of training data. Specifically, EIGEN consolidates weak labels induced from multiple heuristics through generative models and uses them along with a small number of annotated labels to jointly train a deep model. In our framework, we propose labeling functions that incorporate contextual information, thus capturing the visual and language context of a word for accurate categorization. We empirically show that our EIGEN framework can significantly improve the performance of state-of-the-art deep models when very few labeled data instances are available. The source code is available at https://github.com/ayushayush591/EIGEN-High-Fidelity-Extraction-Document-Images.
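The sketch below makes the data-programming idea concrete with hypothetical labeling functions over OCR tokens. One is a regex heuristic and one is a layout heuristic that reads a made-up `left_neighbor` field. A simple majority vote combines their outputs. EIGEN aggregates weak labels with a generative model instead; majority vote is a deliberately simpler stand-in, and the field names are illustrative only.

```python
import re
from collections import Counter

ABSTAIN = None

# All labeling functions below are hypothetical examples, not EIGEN's own rules.
def lf_date_pattern(token):
    # Heuristic: strings shaped like 12/03/2021 are probably a date field
    return "DATE" if re.fullmatch(r"\d{1,2}/\d{1,2}/\d{2,4}", token["text"]) else ABSTAIN

def lf_currency_symbol(token):
    # Heuristic: a dollar sign followed by a number is probably an amount
    return "AMOUNT" if re.fullmatch(r"\$\d+(?:\.\d{2})?", token["text"]) else ABSTAIN

def lf_right_of_total(token):
    # Layout heuristic: a bare number whose left neighbour reads "Total" is an amount
    is_number = re.fullmatch(r"\d+(?:\.\d{2})?", token["text"]) is not None
    is_after_total = token.get("left_neighbor", "").lower() == "total"
    return "AMOUNT" if is_number and is_after_total else ABSTAIN

LABELING_FUNCTIONS = [lf_date_pattern, lf_currency_symbol, lf_right_of_total]

def majority_vote(token, lfs=LABELING_FUNCTIONS):
    # Simpler stand-in for a generative label model: most common non-abstaining vote
    votes = [label for label in (lf(token) for lf in lfs) if label is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]

tokens = [
    {"text": "12/03/2021"},
    {"text": "$45.00"},
    {"text": "45.00", "left_neighbor": "Total"},
    {"text": "Invoice"},
]
weak_labels = [majority_vote(tok) for tok in tokens]
print(weak_labels)  # ['DATE', 'AMOUNT', 'AMOUNT', None]
```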

Translated: 2023-11-27 23:44:57; published: 2023-11-23

# Docking Multirotors in Close Proximity using Learnt Downwash Models

arXiv: http://arxiv.org/abs/2311.13988v1

License: check the link

Ajay Shankar, Heedo Woo, Amanda Prorok

Unmodeled aerodynamic disturbances pose a key challenge for multirotor flight when multiple vehicles are in close proximity to each other. However, certain missions *require* two multirotors to approach each other within 1-2 body-lengths and hold formation; we consider one such practical instance: vertically docking two multirotors in the air. In this leader-follower setting, the follower experiences significant downwash interference from the leader in its final docking stages. To compensate for this, we employ a learnt downwash model online within an optimal feedback controller to accurately track a docking maneuver and then hold formation. Through real-world flights with different maneuvers, we demonstrate that this compensation is crucial for reducing the large vertical separation otherwise required by conventional/naive approaches. Our evaluations show a tracking error of less than 0.06 m for the follower (a 3-4x reduction) when approaching vertically within two body-lengths of the leader. Finally, we deploy the complete system to effect a successful physical docking between two airborne multirotors in a single smooth planned trajectory.
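A minimal 1-D sketch shows why compensating a predicted downwash force matters. It rests on several assumptions. The inverse-square downwash model is hypothetical. The learnt model is taken to be perfect. A PD loop stands in for the optimal feedback controller used in the paper. Without the compensation term, the follower settles with a steady-state offset below its reference altitude; with it, the offset vanishes.

```python
G = 9.81  # gravitational acceleration, m/s^2

def downwash_force(separation, mass):
    """Hypothetical downwash model: downward force (N) grows as the vertical gap shrinks."""
    return -0.05 * mass * G / max(separation, 0.2) ** 2

def final_tracking_error(compensate, z_ref=-0.5, z_leader=0.0, mass=1.0,
                         kp=4.0, kd=3.0, dt=0.001, t_end=10.0):
    """Simulate the follower altitude z (leader hovers at z_leader) and return |z_ref - z|."""
    z, vz = -2.0, 0.0
    for _ in range(int(t_end / dt)):
        f_down = downwash_force(z_leader - z, mass)
        f_hat = f_down if compensate else 0.0  # learnt-model prediction (assumed exact here)
        thrust = mass * (G + kp * (z_ref - z) - kd * vz) - f_hat
        az = (thrust - mass * G + f_down) / mass
        vz += az * dt  # semi-implicit Euler step
        z += vz * dt
    return abs(z_ref - z)

print(final_tracking_error(True), final_tracking_error(False))
```

Without compensation, the disturbance is balanced only by the proportional term. The follower therefore sags by roughly `-f_down / (mass * kp)`, and the only way to shrink that sag is to keep a larger vertical gap.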

Translated: 2023-11-27 23:44:36; published: 2023-11-23

# Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark

arXiv: http://arxiv.org/abs/2311.13987v1

License: check the link

Ondřej Cífka, Constantinos Dimitriou, Cheng-i Wang, Hendrik Schreiber, Luke Miner, Fabian-Robert Stöter

Current automatic lyrics transcription (ALT) benchmarks focus exclusively on word content and ignore the finer nuances of written lyrics, including formatting and punctuation. This can lead to misalignment with the creative products of musicians and songwriters as well as with listeners' experiences. For example, line breaks are important in conveying information about rhythm, emotional emphasis, rhyme, and high-level structure. To address this issue, we introduce Jam-ALT, a new lyrics transcription benchmark based on the JamendoLyrics dataset. Our contribution is twofold. First, we completely revise the transcripts specifically for ALT evaluation, following a newly created annotation guide that unifies the music industry's guidelines and covers aspects such as punctuation, line breaks, spelling, background vocals, and non-word sounds. Second, we provide a suite of evaluation metrics that, unlike the traditional word error rate, are designed to capture such phenomena. We hope that the proposed benchmark contributes to the ALT task, enabling more precise and reliable assessments of transcription systems and enhancing the user experience in lyrics applications such as subtitle renderings for live captioning or karaoke.
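A word-only metric can hide formatting errors. The sketch below compares a classic normalized WER with an error rate over tokens that keep case, punctuation, and line breaks. Both tokenization rules are illustrative assumptions, not the Jam-ALT metric definitions.

```python
import re

def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (substitution, insertion, deletion)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]

def error_rate(ref_tokens, hyp_tokens):
    return edit_distance(ref_tokens, hyp_tokens) / max(len(ref_tokens), 1)

def plain_tokens(text):
    # Classic WER normalization: lowercase, drop punctuation, ignore line breaks
    return re.findall(r"[a-z0-9']+", text.lower())

def formatted_tokens(text):
    # Keep case and punctuation as tokens, and mark every line break with "<LB>"
    tokens = []
    for line in text.split("\n"):
        tokens.extend(re.findall(r"\w+|[^\w\s]", line))
        tokens.append("<LB>")
    return tokens[:-1]  # no break after the last line

ref = "Hello, darkness\nmy old friend"
hyp = "hello darkness my old friend"
print(error_rate(plain_tokens(ref), plain_tokens(hyp)))          # 0.0
print(error_rate(formatted_tokens(ref), formatted_tokens(hyp)))  # 3 errors out of 7 tokens
```

The plain metric scores this hypothesis as perfect. The formatting-aware tokens expose the capitalization error, the missing comma, and the missing line break.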

Translated: 2023-11-27 23:44:15; published: 2023-11-23

# FViT-Grasp: Grasping Objects With Using Fast Vision Transformers

arXiv: http://arxiv.org/abs/2311.13986v1

License: check the link

Arda Sarp Yenicesu, Berk Cicek and Ozgur S. Oguz

This study addresses the challenge of manipulation, a prominent issue in robotics. We have devised a novel methodology for swiftly and precisely identifying the optimal grasp point for a robot to manipulate an object. Our approach leverages a Fast Vision Transformer (FViT), a type of neural network designed for processing visual data and predicting the most suitable grasp location. Demonstrating state-of-the-art performance in terms of speed while maintaining a high level of accuracy, our method holds promise for potential deployment in real-time robotic grasping applications. We believe that this study provides a baseline for future research in vision-based robotic grasp applications. Its high speed and accuracy bring researchers closer to real-life applications.

Translated: 2023-11-27 23:43:48; published: 2023-11-23

# Error mitigated variational algorithm on a photonic processor

arXiv: http://arxiv.org/abs/2311.13985v1

License: check the link

O.V. Borzenkova, G.I. Struchalin, I. Kondratyev, A. Moiseevskiy, I.V. Dyakonov, and S.S. Straupe

We demonstrate successful error mitigation of indistinguishability-related noise in a quantum photonic processor. We apply the zero-noise extrapolation (ZNE) technique to a variational quantum eigensolver (VQE). Observable values measured at two different error levels allow us to extrapolate them towards the noise-free regime. We study the influence of partial distinguishability of photons in a two-qubit processor that implements the VQE for a Schwinger Hamiltonian. The results show an improved estimate of the Hamiltonian eigenvalue when the ZNE procedure is used. Lastly, we analyze the potential applicability of this error mitigation method to other linear optical processors with externally controlled polarization.
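Suppose expectation values are measured at two noise levels, lam1 < lam2. The simplest form of ZNE fits a straight line through the two points and evaluates it at zero noise: E(0) ≈ (lam2·E(lam1) - lam1·E(lam2)) / (lam2 - lam1). The sketch assumes the expectation value depends linearly on the noise level. It does not model how that level is tied to photon distinguishability in the experiment.

```python
def zero_noise_extrapolate(lam1, e1, lam2, e2):
    """Evaluate the straight line through (lam1, e1) and (lam2, e2) at noise level 0."""
    if lam1 == lam2:
        raise ValueError("the two noise levels must differ")
    return (lam2 * e1 - lam1 * e2) / (lam2 - lam1)

# Toy data: a noiseless value of -1.0 that drifts linearly with noise, E(lam) = -1.0 + 0.2 * lam
print(zero_noise_extrapolate(1.0, -0.8, 2.0, -0.6))  # close to -1.0
```

If the true dependence on noise is curved, the two-point linear fit leaves a residual bias. Measuring at more noise levels allows a higher-order fit.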

Translated: 2023-11-27 23:43:37; published: 2023-11-23

# Learning Dynamic Selection and Pricing of Out-of-Home Deliveries

arXiv: http://arxiv.org/abs/2311.13983v1

License: check the link

Fabian Akkerman, Peter Dieter, Martijn Mes

Home delivery failures, traffic congestion, and relatively large handling times have a negative impact on the profitability of last-mile logistics. These external factors contribute up to 28% of the overall costs and 25% of emissions for the home delivery supply chain. A potential solution, showing annual growth rates of up to 36%, is delivery to parcel lockers or parcel shops, denoted out-of-home (OOH) delivery. In the academic literature, models of customer behavior with respect to OOH delivery have so far been limited to deterministic settings, in contrast to the stochastic nature of actual customer choices. We model the sequential decision-making problem of which OOH locations to offer each incoming customer, and with what incentive, taking into account future customer arrivals and choices. We propose Dynamic Selection and Pricing of OOH (DSPO), an algorithmic pipeline that uses a novel spatial-temporal state encoding as input to a convolutional neural network. We demonstrate the performance of our method by benchmarking it against three state-of-the-art approaches. Our extensive numerical study, guided by real-world data, reveals that DSPO can save 20.8% in costs compared to a situation without OOH locations, 8.1% compared to a static selection and pricing policy, and 4.6% compared to a state-of-the-art demand management benchmark. We provide comprehensive insights into the complex interplay between OOH delivery dynamics and customer behavior influenced by pricing strategies. Our findings suggest that practitioners should adopt dynamic selection and pricing policies as OOH delivery gains a larger market share.
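The stochastic customer choice that a dynamic policy must anticipate can be illustrated with a multinomial logit (MNL) model, a standard discrete-choice model. Whether the paper uses MNL is not stated here, and the utility coefficients below are invented. In this sketch, each offered OOH location has a utility that rises with the incentive and falls with distance, and home delivery is the default alternative.

```python
import math

def offer_utility(distance_km, incentive_eur, base=-0.5, beta_incentive=0.8, beta_distance=1.2):
    """Hypothetical linear utility of one offered OOH location."""
    return base + beta_incentive * incentive_eur - beta_distance * distance_km

def mnl_probabilities(offer_utilities, home_utility=0.0):
    """MNL choice probabilities: one per offered OOH option, with home delivery last."""
    weights = [math.exp(u) for u in offer_utilities] + [math.exp(home_utility)]
    total = sum(weights)
    return [w / total for w in weights]

# Offer a nearby locker with a 1 EUR discount and a farther shop without a discount
print(mnl_probabilities([offer_utility(0.5, 1.0), offer_utility(1.5, 0.0)]))
```

A selection-and-pricing policy chooses the offer set and incentives so as to trade the incentive cost against the savings from customers who switch away from home delivery.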

Translated: 2023-11-27 23:43:30; published: 2023-11-23

# Probabilistic Tree-of-thought Reasoning for Answering Knowledge-intensive Complex Questions

arXiv: http://arxiv.org/abs/2311.13982v1

License: check the link

Shulin Cao, Jiajie Zhang, Jiaxin Shi, Xin Lv, Zijun Yao, Qi Tian, Juanzi Li, Lei Hou

Large language models (LLMs) are capable of answering knowledge-intensive complex questions with chain-of-thought (CoT) reasoning. However, they tend to generate factually incorrect reasoning steps when the required knowledge is missing from, or outdated in, the models' parameters. Recent works turn to retrieving external knowledge to augment CoT reasoning. Despite being promising, these chain-based methods suffer from: 1) Negative retrieval: unnecessary or incorrect retrieval may mislead the reasoning; 2) Limited sight: because they cannot look backward or forward, a local error in one step propagates along the chain. In this paper, we propose a novel approach: Probabilistic Tree-of-thought Reasoning (ProbTree). First, LLMs translate a complex question into a query tree, in which each non-root node denotes a sub-question of its parent node. Then, probabilistic reasoning is conducted over the tree by solving questions from leaf to root, considering the confidence of both question decomposition and answering. During reasoning, for leaf nodes, LLMs choose the more confident answer between Closed-book QA, which employs parametric knowledge, and Open-book QA, which employs retrieved external knowledge, thus eliminating the negative retrieval problem. For non-leaf nodes, the hierarchical structure gives LLMs a broader view, allowing them to reason globally with the information from child nodes and thus recover from local errors. Experiments on three complex QA datasets under the open-domain setting show that our approach significantly outperforms SOTA methods, demonstrating the effectiveness of probabilistic tree-of-thought reasoning.
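The bottom-up selection step can be sketched as follows. Confidences here are plain numbers. How ProbTree actually scores them is not reproduced, and the rule that a composed answer is only as confident as its weakest child is an assumption of this sketch. The `compose` callback is a hypothetical stand-in for the LLM call that answers a parent question from its children's answers.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Answer = Tuple[str, float]  # (answer text, confidence in [0, 1])

@dataclass
class Node:
    question: str
    closed_book: Answer  # answer from the LLM's parametric knowledge
    open_book: Answer    # answer from retrieved external documents
    children: List["Node"] = field(default_factory=list)
    compose: Optional[Callable[[List[str]], Answer]] = None  # answer built from children
    best: Optional[Answer] = None

def solve(node: Node) -> Answer:
    """Solve sub-questions first (leaf to root), then keep the most confident candidate."""
    for child in node.children:
        solve(child)
    candidates = [node.closed_book, node.open_book]
    if node.children and node.compose is not None:
        answer, conf = node.compose([c.best[0] for c in node.children])
        weakest_child = min(c.best[1] for c in node.children)
        candidates.append((answer, conf * weakest_child))  # assumed discounting rule
    node.best = max(candidates, key=lambda a: a[1])
    return node.best
```

At a leaf this reduces to choosing between the closed-book and open-book answers. This is the mechanism the abstract credits with avoiding negative retrieval.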

Translated: 2023-11-27 23:43:00; published: 2023-11-23

# MedISure: Towards Assuring Machine Learning-based Medical Image Classifiers using Mixup Boundary Analysis

arXiv: http://arxiv.org/abs/2311.13978v1

License: check the link

Adam Byfield, William Poulett, Ben Wallace, Anusha Jose, Shatakshi Tyagi, Smita Shembekar, Adnan Qayyum, Junaid Qadir, and Muhammad Bilal

Machine learning (ML) models are becoming integral to healthcare technologies, creating a critical need for formal assurance to validate their safety, fairness, robustness, and trustworthiness. These models are inherently prone to errors, which can pose serious risks to patient health and even cause irreparable harm. Traditional software assurance techniques rely on fixed code and do not directly apply to ML models, since these algorithms adapt and learn from curated datasets through a training process. However, adapting established principles, such as boundary testing using synthetic test data, can effectively bridge this gap. To this end, we present a novel technique called Mix-Up Boundary Analysis (MUBA) that facilitates evaluating image classifiers in terms of prediction fairness. We evaluated MUBA on two important medical imaging tasks, brain tumour classification and breast cancer classification, and achieved promising results. This research aims to showcase the importance of adapting traditional assurance principles for assessing ML models to enhance the safety and reliability of healthcare technologies. To facilitate future research, we plan to publicly release our code for MUBA.
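The mix-up boundary idea can be sketched as a probe. It blends an image of one class into an image of another and records the mixing weight at which the classifier's prediction flips. This is a generic illustration under stated assumptions: flattened images as lists, a linear pixel blend, and a uniform grid of weights. It is not the MUBA procedure itself. Comparing such flip points across classes or patient subgroups is one plausible way to look for unequal treatment.

```python
def mixup(x_a, x_b, lam):
    """Pixel-wise blend (1 - lam) * x_a + lam * x_b of two flattened, equally sized images."""
    return [(1 - lam) * a + lam * b for a, b in zip(x_a, x_b)]

def flip_point(classifier, x_a, x_b, steps=100):
    """Smallest grid weight lam at which the prediction on the blend differs from x_a's label."""
    label_a = classifier(x_a)
    for i in range(1, steps + 1):
        lam = i / steps
        if classifier(mixup(x_a, x_b, lam)) != label_a:
            return lam
    return None  # the prediction never flipped along this path

# Toy stand-in classifier: "tumour" if the mean intensity exceeds 0.5
def toy_classifier(x):
    return "tumour" if sum(x) / len(x) > 0.5 else "healthy"

print(flip_point(toy_classifier, [0.0] * 4, [1.0] * 4))  # 0.51
print(flip_point(toy_classifier, [1.0] * 4, [0.0] * 4))  # 0.5
```

For a well-balanced classifier, both directions would flip near the midpoint. A systematic shift toward one class suggests that the decision boundary favours that class.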

Translated: 2023-11-27 23:42:34; published: 2023-11-23

# Low Latency Instance Segmentation by Continuous Clustering for Rotating LiDAR Sensors

arXiv: http://arxiv.org/abs/2311.13976v1

License: check the link

Andreas Reich and Hans-Joachim Wuensche

Low-latency instance segmentation of LiDAR point clouds is crucial in real-world applications because it serves as an initial and frequently used building block in a robot's perception pipeline, where every task adds further delay. Particularly in dynamic environments, this total delay can result in significant positional offsets of dynamic objects, as seen in highway scenarios. To address this issue, we employ continuous clustering of obstacle points in order to obtain an instance-segmented point cloud. Unlike most existing approaches, which use a full revolution of the LiDAR sensor, we process the data stream in a continuous and seamless fashion. More specifically, each column of a range image is processed as soon as it is available. Obstacle points are clustered to existing instances in real time, and the system checks at a high frequency which instances are completed and ready to be published. An additional advantage is that no problematic discontinuities between the points at the start and the end of a scan are observed. In this work we describe the two-layered data structure and the corresponding algorithm for continuous clustering, which is able to cluster the incoming data in real time. We explain the importance of a large perceptive field of view. Furthermore, we describe and evaluate important architectural design choices that could be relevant for designing a deep-learning-based low-latency instance segmentation architecture. We are publishing the source code at https://github.com/UniBwTAS/continuous_clustering.
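The core idea can be sketched with incremental Euclidean clustering and union-find merging. Each range-image column is clustered as soon as it arrives, and an instance is released once no new column has extended it for a while. This is not the authors' two-layered data structure. The 2-D point representation, the distance threshold, and the `window` rule for deciding that an instance is complete are simplifying assumptions.

```python
import math

class ContinuousClusterer:
    """Toy column-by-column clustering of obstacle points (hypothetical simplification)."""

    def __init__(self, max_dist=0.5, window=2):
        self.max_dist = max_dist  # max gap (m) between neighbouring points of one instance
        self.window = window      # columns an instance may stay unextended before release
        self.parent = {}          # union-find over cluster ids
        self.members = {}         # root id -> list of (x, y) points
        self.last_col = {}        # root id -> last column that added a point
        self.recent = []          # [(column, [(x, y, cluster_id), ...])] inside the window
        self.next_id = 0

    def _find(self, c):
        while self.parent[c] != c:
            self.parent[c] = self.parent[self.parent[c]]  # path halving
            c = self.parent[c]
        return c

    def add_column(self, col, points):
        """Cluster one new column of (x, y) obstacle points; return finished instances."""
        current = []
        for x, y in points:
            neighbours = [p for _, pts in self.recent for p in pts] + current
            roots = {self._find(c) for px, py, c in neighbours
                     if math.hypot(x - px, y - py) <= self.max_dist}
            if roots:
                cid = min(roots)
                for r in roots - {cid}:  # the new point bridges clusters: merge them
                    self.parent[r] = cid
                    self.members[cid].extend(self.members.pop(r))
                    self.last_col.pop(r)
            else:
                cid = self.next_id
                self.next_id += 1
                self.parent[cid] = cid
                self.members[cid] = []
            self.members[cid].append((x, y))
            self.last_col[cid] = col
            current.append((x, y, cid))
        self.recent.append((col, current))
        self.recent = [(c, pts) for c, pts in self.recent if col - c < self.window]
        done = [c for c, last in self.last_col.items() if col - last >= self.window]
        for c in done:
            del self.last_col[c]
        return [self.members.pop(c) for c in done]
```

Because instances are released column by column, an object that has already been passed by the scan can be published without waiting for the sensor to finish its revolution. For brevity, the sketch never prunes old union-find entries.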

Translated: 2023-11-27 23:42:02; published: 2023-11-23

# The Evolution of Quantum Secure Direct Communication: On the Road to the Qinternet

arXiv: http://arxiv.org/abs/2311.13974v1

License: check the link

Dong Pan, Gui-Lu Long, Liuguo Yin, Yu-Bo Sheng, Dong Ruan, Soon Xin Ng, Jianhua Lu, and Lajos Hanzo

Communication security has to evolve to a higher plane in the face of the threat from the massive computing power of the emerging quantum computers. Quantum secure direct communication (QSDC) constitutes a promising branch of quantum communication, which is provably secure and overcomes the threat of quantum computing, whilst conveying secret messages directly via the quantum channel. In this survey, we highlight the motivation and the status of QSDC research with special emphasis on its theoretical basis and experimental verification. We detail the associated point-to-point communication protocols and show how information is protected and transmitted. Finally, we discuss the open challenges as well as the future trends of QSDC networks, emphasizing again that QSDC is not a pure quantum key distribution (QKD) protocol, but a fully-fledged secure communication scheme.

Translated: 2023-11-27 23:41:21; published: 2023-11-23

# From each of Feynman's and von Neumann's postulates to the restricted Feynman path integrals: a mathematical theory of temporally continuous quantum measurements

arXiv: http://arxiv.org/abs/2311.13972v1

License: check the link

Wataru Ichinose

Feynman proposed a postulate, or a method of quantization, in his celebrated 1948 paper. Applying Feynman's postulate to temporally continuous quantum measurements of the positions of particles, Mensky proposed, after phenomenological considerations, the restricted Feynman path integrals for continuous quantum measurements. Our aim in the present paper is to give a rigorous proof that Mensky's restricted Feynman path integrals emerge from Feynman's postulate under a simple approximation. In addition, we prove that the restricted Feynman path integrals emerge from von Neumann's postulate on instantaneous measurements as well as from Feynman's postulate. The quantum systems that we study include spin systems. These results are applied to formulations of the multi-split experiments, the quantum Zeno effect, and the Aharonov-Bohm effect.

Translated: 2023-11-27 23:40:56; published: 2023-11-23

# Testing Continuous Spontaneous Localization model with charged macro-molecules

arXiv: http://arxiv.org/abs/2311.13966v1

License: check the link

Emil Lenler-Eriksen and Michael Drewsen and Matteo Carlesso

In the last decade, growing interest has been devoted to models of spontaneous wavefunction collapse, also known as collapse models. They coherently solve the well-known quantum measurement problem by suitably modifying the Schrödinger evolution. Quantum experiments are now finally within reach of testing such models (and thus testing the limits of quantum theory). Here, we propose a method based on two ions confined in a linear Paul trap to potentially enhance the testing capabilities of such experiments. The atomic ion and the macromolecular ion provide, respectively, a good match for cooling the motional degrees of freedom and non-negligible insight into the collapse mechanism.

Translated: 2023-11-27 23:39:59; published: 2023-11-23

# DPSUR: Accelerating Differentially Private Stochastic Gradient Descent Using Selective Update and Release

arXiv: http://arxiv.org/abs/2311.14056v1

License: check the link

Jie Fu, Qingqing Ye, Haibo Hu, Zhili Chen, Lulu Wang, Kuncan Wang, Ran Xun

Machine learning models are known to memorize private data to reduce their training loss, which can be inadvertently exploited by privacy attacks such as model inversion and membership inference. To protect against these attacks, differential privacy (DP) has become the de facto standard for privacy-preserving machine learning, particularly for popular training algorithms based on stochastic gradient descent, such as DPSGD. Nonetheless, DPSGD still suffers from severe utility loss due to its slow convergence. This is partially caused by random sampling, which introduces bias and variance into the gradient, and partially by the Gaussian noise, which causes gradient updates to fluctuate. Our key idea to address these issues is to apply updates selectively during model training, discarding useless or even harmful updates. Motivated by this, this paper proposes DPSUR, a Differentially Private training framework based on Selective Updates and Release, in which the gradient from each iteration is evaluated with a validation test, and only updates leading to convergence are applied to the model. As such, DPSUR keeps training moving in the right direction and can thus converge faster than DPSGD. The main challenges lie in two aspects: privacy concerns arising from gradient evaluation, and the gradient selection strategy for model updates. To address these challenges, DPSUR introduces a clipping strategy for update randomization and a threshold mechanism for gradient selection. Experiments on the MNIST, FMNIST, CIFAR-10, and IMDB datasets show that DPSUR significantly outperforms previous works in terms of convergence speed and model utility.
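The selective-update loop can be sketched on a toy 1-D least-squares problem. Per-example gradient clipping plus Gaussian noise is standard DP-SGD. The validation test works as follows: clip the change in validation loss, add Gaussian noise, and accept the update only if the result falls below a threshold. It follows the abstract's description only loosely. All constants, the toy data, and the function name `selective_update_training` are hypothetical, and no privacy accounting is performed.

```python
import random

def clip(value, bound):
    return max(-bound, min(bound, value))

def val_loss(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def dp_sgd_step(w, batch, lr, clip_norm, sigma, rng):
    """DP-SGD on y ~ w * x: clip each per-example gradient, sum, add Gaussian noise, average."""
    grads = [clip(2 * (w * x - y) * x, clip_norm) for x, y in batch]
    noisy_sum = sum(grads) + rng.gauss(0.0, sigma * clip_norm)
    return w - lr * noisy_sum / len(batch)

def selective_update_training(train, val, steps, batch_size=8, lr=0.05, clip_norm=1.0,
                              sigma=1.0, val_clip=0.5, val_sigma=0.1, threshold=0.0, seed=0):
    """Keep a DP-SGD candidate only if a clipped, noised validation-loss change is below threshold."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        candidate = dp_sgd_step(w, rng.sample(train, batch_size), lr, clip_norm, sigma, rng)
        delta = clip(val_loss(candidate, val) - val_loss(w, val), val_clip)
        if delta + rng.gauss(0.0, val_sigma * val_clip) < threshold:
            w = candidate  # accepted: the update (noisily) reduced validation loss
    return w

train = [(i / 100, 3 * i / 100) for i in range(1, 101)]
val = [(i / 20, 3 * i / 20) for i in range(1, 21)]
print(selective_update_training(train, val, steps=500))  # should end close to the true slope 3
```

Clipping the loss difference bounds how much a single validation comparison can reveal. The added noise is what would make the accept/reject decision itself differentially private under a proper accounting.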

Translated: 2023-11-27 23:33:02; published: 2023-11-23

# An Improved Method for Quantum Matrix Multiplication

arXiv: http://arxiv.org/abs/2311.14044v1

License: check the link

Nhat A. Nghiem and Tzu-Chieh Wei

Following the celebrated quantum algorithm for solving linear equations (the so-called HHL algorithm), Childs, Kothari and Somma [SIAM Journal on Computing, **46**: 1920 (2017)] provided an approach to solving linear systems of equations with exponentially improved dependence on precision. In this note, we aim to complement this result for applying a matrix to a quantum state, based on their Chebyshev polynomial approach. A few examples that motivate this application are included, and we further discuss an explicit application of this improved matrix-application algorithm with an efficient quantum procedure.
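Here is a classical analogue of the Chebyshev approach. A function f on [-1, 1] is approximated by a truncated Chebyshev series, which is then applied to a vector with the three-term recurrence T_{k+1}(A)v = 2A·T_k(A)v - T_{k-1}(A)v, so only matrix-vector products are needed. The quantum algorithm realizes such polynomials very differently, on a quantum state, via linear combinations of unitaries and quantum walks. This plain-Python sketch only illustrates the underlying polynomial approximation and assumes a symmetric A with spectrum in [-1, 1].

```python
import math

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def chebyshev_coeffs(f, degree, n_nodes=64):
    """Coefficients c_k with f(x) ~ sum_k c_k T_k(x) on [-1, 1], via Chebyshev-Gauss quadrature."""
    thetas = [math.pi * (j + 0.5) / n_nodes for j in range(n_nodes)]
    coeffs = []
    for k in range(degree + 1):
        c = 2.0 / n_nodes * sum(f(math.cos(t)) * math.cos(k * t) for t in thetas)
        coeffs.append(c / 2 if k == 0 else c)
    return coeffs

def apply_chebyshev(A, v, coeffs):
    """Compute sum_k c_k T_k(A) v with the three-term recurrence (assumes spectrum in [-1, 1])."""
    t_prev, t_cur = list(v), matvec(A, v)  # T_0(A) v and T_1(A) v
    out = [coeffs[0] * x for x in t_prev]
    if len(coeffs) > 1:
        out = [o + coeffs[1] * x for o, x in zip(out, t_cur)]
    for c in coeffs[2:]:
        t_next = [2 * y - x for y, x in zip(matvec(A, t_cur), t_prev)]
        t_prev, t_cur = t_cur, t_next
        out = [o + c * x for o, x in zip(out, t_cur)]
    return out

A = [[0.5, 0.0], [0.0, -0.3]]
print(apply_chebyshev(A, [1.0, 2.0], chebyshev_coeffs(math.exp, 15)))  # ~ [e^0.5, 2 e^-0.3]
```

For smooth f, the truncation error falls off very quickly with the degree. This is the classical reason why Chebyshev-based methods achieve a good dependence on precision.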

Translated: 2023-11-27 23:32:33; published: 2023-11-23

# AdapterFL: Adaptive Heterogeneous Federated Learning for Resource-constrained Mobile Computing Systems

arXiv: http://arxiv.org/abs/2311.14037v1

License: check the link

Ruixuan Liu and Ming Hu and Zeke Xia and Jun Xia and Pengyu Zhang and Yihao Huang and Yang Liu and Mingsong Chen

Federated Learning (FL) enables collaborative learning across large-scale distributed clients without data sharing. However, due to the disparity of computing resources among massive numbers of mobile computing devices, the performance of traditional homogeneous model-based FL is seriously limited. On the one hand, to achieve model training on all the diverse clients, mobile computing systems can only use small, low-performance models for collaborative learning. On the other hand, devices with high computing resources cannot train a high-performance large model with their insufficient raw data. To address the resource-constrained problem in mobile computing systems, we present a novel heterogeneous FL approach named AdapterFL, which uses a model reassembly strategy to adaptively facilitate collaborative training of massive numbers of heterogeneous mobile devices. Specifically, we select multiple candidate heterogeneous models based on the computing performance of the mobile devices and then divide each heterogeneous model into two partitions. By reassembling the partitions, we can generate models of varied sizes that combine partial parameters of the large model with partial parameters of the small model. Using these reassembled models for FL training, we can train the partial parameters of the large model on low-performance devices. In this way, we can alleviate the performance degradation of large models caused by resource constraints. Experimental results show that AdapterFL achieves up to 12% accuracy improvement compared to state-of-the-art heterogeneous federated learning methods in resource-constrained scenarios.
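The reassembly idea can be sketched at the level of parameter budgets. Every candidate model is split into a front and a back partition, and front and back partitions from different models are recombined. A device then gets the largest combination it can afford. The candidate sizes are invented. Partition boundaries are assumed to be shape-compatible. The "largest that fits" assignment rule is an assumption of this sketch.

```python
from itertools import product

# Hypothetical candidate models, each split into (front, back) partitions;
# the numbers are invented parameter counts per partition.
CANDIDATES = {
    "large": (8_000_000, 12_000_000),
    "medium": (3_000_000, 4_000_000),
    "small": (500_000, 700_000),
}

def reassembled_models(candidates):
    """Yield (name, size) for every front/back combination, e.g. large.front + small.back."""
    for (name_f, (front, _back_f)), (name_b, (_front_b, back)) in product(candidates.items(), repeat=2):
        yield f"{name_f}.front+{name_b}.back", front + back

def pick_model(budget, candidates=CANDIDATES):
    """Largest reassembled model whose parameter count fits a device budget, or None."""
    fitting = [(size, name) for name, size in reassembled_models(candidates) if size <= budget]
    return max(fitting)[1] if fitting else None

print(pick_model(10_000_000))  # large.front+small.back
```

In this example, a mid-range device trains the large model's front partition even though the full large model would not fit on it. This is the effect the abstract describes.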

Translated: 2023-11-27 23:32:19; published: 2023-11-23

# Two-dimensional coherent spectrum of high-spin models via a quantum computing approach

arXiv: http://arxiv.org/abs/2311.14035v1

License: check the link

Martin Mootz, Peter P. Orth, Chuankun Huang, Liang Luo, Jigang Wang, and Yong-Xin Yao

We present and benchmark a quantum computing approach to calculate the two-dimensional coherent spectrum (2DCS) of high-spin models. Our approach is based on simulating their real-time dynamics in the presence of several magnetic field pulses that are spaced in time. We use the adaptive variational quantum dynamics simulation (AVQDS) algorithm because its compact circuits enable simulations over sufficiently long times to achieve the required resolution in frequency space. Specifically, we consider an antiferromagnetic quantum spin model that incorporates Dzyaloshinskii-Moriya interactions and single-ion anisotropy. The obtained 2DCS spectra exhibit distinct peaks at multiples of the magnon frequency, arising from transitions between different eigenstates of the unperturbed Hamiltonian. By comparing the one-dimensional coherent spectrum with 2DCS, we demonstrate that 2DCS provides a higher resolution of the energy spectrum. We further investigate how the quantum resources scale with the magnitude of the spin using two different binary encodings of the high-spin operators: the standard binary encoding and the Gray code. At low magnetic fields both encodings require comparable quantum resources, but at larger field strengths the Gray code is advantageous. Lastly, we compare the numerical 2DCS with experimental results on a rare-earth orthoferrite system. The observed strength of the magnonic high-harmonic generation signals in the 2DCS of the quantum high-spin model aligns well with the experimental data, showing significant improvement over the corresponding mean-field results.
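The two encodings compared above map the 2S+1 levels of a spin S onto ceil(log2(2S+1)) qubits. The sketch shows both mappings for a level index m = 0..2S. A well-known property of the Gray code is that consecutive levels, which are exactly the pairs connected by the ladder operators S+ and S-, differ in a single bit. The sketch makes no claim about how this translates into the resource counts reported in the paper.

```python
def qubits_for_spin(two_s):
    """Qubits needed to store the 2S+1 levels of spin S, given two_s = 2S (an integer >= 0)."""
    return max(1, two_s.bit_length())

def standard_binary(level, n_qubits):
    return format(level, f"0{n_qubits}b")

def gray_code(level, n_qubits):
    return format(level ^ (level >> 1), f"0{n_qubits}b")

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

n = qubits_for_spin(5)  # spin 5/2: six levels on three qubits
print([standard_binary(m, n) for m in range(6)])
print([gray_code(m, n) for m in range(6)])
print(hamming(standard_binary(3, n), standard_binary(4, n)), hamming(gray_code(3, n), gray_code(4, n)))  # 3 1
```

In the standard encoding, the step from level 3 to level 4 flips all three bits. In the Gray code, every neighbouring pair differs in exactly one bit.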

Translated: 2023-11-27 23:31:56; published: 2023-11-23

# Multivariate Scenario Generation of Day-Ahead Electricity Prices using Normalizing Flows

arXiv: http://arxiv.org/abs/2311.14033v1

License: check the link

Hannes Hilger, Dirk Witthaut, Manuel Dahmen, Leonardo Rydin Gorjao, Julius Trebbien, Eike Cramer

Trading on electricity markets requires accurate information about the realization of electricity prices and the uncertainty attached to the predictions. We present a probabilistic forecasting approach for day-ahead electricity prices using a fully data-driven deep generative model called a normalizing flow. Our modeling approach generates full-day scenarios of day-ahead electricity prices based on conditional features such as residual load forecasts. Furthermore, we propose extended feature sets of prior realizations and a periodic retraining scheme that allows the normalizing flow to adapt to the changing conditions of modern electricity markets. In particular, we investigate the impact of the energy crisis ensuing from the Russian invasion of Ukraine. Our results highlight that the normalizing flow generates high-quality scenarios that reproduce the true price distribution and yield highly accurate forecasts. Additionally, our analysis shows how our adaptations for changing regimes allow the normalizing flow to follow changing market conditions and enable continued sampling of high-quality day-ahead price scenarios.
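A single invertible, element-wise layer is enough to show the change-of-variables principle behind normalizing flows. A standard-normal base sample z is mapped to a price x = mu + s * sinh(z), where the location and scale depend on a conditioning feature, here an hourly residual-load forecast. The conditioner coefficients are invented. Because hours are transformed independently, this toy cannot capture the cross-hour dependence that a multivariate flow models. Real flows stack many learned layers, such as coupling layers.

```python
import math
import random

def flow_params(residual_load):
    """Hypothetical conditioner: hourly residual-load forecast (GW) -> location and scale."""
    mu = 20.0 + 2.5 * residual_load  # EUR/MWh, invented coefficients
    scale = 5.0 + 0.2 * abs(residual_load)
    return mu, scale

def forward(z, residual_load):
    """Base sample z ~ N(0, 1) -> price x = mu + scale * sinh(z) (invertible, heavy-tailed)."""
    mu, scale = flow_params(residual_load)
    return mu + scale * math.sinh(z)

def inverse(x, residual_load):
    mu, scale = flow_params(residual_load)
    return math.asinh((x - mu) / scale)

def log_prob(x, residual_load):
    """Change of variables: log p(x) = log N(z; 0, 1) - log|dx/dz|, with dx/dz = scale * cosh(z)."""
    _, scale = flow_params(residual_load)
    z = inverse(x, residual_load)
    log_base = -0.5 * (z * z + math.log(2 * math.pi))
    return log_base - math.log(scale * math.cosh(z))

def sample_day(residual_load_forecast, rng):
    """One 24-hour price scenario; hours are sampled independently in this toy version."""
    return [forward(rng.gauss(0.0, 1.0), rl) for rl in residual_load_forecast]

scenario = sample_day([10.0] * 24, random.Random(0))
```

Training a flow means maximizing `log_prob` of observed prices over the conditioner's parameters. Sampling runs the same invertible map forward from fresh Gaussian draws.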

Translated: 2023-11-27 23:31:31; published: 2023-11-23

# PrivateLoRA For Efficient Privacy Preserving LLM

arXiv: http://arxiv.org/abs/2311.14030v1

License: check the link

Yiming Wang, Yu Lin, Xiaodong Zeng, Guannan Zhang

End users face a choice between privacy and efficiency in current Large Language Model (LLM) service paradigms. In cloud-based paradigms, users are forced to compromise data locality for generation quality and processing speed. Conversely, edge device paradigms maintain data locality but fail to deliver satisfactory performance. In this work, we propose a novel LLM service paradigm that distributes privacy-sensitive computation on edge devices and shared computation in the cloud. Only activations are transmitted between the central cloud and edge devices to ensure data locality. Our core innovation, PrivateLoRA, addresses the challenging communication overhead by exploiting the low rank of residual activations, achieving over 95% communication reduction. Consequently, PrivateLoRA effectively maintains data locality and is extremely resource efficient. Under standard 5G networks, PrivateLoRA achieves throughput over 300% of device-only solutions for 7B models and over 80% of an A100 GPU for 33B models. PrivateLoRA also provides tuning performance comparable to LoRA for advanced personalization. Our approach democratizes access to state-of-the-art generative AI for edge devices, paving the way for more tailored LLM experiences for the general public. To our knowledge, our proposed framework is the first efficient and privacy-preserving LLM solution in the literature.
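
The communication saving can be estimated with simple arithmetic. If each transfer carries an `r`-dimensional low-rank activation instead of a `d`-dimensional hidden state, the per-token payload shrinks by a factor of `r/d`. The function name and the rank values below are hypothetical. `d = 4096` is the hidden size of LLaMA-7B-class models. The result is a per-transfer ratio that ignores protocol overhead, not the paper's measured reduction.

```python
def activation_payload_reduction(hidden_dim: int, rank: int) -> float:
    """Fraction of per-token traffic saved by sending rank-`rank` activations instead of
    `hidden_dim`-dimensional hidden states (same dtype and number of transfers assumed)."""
    if not 0 < rank <= hidden_dim:
        raise ValueError("rank must be in (0, hidden_dim]")
    return 1.0 - rank / hidden_dim

for r in (8, 64, 128):
    print(r, f"{activation_payload_reduction(4096, r):.2%}")
```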

Translated: 2023-11-27 23:31:15 Published: 2023-11-23

# Understanding the Vulnerability of CLIP to Image Compression

Understanding the Vulnerability of CLIP to Image Compression ( http://arxiv.org/abs/2311.14029v1 )

License: see link

Cangxiong Chen, Vinay P. Namboodiri, Julian Padget

CLIP is a widely used foundational vision-language model for zero-shot image recognition and other image-text alignment tasks. We demonstrate that CLIP is vulnerable to changes in image quality under compression. We analyse this surprising result further using an attribution method, Integrated Gradients. This attribution method lets us better understand, both quantitatively and qualitatively, exactly how compression affects the zero-shot recognition accuracy of the model. We evaluate this extensively on CIFAR-10 and STL-10. Our work provides a basis for understanding this vulnerability of CLIP and can help in developing more effective methods to improve the robustness of CLIP and other vision-language models.
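
Integrated Gradients attributes a prediction $F(x)$ to individual input features. It integrates the gradient along the straight path from a baseline $x'$ to $x$:

$\mathrm{IG}_i(x) = (x_i - x'_i)\int_0^1 \frac{\partial F(x' + \alpha(x - x'))}{\partial x_i}\,d\alpha$

The sketch below approximates this integral with a midpoint Riemann sum on a toy quadratic "model" that has an analytic gradient. In the CLIP setting, $F$ would be a zero-shot class score and the baseline a reference image. Both of those are assumptions for illustration, not the paper's exact setup.

```python
def integrated_gradients(grad_f, x, baseline, steps=50):
    """Midpoint Riemann approximation of Integrated Gradients (Sundararajan et al., 2017)."""
    n = len(x)
    total = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_f(point)
        for i in range(n):
            total[i] += g[i]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

# Toy "model": F(x) = sum_i w_i * x_i**2, with its analytic gradient.
w = [0.5, -1.0, 2.0]

def f(x):
    return sum(wi * xi * xi for wi, xi in zip(w, x))

def grad_f(x):
    return [2 * wi * xi for wi, xi in zip(w, x)]

x, baseline = [1.0, 2.0, -1.0], [0.0, 0.0, 0.0]
attr = integrated_gradients(grad_f, x, baseline)
print(attr, sum(attr), f(x) - f(baseline))
```

For a quadratic $F$ the integrand is linear in $\alpha$, so the midpoint rule is exact here. The attributions therefore sum to $F(x) - F(x')$, which is the completeness property.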

Translated: 2023-11-27 23:30:52 Published: 2023-11-23

# Continual Learning of Diffusion Models with Generative Distillation

Continual Learning of Diffusion Models with Generative Distillation ( http://arxiv.org/abs/2311.14028v1 )

License: see link

Sergi Masip, Pau Rodriguez, Tinne Tuytelaars, Gido M. van de Ven

Diffusion models are powerful generative models that achieve state-of-the-art performance in tasks such as image synthesis. However, training them demands substantial amounts of data and computational resources. Continual learning would allow for incrementally learning new tasks and accumulating knowledge, thus making it possible to reuse already trained models. One potentially suitable approach is generative replay, where a copy of a generative model trained on previous tasks produces synthetic data that are interleaved with data from the current task. However, standard generative replay applied to diffusion models results in a catastrophic loss in denoising capabilities. In this paper, we propose generative distillation, an approach that distils the entire reverse process of a diffusion model. We demonstrate that our approach significantly improves the continual learning performance of generative replay with only a moderate increase in the computational costs.

Translated: 2023-11-27 23:30:40 Published: 2023-11-23

# Creating and Benchmarking a Synthetic Dataset for Cloud Optical Thickness Estimation

Creating and Benchmarking a Synthetic Dataset for Cloud Optical Thickness Estimation ( http://arxiv.org/abs/2311.14024v1 )

License: see link

Aleksis Pirinen, Nosheen Abid, Nuria Agues Paszkowsky, Thomas Ohlson Timoudas, Ronald Scheirer, Chiara Ceccobello, György Kovács, Anders Persson

Cloud formations often obscure optical satellite-based monitoring of the Earth's surface, thus limiting Earth observation (EO) activities such as land cover mapping, ocean color analysis, and cropland monitoring. The integration of machine learning (ML) methods within the remote sensing domain has significantly improved performance on a wide range of EO tasks, including cloud detection and filtering, but there is still much room for improvement. A key bottleneck is that ML methods typically depend on large amounts of annotated data for training, which is often difficult to come by in EO contexts. This is especially true for the task of cloud optical thickness (COT) estimation. A reliable estimation of COT enables more fine-grained and application-dependent control compared to using pre-specified cloud categories, as is commonly done in practice. To alleviate the COT data scarcity problem, in this work we propose a novel synthetic dataset for COT estimation, where top-of-atmosphere radiances have been simulated for 12 of the spectral bands of the Multi-Spectral Instrument (MSI) sensor onboard Sentinel-2 platforms. These data points have been simulated under consideration of different cloud types, COTs, and ground surface and atmospheric profiles. Extensive experiments in which several ML models are trained to predict COT from the measured reflectivity of the spectral bands demonstrate the usefulness of our proposed dataset. Generalization to real data is also demonstrated on two satellite image datasets: one that is publicly available, and one that we have collected and annotated. The synthetic data, the newly collected real dataset, code and models have been made publicly available at https://github.com/aleksispi/ml-cloud-opt-thick.

Translated: 2023-11-27 23:30:29 Published: 2023-11-23

# Quantum Simulation of Bound-State-Enhanced Quantum Metrology

Quantum Simulation of Bound-State-Enhanced Quantum Metrology ( http://arxiv.org/abs/2311.14020v1 )

License: see link

Cheng-Ge Liu, Cong-Wei Lu, Na-Na Zhang, Qing Ai

Quantum metrology exploits quantum effects to improve the measurement accuracy of some physical quantities beyond the classical limit. However, due to the interaction between the system and the environment, decoherence can significantly reduce the accuracy of the measurement. Many methods have been proposed to restore the accuracy of the measurement in the long-time limit. Recently, it has been found that a bound state can assist error-free measurement and recover the $t^{-1}$ scaling [K. Bai, Z. Peng, H. G. Luo, and J. H. An, Phys. Rev. Lett. 123, 040402 (2019)]. Here, using $N$ qubits, we propose a method to simulate the open quantum dynamics of a hybrid system consisting of one atom and coupled resonators. We find that the measurement error can vanish as time increases, owing to the existence of the bound state. By both analytical and numerical simulations, we prove that the $t^{-1}$ scaling of the measurement error can be recovered when a bound state exists in the hybrid system. Interestingly, we observe perfect oscillations that can be used to evaluate the atomic transition frequency. For finite $N$, the duration of the perfect oscillations doubles with each additional qubit.
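
A heuristic illustration of where the $t^{-1}$ scaling comes from uses the standard error-propagation formula for a Ramsey-type measurement of an observable $M$ with $M^2 = 1$. The dephasing rate $\gamma$ and the residual contrast $Z$ are illustrative assumptions, not the paper's hybrid-system model:

$$
\delta\omega = \frac{\Delta M}{\left|\partial_\omega \langle M \rangle\right|}, \qquad \Delta M = \sqrt{1 - \langle M \rangle^2}.
$$

The three regimes below are all evaluated at the working point $\omega t = \pi/2 \pmod{\pi}$, where $\langle M \rangle = 0$ and $\Delta M = 1$.

- **Ideal, decoherence-free signal.** With $\langle M \rangle = \cos(\omega t)$,

$$
\partial_\omega \langle M \rangle = -t\sin(\omega t) \;\Rightarrow\; \delta\omega = \frac{1}{t}.
$$

- **Dephasing without a bound state.** With $\langle M \rangle = e^{-\gamma t}\cos(\omega t)$, the error diverges as $t \to \infty$:

$$
\delta\omega = \frac{e^{\gamma t}}{t}.
$$

- **Bound state present.** Assume the bound state leaves a time-independent contrast $Z \in (0, 1]$, so that $\langle M \rangle = Z\cos(\omega t)$. Then the $t^{-1}$ scaling is recovered up to a constant factor:

$$
\delta\omega = \frac{1}{Z\,t}.
$$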

Translated: 2023-11-27 23:29:59 Published: 2023-11-23

# On the Hyperparameter Landscapes of Machine Learning Algorithms

On the Hyperparameter Landscapes of Machine Learning Algorithms ( http://arxiv.org/abs/2311.14014v1 )

License: see link

Mingyu Huang, Ke Li

Despite the recent success of a plethora of hyperparameter optimization (HPO) methods for machine learning (ML) models, the intricate interplay between model hyperparameters (HPs) and predictive losses (a.k.a. fitness), which is a key prerequisite for understanding HPO, remains notably underexplored in our community. This results in limited explainability of the HPO process, leading to a lack of human trust and difficulties in pinpointing algorithm bottlenecks. In this paper, we aim to shed light on this black box by conducting a large-scale fitness landscape analysis (FLA) on 1,500 HP loss landscapes of 6 ML models with more than 11 model configurations, across 67 datasets and different levels of fidelity. We reveal the first unified, comprehensive portrait of their topographies in terms of smoothness, neutrality and modality. We also show that such properties are highly transferable across datasets and fidelities, providing fundamental evidence for the success of multi-fidelity and transfer learning methods. These findings are made possible by developing a dedicated FLA framework that incorporates a combination of visual and quantitative measures. We further demonstrate the potential of this framework by analyzing the NAS-Bench-101 landscape, and we believe it can facilitate a fundamental understanding of a broader range of AutoML tasks.
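
Fitness distance correlation (FDC; Jones and Forrest, 1995) is one classic quantitative FLA measure. It is the Pearson correlation between each configuration's loss and its distance to the best configuration found. The sketch below computes it on a toy two-hyperparameter grid. Whether the paper's framework uses FDC specifically is not stated here, and the grid and loss are made up for illustration.

```python
import math

def fitness_distance_correlation(configs, losses):
    """Pearson correlation between each configuration's loss and its Euclidean
    distance to the best (lowest-loss) configuration."""
    best = configs[min(range(len(losses)), key=losses.__getitem__)]
    dists = [math.dist(c, best) for c in configs]
    n = len(losses)
    mf, md = sum(losses) / n, sum(dists) / n
    cov = sum((f - mf) * (d - md) for f, d in zip(losses, dists)) / n
    sf = math.sqrt(sum((f - mf) ** 2 for f in losses) / n)
    sd = math.sqrt(sum((d - md) ** 2 for d in dists) / n)
    return cov / (sf * sd)

# Toy landscape over two normalized hyperparameters: a smooth bowl centred at (0.5, 0.5).
configs = [(a / 4, b / 4) for a in range(5) for b in range(5)]
losses = [(a - 0.5) ** 2 + (b - 0.5) ** 2 for a, b in configs]
print(round(fitness_distance_correlation(configs, losses), 3))
```

For loss minimization, an FDC near +1 means the loss falls steadily toward the optimum, which indicates an "easy", smooth landscape. Values near 0 suggest a deceptive or rugged one.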

Translated: 2023-11-27 23:29:34 Published: 2023-11-23

# Shadow: A Novel Loss Function for Efficient Training in Siamese Networks

Shadow: A Novel Loss Function for Efficient Training in Siamese Networks ( http://arxiv.org/abs/2311.14012v1 )

License: see link

Alif Elham Khan, Mohammad Junayed Hasan, Humayra Anjum, Nabeel Mohammed

Despite significant recent advances in similarity detection tasks, existing approaches pose substantial challenges under memory constraints. One of the primary reasons for this is the use of computationally expensive metric learning loss functions such as Triplet Loss in Siamese networks. In this paper, we present a novel loss function called Shadow Loss that compresses the dimensions of an embedding space during loss calculation without loss of performance. The distance between the projections of the embeddings is learned from inputs on a compact projection space where distances directly correspond to a measure of class similarity. By projecting onto a lower-dimensional space, our loss function converges faster, and the resulting classified image clusters have larger inter-class and smaller intra-class distances. Shadow Loss not only reduces embedding dimensions, benefiting memory-constrained devices, but also consistently outperforms the state-of-the-art Triplet Margin Loss by 5%-10% in accuracy across diverse datasets. The proposed loss function is also model-agnostic, upholding its performance across several tested models. Its effectiveness and robustness across balanced, imbalanced, medical, and non-medical image datasets suggest that it is not specific to a particular model or dataset, and that it consistently delivers superior performance while using less memory and computation.
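
For comparison, the sketch below first shows the standard Triplet Margin Loss baseline, which compares full-dimensional distances. It then shows a hypothetical one-dimensional variant that compares only the "shadows" (scalar projections) of the positive and negative embeddings on the anchor direction. That variant is an assumed reading of "projection space" for illustration only, not the paper's exact Shadow Loss formula. All function names are hypothetical.

```python
import math

def triplet_margin_loss(a, p, n, margin=1.0):
    """Standard triplet margin loss on full embeddings: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, math.dist(a, p) - math.dist(a, n) + margin)

def scalar_projection(v, onto):
    """Signed length of v's 'shadow' along the direction of `onto`."""
    norm = math.sqrt(sum(x * x for x in onto))
    return sum(x * y for x, y in zip(v, onto)) / norm

def projected_margin_loss(a, p, n, margin=1.0):
    """Hypothetical 1-D variant (illustration only): compare the shadows of p and n
    on the anchor direction instead of full-dimensional distances."""
    pa, pp, pn = (scalar_projection(v, a) for v in (a, p, n))
    return max(0.0, abs(pa - pp) - abs(pa - pn) + margin)

anchor, positive, negative = (1.0, 0.0), (0.9, 0.1), (-1.0, 0.0)
print(triplet_margin_loss(anchor, positive, negative), projected_margin_loss(anchor, positive, negative))
```

Once the projections are computed, the 1-D comparison works on scalars rather than full embedding vectors. That is the kind of saving a projection-based loss aims for.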

Translated: 2023-11-27 23:29:11 Published: 2023-11-23

# Quantum Conditional Mutual Information of the W State in Non-Inertial Frames

Quantum conditional mutual information of W state in non-inertial frames ( http://arxiv.org/abs/2311.14010v1 )

License: see link

H Saveetha, Peter P. Rohde, R Chandrashekar

Quantum conditional mutual information (QCMI) is a versatile information-theoretic measure. It quantifies the amount of correlation between two qubits from the perspective of a third qubit. In this work, we characterise the QCMI of tripartite W states when some of the qubits are under accelerated motion. For our investigations, we consider a massless fermionic field in the single-mode approximation. We consider all possible situations with respect to the acceleration of the qubits. Our results show that QCMI can either increase or decrease depending on the role of the qubit being accelerated. Finally, we discuss the connection between QCMI and correlations by studying biseparable and separable states.
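
As a reference point, QCMI can be computed from von Neumann entropies of reduced states:

$I(A{:}B|C) = S(AC) + S(BC) - S(C) - S(ABC)$

The sketch below evaluates this for the inertial W state, where no qubit is accelerated. It does not implement the fermionic single-mode transformation the paper applies to accelerated qubits. Because the W state is pure and every single-qubit marginal has eigenvalues $\{2/3, 1/3\}$, the result is the binary entropy $H(1/3) \approx 0.918$ bits.

```python
import numpy as np

def entropy_bits(rho):
    """von Neumann entropy S(rho) = -Tr(rho log2 rho), ignoring numerically zero eigenvalues."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def reduced(rho, keep, n=3):
    """Partial trace of an n-qubit density matrix, keeping the qubits listed in `keep`."""
    t = rho.reshape([2] * (2 * n))
    # Trace out the other qubits, highest index first so remaining axis numbers stay valid.
    for q in sorted(set(range(n)) - set(keep), reverse=True):
        t = np.trace(t, axis1=q, axis2=q + t.ndim // 2)
    d = 2 ** len(keep)
    return t.reshape(d, d)

def qcmi(rho):
    """I(A:B|C) = S(AC) + S(BC) - S(C) - S(ABC) with qubits A=0, B=1, C=2."""
    return (entropy_bits(reduced(rho, [0, 2])) + entropy_bits(reduced(rho, [1, 2]))
            - entropy_bits(reduced(rho, [2])) - entropy_bits(rho))

# W state (|100> + |010> + |001>) / sqrt(3); qubit A is the most significant bit.
psi = np.zeros(8)
psi[[0b100, 0b010, 0b001]] = 1 / np.sqrt(3)
rho = np.outer(psi, psi)
print(round(qcmi(rho), 4))
```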

Translated: 2023-11-27 23:28:46 Published: 2023-11-23

# High-Resolution Population Maps Derived from Sentinel-1 and Sentinel-2

High-resolution Population Maps Derived from Sentinel-1 and Sentinel-2 ( http://arxiv.org/abs/2311.14006v1 )

License: see link

Nando Metzger, Rodrigo Caye Daudt, Devis Tuia, Konrad Schindler

Detailed population maps play an important role in diverse fields ranging from humanitarian action to urban planning. Generating such maps in a timely and scalable manner presents a challenge, especially in data-scarce regions. To address it, we have developed POPCORN, a population mapping method whose only inputs are free, globally available satellite images from Sentinel-1 and Sentinel-2, and a small number of aggregate population counts over coarse census districts for calibration. Despite the minimal data requirements, our approach surpasses the mapping accuracy of existing schemes, including several that rely on building footprints derived from high-resolution imagery. For example, we were able to produce population maps for Rwanda with 100 m GSD based on fewer than 400 regional census counts. In Kigali, those maps reach an $R^2$ score of 66% with respect to a ground-truth reference map, with an average error of only $\pm$10 inhabitants/ha. Conveniently, POPCORN retrieves explicit maps of built-up areas and of local building occupancy rates, making the mapping process interpretable and offering additional insights, for instance about the distribution of built-up but unpopulated areas such as industrial warehouses. Moreover, we find that, once trained, the model can be applied repeatedly to track population changes, and that it can be transferred to geographically similar regions (e.g., from Uganda to Rwanda). With our work, we aim to democratize access to up-to-date and high-resolution population maps, recognizing that some regions faced with particularly strong population dynamics may lack the resources for costly micro-census campaigns.
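
Coarse census counts can supervise fine-grained maps in the following way:

- Predict a per-pixel population as built-up area times occupancy rate.
- Sum those predictions inside each census district.
- Compare each district total with the reported count.

The sketch below shows only this aggregation idea. The function names, the mean-absolute-error choice, and the toy numbers are assumptions, not POPCORN's actual model or loss.

```python
def pixel_population(built_up_ha, occupancy_per_ha):
    """Per-pixel population = built-up area (ha) x occupancy rate (inhabitants/ha)."""
    return [b * o for b, o in zip(built_up_ha, occupancy_per_ha)]

def district_loss(pop, district_of_pixel, census):
    """Coarse supervision: sum pixel predictions within each census district and compare
    with the reported count (mean absolute error, an assumed choice)."""
    totals = {d: 0.0 for d in census}
    for p, d in zip(pop, district_of_pixel):
        totals[d] += p
    return sum(abs(totals[d] - census[d]) for d in census) / len(census)

pop = pixel_population([1.0, 0.5, 0.0, 2.0], [10.0, 20.0, 30.0, 5.0])
print(pop, district_loss(pop, ["a", "a", "b", "b"], {"a": 20.0, "b": 12.0}))
```

Because the built-up and occupancy factors stay separate, a pixel with a large built-up area but near-zero occupancy (e.g., a warehouse) is directly visible in the intermediate maps.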

Translated: 2023-11-27 23:28:38 Published: 2023-11-23

# When Side-Channel Attacks Break the Black-Box Property of Embedded Artificial Intelligence

When Side-Channel Attacks Break the Black-Box Property of Embedded Artificial Intelligence ( http://arxiv.org/abs/2311.14005v1 )

License: see link

Benoit Coqueret, Mathieu Carbone, Olivier Sentieys, Gabriel Zaid

Artificial intelligence, and specifically deep neural networks (DNNs), has rapidly emerged in the past decade as the standard for several tasks, from targeted advertising to object detection. The performance offered has led DNN algorithms to become part of critical embedded systems, requiring both efficiency and reliability. In particular, DNNs are subject to malicious examples designed to fool the network while being undetectable to a human observer: adversarial examples. While previous studies propose frameworks to implement such attacks in black-box settings, those often rely on the hypothesis that the attacker has access to the logits of the neural network, breaking the assumption of the traditional black box. In this paper, we investigate a real black-box scenario where the attacker has no access to the logits. In particular, we propose an architecture-agnostic attack that solves this constraint by extracting the logits. Our method combines hardware and software attacks by performing a side-channel attack that exploits electromagnetic leakages to extract the logits for a given input, allowing an attacker to estimate the gradients and produce state-of-the-art adversarial examples to fool the targeted neural network. Through this example of an adversarial attack, we demonstrate the effectiveness of logit extraction via side channels as a first step toward more general attack frameworks that require either the logits or the confidence scores.
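
Once logits can be read out for any chosen input, gradients can be estimated without access to the model's internals. The sketch below uses coordinate-wise central finite differences, followed by one signed-gradient (FGSM-style) step. `query_logits` is a hypothetical stand-in for the electromagnetic side-channel oracle. The paper's exact gradient estimator and attack are not specified here.

```python
def estimate_gradient(loss_from_logits, query_logits, x, h=1e-3):
    """Central finite differences using only logit queries (2 * len(x) queries)."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += h
        xm = list(x)
        xm[i] -= h
        diff = loss_from_logits(query_logits(xp)) - loss_from_logits(query_logits(xm))
        grad.append(diff / (2 * h))
    return grad

def fgsm_step(x, grad, eps=0.03):
    """One signed-gradient step that increases the attacker's loss."""
    return [xi + eps * (1 if g > 0 else -1 if g < 0 else 0) for xi, g in zip(x, grad)]

# Toy oracle: two logits that are linear in the input.
def oracle(x):
    return [sum(x), -sum(x)]

def margin(z):
    return z[0] - z[1]

g = estimate_gradient(margin, oracle, [0.2, -0.1, 0.5])
print(g, fgsm_step([0.2, -0.1, 0.5], g, eps=0.1))
```

Coordinate-wise differences need two queries per input dimension, which becomes expensive for images. Random-direction estimators are a common way to cut the query count.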

Translated: 2023-11-27 23:28:06 Published: 2023-11-23

# Empirical Comparison between Cross-Validation and Mutation-Validation in Model Selection

Empirical Comparison between Cross-Validation and Mutation-Validation in Model Selection ( http://arxiv.org/abs/2311.14079v1 )

License: see link

Jinyang Yu, Sami Hamdan, Leonard Sasse, Abigail Morrison, Kaustubh R. Patil

Mutation validation (MV) is a recently proposed approach for model selection, garnering significant interest due to its unique characteristics and potential benefits compared to the widely used cross-validation (CV) method. In this study, we empirically compared MV and $k$-fold CV using benchmark and real-world datasets. By employing Bayesian tests, we compared generalization estimates yielding three posterior probabilities: practical equivalence, CV superiority, and MV superiority. We also evaluated the differences in the capacity of the selected models and computational efficiency. We found that both MV and CV select models with practically equivalent generalization performance across various machine learning algorithms and the majority of benchmark datasets. MV exhibited advantages in terms of selecting simpler models and lower computational costs. However, in some cases MV selected overly simplistic models leading to underfitting and showed instability in hyperparameter selection. These limitations of MV became more evident in the evaluation of a real-world neuroscientific task of predicting sex at birth using brain functional connectivity.
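
For reference, $k$-fold CV partitions the data into $k$ folds. It trains on $k-1$ of them, scores on the held-out fold, and averages the $k$ scores. The sketch below implements that baseline without shuffling. Mutation validation is not implemented here, and `fit`/`score` are hypothetical user-supplied callables.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds and yield (train, validation) index lists."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

def cross_val_score(fit, score, X, y, k=5):
    """Average validation score of `fit` over k folds (no shuffling, for brevity)."""
    scores = []
    for train, val in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(score(model, [X[i] for i in val], [y[i] for i in val]))
    return sum(scores) / k
```

In model selection, `cross_val_score` would be run once per candidate model or hyperparameter setting, and the candidate with the best average score is kept.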

Translated: 2023-11-27 23:20:23 Published: 2023-11-23

# Machine Learning-Based Decentralized TDMA for VLC IoT Networks

Machine learning-based decentralized TDMA for VLC IoT networks ( http://arxiv.org/abs/2311.14078v1 )

License: see link

Armin Makvandi, Yousef Seifi Kavian

In this paper, a machine learning-based decentralized time division multiple access (TDMA) algorithm for visible light communication (VLC) Internet of Things (IoT) networks is proposed. The proposed algorithm is based on Q-learning, a reinforcement learning algorithm. This paper considers a decentralized condition in which there is no coordinator node for sending synchronization frames and assigning transmission time slots to other nodes. The proposed algorithm synchronizes in a decentralized manner, and each node uses the Q-learning algorithm to find the optimal transmission time slot for sending data without collisions. The proposed algorithm is implemented on a VLC hardware system that was designed and implemented in our laboratory. Average reward, convergence time, goodput, average delay, and data packet size are the evaluated parameters. The results show that the proposed algorithm converges quickly and provides collision-free decentralized TDMA for the network. The proposed algorithm is compared with the carrier-sense multiple access with collision avoidance (CSMA/CA) algorithm as a potential choice for decentralized VLC IoT networks. The results show that the proposed algorithm provides up to 61% more goodput and up to 49% less average delay than CSMA/CA.
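
The slot-selection idea can be simulated in a few lines under simplifying assumptions:

- Each node keeps one Q-value per time slot.
- Each node picks a slot epsilon-greedily every frame.
- A node receives +1 if it transmitted alone in its slot and -1 on a collision.

This is a stateless (bandit-style) form of Q-learning with no coordinator. The reward values, learning rate, and exploration rate are hypothetical choices, and the paper's state and reward design may differ.

```python
import random

def simulate_q_tdma(n_nodes=4, n_slots=4, episodes=3000, lr=0.1, eps=0.1, seed=0):
    """Return each node's greedy slot after independent Q-learning on collision feedback."""
    rng = random.Random(seed)
    q = [[0.0] * n_slots for _ in range(n_nodes)]
    for _ in range(episodes):
        choices = []
        for node in range(n_nodes):
            if rng.random() < eps:
                choices.append(rng.randrange(n_slots))  # explore
            else:
                best = max(q[node])                     # exploit, breaking ties randomly
                choices.append(rng.choice([s for s in range(n_slots) if q[node][s] == best]))
        for node, slot in enumerate(choices):
            reward = 1.0 if choices.count(slot) == 1 else -1.0
            # Stateless update: no bootstrapped next-state value (discount factor 0).
            q[node][slot] += lr * (reward - q[node][slot])
    return [max(range(n_slots), key=q[node].__getitem__) for node in range(n_nodes)]

assignment = simulate_q_tdma()
print(assignment, "collision-free" if len(set(assignment)) == len(assignment) else "collisions remain")
```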

Translated: 2023-11-27 23:20:07 Published: 2023-11-23

# RetroDiff: Retrosynthesis as Multi-Stage Distribution Interpolation

RetroDiff: Retrosynthesis as Multi-stage Distribution Interpolation ( http://arxiv.org/abs/2311.14077v1 )

License: see link

Yiming Wang, Yuxuan Song, Minkai Xu, Rui Wang, Hao Zhou, Weiying Ma

Retrosynthesis poses a fundamental challenge in biopharmaceuticals, aiming to help chemists find appropriate reactant molecules and synthetic pathways for given product molecules. With the reactant and product represented as 2D graphs, retrosynthesis constitutes a conditional graph-to-graph generative task. Inspired by recent advances in discrete diffusion models for graph generation, we introduce Retrosynthesis Diffusion (RetroDiff), a novel diffusion-based method designed to address this problem. However, integrating a diffusion-based graph-to-graph framework while retaining essential chemical reaction template information presents a notable challenge. Our key innovation is a multi-stage diffusion process. In this method, we decompose the retrosynthesis procedure: we first sample external groups from the dummy distribution given the products, and then generate the external bonds that connect the products and the generated groups. Interestingly, this generation process is exactly the reverse of the widely adopted semi-template retrosynthesis procedure, i.e., from reaction center identification to synthon completion, and it significantly reduces error accumulation. Experimental results on the benchmark demonstrate the superiority of our method over all other semi-template methods.

Translated: 2023-11-27 23:19:51 Published: 2023-11-23

# Searching for Snippets of Open-Domain Dialogue in Task-Oriented Dialogue Datasets

Searching for Snippets of Open-Domain Dialogue in Task-Oriented Dialogue Datasets ( http://arxiv.org/abs/2311.14076v1 )

License: see link

Armand Stricker, Patrick Paroubek

Most existing dialogue corpora and models have been designed to fit into two predominant categories: task-oriented dialogues portray functional goals, such as making a restaurant reservation or booking a plane ticket, while chit-chat/open-domain dialogues focus on holding a socially engaging talk with a user. However, humans tend to switch seamlessly between modes and even use chitchat to enhance task-oriented conversations. To bridge this gap, new datasets have recently been created, blending both communication modes into conversation examples. The approaches used tend to rely on adding chit-chat snippets to pre-existing, human-generated task-oriented datasets. Given the tendencies observed in humans, however, we wonder whether the latter *already* contain chit-chat sequences. By using topic modeling and searching for topics that are most similar to a set of keywords related to social talk, we explore the training sets of Schema-Guided Dialogues and MultiWOZ. Our study shows that sequences related to social talk are indeed naturally present, motivating further research on ways chitchat is combined into task-oriented dialogues.

Translated: 2023-11-27 23:19:30 Published: 2023-11-23

# Learning Saliency from Fixations

Learning Saliency From Fixations ( http://arxiv.org/abs/2311.14073v1 )

License: see link

Yasser Abdelaziz Dahou Djilali, Kevin McGuiness, Noel O'Connor

We present a novel approach for saliency prediction in images, leveraging parallel decoding in transformers to learn saliency solely from fixation maps. Models typically rely on continuous saliency maps to overcome the difficulty of optimizing for the discrete fixation map. We attempt to replicate the experimental setup that generates saliency datasets. Our approach treats saliency prediction as a direct set prediction problem, via a global loss that enforces unique fixation predictions through bipartite matching, and a transformer encoder-decoder architecture. By utilizing a fixed set of learned fixation queries, the cross-attention reasons over the image features to directly output the fixation points, distinguishing it from other modern saliency predictors. Our approach, named Saliency TRansformer (SalTR), achieves metric scores on par with state-of-the-art approaches on the Salicon and MIT300 benchmarks.
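
The bipartite-matching step pairs each ground-truth fixation with a distinct predicted point so that the total matching cost is minimal. This is what enforces unique predictions in a set-prediction loss. The sketch below brute-forces the assignment over permutations, which is fine for a handful of points. It assumes Euclidean distance as the cost and at least as many predictions as ground-truth fixations. Practical DETR-style implementations use the Hungarian algorithm instead and assign unmatched queries to a "no object" class.

```python
import math
from itertools import permutations

def match_fixations(pred, gt):
    """Minimum-cost one-to-one assignment of predicted to ground-truth fixation points.
    Returns (total_cost, [(pred_index, gt_index), ...]). Assumes len(pred) >= len(gt)."""
    best_cost, best_perm = math.inf, None
    for perm in permutations(range(len(pred)), len(gt)):
        cost = sum(math.dist(pred[p], gt[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, [(p, g) for g, p in enumerate(best_perm)]

cost, pairs = match_fixations([(0.9, 0.9), (0.1, 0.1), (0.5, 0.5)], [(0.0, 0.0), (1.0, 1.0)])
print(round(cost, 4), pairs)
```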

Translated: 2023-11-27 23:19:08 Published: 2023-11-23

# Enhancing Task-Oriented Dialogues with Chitchat: A Comparative Study Based on Lexical Diversity and Divergence

Enhancing Task-Oriented Dialogues with Chitchat: a Comparative Study Based on Lexical Diversity and Divergence ( http://arxiv.org/abs/2311.14067v1 )

License: see link

Armand Stricker, Patrick Paroubek

As a recent development, task-oriented dialogues (TODs) have been enriched with chitchat in an effort to make dialogues more diverse and engaging. This enhancement is particularly valuable as TODs are often confined to narrow domains, making the mitigation of repetitive and predictable responses a significant challenge. This paper presents a comparative analysis of three chitchat enhancements, aiming to identify the most effective approach in terms of diversity. Additionally, we quantify the divergence between the added chitchat, the original task-oriented language, and chitchat typically found in chitchat datasets, highlighting the top 20 divergent keywords for each comparison. Our findings drive a discussion on future enhancements for augmenting TODs, emphasizing the importance of grounding dialogues beyond the task to achieve more diverse and natural exchanges.
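
One widely used lexical-diversity measure is distinct-n (Li et al., 2016). It divides the number of unique n-grams by the total number of n-grams across a set of utterances. The sketch below is a minimal version with whitespace tokenization. The paper's exact diversity metrics are not specified here.

```python
def distinct_n(utterances, n=2):
    """Distinct-n: unique n-grams divided by total n-grams across all utterances."""
    total, unique = 0, set()
    for utt in utterances:
        tokens = utt.lower().split()
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0

print(distinct_n(["I want a table", "I want a taxi"], n=2))
```

Repetitive task-oriented turns share many n-grams and score low. Adding varied chitchat raises the ratio.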

Translated: 2023-11-27 23:18:53 Published: 2023-11-23

# HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding

HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding ( http://arxiv.org/abs/2311.14064v1 )

License: see link

Peng Xia, Xingtong Yu, Ming Hu, Lie Ju, Zhiyong Wang, Peibo Duan, Zongyuan Ge

Object categories are typically organized into a multi-granularity taxonomic hierarchy. When classifying categories at different hierarchy levels, traditional uni-modal approaches focus primarily on image features, revealing limitations in complex scenarios. Recent studies integrating Vision-Language Models (VLMs) with class hierarchies have shown promise, yet they fall short of fully exploiting the hierarchical relationships. These efforts are constrained by their inability to perform effectively across varied granularity of categories. To tackle this issue, we propose a novel framework (HGCLIP) that effectively combines CLIP with a deeper exploitation of the Hierarchical class structure via Graph representation learning. We explore constructing the class hierarchy as a graph, with its nodes representing the textual or image features of each category. After passing through a graph encoder, the textual features incorporate hierarchical structure information, while the image features emphasize class-aware features derived from prototypes through the attention mechanism. Our approach demonstrates significant improvements on both generic and fine-grained visual recognition benchmarks. Our code is fully available at https://github.com/richard-peng-xia/HGCLIP.
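
One step of graph-convolution-style propagation shows how hierarchy information can flow into per-class features. It uses the symmetric-normalised adjacency $\hat{A} = D^{-1/2}(A + I)D^{-1/2}$ of the class graph. The sketch below omits learned weights and nonlinearity, and the three-node "animal, dog, cat" hierarchy is a made-up example. HGCLIP's actual graph encoder may differ.

```python
import numpy as np

def gcn_propagate(features, edges):
    """One propagation step D^-1/2 (A + I) D^-1/2 X over an undirected class-hierarchy graph
    (no learned weights or nonlinearity, for illustration)."""
    n = features.shape[0]
    a = np.eye(n)                       # self-loops keep each node's own feature
    for u, v in edges:
        a[u, v] = a[v, u] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return (d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]) @ features

# Node 0 = "animal" (parent), nodes 1 = "dog" and 2 = "cat" (children); one-hot toy features.
out = gcn_propagate(np.eye(3), [(0, 1), (0, 2)])
print(np.round(out, 3))
```

After one step, "dog" mixes in the parent's feature but not yet its sibling's. Sibling information arrives only through the parent in a second step.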

翻訳日:2023-11-27 23:18:39 公開日:2023-11-23

# VSRモデルはLRS3を超えて汎化するか?

Do VSR Models Generalize Beyond LRS3? ( http://arxiv.org/abs/2311.14063v1 )

ライセンス: Link先を確認

Yasser Abdelaziz Dahou Djilali, Sanath Narayan, Eustache Le Bihan, Haithem Boussaid, Ebtessam Almazrouei, Merouane Debbah

(参考訳) Lip Reading Sentences-3 (LRS3) ベンチマークは、ここ数年、視覚音声認識(VSR)における集中的な研究の主な対象となってきた。その結果、わずか1時間分しかない、過度に使われてきたテストセットへの過適合のリスクが高まっている。この問題を緩和するため、LRS3データセットの作成手順に忠実に従って、WildVSRという新しいVSRテストセットを構築する。次に、現在のVSRモデルがこの新しいテストデータにどの程度汎化するかを評価・分析する。公開されている幅広いVSRモデルを評価したところ、対応するLRS3での結果と比べて、我々のテストセットでは性能が大幅に低下することがわかった。この結果は、単語誤り率の増加が、LRS3テストセットのものよりもやや難しい、実環境(in-the-wild)の唇の動きの系列にモデルが汎化できないことに起因することを示唆している。より頑健なVSRモデルに向けた今後の研究を可能にするため、新しいテストベンチマークを公開する。

The Lip Reading Sentences-3 (LRS3) benchmark has primarily been the focus of intense research in visual speech recognition (VSR) during the last few years. As a result, there is an increased risk of overfitting to its excessively used test set, which is only one hour long. To alleviate this issue, we build a new VSR test set named WildVSR, by closely following the LRS3 dataset creation processes. We then evaluate and analyse the extent to which the current VSR models generalize to the new test data. We evaluate a broad range of publicly available VSR models and find significant drops in performance on our test set, compared to their corresponding LRS3 results. Our results suggest that the increase in word error rates is caused by the models' inability to generalize to slightly harder, in-the-wild lip sequences than those found in the LRS3 test set. Our new test benchmark is made public in order to enable future research towards more robust VSR models.
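The word error rate (WER) used to compare models on LRS3 and WildVSR is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A minimal sketch, assuming plain whitespace tokenization and no text normalization (real evaluation scripts usually lowercase and strip punctuation first); this is not the authors' evaluation code:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion over six reference words, giving 2/6.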

翻訳日:2023-11-27 23:18:22 公開日:2023-11-23

# テキストガイド画像分類器のハードウェアレジリエンス特性

Hardware Resilience Properties of Text-Guided Image Classifiers ( http://arxiv.org/abs/2311.14062v1 )

ライセンス: Link先を確認

Syed Talal Wasim, Kabila Haile Saboka, Abdulrahman Mahmoud, Salman Khan, David Brooks, Gu-Yeon Wei

(参考訳) 本稿では、一時的なハードウェアエラーに直面した際の、運用中の画像分類モデルの信頼性を高める新しい手法を提案する。クラスごとの質問プロンプトを用いてGPT-3から得たリッチなテキスト埋め込みと、CLIPの事前学習済みテキストエンコーダを活用し、それらを分類層の初期化として用いた場合の影響を調べる。本手法は、PyTorchのベースラインモデルと比較して、最小限の精度低下(平均0.3%)で、最も重要な層において、さまざまなアーキテクチャにわたりハードウェア信頼性を平均5.5倍(最大14倍)向上させる。さらに、本手法は任意の画像分類バックボーンとシームレスに統合でき、さまざまなネットワークアーキテクチャで結果を示し、パラメータとFLOPsのオーバーヘッドを削減し、一貫した学習レシピに従う。本研究は、ハードウェア障害に対する画像分類モデルの頑健性を高めるための実用的かつ効率的な解決策を提供し、この分野の今後の研究にも示唆を与える。コードとモデルはhttps://github.com/TalalWasim/TextGuidedResilienceで公開している。

This paper presents a novel method to enhance the reliability of image classification models during deployment in the face of transient hardware errors. By utilizing enriched text embeddings derived from GPT-3 with question prompts per class and CLIP pretrained text encoder, we investigate their impact as an initialization for the classification layer. Our approach achieves a remarkable $5.5\times$ average increase in hardware reliability (and up to 14x) across various architectures in the most critical layer, with minimal accuracy drop (0.3% on average) compared to baseline PyTorch models. Furthermore, our method seamlessly integrates with any image classification backbone, showcases results across various network architectures, decreases parameter and FLOPs overhead, and follows a consistent training recipe. This research offers a practical and efficient solution to bolster the robustness of image classification models against hardware failures, with potential implications for future studies in this domain. Our code and models are released at https://github.com/TalalWasim/TextGuidedResilience.

翻訳日:2023-11-27 23:18:04 公開日:2023-11-23

# NLPトランスフォーマーを用いた説明可能な戦略テンプレートに向けて

Towards Explainable Strategy Templates using NLP Transformers ( http://arxiv.org/abs/2311.14061v1 )

ライセンス: Link先を確認

Pallavi Bagga, Kostas Stathis

(参考訳) 本稿では、自動エージェント交渉において深層強化学習(DRL)から学習された数学的なヒューリスティック戦略と、理解しやすい自然言語による説明との間のギャップを埋める。我々の目標は、これらの戦略を非専門家にとってより理解しやすいものにすることである。従来の自然言語処理(NLP)技術と、Transformerを備えた大規模言語モデル(LLM)を活用し、戦略テンプレートを構成するDRL戦略の一部を、ユーザフレンドリで人間らしい英語の説明文に変換する方法を概説する。これを実現するために、戦略テンプレートの数式の解析、変数と構造の意味的解釈、ルールベースの一次説明の生成、そしてこれらの説明を洗練・文脈化するための生成事前学習済みTransformer(GPT)モデルの利用からなるトップレベルのアルゴリズムを提示する。さまざまな利用者に向けたその後のカスタマイズと、例における綿密な検証プロセスが、このアプローチの適用可能性と可能性を示している。

This paper bridges the gap between mathematical heuristic strategies learned from Deep Reinforcement Learning (DRL) in automated agent negotiation, and comprehensible, natural language explanations. Our aim is to make these strategies more accessible to non-experts. By leveraging traditional Natural Language Processing (NLP) techniques and Large Language Models (LLMs) equipped with Transformers, we outline how parts of DRL strategies composed of parts within strategy templates can be transformed into user-friendly, human-like English narratives. To achieve this, we present a top-level algorithm that involves parsing mathematical expressions of strategy templates, semantically interpreting variables and structures, generating rule-based primary explanations, and utilizing a Generative Pre-trained Transformer (GPT) model to refine and contextualize these explanations. Subsequent customization for varied audiences and meticulous validation processes in an example illustrate the applicability and potential of this approach.

翻訳日:2023-11-27 23:17:44 公開日:2023-11-23

# 多項式時間における木型構造因果モデルの同定

Identification for Tree-shaped Structural Causal Models in Polynomial Time ( http://arxiv.org/abs/2311.14058v1 )

ライセンス: Link先を確認

Aaryan Gupta and Markus Bläser

(参考訳) 線形構造因果モデル(SCM)は、確率変数間の関係を表現・解析するために用いられる。直接的な因果効果は有向辺として、交絡要因は双方向辺として表現される。ノード間の相関から因果パラメータを同定することは、人工知能における未解決問題である。本稿では、有向成分が木をなすSCMを扱う。Van der Zander et al. (AISTATS'22, PMLR 151, pp. 6770--6792, 2022) は、この場合の同定問題に対するPSPACEアルゴリズムを与えており、これは構造パラメータ数に対して二重指数時間の計算量を持つ一般的なグレブナー基底によるアプローチを大きく改善するものである。本研究では、木形SCMの同定問題を解く乱択多項式時間アルゴリズムを提示する。各構造パラメータについて、このアルゴリズムはそれが汎用的に同定可能か、汎用的に2-同定可能か、汎用的に同定不能かを判定する(これ以外の場合は起こりえない)。前の2つの場合には、対応するパラメータについて、それぞれ1つまたは2つの多項式の分数アフィン平方根項(FASTP)を与える。

Linear structural causal models (SCMs) are used to express and analyse the relationships between random variables. Direct causal effects are represented as directed edges and confounding factors as bidirected edges. Identifying the causal parameters from correlations between the nodes is an open problem in artificial intelligence. In this paper, we study SCMs whose directed component forms a tree. Van der Zander et al. (AISTATS'22, PMLR 151, pp. 6770--6792, 2022) give a PSPACE-algorithm for the identification problem in this case, a significant improvement over the general Gröbner basis approach, whose time complexity is doubly exponential in the number of structural parameters. In this work, we present a randomized polynomial-time algorithm, which solves the identification problem for tree-shaped SCMs. For every structural parameter, our algorithm decides whether it is generically identifiable, generically 2-identifiable, or generically unidentifiable. (No other cases can occur.) In the first two cases, it provides one or two fractional affine square root terms of polynomials (FASTPs) for the corresponding parameter, respectively.
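For readers less familiar with the setting, the standard linear SCM can be written as follows (notation assumed here, not taken from the paper): each variable is a linear function of its parents plus noise, $\Lambda$ holds the directed-edge coefficients $\lambda_{ij}$, and $\Omega$ is the noise covariance, whose off-diagonal entries are nonzero only for bidirected edges. The observed covariance is then

$$
X = \Lambda^{\top} X + \varepsilon,\quad \operatorname{Cov}(\varepsilon) = \Omega
\;\Longrightarrow\;
X = (I - \Lambda^{\top})^{-1}\varepsilon,
\qquad
\Sigma = (I-\Lambda)^{-\top}\,\Omega\,(I-\Lambda)^{-1}.
$$

Identification asks whether each $\lambda_{ij}$ can be recovered from $\Sigma$; a generically 2-identifiable parameter is, for generic parameter values, determined only up to two candidate values.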

翻訳日:2023-11-27 23:17:26 公開日:2023-11-23

# 量子ニューラルネットワークにおけるノイズの影響評価 : 実験的検討

Assessing the Impact of Noise on Quantum Neural Networks: An Experimental Analysis ( http://arxiv.org/abs/2311.14057v1 )

ライセンス: Link先を確認

Erik B. Terres Escudero, Danel Arias Alamo, Oier Mentxaka Gómez, Pablo García Bringas

(参考訳) 量子コンピューティングへの競争の中で、量子ニューラルネットワーク(QNN)の潜在的な利点はますます明らかになっている。しかし、ノイズあり中規模量子(NISQ)プロセッサはエラーを起こしやすく、複雑なアルゴリズムや量子機械学習の実行にとって大きな課題となっている。QNNの品質と安全性を確保するには、ノイズが性能に与える影響を調べることが重要である。本稿では、QNNに対するノイズの影響を包括的に分析し、さまざまなノイズモデルの下でMottonen状態準備アルゴリズムを検証するとともに、QNNの複数の層を通過する際の量子状態の劣化を調べる。さらに、事前学習済みQNNの性能に対するノイズの影響を評価し、量子コンピューティングにおいてノイズモデルがもたらす課題を明らかにする。本研究の知見は量子ソフトウェアの開発にとって重要な意味を持ち、信頼できる結果を得るためには、QNNの開発において安定性とノイズ補正策を優先することが重要であることを強調する。本稿は量子コンピューティングと量子機械学習に関する文献の発展に貢献し、ノイズがQNNに与える影響について新たな知見を提供し、より頑健で効率的な量子アルゴリズムの開発への道を開く。

In the race towards quantum computing, the potential benefits of quantum neural networks (QNNs) have become increasingly apparent. However, Noisy Intermediate-Scale Quantum (NISQ) processors are prone to errors, which poses a significant challenge for the execution of complex algorithms or quantum machine learning. To ensure the quality and security of QNNs, it is crucial to explore the impact of noise on their performance. This paper provides a comprehensive analysis of the impact of noise on QNNs, examining the Mottonen state preparation algorithm under various noise models and studying the degradation of quantum states as they pass through multiple layers of QNNs. Additionally, the paper evaluates the effect of noise on the performance of pre-trained QNNs and highlights the challenges posed by noise models in quantum computing. The findings of this study have significant implications for the development of quantum software, emphasizing the importance of prioritizing stability and noise-correction measures when developing QNNs to ensure reliable and trustworthy results. This paper contributes to the growing body of literature on quantum computing and quantum machine learning, providing new insights into the impact of noise on QNNs and paving the way towards the development of more robust and efficient quantum algorithms.
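As a simple illustration of how noise compounds across layers, assume (for illustration only; the paper studies several noise models and this need not be one of them) a global depolarizing channel applied once per layer to a pure $d$-dimensional state $|\psi\rangle$. Because a global depolarizing channel commutes with the ideal layer unitaries, they can be left out when comparing with the ideal output, and the fidelity decays geometrically toward the maximally mixed value $1/d$:

$$
\mathcal{D}_p(\rho) = (1-p)\,\rho + p\,\frac{I}{d},
\qquad
\rho_n = \mathcal{D}_p^{\,n}\bigl(|\psi\rangle\langle\psi|\bigr) = (1-p)^n\,|\psi\rangle\langle\psi| + \bigl(1-(1-p)^n\bigr)\frac{I}{d},
$$

$$
F_n = \langle\psi|\rho_n|\psi\rangle = (1-p)^n + \frac{1-(1-p)^n}{d}.
$$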

翻訳日:2023-11-27 23:17:04 公開日:2023-11-23

# 株式市場予測のためのニューラルアーキテクチャと特徴の共進化--多目的意思決定の観点から

Coevolution of Neural Architectures and Features for Stock Market Forecasting: A Multi-objective Decision Perspective ( http://arxiv.org/abs/2311.14053v1 )

ライセンス: Link先を確認

Faizal Hafiz and Jan Broekaert and Davide La Torre and Akshya Swain

(参考訳) 多目的の設定において、ポートフォリオマネージャの極めて重大な意思決定は、株価指数の動きに関する代替的な予測モデルを評価することから恩恵を受けうる。本研究では、意思決定者がさらに選択するための非劣解(非支配)ニューラルネットワークモデルの集合を特定する新しいアプローチを提案する。ニューラルネットワークの特徴量とトポロジー(総称してニューラルアーキテクチャと呼ぶ)を同時に選択する新しい共進化アプローチを提案し、そこでは特徴量をトポロジーの観点から入力ニューロンとみなす。さらに、疎で効果的なニューラルアーキテクチャを進化させるため、共進化を多基準問題として定式化する。よく知られた支配ベースおよび分解ベースの多目的進化アルゴリズムを非幾何学的交叉演算子で拡張し、相反する基準にわたってニューラルアーキテクチャの探索を多様化し、バランスをとる。さらに、進行中のCOVID-19パンデミック以前および期間中における、異なる市場行動がデータにもたらす影響に対応できるよう共進化を拡張する。特徴選択の後にニューラルトポロジー設計を行う従来の逐次的アプローチ、およびスカラー化共進化アプローチとの詳細な比較評価を行う。COVID-19前およびCOVID-19期間中の時間窓におけるNASDAQ指数での結果は、提案する共進化アプローチが、より高い汎化能力を持つ非劣解の予測モデルの集合を進化させうることを説得力をもって示している。

In a multi-objective setting, a portfolio manager's highly consequential decisions can benefit from assessing alternative forecasting models of stock index movement. The present investigation proposes a new approach to identify a set of nondominated neural network models for further selection by the decision maker. A new coevolution approach is proposed to simultaneously select the features and topology of neural networks (collectively referred to as neural architecture), where the features are viewed from a topological perspective as input neurons. Further, the coevolution is posed as a multicriteria problem to evolve sparse and efficacious neural architectures. The well-known dominance- and decomposition-based multiobjective evolutionary algorithms are augmented with a nongeometric crossover operator to diversify and balance the search for neural architectures across conflicting criteria. Moreover, the coevolution is augmented to accommodate the data-based implications of distinct market behaviors prior to and during the ongoing COVID-19 pandemic. A detailed comparative evaluation is carried out with the conventional sequential approach of feature selection followed by neural topology design, as well as a scalarized coevolution approach. The results on the NASDAQ index in pre- and peri-COVID time windows convincingly demonstrate that the proposed coevolution approach can evolve a set of nondominated neural forecasting models with better generalization capabilities.
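Picking out the "set of nondominated models" reduces to a Pareto filter. A minimal sketch, assuming every objective is to be minimized (for example, validation forecasting error and the number of active connections as a sparsity proxy; both objectives are illustrative assumptions); this is only the filtering step, not the authors' coevolutionary search:

```python
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b: no worse in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated(points: List[Sequence[float]]) -> List[int]:
    """Indices of the points that no other point dominates (the Pareto front)."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]

# Hypothetical candidates: (validation error, number of active connections)
models = [(0.10, 50), (0.12, 20), (0.15, 60), (0.10, 40)]
front = nondominated(models)  # (0.12, 20) and (0.10, 40) survive
```

The pairwise check is quadratic in the population size, which is fine for a sketch; evolutionary frameworks typically use faster nondominated sorting.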

翻訳日:2023-11-27 23:16:44 公開日:2023-11-23

# リアルタイム自由呼吸心臓磁気共鳴画像における深層学習セグメンテーションの評価

Assessment of Deep Learning Segmentation for Real-Time Free-Breathing Cardiac Magnetic Resonance Imaging ( http://arxiv.org/abs/2311.14049v1 )

ライセンス: Link先を確認

Martin Schilling and Christina Unterberg-Buchwald and Joachim Lotz and Martin Uecker

(参考訳) 近年、心臓MRI(CMR)セグメンテーションのためのさまざまな深層学習ネットワークが開発・分析されている。しかし、そのほぼすべてが息止め下のシネCMRに焦点を当てている。本研究では、安静時および運動負荷時のリアルタイム自由呼吸CMRにおける左心室の容積解析(セグメンテーションによる)について、深層学習手法の精度を評価する。健常ボランティア(n=15)のシネCMRおよびリアルタイム自由呼吸CMRのデータを後ろ向きに解析した。商用ソフトウェア(comDL)と無料で利用可能なニューラルネットワーク(nnU-Net)のセグメンテーションを、comDLのセグメンテーションを手作業で修正して作成した参照と比較した。左室心内膜(LV)、左室心筋(MYO)、右室(RV)のセグメンテーションを収縮末期と拡張末期の両方で評価し、Dice係数(DC)で分析した。容積解析には、LV拡張末期容積(EDV)、LV収縮末期容積(ESV)、LV駆出率(EF)が含まれる。シネCMRでは、nnU-NetとcomDLはLVで0.95以上、MYOとRVで0.9以上のDCを達成する。リアルタイムCMRでは、全体としてnnU-Netの精度がcomDLを上回る。安静時のリアルタイムCMRでは、nnU-NetはLVで0.94、MYOで0.89、RVで0.90のDCを達成し、nnU-Netと参照との平均絶対差はEDVで2.9mL、ESVで3.5mL、EFで2.6%である。運動負荷下のリアルタイムCMRでは、nnU-NetはLVで0.92、MYOで0.85、RVで0.83のDCを達成し、nnU-Netと参照との平均絶対差はEDVで11.4mL、ESVで2.9mL、EFで3.6%である。シネCMRセグメンテーション用に設計または学習された深層学習手法は、リアルタイムCMRでも良好に機能しうる。安静時のリアルタイム自由呼吸CMRでは、深層学習手法の性能はシネCMRにおける観察者間変動と同等であり、完全自動セグメンテーションに使用可能である。

In recent years, a variety of deep learning networks for cardiac MRI (CMR) segmentation have been developed and analyzed. However, nearly all of them are focused on cine CMR under breath-hold. In this work, accuracy of deep learning methods is assessed for volumetric analysis (via segmentation) of the left ventricle in real-time free-breathing CMR at rest and under exercise stress. Data from healthy volunteers (n=15) for cine and real-time free-breathing CMR were analyzed retrospectively. Segmentations of a commercial software (comDL) and a freely available neural network (nnU-Net) were compared to a reference created via the manual correction of comDL segmentation. Segmentation of left ventricular endocardium (LV), left ventricular myocardium (MYO), and right ventricle (RV) is evaluated for both end-systolic and end-diastolic phases and analyzed with Dice's coefficient (DC). The volumetric analysis includes LV end-diastolic volume (EDV), LV end-systolic volume (ESV), and LV ejection fraction (EF). For cine CMR, nnU-Net and comDL achieve a DC above 0.95 for LV and 0.9 for MYO and RV. For real-time CMR, the accuracy of nnU-Net exceeds that of comDL overall. For real-time CMR at rest, nnU-Net achieves a DC of 0.94 for LV, 0.89 for MYO, and 0.90 for RV; mean absolute differences between nnU-Net and reference are 2.9mL for EDV, 3.5mL for ESV and 2.6% for EF. For real-time CMR under exercise stress, nnU-Net achieves a DC of 0.92 for LV, 0.85 for MYO, and 0.83 for RV; mean absolute differences between nnU-Net and reference are 11.4mL for EDV, 2.9mL for ESV and 3.6% for EF. Deep learning methods designed or trained for cine CMR segmentation can perform well on real-time CMR. For real-time free-breathing CMR at rest, the performance of deep learning methods is comparable to inter-observer variability in cine CMR and is usable for fully automatic segmentation.
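The three quantities reported above (Dice's coefficient, ventricular volumes, ejection fraction) follow standard definitions. A minimal sketch, assuming masks are stored as sets of voxel indices and that the voxel volume comes from the image header; the function names are hypothetical and this is not the comDL or nnU-Net evaluation code:

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]  # (slice, row, column) index of one voxel

def dice(pred: Set[Voxel], ref: Set[Voxel]) -> float:
    """Dice coefficient 2*|A & B| / (|A| + |B|) for two binary masks stored as voxel sets."""
    if not pred and not ref:
        return 1.0  # convention: two empty masks agree perfectly
    return 2 * len(pred & ref) / (len(pred) + len(ref))

def volume_ml(mask: Set[Voxel], voxel_volume_mm3: float) -> float:
    """Volume of a mask in mL (1 mL = 1000 mm^3)."""
    return len(mask) * voxel_volume_mm3 / 1000.0

def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """LV ejection fraction in percent: EF = (EDV - ESV) / EDV * 100."""
    return (edv_ml - esv_ml) / edv_ml * 100.0
```

For instance, an EDV of 150 mL and an ESV of 60 mL give an EF of 60%; the paper's "mean absolute difference" is then the average of |method value - reference value| over subjects for each of these quantities.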

翻訳日:2023-11-27 23:16:07 公開日:2023-11-23

# (深層)線形ニューラルネットワークにおける重みの揺らぎと逆分散-平坦性関係の導出

Weight fluctuations in (deep) linear neural networks and a derivation of the inverse-variance flatness relation ( http://arxiv.org/abs/2311.14120v1 )

ライセンス: Link先を確認

Markus Gross, Arne P. Raulf, Christoph Räth

(参考訳) 合成ガウスデータに対する確率的勾配降下法(SGD)の連続極限の範囲で、単層および2層の線形ニューラルネットワークの定常的(後期)な学習領域を調べる。弱いオーバーサンプリング領域にある単層ネットワークの場合、ノイズ共分散行列のスペクトルはヘッセ行列から顕著にずれるが、これはSGDダイナミクスにおける詳細つり合いの破れに起因すると考えられる。この場合、重みの揺らぎは一般に異方的であるが、等方的な損失を経験する。2層ネットワークについては、各層の重みの確率的ダイナミクスを求め、対応する定常共分散を解析する。重みの揺らぎにおける新たな異方性の源として層間結合を特定する。単層の場合とは対照的に、重みの揺らぎは異方的な損失を経験し、その平坦さは揺らぎの分散と逆の関係にある。これにより、最近観測された逆分散-平坦性関係を、深層線形ネットワークモデルにおいて解析的に導出する。

We investigate the stationary (late-time) training regime of single- and two-layer linear neural networks within the continuum limit of stochastic gradient descent (SGD) for synthetic Gaussian data. In the case of a single-layer network in the weakly oversampled regime, the spectrum of the noise covariance matrix deviates notably from the Hessian, which can be attributed to the broken detailed balance of SGD dynamics. The weight fluctuations are in this case generally anisotropic, but experience an isotropic loss. For a two-layer network, we obtain the stochastic dynamics of the weights in each layer and analyze the associated stationary covariances. We identify the inter-layer coupling as a new source of anisotropy for the weight fluctuations. In contrast to the single-layer case, the weight fluctuations experience an anisotropic loss, the flatness of which is inversely related to the fluctuation variance. We thereby provide an analytical derivation of the recently observed inverse variance-flatness relation in a deep linear network model.

翻訳日:2023-11-27 23:08:59 公開日:2023-11-23

# アルミニウム薄膜の磁場侵入長

Magnetic penetration depth of Aluminum thin films ( http://arxiv.org/abs/2311.14119v1 )

ライセンス: Link先を確認

David López-Núñez, Queralt Portell Montserrat, Gemma Rius, Elia Bertoldo, Alba Torras-Coloma, M. Martínez, P. Forn-Díaz

(参考訳) 厚さの異なるアルミニウム薄膜における超伝導侵入長$\lambda$の研究を報告する。選択した厚さの範囲は、薄膜領域からバルクの振る舞いに近づく領域までにわたる。観測された侵入長は、最も薄い$20~\rm{nm}$の試料での$\lambda = 163.3\pm0.4~\rm{nm}$から、$200~\rm{nm}$厚の試料での$\lambda = 53.6\pm0.4~\rm{nm}$までの範囲にある。$\lambda$を正確に決定するため、超伝導$LC$共振器の周波数と常伝導状態のメアンダ配線の抵抗を用いた相補的な測定を行った。どちらの方法も同等の結果を与え、量子コンピューティングやマイクロ波放射検出器技術といった分野への応用に関連する範囲で、アルミニウムの$\lambda$の値をよく特性評価された形で提供する。

We present a study of the superconducting penetration depth $\lambda$ in aluminum thin films of varying thickness. The range of thicknesses chosen spans from the thin-film regime to the regime approaching bulk behavior. The penetration depths observed range from $\lambda = 163.3\pm0.4~\rm{nm}$ for the thinnest $20~\rm{nm}$ samples down to $\lambda = 53.6\pm0.4~\rm{nm}$ for the $200~\rm{nm}$-thick ones. In order to accurately determine $\lambda$, we performed complementary measurements using the frequency of superconducting $LC$ resonators as well as the resistance of normal-state meanders. Both methods yield comparable results, providing a well-characterized set of values of $\lambda$ in aluminum in the relevant range for applications in fields such as quantum computing and microwave radiation detector technologies.
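A textbook way to connect the resonator measurement to $\lambda$ (an illustrative assumption, not necessarily the exact model used by the authors): the kinetic inductance $L_\mathrm{k}$ of the superconducting line adds to its geometric inductance $L_\mathrm{g}$ and lowers the resonance frequency, and in the thin-film limit the kinetic inductance per square scales as $\lambda^2/d$, where $d$ is the film thickness and $N_\square$ the number of squares (length over width) of the inductor line:

$$
f_0 = \frac{1}{2\pi\sqrt{(L_\mathrm{g} + L_\mathrm{k})\,C}},
\qquad
L_\mathrm{k} = N_\square\, L_{\mathrm{k},\square},
\qquad
L_{\mathrm{k},\square} \approx \frac{\mu_0 \lambda^{2}}{d}\quad (d \ll \lambda).
$$

This limit fits the $20~\rm{nm}$ films well; for films approaching $200~\rm{nm}$, where $d$ exceeds $\lambda$, a correction involving $\coth(d/\lambda)$ is commonly used instead.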

翻訳日:2023-11-27 23:08:43 公開日:2023-11-23

# ペアワイズな人間の選好からの学習に対する密度推定の視点

A density estimation perspective on learning from pairwise human preferences ( http://arxiv.org/abs/2311.14115v1 )

ライセンス: Link先を確認

Vincent Dumoulin, Daniel D. Johnson, Pablo Samuel Castro, Hugo Larochelle, Yann Dauphin

(参考訳) 人間のフィードバックからの学習(LHF)、特にペアワイズな選好からの学習は、近年、大規模言語モデル(LLM)の学習において重要な要素となり、多くの研究の対象となっている。最近の研究の多くはこれを強化学習問題として定式化しており、そこでは報酬関数がペアワイズな選好データから学習され、LLMは報酬を最大化するよう適応される方策として扱われ、多くの場合追加の正則化制約が課される。本稿では、ペアワイズ選好の生成過程に着目し、LHFを密度推定問題として扱う別の解釈を提案する。選好行動分布方程式によって定義される生成過程の族について、ペアワイズ選好で報酬関数を学習することが、アノテータの暗黙的な選好分布を効果的にモデル化することを、理論的および実験的に示す。最後に、「アノテータの誤指定」、すなわちアノテータの振る舞いについて誤ったモデリング仮定を置いた結果、適応の悪いモデルが生じる失敗事例について議論し、その知見を示す。これは、ペアワイズな人間の選好から学習するアプローチが、多様な視点を持つアノテータの集団から学習する際に困難を抱えうることを示唆している。

Learning from human feedback (LHF) -- and in particular learning from pairwise preferences -- has recently become a crucial ingredient in training large language models (LLMs), and has been the subject of much research. Most recent works frame it as a reinforcement learning problem, where a reward function is learned from pairwise preference data and the LLM is treated as a policy which is adapted to maximize the rewards, often under additional regularization constraints. We propose an alternative interpretation which centers on the generative process for pairwise preferences and treats LHF as a density estimation problem. We provide theoretical and empirical results showing that for a family of generative processes defined via preference behavior distribution equations, training a reward function on pairwise preferences effectively models an annotator's implicit preference distribution. Finally, we discuss and present findings on "annotator misspecification" -- failure cases where wrong modeling assumptions are made about annotator behavior, resulting in poorly-adapted models -- suggesting that approaches that learn from pairwise human preferences could have trouble learning from a population of annotators with diverse viewpoints.
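The usual reward-model setup that the abstract reinterprets can be made concrete with the Bradley-Terry model, where $P(a \succ b) = \sigma(r_a - r_b)$. The sketch below is a generic illustration under that assumption, not the authors' density-estimation method; item names are hypothetical. It fits scalar rewards to (winner, loser) pairs by plain gradient ascent on the log-likelihood:

```python
import math
from typing import Dict, List, Tuple

def fit_bradley_terry(prefs: List[Tuple[str, str]], lr: float = 0.1,
                      steps: int = 2000) -> Dict[str, float]:
    """Fit rewards r so that P(winner preferred over loser) = sigmoid(r[winner] - r[loser])."""
    items = {x for pair in prefs for x in pair}
    r = {x: 0.0 for x in items}
    for _ in range(steps):
        grad = {x: 0.0 for x in items}
        for winner, loser in prefs:
            p_win = 1.0 / (1.0 + math.exp(-(r[winner] - r[loser])))
            # d/dr of log sigmoid(r_w - r_l): +(1 - p_win) for the winner, -(1 - p_win) for the loser
            grad[winner] += 1.0 - p_win
            grad[loser] -= 1.0 - p_win
        for x in items:
            r[x] += lr * grad[x]
    return r

# Three annotators prefer A over B, one prefers B over A.
rewards = fit_bradley_terry([("A", "B"), ("A", "B"), ("A", "B"), ("B", "A")])
```

The fitted gap `rewards["A"] - rewards["B"]` approaches $\ln 3$, so $\sigma(r_A - r_B) \approx 3/4$ reproduces the 3:1 preference rate, loosely illustrating how a reward trained on pairwise preferences encodes the annotators' preference distribution.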

翻訳日:2023-11-27 23:08:30 公開日:2023-11-23

# SySMOL:超低・微細混合精度ニューラルネットワークのためのハードウェア・ソフトウェア共同設計フレームワーク

SySMOL: A Hardware-software Co-design Framework for Ultra-Low and Fine-Grained Mixed-Precision Neural Networks ( http://arxiv.org/abs/2311.14114v1 )

ライセンス: Link先を確認

Cyrus Zhou, Vaughn Richard, Pedro Savarese, Zachary Hassman, Michael Maire, Michael DiBrino, Yanjing Li

(参考訳) 量子化と混合精度技術の近年の進歩は、ニューラルネットワークの実行時間とエネルギー効率の改善に大きな可能性をもたらしている。本研究ではさらに、個々のパラメータや活性化が1ビットから4ビットの範囲で異なる精度をとりうるニューラルネットワークが、全精度のネットワークと同等以上の精度を達成できることを示す。しかし、このようなネットワークの展開には、データ1つ1つに対する極めて細粒度な混合精度に伴う計算・通信・ストレージ要件を管理・制御する必要があるため、多くの課題が生じる。これらの独特で困難な要件に合わせた、効率的なハードウェアおよびシステムレベルのサポートは既存には存在しない。本研究は、これらのネットワークに対する初の包括的なハードウェア・ソフトウェア協調設計手法を導入し、ハードウェア設計、学習、推論の間の継続的なフィードバックループを実現して体系的な設計探索を容易にする。概念実証として、これらのネットワーク向けに新しい構成可能なCPU SIMDアーキテクチャを設計し、そのアーキテクチャを新しいシステムを意識した学習・推論手法と緊密に統合することで、この協調設計アプローチを示す。このフレームワークを用いて体系的な設計空間探索を行い、さまざまなトレードオフを分析する。最適化されたトレードオフを達成する混合精度ネットワーク向けの設計は、システムを意識した学習・推論の最適化と組み合わせた場合、4つの構成可能な精度パターンで1、2、4ビットの固定小数点演算をサポートするアーキテクチャに対応する。この設計向けに学習したネットワークは、全精度に近い精度を達成しつつ、全精度ネットワークと比べてニューラルネットワークを10〜20倍と大幅に圧縮し、実行効率を向上させる。

Recent advancements in quantization and mixed-precision techniques offer significant promise for improving the run-time and energy efficiency of neural networks. In this work, we further show that neural networks, wherein individual parameters or activations can take on different precisions ranging between 1 and 4 bits, can achieve accuracies comparable to or exceeding the full-precision counterparts. However, the deployment of such networks poses numerous challenges, stemming from the necessity to manage and control the compute/communication/storage requirements associated with these extremely fine-grained mixed precisions for each piece of data. There is a lack of existing efficient hardware and system-level support tailored to these unique and challenging requirements. Our research introduces the first holistic hardware-software co-design approach for these networks, which enables a continuous feedback loop between hardware design, training, and inference to facilitate systematic design exploration. As a proof-of-concept, we illustrate this co-design approach by designing new, configurable CPU SIMD architectures tailored for these networks, tightly integrating the architecture with new system-aware training and inference techniques. We perform systematic design space exploration using this framework to analyze various tradeoffs. The design for mixed-precision networks that achieves optimized tradeoffs corresponds to an architecture that supports 1, 2, and 4-bit fixed-point operations with four configurable precision patterns, when coupled with system-aware training and inference optimization; networks trained for this design achieve accuracies that closely match full-precision accuracies, while compressing and improving run-time efficiency of the neural networks drastically by 10-20x, compared to full-precision networks.
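To illustrate numerically what it means for individual parameters to take different precisions, here is a hypothetical symmetric fixed-point quantizer in which every value carries its own bit width of 1, 2, or 4 bits. The grid, the shared `scale`, and the treatment of 1 bit as a pure sign are assumptions for illustration; the abstract does not specify SySMOL's number formats or its four precision patterns.

```python
from typing import List

def quantize(x: float, bits: int, scale: float = 1.0) -> float:
    """Round x onto a signed fixed-point grid of the given bit width (hypothetical scheme).

    1 bit  -> sign only, {-scale, +scale}
    b >= 2 -> integers in [-(2**(b-1)), 2**(b-1) - 1], times step = scale / (2**(b-1) - 1)
    """
    if bits == 1:
        return scale if x >= 0 else -scale
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    step = scale / qmax
    q = max(qmin, min(qmax, round(x / step)))  # round, then clamp to the representable range
    return q * step

def quantize_mixed(values: List[float], bit_widths: List[int], scale: float = 1.0) -> List[float]:
    """Fine-grained mixed precision: value i is quantized with its own bit width bit_widths[i]."""
    return [quantize(v, b, scale) for v, b in zip(values, bit_widths)]

weights = quantize_mixed([0.3, -0.8, 0.55], [1, 2, 4])  # -> 1.0, -1.0, 4/7
```

Coarser widths snap values to very few levels (the 2-bit grid here is {-2, -1, 0, 1}), which is why per-value precision assignment and the hardware that can execute it matter.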

翻訳日:2023-11-27 23:08:06 公開日:2023-11-23

# 錐空間上の強文脈的な単体的分布のホモトピー論的特徴付け

Homotopical characterization of strongly contextual simplicial distributions on cone spaces ( http://arxiv.org/abs/2311.14111v1 )

ライセンス: Link先を確認

Aziz Kharoof, Cihan Okay

(参考訳) 本稿では、2値の結果を持つ強文脈的な単体的分布、特に1次元空間の錐上で定義されるものについて、新しいホモトピー論的特徴付けを与える。層理論的な枠組みでは、このような分布は、各文脈が2値の結果を持つ2つの測定を含む測定シナリオ上の非シグナリング分布に対応する。結果を確立するために、測定空間の崩壊(collapse)を含むホモトピー論的アプローチを用い、強文脈性を検出できる単体的分布に付随する圏を導入する。

This paper offers a novel homotopical characterization of strongly contextual simplicial distributions with binary outcomes, specifically those defined on the cone of a 1-dimensional space. In the sheaf-theoretic framework, such distributions correspond to non-signaling distributions on measurement scenarios where each context contains 2 measurements with binary outcomes. To establish our results, we employ a homotopical approach that includes collapsing measurement spaces and introduce categories associated with simplicial distributions that can detect strong contextuality.

翻訳日:2023-11-27 23:07:40 公開日:2023-11-23

# オフポリシー評価はいつ有用か? データ中心の視点

When is Off-Policy Evaluation Useful? A Data-Centric Perspective ( http://arxiv.org/abs/2311.14110v1 )

ライセンス: Link先を確認

Hao Sun, Alex J. Chan, Nabeel Seedat, Alihan Hüyük, Mihaela van der Schaar

(参考訳) ログ化されたデータセットだけを用いて仮想的なターゲット方策の価値を評価することは、重要であるが難しい。一方で、臨床ガイドラインのような高リスクなシナリオにおいて、安全な方策改善の機会をもたらす。他方で、そのような機会は正確なオフポリシー評価(OPE)の必要性を高める。OPEに関する従来の研究は価値推定アルゴリズムの改善に焦点を当てていたが、本研究ではオフラインデータセットの重要性を強調し、OPE問題を評価するためのデータ中心のフレームワークを提唱する。我々は、データ中心のOPE評価フレームワークであるDataCOPEを提案し、与えられたデータセットでターゲット方策を評価できるか、またどの程度評価できるかという問いに答える。DataCOPEは、(1)環境にアクセスせずにOPEアルゴリズム全体の性能を予測する。これはOPEの評価が不可能な実世界展開前に特に有用である。(2)データセットの中でOPEが不正確になりうる部分集団を特定する。(3)OPE問題のためのデータセットやデータ収集戦略の評価を可能にする。医療データセットを用いたログ化された文脈付きバンディット設定におけるDataCOPEの実証分析により、機械学習による方策と、臨床ガイドラインのような人間の専門家による方策の両方を評価する能力が確認された。

Evaluating the value of a hypothetical target policy with only a logged dataset is important but challenging. On the one hand, it brings opportunities for safe policy improvement under high-stakes scenarios like clinical guidelines. On the other hand, such opportunities raise a need for precise off-policy evaluation (OPE). While previous work on OPE focused on improving the algorithm in value estimation, in this work, we emphasize the importance of the offline dataset, hence putting forward a data-centric framework for evaluating OPE problems. We propose DataCOPE, a data-centric framework for evaluating OPE, that answers the questions of whether and to what extent we can evaluate a target policy given a dataset. DataCOPE (1) forecasts the overall performance of OPE algorithms without access to the environment, which is especially useful before real-world deployment where evaluating OPE is impossible; (2) identifies the sub-group in the dataset where OPE can be inaccurate; (3) permits evaluations of datasets or data-collection strategies for OPE problems. Our empirical analysis of DataCOPE in the logged contextual bandit settings using healthcare datasets confirms its ability to evaluate both machine-learning and human expert policies like clinical guidelines.
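To make "off-policy evaluation" concrete: the simplest OPE estimator for logged contextual-bandit data is inverse propensity scoring (IPS), which reweights each logged reward by how much more (or less) likely the target policy is to take the logged action than the logging policy was. This is a generic textbook estimator shown for illustration, not DataCOPE itself; the record layout is an assumption.

```python
from typing import Callable, List, Tuple

# One logged record: (context, action taken, observed reward, logging-policy probability of that action)
Record = Tuple[int, int, float, float]

def ips_estimate(logs: List[Record], target_prob: Callable[[int, int], float]) -> float:
    """IPS estimate of the target policy's value:
    mean over records of pi_target(a | x) / pi_logging(a | x) * r."""
    total = 0.0
    for x, a, r, p_log in logs:
        total += target_prob(x, a) / p_log * r
    return total / len(logs)

# Uniform logging over two actions; action 1 always yields reward 1.
logs = [(0, 1, 1.0, 0.5), (0, 0, 0.0, 0.5), (1, 1, 1.0, 0.5), (1, 0, 0.0, 0.5)]
always_one = lambda x, a: 1.0 if a == 1 else 0.0
value = ips_estimate(logs, always_one)  # 1.0
```

When the logging probability of actions favored by the target policy is small, the weights blow up and the estimate becomes high-variance, which gives one intuition for why some datasets, or subgroups within them, support reliable OPE and others do not.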

翻訳日:2023-11-27 23:07:31 公開日:2023-11-23

# 自己一貫性学習により小規模マルチモーダル推論モデルの能力を大規模モデルに匹敵させる

Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training ( http://arxiv.org/abs/2311.14109v1 )

ライセンス: Link先を確認

Cheng Tan, Jingxuan Wei, Zhangyang Gao, Linzhuang Sun, Siyuan Li, Xihong Yang, Stan Z. Li

(参考訳) マルチモーダル推論は、質問に答えるために複数のモダリティにまたがる推論をモデルに求める難しいタスクである。既存のアプローチは、言語と視覚のモダリティを2段階の推論フレームワークに組み込み、根拠(rationale)の生成と回答の推論を分離することで進展を遂げてきた。しかし、これらのアプローチは、生成される根拠の品質が不十分なためにしばしば期待に届かない。本研究では、モデルの推論における根拠の重要性を掘り下げる。根拠が完全に正確である場合にモデルの精度が大幅に向上することを観察し、高品質な根拠生成の必要性を明らかにする。これに動機づけられ、複数の根拠と回答を生成し、その後投票プロセスによって最も正確なものを選択する自己一貫性学習戦略であるMC-CoTを提案する。このアプローチは、生成される根拠の品質を高めるだけでなく、より正確で頑健な回答ももたらす。広範な実験を通じて、本手法がさまざまなベンチマークにおいてモデル性能を大幅に向上させることを示す。注目すべきことに、提案手法を用いれば、より小さなベースモデルであっても、より大きなモデルに匹敵する結果を達成できることを示し、マルチモーダル推論を改善するために根拠の力を活用する本手法の可能性を示す。コードはhttps://github.com/chengtan9907/mc-cotで公開されている。

Multimodal reasoning is a challenging task that requires models to reason across multiple modalities to answer questions. Existing approaches have made progress by incorporating language and visual modalities into a two-stage reasoning framework, separating rationale generation from answer inference. However, these approaches often fall short due to the inadequate quality of the generated rationales. In this work, we delve into the importance of rationales in model reasoning. We observe that when rationales are completely accurate, the model's accuracy significantly improves, highlighting the need for high-quality rationale generation. Motivated by this, we propose MC-CoT, a self-consistency training strategy that generates multiple rationales and answers, subsequently selecting the most accurate through a voting process. This approach not only enhances the quality of generated rationales but also leads to more accurate and robust answers. Through extensive experiments, we demonstrate that our approach significantly improves model performance across various benchmarks. Remarkably, we show that even smaller base models, when equipped with our proposed approach, can achieve results comparable to those of larger models, illustrating the potential of our approach in harnessing the power of rationales for improved multimodal reasoning. The code is available at https://github.com/chengtan9907/mc-cot.
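The voting step can be pictured as a self-consistency vote over sampled (rationale, answer) pairs. A minimal sketch, assuming answers are compared as exact strings and that the kept rationale is simply the first one supporting the winning answer; MC-CoT's actual selection rule, and how it is used during training, may differ:

```python
from collections import Counter
from typing import List, Tuple

def majority_vote(samples: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Return (rationale, answer) for the most frequent answer among the samples.

    Ties between answers go to the one that appears first; the returned rationale
    is the first sampled rationale that produced the winning answer."""
    counts = Counter(answer for _, answer in samples)
    best_answer, _ = counts.most_common(1)[0]
    rationale = next(r for r, a in samples if a == best_answer)
    return rationale, best_answer

# Three sampled (rationale, answer) pairs; "B" wins two votes to one.
choice = majority_vote([("r1", "B"), ("r2", "A"), ("r3", "B")])  # ("r1", "B")
```

Voting over several samples smooths out individual rationales that go astray, which is the intuition behind pairing multiple rationales with a consensus answer.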

翻訳日:2023-11-27 23:07:10 公開日:2023-11-23

# MINTY: 欠損値を持つ特徴量の補完の必要性を最小化するルールベースモデル

MINTY: Rule-based Models that Minimize the Need for Imputing Features with Missing Values ( http://arxiv.org/abs/2311.14108v1 )

ライセンス: Link先を確認

Lena Stempfle and Fredrik D. Johansson

(参考訳) ルールモデルは自然言語で容易に解釈でき、より複雑なモデルと同等の予測性能を提供するため、表形式の入力を持つ予測タスクでしばしば好まれる。しかし、ほとんどのルールモデルの予測は、一部の入力が欠損している場合に未定義または曖昧になり、利用者は統計的な補完モデルやゼロ補完のようなヒューリスティックに頼らざるを得ず、モデルの解釈可能性が損なわれる。本研究では、欠損値を持つ特徴量への依存を避けるよう学習し、その結果テスト時の補完への依存を抑える、簡潔かつ正確なルールモデルの当てはめを提案する。我々は、1つ以上の変数が欠損しているときに互いの代替として機能する変数間の選言(OR)の形でルールを学習する手法、MINTYを開発する。これにより、欠損値を持つ特徴量への依存が小さくなるよう正則化された疎な線形ルールモデルが得られ、当てはまりの良さ、解釈可能性、テスト時の欠損値に対する頑健性の間のトレードオフが可能になる。合成データと実世界のデータセットを用いた実験でMINTYの有用性を示し、その予測性能がベースラインと同等かそれ以上であり、かつ欠損値を持つ特徴量への依存が小さいことを確認した。

Rule models are often preferred in prediction tasks with tabular inputs as they can be easily interpreted using natural language and provide predictive performance on par with more complex models. However, most rule models' predictions are undefined or ambiguous when some inputs are missing, forcing users to rely on statistical imputation models or heuristics like zero imputation, undermining the interpretability of the models. In this work, we propose fitting concise yet precise rule models that learn to avoid relying on features with missing values and, therefore, limit their reliance on imputation at test time. We develop MINTY, a method that learns rules in the form of disjunctions between variables that act as replacements for each other when one or more is missing. This results in a sparse linear rule model, regularized to have small dependence on features with missing values, that allows a trade-off between goodness of fit, interpretability, and robustness to missing values at test time. We demonstrate the value of MINTY in experiments using synthetic and real-world data sets and find its predictive performance comparable or favorable to baselines, with smaller reliance on features with missing values.
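Why disjunctions help with missing values can be seen from Kleene's three-valued OR: a rule such as (x1 OR x2) is already decided whenever an observed variable is true, so a missing x1 does not force imputation. The sketch below assumes boolean features with `None` marking a missing value; the feature names in the usage example are hypothetical, and MINTY's exact handling of undecided rules is not described in the abstract.

```python
from typing import Dict, List, Optional

def eval_disjunction(row: Dict[str, Optional[bool]], literals: List[str]) -> Optional[bool]:
    """Evaluate the rule (x1 OR x2 OR ...) on one row whose features may be missing (None).

    Kleene three-valued OR: True if any observed literal is True, False if every
    literal is observed and False, otherwise None (the rule cannot be decided
    without imputing the missing features)."""
    any_missing = False
    for name in literals:
        value = row.get(name)  # an absent key is treated as missing
        if value is None:
            any_missing = True
        elif value:
            return True
    return None if any_missing else False

rule = ["fever", "high_crp"]  # hypothetical features that can substitute for each other
decided = eval_disjunction({"fever": None, "high_crp": True}, rule)      # True, no imputation needed
undecided = eval_disjunction({"fever": None, "high_crp": False}, rule)   # None, imputation needed
```

Counting how often rules come out undecided on held-out data gives a rough measure of a rule model's reliance on imputation, the quantity MINTY's regularization is described as keeping small.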

翻訳日:2023-11-27 23:06:48 公開日:2023-11-23

# モード最適化ハイブリッドCPU-GPU密度行列繰り込み群法による2次元量子格子模型

Two dimensional quantum lattice models via mode optimized hybrid CPU-GPU density matrix renormalization group method ( http://arxiv.org/abs/2311.14106v1 )

ライセンス: Link先を確認

Andor Menczer, Kornél Kapás, Miklós Antal Werner, and Örs Legeza

(参考訳) 本稿では、最先端の高性能計算基盤上で、非可換(non-Abelian)な第一原理(ab initio)版の密度行列繰り込み群法を用いて、2次元量子格子模型上の量子多体問題をシミュレートするハイブリッド数値手法を提示する。2次元スピンレスフェルミオン模型とトーラス形状上のハバード模型について、最適化された基底で計算を行い、ハイブリッドCPU-マルチGPU並列化を活用することで、計算時間を合計で数桁削減できることを示す。計算複雑性の少なくとも1桁の削減はモード最適化によるものであり、さらに大規模並列化によって実時間(wall time)がもう1桁削減される。結果はFLOPと秒で直接測定している。得られた性能について、行列ランクの関数として、また$12\times 12$の格子トポロジーまでの系サイズの関数として、詳細なスケーリング解析を議論する。

We present a hybrid numerical approach to simulate quantum many-body problems on two-dimensional quantum lattice models. It uses the non-Abelian ab initio version of the density matrix renormalization group method on state-of-the-art high-performance computing infrastructures. For the two-dimensional spinless fermion model and for the Hubbard model on torus geometry, we demonstrate that, altogether, several orders of magnitude in computational time can be saved by performing calculations in an optimized basis and by utilizing hybrid CPU-multiGPU parallelization. At least an order-of-magnitude reduction in computational complexity results from mode optimization, while massive parallelization achieves a further order-of-magnitude reduction in wall time. Our results are measured directly in FLOP and seconds. We discuss a detailed scaling analysis of the obtained performance as a function of matrix rank and of system size, up to a $12\times 12$ lattice topology.

Translated: 2023-11-27 23:06:26 / Published: 2023-11-23

# カオス系シミュレーションのためのハイブリッド量子古典型貯留層計算

Hybrid quantum-classical reservoir computing for simulating chaotic systems ( http://arxiv.org/abs/2311.14105v1 )

License: see the linked page

Filip Wudarski, Daniel O'Connor, Shaun Geaney, Ata Akbari Asanjan, Max Wilson, Elena Strbac, P. Aaron Lott, and Davide Venturelli

(参考訳) カオスシステムの予測は特に複雑なタスクであり、近年、システムの時空間情報を抽出するのに使われる固定ランダム重み(貯水池)のリカレントネットワークであるリザーバコンピューティング(rc)を用いて合理的に成功している。この研究は、RCの貯水池を量子回路に置き換える、ハイブリッド量子貯水池計算(HQRC)フレームワークを提案する。回路のモジュラー構造と測定フィードバックは、貯水池状態における複雑な系のダイナミクスを符号化するために使用され、そこから古典的学習を行い、将来のダイナミクスを予測する。 HQRCのノイズレスシミュレーションは、ロレンツ63とダブルスクロールカオスのパラダイムシステムの両方の最先端の古典的RCモデルに匹敵する有効な予測時間を示し、予測が真実から逸脱してからずっと後のアトラクタダイナミクスに固執する。

Forecasting chaotic systems is a notably complex task. In recent years it has been approached with reasonable success using reservoir computing (RC), a recurrent network whose fixed random weights (the reservoir) extract the spatio-temporal information of the system. This work presents a hybrid quantum reservoir-computing (HQRC) framework, which replaces the reservoir in RC with a quantum circuit. The circuit's modular structure and measurement feedback encode the complex system dynamics in the reservoir states, from which classical learning is performed to predict future dynamics. Noiseless simulations of HQRC demonstrate valid prediction times comparable to those of state-of-the-art classical RC models for both the Lorenz63 and double-scroll paradigmatic chaotic systems. They also adhere to the attractor dynamics long after the forecasts have deviated from the ground truth.

Translated: 2023-11-27 23:06:11 / Published: 2023-11-23

# 単一光子による量子鍵分布のクロック解析と回復

Single-Photon-Based Clock Analysis and Recovery in Quantum Key Distribution ( http://arxiv.org/abs/2311.14104v1 )

License: see the linked page

Mujtaba Zahidy, Domenico Ribezzo, Ronny Müller, Jasper Riebesehl, Alessandro Zavatta, Michael Galili, Leif Katsuo Oxenløwe, Davide Bacco

(参考訳) 量子鍵分布は、市場に向けて準備された最初の量子技術の一つである。現在の量子通信システムは通常、送信機(alice)と受信機(bob)の同期にサービスチャネルを使用する。しかし、このサービスチャネルを除去し、クロックリカバリ手法を利用する可能性は、ファイバーリンクとフリースペースリンクの両方において将来の実装に興味深い。本稿では,量子通信シナリオにおけるクロック回収の基準について検討し,タイムビン量子鍵分散プロトコルにおける量子クロック回収システムの利用可能性について実験的に検証した。クロックリカバリ技術の性能は、量子ビット誤り率と秘密鍵レートの点で、クロック共有のためのサービスチャネルの使用と同等である。

Quantum key distribution is one of the first quantum technologies ready for the market. Current quantum telecommunication systems usually use a service channel to synchronize the transmitter (Alice) and the receiver (Bob). However, removing this service channel and exploiting a clock recovery method instead is attractive for future implementations, in both fiber and free-space links. In this paper, we investigate criteria for recovering the clock in a quantum communication scenario. We also experimentally demonstrate a quantum-based clock recovery system in a time-bin quantum key distribution protocol. In terms of quantum bit error rate and secret key rate, the clock recovery technique performs equivalently to using the service channel for clock sharing.

Translated: 2023-11-27 23:05:53 / Published: 2023-11-23
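
One generic way to recover a clock phase from single-photon detections alone is a circular mean of the arrival times modulo the period. This assumes the nominal pulse period is already known and only the phase offset must be estimated. It is an illustrative sketch, not the clock recovery method evaluated in the paper, and `estimate_clock_phase` is a hypothetical helper. The circular mean matters because detections at 0.98 and 0.02 of a period should average to roughly 0, not 0.5.

```python
import cmath
import math
from typing import Iterable

def estimate_clock_phase(timestamps: Iterable[float], period: float) -> float:
    """Estimate the clock phase offset, in [0, period), from detection times.

    Each timestamp is mapped to an angle on the unit circle via t mod period;
    the argument of the summed unit vectors (a circular mean) gives the phase.
    A plain arithmetic mean of (t mod period) would fail for detections that
    straddle the period boundary.
    """
    z = sum(cmath.exp(2j * math.pi * (t % period) / period) for t in timestamps)
    angle = cmath.phase(z)  # in (-pi, pi]
    return (angle % (2 * math.pi)) * period / (2 * math.pi)

# Hypothetical detections (in units of the pulse period) with a 0.25 offset.
print(round(estimate_clock_phase([k + 0.25 for k in range(10)], 1.0), 6))  # 0.25
```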

# サブネットワークアンサンブル

Subnetwork Ensembles ( http://arxiv.org/abs/2311.14101v1 )

License: see the linked page

Tim Whitaker

(参考訳) ニューラルネットワークアンサンブルは、独立に訓練された複数のモデルの予測を組み合わせることで、一般化を改善するために効果的に使用されている。しかし、ディープニューラルネットワークのスケールと複雑さが増大し、これらの手法は禁止的に高価になり、実装に時間がかかる。複数のモデルをスクラッチからトレーニングする必要性を軽減しつつ、従来のアンサンブル学習方法が備える一般化のメリットを保ちながら、低コストなアンサンブル手法がますます重要になっている。この論文は、訓練された親モデルからサブネットワークをサンプリング、摂動、最適化することで子ネットワークの集合を形成するサブネットワークアンサンブルを構築するための低コストなフレームワークを紹介し、定式化する。児童ネットワーク生成のための異なる手法を探索し、様々なアブレーション研究と確立されたベンチマークを通じてその有効性を評価する。その結果,この手法は計算コストを最小化しつつ,トレーニング効率,パラメトリック利用,一般化性能を大幅に向上できることがわかった。 Subnetwork Ensemblesは、ディープニューラルネットワークの非現実的なポテンシャルを活用することによって、よりよいシステムを構築するための魅力的なフレームワークを提供する。

Neural network ensembles have been used effectively to improve generalization by combining the predictions of multiple independently trained models. However, the growing scale and complexity of deep neural networks have made these methods prohibitively expensive and time-consuming to implement. Low-cost ensemble methods have become increasingly important because they can alleviate the need to train multiple models from scratch while retaining the generalization benefits that traditional ensemble learning methods afford. This dissertation introduces and formalizes a low-cost framework for constructing Subnetwork Ensembles, in which a collection of child networks is formed by sampling, perturbing, and optimizing subnetworks from a trained parent model. We explore several distinct methodologies for generating child networks and evaluate their efficacy through a variety of ablation studies and established benchmarks. Our findings reveal that this approach can greatly improve training efficiency, parametric utilization, and generalization performance while minimizing computational cost. Subnetwork Ensembles offer a compelling framework for exploring how we can build better systems by leveraging the unrealized potential of deep neural networks.

Translated: 2023-11-27 23:05:41 / Published: 2023-11-23
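
The sample, perturb, and average pattern can be sketched on a toy linear "parent" model. In this sketch a subnetwork is sampled by zeroing weights with a random mask, and the surviving weights are perturbed with Gaussian noise. The helper names and the masking/noise choices are hypothetical and stand in for whatever sampling and perturbation schemes the dissertation studies. The per-child optimization (fine-tuning) step is omitted.

```python
import random
from typing import List, Sequence

def make_child(parent: Sequence[float], keep_prob: float, noise_std: float,
               rng: random.Random) -> List[float]:
    """Sample a subnetwork of the parent and perturb its surviving weights.

    Each weight is kept with probability keep_prob (otherwise zeroed, i.e.
    pruned from the subnetwork), and kept weights receive Gaussian noise.
    """
    return [w + rng.gauss(0.0, noise_std) if rng.random() < keep_prob else 0.0
            for w in parent]

def linear_predict(weights: Sequence[float], x: Sequence[float]) -> float:
    return sum(w * xi for w, xi in zip(weights, x))

def ensemble_predict(children: Sequence[Sequence[float]], x: Sequence[float]) -> float:
    """Average the children's predictions, as in a standard ensemble."""
    return sum(linear_predict(c, x) for c in children) / len(children)

rng = random.Random(0)
parent = [0.5, -1.0, 2.0]  # hypothetical trained parent (a linear model)
children = [make_child(parent, keep_prob=0.8, noise_std=0.05, rng=rng) for _ in range(4)]
# The "optimizing" step (briefly fine-tuning each child) is omitted here.
print(ensemble_predict(children, [1.0, 2.0, 3.0]))
```

Because every child starts from the trained parent, no child is trained from scratch. That is the source of the cost savings the abstract describes.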

# ACT: 敵対的一貫性モデル

ACT: Adversarial Consistency Models ( http://arxiv.org/abs/2311.14097v1 )

License: see the linked page

Fei Kong, Jinhao Duan, Lichao Sun, Hao Cheng, Renjing Xu, Hengtao Shen, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu

(参考訳) 拡散モデルは画像生成に優れているが、ステップバイステップのデノイジングは生成速度を遅くする。一貫性トレーニングは、単一ステップサンプリングでこの問題に対処するが、しばしば低品質な生成結果をもたらし、高いトレーニングコストを必要とする。本稿では,一貫性トレーニング損失の最適化が目標分布と生成分布との間のWasserstein距離を最小化することを示す。時間ステップが増加すると、上限は以前の一貫性トレーニング損失を蓄積する。そのため、現在の損失と累積損失の両方を減らすために、より大きなバッチサイズが必要となる。本稿では,判別器を用いて,各時刻における分布間のJensen-Shannon(JS)ダイバージェンスを直接最小化するAdversarial Consistency Training(ACT)を提案する。理論的には、ACTは生成品質と収束を向上させる。一貫性トレーニングフレームワークに判別器を組み込むことにより、CIFAR10とImageNet 64$\times$64のFIDスコアを改善し、ゼロショット画像インペインティング能力を保持し、ベースライン手法と比較して元のバッチサイズの$1/6$未満、モデルパラメータとトレーニングステップの$1/2$未満しか使用しないため、リソース消費量が大幅に削減される。

Though diffusion models excel in image generation, their step-by-step denoising leads to slow generation. Consistency training addresses this issue with single-step sampling, but it often produces lower-quality samples and incurs high training costs. In this paper, we show that optimizing the consistency training loss minimizes the Wasserstein distance between the target and generated distributions. As the timestep increases, the upper bound accumulates the consistency training losses of previous timesteps. Therefore, larger batch sizes are needed to reduce both the current and the accumulated losses. We propose Adversarial Consistency Training (ACT), which uses a discriminator to directly minimize the Jensen-Shannon (JS) divergence between distributions at each timestep. Theoretically, ACT enhances generation quality and convergence. Our method incorporates a discriminator into the consistency training framework. It achieves improved FID scores on CIFAR10 and ImageNet 64$\times$64 and retains zero-shot image inpainting capabilities. Compared with the baseline method, it uses less than $1/6$ of the original batch size and fewer than $1/2$ of the model parameters and training steps, which substantially reduces resource consumption.

Translated: 2023-11-27 23:05:20 / Published: 2023-11-23
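
The JS divergence that ACT targets can be computed exactly for small discrete distributions. The sketch below also checks the classic GAN identity from Goodfellow et al. (2014). With the optimal discriminator D* = p/(p + q), the adversarial objective equals 2·JS(p, q) − log 4, which is why training against a discriminator minimizes a JS divergence. This toy computation only illustrates that relationship. It is not ACT's training loss, and the function names are hypothetical.

```python
import math
from typing import Sequence

def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (natural log), bounded above by log 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def optimal_gan_value(p: Sequence[float], q: Sequence[float]) -> float:
    """E_p[log D*] + E_q[log(1 - D*)] with the optimal discriminator D* = p / (p + q)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi + qi == 0:
            continue
        d = pi / (pi + qi)
        if pi > 0:
            total += pi * math.log(d)
        if qi > 0:
            total += qi * math.log(1 - d)
    return total

p, q = [0.2, 0.5, 0.3], [0.4, 0.4, 0.2]  # hypothetical "data" and "generated" distributions
print(optimal_gan_value(p, q), 2 * js_divergence(p, q) - math.log(4))  # equal up to rounding
```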

# LLMにおける文化的バイアスの監査と緩和

Auditing and Mitigating Cultural Bias in LLMs ( http://arxiv.org/abs/2311.14096v1 )

License: see the linked page

Yan Tao, Olga Viberg, Ryan S. Baker, Rene F. Kizilcec

(参考訳) 文化は人々の推論、行動、コミュニケーションを根本的に形作る。生成人工知能(AI)技術は、支配的な文化へと移行する可能性がある。人々がAIを使って、さまざまな専門的および個人的タスクを迅速かつ自動化するにつれて、AIモデルに埋め込まれた文化的価値は、真の表現をバイアスする可能性がある。我々は,文化バイアスに対する大規模言語モデルの監査を行い,その回答を全国的に代表される調査データと比較し,国別行動の促進を緩和戦略として評価した。 GPT-4,3.5,3は、英語やプロテスタントのヨーロッパ諸国に似た文化的価値を示す。我々の緩和戦略は、近年のモデルでは文化バイアスを減少させるが、すべての国や地域ではそうではない。生成AIの文化的偏見を回避するため,文化マッチングと継続的な文化監査を併用することを提案する。

Culture fundamentally shapes people's reasoning, behavior, and communication. Generative artificial intelligence (AI) technologies may cause a shift towards a dominant culture. As people increasingly use AI to expedite and even automate various professional and personal tasks, cultural values embedded in AI models may bias authentic expression. We audit large language models for cultural bias, comparing their responses to nationally representative survey data, and evaluate country-specific prompting as a mitigation strategy. We find that GPT-4, 3.5 and 3 exhibit cultural values resembling English-speaking and Protestant European countries. Our mitigation strategy reduces cultural bias in recent models but not for all countries/territories. To avoid cultural bias in generative AI, especially in high-stakes contexts, we suggest using culture matching and ongoing cultural audits.

Translated: 2023-11-27 23:04:57 / Published: 2023-11-23

# GANを用いたビデオ異常検出

Video Anomaly Detection using GAN ( http://arxiv.org/abs/2311.14095v1 )

License: see the linked page

Anikeit Sethi, Krishanu Saini and Sai Mounika Mididoddi

(参考訳) 公衆の安全に対する懸念が高まる中で,監視場面における異常の自動検出と認識が重要である。その複雑さと実用性から、現在オープンな研究対象となっている。異常な出来事を自動的に識別することは、異常という考え方が異なるため、難しい作業です。ある状況での典型的な発生は、別の状況では異常と見なすことができる。人混みや閉塞度が高いため, 群集による監視映像では, 自動的異常識別が特に困難となる。機械学習技術を利用することで、この論文は、このユースケースに対する解決策を提供することを目的としています。我々は,新しい生成型逆向ネットワーク(gan)ベースの異常検出モデルを開発した。このモデルは、高次元画像空間の構築と、映像の文脈から潜在空間の決定について一緒に学ぶように訓練される。このジェネレータは、マルチステージチャネルアテンションベースのデコーダと、空間データと時間データの両方を実現できる2ストリームの深層畳み込みエンコーダからなる残余のオートエンコーダアーキテクチャを使用する。また,データセット間の移動学習を活用してモデルを一般化しながら,トレーニング時間を短縮するGANモデルを精錬する手法も提案している。様々な評価尺度を用いて,4つのベンチマークデータセットにおける現在の最先端技術と比較した。実験の結果,既存の手法と比較して,ネットワークはすべてのデータセットで良好に動作していることがわかった。

Given the growing concern for public safety, automatic detection and recognition of abnormal events in surveillance scenes is crucial. It remains an open research subject because of its intricacy and utility. Identifying aberrant events automatically is difficult because notions of abnormality differ from person to person: an event that is typical in one circumstance may be seen as aberrant in another. Automatic anomaly identification is particularly challenging in surveillance footage of large crowds, owing to congestion and heavy occlusion. This thesis uses machine learning techniques to provide a solution for this use case, so that human operators are no longer needed to watch surveillance records for unusual activity. We have developed a novel anomaly detection model based on a generative adversarial network (GAN). The model is trained to jointly learn to construct the high-dimensional image space and to infer the latent space from the video's context. The generator uses a residual autoencoder architecture. It combines a multi-stage, channel-attention-based decoder with a two-stream deep convolutional encoder that captures both spatial and temporal information. We also offer a technique for refining the GAN model that reduces training time and generalises the model through transfer learning between datasets. Using a variety of evaluation measures, we compare our model with current state-of-the-art techniques on four benchmark datasets. The empirical findings indicate that our network performs favourably against existing techniques on all datasets.

Translated: 2023-11-27 23:04:44 / Published: 2023-11-23
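
GAN-based video anomaly detectors commonly turn the generator's per-frame error into a score. One common scheme computes frame-level PSNR between the generated and real frames, then min-max normalises it into a "regularity" score where low values flag anomalies. Whether this thesis scores frames exactly this way is an assumption. The frames below are hypothetical flattened pixel lists, and the helper names are illustrative.

```python
import math
from typing import List, Sequence

def psnr(pred: Sequence[float], target: Sequence[float],
         max_val: float = 1.0, eps: float = 1e-10) -> float:
    """Peak signal-to-noise ratio between a generated frame and the real frame."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return 10.0 * math.log10(max_val ** 2 / max(mse, eps))  # eps avoids log of inf

def regularity_scores(psnrs: Sequence[float]) -> List[float]:
    """Min-max normalise per-frame PSNR to [0, 1]; low scores flag likely anomalies."""
    lo, hi = min(psnrs), max(psnrs)
    if hi == lo:
        return [1.0] * len(psnrs)
    return [(p - lo) / (hi - lo) for p in psnrs]

target = [0.5, 0.5, 0.5, 0.5]
generated = [[0.6] * 4, [0.6] * 4, [0.9] * 4]  # the third frame is reconstructed poorly
scores = regularity_scores([psnr(g, target) for g in generated])
print([i for i, s in enumerate(scores) if s < 0.5])  # [2]
```

The threshold of 0.5 is arbitrary here. In practice it is tuned or replaced by a threshold-free metric such as frame-level AUC.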

# 2次情報を用いたロバスト決定集約

Robust Decision Aggregation with Second-order Information ( http://arxiv.org/abs/2311.14094v1 )

License: see the linked page

Yuqi Pan, Zhaohua Chen, Yuqing Kong

(参考訳) 我々は,未知のバイナリ世界状態に関するプライベートシグナルを観察した後に,それぞれ二進的推薦を行う2人の専門家による意思決定集約問題を考察する。信号と状態の合同情報構造を知らないエージェントは、専門家の勧告を見て、アクションを実際の状態と一致させることを目指している。本シナリオでは,2次情報(各専門家の推薦予測)を補足することで,より優れた集計が可能かどうかを検討する。我々は,複合的な情報構造を知っている全知的なベンチマークと比較することにより,アグリゲータのパフォーマンスを評価するために,minimax regretフレームワークを採用する。一般的な情報構造では、二階情報には利益がないことを示す。簡単なアグリゲータよりも改善できるアグリゲータは存在しない。しかし、専門家の信号が世界状態から条件的に独立していると仮定すると、ポジティブな結果が得られる。本稿では,アグリゲータが決定論的である場合,第2次情報を活用するロバストアグリゲータを提案する。第2に、信号に非退化仮定を加えることによって、2次情報を用いたランダムアグリゲータが、それなしで最適なアグリゲータを超越できることを実証する。残りの設定では、2階情報は有益ではない。また、アグリゲータのユーティリティ関数がより一般的な場合に、上記の結果を設定に拡張する。

We consider a decision aggregation problem with two experts, each of whom makes a binary recommendation after observing a private signal about an unknown binary world state. An agent who does not know the joint information structure between signals and states sees the experts' recommendations and aims to match its action to the true state. In this scenario, we study whether additionally supplying second-order information (each expert's forecast of the other's recommendation) enables better aggregation. We adopt a minimax regret framework to evaluate the aggregator's performance, comparing it with an omniscient benchmark that knows the joint information structure. For general information structures, we show that second-order information provides no benefit: no aggregator can improve over a trivial aggregator that always follows the first expert's recommendation. However, positive results emerge when we assume that the experts' signals are conditionally independent given the world state. First, when the aggregator is deterministic, we present a robust aggregator that leverages second-order information and can significantly outperform counterparts without it. Second, when the two experts are homogeneous and the signals satisfy a non-degenerate assumption, we demonstrate that random aggregators using second-order information can surpass optimal ones without it. In the remaining settings, second-order information is not beneficial. We also extend these results to settings in which the aggregator's utility function is more general.

Translated: 2023-11-27 23:04:21 / Published: 2023-11-23
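
The sketch below makes the benchmark comparison concrete for one fixed information structure. It compares the accuracy of the trivial aggregator (always follow the first expert) with an omniscient benchmark that knows the structure. It assumes that experts report their signals truthfully, that the signals are conditionally independent with symmetric accuracies, and that utility is 1 for matching the state. The paper's regret is a worst case over a family of structures and also involves second-order information; neither is modeled here. All names are hypothetical.

```python
from itertools import product
from typing import Callable, Tuple

Aggregator = Callable[[int, int], int]

def joint(prior1: float, acc: Tuple[float, float], state: int, r1: int, r2: int) -> float:
    """P(state, r1, r2) when the recommendations are conditionally independent
    given the state and expert i is correct with probability acc[i]."""
    p_state = prior1 if state == 1 else 1 - prior1
    p1 = acc[0] if r1 == state else 1 - acc[0]
    p2 = acc[1] if r2 == state else 1 - acc[1]
    return p_state * p1 * p2

def accuracy(prior1: float, acc: Tuple[float, float], aggregator: Aggregator) -> float:
    """Expected utility (probability of matching the state) of an aggregator."""
    return sum(joint(prior1, acc, s, r1, r2)
               for s, r1, r2 in product((0, 1), repeat=3)
               if aggregator(r1, r2) == s)

def omniscient(prior1: float, acc: Tuple[float, float]) -> Aggregator:
    """Benchmark that knows the structure: pick the state with the highest posterior."""
    def agg(r1: int, r2: int) -> int:
        return max((0, 1), key=lambda a: joint(prior1, acc, a, r1, r2))
    return agg

def follow_first(r1: int, r2: int) -> int:
    """The trivial aggregator: always follow the first expert."""
    return r1

prior1, acc = 0.5, (0.6, 0.9)  # hypothetical structure: expert 2 is more accurate
regret = accuracy(prior1, acc, omniscient(prior1, acc)) - accuracy(prior1, acc, follow_first)
print(round(regret, 6))  # 0.3
```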

# PortfolioMentor: インタラクティブデジタルアートポートフォリオの学習と製作のためのマルチモーダル生成AIコンパニオン

PortfolioMentor: Multimodal Generative AI Companion for Learning and Crafting Interactive Digital Art Portfolios ( http://arxiv.org/abs/2311.14091v1 )

License: see the linked page

Tao Long, Weirui Peng

(参考訳) デジタルアートのポートフォリオは、アーティストが自分のビジョンを伝え、視覚、オーディオ、対話、物語を織り上げるためのインパクトのある媒体となる。しかし、技術的背景がなければ、美術学校における非技術的、学術的支援のための調整されたリソースの欠如や、精神的に要求されるプロセスを通じて包括的な指導ツールが欠如していることを考えると、創造的なアイデアを具体的なコードやデザインに翻訳することは困難である。コード学習におけるコンパニオンの役割を認識し,創造的なタスクを支援するための生成AIモデルの能力を活用して,IDE用のコーディングコンパニオンチャットボットであるPortfolioMentorを紹介する。このツールは、学習、インスピレーション、サポートのための積極的な提案と責任あるq&aを通じて学生を指導し、協力する。このシステムは、タスクとアーティストのビジョンの理解から始まり、視覚的なイラスト、オーディオまたは音楽の提案とファイル、対話のためのクリック・スクロール効果、創造的な視覚概念化の共創に従い、最終的にこれらのファセットを洗練されたデジタルポートフォリオに合成する。

Digital art portfolios serve as impactful media for artists to convey their visions, weaving together visuals, audio, interactions, and narratives. However, design students without technical backgrounds often find it challenging to translate creative ideas into tangible code and designs. Art schools lack tailored, non-technical academic support and a comprehensive tool to guide students through this mentally demanding process. Recognizing the role of companionship in learning to code, and leveraging generative AI models' capabilities in supporting creative tasks, we present PortfolioMentor, a coding companion chatbot for IDEs. The tool guides and collaborates with students through proactive suggestions and responsible Q&As for learning, inspiration, and support. The system first builds an understanding of the task and the artist's vision. It then co-creates visual illustrations, audio or music suggestions and files, click-scroll effects for interactions, and creative vision conceptualization. Finally, it synthesizes these facets into a polished interactive digital portfolio.

Translated: 2023-11-27 23:03:57 / Published: 2023-11-23

# クラス不確実性:クラス不均衡の緩和方策

Class Uncertainty: A Measure to Mitigate Class Imbalance ( http://arxiv.org/abs/2311.14090v1 )

License: see the linked page

Z. S. Baltaci, K. Oksuz, S. Kuzucu, K. Tezoren, B. K. Konar, A. Ozkan, E. Akbas, S. Kalkan

(参考訳) 訓練例のクラスワイド特性は深層分類器の性能に影響を及ぼす。良く研究された例は、クラスのトレーニング例の数が長い尾の分布に従うときであり、この状況は、表現不足なクラスに対して最適でないパフォーマンスをもたらす可能性がある。このクラス不均衡問題は、データ再サンプリングのようなトレーニング例のクラスワイドの濃度に依存するアプローチによって解決される。本稿では,クラス濃度のみを考慮すれば,クラス不均衡の原因となる問題をすべてカバーできるわけではないことを実証する。クラス不均衡を測定するために,訓練例の平均予測不確実性として「クラス不確実性」を提案し,この新手法が濃度よりもクラス間の差異を捉えていることを示す。また, SVCI-20は, クラスが同じ数のトレーニングサンプルを持つが, それらの硬さによって異なる新しいデータセットとしてキュレートし, 基数に依存するアプローチでは対応できないクラス不均衡を生じさせる。当社の"クラス不確実性"尺度を10種類のクラス不均衡緩和手法に組み込んで,ロングテールデータセットとsvci-20上での有効性を実証した。コードとデータセットが利用可能になる。

Class-wise characteristics of training examples affect the performance of deep classifiers. A well-studied example is when the number of training examples per class follows a long-tailed distribution, a situation that is likely to yield sub-optimal performance for under-represented classes. This class imbalance problem is conventionally addressed by approaches that rely on the class-wise cardinality of training examples, such as data resampling. In this paper, we demonstrate that considering solely the cardinality of classes does not cover all issues causing class imbalance. To measure class imbalance, we propose "Class Uncertainty", defined as the average predictive uncertainty of a class's training examples. We show that this novel measure captures differences across classes better than cardinality. We also curate SVCI-20, a novel dataset in which the classes have an equal number of training examples but differ in hardness. This causes a type of class imbalance that approaches relying on cardinality cannot address. We incorporate our "Class Uncertainty" measure into a diverse set of ten class imbalance mitigation methods to demonstrate its effectiveness on long-tailed datasets as well as on SVCI-20. Code and datasets will be made available.

Translated: 2023-11-27 23:03:35 / Published: 2023-11-23
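
A minimal sketch of a class-wise uncertainty measure, assuming predictive uncertainty is taken as the Shannon entropy of each example's softmax output. The paper may estimate predictive uncertainty differently, so entropy is an assumption here, and the model outputs are hypothetical. In the example, the two classes have equal cardinality, yet class 1 is much more uncertain. Resampling by count would not detect that kind of imbalance.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence

def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of a predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def class_uncertainty(prob_vectors: Sequence[Sequence[float]],
                      labels: Sequence[int]) -> Dict[int, float]:
    """Average predictive entropy over the training examples of each class."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for probs, y in zip(prob_vectors, labels):
        totals[y] += entropy(probs)
        counts[y] += 1
    return {y: totals[y] / counts[y] for y in totals}

# Two classes with equal cardinality but different hardness (hypothetical outputs).
probs = [[1.0, 0.0], [1.0, 0.0], [0.5, 0.5], [0.5, 0.5]]
labels = [0, 0, 1, 1]
print(class_uncertainty(probs, labels))
```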

# 自然言語における質問応答 : 時間表現の特例

Question Answering in Natural Language: the Special Case of Temporal Expressions ( http://arxiv.org/abs/2311.14087v1 )

License: see the linked page

Armand Stricker

(参考訳) 近年,一般的な質問応答はよく研究されているが,時間的質問応答はそれほど注目されていない課題である。本研究の目的は,一般質問応答や回答抽出に使用される一般的なアプローチを活用し,段落内の時間的質問に対する回答を見つけることにある。モデルをトレーニングするために、SQuADにインスパイアされた新しいデータセットを提案する。我々は歴史の最大の紛争に関するいくつかの文書を含むコーパスウィキワーズの採用を選択した。評価の結果,テキスト内で直接回答しなければならない質問を受理した場合,一般的な質問応答によく使用されるパターンマッチングを行うように訓練されたディープラーニングモデルが,時間的質問応答に適応できることが示されている。

Although general question answering has been well explored in recent years, temporal question answering has received much less attention. Our work leverages answer extraction, a popular approach in general question answering, to find answers to temporal questions within a paragraph. To train our model, we propose a new dataset, inspired by SQuAD and specifically tailored to provide rich temporal information. We chose to adapt the WikiWars corpus, which contains several documents on history's greatest conflicts. Our evaluation shows that a deep learning model trained to perform pattern matching, as is common in general question answering, can be adapted to temporal question answering. This holds provided that we restrict ourselves to questions whose answers appear directly in the text.

Translated: 2023-11-27 23:03:13 / Published: 2023-11-23
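
In SQuAD-style answer extraction, a model scores every token as a possible answer start and end. The answer is the span with start ≤ end, within a length limit, that maximizes the summed scores. The sketch below shows only that decoding step. The logits are hypothetical, and the paper's exact decoding rule is not stated in the abstract.

```python
from typing import Sequence, Tuple

def best_span(start_logits: Sequence[float], end_logits: Sequence[float],
              max_len: int = 30) -> Tuple[int, int]:
    """Return (start, end) token indices maximizing start + end score, with start <= end."""
    best, best_score = (0, 0), float("-inf")
    for i, s in enumerate(start_logits):
        for j in range(i, min(i + max_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best

tokens = ["The", "war", "ended", "in", "1945", "."]
start = [0.0, 0.0, 0.0, 0.0, 5.0, 0.0]  # hypothetical model outputs
end = [0.0, 0.0, 0.0, 0.0, 4.0, 1.0]
i, j = best_span(start, end)
print(" ".join(tokens[i:j + 1]))  # 1945
```

The extraction constraint in the abstract follows directly from this design: the model can only return a span of the input, so the answer must appear verbatim in the text.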

# フェデレーション学習による脳MRIスクリーニングツール

Brain MRI Screening Tool with Federated Learning ( http://arxiv.org/abs/2311.14086v1 )

License: see the linked page

Roman Stoklasa, Ioannis Stathopoulos, Efstratios Karavasilis, Efstathios Efstathopoulos, Marek Dostál, Miloš Keřkovský, Michal Kozubek, Luigi Serio

(参考訳) 臨床では,重症例においてもmri検査と放射線科医による診断との間に有意な遅延がみられた。場合によっては、追加情報や手がかりの欠如によって引き起こされる場合もあるため、重篤なケースでさえ診断待ちに待たなければならない。これは、追加情報を補う自動ソフトウェアツールがあれば回避でき、特定の患者が重篤なケースである可能性があると放射線科医に警告する。我々は,脳MRI自動スクリーニングツールを提示し,腫瘍様の病態を検出する能力を実証している。これは、堅牢なマルチ病理スクリーニングソリューションに向けた最初のバージョンである。このツールは連合学習をサポートするので、複数の機関がプライベートデータを開示することなくモデルに貢献することができる。

In clinical practice, we often see significant delays between MRI scans and the diagnosis made by radiologists, even for severe cases. In some cases, this may be caused by a lack of additional information and clues, so even severe cases must wait in the queue for diagnosis. This could be avoided with an automatic software tool that supplies additional information and alerts radiologists that a particular patient may be a severe case. We present an automatic brain MRI Screening Tool and demonstrate its capabilities for detecting tumor-like pathologies. It is the first version on the path toward a robust multi-pathology screening solution. The tool supports Federated Learning, so multiple institutions may contribute to the model without disclosing their private data.

Translated: 2023-11-27 23:03:01 / Published: 2023-11-23
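
The abstract does not say which federated aggregation rule the tool uses. The most common one is FedAvg, which averages the clients' model parameters weighted by their local dataset sizes, and the sketch below assumes it. It shows why private data need not be disclosed: only parameter vectors are sent to the aggregator. The parameter values and scan counts are hypothetical.

```python
from typing import List, Sequence

def fed_avg(client_weights: Sequence[Sequence[float]],
            client_sizes: Sequence[int]) -> List[float]:
    """Weighted average of client parameter vectors (FedAvg-style aggregation).

    Only parameters leave each institution; the images themselves never do.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
            for k in range(n_params)]

# Two hypothetical hospitals with 100 and 300 local scans.
print(fed_avg([[1.0, 2.0], [3.0, 4.0]], [100, 300]))  # [2.5, 3.5]
```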

# テキスト画像検索に不可視の関連性バイアスをもたらすAI生成画像

AI-Generated Images Introduce Invisible Relevance Bias to Text-Image Retrieval ( http://arxiv.org/abs/2311.14084v1 )

License: see the linked page

Shicheng Xu, Danyang Hou, Liang Pang, Jingcheng Deng, Jun Xu, Huawei Shen, Xueqi Cheng

(参考訳) 世代モデルの発展に伴い、AIGC(AI- generated content)がより現実的になり、インターネットが溢れている。最近の研究は、この現象がウェブ検索のテキスト検索におけるソースバイアスの問題を増加させたことを示唆している。具体的には、ニューラル検索モデルは、人間が書いたテキストよりも高いテキストをランク付けする傾向にある。本稿では,このバイアスの研究をクロスモーダル検索に拡張する。まず,バイアスの存在を調べるための適切なベンチマークの構築に成功しました。このベンチマークのさらなる実験により、AI生成画像はテキスト画像検索モデルに目に見えない関連性バイアスをもたらすことが明らかになった。具体的には,テキスト画像検索モデルが,実際の画像よりも視覚的に関連した特徴を提示していないにもかかわらず,実際の画像よりもai生成画像を上位にランク付けする傾向があることを示す。この目に見えない関連性バイアスは、トレーニングデータやアーキテクチャの異なる検索モデルに共通している。さらに, 検索モデルの学習データにai生成画像が組み込まれることにより, 可視性バイアスが悪化することが明らかとなった。上記の現象は悪循環を引き起こし、目に見えない関連性バイアスがますます深刻になる。見えない関連性の潜在的原因を解明し、上記の問題に対処するために、目に見えない関連性バイアスを緩和するための効果的なトレーニング手法を提案する。次に,提案手法を適用して,視覚的関連性の原因を遡及的に同定し,AI生成画像が画像エンコーダを誘導し,その表現に付加情報を埋め込むことを示した。この情報は、異なる意味を持つ生成された画像間で一定の一貫性を示し、レトリバーが高い関連性スコアを推定することができる。

With the advancement of generative models, AI-generated content (AIGC) is becoming more realistic and is flooding the Internet. A recent study suggests that this phenomenon has elevated the issue of source bias in text retrieval for web searches. Specifically, neural retrieval models tend to rank generated texts higher than human-written texts. In this paper, we extend the study of this bias to cross-modal retrieval. First, we construct a suitable benchmark to explore the existence of the bias. Extensive experiments on this benchmark reveal that AI-generated images introduce an invisible relevance bias into text-image retrieval models. Specifically, text-image retrieval models tend to rank AI-generated images higher than real images, even though the AI-generated images do not exhibit more visually relevant features to the query than real images do. This invisible relevance bias is prevalent across retrieval models with varying training data and architectures. Furthermore, our subsequent exploration reveals that including AI-generated images in the training data of the retrieval models exacerbates the invisible relevance bias. This triggers a vicious cycle that makes the invisible relevance bias increasingly severe. To elucidate the potential causes of invisible relevance and address these issues, we introduce an effective training method aimed at alleviating the invisible relevance bias. We then apply our proposed debiasing method to retroactively identify the causes of invisible relevance. This reveals that AI-generated images induce the image encoder to embed additional information into their representations. This information exhibits a certain consistency across generated images with different semantics and can make the retriever estimate a higher relevance score.

Translated: 2023-11-27 23:02:49 / Published: 2023-11-23

# ファジィビット

The fuzzy bit ( http://arxiv.org/abs/2311.14083v1 )

License: see the linked page

Milagrosa Aldana, María A. Lledó

(参考訳) 本稿では,ファジィ論理とファジィ集合の観点から量子力学の定式化について述べる。ヒルベルト空間の射影子の格子であるバーホフ・ヴォン・ノイマン論理(英語版)(Birkhoff-von Neumann logic)に、(量子)論理(ある性質を持つ格子)とファジィ集合のある種の族との対応を確立するピカツの結果が適用される。 3つのケース: 量子ビット、2つの量子ビットが絡み合っており、2つのエンタングルされた量子ビットの中に「ネスト」がある。ファジィ集合の会員関数は明示的に計算され、ファジィ集合のすべての接続は、これらの特定の会員関数の操作として解釈される。このようにして、考慮されたシステムに対してファジィ集合の観点から標準量子論理の完全な図を得る。

In this paper, we explore a formulation of Quantum Mechanics in terms of fuzzy logic and fuzzy sets. A result by Pykacz establishes a correspondence between (quantum) logics (lattices with certain properties) and certain families of fuzzy sets. We apply it to the Birkhoff-von Neumann logic, the lattice of projectors of a Hilbert space. Three cases are considered: the qubit, two entangled qubits, and a qutrit 'nested' inside the two entangled qubits. The membership functions of the fuzzy sets are computed explicitly, and all the connectives of the fuzzy sets are interpreted as operations on these particular membership functions. In this way, a complete picture of the standard quantum logic in terms of fuzzy sets is obtained for the systems considered.

Translated: 2023-11-27 23:02:24 / Published: 2023-11-23
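
A natural membership function for the fuzzy set attached to a rank-one projector P = |u⟩⟨u| comes from the Born rule. It assigns each pure state ψ the probability ⟨ψ|P|ψ⟩ = |⟨u|ψ⟩|². The orthogonal projector then yields the standard fuzzy complement 1 − μ. The qubit-case sketch below assumes pure states given as (possibly unnormalized) amplitude pairs. It is only an illustration; the paper's construction follows Pykacz and may differ in detail.

```python
import math
from typing import Sequence

def membership(state: Sequence[complex], direction: Sequence[complex]) -> float:
    """Born-rule value |<u|psi>|^2 of the property 'projector onto u' in state psi."""
    norm_u = math.sqrt(sum(abs(c) ** 2 for c in direction))
    norm_psi = math.sqrt(sum(abs(c) ** 2 for c in state))
    amp = sum(u.conjugate() * p for u, p in zip(direction, state)) / (norm_u * norm_psi)
    return abs(amp) ** 2

plus = (1, 1)  # the |+> state, left unnormalized
print(round(membership(plus, (1, 0)), 6))  # 0.5
```

For any qubit state, the memberships of a projector and of its orthogonal complement sum to 1. That is the complement connective of the fuzzy-set picture.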

# 一度だけ説明すると

You Only Explain Once ( http://arxiv.org/abs/2311.14081v1 )

License: see the linked page

David A. Kelly, Hana Chockler, Daniel Kroening, Nathan Blake, Aditi Ramaswamy, Melane Navaratnarajah, Aaditya Shivakumar

(参考訳) 本稿では,対象検出器の出力を効率的に説明するための新しいブラックボックス説明可能性アルゴリズム, YO-ReXを提案する。新しいアルゴリズムは、画像で検出されたすべてのオブジェクトに対する説明を同時に計算する。したがって,新しいアルゴリズムは,ベースラインと比較して,検出対象が10個ある場合のクエリ数を10倍に削減する。スピードアップは、オブジェクトの数によってさらに増加する。実験の結果,YOLOの走行時間に対して,YOLOの出力を無視できるオーバーヘッドで説明できることがわかった。また、SSDとFaster R-CNNについても同様の結果を示す。この高速化は、アグレッシブプルーニングと因果解析を組み合わせることで、バックトラックを回避することで達成される。

In this paper, we propose YO-ReX, a new black-box explainability algorithm and tool for efficiently explaining the outputs of object detectors. The new algorithm computes explanations for all objects detected in an image simultaneously. Hence, compared with the baseline, it reduces the number of queries by a factor of 10 when ten objects are detected, and the speedup grows further with the number of objects. Our experimental results demonstrate that YO-ReX can explain the outputs of YOLO with negligible overhead over YOLO's running time. We also demonstrate similar results for explaining SSD and Faster R-CNN. The speedup is achieved by avoiding backtracking through a combination of aggressive pruning and causal analysis.

Translated: 2023-11-27 23:02:07 / Published: 2023-11-23

# GigaPose: 1つの対応による高速でロバストな新しいオブジェクトポス推定

GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence ( http://arxiv.org/abs/2311.14155v1 )

License: see the linked page

Van Nguyen Nguyen, Thibault Groueix, Mathieu Salzmann, Vincent Lepetit

(参考訳) 本稿では,RGB画像におけるCADに基づく新しいオブジェクトポーズ推定手法であるGigaPoseを提案する。 gigaposeはまず識別テンプレート、cadモデルのレンダリング画像を活用し、面外回転を回復し、残りの4つのパラメータをパッチ対応で推定する。提案手法では,通常の3倍ではなく,2自由度でのみテンプレートをサンプリングし,特徴空間の高速近傍探索を用いて入力画像とテンプレートをマッチングすることにより,最先端技術と比較して38倍の高速化率が得られる。さらに、GigaPoseはセグメンテーションエラーに対してはるかに堅牢である。 BOPチャレンジの7つのコアデータセットに対する広範な評価は、最先端の精度を実現し、改良手法とシームレスに統合できることを示しています。さらに,1枚の画像から3次元再構成を行い,CADモデルの必要性を緩和し、6次元ポーズオブジェクト推定をより便利にするための3次元モデルによるGigaPoseの可能性を示す。私たちのソースコードとトレーニングされたモデルはhttps://github.com/nv-nguyen/gigaPoseで公開されています。

We present GigaPose, a fast, robust, and accurate method for CAD-based novel object pose estimation in RGB images. GigaPose first leverages discriminative templates (rendered images of the CAD models) to recover the out-of-plane rotation. It then uses patch correspondences to estimate the four remaining parameters. Our approach samples templates in only a two-degrees-of-freedom space instead of the usual three. It matches the input image to the templates using fast nearest-neighbor search in feature space, resulting in a 38x speedup compared with the state of the art. Moreover, GigaPose is significantly more robust to segmentation errors. Our extensive evaluation on the seven core datasets of the BOP challenge demonstrates that it achieves state-of-the-art accuracy and can be seamlessly integrated with a refinement method. Additionally, we show the potential of GigaPose with 3D models predicted by recent work on 3D reconstruction from a single image. This relaxes the need for CAD models and makes 6D object pose estimation much more convenient. Our source code and trained models are publicly available at https://github.com/nv-nguyen/gigaPose.

Translated: 2023-11-27 16:42:19 / Published: 2023-11-23
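
The template-matching step can be sketched as a nearest-neighbor search by cosine similarity in feature space. In this sketch each template stands for one sampled out-of-plane rotation. The 2-D feature vectors are hypothetical placeholders for learned descriptors, and the brute-force search stands in for the fast nearest-neighbor search the abstract mentions. This is not GigaPose's code.

```python
import math
from typing import Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest_template(query: Sequence[float], templates: Sequence[Sequence[float]]) -> int:
    """Index of the template whose feature vector is most similar to the query."""
    return max(range(len(templates)), key=lambda i: cosine(query, templates[i]))

# Hypothetical 2-D features; each template would correspond to one sampled
# out-of-plane rotation of the CAD model.
templates = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(nearest_template([0.6, 0.8], templates))  # 2
```

Sampling rotations over two degrees of freedom instead of three shrinks the template set that this search must scan.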

# Tube-NeRF:Tube-Guided Data AugmentationとNeRFを用いたMPCからのVisuomotor Policiesの効率的な模倣学習

Tube-NeRF: Efficient Imitation Learning of Visuomotor Policies from MPC using Tube-Guided Data Augmentation and NeRFs ( http://arxiv.org/abs/2311.14153v1 )

License: see the linked page

Andrea Tagliabue, Jonathan P. How

(参考訳) 模倣学習(il)は、リソース集約型モデル予測コントローラ(mpc)から計算効率の高いセンサモジュレータポリシをトレーニングできるが、多くのサンプルを必要とするため、長いトレーニング時間や限定的な堅牢性が求められる。これらの問題に対処するために,il と不確実性を考慮したロバストな mpc の変種を組み合わせることで,視覚に基づくポリシの効率的な学習を可能にするデータ拡張 (da) 戦略を設計する。提案手法はneural radiance field (nerfs) を利用して新しい合成画像を生成し、ロバストなmpc(チューブ)の特性を利用して関連するビューを選択し、対応するアクションを効率的に計算する。搭載カメラからの映像を水平位置のみのソースとして制御動作を生成するビジュモータポリシーを学習することにより、マルチロータ上での局所化と軌道追跡のタスクに対する我々のアプローチを調整する。実演効率を80倍に向上し,現行のIL法に比べて50%のトレーニング時間を短縮し,ロバストなビズモータ政策の学習を数値的に示す。さらに,本手法は実マルチロケータへの移行に成功し,大きな乱れがあっても正確なローカライズと低トラッキングエラーを実現し,オンボード推算時間は1.5msであった。

Imitation learning (IL) can train computationally efficient sensorimotor policies from a resource-intensive Model Predictive Controller (MPC). However, it often requires many samples, leading to long training times or limited robustness. To address these issues, we combine IL with a variant of robust MPC that accounts for process and sensing uncertainties. We also design a data augmentation (DA) strategy that enables efficient learning of vision-based policies. The proposed DA method, named Tube-NeRF, leverages Neural Radiance Fields (NeRFs) to generate novel synthetic images. It uses properties of the robust MPC (the tube) to select relevant views and to efficiently compute the corresponding actions. We tailor our approach to the task of localization and trajectory tracking on a multirotor. We learn a visuomotor policy that generates control actions using images from the onboard camera as the only source of horizontal position. Our evaluations numerically demonstrate learning of a robust visuomotor policy with an 80-fold increase in demonstration efficiency and a 50% reduction in training time over current IL methods. Additionally, our policies successfully transfer to a real multirotor, achieving accurate localization and low tracking errors despite large disturbances, with an onboard inference time of only 1.5 ms.

Translated: 2023-11-27 16:41:53 / Published: 2023-11-23
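
In tube MPC, a nominal plan (x̄, ū) is paired with an ancillary feedback law u = ū + K(x − x̄) that keeps the true state inside a tube around the plan. This gives a cheap way to label an augmented sample: a synthetic view rendered from a perturbed state inside the tube can be paired with an action computed from the same formula, without re-solving the MPC. Whether Tube-NeRF uses a constant linear gain K is an assumption, and the numbers below are hypothetical.

```python
from typing import List, Sequence

def ancillary_control(u_nominal: Sequence[float], K: Sequence[Sequence[float]],
                      x: Sequence[float], x_nominal: Sequence[float]) -> List[float]:
    """Tube-MPC-style feedback: u = u_bar + K (x - x_bar), row by row."""
    return [u_n + sum(k * (xi - xn) for k, xi, xn in zip(row, x, x_nominal))
            for u_n, row in zip(u_nominal, K)]

# Hypothetical 1-D position/velocity example: the state is displaced from the plan.
K = [[-2.0, -1.0]]
print(ancillary_control([0.0], K, x=[1.0, 0.5], x_nominal=[0.0, 0.0]))  # [-2.5]
```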

# 経時的立方体PatchGAN(TCuP-GAN)を用いた3次元腫瘍切除

Automated 3D Tumor Segmentation using Temporal Cubic PatchGAN (TCuP-GAN) ( http://arxiv.org/abs/2311.14148v1 )

License: see the linked page

Kameswara Bharadwaj Mantha, Ramanakumar Sankar, Lucy Fortson

(参考訳) 最新の深層学習技術を用いた堅牢な汎用3Dセグメンテーションフレームワークの開発は,様々なバイオメディカル領域において活発な話題の1つである。本研究では,3次元セグメンテーションの課題に対して,畳み込み長短期記憶ネットワーク(LSTM)を用いた生成的特徴学習フレームワークの概念をマージするボリューム・ツー・ボリューム翻訳モデルであるテンポラルキュービック・パッチGAN(TCuP-GAN)を紹介する。われわれは2023年のBrain tumor Segmentation (BraTS) Challengeで紹介された4つのセグメンテーション課題(Adult Glioma, Meningioma, Pediatric tumors, Sub-Saharan Africa subset)のデータに基づいてTCuP-GANの能力を実証し、LesionWise Dice類似度と95%のHausdorff Distance測定値を用いてその性能を定量化する。我々は,すべての課題に対してロバストなマルチクラスセグメンテーションマスクを予測するためのフレームワークの学習を成功させた。このベンチマーク作業は、将来TCuP-GANを電子顕微鏡イメージングにおける多臓器分割のような他のマルチクラスタスクに適用するための足掛かりとなる。

Developing robust general-purpose 3D segmentation frameworks using the latest deep learning techniques is an active topic in various biomedical domains. In this work, we introduce Temporal Cubic PatchGAN (TCuP-GAN), a volume-to-volume translational model for 3D segmentation. It marries the concepts of a generative feature learning framework with Convolutional Long Short-Term Memory Networks (LSTMs). We demonstrate the capabilities of TCuP-GAN on data from four segmentation challenges (Adult Glioma, Meningioma, Pediatric Tumors, and Sub-Saharan Africa subset) featured within the 2023 Brain Tumor Segmentation (BraTS) Challenge. We quantify its performance using the LesionWise Dice similarity and $95\%$ Hausdorff Distance metrics. We show that our framework successfully learns to predict robust multi-class segmentation masks across all the challenges. This benchmarking work serves as a stepping stone for future efforts to apply TCuP-GAN to other multi-class tasks, such as multi-organelle segmentation in electron microscopy imaging.

Translated: 2023-11-27 16:41:25 / Published: 2023-11-23
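
The Dice similarity used in the evaluation is 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. The sketch below computes the plain voxel-wise version on flattened binary masks. The BraTS "LesionWise" variant instead matches individual connected lesions and scores them separately, which this sketch does not attempt. Treating two empty masks as a perfect match is a common convention but an assumption here.

```python
from typing import Sequence

def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Voxel-wise Dice coefficient of two flattened binary masks."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 1.0 if total == 0 else 2 * inter / total  # both empty: perfect agreement

print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```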

# アクティブラーニングを用いたドメイン適応意味セグメンテーションのためのクラスバランス動的獲得

Class Balanced Dynamic Acquisition for Domain Adaptive Semantic Segmentation using Active Learning ( http://arxiv.org/abs/2311.14146v1 )

ライセンス: Link先を確認

Marc Schachtsiek and Simone Rossi and Thomas Hannagan

(参考訳) ドメイン適応型アクティブラーニングは、ニューラルネットワークのラベル効率の良い学習を牽引している。セマンティックセグメンテーションでは、最先端のモデルは不確実性と多様性という2つの基準を併用して学習ラベルを選択し、ピクセル単位の獲得戦略と組み合わせている。しかし、このような手法は現在、アクティブラーニングの予算が大きくなると性能が低下するクラス不均衡問題を抱えていることを示す。そこで、特に高予算の設定においてこの問題を緩和する新しいアクティブラーニング手法であるClass Balanced Dynamic Acquisition (CBDA)を導入する。よりバランスの取れたラベルによってマイノリティクラスの性能が向上し、それによりモデルは5%、10%、20%の予算でそれぞれ0.6、1.7、2.4 mIoUだけ従来のベースラインを上回る。さらに、マイノリティクラスへの注目により、最小クラス性能がそれぞれ0.5、2.9、4.6 IoU改善する。最も性能の高いモデルは完全教師ありベースラインをも上回っており、グラウンドトゥルース全体よりもバランスの取れたラベルが有益となり得ることを示している。

Domain adaptive active learning is leading the charge in label-efficient training of neural networks. For semantic segmentation, state-of-the-art models jointly use two criteria of uncertainty and diversity to select training labels, combined with a pixel-wise acquisition strategy. However, we show that such methods currently suffer from a class imbalance issue which degrades their performance for larger active learning budgets. We then introduce Class Balanced Dynamic Acquisition (CBDA), a novel active learning method that mitigates this issue, especially in high-budget regimes. The more balanced labels increase minority class performance, which in turn allows the model to outperform the previous baseline by 0.6, 1.7, and 2.4 mIoU for budgets of 5%, 10%, and 20%, respectively. Additionally, the focus on minority classes leads to improvements of the minimum class performance of 0.5, 2.9, and 4.6 IoU respectively. The top-performing model even exceeds the fully supervised baseline, showing that a label set more balanced than the entire ground truth can be beneficial.
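To make the idea of class-balanced acquisition concrete, here is a hypothetical greedy heuristic. It is not the CBDA algorithm from the paper. Each candidate pixel's uncertainty is scaled by an inverse-frequency weight for its predicted class, and the weights are recomputed after every pick so that a frequent class cannot take the whole budget. The candidate tuple layout and the `1 / (1 + count)` weighting are assumptions.

```python
from collections import Counter


def class_balanced_acquire(candidates, labeled_classes, budget):
    """Greedily pick pixels, favouring classes that are rare among existing labels.

    candidates: list of (pixel_id, predicted_class, uncertainty) tuples (hypothetical layout)
    labeled_classes: classes of the pixels that are already labeled
    budget: number of pixels to acquire
    """
    counts = Counter(labeled_classes)
    pool = list(candidates)
    selected = []
    for _ in range(min(budget, len(pool))):
        # Inverse-frequency weighting; counts are updated after each pick ("dynamic").
        best = max(pool, key=lambda c: c[2] / (1 + counts[c[1]]))
        pool.remove(best)
        selected.append(best[0])
        counts[best[1]] += 1
    return selected


cands = [("a", "road", 0.9), ("b", "road", 0.85), ("c", "bike", 0.5)]
print(class_balanced_acquire(cands, ["road"] * 9, budget=2))  # prints ['c', 'a']
```

Pure uncertainty sampling would pick `a` and `b` here, both from the already dominant `road` class.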

翻訳日:2023-11-27 16:41:00 公開日:2023-11-23

# 半導体中の2次元水素原子に対する宇宙弦の影響とRytova-Keldysh対数近似との関係

Cosmic string influence on a 2D hydrogen atom and its relationship with the Rytova-Keldysh logarithmic approximation in semiconductors ( http://arxiv.org/abs/2311.14144v1 )

ライセンス: Link先を確認

Frankbelson dos S. Azevedo, Izael A. Lima, Gallileu Genesis, Rodolfo Casana, Edilberto O. Silva

(参考訳) 二次元水素原子は、まっすぐな宇宙ひもの存在下での電子と陽子の量子相互作用を記述するための有望な代替手段を提供する。水素原子を2次元に縮約することで、宇宙ひもに付随する円筒/円錐対称性をより適切に捉えられるようになり、物理系のより適切な記述が得られる。シュレーディンガー方程式を解いた後、位相欠陥の影響下で対数ポテンシャルを持つ水素原子の固有エネルギー、確率分布関数、および期待値を計算する。2次元水素原子の計算は、有限差分法を用いて初めて行われる。結果はグラフ、表、図を通して示され、系の物理的性質を明らかにする。計算結果が線形変分法の結果と一致することを確認した。本モデルは、特定の半導体領域内に位置する2次元単層半導体中の励起子との興味深い類似性を導く。この類似性を明らかにするため、いくつかの相互作用ポテンシャルとその励起子固有状態を提示し、文献の結果と比較して議論する。

A two-dimensional hydrogen atom offers a promising alternative for describing the quantum interaction between an electron and a proton in the presence of a straight cosmic string. Reducing the hydrogen atom to two dimensions makes it better suited to capture the cylindrical/conical symmetry associated with the cosmic string, providing a more appropriate description of the physical system. After solving Schrödinger's equation, we calculate the eigenenergies, probability distribution function, and expectation values for the hydrogen atom with logarithmic potential under the influence of the topological defect. The calculations for the 2D hydrogen atom are performed for the first time using the Finite Difference Method. The results are presented through graphics, tables, and diagrams to elucidate the system's physical properties. We have verified that our calculations agree with the results of a linear variational method. Our model leads to an interesting analogy with excitons in a two-dimensional monolayer semiconductor located within a specific semiconductor region. To elucidate this analogy, we present and discuss some interaction potentials and their exciton eigenstates by comparing them with results from the literature.
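The following is a minimal finite-difference sketch of a 2D radial eigenvalue problem on a cone, written under stated assumptions rather than taken from the paper. It uses atomic units and the conical metric ds² = dr² + α²r²dφ², so the angular number m becomes ℓ = |m|/α. It substitutes u = √r R and applies a central second difference on a uniform grid. The lowest eigenvalue of the resulting tridiagonal matrix is found by Sturm-count bisection. The paper's logarithmic potential has no closed-form spectrum, so the sketch defaults to a Coulomb test potential −1/r. For that potential the exact ground energy for a given ℓ is −1/(2(ℓ + 1/2)²), which gives a built-in check. All function names and grid parameters are assumptions.

```python
def lowest_eigenvalue(diag, off):
    """Smallest eigenvalue of a symmetric tridiagonal matrix with constant off-diagonal.

    Bisection on the Sturm count: the number of negative pivots of the LDL^T
    factorisation of (T - x I) equals the number of eigenvalues below x.
    """
    def count_below(x):
        count, q = 0, 1.0
        for i, d in enumerate(diag):
            q = d - x - (off * off / q if i > 0 else 0.0)
            if q == 0.0:
                q = 1e-300  # tiny perturbation to avoid division by zero
            if q < 0.0:
                count += 1
        return count

    lo = min(diag) - 2.0 * abs(off)  # Gershgorin lower bound
    hi = max(diag) + 2.0 * abs(off)  # Gershgorin upper bound
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if count_below(mid) >= 1:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def radial_ground_energy(m=1, alpha=1.0, potential=None, h=0.05, r_max=40.0):
    """Lowest radial energy (atomic units) of a 2D one-electron atom on a cone.

    Assumed radial problem, with u(r) = sqrt(r) R(r) and ell = |m| / alpha:
        -u''/2 + [(ell^2 - 1/4) / (2 r^2) + V(r)] u = E u,  u(0) = u(r_max) = 0.
    Defaults to the Coulomb test potential V(r) = -1/r.
    """
    if potential is None:
        def potential(r):
            return -1.0 / r
    ell = abs(m) / alpha
    n = int(round(r_max / h)) - 1  # interior grid points r_i = i * h
    grid = [(i + 1) * h for i in range(n)]
    diag = [1.0 / h**2 + (ell**2 - 0.25) / (2.0 * r**2) + potential(r) for r in grid]
    off = -0.5 / h**2  # from the central second difference
    return lowest_eigenvalue(diag, off)


flat = radial_ground_energy(m=1, alpha=1.0)   # close to -2/9
cone = radial_ground_energy(m=1, alpha=0.5)   # close to -0.08 (ell = 2)
```

A logarithmic interaction could be passed as, for example, `potential=lambda r: k * math.log(r / r0)`, where `k` and `r0` are hypothetical constants. Because such a potential grows with r, `r_max` must be large enough for the boundary not to matter. The Coulomb default is used with m = 1 because the m = 0 case has an attractive −1/(8r²) term and converges more slowly on this grid.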

翻訳日:2023-11-27 16:40:42 公開日:2023-11-23

# 量子コンピュータにおける粗粒タンパク質折り畳み問題の解法

An approach to solve the coarse-grained Protein folding problem in a Quantum Computer ( http://arxiv.org/abs/2311.14141v1 )

ライセンス: Link先を確認

Jaya Vasavi P, Soham Bopardikar, Avinash D, Ashwini K, Kalyan Dasgupta, Sanjib Senapati

(参考訳) アミノ酸配列からタンパク質の構造を決定するタンパク質の折り畳みは、半世紀にわたる生物学の問題である。タンパク質の機能はその構造と相関しており、生体内で起こる細胞・分子のメカニズムを研究するためにタンパク質の折り畳みを理解する必要性を強調している。タンパク質の構造と酵素の理解は、標的ベースの薬物設計、タンパク質関連疾患メカニズムの解明、新規酵素の創出において重要な役割を担っている。AIに基づくタンパク質構造予測法の最近の進歩はタンパク質の折り畳み問題をある程度解決しているが、配列類似性の低いタンパク質の構造決定における精度は限られている。古典的手法は広範なコンフォメーションサンプリングの生成に困難を抱えており、量子ベースのアプローチはタンパク質折り畳み問題の解決に有利である。本研究では、HPモデルを初期枠組みとして、より小さなタンパク質配列の構造を予測するためにゲート型量子コンピュータ上で実行可能な新しいターンベースのエンコーディングアルゴリズムを開発した。これは将来、より大規模で複雑なタンパク質系への適用に拡張できる。HPモデルは、タンパク質の折り畳み現象における主要な段階である、疎水性アミノ酸をタンパク質の内部へと運ぶ疎水性崩壊を最もよく表現している。折り畳み問題は、直交軸に平行な辺に沿った自由度と、軸平面に平行な対角線に沿った自由度を持つ3次元立方格子上で定式化される。高次項を含む元の定式化はゲート型量子ハードウェア上で実行できる一方、QUBO定式化はアニーラやIBM CPLEXを用いた古典ソフトウェアと量子ハードウェアの両方で結果を得ることができる。

Protein folding, which dictates the protein structure from its amino acid sequence, is a half-century-old problem in biology. The function of a protein correlates with its structure, emphasizing the need to understand protein folding for studying the cellular and molecular mechanisms that occur within biological systems. Understanding protein structures and enzymes plays a critical role in target-based drug design, elucidating protein-related disease mechanisms, and innovating novel enzymes. While recent advancements in AI-based protein structure prediction methods have solved the protein folding problem to an extent, their precision in determining the structure of proteins with low sequence similarity is limited. Classical methods face challenges in generating extensive conformational samplings, making quantum-based approaches advantageous for solving protein folding problems. In this work we developed a novel turn-based encoding algorithm that can be run on a gate-based quantum computer for predicting the structure of smaller protein sequences using the HP model as an initial framework, which can be extrapolated in its application to larger and more intricate protein systems in future. The HP model best represents a major step in protein folding phenomena: the hydrophobic collapse, which brings the hydrophobic amino acids to the interior of a protein. The folding problem is cast in a 3D cubic lattice with degrees of freedom along edges parallel to the orthogonal axes, as well as along diagonals parallel to the axial planes. While the original formulation with higher-order terms can be run on gate-based quantum hardware, the QUBO formulation can give results on both classical software employing annealers and IBM CPLEX as well as on quantum hardware.
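The HP model's objective can be shown classically. The sketch below evaluates the standard HP energy of one lattice conformation: −1 for every pair of H residues that are lattice neighbours but not consecutive in the chain. It does not show the paper's turn-based qubit encoding or its Hamiltonian. As a simplifying assumption, it allows only axis-aligned unit steps, whereas the paper's lattice also permits diagonal moves in the axial planes.

```python
def hp_energy(sequence, coords):
    """HP-model energy: -1 per non-bonded H-H nearest-neighbour contact.

    sequence: string of 'H'/'P'; coords: integer (x, y, z) sites, one per residue.
    Raises ValueError for chains that overlap or are not connected by unit steps.
    """
    def manhattan(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))

    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for a, b in zip(coords, coords[1:]):
        if manhattan(a, b) != 1:  # axis-aligned steps only (simplifying assumption)
            raise ValueError("consecutive residues must be lattice neighbours")
    energy = 0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):  # skip bonded pairs (j = i + 1)
            if sequence[i] == sequence[j] == "H" and manhattan(coords[i], coords[j]) == 1:
                energy -= 1
    return energy


# A U-shaped fold brings the two terminal H residues into contact.
print(hp_energy("HPPH", [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]))  # prints -1
```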

翻訳日:2023-11-27 16:40:25 公開日:2023-11-23

# 説明可能な医療保険費用予測のための機械学習

Machine Learning For An Explainable Cost Prediction of Medical Insurance ( http://arxiv.org/abs/2311.14139v1 )

ライセンス: Link先を確認

Ugochukwu Orji and Elochukwu Ukwandu

(参考訳) 生産性と効率を高めるために機械学習アプローチの可能性を最大限に活用しようとする保険会社が増えるにつれ、医療分野における予測モデリングは保険数理の活発な研究テーマであり続けている。本稿では、Extreme Gradient Boosting、Gradient-boosting Machine、Random Forestによって決定木のバリエーションを組み合わせる3つの回帰ベースのアンサンブルMLモデルを、医療保険コストの予測に用いた。説明可能な人工知能(XAI)手法であるSHapley Additive exPlanations (SHAP)とIndividual Conditional Expectation (ICE)プロットを用いて、データセットにおける医療保険のプレミアム価格に影響を与える主要な決定要因を発見し説明した。使用したデータセットは986件のレコードからなり、KAGGLEリポジトリで公開されている。モデルは、決定係数(R二乗)、平均絶対誤差、二乗平均平方根誤差、平均絶対パーセンテージ誤差の4つの性能評価指標を用いて評価した。結果は、すべてのモデルが優れた成果を示したことを示している。XGBoostモデルはより多くの計算資源を消費したものの全体的な性能は最も良く、RFモデルは予測誤差がより小さく、XGBoostモデルよりもはるかに少ない計算資源で済んだ。さらに、各モデルについてPremiumPricesに影響を与えた主要な決定要因の特定における両XAI手法の結果を比較したところ、両手法は類似した結果を示したが、ICEプロットは、より高レベルな分析にとどまるSHAP解析よりも各変数間の相互作用を詳細に示した。本研究の貢献が、政策立案者、保険会社、そして医療保険の購入を検討している人々が、各自のニーズに合った適切な保険を選択する意思決定の助けとなることを期待する。

Predictive modeling in healthcare continues to be an active actuarial research topic as more insurance companies aim to maximize the potential of Machine Learning approaches to increase their productivity and efficiency. In this paper, the authors deployed three regression-based ensemble ML models that combine variations of decision trees through Extreme Gradient Boosting, Gradient-boosting Machine, and Random Forest methods to predict medical insurance costs. Explainable Artificial Intelligence methods, SHapley Additive exPlanations and Individual Conditional Expectation plots, were deployed to discover and explain the key determinant factors that influence medical insurance premium prices in the dataset. The dataset used comprises 986 records and is publicly available in the KAGGLE repository. The models were evaluated using four performance evaluation metrics: R-squared, Mean Absolute Error, Root Mean Squared Error, and Mean Absolute Percentage Error. The results show that all models produced impressive outcomes; however, the XGBoost model achieved a better overall performance although it also expended more computational resources, while the RF model recorded a lower prediction error and consumed far fewer computing resources than the XGBoost model. Furthermore, we compared the outcome of both XAI methods in identifying the key determinant features that influenced the PremiumPrices for each model, and whereas both XAI methods produced similar outcomes, we found that the ICE plots showed the interactions between each variable in more detail than the SHAP analysis, which seemed to be more high-level. It is the aim of the authors that the contributions of this study will help policymakers, insurers, and potential medical insurance buyers in their decision-making process for selecting the right policies that meet their specific needs.
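The four evaluation metrics named in the abstract have standard definitions. A dependency-free sketch is given below, with hypothetical toy premium values. MAPE is undefined when a true value is zero, which is not an issue for strictly positive premium prices.

```python
import math


def regression_metrics(y_true, y_pred):
    """R-squared, MAE, RMSE and MAPE (in percent) for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "mape": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


scores = regression_metrics([100, 200, 300], [110, 190, 300])
# r2 = 0.99, mae ≈ 6.667, rmse ≈ 8.165, mape ≈ 5.0
```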

翻訳日:2023-11-27 16:39:54 公開日:2023-11-23

# プライバシー保護アルゴリズミックリコース

Privacy-Preserving Algorithmic Recourse ( http://arxiv.org/abs/2311.14137v1 )

ライセンス: Link先を確認

Sikha Pentyala, Shubham Sharma, Sanjay Kariyappa, Freddy Lecue, Daniele Magazzeni

(参考訳) 個人が機械学習モデルから不利な結果を受ける場合、好ましい結果を得る助けとなるリコースパスを提供することが望ましい。最近の研究により、反事実的説明(一段階のリコースの手段として使用できる)がプライバシー上の問題に対して脆弱であり、個人のプライバシーを危険にさらすことが示されている。逐次的な多段階のリコースパスを提供すると、このリスクはさらに増幅されうる。さらに、既存の手法で得られたリコースパスに単にノイズを加えるだけでは、エンドユーザーにとってのパスの現実性と実行可能性が損なわれる可能性がある。本研究では、インスタンスベースの反事実的説明に基づいて現実的なリコースパスを生成する際のプライバシー問題に取り組み、現実的なリコースパスを提供できるエンドツーエンドのプライバシー保護パイプラインであるPrivRecourseを提案する。PrivRecourseは、差分プライベート(DP)クラスタリングを用いてプライベートデータセットの重複しない部分集合を表現する。これらのDPクラスタ中心をノードとするグラフを構築してリコースパスを生成することで、現実的な(実現可能かつ実行可能な)リコースパスを生成できる。金融データセットで本手法を実証的に評価し、データインスタンスに単にノイズを加える方法、およびDP合成データを用いてグラフを生成する方法と比較した。その結果、PrivRecourseがプライベートかつ現実的なパスを提供できることを確認した。

When individuals are subject to adverse outcomes from machine learning models, providing a recourse path to help achieve a positive outcome is desirable. Recent work has shown that counterfactual explanations, which can be used as a means of single-step recourse, are vulnerable to privacy issues, putting individuals' privacy at risk. Providing a sequential multi-step path for recourse can amplify this risk. Furthermore, simply adding noise to recourse paths found from existing methods can impact the realism and actionability of the path for an end-user. In this work, we address privacy issues when generating realistic recourse paths based on instance-based counterfactual explanations, and provide PrivRecourse: an end-to-end privacy preserving pipeline that can provide realistic recourse paths. PrivRecourse uses differentially private (DP) clustering to represent non-overlapping subsets of the private dataset. These DP cluster centers are then used to generate recourse paths by forming a graph with cluster centers as the nodes, so that we can generate realistic (feasible and actionable) recourse paths. We empirically evaluate our approach on finance datasets and compare it to simply adding noise to data instances, and to using DP synthetic data, to generate the graph. We observe that PrivRecourse can provide paths that are private and realistic.
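The graph step can be illustrated with a hypothetical sketch. The DP clustering is assumed to have already produced the cluster centres; it is not shown. Two centres are connected when their Euclidean distance is at most `max_step`, an assumed stand-in for the paper's feasibility constraints. Dijkstra's algorithm then returns the cheapest path to any centre with a favourable outcome. The edge rule and weights used in PrivRecourse itself may differ.

```python
import heapq
import math


def recourse_path(centers, positive, start, max_step):
    """Shortest path over a graph whose nodes are (already DP) cluster centres.

    centers: list of feature tuples; positive: indices with a favourable outcome;
    start: index of the centre nearest the individual; max_step: hypothetical
    edge threshold. Returns a list of centre indices, or None if unreachable.
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u in positive:
            path = [u]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return path[::-1]
        for v, c in enumerate(centers):
            w = math.dist(centers[u], c)
            if v != u and w <= max_step and d + w < dist.get(v, math.inf):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    return None


centres = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 5.0)]
print(recourse_path(centres, positive={2, 3}, start=0, max_step=1.5))  # prints [0, 1, 2]
```

Limiting edge length keeps each recommended step small, which is one simple way to encode "feasible". A real system would also restrict immutable features.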

翻訳日:2023-11-27 16:39:24 公開日:2023-11-23

# IoT上での協調機械学習のためのブロックチェーンソリューション

A Blockchain Solution for Collaborative Machine Learning over IoT ( http://arxiv.org/abs/2311.14136v1 )

ライセンス: Link先を確認

Carlos Beis-Penedo and Francisco Troncoso-Pastoriza and Rebeca P. D\'iaz-Redondo and Ana Fern\'andez-Vilas and Manuel Fern\'andez-Veiga and Mart\'in Gonz\'alez Soto

(参考訳) IoT(Internet of Things、モノのインターネット)デバイスとアプリケーションの急速な成長は、データのプライバシ、セキュリティ、スケーラビリティに関わる課題を処理可能な高度な分析と機械学習技術に対する需要の増加につながった。フェデレーテッドラーニング(FL)とブロックチェーン技術は、分散データソース上での分散型、セキュア、プライバシ保護のモデル学習を可能にすることによって、これらの課題に対処するための有望なアプローチとして登場した。本稿では、インクリメンタル学習ベクトル量子化アルゴリズム(XuILVQ)とEthereumブロックチェーン技術を組み合わせて、分散環境におけるセキュアで効率的なデータ共有、モデル学習、プロトタイプ保存を実現する新たなIoTソリューションを提案する。提案アーキテクチャは、データプライバシとセキュリティを維持しながら計算オーバーヘッドと通信オーバーヘッドを削減することにより、既存のブロックチェーンベースのFLソリューションの欠点に対処する。IoT環境における機械学習タスクの精度と効率を向上させる可能性を示す一連の実験を通じて、システムの性能を評価する。

The rapid growth of Internet of Things (IoT) devices and applications has led to an increased demand for advanced analytics and machine learning techniques capable of handling the challenges associated with data privacy, security, and scalability. Federated learning (FL) and blockchain technologies have emerged as promising approaches to address these challenges by enabling decentralized, secure, and privacy-preserving model training on distributed data sources. In this paper, we present a novel IoT solution that combines the incremental learning vector quantization algorithm (XuILVQ) with Ethereum blockchain technology to facilitate secure and efficient data sharing, model training, and prototype storage in a distributed environment. Our proposed architecture addresses the shortcomings of existing blockchain-based FL solutions by reducing computational and communication overheads while maintaining data privacy and security. We assess the performance of our system through a series of experiments, showcasing its potential to enhance the accuracy and efficiency of machine learning tasks in IoT settings.
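XuILVQ belongs to the learning vector quantization (LVQ) family, which keeps labelled prototypes instead of raw data. That property is what makes storing prototypes on-chain plausible. The sketch below shows only the classic LVQ1 update and should not be read as XuILVQ: the nearest prototype moves toward a sample with the same label and away from a sample with a different label. XuILVQ's incremental insertion and removal of prototypes and the Ethereum storage layer are not shown. The data layout and learning rate are assumptions.

```python
def lvq1_update(prototypes, x, label, lr=0.1):
    """One LVQ1 step on a list of [vector, label] prototypes (modified in place).

    Returns the index of the winning (nearest) prototype.
    """
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    winner = min(range(len(prototypes)), key=lambda i: sq_dist(prototypes[i][0], x))
    vec, proto_label = prototypes[winner]
    sign = 1.0 if proto_label == label else -1.0  # attract if labels match, else repel
    prototypes[winner][0] = [w + sign * lr * (xi - w) for w, xi in zip(vec, x)]
    return winner


protos = [[[0.0, 0.0], "a"], [[1.0, 1.0], "b"]]
lvq1_update(protos, [0.2, 0.0], "a", lr=0.5)  # prototype 0 moves to about [0.1, 0.0]
```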

翻訳日:2023-11-27 16:39:04 公開日:2023-11-23

# 準隠れ分子自由度を用いた量子計算のためのロバストな枠組み

A robust framework for quantum computation using quasi-hidden molecular degrees of freedom ( http://arxiv.org/abs/2311.14133v1 )

ライセンス: Link先を確認

Martin Zeppenfeld

(参考訳) 本稿では、環境からも分子の他の部分からも隔離された分子の自由度に基づく、分子による量子情報処理の新しいアプローチについて論じる。このような自由度は、ノイズの多い環境でも長期の量子ストレージを提供でき、分子の残りの部分と外部システムとの間で量子操作が行われている間も、独立した保護された量子メモリとなる。分子内で準隠れ自由度を実現するいくつかの可能性を示し、そのような自由度を実際に利用するいくつかの例について論じる。準隠れ自由度を用いることで、分子ベースの量子コンピュータの見通しが大幅に改善される可能性がある。

We discuss a novel approach to quantum information processing with molecules based on molecular degrees of freedom which are isolated from the environment as well as from the rest of the molecule. Such a degree of freedom can provide long-term quantum storage even in a noisy environment, and provides an independent protected quantum memory while quantum operations are performed between the rest of the molecule and external systems. We present several possibilities for realizing a quasi-hidden degree of freedom in a molecule, and discuss a number of examples for using such a degree of freedom in practice. Using quasi-hidden degrees of freedom could substantially improve the prospects for a molecule-based quantum computer.

翻訳日:2023-11-27 16:38:47 公開日:2023-11-23

# 動的システムのための厳密に保存的な物理インフォームドニューラルネットワークとディープオペレーターネットワーク

Exactly conservative physics-informed neural networks and deep operator networks for dynamical systems ( http://arxiv.org/abs/2311.14131v1 )

ライセンス: Link先を確認

Elsa Cardoso-Bihlo and Alex Bihlo

(参考訳) 本稿では、動的システムに対する厳密に保存的な物理インフォームドニューラルネットワークおよび物理インフォームドディープオペレーターネットワークの学習手法を提案する。この手法は、少なくとも1つの第一積分を持つ任意の動的システムについて、ニューラルネットワークソルバが学習した候補解を不変多様体上へ写像する射影ベースの手法を用いる。動的システムに対する厳密に保存的な物理インフォームドニューラルネットワークソルバおよび物理インフォームドディープオペレーターネットワークが、数理科学におけるいくつかの実世界の問題において、非保存的な対応手法を大きく上回ることを示す。

We introduce a method for training exactly conservative physics-informed neural networks and physics-informed deep operator networks for dynamical systems. The method employs a projection-based technique that maps a candidate solution, learned by the neural network solver for any given dynamical system possessing at least one first integral, onto an invariant manifold. We illustrate that exactly conservative physics-informed neural network solvers and physics-informed deep operator networks for dynamical systems vastly outperform their non-conservative counterparts on several real-world problems from the mathematical sciences.
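One generic way to realise "project a candidate solution onto the invariant manifold" is to take Newton steps along the gradient of the first integral I until I(x) matches its initial value. The sketch below does this and checks it on the harmonic-oscillator energy H(q, p) = (q² + p²)/2. The update rule, tolerances and names are assumptions for illustration. The paper's projection, and how it is attached to the network output, may be constructed differently.

```python
def project_to_level_set(x, first_integral, grad, target, tol=1e-12, max_iter=50):
    """Move x onto {x : I(x) = target} with Newton steps along grad I(x)."""
    x = list(x)
    for _ in range(max_iter):
        residual = first_integral(x) - target
        if abs(residual) < tol:
            break
        g = grad(x)
        g2 = sum(gi * gi for gi in g)
        x = [xi - residual * gi / g2 for xi, gi in zip(x, g)]
    return x


def energy(s):
    """Harmonic-oscillator energy H(q, p) = (q^2 + p^2) / 2, a first integral."""
    return 0.5 * (s[0] ** 2 + s[1] ** 2)


def energy_grad(s):
    return [s[0], s[1]]


# A network's candidate state drifted to H = 0.765; project it back onto H = 0.5.
projected = project_to_level_set([1.2, 0.3], energy, energy_grad, target=0.5)
```

For this quadratic H each step only rescales the state, so the phase-space direction is preserved and the energy is restored to machine precision.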

翻訳日:2023-11-27 16:38:37 公開日:2023-11-23

# ビザンチン・ロバスト性と部分参加は同時に達成できる:勾配の差をクリップするだけ

Byzantine Robustness and Partial Participation Can Be Achieved Simultaneously: Just Clip Gradient Differences ( http://arxiv.org/abs/2311.14127v1 )

ライセンス: Link先を確認

Grigory Malinovsky, Peter Richt\'arik, Samuel Horv\'ath, Eduard Gorbunov

(参考訳) 分散学習は、大規模機械学習モデルを学習するための主要なパラダイムとして台頭してきた。しかし、現実のシナリオでは参加者が信頼できない、あるいは悪意を持つ場合があり、学習されたモデルの完全性と精度に重大な課題をもたらす。これらの問題に対処するためにビザンチン・フォールトトレランスの仕組みが提案されているが、それらはしばしば全クライアントの完全な参加を前提としており、一部のクライアントが利用できないことや通信制約のため、これは必ずしも現実的ではない。本研究では、クライアントサンプリングを備え、かつビザンチンワーカーへの耐性を証明可能な初の分散手法を提案する。提案手法の背後にある重要なアイデアは、再帰的分散削減における確率的勾配の差を制御するために勾配クリッピングを用いることである。これにより、サンプリングされたクライアントがすべてビザンチンであるイテレーションにおいても、ビザンチンワーカーが引き起こしうる害を抑えることができる。さらに、通信効率を高めるために通信圧縮を組み込む。非常に一般的な仮定の下で、提案手法の収束率が既存の最先端(SOTA)の理論結果と一致することを証明する。

Distributed learning has emerged as a leading paradigm for training large machine learning models. However, in real-world scenarios, participants may be unreliable or malicious, posing a significant challenge to the integrity and accuracy of the trained models. Byzantine fault tolerance mechanisms have been proposed to address these issues, but they often assume full participation from all clients, which is not always practical due to the unavailability of some clients or communication constraints. In our work, we propose the first distributed method with client sampling and provable tolerance to Byzantine workers. The key idea behind the developed method is the use of gradient clipping to control stochastic gradient differences in recursive variance reduction. This allows us to bound the potential harm caused by Byzantine workers, even during iterations when all sampled clients are Byzantine. Furthermore, we incorporate communication compression into the method to enhance communication efficiency. Under quite general assumptions, we prove convergence rates for the proposed method that match the existing state-of-the-art (SOTA) theoretical results.
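The key idea, clipping stochastic gradient differences, can be illustrated with a deliberately simplified server step. Each worker sends a difference vector, each vector is clipped to a fixed Euclidean radius, and the mean is added to the running gradient estimate. Whatever a Byzantine worker sends, it can then shift the estimate by at most radius / n per round. The paper combines clipping with a robust aggregator, recursive variance reduction, client sampling and compression, none of which appear here. The plain mean and the fixed radius are assumptions.

```python
import math


def clip(v, radius):
    """Scale v so that its Euclidean norm is at most `radius`."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm <= radius:
        return list(v)
    return [c * radius / norm for c in v]


def aggregate_step(prev_estimate, worker_diffs, radius):
    """Server update g_new = g_prev + mean_i clip(diff_i, radius) (simplified sketch)."""
    clipped = [clip(d, radius) for d in worker_diffs]
    n = len(clipped)
    mean = [sum(col) / n for col in zip(*clipped)]
    return [g + m for g, m in zip(prev_estimate, mean)]


# One honest worker and one Byzantine worker sending a huge difference.
new_estimate = aggregate_step([1.0, 1.0], [[0.1, 0.0], [1e6, 0.0]], radius=1.0)
# the Byzantine contribution is capped: new_estimate is about [1.55, 1.0]
```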

翻訳日:2023-11-27 16:38:28 公開日:2023-11-23

# 大規模言語モデルの監査に向けて:テキストベースのステレオタイプ検出の改善

Towards Auditing Large Language Models: Improving Text-based Stereotype Detection ( http://arxiv.org/abs/2311.14126v1 )

ライセンス: Link先を確認

Wu Zekun, Sahan Bulathwela, Adriano Soares Koshiyama

(参考訳) 大規模言語モデル(LLM)は近年大きな進歩を遂げ、人工知能(AI)を活用した人間向けアプリケーションで主流になりつつある。しかし、LLMは歴史的データから受け継いだステレオタイプ的な出力を生成することが多く、社会的偏見を増幅し倫理的懸念を引き起こしている。本研究では、i) 性別、人種、職業、宗教に関するステレオタイプ的テキスト52,751件を含むMulti-Grain Stereotype Dataset、およびii) 英語テキストのための新しいステレオタイプ分類器を紹介する。新しいデータセットで学習した提案モデルを厳密に検証するため、いくつかの実験を設計した。実験により、マルチクラス設定でモデルを学習すると、one-vs-allの二値分類よりも優れた性能が得られることが示された。さまざまな説明可能なAI(XAI)ツールから得られる一貫した特徴重要度のシグナルは、新しいモデルが関連するテキスト特徴を利用していることを示している。新たに作成したモデルを用いて一般的なGPTファミリーのモデルのステレオタイプ的振る舞いを評価し、時間とともにバイアスが減少していることを観察した。まとめると、本研究はLLMのステレオタイプバイアスを監査・評価するための堅牢で実用的な枠組みを確立する。

Large Language Models (LLMs) have made significant advances in the recent past, becoming more mainstream in Artificial Intelligence (AI)-enabled human-facing applications. However, LLMs often generate stereotypical output inherited from historical data, amplifying societal biases and raising ethical concerns. This work introduces i) the Multi-Grain Stereotype Dataset, which includes 52,751 instances of gender, race, profession and religion stereotypic text, and ii) a novel stereotype classifier for English text. We design several experiments to rigorously test the proposed model trained on the novel dataset. Our experiments show that training the model in a multi-class setting can outperform the one-vs-all binary counterpart. Consistent feature importance signals from different eXplainable AI tools demonstrate that the new model exploits relevant text features. We utilise the newly created model to assess the stereotypic behaviour of the popular GPT family of models and observe the reduction of bias over time. In summary, our work establishes a robust and practical framework for auditing and evaluating the stereotypic bias in LLMs.

翻訳日:2023-11-27 16:38:10 公開日:2023-11-23

# 二重効率な議論によるスケーラブルなAI安全性

Scalable AI Safety via Doubly-Efficient Debate ( http://arxiv.org/abs/2311.14125v1 )

ライセンス: Link先を確認

Jonah Brown-Cohen, Geoffrey Irving, Georgios Piliouras

(参考訳) 多様で増え続ける複雑な領域にわたって強力な能力を持つ事前学習済みAIシステムの出現により、タスクが人間には直接判断できないほど複雑になりうるため、AIの安全性にとって重大な課題が生じている。Irvingら[2018]はこの方向で議論(debate)手法を提案した。これは、(ミス)アライメントを識別する問題が扱いやすいサブタスクに分解されるまで、そのようなAIモデルの能力を互いに競わせることを目的とする。このアプローチの有望性は明らかだが、元の枠組みは、正直な戦略が決定論的AIシステムを指数関数的なステップ数にわたってシミュレートできるという仮定に基づいており、適用範囲が限られていた。本稿では、不正直な戦略が指数関数的に多くのシミュレーションステップを使用できる場合でも、確率的AIシステムのアライメントを検証しつつ、正直な戦略が多項式ステップのシミュレーションを用いて常に成功できる新しい議論プロトコル群を設計することで、これらの課題に対処する方法を示す。

The emergence of pre-trained AI systems with powerful capabilities across a diverse and ever-increasing set of complex domains has raised a critical challenge for AI safety as tasks can become too complicated for humans to judge directly. Irving et al. [2018] proposed a debate method in this direction with the goal of pitting the power of such AI models against each other until the problem of identifying (mis)-alignment is broken down into a manageable subtask. While the promise of this approach is clear, the original framework was based on the assumption that the honest strategy is able to simulate deterministic AI systems for an exponential number of steps, limiting its applicability. In this paper, we show how to address these challenges by designing a new set of debate protocols where the honest strategy can always succeed using a simulation of a polynomial number of steps, whilst being able to verify the alignment of stochastic AI systems, even when the dishonest strategy is allowed to use exponentially many simulation steps.

翻訳日:2023-11-27 16:37:54 公開日:2023-11-23

# ストリーミングモデルにおける最大有向カット近似に対する指数量子空間アドバンテージ

Exponential Quantum Space Advantage for Approximating Maximum Directed Cut in the Streaming Model ( http://arxiv.org/abs/2311.14123v1 )

ライセンス: Link先を確認

John Kallaugher and Ojas Parekh and Nadezhda Voronova

(参考訳) 量子アドバンテージの探索は一般に実行時間の高速化に焦点を当てるが、量子アルゴリズムは空間計算量における優位性の可能性も提供する。これまでの研究では、要素が順次到着し、ランダムアクセスなしに処理されなければならないデータストリーム問題についてそのような優位性が示されてきたが、それらは特別に構成された問題[Le Gall, SPAA `06]または多項式的な優位性[Kallaugher, FOCS `21]に限られていた。本研究では、最大有向カット問題に対する指数関数的な量子空間アドバンテージを示す。これは、自然なストリーミング問題に対して知られている初の指数関数的な量子空間アドバンテージである。また、これはあらゆる設定において離散最適化問題を近似する際の初の無条件の指数関数的量子リソースアドバンテージでもある。我々の量子ストリーミングアルゴリズムは、$n$頂点のグラフストリームにおける最大有向カットの値を、polylog$(n)$の空間で$0.4844$近似する。一方、Chou, Golovnev, Velusamy [FOCS '20]の先行研究から、$4/9 \approx 0.4444$より良い近似比を得るには、任意の古典ストリーミングアルゴリズムで$\Omega(\sqrt{n})$の空間が必要であることが導かれる。我々の結果は、Saxena, Singer, Sudan, Velusamy [FOCS '23]による最近の$\widetilde{\text{O}}(\sqrt{n})$空間の古典ストリーミング手法に基づいており、Singer [APPROX '23]の最近の研究により近似比がさらに改善されている。

While the search for quantum advantage typically focuses on speedups in execution time, quantum algorithms also offer the potential for advantage in space complexity. Previous work has shown such advantages for data stream problems, in which elements arrive and must be processed sequentially without random access, but these have been restricted to specially-constructed problems [Le Gall, SPAA `06] or polynomial advantage [Kallaugher, FOCS `21]. We show an exponential quantum space advantage for the maximum directed cut problem. This is the first known exponential quantum space advantage for any natural streaming problem. This also constitutes the first unconditional exponential quantum resource advantage for approximating a discrete optimization problem in any setting. Our quantum streaming algorithm $0.4844$-approximates the value of the largest directed cut in a graph stream with $n$ vertices using polylog$(n)$ space, while previous work by Chou, Golovnev, and Velusamy [FOCS '20] implies that obtaining an approximation ratio better than $4/9 \approx 0.4444$ requires $\Omega(\sqrt{n})$ space for any classical streaming algorithm. Our result is based on a recent $\widetilde{\text{O}}(\sqrt{n})$ space classical streaming approach by Saxena, Singer, Sudan, and Velusamy [FOCS '23], with an additional improvement in the approximation ratio due to recent work by Singer [APPROX '23].
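To pin down the objective being approximated, the sketch below defines the value of a directed cut and finds the maximum by brute force. A directed cut counts the edges (u, v) with u inside a vertex set S and v outside it. This exhaustive search takes exponential time and stores the whole graph, the opposite of the polylog(n)-space quantum streaming algorithm. It only fixes what "0.4844-approximates" means: the reported value lies between 0.4844·OPT and OPT.

```python
from itertools import product


def dicut_value(edges, side):
    """Number of edges (u, v) with u in S (side[u] is True) and v outside S."""
    return sum(1 for u, v in edges if side[u] and not side[v])


def max_dicut_bruteforce(n, edges):
    """Exact maximum directed cut by enumerating all 2^n vertex subsets."""
    return max(dicut_value(edges, side) for side in product([False, True], repeat=n))


star = [(0, 1), (0, 2), (0, 3)]
cycle = [(0, 1), (1, 2), (2, 0)]
print(max_dicut_bruteforce(4, star), max_dicut_bruteforce(3, cycle))  # prints 3 1
```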

翻訳日:2023-11-27 16:37:36 公開日:2023-11-23

# 3次元CTを用いた残差トリプレット畳み込みニューラルネットワークによるmTBI診断の強化

Enhancing mTBI Diagnosis with Residual Triplet Convolutional Neural Network Using 3D CT ( http://arxiv.org/abs/2311.14197v1 )

ライセンス: Link先を確認

Hanem Ellethy, Shekhar S. Chandra and Viktor Vegh

(参考訳) 軽度外傷性脳損傷(mTBI)は、一般的でありながら正確な診断が難しい疾患である。タイムリーかつ正確な診断は、効果的な治療と患者の転帰改善に不可欠である。mTBIの従来の診断方法は、精度と感度の面で限界があることが多い。本研究では、3次元CT画像と、トリプレット損失で学習する距離学習(metric learning)手法を用いて、mTBI診断を強化する革新的なアプローチを提案する。これらの課題に対処するため、3次元CTスキャンを特徴空間に埋め込むことでmTBI症例と健常症例を識別するResidual Triplet Convolutional Neural Network (RTCNN)モデルを提案する。トリプレット損失関数は類似画像ペアと非類似画像ペアの間のマージンを最大化し、特徴表現を最適化する。これにより個々の症例を文脈の中でより適切に位置づけ、情報に基づく意思決定を支援し、患者の転帰を改善する可能性がある。RTCNNモデルはmTBI診断において有望な性能を示し、5分割交差検証により、平均精度94.3%、感度94.1%、特異度95.2%を達成した。重要な点として、従来のResidual Convolutional Neural Network (RCNN)モデルと比較して、RTCNNは特異度で22.5%、精度で16.2%、感度で11.3%という顕著な改善を示した。さらに、RTCNNは必要なメモリリソースが少なく、非常に効果的であるだけでなくリソース効率にも優れ、通常のCTスキャンとmTBI症例の識別において偽陽性を最小化しつつ診断精度を最大化する。提示した定量的性能指標と、モデルの意思決定過程を視覚的に説明するオクルージョン感度マップの活用により、本アプローチの解釈可能性と透明性がさらに高まる。

Mild Traumatic Brain Injury (mTBI) is a common and challenging condition to diagnose accurately. Timely and precise diagnosis is essential for effective treatment and improved patient outcomes. Traditional diagnostic methods for mTBI often have limitations in terms of accuracy and sensitivity. In this study, we introduce an innovative approach to enhance mTBI diagnosis using 3D Computed Tomography (CT) images and a metric learning technique trained with triplet loss. To address these challenges, we propose a Residual Triplet Convolutional Neural Network (RTCNN) model to distinguish between mTBI cases and healthy ones by embedding 3D CT scans into a feature space. The triplet loss function maximizes the margin between similar and dissimilar image pairs, optimizing feature representations. This facilitates better context placement of individual cases, aids informed decision-making, and has the potential to improve patient outcomes. Our RTCNN model shows promising performance in mTBI diagnosis, achieving an average accuracy of 94.3%, a sensitivity of 94.1%, and a specificity of 95.2%, as confirmed through a five-fold cross-validation. Importantly, when compared to the conventional Residual Convolutional Neural Network (RCNN) model, the RTCNN exhibits a significant improvement, showcasing a remarkable 22.5% increase in specificity, a notable 16.2% boost in accuracy, and an 11.3% enhancement in sensitivity. Moreover, RTCNN requires lower memory resources, making it not only highly effective but also resource-efficient in minimizing false positives while maximizing its diagnostic accuracy in distinguishing normal CT scans from mTBI cases. The quantitative performance metrics provided and utilization of occlusion sensitivity maps to visually explain the model's decision-making process further enhance the interpretability and transparency of our approach.
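The triplet loss mentioned above has a simple closed form: max(0, d(a, p) − d(a, n) + margin). Here d is the distance between embeddings, a is the anchor, p is a same-class positive and n is a different-class negative. A minimal sketch with Euclidean distance and an assumed margin of 1.0 is shown below. Some implementations use squared distances instead, and the paper's embedding network and margin value are not reproduced.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(a, p) - d(a, n) + margin) with Euclidean distance d."""
    d_ap = math.dist(anchor, positive)
    d_an = math.dist(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


print(triplet_loss([0, 0], [0, 1], [3, 4]))  # prints 0.0 (negative already far enough)
print(triplet_loss([0, 0], [3, 4], [0, 1]))  # prints 5.0 (positive farther than negative)
```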

翻訳日:2023-11-27 16:29:45 公開日:2023-11-23

# タッチ分析:タッチデータを用いた機械学習分類アルゴリズムの実証的評価

Touch Analysis: An Empirical Evaluation of Machine Learning Classification Algorithms on Touch Data ( http://arxiv.org/abs/2311.14195v1 )

ライセンス: Link先を確認

Melodee Montgomery, Prosenjit Chatterjee, John Jenkins, and Kaushik Roy

(参考訳) 本研究の目的は、タッチスクリーン式スマートフォン上での固有のインタラクションに基づいて個人を分類することである。本研究では、41人の被験者と30種類の行動特徴を含むTouch-Analyticsデータセットを用いる。さらに、全体的な認証性能を向上させるため、生データから新しい特徴を導出した。Touch-Analyticsデータセットについては、Support Vector Machine (SVM)やk近傍法(kNN)などの最先端の分類器を用いた先行研究が既に行われており、0%から4%の等価エラー率(EER)を達成している。本稿では、個人を正しく分類するための新しいDeep Neural Net (DNN)アーキテクチャを提案する。提案するDNNアーキテクチャは3つの全結合層を持ち、多対多マッピング技術を用いる。新しい特徴と既存の特徴を組み合わせると、SVMとkNNはそれぞれ94.7%と94.6%の分類精度を達成した。本研究では他に7つの分類器を検討し、そのうち決定木と提案したDNN分類器が最高の100%の精度を示した。その他、ロジスティック回帰(LR)、線形判別分析(LDA)、ガウシアンナイーブベイズ(NB)、ニューラルネットワーク、VGGNetの精度はそれぞれ94.7%、95.9%、31.9%、88.8%、96.1%であった。

Our research aims at classifying individuals based on their unique interactions on touchscreen-based smartphones. In this research, we use Touch-Analytics datasets, which include 41 subjects and 30 different behavioral features. Furthermore, we derived new features from the raw data to improve the overall authentication performance. Previous research has already been done on the Touch-Analytics datasets with state-of-the-art classifiers, including Support Vector Machine (SVM) and k-nearest neighbor (kNN), and achieved equal error rates (EERs) between 0% and 4%. Here, we propose a novel Deep Neural Net (DNN) architecture to classify the individuals correctly. The proposed DNN architecture has three dense layers and uses many-to-many mapping techniques. When we combine the new features with the existing ones, SVM and kNN achieved classification accuracies of 94.7% and 94.6%, respectively. This research explored seven other classifiers, and of these, the decision tree and our proposed DNN classifiers achieved the highest accuracy of 100%. The others, Logistic Regression (LR), Linear Discriminant Analysis (LDA), Gaussian Naive Bayes (NB), Neural Network, and VGGNet, achieved accuracy scores of 94.7%, 95.9%, 31.9%, 88.8%, and 96.1%, respectively.
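Prior work on this dataset is compared by equal error rate (EER). EER is the operating point where the false-accept rate on impostor attempts equals the false-reject rate on genuine attempts. The sketch below is a simple threshold sweep. It assumes similarity scores, where a sample is accepted when its score is at least the threshold, and returns the mean of the two rates at the threshold where they are closest. Interpolating between thresholds, as ROC-based tools often do, would give slightly different numbers.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping thresholds over all observed scores.

    A sample is accepted when score >= threshold (similarity scores assumed).
    """
    best_gap, best_eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)) + [float("inf")]:
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine users rejected
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


print(equal_error_rate([0.9, 0.8], [0.1, 0.2]))  # prints 0.0 (perfectly separated)
print(equal_error_rate([0.6, 0.4], [0.5, 0.3]))  # prints 0.5
```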

翻訳日:2023-11-27 16:29:12 公開日:2023-11-23

# HACD:単眼ハンドヘルド物体再構成のためのハンドアウェア条件付き拡散

HACD: Hand-Aware Conditional Diffusion for Monocular Hand-Held Object Reconstruction ( http://arxiv.org/abs/2311.14189v1 )

ライセンス: Link先を確認

Bowen Fu, Yan Di, Chenyangguang Zhang, Gu Wang, Ziqin Huang, Zhiying Leng, Fabian Manhardt, Xiangyang Ji and Federico Tombari

(参考訳) 既知の3D物体テンプレート、カテゴリ事前情報、深度情報なしに単一のRGB画像から手持ち物体を再構成することは、コンピュータビジョンにおける重要かつ非常に困難な問題である。手や自己による遮蔽がもたらす不確実性を考慮しにくい決定論的モデリングパラダイムを用いる先行研究とは対照的に、我々はこの課題に取り組むために確率的な点群ノイズ除去拡散モデルを用いる。本研究では、単眼ハンドヘルド物体再構成のためのHand-Aware Conditional Diffusion (HACD)を提案し、手と物体の相互作用を2つの側面からモデル化する。第一に、意味的・幾何学的両方の観点から手と物体の相互作用をモデル化するハンドアウェア条件付けを導入する。具体的には、統一された手-物体セマンティック埋め込みが手の遮蔽によって生じる2次元局所特徴の欠損を補い、さらに手の関節埋め込みが物体の頂点と手の関節の関係を符号化する。第二に、手の頂点を事前情報として用い、拡散過程と逆過程において部分的にノイズ除去された点群の重心のずれを制限する、手に拘束された重心固定スキームを提案する。重心のずれの干渉を取り除くことで、拡散モデルは形状の再構成に集中でき、局所特徴投影の安定性と精度が向上する。合成データセットObManと2つの実世界データセットHO3DおよびMOWでの実験により、本手法が既存のすべての手法を大きく上回ることを示す。

Reconstructing hand-held objects from a single RGB image without known 3D object templates, category prior, or depth information is a vital yet challenging problem in computer vision. In contrast to prior works that utilize deterministic modeling paradigms, which make it hard to account for the uncertainties introduced by hand- and self-occlusion, we employ a probabilistic point cloud denoising diffusion model to tackle the above challenge. In this work, we present Hand-Aware Conditional Diffusion for monocular hand-held object reconstruction (HACD), modeling the hand-object interaction in two aspects. First, we introduce hand-aware conditioning to model hand-object interaction from both semantic and geometric perspectives. Specifically, a unified hand-object semantic embedding compensates for the 2D local feature deficiency induced by hand occlusion, and a hand articulation embedding further encodes the relationship between object vertices and hand joints. Second, we propose a hand-constrained centroid fixing scheme, which uses hand vertices as priors to restrict the centroid deviation of the partially denoised point cloud during the diffusion and reverse processes. Removing the centroid bias interference allows the diffusion models to focus on the reconstruction of shape, thus enhancing the stability and precision of local feature projection. Experiments on the synthetic ObMan dataset and two real-world datasets, HO3D and MOW, demonstrate our approach surpasses all existing methods by a large margin.
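A hypothetical reading of the centroid-fixing idea is sketched below. After a denoising step, the whole point cloud is translated so that its centroid lies within `max_offset` of a centroid prior, which the paper derives from hand vertices. The clamping rule, the prior and the radius are all assumptions. The paper's actual constraint may be formulated differently.

```python
import math


def fix_centroid(points, prior_centroid, max_offset):
    """Translate a 3D point cloud so its centroid is within `max_offset` of a prior."""
    n = len(points)
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    offset = [c - q for c, q in zip(centroid, prior_centroid)]
    norm = math.sqrt(sum(o * o for o in offset))
    if norm <= max_offset:
        return [list(p) for p in points]  # already close enough: leave unchanged
    # Shift every point so the centroid lands on the sphere of radius max_offset.
    scale = max_offset / norm
    shift = [q + o * scale - c for q, o, c in zip(prior_centroid, offset, centroid)]
    return [[p[k] + shift[k] for k in range(3)] for p in points]


cloud = [[2.0, 0.0, 0.0], [4.0, 0.0, 0.0]]        # centroid (3, 0, 0)
fixed = fix_centroid(cloud, [0.0, 0.0, 0.0], 1.0)  # centroid moved to (1, 0, 0)
```

Only a rigid translation is applied, so the shape produced by the denoiser is untouched.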

翻訳日:2023-11-27 16:28:46 公開日:2023-11-23

# 行列微分計算による多重ペナルティリッジ回帰の勾配ベースバイレベル最適化

Gradient-based bilevel optimization for multi-penalty Ridge regression through matrix differential calculus ( http://arxiv.org/abs/2311.14182v1 )

ライセンス: Link先を確認

Gabriele Maroni, Loris Cannelli, Dario Piga

(参考訳) LASSOやリッジ回帰のような線形回帰の一般的な正則化アルゴリズムは、当てはめ誤差の最小化と学習したモデル係数のノルムとの間のトレードオフを調整する正則化ハイパーパラメータに依存している。このハイパーパラメータはスカラーであるため、交差検証基準を最適化するランダムサーチやグリッドサーチによって容易に選択できる。しかし、スカラーのハイパーパラメータを用いると、アルゴリズムの柔軟性とより良い汎化の可能性が制限される。本稿では、各入力変数に異なる正則化ハイパーパラメータが関連付けられたL2正則化付き線形回帰の問題に取り組む。これらのハイパーパラメータを勾配ベースの手法で最適化し、正則化ハイパーパラメータに関する交差検証基準の勾配を行列微分計算により解析的に計算する。さらに、検証データへの過剰適合のリスクを低減することを目的とした、スパースモデル学習問題向けの2つの戦略を導入する。数値例により、我々のマルチハイパーパラメータ正則化アプローチがLASSO、リッジ、Elastic Net回帰よりも優れていることを示す。さらに、勾配の解析的計算は、特に多数の入力変数を扱う場合に、自動微分と比べて計算時間の点でより効率的であることが示された。過剰パラメータ化された線形パラメータ変動(LPV)モデルの同定への応用も示す。

Common regularization algorithms for linear regression, such as LASSO and Ridge regression, rely on a regularization hyperparameter that balances the tradeoff between minimizing the fitting error and the norm of the learned model coefficients. As this hyperparameter is scalar, it can be easily selected via random or grid search optimizing a cross-validation criterion. However, using a scalar hyperparameter limits the algorithm's flexibility and potential for better generalization. In this paper, we address the problem of linear regression with L2 regularization, where a different regularization hyperparameter is associated with each input variable. We optimize these hyperparameters using a gradient-based approach, wherein the gradient of a cross-validation criterion with respect to the regularization hyperparameters is computed analytically through matrix differential calculus. Additionally, we introduce two strategies tailored for sparse model learning problems, aimed at reducing the risk of overfitting to the validation data. Numerical examples demonstrate that our multi-hyperparameter regularization approach outperforms LASSO, Ridge, and Elastic Net regression. Moreover, the analytical computation of the gradient proves to be more efficient in terms of computational time than automatic differentiation, especially when handling a large number of input variables. Application to the identification of over-parameterized Linear Parameter-Varying models is also presented.

翻訳日:2023-11-27 16:28:17 公開日:2023-11-23
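The analytic gradient mentioned above can be illustrated with a simplified criterion. The sketch uses a single hold-out squared validation error instead of full cross-validation, which is an assumption. Let $A(\lambda)=X^\top X+\operatorname{diag}(\lambda)$ and $w=A^{-1}X^\top y$. Differentiating $Aw=X^\top y$ gives $\partial w/\partial\lambda_j=-A^{-1}e_j w_j$. Since $A$ is symmetric, $\partial L/\partial\lambda_j=-w_j\,(A^{-1}g)_j$ with $g=2X_v^\top(X_v w-y_v)$. The pure-Python sketch below checks this against central finite differences. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(A: Matrix, b: Sequence[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ridge_fit(X: Matrix, y: Sequence[float], lam: Sequence[float]) -> Tuple[Matrix, List[float]]:
    """Multi-penalty ridge: w = (X^T X + diag(lam))^{-1} X^T y. Also returns A."""
    d = len(lam)
    A = [[sum(row[i] * row[j] for row in X) + (lam[i] if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return A, solve(A, b)


def val_loss(Xv: Matrix, yv: Sequence[float], w: Sequence[float]) -> float:
    """Hold-out squared error: sum_k (x_k . w - y_k)^2."""
    return sum((sum(a * c for a, c in zip(row, w)) - t) ** 2 for row, t in zip(Xv, yv))


def val_loss_grad(X, y, Xv, yv, lam) -> List[float]:
    """Analytic gradient: dL/dlam_j = -w_j * (A^{-1} g)_j, g = 2 Xv^T (Xv w - yv)."""
    A, w = ridge_fit(X, y, lam)
    resid = [sum(a * c for a, c in zip(row, w)) - t for row, t in zip(Xv, yv)]
    g = [2.0 * sum(row[i] * r for row, r in zip(Xv, resid)) for i in range(len(w))]
    v = solve(A, g)  # A is symmetric, so g^T A^{-1} e_j = (A^{-1} g)_j
    return [-w[j] * v[j] for j in range(len(w))]


def finite_diff_grad(X, y, Xv, yv, lam, h: float = 1e-6) -> List[float]:
    """Central finite differences, used only to sanity-check val_loss_grad."""
    out = []
    for j in range(len(lam)):
        up = list(lam)
        up[j] += h
        dn = list(lam)
        dn[j] -= h
        out.append((val_loss(Xv, yv, ridge_fit(X, y, up)[1])
                    - val_loss(Xv, yv, ridge_fit(X, y, dn)[1])) / (2 * h))
    return out
```

A single linear solve with $A$ yields the whole gradient. This is one intuition for why an analytic gradient can beat generic automatic differentiation when there are many input variables.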

# TCuPGAN:市民科学における人間と機械の相互作用を最適化するための新しいフレームワーク

TCuPGAN: A novel framework developed for optimizing human-machine interactions in citizen science ( http://arxiv.org/abs/2311.14177v1 )

ライセンス: Link先を確認

Ramanakumar Sankar and Kameswara Mantha and Lucy Fortson and Helen Spiers and Thomas Pengo and Douglas Mashek and Myat Mo and Mark Sanders and Trace Christensen and Jeffrey Salisbury and Laura Trouille

(参考訳) 科学研究におけるビッグデータの時代には、高度な機械ツールを取り入れることで、大規模データセットのラベル付けや分類にかかる人間の労力を削減する技術を活用する必要がある。この問題に対処するために、パッチ単位の敵対的学習とLong Short-Term Memoryを利用して逐次情報を符号化する、新しい汎用3次元セグメンテーションモデルを提案する。このモデルを、Zooniverseプラットフォーム上で3Dデータセット(画像キューブ)を用いる市民科学プロジェクトと組み合わせ、これらのキューブの2Dスライスのごく一部だけをボランティアが見る、反復的な人間と機械の最適化フレームワークを提案する。モデルのパッチ単位の判別器を用いて、画像キューブ内のどのスライスで特徴表現の汎化が不十分であり、それに応じて機械の性能が低いかを推定する。これらの画像と対応する機械の提案をZooniverseのボランティアに提示して修正してもらうことで、市民科学プロジェクトにおけるボランティアの作業量を大幅に削減できる。約2300枚の肝組織の3D電子顕微鏡像でモデルを訓練した。これらの画像内の脂質滴は、Zooniverseプラットフォーム上でホストされている市民科学プロジェクト「Etch A Cell - Fat Checker」を通じた人手によるアノテーションでセグメンテーションされた。本研究では、このフレームワークと選択手法を実証し、ボランティアの作業量を60%以上削減できることを実測した。このような人間と機械の協働は、将来のZooniverseプロジェクトで大いに役立つと考えている。

In the era of big data in scientific research, there is a need to leverage techniques that reduce the human effort of labeling and categorizing large datasets by involving sophisticated machine tools. To address this problem, we present a novel, general-purpose model for 3D segmentation that leverages patch-wise adversariality and Long Short-Term Memory to encode sequential information. Using this model alongside citizen science projects that use 3D datasets (image cubes) on the Zooniverse platform, we propose an iterative human-machine optimization framework in which volunteers see only a fraction of the 2D slices from these cubes. We leverage the patch-wise discriminator in our model to estimate which slices within these image cubes have poorly generalized feature representations, and correspondingly poor machine performance. These images, with the corresponding machine proposals, would be presented to volunteers on Zooniverse for correction, leading to a drastic reduction in volunteer effort on citizen science projects. We trained our model on ~2300 liver tissue 3D electron micrographs. Lipid droplets were segmented within these images through human annotation via the 'Etch A Cell - Fat Checker' citizen science project, hosted on the Zooniverse platform. In this work, we demonstrate this framework and the selection methodology, which resulted in a measured reduction in volunteer effort of more than 60%. We envision that this type of joint human-machine partnership will be of great use in future Zooniverse projects.

翻訳日:2023-11-27 16:27:58 公開日:2023-11-23
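The slice-selection step can be sketched as follows. Slices are ranked by the mean score of a patch-wise discriminator, and the lowest-scoring fraction is sent to volunteers. This assumes the discriminator outputs, for each patch, a probability in [0, 1] that the predicted mask looks "real". The function name and the mean-score rule are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def select_slices_for_review(patch_scores: Dict[int, Sequence[float]],
                             fraction: float) -> List[int]:
    """Return the indices of the slices with the lowest mean patch score.

    patch_scores maps a slice index to its per-patch discriminator outputs.
    A low mean is taken (by assumption) to signal poorly generalized features,
    so those slices are shown to volunteers for correction.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    means = {s: sum(v) / len(v) for s, v in patch_scores.items()}
    k = math.ceil(fraction * len(means))
    ranked = sorted(means, key=lambda s: (means[s], s))  # ties broken by slice index
    return sorted(ranked[:k])
```

In an iterative loop, the corrected slices would be added to the training set and the remaining slices rescored.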

# 深層ニューラルネットワークと合成画像で強化した外見ベースの視線推定

Appearance-based gaze estimation enhanced with synthetic images using deep neural networks ( http://arxiv.org/abs/2311.14175v1 )

ライセンス: Link先を確認

Dmytro Herashchenko and Igor Farkaš

(参考訳) 人間の視線推定は、人間とロボットの相互作用を成功させるうえで重要な認知要素であり、ロボットが人間の行動を読み取り予測することを可能にする。我々は人工ニューラルネットワークを用いてこの問題に取り組み、顔検出 (RetinaFace) と頭部姿勢推定 (6DRepNet) については既存の高性能なコンポーネントを活用しつつ、個別に切り出した目の画像から視線を推定するモジュール型システムを構築する。提案手法は特殊なハードウェアや赤外線フィルタを必要とせず、外見ベースの手法でよく行われるように、標準的なノートPC内蔵のRGBカメラを用いる。また、MetaHumanツールを用いて57,000人以上の顔からなる大規模な合成データセットを生成し、公開した。標準的なColumbia Gazeデータセットに加えてこのデータセット(視線と頭部姿勢の情報を含む)をモデルの訓練に用いることで精度が向上し、眼のピッチ方向とヨー方向の平均誤差は2度未満となり、関連手法と比べて良好な結果が得られた。さらに、NICOセミヒューマノイドロボットの眼に内蔵された4Kカメラを用いた実環境での予備テストにより、本モデルの実用可能性を確認した。

Human eye gaze estimation is an important cognitive ingredient for successful human-robot interaction, enabling the robot to read and predict human behavior. We approach this problem using artificial neural networks and build a modular system that estimates gaze from separately cropped eyes, taking advantage of existing well-functioning components for face detection (RetinaFace) and head pose estimation (6DRepNet). Our proposed method does not require any special hardware or infrared filters. Instead, it uses a standard built-in notebook RGB camera, as is common for appearance-based methods. Using the MetaHuman tool, we also generated a large synthetic dataset of more than 57,000 human faces and made it publicly available. Including this dataset (with eye gaze and head pose information) alongside the standard Columbia Gaze dataset when training the model led to better accuracy, with a mean error below two degrees in both eye pitch and yaw, which compares favourably to related methods. We also verified the feasibility of our model through preliminary testing in a real-world setting, using the built-in 4K camera in the eye of the NICO semi-humanoid robot.

翻訳日:2023-11-27 16:27:30 公開日:2023-11-23
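Pitch/yaw errors like those reported above are typically computed after converting 3D gaze direction vectors to angles. The sketch below assumes one common camera convention: x points right, y points down, z points forward, and a gaze toward the camera runs along negative z. The paper's exact convention and these function names are assumptions. Angle wrap-around is not handled.

```python
import math
from typing import List, Sequence, Tuple


def vector_to_pitch_yaw(g: Sequence[float]) -> Tuple[float, float]:
    """Gaze vector (x, y, z) -> (pitch, yaw) in degrees, under the assumed convention."""
    x, y, z = g
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    pitch = math.degrees(math.asin(-y))
    yaw = math.degrees(math.atan2(-x, -z))
    return pitch, yaw


def mean_abs_error(pred: List[Tuple[float, float]],
                   target: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Mean absolute pitch and yaw errors (degrees) over paired samples."""
    n = len(pred)
    pitch_err = sum(abs(p[0] - t[0]) for p, t in zip(pred, target)) / n
    yaw_err = sum(abs(p[1] - t[1]) for p, t in zip(pred, target)) / n
    return pitch_err, yaw_err
```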

# エンタングルメントのエンタングル:二光子の周波数と偏光のオンデマンド結合

Entangling entanglement: coupling frequency and polarization of biphotons on demand ( http://arxiv.org/abs/2311.14173v1 )

ライセンス: Link先を確認

Arash Riazi, Eric Y. Zhu, Dan Xu, and Li Qian

(参考訳) 量子情報は、単一光子やエンタングルした光子の周波数と偏光の自由度(DoF)に載せられることが多い。我々は、広帯域二光子の周波数と偏光の自由度を結合・分離する新しい手法を実証する。本手法は、干渉する二光子を生成する2つの非線形媒質の間に線形分散媒質と偏光制御器を挟んだ、共通光路非線形干渉計 (CP-NLI) に基づく。偏光制御器を調整することで、2つのDoFを効果的に操作できる。2つのDoFが分離されている場合、偏光DoFでは最大偏光エンタングル二光子が観測され、二光子のスペクトル強度には干渉縞が観測される。一方、2つのDoFが結合されると、干渉縞はスペクトル強度から消え、代わりに偏光エンタングルメントの度合いに現れる。コンカレンスで定量化される偏光エンタングルメントの度合いは、原理的にはシグナル光子とアイドラー光子の周波数に応じて0から1まで変化しうる。本手法は偏光エンタングルメントを調整する便利な手段を提供し、任意の二光子偏光状態の生成に利用でき、量子情報処理や基礎物理学の研究への応用が期待される。

Quantum information is often carried in the frequency and polarization degrees of freedom (DoFs) of single photons and entangled photons. We demonstrate a new approach to couple and decouple the frequency and polarization DoFs of broadband biphotons. Our approach is based on a common-path nonlinear interferometer (CP-NLI) with a linear dispersive medium and a polarization controller sandwiched between two nonlinear media that generate the interfering biphotons. By adjusting the polarization controller, we can effectively manipulate the two DoFs. When the two DoFs are decoupled, maximally polarization-entangled biphotons are observed in the polarization DoF, while interference fringes are observed in the spectral intensity of the biphotons. When the two DoFs are coupled, however, interference fringes disappear from the spectral intensity and instead appear in the degree of polarization entanglement. The degree of polarization entanglement, quantified by concurrence, can in principle vary from 0 to 1 depending on the signal and idler photon frequencies. Our approach offers a convenient means of tuning the polarization entanglement and can be employed for arbitrary biphoton polarization state generation, with applications in quantum information processing and the study of fundamental physics.

翻訳日:2023-11-27 16:27:08 公開日:2023-11-23
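For a pure two-qubit polarization state $a|HH\rangle+b|HV\rangle+c|VH\rangle+d|VV\rangle$, the concurrence is $C=2|ad-bc|$. It ranges from 0 for a product state to 1 for a maximally entangled state. The sketch below evaluates it for a hypothetical frequency-dependent state $\cos\theta|HV\rangle+e^{i\phi}\sin\theta|VH\rangle$, for which $C=|\sin 2\theta|$. The linear dependence of $\theta$ on detuning is an illustrative assumption, not the paper's model.

```python
import cmath
import math


def concurrence_pure(a: complex, b: complex, c: complex, d: complex) -> float:
    """Concurrence 2|ad - bc| of a normalized pure two-qubit state."""
    norm = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2
    if not math.isclose(norm, 1.0, rel_tol=1e-9):
        raise ValueError("state must be normalized")
    return 2.0 * abs(a * d - b * c)


def toy_state(detuning: float, slope: float = 0.5, phase: float = 0.0):
    """Hypothetical coupling: the mixing angle theta grows linearly with detuning.
    Returns the amplitudes (a, b, c, d) in the |HH>, |HV>, |VH>, |VV> basis."""
    theta = slope * detuning
    return 0.0, math.cos(theta), cmath.exp(1j * phase) * math.sin(theta), 0.0
```

Sweeping `detuning` then traces concurrence between 0 and 1, which mirrors, in toy form, the frequency-dependent entanglement described in the abstract.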

# シード付き・損失のある非線形干渉計における計測上の優位性

Metrological Advantages in Seeded and Lossy Nonlinear Interferometers ( http://arxiv.org/abs/2311.14172v1 )

ライセンス: Link先を確認

Jasper Kranias, Guillaume Thekkadath, Khabat Heshami, Aaron Z. Goldberg

(参考訳) 量子フィッシャー情報(QFI)は量子測定の感度の限界を与え、量子優位性が得られる条件を示す。我々は、非線形干渉計を用いた単一パラメータの位相センシングにおいて量子優位性が実現できる条件を求めることを目指す。本稿では、損失がある条件下およびコヒーレント状態シードを用いた場合の非線形干渉計のQFIの解析式を計算する。結果は位相を誘起する素子を通過する光子数に基づいて正規化し、これによりこれまで主張されてきた計測上の優位性の一部が消える。線形干渉計と直接比較することで、様々な構成の非線形干渉計の性能と、内部損失および外部損失に対する量子優位性の頑健性を解析する。量子優位性が消失する内部損失のしきい値を求め、いつ、どの程度のコヒーレント状態シードが内部損失を最適に打ち消すかを明らかにし、十分な量のスクイージングによって量子優位性が外部損失や非効率な検出に対して頑健になることを示す。

The quantum Fisher information (QFI) bounds the sensitivity of a quantum measurement, heralding the conditions for quantum advantage. We aim to find conditions under which quantum advantage can be realized in single-parameter phase sensing with nonlinear interferometers. Here, we calculate analytical expressions for the QFI of nonlinear interferometers under lossy conditions and with coherent-state seeding. We normalize the results by the number of photons going through the phase-inducing element, which eliminates some of the previously declared metrological advantages. Through direct comparison with a linear interferometer, we analyze the performance of nonlinear interferometers in a variety of geometries, as well as the robustness of the quantum advantage with respect to internal and external loss. We find the threshold of internal loss at which the quantum advantage vanishes, specify when and how much coherent-state seeding optimally counters internal loss, and show that a sufficient amount of squeezing makes the quantum advantage robust against external loss and inefficient detection.

翻訳日:2023-11-27 16:26:48 公開日:2023-11-23
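The following is a lossless baseline only. The paper's lossy, seeded expressions are not reproduced here. For a pure state with the phase imprinted as $e^{-i\phi\hat n}$, the QFI is $F_Q=4\,\mathrm{Var}(\hat n)$. The Cramér-Rao bound then gives $\Delta\phi\ge 1/\sqrt{\nu F_Q}$ for $\nu$ repetitions. The sketch computes $F_Q$ from a photon-number distribution and normalizes it by the mean photon number through the phase element. A coherent state gives $F_Q/(4\langle n\rangle)=1$, the shot-noise reference. The function names are hypothetical.

```python
import math
from typing import Dict


def qfi_pure_phase(probs: Dict[int, float]) -> float:
    """QFI = 4 Var(n) for a pure state with photon-number distribution probs,
    phase generated by the number operator (lossless case only)."""
    total = sum(probs.values())
    mean = sum(n * p for n, p in probs.items()) / total
    second = sum(n * n * p for n, p in probs.items()) / total
    return 4.0 * (second - mean * mean)


def poisson(mean: float, n_max: int = 80) -> Dict[int, float]:
    """Truncated Poisson photon-number distribution of a coherent state."""
    return {n: math.exp(-mean) * mean ** n / math.factorial(n) for n in range(n_max + 1)}


def advantage_over_shot_noise(probs: Dict[int, float]) -> float:
    """F_Q / (4 <n>): equals 1 for a coherent state; a value above 1 beats the
    shot-noise limit at a fixed mean photon number through the phase element."""
    mean = sum(n * p for n, p in probs.items()) / sum(probs.values())
    return qfi_pure_phase(probs) / (4.0 * mean)
```

For example, the pure superposition $(|0\rangle+|4\rangle)/\sqrt2$ gives a ratio of 2. This shows why normalizing by the photons that actually pass the phase element matters when quantum advantages are claimed.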

# ブラジルの大学入学試験におけるGPT-4の視覚能力の評価

Evaluating GPT-4's Vision Capabilities on Brazilian University Admission Exams ( http://arxiv.org/abs/2311.14169v1 )

ライセンス: Link先を確認

Ramon Pires, Thales Sales Almeida, Hugo Abonizio, Rodrigo Nogueira

(参考訳) 近年の言語モデルの進歩により、入学試験において人間に匹敵する性能が示されている。しかし、既存の研究は視覚的理解の統合を必要とする問題を見落としがちであり、現実のシナリオが本来持つ幅広さと複雑さを十分に捉えられていない。このギャップに対処するために、テキスト要素と視覚要素の両方を取り入れた、入学試験における言語モデルの包括的な評価フレームワークを提案する。ブラジルの大学が採用する主要な標準化入学試験であるExame Nacional do Ensino Médio(ENEM)の直近2回分を評価した。本研究は、複雑な分野横断的問題を扱う最先端モデルとしてのGPT-4の能力を再確認するだけでなく、ポルトガル語の試験におけるマルチモーダル言語モデルの現実的な評価を先駆けて提供する。注目すべき点の1つは、視覚コンテンツを書き起こしたテキストキャプションが画像を直接用いるよりも高い性能を示したことであり、視覚モデルには改善の余地があることが示唆される。しかし、画像やキャプションによる改善にもかかわらず、数学の問題は依然としてこれらの最先端モデルにとって課題である。実験で使用したコードとデータは https://github.com/piresramon/gpt-4-enem で入手できる。

Recent advancements in language models have showcased human-comparable performance in academic entrance exams. However, existing studies often overlook questions that require the integration of visual comprehension, thus compromising the full spectrum and complexity inherent in real-world scenarios. To address this gap, we present a comprehensive framework to evaluate language models on entrance exams, which incorporates both textual and visual elements. We evaluate the two most recent editions of Exame Nacional do Ensino Médio (ENEM), the main standardized entrance examination adopted by Brazilian universities. Our study not only reaffirms the capabilities of GPT-4 as the state of the art for handling complex multidisciplinary questions, but also pioneers a realistic assessment of multimodal language models on Portuguese examinations. One of the highlights is that text captions transcribing visual content outperform the direct use of images, suggesting that the vision model has room for improvement. Yet, despite improvements afforded by images or captions, mathematical questions remain a challenge for these state-of-the-art models. The code and data used in the experiments are available at https://github.com/piresramon/gpt-4-enem.

翻訳日:2023-11-27 16:26:29 公開日:2023-11-23

# エントロピー正則化を伴う線形二次レギュレータのための高速方策学習

Fast Policy Learning for Linear Quadratic Regulator with Entropy Regularization ( http://arxiv.org/abs/2311.14168v1 )

ライセンス: Link先を確認

Xin Guo, Xinyu Li and Renyuan Xu

(参考訳) 本稿では、エントロピー正則化を伴う無限時間区間上の割引付き線形二次レギュレータ(LQR)問題のクラスに対して、正則化方策勾配法(RPG)と反復方策最適化法(IPO)という2つの新しい方策学習手法を提案し、解析する。正確な方策評価が利用可能であると仮定すると、どちらの手法も正則化LQRの最適方策を求める際に線形収束することが証明される。さらに、最適方策の周辺の局所領域に入ると、IPO法は超線形収束率を達成できる。最後に、RL問題においてよく理解された環境から得られた最適方策を、未知環境のRL問題の初期方策として適切に転用した場合、後者が前者に十分近ければ、IPO法により超線形収束率が得られることを示す。これらのアルゴリズムの性能は数値例によって裏付けられている。

This paper proposes and analyzes two new policy learning methods, regularized policy gradient (RPG) and iterative policy optimization (IPO), for a class of discounted linear-quadratic regulator (LQR) problems over an infinite time horizon with entropy regularization. Assuming access to exact policy evaluation, both proposed approaches are proved to converge linearly to the optimal policy of the regularized LQR. Moreover, the IPO method can achieve a super-linear convergence rate once it enters a local region around the optimal policy. Finally, when the optimal policy from a well-understood environment in an RL problem is appropriately transferred as the initial policy to an RL problem with an unknown environment, the IPO method is shown to enable a super-linear convergence rate if the latter environment is sufficiently close to the former. The performance of the proposed algorithms is supported by numerical examples.

翻訳日:2023-11-27 16:26:09 公開日:2023-11-23

# ハイブリッド回路マッピング:中性原子量子コンピュータの計算能力の全スペクトルを活用する

Hybrid Circuit Mapping: Leveraging the Full Spectrum of Computational Capabilities of Neutral Atom Quantum Computers ( http://arxiv.org/abs/2311.14164v1 )

ライセンス: Link先を確認

Ludwig Schmid, Sunghye Park, Seokhyeong Kang, Robert Wille

(参考訳) 中性原子(NA)に基づく量子コンピューティングは、ネイティブなマルチ量子ビットゲートによる高忠実度の長距離相互作用や、量子ビット配列をシャトリングする能力など、幅広い計算能力を提供する。これまでこれらの機能は個別に研究されてきたが、我々は高忠実度のゲート相互作用と量子ビットのシャトリングの両方に基づいて回路マッピングとルーティングを行う、初の高速ハイブリッドコンパイラの手法を提案する。複数の機能を組み合わせた場合のコンパイルプロセスの複雑さを詳しく検討し、その結果生じる課題に対処する効果的な解決策を示す。最終的なコンパイル戦略を様々なハードウェア設定で示し、その汎用性を明らかにするとともに、ゲートベースとシャトリングベースのルーティングを戦略的に組み合わせることで得られる忠実度向上の可能性を強調する。両方のルーティング機能でマルチ量子ビットゲートを追加でサポートすることにより、提案手法はNAが提供する計算能力の全スペクトルを活用できる。

Quantum computing based on Neutral Atoms (NAs) provides a wide range of computational capabilities, encompassing high-fidelity long-range interactions with native multi-qubit gates, and the ability to shuttle arrays of qubits. While these capabilities have previously been studied individually, we propose the first fast hybrid compiler that performs circuit mapping and routing based on both high-fidelity gate interactions and qubit shuttling. We delve into the intricacies of the compilation process when combining multiple capabilities and present effective solutions to the resulting challenges. The final compilation strategy is then showcased across various hardware settings, revealing its versatility and highlighting the potential fidelity enhancements achieved through the strategic use of combined gate- and shuttling-based routing. With additional multi-qubit gate support for both routing capabilities, the proposed approach is able to take advantage of the full spectrum of computational capabilities offered by NAs.

翻訳日:2023-11-27 16:25:49 公開日:2023-11-23
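The choice between gate-based and shuttling-based routing can be illustrated with a crude fidelity estimate. A SWAP decomposes into three two-qubit gates, while each shuttle move incurs a per-move infidelity plus idle decoherence of the waiting qubits. All fidelities, the exponential idle-decay model, and the function names below are hypothetical placeholders. They are not values or cost functions from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class HardwareParams:
    """Hypothetical neutral-atom parameters (illustrative values only)."""
    f_2q: float = 0.995        # two-qubit gate fidelity
    f_move: float = 0.9999     # fidelity per shuttle move, excluding idling
    move_time_us: float = 100.0
    t2_us: float = 1.5e6       # coherence time used for the idle-decay model


def swap_route_fidelity(n_swaps: int, hw: HardwareParams) -> float:
    """Each SWAP costs three two-qubit gates."""
    return hw.f_2q ** (3 * n_swaps)


def shuttle_route_fidelity(n_moves: int, n_idle_qubits: int, hw: HardwareParams) -> float:
    """Per-move fidelity times exponential idle decay of all waiting qubits."""
    t = n_moves * hw.move_time_us
    return hw.f_move ** n_moves * math.exp(-n_idle_qubits * t / hw.t2_us)


def choose_route(n_swaps: int, n_moves: int, n_idle_qubits: int,
                 hw: HardwareParams) -> str:
    """Pick whichever routing option has the higher estimated fidelity."""
    gate = swap_route_fidelity(n_swaps, hw)
    shuttle = shuttle_route_fidelity(n_moves, n_idle_qubits, hw)
    return "gate" if gate >= shuttle else "shuttle"
```

A real compiler would evaluate such estimates per interaction and jointly with scheduling. This sketch only shows why neither routing option dominates in general.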

# 量子力学における一般化虚数単位

Generalized imaginary units in quantum mechanics ( http://arxiv.org/abs/2311.14162v1 )

ライセンス: Link先を確認

Sergio Giardino

(参考訳) 虚数単位の一般化を、複素量子力学($\mathbb C$QM)と四元数量子力学($\mathbb H$QM)の両方の場合について検討する。複素理論は非定常な量子過程を記述するが、四元数理論はそのような解釈を許さず、一般化された虚数単位を新しい時間発展関数に対応づける。今後の研究の方向性として様々な可能性が開かれている。

The generalization of the imaginary unit is examined within complex quantum mechanics ($\mathbb C$QM) as well as quaternionic quantum mechanics ($\mathbb H$QM). Whereas the complex theory describes non-stationary quantum processes, the quaternionic theory does not admit such an interpretation and instead associates the generalized imaginary unit with a novel time evolution function. Various possibilities are opened up as directions for future research.

翻訳日:2023-11-27 16:25:31 公開日:2023-11-23
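One elementary fact behind quaternionic generalizations of the imaginary unit is that every unit pure quaternion $u=u_1 i+u_2 j+u_3 k$ with $u_1^2+u_2^2+u_3^2=1$ satisfies $u^2=-1$. There is therefore a whole sphere of "imaginary units" rather than just $\pm i$. The short check below uses the Hamilton product. It illustrates only this algebraic fact, not the paper's specific construction.

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]  # (real, i, j, k)


def qmul(p: Quat, q: Quat) -> Quat:
    """Hamilton product of two quaternions."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def unit_pure(x: float, y: float, z: float) -> Quat:
    """Normalize (x, y, z) into the unit pure quaternion x i + y j + z k."""
    n = math.sqrt(x * x + y * y + z * z)
    return (0.0, x / n, y / n, z / n)
```

Because quaternion multiplication is non-commutative ($ij=k$ but $ji=-k$), the choice of imaginary unit is no longer innocuous. This is one reason quaternionic and complex quantum mechanics can differ in how time evolution is interpreted.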

# 知識蒸留によるLHCでの効率的かつロバストなジェットタギング

Efficient and Robust Jet Tagging at the LHC with Knowledge Distillation ( http://arxiv.org/abs/2311.14160v1 )

ライセンス: Link先を確認

Ryan Liu, Abhijith Gandrakota, Jennifer Ngadiuba, Maria Spiropulu, Jean-Roch Vlimant

(参考訳) LHC(Large Hadron Collider)におけるリアルタイムデータ処理システムの困難な環境は、デプロイ可能なアルゴリズムの計算複雑性を厳しく制限する。深層学習モデルについて言えば、これは計算複雑性が低く帰納バイアスの弱いモデルしか利用できないことを意味する。この問題に対処するため、我々は知識蒸留を利用して、大規模モデルの性能と小規模モデルの計算複雑性の低さを両立させる。本稿では知識蒸留の実装を示し、LHCにおけるジェットの分類タスクで学生モデルの性能が全体的に向上することを示す。さらに、ローレンツ対称性という強い帰納バイアスを持つ教師モデルを用いることで、同じ帰納バイアスを学生モデルに誘導でき、任意のローレンツブーストに対するロバスト性が向上することを示した。

The challenging environment of real-time data processing systems at the Large Hadron Collider (LHC) strictly limits the computational complexity of algorithms that can be deployed. For deep learning models, this implies that only models with low computational complexity, which have weak inductive bias, are feasible. To address this issue, we use knowledge distillation to combine the performance of large models with the reduced computational complexity of small ones. In this paper, we present an implementation of knowledge distillation, demonstrating an overall boost in the student models' performance for the task of classifying jets at the LHC. Furthermore, by using a teacher model with a strong inductive bias of Lorentz symmetry, we show that we can induce the same inductive bias in the student model, which leads to better robustness against arbitrary Lorentz boosts.

翻訳日:2023-11-27 16:25:22 公開日:2023-11-23
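A common form of the distillation objective uses Hinton-style soft targets. The paper's exact loss and weighting are not given in the abstract. This form mixes the cross-entropy with the true label and a temperature-softened KL divergence to the teacher: $\mathcal L=\alpha\,\mathrm{CE}(y,\sigma(z_s))+(1-\alpha)\,T^2\,\mathrm{KL}(\sigma(z_t/T)\,\|\,\sigma(z_s/T))$. The default values of `alpha` and `T` below are placeholder assumptions.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float], T: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits z at temperature T > 0."""
    m = max(z)
    e = [math.exp((v - m) / T) for v in z]
    s = sum(e)
    return [v / s for v in e]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            label: int, alpha: float = 0.5, T: float = 4.0) -> float:
    """alpha * CE(label, student) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    ce = -math.log(softmax(student_logits)[label])
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0.0)
    # The T^2 factor keeps soft-target gradients on a scale comparable to the CE term.
    return alpha * ce + (1.0 - alpha) * T * T * kl
```

When the student already matches the teacher, the KL term vanishes and only the label cross-entropy remains.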

# 実験的匿名量子会議

Experimental anonymous quantum conferencing ( http://arxiv.org/abs/2311.14158v1 )

ライセンス: Link先を確認

Jonathan W. Webb, Joseph Ho, Federico Grasselli, Gláucia Murta, Alexander Pickston, Andrés Ulibarrena and Alessandro Fedrizzi

(参考訳) 匿名量子会議鍵合意(AQCKA)により、ネットワーク内のユーザのグループは、自らの参加を明かすことなく共有暗号鍵を確立できる。これは二者間のプリミティブだけでも実現できるが、必要なネットワークラウンド数の点でコストが大きい。多者間エンタングルメントの使用を認めることで、大幅な効率改善が得られる。我々は、Greenberger-Horne-Zeilinger(GHZ)状態のエンタングルメントを用いて6ユーザの量子ネットワークでAQCKAタスクを実験的に実装し、二者間のみの手法と比べて、理論と一致する大幅なリソースコスト削減を達成した。また、有限鍵効果を考慮した4ユーザのシナリオでも、このプロトコルが優位性を保つことを示す。

Anonymous quantum conference key agreement (AQCKA) allows a group of users within a network to establish a shared cryptographic key without revealing their participation. Although this can be achieved using bi-partite primitives alone, it is costly in the number of network rounds required. By allowing the use of multi-partite entanglement, there is a substantial efficiency improvement. We experimentally implement the AQCKA task in a six-user quantum network using Greenberger-Horne-Zeilinger (GHZ)-state entanglement and obtain a significant resource cost reduction in line with theory when compared to a bi-partite-only approach. We also demonstrate that the protocol retains an advantage in a four-user scenario with finite key effects taken into account.

翻訳日:2023-11-27 16:25:07 公開日:2023-11-23
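Multipartite conference key agreement exploits a key property that a small state-vector simulation can check. Measuring an $n$-qubit GHZ state $(|0\cdots0\rangle+|1\cdots1\rangle)/\sqrt2$ in the Z basis gives every party the same random bit. Measuring every qubit in the X basis yields only even-parity outcomes, which is useful for verifying the shared state. This pure-Python sketch ignores noise and the entire anonymity layer of AQCKA.

```python
import math
from typing import Dict, List


def ghz_state(n: int) -> List[float]:
    """Real amplitudes of the n-qubit GHZ state (|0...0> + |1...1>)/sqrt(2)."""
    amp = [0.0] * (2 ** n)
    amp[0] = amp[-1] = 1.0 / math.sqrt(2.0)
    return amp


def apply_hadamard_all(amp: List[float], n: int) -> List[float]:
    """Apply H to every qubit, turning a Z-basis readout into an X-basis one."""
    out = list(amp)
    r = 1.0 / math.sqrt(2.0)
    for q in range(n):
        bit = 1 << q
        for idx in range(2 ** n):
            if not idx & bit:  # visit each (bit=0, bit=1) amplitude pair once
                a, b = out[idx], out[idx | bit]
                out[idx], out[idx | bit] = r * (a + b), r * (a - b)
    return out


def outcome_probs(amp: List[float], n: int) -> Dict[str, float]:
    """Nonzero computational-basis probabilities keyed by bit string."""
    return {format(i, f"0{n}b"): a * a for i, a in enumerate(amp) if a * a > 1e-12}
```

In the Z basis, only `0...0` and `1...1` appear, each with probability 1/2. That single shared bit per GHZ round is what the conference key is built from.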

# 組合せ最適化のためのグラフ上の変分アニーリング

Variational Annealing on Graphs for Combinatorial Optimization ( http://arxiv.org/abs/2311.14156v1 )

ライセンス: Link先を確認

Sebastian Sanokowski, Wilhelm Berghammer, Sepp Hochreiter, Sebastian Lehner

(参考訳) 近年のいくつかの教師なし学習手法は、解変数が統計的に独立であるという仮定に基づき、確率的アプローチを用いて組合せ最適化(CO)問題を解く。我々は、この仮定が特に難しい問題インスタンスにおいて性能上の限界をもたらすことを示す。我々の結果は、解変数間の統計的依存関係を捉える自己回帰的アプローチが、多くの代表的なCO問題で優れた性能を示すことを裏付ける。本稿では、解変数の集合の構成を単一のトークンで表すサブグラフトークン化を導入する。このトークン化手法は、表現力を犠牲にすることなく、自己回帰手法に固有の長い逐次サンプリング手順という欠点を緩和する。重要な点として、アニーリングされたエントロピー正則化を理論的に動機付け、それが効率的かつ安定した学習に不可欠であることを実験的に示す。

Several recent unsupervised learning methods use probabilistic approaches to solve combinatorial optimization (CO) problems based on the assumption of statistically independent solution variables. We demonstrate that this assumption imposes performance limitations, in particular on difficult problem instances. Our results corroborate that an autoregressive approach which captures statistical dependencies among solution variables yields superior performance on many popular CO problems. We introduce subgraph tokenization, in which the configuration of a set of solution variables is represented by a single token. This tokenization technique alleviates the drawback of the long sequential sampling procedure inherent to autoregressive methods without sacrificing expressivity. Importantly, we theoretically motivate an annealed entropy regularization and show empirically that it is essential for efficient and stable learning.

翻訳日:2023-11-27 16:24:50 公開日:2023-11-23
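The tokenization idea can be sketched for binary solution variables. Each group of $k$ variables is encoded as one token from a vocabulary of size $2^k$. An autoregressive sampler then needs $\lceil N/k\rceil$ steps instead of $N$. Grouping by consecutive variable index is a simplifying assumption, since the paper groups variables by subgraph. The function names are hypothetical.

```python
from typing import List, Sequence


def pack_tokens(bits: Sequence[int], k: int) -> List[int]:
    """Pack binary variables into tokens of k bits each (the last group may be shorter)."""
    tokens = []
    for start in range(0, len(bits), k):
        group = bits[start:start + k]
        tokens.append(sum(b << i for i, b in enumerate(group)))
    return tokens


def unpack_tokens(tokens: Sequence[int], k: int, n: int) -> List[int]:
    """Inverse of pack_tokens for the original n variables."""
    bits: List[int] = []
    for t in tokens:
        bits.extend((t >> i) & 1 for i in range(k))
    return bits[:n]
```

Because the mapping is a bijection on each group, no configurations are lost. The joint distribution within a group is modeled exactly by a single categorical over $2^k$ values, which is how expressivity is preserved while the sequence shortens.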

# 敵対的訓練による胸部X線画像のロバストかつ解釈可能な新型コロナウイルス診断

Robust and Interpretable COVID-19 Diagnosis on Chest X-ray Images using Adversarial Training ( http://arxiv.org/abs/2311.14227v1 )

ライセンス: Link先を確認

Karina Yang, Alexis Bennett, Dominique Duncan

(参考訳) 2019年の新型コロナウイルス感染症(COVID-19)の世界的パンデミックは、時代を象徴する健康危機である。最近の取り組みは、疾患の重症度と拡大を緩和するため、有症状患者におけるCOVID-19の迅速かつ正確な検出にますます向けられている。胸部X線(CXR)画像に適用される人工知能(AI)アルゴリズムは有望な診断ツールとして登場し、先行研究では優れた分類性能が示されている。しかし、このような手法は、ブラックボックス的な推論過程と予測不能な性質のため、医師から批判を受けてきた。専門の放射線科医による診断とは対照的に、AIシステムは臨床意思決定プロセスにおける汎化性、説明可能性、頑健性を欠くことが多い。本研究では、まず広範なベースライン研究を提案し、33,000枚以上の多様なCXR画像を用いて21種類の畳み込みニューラルネットワーク(CNN)モデルを訓練・評価し、健常、COVID-19、非COVID-19肺炎のCXRを分類することで、これらの問題に取り組む。得られたモデルは、3クラス分類の正解率、再現率、適合率でそれぞれ最大97.03%、97.97%、99.95%を達成した。次に、勾配重み付きクラス活性化マッピング(Grad-CAM)のヒートマップを用いて、敵対的訓練がモデルの頑健性と説明可能性に与える効果を調べる。敵対的に訓練されたモデルは、摂動を加えた画像の分類で標準モデルを大きく上回るだけでなく、 1) 臨床的に関連する特徴をよりよく特定し、 2) 無関係なアーティファクトに対して頑健で、 3) 専門の放射線科医の所見とかなりよく一致する顕著性マップを生成することがわかった。

The novel 2019 Coronavirus disease (COVID-19) global pandemic is a defining health crisis. Recent efforts have been increasingly directed towards achieving quick and accurate detection of COVID-19 across symptomatic patients to mitigate the intensity and spread of the disease. Artificial intelligence (AI) algorithms applied to chest X-ray (CXR) images have emerged as promising diagnostic tools, and previous work has demonstrated impressive classification performances. However, such methods have faced criticisms from physicians due to their black-box reasoning process and unpredictable nature. In contrast to professional radiologist diagnosis, AI systems often lack generalizability, explainability, and robustness in the clinical decision making process. In our work, we address these issues by first proposing an extensive baseline study, training and evaluating 21 convolutional neural network (CNN) models on a diverse set of 33,000+ CXR images to classify between healthy, COVID-19, and non-COVID-19 pneumonia CXRs. Our resulting models achieved a 3-way classification accuracy, recall, and precision of up to 97.03%, 97.97%, and 99.95%, respectively. Next, we investigate the effectiveness of adversarial training on model robustness and explainability via Gradient-weighted Class Activation Mapping (Grad-CAM) heatmaps. We find that adversarially trained models not only significantly outperform their standard counterparts on classifying perturbed images, but also yield saliency maps that 1) better specify clinically relevant features, 2) are robust against extraneous artifacts, and 3) agree considerably more with expert radiologist findings.

翻訳日:2023-11-27 16:16:32 公開日:2023-11-23
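Grad-CAM, used above for the saliency maps, weights each feature map $A^k$ of a convolutional layer by the spatially averaged gradient $\alpha_k=\frac1Z\sum_{i,j}\partial y^c/\partial A^k_{ij}$. It then keeps the positive part of $\sum_k\alpha_k A^k$. The framework-free sketch below takes the activations and gradients as given nested lists. Extracting them from a real network is outside its scope.

```python
from typing import List

Map2D = List[List[float]]


def grad_cam(activations: List[Map2D], gradients: List[Map2D]) -> Map2D:
    """ReLU(sum_k alpha_k * A^k), where alpha_k is the spatial mean of dy^c/dA^k."""
    h, w = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for A, G in zip(activations, gradients):
        alpha = sum(sum(row) for row in G) / (h * w)  # global-average-pooled gradient
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha * A[i][j]
    return [[max(0.0, v) for v in row] for row in cam]
```

In practice, the map is then upsampled to the input resolution and overlaid on the CXR image. Such overlays are what get compared with radiologist findings.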

# ビデオゲームのキャラクターデザインにおけるジェンダーステレオタイプの解明:Honor of Kingsのマルチモーダル分析

Uncovering Gender Stereotypes in Video Game Character Designs: A Multi-Modal Analysis of Honor of Kings ( http://arxiv.org/abs/2311.14226v1 )

ライセンス: Link先を確認

Bingqing Liu, Kyrie Zhixuan Zhou, Danlei Zhu, Jaihyun Park

(参考訳) 本稿では、中国で人気のマルチプレイヤーオンラインバトルアリーナ(MOBA)ゲームであるHonor of Kingsのキャラクターデザインにおけるジェンダーステレオタイプを包括的に分析する。役割の割り当て、視覚デザイン、台詞、背景ストーリーという観点からジェンダーステレオタイプを探り、道徳基盤理論に基づくテキストマイニングと質的分析を組み合わせた。男性ヒーローは一般に力を持つ男性的な戦士として、女性ヒーローは理想的な外見を持つ女性的な「飾り」としてデザインされている。テキスト、視覚、役割に基づく証拠を活用し、ゲームにおけるジェンダーステレオタイプについての、文化を考慮したマルチモーダルな理解に貢献する。

In this paper, we conduct a comprehensive analysis of gender stereotypes in the character design of Honor of Kings, a popular multiplayer online battle arena (MOBA) game in China. We probe gender stereotypes through the lens of role assignments, visual designs, spoken lines, and background stories, combining qualitative analysis and text mining based on the moral foundation theory. Male heroes are commonly designed as masculine fighters with power, and female heroes as feminine "ornaments" with ideal looks. We contribute a culture-aware and multi-modal understanding of gender stereotypes in games, leveraging text-, visual-, and role-based evidence.

翻訳日:2023-11-27 16:15:35 公開日:2023-11-23

# 過パラメータ線形回帰に対する加速SGDのリスク境界

Risk Bounds of Accelerated SGD for Overparameterized Linear Regression ( http://arxiv.org/abs/2311.14222v1 )

ライセンス: Link先を確認

Xuheng Li and Yihe Deng and Jingfeng Wu and Dongruo Zhou and Quanquan Gu

(参考訳) 加速確率的勾配降下法(ASGD)は深層学習の主力手法であり、SGDよりも優れた汎化性能を達成することが多い。しかし、既存の最適化理論はASGDの収束の速さしか説明できず、より優れた汎化は説明できない。本稿では、過剰パラメータ化のもとでの学習としておそらく最も単純な設定である、過剰パラメータ化線形回帰に対するASGDの汎化を研究する。データ共分散行列の各固有部分空間において、ASGDのインスタンス依存の過剰リスク上界を確立する。我々の解析は次のことを示す。(i) ASGDは小さな固有値の部分空間ではSGDより優れ、バイアス誤差の指数的減衰が速い一方、大きな固有値の部分空間ではバイアス誤差の減衰がSGDより遅い。(ii) ASGDの分散誤差は常にSGDのそれより大きい。この結果は、初期値と真の重みベクトルとの差が主に小さな固有値の部分空間に限られている場合に、ASGDがSGDを上回りうることを示唆している。さらに、解析を強凸な設定での線形回帰に特化すると、バイアス誤差について既知の最良の結果よりも厳しい上界が得られる。

Accelerated stochastic gradient descent (ASGD) is a workhorse in deep learning and often achieves better generalization performance than SGD. However, existing optimization theory can only explain the faster convergence of ASGD, but cannot explain its better generalization. In this paper, we study the generalization of ASGD for overparameterized linear regression, which is possibly the simplest setting of learning with overparameterization. We establish an instance-dependent excess risk bound for ASGD within each eigen-subspace of the data covariance matrix. Our analysis shows that (i) ASGD outperforms SGD in the subspace of small eigenvalues, exhibiting a faster rate of exponential decay for bias error, while in the subspace of large eigenvalues, its bias error decays slower than SGD; and (ii) the variance error of ASGD is always larger than that of SGD. Our result suggests that ASGD can outperform SGD when the difference between the initialization and the true weight vector is mostly confined to the subspace of small eigenvalues. Additionally, when our analysis is specialized to linear regression in the strongly convex setting, it yields a tighter bound for bias error than the best-known result.

翻訳日:2023-11-27 16:15:20 公開日:2023-11-23

# 仮定を最小限にしたデータ適応的な予測後推論

Assumption-lean and Data-adaptive Post-Prediction Inference ( http://arxiv.org/abs/2311.14220v1 )

ライセンス: Link先を確認

Jiacheng Miao, Xinran Miao, Yixuan Wu, Jiwei Zhao, and Qiongshi Lu

(参考訳) 現代の科学研究が直面する主な課題は、ゴールドスタンダードのデータが限られていることであり、その取得には費用と労力がかかる。機械学習(ML)の急速な発展により、科学者は容易に得られる共変量からこれらのゴールドスタンダードの結果を予測するためにMLアルゴリズムに頼るようになった。しかし、これらの予測結果は、予測手順によって生じる不正確さや異質性を無視したまま、後続の統計解析に直接使われることが多い。これは偽陽性の発見や無効な科学的結論につながる可能性が高い。本研究では、ML予測結果に基づく妥当かつ検出力の高い推論を可能にする、仮定を最小限にしたデータ適応的な予測後推論(POP-Inf)手法を提案する。その「仮定を最小限にした(assumption-lean)」性質により、幅広い統計量について、ML予測に関する仮定なしに信頼できる統計的推論が保証される。また「データ適応的」な性質により、ML予測の精度にかかわらず、既存の予測後推論手法よりも効率が向上することが保証される。シミュレーションと大規模ゲノムデータにより、本手法の優位性と適用可能性を示す。

A primary challenge facing modern scientific research is the limited availability of gold-standard data, which can be both costly and labor-intensive to obtain. With the rapid development of machine learning (ML), scientists have relied on ML algorithms to predict these gold-standard outcomes from easily obtained covariates. However, these predicted outcomes are often used directly in subsequent statistical analyses, ignoring the imprecision and heterogeneity introduced by the prediction procedure. This is likely to result in false positive findings and invalid scientific conclusions. In this work, we introduce an assumption-lean and data-adaptive Post-Prediction Inference (POP-Inf) procedure that allows valid and powerful inference based on ML-predicted outcomes. Its "assumption-lean" property guarantees reliable statistical inference without assumptions on the ML prediction, for a wide range of statistical quantities. Its "data-adaptive" feature guarantees an efficiency gain over existing post-prediction inference methods, regardless of the accuracy of the ML prediction. We demonstrate the superiority and applicability of our method through simulations and large-scale genomic data.

翻訳日:2023-11-27 16:14:39 公開日:2023-11-23
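POP-Inf itself is not reproduced here. As a related, simpler illustration of post-prediction inference, the sketch below estimates a mean with a prediction-powered estimator that uses a data-adaptive weight, in the style of PPI++ power tuning rather than the paper's procedure. The estimator is $\hat\theta_\lambda=\bar Y_L+\lambda(\bar f_U-\bar f_L)$. The weight is $\lambda^*=\mathrm{Cov}(Y,f)/[(1+n/N)\,\mathrm{Var}(f)]$, estimated on the $n$ labeled points, where $N$ is the number of unlabeled points. Setting $\lambda=0$ recovers the classical labeled-only estimate. The tuned weight minimizes the estimator's variance, so the predictions should not hurt efficiency, at least asymptotically. The function names are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def covariance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample covariance with an n - 1 denominator (matches statistics.variance)."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def tuned_lambda(y_lab: Sequence[float], f_lab: Sequence[float],
                 f_unlab: Sequence[float]) -> float:
    """Variance-minimizing weight; assumes the labeled predictions are not constant."""
    n, N = len(y_lab), len(f_unlab)
    return covariance(y_lab, f_lab) / ((1.0 + n / N) * variance(f_lab))


def prediction_powered_mean(y_lab: Sequence[float], f_lab: Sequence[float],
                            f_unlab: Sequence[float], lam: float) -> Tuple[float, float]:
    """Return (estimate, standard error) of E[Y] for a given weight lam."""
    n, N = len(y_lab), len(f_unlab)
    est = mean(y_lab) + lam * (mean(f_unlab) - mean(f_lab))
    resid = [y - lam * f for y, f in zip(y_lab, f_lab)]
    se = math.sqrt(variance(resid) / n + lam ** 2 * variance(f_unlab) / N)
    return est, se
```

A normal-approximation confidence interval would then be `est ± 1.96 * se`.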

# 困難な画像改ざん検出のための新しいベンチマークとモデル

A New Benchmark and Model for Challenging Image Manipulation Detection ( http://arxiv.org/abs/2311.14218v1 )

ライセンス: Link先を確認

Zhenfei Zhang, Mingyang Li and Ming-Ching Chang

(参考訳) マルチメディアデータの改ざんを検出する能力はデジタルフォレンジックにおいて不可欠である。既存の画像改ざん検出(IMD)手法は、主に画像編集や二重圧縮のアーティファクトに起因する異常な特徴の検出に基づいている。既存のIMD技術はいずれも、大きな画像から小さな改ざん領域を検出する際に困難に直面する。さらに、圧縮に基づくIMD手法は、同一の品質係数で二重圧縮された場合に困難を抱える。こうした困難な条件下での最先端(SoTA)IMD手法を調べるため、編集ベースと圧縮ベースのIMD手法をそれぞれ評価する2つのサブセットからなる、新しいChallenging Image Manipulation Detection(CIMD)ベンチマークデータセットを導入する。データセットの画像は手作業で撮影・改ざんされ、高品質なアノテーションが付与されている。さらに、これらの困難な条件において画像編集と圧縮の両方のアーティファクトをよりよく検出できる、HRNetに基づく新しい2分岐ネットワークモデルを提案する。CIMDベンチマークでの広範な実験により、本モデルがCIMD上でSoTAのIMD手法を大きく上回ることが示された。

The ability to detect manipulation in multimedia data is vital in digital forensics. Existing Image Manipulation Detection (IMD) methods are mainly based on detecting anomalous features arising from image editing or double-compression artifacts. All existing IMD techniques encounter challenges when it comes to detecting small tampered regions in a large image. Moreover, compression-based IMD approaches face difficulties in cases of double compression with identical quality factors. To investigate State-of-The-Art (SoTA) IMD methods under those challenging conditions, we introduce a new Challenging Image Manipulation Detection (CIMD) benchmark dataset, which consists of two subsets for evaluating editing-based and compression-based IMD methods, respectively. The dataset images were manually captured and tampered with, and they come with high-quality annotations. In addition, we propose a new two-branch network model based on HRNet that can better detect both image-editing and compression artifacts under those challenging conditions. Extensive experiments on the CIMD benchmark show that our model significantly outperforms SoTA IMD methods on CIMD.

翻訳日:2023-11-27 16:14:21 公開日:2023-11-23

# 射影アサーションをもつ量子プログラムのリファインメント計算

Refinement calculus of quantum programs with projective assertions ( http://arxiv.org/abs/2311.14215v1 )

ライセンス: Link先を確認

Yuan Feng, Li Zhou, Yingte Xu

(参考訳) リファインメント計算は、プログラムを段階的かつモジュール的に開発するための構造化された枠組みを提供し、リファインメントの過程全体を通じてその正しさを保証する。本稿では、量子プログラム向けのリファインメント計算を導入する。そのためにまず、処方文(prescription statement)を備えた量子while言語における非決定的プログラムの部分正当性を研究する。状態ヒルベルト空間の部分空間と等価な直交射影を、量子状態に関するアサーションとして用いる。非決定的プログラムをトレース非増加な超演算子の集合に対応づける表示的意味論に加えて、事後条件を最弱自由事前条件に変換し、逆に事前条件を最強事後条件に変換する意味論も与える。続いて、これらの双対な意味論に基づいてリファインメント規則を導入し、様々な文脈に適用可能な量子プログラムの漸進的開発への体系的なアプローチを提供する。リファインメント計算の実際的な応用を示すため、$Z$回転ゲートの実装、繰り返し符号、量子から量子へのベルヌーイファクトリーなどの例を検討する。さらに、正しい量子プログラムを段階的に開発するプログラマを実践的に支援する、PythonベースのインタラクティブなプロトタイプツールQuireを提示する。

Refinement calculus provides a structured framework for the progressive and modular development of programs, ensuring their correctness throughout the refinement process. This paper introduces a refinement calculus tailored for quantum programs. To this end, we first study the partial correctness of nondeterministic programs within a quantum while-language featuring prescription statements. Orthogonal projectors, which are equivalent to subspaces of the state Hilbert space, are taken as assertions for quantum states. In addition to the denotational semantics, where a nondeterministic program is associated with a set of trace-nonincreasing super-operators, we also present semantics that transform a postcondition into the weakest liberal precondition and, conversely, a precondition into the strongest postcondition. Subsequently, refinement rules are introduced based on these dual semantics, offering a systematic approach to the incremental development of quantum programs applicable in various contexts. To illustrate the practical application of the refinement calculus, we examine examples such as the implementation of a $Z$-rotation gate, the repetition code, and the quantum-to-quantum Bernoulli factory. Furthermore, we present Quire, a Python-based interactive prototype tool that provides practical support to programmers engaged in the stepwise development of correct quantum programs.

翻訳日:2023-11-27 16:14:03 公開日:2023-11-23

# 機械学習プロジェクトにおけるバイアス検出による可変性考慮型モデル選択の拡張

Extending Variability-Aware Model Selection with Bias Detection in Machine Learning Projects ( http://arxiv.org/abs/2311.14214v1 )

ライセンス: Link先を確認

Cristina Tavares, Nathalia Nascimento, Paulo Alencar, Donald Cowan

(参考訳) データサイエンスプロジェクトには、データ、コード、モデルに依存する様々な機械学習(ML)手法が含まれることが多い。こうしたプロジェクトにおける重要な活動の一つは、手元のデータ分析に適したモデルやアルゴリズムの選択である。MLモデルの選択は、サンプルサイズなどのデータ関連の属性、予測アルゴリズムの種類などの機能要件、性能やバイアスなどの非機能要件といった、いくつかの要因に依存する。しかし、このような選択に影響を与える要因は、十分に理解されておらず、明示的に表現されていないことが多い。本稿では、MLプロジェクトにおいて、適応的な可変性考慮型モデル選択手法をバイアス検出によって拡張する、進行中の研究について述べる。本手法は次のとおりである。(i) 文献で提案されたヒューリスティクスに基づく特徴モデルを用いて、モデル選択に影響する要因の可変性をモデル化する。(ii) バイアスに関連する機能(例えばバイアス関連の指標)を追加して可変性モデルをインスタンス化する。(iii) 心不全予測プロジェクトに基づく特定のケーススタディで本手法を示す実験を行い、アプローチを説明する。提案手法は、モデル選択に影響を与える要因、特にバイアスに関連する要因とそれらの相互作用を明示的にすることで、最先端技術を前進させることを目指している。提示する表現により、MLプロジェクトにおけるモデル選択を、場当たり的でない、適応的かつ説明可能なプロセスへと変えることができる。

Data science projects often involve various machine learning (ML) methods that depend on data, code, and models. One of the key activities in these projects is the selection of a model or algorithm that is appropriate for the data analysis at hand. ML model selection depends on several factors, which include data-related attributes such as sample size, functional requirements such as the prediction algorithm type, and non-functional requirements such as performance and bias. However, the factors that influence such selection are often not well understood and explicitly represented. This paper describes ongoing work on extending an adaptive variability-aware model selection method with bias detection in ML projects. The method involves: (i) modeling the variability of the factors that affect model selection using feature models based on heuristics proposed in the literature; (ii) instantiating our variability model with added features related to bias (e.g., bias-related metrics); and (iii) conducting experiments that illustrate the method in a specific case study to illustrate our approach based on a heart failure prediction project. The proposed approach aims to advance the state of the art by making explicit factors that influence model selection, particularly those related to bias, as well as their interactions. The provided representations can transform model selection in ML projects into a non ad hoc, adaptive, and explainable process.

翻訳日:2023-11-27 16:13:40 公開日:2023-11-23

# アノテーション感性:訓練データ収集手法がモデル性能に与える影響

Annotation Sensitivity: Training Data Collection Methods Affect Model Performance ( http://arxiv.org/abs/2311.14212v1 )

ライセンス: Link先を確認

Christoph Kern, Stephanie Eckman, Jacob Beck, Rob Chew, Bolei Ma, Frauke Kreuter

(参考訳) ヒューマンアノテータからトレーニングデータを収集する場合、アノテーション用インストゥルメント(調査票)の設計、アノテータに与えられる指示、アノテータの特性、およびそれらの相互作用がトレーニングデータに影響を与えうる。本研究は、アノテーション用インストゥルメントを作成する際の設計上の選択が、得られたアノテーションで学習したモデルにも影響を与えることを実証する。アノテーションデータの収集方法が、アノテーションそのものと下流モデルの性能および予測に与える影響を指す用語として、アノテーション感性を導入する。アノテーション用インストゥルメントの5つの実験条件の下でヘイトスピーチと攻撃的言語のアノテーションを収集し、アノテータを各条件にランダムに割り当てる。次に、得られた5つのデータセットそれぞれでBERTモデルをファインチューニングし、各条件のホールドアウト部分でモデル性能を評価する。その結果、1) ヘイトスピーチ/攻撃的言語のアノテーションの割合、2) モデル性能、3) モデル予測、4) モデルの学習曲線について、条件間でかなりの差があることがわかった。本研究の結果は、機械学習の文献でほとんど注目されてこなかったアノテーション用インストゥルメントが果たす重要な役割を強調するものである。インストゥルメント設計のベストプラクティスの確立に役立てるため、インストゥルメントがアノテーションにどのように、またなぜ影響を与えるのかについてのさらなる研究を求める。

When training data are collected from human annotators, the design of the annotation instrument, the instructions given to annotators, the characteristics of the annotators, and their interactions can impact training data. This study demonstrates that design choices made when creating an annotation instrument also impact the models trained on the resulting annotations. We introduce the term annotation sensitivity to refer to the impact of annotation data collection methods on the annotations themselves and on downstream model performance and predictions. We collect annotations of hate speech and offensive language in five experimental conditions of an annotation instrument, randomly assigning annotators to conditions. We then fine-tune BERT models on each of the five resulting datasets and evaluate model performance on a holdout portion of each condition. We find considerable differences between the conditions for 1) the share of hate speech/offensive language annotations, 2) model performance, 3) model predictions, and 4) model learning curves. Our results emphasize the crucial role played by the annotation instrument which has received little attention in the machine learning literature. We call for additional research into how and why the instrument impacts the annotations to inform the development of best practices in instrument design.

翻訳日:2023-11-27 16:13:22 公開日:2023-11-23

# 球状量子ドットにおける位置依存質量粒子のエネルギー固有状態

Energy eigenstates of position-dependent mass particles in a spherical quantum dot ( http://arxiv.org/abs/2311.14211v1 )

ライセンス: Link先を確認

R. M. Lima and H. R. Christiansen

(参考訳) 量子ドットへの3次元的アプローチにおいて、ハミルトニアンの族に対する非一様質量粒子の厳密なエネルギースペクトルを求める。粒子の運動量と質量の間の順序付けが異なる一般化シュレーディンガー方程式の集合を考え、剛体壁境界条件の下でエネルギー束縛状態を解析的に計算する。本結果は原子物理学および量子ドット理論において興味深いものである。

We obtain the exact energy spectrum of nonuniform mass particles for a collection of Hamiltonians in a three-dimensional approach to a quantum dot. By considering a set of generalized Schrödinger equations with different orderings between the particle's momentum and mass, the energy bound-states are calculated analytically for hard boundary conditions. The present results are of interest in atomic physics and quantum dot theory.
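
The ordering ambiguity mentioned above is usually expressed with the von Roos family of position-dependent-mass Hamiltonians, shown below. Whether the paper uses exactly this parametrization is an assumption here. The parameters α, β, γ fix how the momentum operator p and the mass m(r) are ordered. "Hard boundary conditions" is read here as an impenetrable spherical wall of radius R, with no potential inside the dot.

$$
H = \frac{1}{4}\Big[m^{\alpha}(\mathbf r)\,\mathbf p\,m^{\beta}(\mathbf r)\,\mathbf p\,m^{\gamma}(\mathbf r) + m^{\gamma}(\mathbf r)\,\mathbf p\,m^{\beta}(\mathbf r)\,\mathbf p\,m^{\alpha}(\mathbf r)\Big], \qquad \alpha + \beta + \gamma = -1,
\qquad \psi(\mathbf r)\big|_{|\mathbf r| = R} = 0 .
$$

Two standard orderings fall out of this family:

- α = γ = 0, β = −1 gives the BenDaniel-Duke ordering H = ½ p m^{-1} p.
- α = γ = −½, β = 0 gives the Zhu-Kroemer ordering H = ½ m^{-1/2} p² m^{-1/2}.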

翻訳日:2023-11-27 16:13:02 公開日:2023-11-23

# ECRF:周波数領域最適化を用いたエントロピー制約ニューラルラジアンス場圧縮

ECRF: Entropy-Constrained Neural Radiance Fields Compression with Frequency Domain Optimization ( http://arxiv.org/abs/2311.14208v1 )

ライセンス: Link先を確認

Soonbin Lee, Fangwen Shu, Yago Sanchez, Thomas Schierl, Cornelius Hellge

(参考訳) 明示的な特徴グリッドベースのNeRFモデルは、レンダリング品質と学習の大幅な高速化の点で有望な結果を示している。しかし、これらの手法では、単一のシーンや物体を表現するのに大量のデータを必要とすることが多い。本研究では、データサイズを効果的に削減するために、周波数領域のエントロピーを最小化することを目的とした圧縮モデルを提案する。まず、テンソル放射場(tensorial radiance fields)に離散コサイン変換(DCT)を適用して特徴グリッドを圧縮する。特徴グリッドは係数に変換され、従来の動画符号化パイプラインと同様の手順で量子化およびエントロピー符号化される。さらに、より高いスパース性を実現するために、周波数領域、特に特徴グリッドのDCT係数に対するエントロピーパラメータ化手法を提案する。変換係数は学習段階で最適化されるため、提案モデルでは微調整や追加情報を必要としない。本モデルは符号化と復号に軽量な圧縮パイプラインしか必要としないため、ボリューメトリックなラディアンスフィールド手法を実世界のアプリケーションに適用しやすくなる。実験により、提案する周波数領域エントロピーモデルが様々なデータセットで優れた圧縮性能を達成できることを示す。ソースコードは一般公開される予定である。

Explicit feature-grid based NeRF models have shown promising results in terms of rendering quality and significant speed-up in training. However, these methods often require a significant amount of data to represent a single scene or object. In this work, we present a compression model that aims to minimize the entropy in the frequency domain in order to effectively reduce the data size. First, we propose using the discrete cosine transform (DCT) on the tensorial radiance fields to compress the feature-grid. This feature-grid is transformed into coefficients, which are then quantized and entropy encoded, following a similar approach to the traditional video coding pipeline. Furthermore, to achieve a higher level of sparsity, we propose using an entropy parameterization technique for the frequency domain, specifically for DCT coefficients of the feature-grid. Since the transformed coefficients are optimized during the training phase, the proposed model does not require any fine-tuning or additional information. Our model only requires a lightweight compression pipeline for encoding and decoding, making it easier to apply volumetric radiance field methods for real-world applications. Experimental results demonstrate that our proposed frequency domain entropy model can achieve superior compression performance across various datasets. The source code will be made publicly available.
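
The following is a minimal sketch of the transform, quantize and entropy-code pipeline, applied to a single 1-D slice of a feature grid. It assumes an orthonormal DCT-II and uniform scalar quantization with a hypothetical step size. A 2-D grid plane would apply the same transform along rows and then columns. ECRF's learned entropy parameterization and its training-time optimization of the coefficients are not reproduced here.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II of a 1-D sequence (plain Python, O(n^2))."""
    n = len(x)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        coeffs.append(scale * s)
    return coeffs


def idct_ii(c):
    """Inverse of dct_ii (the orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out


def quantize(coeffs, step):
    """Uniform scalar quantization to integer symbols for an entropy coder."""
    return [round(v / step) for v in coeffs]


def dequantize(symbols, step):
    """Map integer symbols back to approximate coefficient values."""
    return [q * step for q in symbols]


# Hypothetical 1-D slice of a feature grid: smooth, so energy sits in low frequencies.
row = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7]
symbols = quantize(dct_ii(row), step=0.1)
recon = idct_ii(dequantize(symbols, step=0.1))
```

For a smooth row like this one, several quantized symbols are 0. Those zeros are the sparsity that an entropy coder exploits. Because the transform is orthonormal, the L2 reconstruction error is bounded by sqrt(n) * step / 2.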

翻訳日:2023-11-27 16:12:55 公開日:2023-11-23

# 人工知能を用いたインフラプロジェクトのデータ駆動リスクモデリング

Data-Driven Risk Modeling for Infrastructure Projects Using Artificial Intelligence Techniques ( http://arxiv.org/abs/2311.14203v1 )

ライセンス: Link先を確認

Abdolmajid Erfani

(参考訳) プロジェクトリスクの管理は、あらゆる大規模プロジェクトを成功裏に実施するための重要な要素であり、公共機関がインフラを整備する上でのベストプラクティスとして広く認識されている。プロジェクトのリスクを特定・評価する従来の方法では、プロジェクトの初期段階に開催されるリスクワークショップで、分野の専門家から意見を得る。プロジェクトがライフサイクルを進むにつれて、特定されたリスクとその評価は変化していく。一部のリスクは顕在化して問題となり、一部は軽減され、一部はもはや重要でないとして除外される。従来の専門家ベースのアプローチには価値があるものの、時間とコストのかかるプロセスであるため、いくつかの課題が残っている。さらに、リスクが時間とともに事前(ex-ante)から事後(ex-post)へどのように変化するかについては、ほとんど知られていない。初期段階でプロジェクトチームはどの程度うまくリスクを特定・評価できているのか、それはプロジェクト実行中に実際に起こることと比べてどうか。本研究では、過去のデータと人工知能技術を用いて、リスクを自動的に特定し、初期のリスク登録簿とリスク評価の質を検証するデータ駆動型フレームワークを導入することで、これらの限界に対処した。米国の70以上の主要交通プロジェクトのリスク登録簿が入力データセットを構成する。

Managing project risk is a key part of the successful implementation of any large project and is widely recognized as a best practice for public agencies to deliver infrastructure. The conventional method of identifying and evaluating project risks involves getting input from subject matter experts at risk workshops in the early phases of a project. As a project moves through its life cycle, these identified risks and their assessments evolve. Some risks are realized to become issues, some are mitigated, and some are retired as no longer important. Despite the value provided by conventional expert-based approaches, several challenges remain due to the time-consuming and expensive processes involved. Moreover, little is known about how risks evolve from ex-ante to ex-post over time. How well does the project team identify and evaluate risks in the initial phase compared to what happens during project execution? Using historical data and artificial intelligence techniques, this study addressed these limitations by introducing a data-driven framework to identify risks automatically and to examine the quality of early risk registers and risk assessments. Risk registers from more than 70 U.S. major transportation projects form the input dataset.

翻訳日:2023-11-27 16:12:31 公開日:2023-11-23

# 深層学習に基づく放射線レポート生成研究の体系的レビュー

A Systematic Review of Deep Learning-based Research on Radiology Report Generation ( http://arxiv.org/abs/2311.14199v1 )

ライセンス: Link先を確認

Chang Liu, Yuanhe Tian, Yan Song

(参考訳) 放射線科レポート生成(RRG)は、胸部X線画像などの臨床放射線画像から自由記述テキストを自動生成することを目的としている。RRGは臨床の自動化を促進する上で不可欠な役割を果たし、経験の浅い医師を実際に支援し、放射線科医の業務負担を軽減するのに大きく役立つ。こうした有意義な可能性から、RRGの研究は、特に深層学習アプローチの急速な発展とともに、過去5年間で爆発的に増加している。既存研究は、異なるモダリティの強化という観点からRRGを行い、視覚情報とテキスト情報の両方から精緻化された特徴を用いてレポート生成プロセスを最適化するための知見を提供し、さらにそれらの間のクロスモーダルな相互作用によってRRGを促進している。本稿では、深層学習に基づくRRGについて、様々な観点から包括的なレビューを行う。具体的には、まず、放射線画像、レポート、およびそれらの間のクロスモーダルな関係に関するタスク固有の特徴に基づく主要なRRGアプローチを取り上げる。次に、このタスクで慣例的に用いられるベンチマークデータセットを評価指標とともに説明し、続いて異なるアプローチの性能を分析し、最後に課題と今後の動向についてまとめる。全体として、本論文の目的は、既存の文献を理解するためのツールとして機能し、RRG分野における価値ある研究を促すことである。

Radiology report generation (RRG) aims to automatically generate free-text descriptions from clinical radiographs, e.g., chest X-Ray images. RRG plays an essential role in promoting clinical automation, providing practical assistance to inexperienced doctors and alleviating radiologists' workloads. Given this potential, research on RRG has grown explosively over the past half-decade, especially with the rapid development of deep learning approaches. Existing studies perform RRG from the perspective of enhancing different modalities, provide insights on optimizing the report generation process with elaborated features from both visual and textual information, and further facilitate RRG with the cross-modal interactions among them. In this paper, we present a comprehensive review of deep learning-based RRG from various perspectives. Specifically, we first cover pivotal RRG approaches based on the task-specific features of radiographs, reports, and the cross-modal relations between them. We then illustrate the benchmark datasets conventionally used for this task together with evaluation metrics, subsequently analyze the performance of different approaches, and finally summarize the challenges and future trends. Overall, the goal of this paper is to serve as a tool for understanding existing literature and to inspire valuable research in the field of RRG.

翻訳日:2023-11-27 16:12:16 公開日:2023-11-23

# 転移学習に基づくリアルタイム拳銃検出

Transfer Learning-based Real-time Handgun Detection ( http://arxiv.org/abs/2311.13559v2 )

ライセンス: Link先を確認

Youssef Elmir, Sid Ahmed Laouar, Larbi Hamdaoui

(参考訳) 従来の監視システムは人間の注意力に依存しており、その有効性には限界がある。本研究では、畳み込みニューラルネットワークと転移学習を用いて、拳銃を自動検出するリアルタイムのコンピュータビジョンシステムを開発する。オンライン拳銃検出手法の包括的な分析を行い、誤検出(偽陽性)の低減と学習時間の短縮を重視する。転移学習が効果的なアプローチであることを示す。技術的な課題はあるものの、提案システムは84.74%の適合率(precision)を達成し、関連研究に匹敵する有望な性能を示している。これにより、より高速な学習と正確な拳銃の自動検出が可能となり、セキュリティが強化される。本研究は、人による監視への依存を減らすことでセキュリティ対策を前進させ、効率的で信頼性の高い拳銃検出に向けた転移学習ベースのアプローチの可能性を示す。

Traditional surveillance systems rely on human attention, limiting their effectiveness. This study employs convolutional neural networks and transfer learning to develop a real-time computer vision system for automatic handgun detection. Comprehensive analysis of online handgun detection methods is conducted, emphasizing reducing false positives and learning time. Transfer learning is demonstrated as an effective approach. Despite technical challenges, the proposed system achieves a precision rate of 84.74%, demonstrating promising performance comparable to related works, enabling faster learning and accurate automatic handgun detection for enhanced security. This research advances security measures by reducing human monitoring dependence, showcasing the potential of transfer learning-based approaches for efficient and reliable handgun detection.

翻訳日:2023-11-27 12:28:03 公開日:2023-11-23

# LucidDreamer:3D Gaussian Splattingシーンのドメインフリー生成

LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes ( http://arxiv.org/abs/2311.13384v2 )

ライセンス: Link先を確認

Jaeyoung Chung, Suyoung Lee, Hyeongjin Nam, Jaerin Lee, Kyoung Mu Lee

(参考訳) VR機器やコンテンツの普及に伴い、3Dシーン生成技術への需要が高まっている。しかし、既存の3Dシーン生成モデルは、主に現実世界とかけ離れた3Dスキャンデータセットを用いた学習戦略のために、対象シーンを特定のドメインに限定している。このような制限に対処するため、既存の大規模拡散ベース生成モデルの能力を最大限に活用した、ドメインフリーのシーン生成パイプラインであるLucidDreamerを提案する。LucidDreamerは、DreamingとAlignmentという2つのステップを交互に繰り返す。まず、入力から多視点で一貫した画像を生成するため、各画像生成の幾何学的なガイドラインとして点群を用いる。具体的には、点群の一部を所望の視点に投影し、その投影を生成モデルによるインペインティングのガイダンスとして与える。インペインティングされた画像は推定深度マップを用いて3D空間に持ち上げられ、新たな点群を構成する。次に、新たな点を3Dシーンに統合するため、新たに生成された3Dシーンの各部分を調和的に統合するアライメントアルゴリズムを提案する。最終的に得られた3Dシーンは、ガウススプラットを最適化するための初期点として用いられる。LucidDreamerは、従来の3Dシーン生成手法と比較して非常に詳細なガウススプラットを生成し、対象シーンのドメインにも制約がない。プロジェクトページ: https://luciddreamer-cvlab.github.io/

With the widespread use of VR devices and content, demand for 3D scene generation techniques is growing. Existing 3D scene generation models, however, limit the target scene to a specific domain, primarily due to training strategies that rely on 3D scan datasets far from real-world scenes. To address this limitation, we propose LucidDreamer, a domain-free scene generation pipeline that fully leverages the power of existing large-scale diffusion-based generative models. LucidDreamer alternates between two steps: Dreaming and Alignment. First, to generate multi-view consistent images from the inputs, we use the point cloud as a geometric guideline for each image generation. Specifically, we project a portion of the point cloud to the desired view and provide the projection as guidance for inpainting with the generative model. The inpainted images are lifted to 3D space using estimated depth maps, yielding new points. Second, to aggregate the new points into the 3D scene, we propose an alignment algorithm that harmoniously integrates the newly generated portions of the 3D scene. The final 3D scene serves as the initial points for optimizing Gaussian splats. LucidDreamer produces Gaussian splats that are highly detailed compared to previous 3D scene generation methods, with no constraint on the domain of the target scene. Project page: https://luciddreamer-cvlab.github.io/
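
In its simplest form, the lifting step (an inpainted image plus an estimated depth map, yielding new 3-D points) is pinhole back-projection. The sketch below assumes known pinhole intrinsics `fx, fy, cx, cy`, which are hypothetical inputs here, and returns camera-space points. Placing the points in the shared scene would additionally require applying the camera-to-world pose. LucidDreamer's alignment algorithm is not shown.

```python
def backproject(depth, fx, fy, cx, cy):
    """Lift each pixel (u, v) with estimated depth d to a camera-space 3-D point.

    depth: list of rows, depth[v][u] in scene units; values <= 0 mean "no estimate".
    fx, fy, cx, cy: pinhole intrinsics (assumed known).
    """
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0:  # skip pixels without a valid depth estimate
                continue
            x = (u - cx) * d / fx
            y = (v - cy) * d / fy
            points.append((x, y, d))
    return points


def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-space point back to pixel coordinates."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)
```

Projecting a lifted point back into the same view returns its source pixel. This round-trip property is what the projection-for-inpainting step relies on when rendering the point cloud into a new view.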

翻訳日:2023-11-27 12:27:52 公開日:2023-11-23

# 報酬モデルを用いない、人間のフィードバックによる拡散モデルのファインチューニング

Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model ( http://arxiv.org/abs/2311.13231v2 )

ライセンス: Link先を確認

Kai Yang, Jian Tao, Jiafei Lyu, Chunjiang Ge, Jiaxin Chen, Qimai Li, Weihan Shen, Xiaolong Zhu, Xiu Li

(参考訳) 人間のフィードバックによる強化学習(RLHF)は、拡散モデルのファインチューニングにおいて大きな可能性を示している。従来の手法は、まず人間の選好に沿った報酬モデルを学習し、次にRL手法を用いて基盤となるモデルをファインチューニングする。しかし、効果的な報酬モデルを作成するには、大規模なデータセット、最適なアーキテクチャ、手作業によるハイパーパラメータ調整が必要であり、プロセスは時間とコストの両面で負担が大きい。大規模言語モデルのファインチューニングに有効な直接選好最適化(DPO)法は、報酬モデルを不要にする。しかし、拡散モデルのデノイジング過程は大量のGPUメモリを必要とするため、DPO法を直接適用することは困難である。この問題に対処するため、拡散モデルを直接ファインチューニングするD3PO(Direct Preference for Denoising Diffusion Policy Optimization)法を導入する。理論的解析により、D3POは報酬モデルの学習を省略するものの、人間のフィードバックデータを用いて学習した最適な報酬モデルとして効果的に機能し、学習過程を導くことが示される。このアプローチは報酬モデルの学習を必要としないため、より直接的で費用対効果が高く、計算オーバーヘッドを最小限に抑えられる。実験では、目的の相対的な尺度を人間の選好の代理として用い、正解の報酬を用いる手法に匹敵する結果を得た。さらに、D3POは画像の歪み率を低減し、より安全な画像を生成できることを示しており、堅牢な報酬モデルが存在しない課題を克服する。コードはhttps://github.com/yk7333/D3PO/tree/mainで公開されている。

Using reinforcement learning with human feedback (RLHF) has shown significant promise in fine-tuning diffusion models. Previous methods start by training a reward model that aligns with human preferences, then leverage RL techniques to fine-tune the underlying models. However, crafting an efficient reward model demands extensive datasets, optimal architecture, and manual hyperparameter tuning, making the process both time and cost-intensive. The direct preference optimization (DPO) method, effective in fine-tuning large language models, eliminates the necessity for a reward model. However, the extensive GPU memory requirement of the diffusion model's denoising process hinders the direct application of the DPO method. To address this issue, we introduce the Direct Preference for Denoising Diffusion Policy Optimization (D3PO) method to directly fine-tune diffusion models. The theoretical analysis demonstrates that although D3PO omits training a reward model, it effectively functions as the optimal reward model trained using human feedback data to guide the learning process. This approach requires no training of a reward model, making it more direct and cost-effective while minimizing computational overhead. In experiments, our method uses the relative scale of objectives as a proxy for human preference, delivering comparable results to methods using ground-truth rewards. Moreover, D3PO demonstrates the ability to reduce image distortion rates and generate safer images, addressing challenges for which robust reward models are lacking. Our code is publicly available at https://github.com/yk7333/D3PO/tree/main.
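
D3PO builds on DPO, whose pairwise objective removes the need for an explicit reward model. The sketch below shows the standard DPO loss for one preference pair. It treats each `logp_*` value as the log-likelihood of a single denoising step x_{t-1} given x_t rather than of a whole sequence. That per-step reading is an assumption about how DPO carries over to diffusion, and D3PO's exact formulation should be taken from the paper. `beta` is a hypothetical temperature value.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair.

    logp_w / logp_l: log-probabilities of the preferred / rejected sample under the
    model being fine-tuned; ref_logp_w / ref_logp_l: the same under the frozen
    reference model.
    """
    # Implicit reward margin: how much more the tuned model favors the winner
    # than the reference model does.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -log_sigmoid(margin)
```

When the tuned model equals the reference model, the loss is log 2. The loss decreases as the tuned model shifts probability toward the preferred sample relative to the reference.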

翻訳日:2023-11-27 12:27:31 公開日:2023-11-23

# DRIFu: 微分可能レンダリングと陰関数に基づく単一視点3D再構成

DRIFu: Differentiable Rendering and Implicit Function-based Single-View 3D Reconstruction ( http://arxiv.org/abs/2311.13199v2 )

ライセンス: Link先を確認

Zijian Kuang, Lihang Ying, Shi Jin, Li Cheng

(参考訳) 微分可能レンダリングと陰関数に基づくモデル(DRIFu)は、もともと着衣の人体向けに設計された先駆的な3Dデジタル化技術であるPixel-aligned Implicit Function(PIFU)を起源とする。PIFUは、低次元空間における微妙な体形の変化を捉えることに優れ、人体の3Dスキャンで広範に学習されている。しかし、3Dスキャンのために動物の協力を得ることが本質的に難しいことから、生きた動物へのPIFUの適用には大きな課題がある。この課題に対応して、動物のデジタル化に特化したDRIFuモデルを提案する。DRIFuの学習には、多様な形状やサイズを網羅し、雛鳥などのバリエーションも考慮した、厳選された合成3D動物モデルのセットを用いる。我々の新しいアライメントツールは、これらの多様な合成動物モデルを統一テンプレートにマッピングする上で重要な役割を果たし、動物の形状とテクスチャの正確な予測を可能にする。重要なのは、テンプレートアライメント戦略が共有形状空間を確立することである。これにより、新たな動物の形状をシームレスにサンプリングし、現実的なポーズを付け、アニメーションさせ、実世界のデータと整合させることができる。この画期的なアプローチは、鳥類の形態を包括的に理解し表現する能力を大きく変えるものである。プロジェクトの詳細とアクセスについては、プロジェクトのWebサイト https://github.com/kuangzijian/drifu-for-animals を参照されたい。

The Differentiable Rendering and Implicit Function-based model (DRIFu) draws its roots from the Pixel-aligned Implicit Function (PIFU), a pioneering 3D digitization technique initially designed for clothed human bodies. PIFU excels in capturing nuanced body shape variations within a low-dimensional space and has been extensively trained on human 3D scans. However, the application of PIFU to live animals poses significant challenges, primarily due to the inherent difficulty in obtaining the cooperation of animals for 3D scanning. In response to this challenge, we introduce the DRIFu model, specifically tailored for animal digitization. To train DRIFu, we employ a curated set of synthetic 3D animal models, encompassing diverse shapes, sizes, and even accounting for variations such as baby birds. Our innovative alignment tools play a pivotal role in mapping these diverse synthetic animal models onto a unified template, facilitating precise predictions of animal shape and texture. Crucially, our template alignment strategy establishes a shared shape space, allowing for the seamless sampling of new animal shapes, posing them realistically, animating them, and aligning them with real-world data. This groundbreaking approach revolutionizes our capacity to comprehensively understand and represent avian forms. For further details and access to the project, the project website can be found at https://github.com/kuangzijian/drifu-for-animals

翻訳日:2023-11-27 12:27:04 公開日:2023-11-23

# マルチモーダルインコンテキスト学習によるエゴ進化型シーンテキスト認識

Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer ( http://arxiv.org/abs/2311.13120v2 )

ライセンス: Link先を確認

Zhen Zhao, Jingqun Tang, Chunhui Lin, Binghong Wu, Hao Liu, Zhizhong Zhang, Xin Tan, Can Huang, Yuan Xie

(参考訳) 実環境におけるシーンテキスト認識(STR)では、ドメインの変化、フォントの多様性、形状の変形などへの対処において課題に直面することが多い。単純な解決策は、特定のシナリオに合わせてモデルをファインチューニングすることだが、計算負荷が大きく、シナリオごとに複数のモデルのコピーが必要となる。最近の研究では、大規模言語モデル(LLM)が、いくつかのデモンストレーション例から学習なしで学べることが示されており、これは「インコンテキスト学習」(ICL)と呼ばれる。それでも、LLMをテキスト認識器として用いるのは、許容できないほどリソースを消費する。さらに、LLMを用いた予備実験では、STRにおいてICLは機能しなかった。その主な原因は、学習段階で多様なサンプルからの文脈情報が十分に取り込まれていないことにあると考えられる。そこで本稿では、文脈に富んだシーンテキスト系列で学習したSTRモデルE$^2$STRを紹介する。この系列は、提案するインコンテキスト訓練戦略によって生成される。E$^2$STRは、STRにおいて効果的なICL能力を実現するには通常サイズのモデルで十分であることを示す。大規模な実験により、E$^2$STRは様々なシナリオで顕著な学習不要の適応を示し、公開ベンチマークにおいてファインチューニングされた最先端の手法をも上回ることを示す。

Scene text recognition (STR) in the wild frequently encounters challenges when coping with domain variations, font diversity, shape deformations, etc. A straightforward solution is performing model fine-tuning tailored to a specific scenario, but it is computationally intensive and requires multiple model copies for various scenarios. Recent studies indicate that large language models (LLMs) can learn from a few demonstration examples in a training-free manner, termed "In-Context Learning" (ICL). Nevertheless, applying LLMs as a text recognizer is unacceptably resource-consuming. Moreover, our pilot experiments on LLMs show that ICL fails in STR, mainly attributed to the insufficient incorporation of contextual information from diverse samples in the training stage. To this end, we introduce E$^2$STR, an STR model trained with context-rich scene text sequences, where the sequences are generated via our proposed in-context training strategy. E$^2$STR demonstrates that a regular-sized model is sufficient to achieve effective ICL capabilities in STR. Extensive experiments show that E$^2$STR exhibits remarkable training-free adaptation in various scenarios and outperforms even the fine-tuned state-of-the-art approaches on public benchmarks.

翻訳日:2023-11-27 12:26:02 公開日:2023-11-23

# 大規模基礎モデルの自律運転への適用

Applications of Large Scale Foundation Models for Autonomous Driving ( http://arxiv.org/abs/2311.12144v3 )

ライセンス: Link先を確認

Yu Huang, Yue Chen, Zhu Li

(参考訳) 2004/05年のDARPA Grand Challenge(非市街地)と2007年のUrban Challenge以来、自動運転はAI応用の中で最も活発な分野となっている。近年、大規模言語モデル(LLM)を基盤とするChatGPTやPaLMなどのチャットシステムが登場し、自然言語処理(NLP)において汎用人工知能(AGI)を実現するための有望な方向性として急速に注目を集めている。これらの能力を用いて自動運転を再定式化できると考えるのは自然である。LLMを基盤モデルと組み合わせることで、人間の知識、常識、推論を活用し、現在のロングテールなAIのジレンマから脱却して自動運転システムを再構築することが可能となる。本稿では、自動運転に応用された基盤モデルとLLMの技術を、シミュレーション、世界モデル、データアノテーション、計画またはE2Eソリューションなどに分類して調査する。

Since the DARPA Grand Challenges (rural) in 2004/05 and the Urban Challenge in 2007, autonomous driving has been the most active field of AI applications. Recently, powered by large language models (LLMs), chat systems such as ChatGPT and PaLM have emerged and rapidly become a promising direction toward artificial general intelligence (AGI) in natural language processing (NLP). It is natural to consider employing these abilities to reformulate autonomous driving. By combining LLMs with foundation models, it is possible to utilize human knowledge, common sense, and reasoning to rebuild autonomous driving systems and move beyond the current long-tailed AI dilemma. In this paper, we investigate the techniques of foundation models and LLMs applied to autonomous driving, categorized as simulation, world models, data annotation, and planning or E2E solutions.

翻訳日:2023-11-27 12:25:11 公開日:2023-11-23

# EdgeFM: エッジ上のオープンセット学習に基盤モデルを活用する

EdgeFM: Leveraging Foundation Model for Open-set Learning on the Edge ( http://arxiv.org/abs/2311.10986v3 )

ライセンス: Link先を確認

Bufang Yang, Lixing He, Neiwen Ling, Zhenyu Yan, Guoliang Xing, Xian Shuai, Xiaozhe Ren, Xin Jiang

(参考訳) 深層学習(DL)モデルは、DLアルゴリズムとチップの進歩により、IoTデバイスに広く展開されている。しかし、エッジデバイスのリソースは限られているため、こうしたデバイス上のDLモデルを多様な環境やタスクに汎化させることは難しい。最近登場した基盤モデル(FM)は驚くべき汎化能力を示しているが、リソースの限られたエッジデバイス上でFMの豊富な知識を効果的に活用する方法はまだ検討されていない。本稿では、オープンセット認識機能を備えた新しいエッジクラウド協調システムであるEdgeFMを提案する。EdgeFMは、ラベルのないデータを選択的にアップロードしてクラウド上のFMに問い合わせ、エッジモデル向けに特定の知識とアーキテクチャをカスタマイズする。同時に、EdgeFMはデータの不確実性と動的なネットワーク変動の両方を考慮して実行時に動的なモデル切り替えを行うため、精度は常に元のFMに近い水準に保たれる。2つのエッジプラットフォーム上で2つのFMを用いてEdgeFMを実装し、3つの公開データセットと2つの独自に収集したデータセットで評価する。その結果、EdgeFMはベースラインと比較して、エンドツーエンドのレイテンシを最大3.2倍削減し、精度を34.3%向上できることが示された。

Deep Learning (DL) models have been widely deployed on IoT devices with the help of advancements in DL algorithms and chips. However, the limited resources of edge devices make it hard for these on-device DL models to generalize to diverse environments and tasks. Although the recently emerged foundation models (FMs) show impressive generalization power, how to effectively leverage the rich knowledge of FMs on resource-limited edge devices remains unexplored. In this paper, we propose EdgeFM, a novel edge-cloud cooperative system with open-set recognition capability. EdgeFM selectively uploads unlabeled data to query the FM on the cloud and customizes the specific knowledge and architectures for edge models. Meanwhile, EdgeFM conducts dynamic model switching at run-time, taking into account both data uncertainty and dynamic network variations, which keeps its accuracy close to that of the original FM. We implement EdgeFM using two FMs on two edge platforms. We evaluate EdgeFM on three public datasets and two self-collected datasets. Results show that EdgeFM can reduce the end-to-end latency by up to 3.2x and achieve a 34.3% accuracy increase compared with the baseline.
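
The run-time switching above is described only at a high level, so the following is a hypothetical policy and not EdgeFM's algorithm. It uses the entropy of the edge model's softmax output as the data-uncertainty signal. It uses an estimated upload-plus-cloud-inference time as the network signal. The names `SwitchPolicy`, `entropy_threshold`, and `latency_budget_ms` are illustrative assumptions.

```python
import math
from dataclasses import dataclass


def entropy(probs):
    """Shannon entropy (nats) of the edge model's softmax output."""
    return -sum(p * math.log(p) for p in probs if p > 0)


@dataclass
class SwitchPolicy:
    entropy_threshold: float  # hypothetical: above this, the edge prediction is treated as uncertain
    latency_budget_ms: float  # hypothetical per-sample end-to-end deadline

    def use_cloud(self, edge_probs, est_upload_ms, est_cloud_ms):
        """Route a sample to the cloud FM only if the edge model is unsure
        and the current network estimate still meets the latency budget."""
        uncertain = entropy(edge_probs) > self.entropy_threshold
        fits_budget = est_upload_ms + est_cloud_ms <= self.latency_budget_ms
        return uncertain and fits_budget
```

With a threshold of 0.5 nats, a confident output such as [0.97, 0.01, 0.01, 0.01] stays on the edge. A uniform output over four classes is sent to the cloud only while the estimated round trip fits the budget.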

翻訳日:2023-11-27 12:24:55 公開日:2023-11-23

# 潜在空間における分離表現(disentanglement)による回帰の反実仮想説明

Counterfactual Explanation for Regression via Disentanglement in Latent Space ( http://arxiv.org/abs/2311.08228v3 )

ライセンス: Link先を確認

Xuan Zhao and Klaus Broelemann and Gjergji Kasneci

(参考訳) 反実仮想説明(CE)は、次の問いに答えるのに役立つ。すなわち、予測モデルの予測に影響を与える要因をどのように変えれば、ユーザの視点からより好ましい結果が得られるか、という問いである。CEは理解しやすい説明を表すため、AIシステムとのユーザのインタラクションを導く可能性を秘めている。実用的であるためには、CEは現実的かつ実行可能でなければならない。文献では、CEを生成する様々な手法が提案されている。しかし、CEに関する研究の大部分は、「拒否されたローンを承認してもらうには何をすべきか?」といった問いが生じる分類問題に焦点を当てている。実際には、「給与を上げるには何をすべきか?」といった問いに答えることは、より回帰的な性質を持つ。本稿では、まず潜在空間においてラベルに関連する次元とラベルに無関係な次元を分離(disentangle)することで、事前学習済みの回帰モデルに対するCEを生成する新しい手法を提案する。CEは、ラベルに無関係な次元とあらかじめ定めた出力を組み合わせることで生成される。このアプローチの背後にある直感は、理想的な反実仮想探索は入力のラベルに無関係な特性に着目し、目標に関連する特性への変更を提案すべきだということである。潜在空間で探索することは、この目標の達成に役立つ。本手法が反実仮想探索の間、クエリサンプルの特性を維持することを示す。様々な実験において、回帰問題設定の画像データセットと表形式データセットについて、異なる品質指標に基づき提案手法が競争力を持つことを示す。本手法は、3つの最先端手法と比較して、元のデータ多様体により近い結果を効率的に返す。これは現実的な高次元機械学習アプリケーションにとって不可欠である。コードは本研究の公開時にオープンソースパッケージとして公開する予定である。

Counterfactual Explanations (CEs) help address the question: How can the factors that influence the prediction of a predictive model be changed to achieve a more favorable outcome from a user's perspective? Thus, they bear the potential to guide the user's interaction with AI systems since they represent easy-to-understand explanations. To be applicable, CEs need to be realistic and actionable. In the literature, various methods have been proposed to generate CEs. However, the majority of research on CEs focuses on classification problems where questions like "What should I do to get my rejected loan approved?" are raised. In practice, answering questions like "What should I do to increase my salary?" is of a more regressive nature. In this paper, we introduce a novel method to generate CEs for a pre-trained regressor by first disentangling the label-relevant from the label-irrelevant dimensions in the latent space. CEs are then generated by combining the label-irrelevant dimensions and the predefined output. The intuition behind this approach is that the ideal counterfactual search should focus on the label-irrelevant characteristics of the input and suggest changes toward target-relevant characteristics. Searching in the latent space could help achieve this goal. We show that our method maintains the characteristics of the query sample during the counterfactual search. In various experiments, we demonstrate that the proposed method is competitive based on different quality measures on image and tabular datasets in regression problem settings. It efficiently returns results closer to the original data manifold compared to three state-of-the-art methods, which is essential for realistic high-dimensional machine learning applications. Our code will be made available as an open-source package upon the publication of this work.

翻訳日:2023-11-27 12:24:36 公開日:2023-11-23

# LiDAR場所認識のためのポーズグラフ注意機構付きグラフニューラルネットワーク

Pose-Graph Attentional Graph Neural Network for Lidar Place Recognition ( http://arxiv.org/abs/2309.00168v3 )

ライセンス: Link先を確認

Milad Ramezani, Liang Wang, Joshua Knights, Zhibin Li, Pauline Pounds, Peyman Moghadam

(参考訳) 本稿では、ポーズグラフ注意機構付きグラフニューラルネットワークP-GATを提案する。P-GATは場所認識タスクにおいて、連続するサブグラフと連続しないサブグラフの間で(キー)ノードを比較するものであり、現在のSOTAの場所認識手法で採用されている一般的なフレーム間検索という問題定式化とは異なる。P-GATは、ポーズグラフSLAMの概念を利用して、既存のエンコーダによって生成された近傍の点群記述子間の空間的・時間的情報を最大限に活用する。イントラアテンションとインターアテンション、およびグラフニューラルネットワークを活用し、P-GATはユークリッド空間において近傍の位置で取得された点群と、特徴空間におけるそれらの埋め込みとを関連付ける。大規模な公開データセットでの実験結果は、明確な特徴を欠くシーンや、学習環境とテスト環境の分布が異なる場合(ドメイン適応)において、本手法が有効であることを示している。さらに、最先端手法との網羅的な比較により、性能の向上が示された。コードはhttps://github.com/csiro-robotics/P-GATで入手できる。

This paper proposes a pose-graph attentional graph neural network, called P-GAT, which compares (key)nodes between sequential and non-sequential sub-graphs for place recognition tasks as opposed to a common frame-to-frame retrieval problem formulation currently implemented in SOTA place recognition methods. P-GAT uses the maximum spatial and temporal information between neighbour cloud descriptors, generated by an existing encoder, utilising the concept of pose-graph SLAM. Leveraging intra- and inter-attention and graph neural network, P-GAT relates point clouds captured in nearby locations in Euclidean space and their embeddings in feature space. Experimental results on the large-scale publicly available datasets demonstrate the effectiveness of our approach in scenes lacking distinct features and when training and testing environments have different distributions (domain adaptation). Further, an exhaustive comparison with the state-of-the-art shows performance improvements. Code is available at https://github.com/csiro-robotics/P-GAT.

翻訳日:2023-11-27 12:23:56 公開日:2023-11-23

# 無線上連合学習(Over-the-Air FL)のためのチャネルおよび勾配重要度を考慮したデバイススケジューリング

Channel and Gradient-Importance Aware Device Scheduling for Over-the-Air Federated Learning ( http://arxiv.org/abs/2305.16854v4 )

ライセンス: Link先を確認

Yuchang Sun and Zehong Lin and Yuyi Mao and Shi Jin and Jun Zhang

(参考訳) 連合学習(FL)は、複数のデバイスがローカルモデルの更新をアップロードすることで協調して機械学習モデルを学習する、一般的なプライバシ保護型の分散学習方式である。通信効率を向上させるため、FLには無線上計算(AirComp)が適用されている。AirCompはアナログ変調によって電波の重ね合わせ特性を利用し、多数のデバイスがモデル更新を同時にアップロードして集約できるようにする。しかし、アップリンクのチャネルノイズはモデル集約に大きな歪みをもたらし、この歪みはデバイススケジューリングによって決定的に左右され、学習されたモデルの性能を損なう。本稿では、チャネルノイズの悪影響を軽減するため、PO-FLと呼ぶオーバーザエアFLのための確率的デバイススケジューリングの枠組みを提案する。PO-FLでは、各デバイスがある確率に従ってスケジュールされ、そのモデル更新は集約時にこの確率を用いて再重み付けされる。この集約方式の不偏性を証明し、凸および非凸の損失関数の両方についてPO-FLの収束を示す。収束の上界から、デバイススケジューリングが通信の歪みとグローバル更新の分散を通じて学習性能に影響することが明らかになる。収束解析に基づき、さらにPO-FLにおけるデバイススケジューリング確率を最適化する、チャネルと勾配の重要度を考慮したアルゴリズムを開発する。大規模なシミュレーション結果から、チャネルと勾配の重要度を考慮した提案のPO-FLフレームワークは、ベースライン手法よりも速く収束し、より良いモデルを生成することが示された。

Federated learning (FL) is a popular privacy-preserving distributed training scheme, where multiple devices collaborate to train machine learning models by uploading local model updates. To improve communication efficiency, over-the-air computation (AirComp) has been applied to FL, which leverages analog modulation to harness the superposition property of radio waves such that numerous devices can upload their model updates concurrently for aggregation. However, the uplink channel noise incurs considerable model aggregation distortion, which is critically determined by the device scheduling and compromises the learned model performance. In this paper, we propose a probabilistic device scheduling framework for over-the-air FL, named PO-FL, to mitigate the negative impact of channel noise, where each device is scheduled according to a certain probability and its model update is reweighted using this probability in aggregation. We prove the unbiasedness of this aggregation scheme and demonstrate the convergence of PO-FL on both convex and non-convex loss functions. Our convergence bounds unveil that the device scheduling affects the learning performance through the communication distortion and global update variance. Based on the convergence analysis, we further develop a channel and gradient-importance aware algorithm to optimize the device scheduling probabilities in PO-FL. Extensive simulation results show that the proposed PO-FL framework with channel and gradient-importance awareness achieves faster convergence and produces better models than baseline methods.
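
The unbiasedness claim can be checked on a toy example. The sketch below makes three simplifying assumptions, any of which may differ from the paper's system model:

- Each device k is scheduled independently with probability p_k.
- Channel noise is ignored.
- Scalars stand in for model-update vectors.

Reweighting each received update by w_k / p_k (inverse-probability weighting) makes the expected aggregate equal to the full-participation weighted sum. `expected_aggregate` verifies this by enumerating all 2^K scheduling outcomes. The optimization of p_k from channel and gradient importance is not shown.

```python
import itertools


def po_fl_aggregate(updates, weights, probs, scheduled):
    """Inverse-probability-reweighted aggregate for one round.

    updates[k]: scalar stand-in for device k's model update
    weights[k]: aggregation weight of device k (e.g. its data fraction)
    probs[k]:   probability that device k is scheduled this round
    scheduled:  set of device indices actually scheduled
    """
    return sum(weights[k] * updates[k] / probs[k] for k in scheduled)


def expected_aggregate(updates, weights, probs):
    """Exact expectation over independent Bernoulli scheduling decisions."""
    n = len(updates)
    total = 0.0
    for mask in itertools.product([0, 1], repeat=n):
        # Probability of this particular scheduling outcome.
        p = 1.0
        for k, m in enumerate(mask):
            p *= probs[k] if m else 1 - probs[k]
        scheduled = {k for k, m in enumerate(mask) if m}
        total += p * po_fl_aggregate(updates, weights, probs, scheduled)
    return total
```

For updates [1.0, 2.0, 0.5], weights [0.5, 0.3, 0.2] and any nonzero probabilities, the expectation equals the full weighted sum 1.2. The variance, however, grows as the p_k shrink. This matches the abstract's point that scheduling affects learning through the global update variance.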

翻訳日:2023-11-27 12:23:38 公開日:2023-11-23

PDF登録状況（公開日: 20231123）