Fugu-MT: arXiv Paper Translations

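This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.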

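The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).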

Papers listed with a publication date of 2024-04-24.

Title	Authors	Abstract	Publication date / translation date
# Review Helpfulness Scores vs. Review Unhelpfulness Scores: Two Sides of the Same Coin or Different Coins? ( http://arxiv.org/abs/2407.05207v1 ) License: see linked page	Yinan Yu, Dominik Gutt, Warut Khern-am-nuai	Evaluating the helpfulness of online reviews supports consumers who must sift through large volumes of online reviews. Online review platforms have increasingly adopted review evaluating systems, which let users evaluate whether reviews are helpful or not; in turn, these evaluations assist review readers and encourage review contributors. Although review helpfulness scores have been studied extensively in the literature, our knowledge regarding their counterpart, review unhelpfulness scores, is lacking. Addressing this gap in the literature is important because researchers and practitioners have assumed that unhelpfulness scores are driven by intrinsic review characteristics and that such scores are associated with low-quality reviews. This study validates this conventional wisdom by examining factors that influence unhelpfulness scores.
We find that, unlike review helpfulness scores, unhelpfulness scores are generally not driven by intrinsic review characteristics, as almost none of them are statistically significant predictors of an unhelpfulness score. We also find that users who receive review unhelpfulness votes are more likely to cast unhelpfulness votes for other reviews. Finally, unhelpfulness voters engage much less with the platform than helpfulness voters do. In summary, our findings suggest that review unhelpfulness scores are not driven by intrinsic review characteristics. Therefore, helpfulness and unhelpfulness scores should not be considered as two sides of the same coin.	Translated: 2024-07-22 14:29:03 Published: 2024-04-24
# Leveraging AI for Climate Resilience in Africa: Challenges, Opportunities, and the Need for Collaboration ( http://arxiv.org/abs/2407.05210v1 ) License: see linked page	Rendani Mbuvha, Yassine Yaakoubi, John Bagiliko, Santiago Hincapie Potes, Amal Nammouchi, Sabrina Amrouche	As climate change issues become more pressing, their impact in Africa calls for urgent, innovative solutions tailored to the continent's unique challenges. While Artificial Intelligence (AI) emerges as a critical and valuable tool for climate change adaptation and mitigation, its effectiveness and potential are contingent upon overcoming significant challenges such as data scarcity, infrastructure gaps, and limited local AI development. This position paper explores the role of AI in climate change adaptation and mitigation in Africa. It advocates for a collaborative approach to build capacity, develop open-source data repositories, and create context-aware, robust AI-driven climate solutions that are culturally and contextually relevant.	Translated: 2024-07-22 14:29:03 Published: 2024-04-24
# Using Artificial Intelligence to Unlock Crowdfunding Success for Small Businesses ( http://arxiv.org/abs/2407.09480v1 ) License: see linked page	Teng Ye, Jingnan Zheng, Junhui Jin, Jingyi Qiu, Wei Ai, Qiaozhu Mei	While small businesses are increasingly turning to online crowdfunding platforms for essential funding, over 40% of these campaigns may fail to raise any money, especially those from low socio-economic areas. We utilize the latest advancements in AI technology to identify crucial factors that influence the success of crowdfunding campaigns and to improve their fundraising outcomes by strategically optimizing these factors. Our best-performing machine learning model accurately predicts the fundraising outcomes of 81.0% of campaigns, primarily based on their textual descriptions. Interpreting the machine learning model allows us to provide actionable suggestions on improving the textual description before launching a campaign. We demonstrate that by augmenting just three aspects of the narrative using a large language model, a campaign becomes preferred by 83% of human evaluators, and its likelihood of securing financial support increases by 11.9%. Our research uncovers effective strategies for crafting descriptions for small business fundraising campaigns and opens up a new realm in integrating large language models into crowdfunding methodologies.	Translated: 2024-07-22 13:48:17 Published: 2024-04-24
# Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices ( http://arxiv.org/abs/2406.02562v1 ) License: see linked page	Gwantae Kim, Bokyeung Lee, Donghyeon Kim, Hanseok Ko	In recent times, there has been a growing interest in utilizing personalized large models on low-spec devices, such as mobile and CPU-only devices. However, utilizing a personalized large model on-device is inefficient, and sometimes limited due to computational cost. To tackle the problem, this paper presents a weight separation method to minimize on-device model weights using parameter-efficient fine-tuning methods. Moreover, because some people speak multiple languages within an utterance, which is known as code-switching, a personalized ASR model is necessary to address such cases. However, current multilingual speech recognition models are limited to recognizing a single language within each utterance. To tackle this problem, we propose code-switching speech recognition models that incorporate fine-tuned monolingual and multilingual speech recognition models. Additionally, we introduce a gated low-rank adaptation (GLoRA) for parameter-efficient fine-tuning with minimal performance degradation.
Our experiments, conducted on Korean-English code-switching datasets, demonstrate that fine-tuning speech recognition models for code-switching surpasses the performance of traditional code-switching speech recognition models trained from scratch. Furthermore, GLoRA enhances parameter-efficient fine-tuning performance compared to conventional LoRA.	Translated: 2024-07-01 08:10:07 Published: 2024-04-24
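To make the idea of a gated low-rank update concrete, here is a minimal Python sketch. The abstract does not say how GLoRA's gate is computed, so the scalar sigmoid gate, the helper names `matvec` and `gated_lora_forward`, and the parameter `gate_logit` are assumptions for illustration only, not the authors' formulation. The sketch computes y = W x + g · B(A x), where W stays frozen and only A, B, and the gate would be trained.

```python
import math

def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def gated_lora_forward(W, A, B, gate_logit, x):
    # Hypothetical gated LoRA layer: frozen weight W plus a low-rank
    # update B @ A, scaled by a scalar sigmoid gate (an assumption).
    gate = 1.0 / (1.0 + math.exp(-gate_logit))
    base = matvec(W, x)                 # frozen path, length d_out
    low_rank = matvec(B, matvec(A, x))  # rank-r path: length r, then d_out
    return [b + gate * u for b, u in zip(base, low_rank)]

W = [[1.0, 0.0], [0.0, 1.0]]   # 2x2 frozen weight
A = [[1.0, 1.0]]               # r x d_in, with rank r = 1
B = [[0.5], [-0.5]]            # d_out x r
y = gated_lora_forward(W, A, B, gate_logit=0.0, x=[2.0, 4.0])
```

With a gate logit of 0 the gate is 0.5, so half of the low-rank update is added to the frozen output. A strongly negative logit would switch the adapter off and recover the base model.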
# MatFusion: A Generative Diffusion Model for SVBRDF Capture ( http://arxiv.org/abs/2406.06539v1 ) License: see linked page	Sam Sartor, Pieter Peers	We formulate SVBRDF estimation from photographs as a diffusion task. To model the distribution of spatially varying materials, we first train a novel unconditional SVBRDF diffusion backbone model on a large set of 312,165 synthetic spatially varying material exemplars. This SVBRDF diffusion backbone model, named MatFusion, can then serve as a basis for refining a conditional diffusion model to estimate the material properties from a photograph under controlled or uncontrolled lighting. Our backbone MatFusion model is trained using only a loss on the reflectance properties, and therefore refinement can be paired with more expensive rendering methods without the need for backpropagation during training. Because the conditional SVBRDF diffusion models are generative, we can synthesize multiple SVBRDF estimates from the same input photograph, from which the user can select the one that best matches the user's expectation. We demonstrate the flexibility of our method by refining different SVBRDF diffusion models conditioned on different types of incident lighting, and show that for a single photograph under colocated flash lighting our method achieves equal or better accuracy than existing SVBRDF estimation methods.	Translated: 2024-07-01 08:00:19 Published: 2024-04-24
# Sign Language Recognition based on YOLOv5 Algorithm for the Telugu Sign Language ( http://arxiv.org/abs/2406.10231v1 ) License: see linked page	Vipul Reddy. P, Vishnu Vardhan Reddy. B, Sukriti	Sign language recognition (SLR) technology has enormous promise to improve communication and accessibility for people who are hard of hearing. This paper presents a novel approach for identifying gestures in Telugu Sign Language (TSL) using the YOLOv5 object identification framework. The main goal is to create an accurate and successful method for identifying TSL gestures so that the deaf community can use SLR. A deep learning model was created that uses YOLOv5 to recognize and classify gestures. This model benefited from the YOLOv5 architecture's high accuracy, speed, and capacity to handle complex sign language features. Utilizing transfer learning approaches, the YOLOv5 model was customized to TSL gestures. To attain the best outcomes, careful parameter and hyperparameter adjustment was carried out during training.
With F1-score and mean Average Precision (mAP) ratings of 90.5% and 98.1%, respectively, the YOLOv5-medium model stands out for its outstanding performance metrics, demonstrating its efficacy in Telugu sign language identification tasks. Surprisingly, this model strikes an acceptable balance between computational complexity and training time to produce these outcomes. Because it offers a convincing blend of accuracy and efficiency, the YOLOv5-medium model, trained for 200 epochs, emerges as the recommended choice for real-world deployment. The system's stability and generalizability across various TSL gestures and settings were evaluated through rigorous testing and validation, which yielded outstanding accuracy. This research lays the foundation for future advancements in accessible technology for linguistic communities by providing a cutting-edge application of deep learning and computer vision techniques to TSL gesture identification. It also offers insightful perspectives and novel approaches to the field of sign language recognition.	Translated: 2024-07-01 07:50:27 Published: 2024-04-24
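The two metrics quoted in the entry above can be sketched in a few lines of Python. The F1-score is the harmonic mean of precision and recall, and mAP is the mean of the per-class average precision values. The function names and the example numbers are illustrative assumptions and are not taken from the paper, and computing each class's average precision from a precision-recall curve is left out.

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def mean_average_precision(ap_per_class):
    # mAP is the arithmetic mean of the per-class average precision values.
    return sum(ap_per_class) / len(ap_per_class)

# Illustrative values only, not results from the paper.
example_f1 = f1_score(precision=0.5, recall=1.0)
example_map = mean_average_precision([0.9, 1.0])
```

Because F1 is a harmonic mean, it is pulled toward the lower of precision and recall, which is why a detector can report a high mAP alongside a noticeably lower F1.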
# BERT vs GPT for financial engineering ( http://arxiv.org/abs/2405.12990v1 ) License: see linked page	Edward Sharkey, Philip Treleaven	The paper benchmarks several Transformer models [4] to show how these models can judge sentiment from a news event. This signal can then be used for downstream modelling and signal identification for commodity trading. We find that fine-tuned BERT models outperform fine-tuned or vanilla GPT models on this task. Transformer models have revolutionized the field of natural language processing (NLP) in recent years, achieving state-of-the-art results on various tasks such as machine translation, text summarization, question answering, and natural language generation. Among the most prominent transformer models are Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer (GPT), which differ in their architectures and objectives. An overview of the CopBERT model's training data and process is provided. The CopBERT model outperforms similar domain-specific BERT-trained models such as FinBERT. The confusion matrices below show the performance of CopBERT and CopGPT, respectively.
We see a ~10 percent increase in f1_score when comparing CopBERT with GPT4, and a 16 percent increase versus CopGPT. Whilst GPT4 is dominant, this highlights the importance of considering alternatives to GPT models for financial engineering tasks, given the risks of hallucinations and the challenges with interpretability. Unsurprisingly, we see the larger LLMs outperform the BERT models in predictive power. In summary, BERT is partially the new XGBoost: what it lacks in predictive power it makes up for with higher levels of interpretability. We conclude that BERT models might not be the next XGBoost [2], but represent an interesting alternative for financial engineering tasks that require a blend of interpretability and accuracy.	Translated: 2024-05-27 03:08:05 Published: 2024-04-24
# Lessons from the Use of Natural Language Inference (NLI) in Requirements Engineering Tasks ( http://arxiv.org/abs/2405.05135v1 ) License: see linked page	Mohamad Fazelnia, Viktoria Koscinski, Spencer Herzog, Mehdi Mirakhorli	We investigate the use of Natural Language Inference (NLI) in automating requirements engineering tasks. In particular, we focus on three tasks: requirements classification, identification of requirements specification defects, and detection of conflicts in stakeholders' requirements. While previous research has demonstrated significant benefit in using NLI as a universal method for a broad spectrum of natural language processing tasks, these advantages have not been investigated within the context of software requirements engineering. Therefore, we design experiments to evaluate the use of NLI in requirements analysis. We compare the performance of NLI with a spectrum of approaches, including prompt-based models, conventional transfer learning, Large Language Model (LLM)-powered chatbot models, and probabilistic models. Through experiments conducted under various learning settings, including conventional learning and zero-shot, we demonstrate conclusively that our NLI method surpasses classical NLP methods as well as other LLM-based and chatbot models in the analysis of requirements specifications.
Additionally, we share lessons learned characterizing the learning settings that make NLI a suitable approach for automating requirements engineering tasks.	Translated: 2024-05-12 15:40:48 Published: 2024-04-24
# Integrating LSTM and BERT for Long-Sequence Data Analysis in Intelligent Tutoring Systems ( http://arxiv.org/abs/2405.05136v1 ) License: see linked page	Zhaoxing Li, Jujie Yang, Jindi Wang, Lei Shi, Sebastian Stein	The field of Knowledge Tracing aims to understand how students learn and master knowledge over time by analyzing their historical behaviour data. To achieve this goal, many researchers have proposed Knowledge Tracing models that use data from Intelligent Tutoring Systems to predict students' subsequent actions. However, with the development of Intelligent Tutoring Systems, large-scale datasets containing long-sequence data began to emerge. Recent deep learning based Knowledge Tracing models face obstacles such as low efficiency, low accuracy, and low interpretability when dealing with large-scale datasets containing long-sequence data. To address these issues and promote the sustainable development of Intelligent Tutoring Systems, we propose an LSTM BERT-based Knowledge Tracing model for long-sequence data processing, namely LBKT, which uses a BERT-based architecture with a Rasch model-based embeddings block to deal with information on different difficulty levels and an LSTM block to process the sequential characteristic in students' actions. LBKT achieves the best performance on most benchmark datasets on the metrics of ACC and AUC.
Additionally, an ablation study is conducted to analyse the impact of each component on LBKT's overall performance. Moreover, we used t-SNE as the visualisation tool to demonstrate the model's embedding strategy. The results indicate that LBKT is faster, more interpretable, and has a lower memory cost than the traditional deep learning based Knowledge Tracing methods.	Translated: 2024-05-12 15:40:48 Published: 2024-04-24
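The Rasch model that the entry above names as the basis for LBKT's difficulty-aware embeddings has a simple classical form: the probability of a correct answer is a logistic function of the student's ability minus the item's difficulty. The sketch below shows only that classical formula. How LBKT turns it into an embeddings block is not described in the abstract and is not reproduced here, and the function name is a hypothetical choice.

```python
import math

def rasch_probability(ability, difficulty):
    # Classical Rasch model: P(correct) = sigmoid(ability - difficulty).
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# A student whose ability equals the item difficulty has a 50% chance.
p_matched = rasch_probability(ability=1.0, difficulty=1.0)
# The same student facing an easier item is more likely to succeed.
p_easier = rasch_probability(ability=1.0, difficulty=0.0)
```

The single difficulty parameter per item is what makes the model attractive for encoding difficulty levels compactly.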
# A Hybrid Probabilistic Battery Health Management Approach for Robust Inspection Drone Operations ( http://arxiv.org/abs/2405.00055v1 ) License: see linked page	Jokin Alcibar, Jose I. Aizpurua, Ekhi Zugastia, Oier Penagarikano	Health monitoring of remote critical infrastructure is a complex and expensive activity due to the limited infrastructure accessibility. Inspection drones are ubiquitous assets that enhance the reliability of critical infrastructures through improved accessibility. However, due to the harsh operation environment, it is crucial to monitor their health to ensure successful inspection operations. The battery is a key component that determines the overall reliability of the inspection drones and, with an appropriate health management approach, contributes to reliable and robust inspections. In this context, this paper presents a novel hybrid probabilistic approach for battery end-of-discharge (EOD) voltage prediction of Li-Po batteries. The hybridization is achieved in an error-correction configuration, which combines physics-based discharge and probabilistic error-correction models to quantify the aleatoric and epistemic uncertainty. The performance of the hybrid probabilistic methodology was empirically evaluated on a dataset comprising EOD voltage under varying load conditions.
The dataset was obtained from real inspection drones operated on different flights, focused on offshore wind turbine inspections. The proposed approach has been tested with different probabilistic methods and demonstrates 14.8% improved performance in probabilistic accuracy compared to the best probabilistic method. In addition, aleatoric and epistemic uncertainties provide robust estimations to enhance the diagnosis of battery health-states.	Translated: 2024-05-05 17:54:32 Published: 2024-04-24
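An error-correction configuration of the kind described in the entry above can be sketched as a physics-based prediction plus a probabilistic model of its residual error. Everything specific in this sketch is an assumption for illustration: the linear internal-resistance model, its default parameters, the Gaussian fit to the residuals, and all function names. It does not separate aleatoric from epistemic uncertainty as the paper does, and the numbers are not data from the paper.

```python
import statistics

def physics_eod_voltage(load_amps, v_open=4.2, r_internal=0.05):
    # Placeholder physics model (assumed): voltage drop across an
    # internal resistance under a constant load.
    return v_open - r_internal * load_amps

def fit_error_model(loads, measured):
    # Fit a Gaussian to the residuals between the measurements and the
    # physics model: returns (mean error, sample standard deviation).
    residuals = [m - physics_eod_voltage(l) for l, m in zip(loads, measured)]
    return statistics.mean(residuals), statistics.stdev(residuals)

def hybrid_predict(load_amps, error_mean, error_std):
    # Physics prediction shifted by the learned mean error, with a
    # one-standard-deviation band around it.
    center = physics_eod_voltage(load_amps) + error_mean
    return center, (center - error_std, center + error_std)

loads = [10.0, 20.0, 30.0]        # illustrative loads in amperes
measured = [3.60, 3.15, 2.60]     # illustrative measured EOD voltages
mean_err, std_err = fit_error_model(loads, measured)
prediction, band = hybrid_predict(25.0, mean_err, std_err)
```

The design point is that the physics model carries the known behaviour while the data-driven part only has to learn the correction, so it needs less data than a purely learned predictor.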
# Homonym Sense Disambiguation in the Georgian Language ( http://arxiv.org/abs/2405.00710v1 ) License: see linked page	Davit Melikidze, Alexander Gamkrelidze	This research proposes a novel approach to the Word Sense Disambiguation (WSD) task in the Georgian language, based on supervised fine-tuning of a pre-trained Large Language Model (LLM) on a dataset formed by filtering the Georgian Common Crawls corpus. The dataset is used to train a classifier for words with multiple senses. Additionally, we present experimental results of using LSTM for WSD. Accurately disambiguating homonyms is crucial in natural language processing. Georgian, an agglutinative language belonging to the Kartvelian language family, presents unique challenges in this context. The aim of this paper is to highlight the specific problems concerning homonym disambiguation in the Georgian language and to present our approach to solving them. The techniques discussed in the article achieve 95% accuracy for predicting lexical meanings of homonyms using a hand-classified dataset of over 7500 sentences.	Translated: 2024-05-05 17:44:45 Published: 2024-04-24
# Exploring Remote Hands-on Support for Collaborative Embedded Systems Development ( http://arxiv.org/abs/2404.17604v1 ) License: see linked page	Yan Chen, Jasmine Jones	Embedded systems development is a complex task that often requires team collaboration. Given the growing market of freelancers and the global shift to remote work, remote collaboration has become a necessity for many developers and clients. While existing communication and coordination tools help users share, discuss, and edit code collaboratively, these tools were specifically designed for software rather than hardware development. In this work, our goal is to explore the design space of remote support tools for embedded systems development. To do this, we interviewed 12 seasoned embedded systems developers regarding their current remote work practices, issues, and needs. We then conducted a user enactment study with a bespoke remote manipulation agent, Handy, as a hypothetical assistant to elicit the types of support developers desire from a collaborator.
Our findings describe the scenarios and strategies in which remote work takes place; the support needs and information, coordination, and implementation challenges expressed by developers; and the privacy, control, and trust concerns that developers have when working on their projects with remote physical manipulation tools. This research contributes to the literature by bringing embedded systems development in line with remote, on-demand collaboration and help-seeking in software environments. The empirical basis of this work provides a rich foundation of documented needs, preferences, and desires that can ground future work on remote manipulation agents and enhance collaboration support in the domain of embedded systems development.	Translated: 2024-04-30 20:10:08 Published: 2024-04-24
# Autonomous LLM-driven research from data to human-verifiable research papers ( http://arxiv.org/abs/2404.17605v1 ) License: see linked page	Tal Ifargan, Lukas Hafner, Maor Kern, Ori Alcalay, Roy Kishony	As AI promises to accelerate scientific discovery, it remains unclear whether fully AI-driven research is possible and whether it can adhere to key scientific values, such as transparency, traceability and verifiability. Mimicking human scientific practices, we built data-to-paper, an automation platform that guides interacting LLM agents through a complete stepwise research process, while programmatically back-tracing information flow and allowing human oversight and interactions. In autopilot mode, provided with annotated data alone, data-to-paper raised hypotheses, designed research plans, wrote and debugged analysis codes, generated and interpreted results, and created complete and information-traceable research papers. Even though research novelty was relatively limited, the process demonstrated autonomous generation of de novo quantitative insights from data. For simple research goals, a fully autonomous cycle can create manuscripts that recapitulate peer-reviewed publications without major errors in about 80-90% of cases, yet as goal complexity increases, human co-piloting becomes critical for assuring accuracy.
Beyond the process itself, the created manuscripts are also inherently verifiable, as information-tracing allows results, methods and data to be chained programmatically. Our work thereby demonstrates a potential for AI-driven acceleration of scientific discovery while enhancing, rather than jeopardizing, traceability, transparency and verifiability.	Translated: 2024-04-30 20:10:08 Published: 2024-04-24
# Review of Data-centric Time Series Analysis from Sample, Feature, and Period ( http://arxiv.org/abs/2404.16886v1 ) License: see linked page	Chenxi Sun, Hongyan Li, Yaliang Li, Shenda Hong	Data is essential to performing time series analysis utilizing machine learning approaches, whether for classic models or today's large language models. A good time-series dataset is advantageous for the model's accuracy, robustness, and convergence, as well as task outcomes and costs. The emergence of data-centric AI represents a shift in the landscape from model refinement to prioritizing data quality. Even though time-series data processing methods frequently come up in a wide range of research fields, they have not been well investigated as a specific topic. To fill the gap, in this paper, we systematically review different data-centric methods in time series analysis, covering a wide range of research topics. Based on the time-series data characteristics at sample, feature, and period, we propose a taxonomy for the reviewed data selection methods. In addition to discussing and summarizing their characteristics, benefits, and drawbacks targeting time-series data, we also introduce the challenges and opportunities by proposing recommendations, open problems, and possible research topics.	Translated: 2024-04-29 15:03:56 Published: 2024-04-24
# Anomaly Detection for Incident Response at Scale ( http://arxiv.org/abs/2404.16887v1 ) License: see linked page	Hanzhang Wang, Gowtham Kumar Tangirala, Gilkara Pranav Naidu, Charles Mayville, Arighna Roy, Joanne Sun, Ramesh Babu Mandava	We present a machine learning-based anomaly detection product, AI Detect and Respond (AIDR), that monitors Walmart's business and system health in real-time. During a 3-month validation, the product served predictions from over 3000 models to more than 25 application, platform, and operation teams, covering 63% of major incidents and reducing the mean-time-to-detect (MTTD) by more than 7 minutes. Unlike previous anomaly detection methods, our solution leverages statistical, ML and deep learning models while continuing to use rule-based static thresholds to incorporate domain-specific knowledge. Both univariate and multivariate ML models are deployed and maintained through distributed services for scalability and high availability. AIDR has a feedback loop that assesses model quality with a combination of drift detection algorithms and customer feedback. It also offers self-onboarding capabilities and customizability. AIDR has achieved success with various internal teams with lower time to detection and fewer false positives than previous methods.
As we move forward, we aim to expand incident coverage and prevention, reduce noise, and integrate further with root cause recommendation (RCR) to enable an end-to-end AIDR experience.	翻訳日:2024-04-29 15:03:56 公開日:2024-04-24
# NEPENTHE: ニューラルネットワーク深さ低減器としてのエントロピーベースプルーニング NEPENTHE: Entropy-Based Pruning as a Neural Network Depth's Reducer ( http://arxiv.org/abs/2404.16890v1 ) ライセンス: Link先を確認	Zhu Liao, Victor Quétu, Van-Tam Nguyen, Enzo Tartaglione,	(参考訳) ディープニューラルネットワークは複雑なタスクを解くのに非常に効果的であるが、その計算要求はリアルタイムアプリケーションや限られたリソースシステムにおいてその有用性を妨げうる。さらに、多くのタスクにおいてこれらのモデルが過度にパラメータ化されていることが知られている。本稿では,nEural Network depTH の rEducer (NEPENTHE) として eNtropy-basEd Pruning を提案する。我々の理論的発見に基づいて、NEPENTHEは、完全に除去するために低いエントロピーを持つ層で非構造的に切断される接続に焦点を当てている。我々はMobileNetやSwin-Tのような一般的なアーキテクチャに対するアプローチを検証し、過度なパラメータ化体制に遭遇すると、いくつかのレイヤを効果的に線形化できることを示した。コードは記事の受理時に公開される。 While deep neural networks are highly effective at solving complex tasks, their computational demands can hinder their usefulness in real-time applications and with limited-resources systems. Besides, for many tasks it is known that these models are over-parametrized: neoteric works have broadly focused on reducing the width of these networks, rather than their depth. In this paper, we aim to reduce the depth of over-parametrized deep neural networks: we propose an eNtropy-basEd Pruning as a nEural Network depTH's rEducer (NEPENTHE) to alleviate deep neural networks' computational burden. Based on our theoretical finding, NEPENTHE focuses on un-structurally pruning connections in layers with low entropy to remove them entirely. We validate our approach on popular architectures such as MobileNet and Swin-T, showing that when encountering an over-parametrization regime, it can effectively linearize some layers (hence reducing the model's depth) with little to no performance loss. The code will be publicly available upon acceptance of the article.	翻訳日:2024-04-29 15:03:56 公開日:2024-04-24
# 大規模言語モデルのサードパーティAPIに対する攻撃 Attacks on Third-Party APIs of Large Language Models ( http://arxiv.org/abs/2404.16891v1 ) ライセンス: Link先を確認	Wanru Zhao, Vidit Khazanchi, Haodi Xing, Xuanli He, Qiongkai Xu, Nicholas Donald Lane,	(参考訳) 大規模言語モデル(LLM)サービスは最近、サードパーティのAPIサービスと対話するプラグインエコシステムの提供を開始した。このイノベーションはLLMの能力を高めるが、様々なサードパーティが開発したプラグインが容易に信頼できないため、リスクも伴う。本稿では,サードパーティサービスを含むLDMプラットフォームにおけるセキュリティと安全性の脆弱性を調査する新たな攻撃フレームワークを提案する。フレームワークを広く使われているLLMに適用し、LLM出力を許容不能に修正可能なサードパーティAPI上で、さまざまなドメインにわたる現実世界の悪意のある攻撃を識別する。本稿は,サードパーティのAPI統合によって引き起こされるユニークな課題について論じ,今後のLCMエコシステムのセキュリティと安全性を改善するための戦略的可能性を提供する。私たちのコードは、https://github.com/vk0812/Third-Party-Attacks-on-LLMsでリリースされています。 Large language model (LLM) services have recently begun offering a plugin ecosystem to interact with third-party API services. This innovation enhances the capabilities of LLMs, but it also introduces risks, as these plugins developed by various third parties cannot be easily trusted. This paper proposes a new attacking framework to examine security and safety vulnerabilities within LLM platforms that incorporate third-party services. Applying our framework specifically to widely used LLMs, we identify real-world malicious attacks across various domains on third-party APIs that can imperceptibly modify LLM outputs. The paper discusses the unique challenges posed by third-party API integration and offers strategic possibilities to improve the security and safety of LLM ecosystems moving forward. Our code is released at https://github.com/vk0812/Third-Party-Attacks-on-LLMs.	翻訳日:2024-04-29 15:03:56 公開日:2024-04-24
# マルチクロモフォリック系における光子相関時間非対称性と動的コヒーレンス Photon correlation time-asymmetry and dynamical coherence in multichromophoric systems ( http://arxiv.org/abs/2404.16892v1 ) ライセンス: Link先を確認	Charlie Nation, Hallmann Oskar Gestsson, Alexandra Olaya-Castro,	(参考訳) 実効エキシトン-フォノン相互作用下で励起輸送を受けるマルチクロモフォリック系により放射される光の偏光フィルターによる2光子相関を理論的に検討し,連続的不整合照明を受ける。本研究では、FMO(Fenna-Matthews Olson)光合成複合体のような生体分子集合体において、異なる偏光に対応する光子の相互相関における時間-対称性を利用して、ゼロ遅延相関で観測されない量子コヒーレント輸送機構と定常状態コヒーレンス特性の両方を探索できることを示す。相関非対称性の古典的境界が得られ、FMOは正確な数値計算によって破られる。これらの光子交叉相関における時間非対称性への支配的な寄与は、フレンケル・エクシトンモデルに対するコヒーレンス移動の集団であることを示す。その結果、分子集合体や他の多部位量子エミッタにおける励起状態のダイナミクスに対するコヒーレントな寄与を研究するために、光子相関非対称性を有望なアプローチとして提案した。 We theoretically investigate polarization-filtered two-photon correlations for the light emitted by a multichromophoric system undergoing excitation transport under realistic exciton-phonon interactions, and subject to continuous incoherent illumination. We show that for a biomolecular aggregate, such as the Fenna-Matthews Olson (FMO) photosynthetic complex, time-asymmetries in the cross-correlations of photons corresponding to different polarizations can be exploited to probe both quantum coherent transport mechanisms and steady-state coherence properties, which are not witnessed by zero-delay correlations. A classical bound on correlation asymmetry is obtained, which FMO is shown to violate using exact numerical calculations. Our analysis indicates that the dominant contributions to time-asymmetry in such photon cross-correlations are population to coherence transfer for Frenkel-Exciton models. Our results therefore put forward photon correlation asymmetry as a promising approach to investigate coherent contributions to excited-stated dynamics in molecular aggregates and other many-site quantum emitters.	翻訳日:2024-04-29 15:03:56 公開日:2024-04-24
# 自信を持って運転できる自動AIコントローラー:不確実性のある車両を操縦する Automatic AI controller that can drive with confidence: steering vehicle with uncertainty knowledge ( http://arxiv.org/abs/2404.16893v1 ) ライセンス: Link先を確認	Neha Kumari, Sumit Kumar, Sneha Priya, Ayush Kumar, Akash Fogla,	(参考訳) 現実世界と対話する安全クリティカルなシステムでは、意思決定における不確実性の役割が特に機械学習モデルにおいて重要である。 CPS(Cyber-Physical Systems)のセキュアな機能のためには、このような不確実性を適切に管理することが不可欠である。本研究では,機械学習フレームワークを用いた車両の横方向制御システムの開発に焦点をあてる。具体的には、確率論的学習モデルであるベイズニューラルネットワーク(BNN)を用いて不確実性定量化に対処する。この能力により、モデルの予測における信頼度や不確実性のレベルを測定することができます。 BNNベースのコントローラは、単一のトラックを横断する車両から収集されたシミュレーションデータを使用して訓練され、その後、他の様々なトラックでテストされる。まず、トレーニングされたモデルは、複数の類似したトラック上で車両を適応し、効果的に制御する能力を示します。第二に、制御器に組み込まれた予測信頼性の定量化は早期警戒システムとして機能し、アルゴリズムが予測に対する信頼を欠き、したがって失敗に陥りやすいことを示唆する。信頼しきい値を確立することで、手動による介入をトリガーし、安全なパラメータの外で動作した場合に、制御がアルゴリズムから解放されることを保証できます。 In safety-critical systems that interface with the real world, the role of uncertainty in decision-making is pivotal, particularly in the context of machine learning models. For the secure functioning of Cyber-Physical Systems (CPS), it is imperative to manage such uncertainty adeptly. In this research, we focus on the development of a vehicle's lateral control system using a machine learning framework. Specifically, we employ a Bayesian Neural Network (BNN), a probabilistic learning model, to address uncertainty quantification. This capability allows us to gauge the level of confidence or uncertainty in the model's predictions. The BNN-based controller is trained using simulated data gathered from the vehicle traversing a single track and subsequently tested on various other tracks. We want to share two significant results: firstly, the trained model demonstrates the ability to adapt and effectively control the vehicle on multiple similar tracks.
Secondly, the quantification of prediction confidence integrated into the controller serves as an early-warning system, signaling when the algorithm lacks confidence in its predictions and is therefore susceptible to failure. By establishing a confidence threshold, we can trigger manual intervention, ensuring that control is relinquished from the algorithm when it operates outside of safe parameters.	翻訳日:2024-04-29 15:03:56 公開日:2024-04-24
# 空間的最適線形非バイアス予測:高次元大規模データセットに対する計算数学的アプローチ Spatial best linear unbiased prediction: A computational mathematics approach for high dimensional massive datasets ( http://arxiv.org/abs/1701.00285v3 ) ライセンス: Link先を確認	Julio E. Castrillon-Candas,	(参考訳) 膨大なデータセットの出現により、計算科学とエンジニアリングのコミュニティの多くは、回帰と分類におけるデータ集約的なアプローチに向かっている。しかし、これらの課題は、問題の規模、複雑さ、次元性の増加によるものである。特に、多くの場合、共分散行列は数値的に不安定であり、線形代数はそのような行列を有限精度のコンピュータ上で正確に逆転することはできないことを示す。行列の安定化に対する一般的なアドホックなアプローチは、いわゆるナゲットの応用である。しかし、これはモデルを変更し、元のソリューションにエラーをもたらす可能性がある。不条件行列を正確に逆転することはできないことは、数値解析からよく知られている。本稿では,観測値や次元数とよく一致したマルチレベル計算法を提案する。マルチレベル基底は、観測のkD木分割に適合する。条件数が大きい数値的に不安定な共分散行列は、精度を損なうことなく、良好な条件付きマルチレベル行列に変換することができる。さらに, 最適線形不偏予測 (BLUP) モデルと一般化最小正方形 (GLS) モデルを正確に解くが, 数値的に安定であることを示す。最大25次元の数値的不安定な問題に対して, マルチレベル法を検証した。 BLUP問題を解くために最大42,050倍の高速化が得られたが、従来の反復法と同じ精度である。非常に不条件の場合、スピードアップは無限である。さらに,多値共分散行列の減衰推定は数値解析の分野から高次元補間法に基づいて導出される。この研究は統計学、不確実量化、高性能計算、計算応用数学の交差点にある。 With the advent of massive data sets much of the computational science and engineering community has moved toward data-intensive approaches in regression and classification. However, these present significant challenges due to increasing size, complexity and dimensionality of the problems. In particular, covariance matrices in many cases are numerically unstable and linear algebra shows that often such matrices cannot be inverted accurately on a finite precision computer. A common ad hoc approach to stabilizing a matrix is application of a so-called nugget. However, this can change the model and introduce error to the original solution. It is well known from numerical analysis that ill-conditioned matrices cannot be accurately inverted. In this paper we develop a multilevel computational method that scales well with the number of observations and dimensions. A multilevel basis is constructed adapted to a kD-tree partitioning of the observations. 
Numerically unstable covariance matrices with large condition numbers can be transformed into well-conditioned multilevel ones without compromising accuracy. Moreover, it is shown that the multilevel prediction exactly solves the Best Linear Unbiased Predictor (BLUP) and Generalized Least Squares (GLS) model, but is numerically stable. The multilevel method is tested on numerically unstable problems of up to 25 dimensions. Numerical results show speedups of up to 42,050 times for solving the BLUP problem, but with the same accuracy as the traditional iterative approach. For very ill-conditioned cases the speedup is infinite. In addition, decay estimates of the multilevel covariance matrices are derived based on high-dimensional interpolation techniques from the field of numerical analysis. This work lies at the intersection of statistics, uncertainty quantification, high-performance computing and computational applied mathematics.	翻訳日:2024-04-28 14:58:07 公開日:2024-04-24
# 非局所性のブロードキャスト Broadcasting of non-locality ( http://arxiv.org/abs/1909.12565v2 ) ライセンス: Link先を確認	Dhrumil Patel, Arup Roy, Indranil Chakrabarty, Nirman Ganguly,	(参考訳) ベル非局所性とステアリング(英: Bell nonlocality and steering)は、従来の古典的概念から大きく離れている量子力学の根本的特徴である。基本的には、非局所的不等式に反する分離系間の量子相関の存在を指し、古典的相関のみに制限すれば、その違反はあり得ない。このようなユニークな相関性の重要性を考慮すると、放送と呼ばれるプロトコルである、少数の非局所性を示すより多くの状態を生成することに興味があるかもしれない。しかし、本論文では、ブゼック・ヒラリー(BH)量子クローニング機を用いて、局所的な量子クローニングによるブロードキャストを制限すると、そのような非局所性はブロードキャストできないことを示す。本稿ではCJWR(E.G.Cavalcanti,S.J. Jones,H.M Wiseman,M.D. Reid, Phys.Rev.A 80,032112(2009))の不等式について検討する。測定条件が6以上であれば, 局所最適B-Hクローンを適用すれば, 出力状態の一部がステアリング可能であることが観察された。いくつかの制限の下で、Werner と Bell の対角状態がそのような手順に従うと、結果として得られる状態はステアリング不可能となる。我々はこの研究を3つの量子ビット系に拡張し、Svetlichnyの不等式を考えると、真の三部体非局所性は普遍的なBH局所量子クローンでは放送できないことを発見した。 2つの量子ビット系に対して、Werner と Bell の対角線上の一般局所ユニタリを $10^5$ のシミュレーションで検討した。 Bell nonlocality and steering are archetypal characteristics of quantum mechanics that mark a significant departure from conventional classical notions. They basically refer to the presence of quantum correlations between separated systems which violate a nonlocal inequality, the violation otherwise not possible if we restrict ourselves only to classical correlations. In view of the importance of such unique correlations, one may be interested in generating more states exhibiting nonlocality starting from a few, a protocol which is termed broadcasting. However, in the present submission, we show, using the universal Buzek-Hillery (BH) quantum cloning machine, that if one restricts to broadcasting through local quantum cloning, then such nonlocality cannot be broadcasted. Our study is done in the purview of the Bell-CHSH inequality and the CJWR (E. G. Cavalcanti, S. J. Jones, H. M. Wiseman, and M. D. Reid, Phys. Rev. A 80, 032112 (2009)) steering inequality. It is observed that when the number of measurement settings is greater than 6, some of the output states are steerable after the application of local optimal BH cloners.
We find that, under some restrictions, if the Werner and Bell-diagonal states are subjected to such procedures, then the resultant state is rendered unsteerable. We extend this study to three-qubit systems and find that genuine tripartite nonlocality cannot be broadcasted using universal BH local quantum cloners when we consider Svetlichny's inequality. For two-qubit systems, we have considered $10^5$ simulated general local unitaries over Werner and Bell-diagonal states and find that for none of these states is broadcasting of nonlocality and 3-steerability possible.	翻訳日:2024-04-28 14:58:07 公開日:2024-04-24
# MAMLはいつ最適か? NLP応用におけるモデル非依存メタラーニングに関する実証的研究 When does MAML Work the Best? An Empirical Study on Model-Agnostic Meta-Learning in NLP Applications ( http://arxiv.org/abs/2005.11700v2 ) ライセンス: Link先を確認	Zequn Liu, Ruiyi Zhang, Yiping Song, Wei Ju, Ming Zhang,	(参考訳) モデルに依存しないメタラーニング手法であるモデル非依存メタラーニング(MAML)は、少数ショットテキスト分類やマルチドメイン低リソース言語生成を含むNLPアプリケーションに成功している。データ量、タスク間の類似性、一般的な言語モデルとタスク固有の適応のバランスなど、多くの影響要因がNLPにおけるMAMLの性能に影響を与えるが、それらを徹底的に研究する研究は少ない。本稿では,これらの影響要因を調査し,MAMLが最適に機能するかどうかを実験的に検討する。 Model-Agnostic Meta-Learning (MAML), a model-agnostic meta-learning method, is successfully employed in NLP applications including few-shot text classification and multi-domain low-resource language generation. Many impacting factors, including data quantity, similarity among tasks, and the balance between general language model and task-specific adaptation, can affect the performance of MAML in NLP, but few works have thoroughly studied them. In this paper, we conduct an empirical study to investigate these impacting factors and conclude when MAML works the best based on the experimental results.	翻訳日:2024-04-28 14:58:07 公開日:2024-04-24
# ブラインド画像復元に向けた深部変動ネットワーク Deep Variational Network Toward Blind Image Restoration ( http://arxiv.org/abs/2008.10796v5 ) ライセンス: Link先を確認	Zongsheng Yue, Hongwei Yong, Qian Zhao, Lei Zhang, Deyu Meng, Kwan-Yee K. Wong,	(参考訳) ブラインド画像復元(IR)はコンピュータビジョンにおいて一般的な問題であるが難しい問題である。古典的モデルに基づく手法と最近のディープラーニング(DL)に基づく手法は、この問題に対する2つの異なる方法論を表現している。本稿では,その両方の利点を統合することを目的とした,新しいブラインド画像復元手法を提案する。具体的には、劣化過程を明示したブラインドIRのための一般的なベイズ生成モデルを構築する。提案したモデルでは,画像ノイズに適合するために,ガウス分布を画素単位で非i-d-ガウス分布とする。従来のほとんどの方法で採用されている単純なガウス分布やラプラシア分布よりも柔軟性があり、画像劣化に含まれるより複雑なノイズタイプを扱うことができる。モデル解くために,予測されるすべての後部分布をディープニューラルネットワークとしてパラメータ化してモデル能力を向上する変分推論アルゴリズムを設計する。特に、このような推論アルゴリズムは、劣化推定と画像復元のタスクを共同で扱う統一的なフレームワークを誘導する。また、前処理で推定される劣化情報を利用して後者のIRプロセスを導出する。画像デノイングと超解像という2つの典型的なブラインド赤外線タスクの実験により,提案手法が現状よりも優れた性能を達成できることが実証された。 Blind image restoration (IR) is a common yet challenging problem in computer vision. Classical model-based methods and recent deep learning (DL)-based methods represent two different methodologies for this problem, each with their own merits and drawbacks. In this paper, we propose a novel blind image restoration method, aiming to integrate both the advantages of them. Specifically, we construct a general Bayesian generative model for the blind IR, which explicitly depicts the degradation process. In this proposed model, a pixel-wise non-i.i.d. Gaussian distribution is employed to fit the image noise. It is with more flexibility than the simple i.i.d. Gaussian or Laplacian distributions as adopted in most of conventional methods, so as to handle more complicated noise types contained in the image degradation. To solve the model, we design a variational inference algorithm where all the expected posteriori distributions are parameterized as deep neural networks to increase their model capability. Notably, such an inference algorithm induces a unified framework to jointly deal with the tasks of degradation estimation and image restoration. 
Further, the degradation information estimated in the former task is utilized to guide the latter IR process. Experiments on two typical blind IR tasks, namely image denoising and super-resolution, demonstrate that the proposed method achieves superior performance over current state-of-the-art methods.	翻訳日:2024-04-28 14:58:07 公開日:2024-04-24
# ディープラーニングによる外見に基づく視線推定: レビューとベンチマーク Appearance-based Gaze Estimation With Deep Learning: A Review and Benchmark ( http://arxiv.org/abs/2104.12668v2 ) ライセンス: Link先を確認	Yihua Cheng, Haofei Wang, Yiwei Bao, Feng Lu,	(参考訳) 人間の視線は人間の焦点や意図に関する貴重な情報を提供しており、重要な研究領域となっている。近年,深層学習は外見に基づく視線推定に革命をもたらした。しかし、2次元視線位置と3次元視線ベクトルの不公平な比較や、異なる前処理と後処理の方法など、視線推定研究の独特な特徴から、深層学習に基づく視線推定アルゴリズムを開発するための決定的なガイドラインが欠如している。本稿では,ディープラーニングを用いた外見に基づく視線推定手法の体系的レビューを行う。まず,従来の視線推定アルゴリズムを,深い特徴抽出,深層学習モデル設計,個人キャリブレーション,プラットフォームなど,典型的な視線推定パイプラインに沿って調査する。次に, 顔・目検出, データ修正, 2D/3D視線変換, 視線原点変換などのデータ前処理と後処理の手法を概説する。最後に、深層学習に基づく視線推定のための総合的なベンチマークを設定した。我々は、すべての公開データセットを特徴付け、典型的な視線推定アルゴリズムのソースコードを提供する。本稿では,深層学習に基づく視線推定手法の開発への参考となるだけでなく,将来の視線推定研究の指針となる。プロジェクトのWebページは https://phi-ai.buaa.edu.cn/Gazehub にある。 Human gaze provides valuable information on human focus and intentions, making it a crucial area of research. Recently, deep learning has revolutionized appearance-based gaze estimation. However, due to the unique features of gaze estimation research, such as the unfair comparison between 2D gaze positions and 3D gaze vectors and the different pre-processing and post-processing methods, there is a lack of a definitive guideline for developing deep learning-based gaze estimation algorithms. In this paper, we present a systematic review of the appearance-based gaze estimation methods using deep learning. Firstly, we survey the existing gaze estimation algorithms along the typical gaze estimation pipeline: deep feature extraction, deep learning model design, personal calibration and platforms. Secondly, to fairly compare the performance of different approaches, we summarize the data pre-processing and post-processing methods, including face/eye detection, data rectification, 2D/3D gaze conversion and gaze origin conversion. Finally, we set up a comprehensive benchmark for deep learning-based gaze estimation.
We characterize all the public datasets and provide the source code of typical gaze estimation algorithms. This paper serves not only as a reference to develop deep learning-based gaze estimation methods, but also a guideline for future gaze estimation research. The project web page can be found at https://phi-ai.buaa.edu.cn/Gazehub.	翻訳日:2024-04-28 14:58:07 公開日:2024-04-24
# Maxwell Demon と Einstein-Podolsky-Rosen ステアリング Maxwell Demon and Einstein-Podolsky-Rosen Steering ( http://arxiv.org/abs/2105.05656v4 ) ライセンス: Link先を確認	Meng-Jun Hu, Xiao-Min Hu, Yong-Sheng Zhang,	(参考訳) マクスウェルの悪魔と量子絡み合いの研究は、物理学における基礎的な重要性と量子情報への潜在的な応用のために重要である。マクスウェルのデーモンに関するこれまでの研究は、主に量子相関を考慮した熱力学に焦点を当てていた。ここでは、別の観点から考察し、量子非局所性相関が作業によってシミュレートできるかどうかを問う。このため、マックスウェルの悪魔支援型アインシュタイン・ポドルスキー・ローゼン(EPR)ステアリングが提案され、新しいタイプの抜け穴が示唆された。ランダウアーの消去原理の適用は、操舵作業中にこの抜け穴を閉じる唯一の方法は、参加者による局所環境の熱変動を継続的に監視することであることを示している。我々は、超伝導量子コンピュータのような現在のプログラマブル量子プロセッサで実証できる、マックスウェルのデモンアシスト型EPRステアリングの量子回路モデルを構築した。この量子回路モデルに基づいて、デーモンの作用によるエネルギー散逸と量子非局所性相関の関係を記述する定量的な式を得る。この結果は、量子非局所性、情報、熱力学の関係を探索し理解する新しい方法を提供するため、非常に物理的に興味深い。 The study of Maxwell demon and quantum entanglement is important because of its foundational significance in physics and its potential applications in quantum information. Previous research on the Maxwell demon has primarily focused on thermodynamics, taking into account quantum correlations. Here we consider from another perspective and ask whether quantum non-locality correlations can be simulated by performing work. The Maxwell demon-assisted Einstein-Podolsky-Rosen (EPR) steering is thus proposed, which implies a new type of loophole. The application of Landauer's erasure principle suggests that the only way to close this loophole during a steering task is by continuously monitoring the heat fluctuation of the local environment by the participant. We construct a quantum circuit model of Maxwell demon-assisted EPR steering, which can be demonstrated by current programmable quantum processors, such as superconducting quantum computers. Based on this quantum circuit model, we obtain a quantitative formula describing the relationship between energy dissipation due to the work of the demon and quantum non-locality correlation. 
The result is of great physical interest because it provides a new way to explore and understand the relationship between quantum non-locality, information, and thermodynamics.	翻訳日:2024-04-27 00:45:56 公開日:2024-04-24
# 偽のCOVID-19物語が相次ぐ-時相分析 The False COVID-19 Narratives That Keep Being Debunked: A Spatiotemporal Analysis ( http://arxiv.org/abs/2107.12303v3 ) ライセンス: Link先を確認	Iknoor Singh, Kalina Bontcheva, Carolina Scarton,	(参考訳) 新型コロナウイルス(COVID-19)のパンデミックが始まり、世界のインフォデミックは市民、メディア、ファクトチェッカーに前例のない挑戦をもたらした。この課題に対処するため、世界中の100以上のファクトチェックイニシアチブが、彼らの国の情報空間を監視し、新型コロナウイルス(COVID-19)の物語を定期的に公開してきた。本研究では、さまざまなファクトチェック組織によって複数の言語で公開された新型コロナウイルスに関連する10,381件の文書を含むCoronaVirusFacts Allianceのデータベースを調査した。我々の時空間分析では、類似またはほぼ重複した偽の新型コロナウイルスの物語が、様々な国の様々なソーシャルメディアプラットフォームで拡散していることが明らかとなり、時にはその物語の最初の一節が国際ファクトチェックネットワーク(IFCN)のファクトチェッカーによって公表されてから数ヶ月も経っている。また、一般的な医療アドバイスを含む誤報が複数の国に広まっていることもわかりました。さらに、手動のファクトチェックはそれ自体が厄介な作業であるため、異なる国で同じ物語を繰り返す必要性は、時間とともに、ファクトチェックリソースのかなりの無駄に導かれる。この目的のために我々は,ファクトチェックパイプラインに多言語デバンク検索ツールを組み込むことを提案し,また,不足するファクトチェックリソースを最大限に活用するために,ソーシャルメディアプラットフォームが大規模に同じ技術を採用する必要があることを強く推奨する。 The onset of the COVID-19 pandemic led to a global infodemic that has brought unprecedented challenges for citizens, media, and fact-checkers worldwide. To address this challenge, over a hundred fact-checking initiatives worldwide have been monitoring the information space in their countries and publishing regular debunks of viral false COVID-19 narratives. This study examines the database of the CoronaVirusFacts Alliance, which contains 10,381 debunks related to COVID-19 published in multiple languages by different fact-checking organisations. Our spatiotemporal analysis reveals that similar or nearly duplicate false COVID-19 narratives have been spreading in multiple modalities and on various social media platforms in different countries, sometimes as much as several months after the first debunk of that narrative has been published by an International Fact-checking Network (IFCN) fact-checker. We also find that misinformation involving general medical advice has spread across multiple countries and hence has the highest proportion of false COVID-19 narratives that keep being debunked. 
Furthermore, as manual fact-checking is an onerous task in itself, the need to repeatedly debunk the same narrative in different countries is leading, over time, to a significant waste of fact-checker resources. To this end, we propose including a multilingual debunk search tool in the fact-checking pipeline, and we strongly recommend that social media platforms adopt the same technology at scale, so as to make the best use of scarce fact-checker resources.	翻訳日:2024-04-27 00:45:56 公開日:2024-04-24
# 単眼3次元物体検出のための投影モデルによる幾何学誘導深度学習 Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection ( http://arxiv.org/abs/2107.13931v2 ) ライセンス: Link先を確認	Yinmin Zhang, Xinzhu Ma, Shuai Yi, Jun Hou, Zhihui Wang, Wanli Ouyang, Dan Xu,	(参考訳) 自動運転の重要な課題として、近年3Dオブジェクト検出は大きな進歩を遂げている。しかし, 深度推定における不満足な性能のため, 単分子3次元物体検出は依然として困難な問題である。既存のモノクラー法は、通常、シーンの深さを直接回帰するが、深さと様々な幾何学的要素(例えば、境界箱のサイズ、3Dオブジェクトの寸法、オブジェクトのポーズ)の間の重要な関係を無視している。本稿では,投影モデルを用いて幾何学誘導深度推定を学習し,モノクル3次元物体検出を推し進めることを提案する。具体的には,モノクロ3次元物体検出ネットワークにおける2次元および3次元深度予測の投影モデルを用いた原理的幾何式を考案した。さらに,提案式の実装と組込みにより,幾何を考慮した深部表現学習が可能となり,深部推定の促進に有効な2次元および3次元インタラクションが可能となった。さらに,2次元アノテーションと投影ボックスの相違に対処し,幾何学式による頑健な学習を確保することで,強力なベースラインを提供する。 KITTIデータセットを用いた実験により, 適度なテスト設定において, 余分なデータを必要としない最先端単分子法の検出性能を2.80%向上することを確認した。モデルとコードは https://github.com/YinminZhang/MonoGeo でリリースされる。 As a crucial task of autonomous driving, 3D object detection has made great progress in recent years. However, monocular 3D object detection remains a challenging problem due to the unsatisfactory performance in depth estimation. Most existing monocular methods typically directly regress the scene depth while ignoring important relationships between the depth and various geometric elements (e.g. bounding box sizes, 3D object dimensions, and object poses). In this paper, we propose to learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection. Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised. We further implement and embed the proposed formula to enable geometry-aware deep representation learning, allowing effective 2D and 3D interactions for boosting the depth estimation. Moreover, we provide a strong baseline through addressing substantial misalignment between 2D annotation and projected boxes to ensure robust learning with the proposed geometric formula.
Experiments on the KITTI dataset show that our method remarkably improves the detection performance of the state-of-the-art monocular-based method without extra data by 2.80% on the moderate test setting. The model and code will be released at https://github.com/YinminZhang/MonoGeo.	翻訳日:2024-04-27 00:45:56 公開日:2024-04-24
# 超伝導ニオブの2レベル系損失源としての一酸化ニオブ中の酸素空孔 Oxygen Vacancies in Niobium Pentoxide as a Source of Two-Level System Losses in Superconducting Niobium ( http://arxiv.org/abs/2108.13352v3 ) ライセンス: Link先を確認	Daniel Bafia, Akshay Murthy, Anna Grassellino, Alexander Romanenko,	(参考訳) 酸化ニオブからなる3次元超伝導無線周波数共振器と2次元トランスモン量子ビットの量子デコヒーレンスの主源を同定した。時空二次イオン質量分析法 (ToF-SIMS) を用いて, バルクNb SRF共振器のRF特性および代表Nb試料の酸化物構造に及ぼすシーケンシャル \textit{in situ} 真空焼成処理の影響を調べたところ, Nb\textsubscript{2}O\textsubscript{5} の空隙発生と酸化物厚みの減少に相関する空洞品質係数$Q_0$の非単調進化が認められた。この効果を酸化膜自体に局在させ, 酸化膜を酸化膜に再成長させることにより, TLS損失の緩和を図り, Nb中での拡散間質酸素の役割を明らかにした。我々は、一酸化炭素中のこれらの空孔が磁気不純物であり、TLSによるRF損失の原因であると仮定する。 We identify a major source of quantum decoherence in three-dimensional superconducting radio-frequency (SRF) resonators and two-dimensional transmon qubits composed of oxidized niobium: oxygen vacancies in the niobium pentoxide which drive two-level system (TLS) losses. By probing the effect of sequential \textit{in situ} vacuum baking treatments on the RF performance of bulk Nb SRF resonators and on the oxide structure of a representative Nb sample using time-of-flight secondary ion mass spectrometry (ToF-SIMS), we find a non-monotonic evolution of cavity quality factor $Q_0$ which correlates with the interplay of Nb\textsubscript{2}O\textsubscript{5} vacancy generation and oxide thickness reduction. We localize this effect to the oxide itself and present the insignificant role of diffused interstitial oxygen in the underlying Nb by regrowing a new oxide \textit{via} wet oxidation which reveals a mitigation of aggravated TLS losses. We hypothesize that such vacancies in the pentoxide serve as magnetic impurities and are a source of TLS-driven RF loss.	翻訳日:2024-04-27 00:45:56 公開日:2024-04-24
# ICDM 2020 Knowledge Graph Contest: Consumer Event-Cause extract ICDM 2020 Knowledge Graph Contest: Consumer Event-Cause Extraction ( http://arxiv.org/abs/2110.15722v2 ) ライセンス: Link先を確認	Congqing He, Jie Zhang, Xiangyu Zhu, Huan Liu, Yukun Huang,	(参考訳) テキスト中の特定のイベントの背後にある潜在的な原因を抽出するタスクであるConsumer Event-Cause extractは、その幅広い応用により近年注目を集めている。 ICDM 2020は、特定の主題(ブランドや製品)でイベントやイベントの原因を抽出することを目的とした評価コンペを開催する。このタスクでは、主にエンドツーエンドモデルの構築方法に注目し、複数のイベントタイプとイベント原因を同時に抽出する。そこで本稿では,イベントタイプやイベント原因を別々に抽出する代わりに,リレーショナルイベント原因抽出タスクを再検討する新たな視点を導入し,新しいシーケンスタギングフレームワークを提案する。実験では,エンコーダモジュールが初期化事前学習されたBERTエンコーダを使用して,新たなタグ付けフレームワークのパワーを示す場合においても,ベースラインメソッドよりも優れた性能を示す。この大会では,私たちのチームが第1ステージのリーダーボードで1位,最終ステージのリーダーボードで3位を獲得しました。 Consumer Event-Cause Extraction, the task aimed at extracting the potential causes behind certain events in the text, has gained much attention in recent years due to its wide applications. The ICDM 2020 conference sets up an evaluation competition that aims to extract events and the causes of the extracted events with a specified subject (a brand or product). In this task, we mainly focus on how to construct an end-to-end model, and extract multiple event types and event-causes simultaneously. To this end, we introduce a fresh perspective to revisit the relational event-cause extraction task and propose a novel sequence tagging framework, instead of extracting event types and events-causes separately. Experiments show our framework outperforms baseline methods even when its encoder module uses an initialized pre-trained BERT encoder, showing the power of the new tagging framework. In this competition, our team achieved 1st place in the first stage leaderboard, and 3rd place in the final stage leaderboard.	翻訳日:2024-04-27 00:45:56 公開日:2024-04-24
# BQPのアクロバティックス The Acrobatics of BQP ( http://arxiv.org/abs/2111.10409v4 ) ライセンス: Link先を確認	Scott Aaronson, DeVon Ingram, William Kretschmer,	(参考訳) ランダム化アルゴリズムが使用するランダム性を修正することができるが、量子性アルゴリズムが使用する量子性を修正するという類似概念は存在しない。この基本的な違いを説明すれば、ブラックボックスの設定では、量子多項式時間($\mathsf{BQP}$)の振舞いは、$\mathsf{NP}$のような古典的な複雑性クラスと著しく分離できることが示される。具体的には、–あるオラクルが存在し、$\mathsf{NP^{BQP}}\not\subset\mathsf{BQP^{PH}}$は、フォーチュウの2005年の問題を解く。圏として、$\mathsf{P}=\mathsf{NP}$であるが、$\mathsf{BQP}\neq\mathsf{QCMA}$であるようなオラクルが存在する。逆に、$\mathsf{BQP^{NP}}\not\subset\mathsf{PH^{BQP}}$であるようなオラクルが存在する。 -ランダムオラクルに対して、$\mathsf{PP}=\mathsf{PostBQP}$は "$\mathsf{QMA}$ hierarchy" $\mathsf{QMA}^{\mathsf{QMA}^{\mathsf{QMA}^{\cdots}}}$には含まれない。 -ランダムオラクルに対して、$\mathsf{\Sigma}_{k+1}^\mathsf{P}\not\subset\mathsf{BQP}^{\mathsf{\Sigma}_{k}^\mathsf{P}}$ for every $k$。オラクルは、$\mathsf{BQP}=\mathsf{P^{\# P}}$ に対して、$\mathsf{PH}$ は無限である。 -その関係は、$\mathsf{P}=\mathsf{NP}\neq\mathsf{BQP}=\mathsf{P^{\# P}}$である。これらの結果を達成するために、Raz と Tal による2018 年のオラクルの業績を $\mathsf{BQP}\not \subset \mathsf{PH}$ と比較し、Forrelation 問題に関する関連する結果に基づける。また、独立した関心を持つかもしれない新しいツールも導入します。ランダム制限法の「量子認識」バージョン、$\mathsf{AC^0}$回路のブロック感度に対する濃度定理、スパースオラクルに対するアーロンソン・アンバイニス射影の(証明可能な)アナログを含む。 One can fix the randomness used by a randomized algorithm, but there is no analogous notion of fixing the quantumness used by a quantum algorithm. Underscoring this fundamental difference, we show that, in the black-box setting, the behavior of quantum polynomial-time ($\mathsf{BQP}$) can be remarkably decoupled from that of classical complexity classes like $\mathsf{NP}$. Specifically: -There exists an oracle relative to which $\mathsf{NP^{BQP}}\not\subset\mathsf{BQP^{PH}}$, resolving a 2005 problem of Fortnow. As a corollary, there exists an oracle relative to which $\mathsf{P}=\mathsf{NP}$ but $\mathsf{BQP}\neq\mathsf{QCMA}$. -Conversely, there exists an oracle relative to which $\mathsf{BQP^{NP}}\not\subset\mathsf{PH^{BQP}}$. 
-Relative to a random oracle, $\mathsf{PP}=\mathsf{PostBQP}$ is not contained in the "$\mathsf{QMA}$ hierarchy" $\mathsf{QMA}^{\mathsf{QMA}^{\mathsf{QMA}^{\cdots}}}$. -Relative to a random oracle, $\mathsf{\Sigma}_{k+1}^\mathsf{P}\not\subset\mathsf{BQP}^{\mathsf{\Sigma}_{k}^\mathsf{P}}$ for every $k$. -There exists an oracle relative to which $\mathsf{BQP}=\mathsf{P^{\# P}}$ and yet $\mathsf{PH}$ is infinite. -There exists an oracle relative to which $\mathsf{P}=\mathsf{NP}\neq\mathsf{BQP}=\mathsf{P^{\# P}}$. To achieve these results, we build on the 2018 achievement by Raz and Tal of an oracle relative to which $\mathsf{BQP}\not \subset \mathsf{PH}$, and associated results about the Forrelation problem. We also introduce new tools that might be of independent interest. These include a "quantum-aware" version of the random restriction method, a concentration theorem for the block sensitivity of $\mathsf{AC^0}$ circuits, and a (provable) analogue of the Aaronson-Ambainis Conjecture for sparse oracles.	翻訳日:2024-04-27 00:45:56 公開日:2024-04-24
# 非一様超グラフ確率ブロックモデルにおける部分回復と弱い整合性 Partial recovery and weak consistency in the non-uniform hypergraph Stochastic Block Model ( http://arxiv.org/abs/2112.11671v3 ) ライセンス: Link先を確認	Ioana Dumitriu, Haixiao Wang, Yizhe Zhu,	(参考訳) 本研究では,非一様ハイパーグラフ確率ブロックモデル(HSBM)に基づくスパース・ランダム・ハイパーグラフにおけるコミュニティ検出問題について考察する。ランダムハイパーグラフが有界次数を持つ場合、少なくとも$\gamma$区切りを正しく分類した頂点を出力するスペクトルアルゴリズムを提供し、$\gamma\in (0.5,1)$はモデルの信号-雑音比(SNR)に依存する。頂点数が無限に近づくにつれてSNRが緩やかに増加すると、我々のアルゴリズムは弱い一貫性を達成し、非一様HSBMに対するGhoshdastidar と Dukkipati (2017) の以前の結果を改善する。スペクトルアルゴリズムは,(1) ハイパーエッジ選択: 誘導されたサブハイパーグラフに対して最大信号-雑音比を提供するために,特定のサイズのハイパーエッジを選択する; (2) スペクトル分割: 正規化された隣接行列を構築し,特異ベクトルに基づいて近似的な分割を得る; (3) 補正とマージ: 隣接テンソルからのハイパーエッジ情報を組み込んでエラー率保証をアップグレードする。本アルゴリズムの理論的解析は,非一様非一様ハイパーグラフに対する隣接行列の濃度と正則化に依存する。 We consider the community detection problem in sparse random hypergraphs under the non-uniform hypergraph stochastic block model (HSBM), a general model of random networks with community structure and higher-order interactions. When the random hypergraph has bounded expected degrees, we provide a spectral algorithm that outputs a partition with at least a $\gamma$ fraction of the vertices classified correctly, where $\gamma\in (0.5,1)$ depends on the signal-to-noise ratio (SNR) of the model. When the SNR grows slowly as the number of vertices goes to infinity, our algorithm achieves weak consistency, which improves the previous results in Ghoshdastidar and Dukkipati (2017) for non-uniform HSBMs. Our spectral algorithm consists of three major steps: (1) Hyperedge selection: select hyperedges of certain sizes to provide the maximal signal-to-noise ratio for the induced sub-hypergraph; (2) Spectral partition: construct a regularized adjacency matrix and obtain an approximate partition based on singular vectors; (3) Correction and merging: incorporate the hyperedge information from adjacency tensors to upgrade the error rate guarantee. 
The theoretical analysis of our algorithm relies on the concentration and regularization of the adjacency matrix for sparse non-uniform random hypergraphs, which can be of independent interest.	翻訳日:2024-04-27 00:45:56 公開日:2024-04-24
# FIRST:FrontrunnIngのレジリエントなスマートコントラクト FIRST: FrontrunnIng Resilient Smart ConTracts ( http://arxiv.org/abs/2204.00955v3 ) ライセンス: Link先を確認	Emrah Sariboz, Gaurav Panwar, Roopa Vishwanathan, Satyajayant Misra,	(参考訳) 暗号通貨の使用量の増加により、貸し出し、借り入れ、マージン取引などの従来の金融応用を暗号通貨の世界に広く浸透させてきた。一部のケースでは、本質的に透明で規制されていない暗号通貨が、これらのアプリケーションのユーザを攻撃します。悪意のあるエンティティは、現在処理されていない金融トランザクションの知識を活用し、未処理のトランザクションの前に独自のトランザクションを実行しようとする。この結果、財務的損失、不正確なトランザクション、さらにはより多くの攻撃にさらされる可能性がある。本稿では、最前線攻撃を防ぐフレームワークであるFIRSTを提案し、検証遅延関数やアグリゲートシグネチャを含む暗号プロトコルを用いて構築する。我々の設計では、VDFの公開パラメータを生成するためのフェデレートされたセットアップがあり、単一の信頼できるセットアップの必要性を排除しています。我々は、FIRSTを正式に分析し、Universal Composabilityフレームワークを用いてセキュリティを証明し、FIRSTの有効性を実験的に実証する。 Owing to the meteoric rise in the usage of cryptocurrencies, there has been a widespread adaptation of traditional financial applications such as lending, borrowing, margin trading, and more, to the cryptocurrency realm. In some cases, the inherently transparent and unregulated nature of cryptocurrencies leads to attacks on users of these applications. One such attack is frontrunning, where a malicious entity leverages the knowledge of currently unprocessed financial transactions submitted by users and attempts to get its own transaction(s) executed ahead of the unprocessed ones. The consequences of this can be financial loss, inaccurate transactions, and even exposure to more attacks. We propose FIRST, a framework that prevents frontrunning attacks, and is built using cryptographic protocols including verifiable delay functions and aggregate signatures. In our design, we have a federated setup for generating the public parameters of the VDF, thus removing the need for a single trusted setup. We formally analyze FIRST, prove its security using the Universal Composability framework and experimentally demonstrate the effectiveness of FIRST.	翻訳日:2024-04-27 00:37:16 公開日:2024-04-24
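The abstract above does not say which verifiable delay function (VDF) FIRST builds on. Purely as an illustration of the general idea, the sketch below uses repeated modular squaring (the classic time-lock construction) with toy parameters. The function names, the tiny modulus, and the trapdoor check are assumptions made for this example and are not taken from the paper; a real VDF would ship a succinct proof instead of relying on the factorization.

```python
from math import gcd


def vdf_eval(x: int, t: int, n: int) -> int:
    """Compute x^(2^t) mod n with t sequential squarings (the deliberately slow path)."""
    y = x % n
    for _ in range(t):
        y = (y * y) % n
    return y


def trapdoor_eval(x: int, t: int, p: int, q: int) -> int:
    """Shortcut for a party that knows the factors p and q of n (toy check only)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    # Euler's theorem lets us reduce the exponent only when x is coprime to n.
    assert gcd(x, n) == 1, "x must be coprime to n"
    exponent = pow(2, t, phi)
    return pow(x, exponent, n)


# Hypothetical toy parameters; real deployments use moduli of roughly 2048 bits.
P, Q = 1009, 1013
N = P * Q
delayed = vdf_eval(5, 10_000, N)
```

In an RSA-style construction like this one, whoever knows `P` and `Q` can skip the delay entirely, which is one reason the way public parameters are generated matters for a design such as the federated setup the abstract mentions.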
# 劣化適応を用いた3次元MRI超解像の教師なし表現学習 Unsupervised Representation Learning for 3D MRI Super Resolution with Degradation Adaptation ( http://arxiv.org/abs/2205.06891v5 ) ライセンス: Link先を確認	Jianan Liu, Hao Li, Tao Huang, Euijoon Ahn, Kang Han, Adeel Razi, Wei Xiang, Jinman Kim, David Dagan Feng,	(参考訳) 高分解能(HR)磁気共鳴イメージングは、診断や画像誘導治療において医師を支援する上で重要である。しかし、HR画像の取得には時間と費用がかかる。その結果、低分解能(LR)画像から超解像(SR)画像を生成するための有望な解決策として、ディープラーニングに基づく超解像再構成(SRR)が登場した。残念なことに、そのようなニューラルネットワークのトレーニングには、画像取得中と画像取得間の患者の動きのために取得が困難である、整列したHRとLRイメージペアが必要である。硬組織の硬い動きは画像登録によって補正できるが、変形した軟組織の整列は複雑であり、真正なHRとLRイメージペアでニューラルネットワークを訓練することは不可能である。従来の研究では、真正なHR画像とダウンサンプリングされた合成LR画像を用いてSRRに焦点を当ててきた。しかし,合成LR画像と真性LR画像の劣化表現の違いは,真性LR画像から再構成したSR画像の品質を抑制する。この問題に対処するため,我々は,Unsupervised Degradation Adaptation Network (UDEAN)を提案する。我々のネットワークは劣化学習ネットワークとSRRネットワークで構成されている。劣化学習ネットワークは、不整合または不整合LR画像から学習した劣化表現を用いてHR画像をダウンサンプリングする。 SRRネットワークは、ダウンサンプリングされたHR画像から元の画像へのマッピングを学習する。実験の結果,本手法は最先端ネットワークよりも優れており,臨床現場での課題に対して有望な解決法であることがわかった。 High-resolution (HR) magnetic resonance imaging is critical in aiding doctors in their diagnoses and image-guided treatments. However, acquiring HR images can be time-consuming and costly. Consequently, deep learning-based super-resolution reconstruction (SRR) has emerged as a promising solution for generating super-resolution (SR) images from low-resolution (LR) images. Unfortunately, training such neural networks requires aligned authentic HR and LR image pairs, which are challenging to obtain due to patient movements during and between image acquisitions. While rigid movements of hard tissues can be corrected with image registration, aligning deformed soft tissues is complex, making it impractical to train neural networks with authentic HR and LR image pairs. Previous studies have focused on SRR using authentic HR images and down-sampled synthetic LR images. 
However, the difference in degradation representations between synthetic and authentic LR images suppresses the quality of SR images reconstructed from authentic LR images. To address this issue, we propose a novel Unsupervised Degradation Adaptation Network (UDEAN). Our network consists of a degradation learning network and an SRR network. The degradation learning network downsamples the HR images using the degradation representation learned from the misaligned or unpaired LR images. The SRR network then learns the mapping from the down-sampled HR images to the original ones. Experimental results show that our method outperforms state-of-the-art networks and is a promising solution to the challenges in clinical settings.	翻訳日:2024-04-27 00:37:16 公開日:2024-04-24
# 教師なし時系列異常検出のための校正一級分類 Calibrated One-class Classification for Unsupervised Time Series Anomaly Detection ( http://arxiv.org/abs/2207.12201v2 ) ライセンス: Link先を確認	Hongzuo Xu, Yijie Wang, Songlei Jian, Qing Liao, Yongjun Wang, Guansong Pang,	(参考訳) 時系列異常検出は、様々な領域におけるシステム可用性を維持するのに重要である。この研究ラインにおける現在の研究は、先進的なニューラルネットワーク構造を考案し、新しい再構築・予測学習目標を考案することによって、データの正規性を深く、包括的に学習することに焦点を当てている。しかし、その一級学習過程は、教師なしパラダイムの下での訓練データ(すなわち異常汚染)の潜伏異常によって誤解されることがある。彼らの学習プロセスは異常に関する知識も欠如している。その結果、バイアスのある不正確な正規性境界をしばしば学習する。これらの問題に対処するために,不確実性モデルに基づく校正とネイティブな異常に基づく校正による汚染耐性,データ正規性の異常情報学習を実現した,異常検出のための校正一級分類を提案する。具体的には、最適化中に不規則なサンプルを不規則に抑えるための不確実な予測を適応的に適用し、同時に正規サンプルに対する確実な予測を奨励し、効果的な正規性学習を確実にする。これにより、異常な汚染による悪影響がほとんど軽減される。また,本手法は時系列異常動作をシミュレートするための摂動による自然異常例も生成する。これらのダミー異常を識別することで、我々の一級学習はさらに校正され、より正確な正規性境界を形成する。 10の実世界のデータセットに対する大規模な実験により、我々のモデルは16の最先端の競合者よりも大幅に改善されていることが示される。 Time series anomaly detection is instrumental in maintaining system availability in various domains. Current work in this research line mainly focuses on learning data normality deeply and comprehensively by devising advanced neural network structures and new reconstruction/prediction learning objectives. However, their one-class learning process can be misled by latent anomalies in training data (i.e., anomaly contamination) under the unsupervised paradigm. Their learning process also lacks knowledge about the anomalies. Consequently, they often learn a biased, inaccurate normality boundary. To tackle these problems, this paper proposes calibrated one-class classification for anomaly detection, realizing contamination-tolerant, anomaly-informed learning of data normality via uncertainty modeling-based calibration and native anomaly-based calibration. Specifically, our approach adaptively penalizes uncertain predictions to restrain irregular samples in anomaly contamination during optimization, while simultaneously encouraging confident predictions on regular samples to ensure effective normality learning. 
This largely alleviates the negative impact of anomaly contamination. Our approach also creates native anomaly examples via perturbation to simulate time series abnormal behaviors. Through discriminating these dummy anomalies, our one-class learning is further calibrated to form a more precise normality boundary. Extensive experiments on ten real-world datasets show that our model achieves substantial improvement over sixteen state-of-the-art contenders.	翻訳日:2024-04-27 00:37:16 公開日:2024-04-24
# 信頼できない教師からの正直な学生:事前学習された言語モデルから解釈可能な質問答えパイプラインを学習する Honest Students from Untrusted Teachers: Learning an Interpretable Question-Answering Pipeline from a Pretrained Language Model ( http://arxiv.org/abs/2210.02498v3 ) ライセンス: Link先を確認	Jacob Eisenstein, Daniel Andor, Bernd Bohnet, Michael Collins, David Mimno,	(参考訳) 説明可能な質問応答システムは、正確な回答だけでなく、推論を正当化し、人間が作業を確認するための合理的な根拠も生み出すべきである。しかし、どんな理屈が役に立つのか、どうやってシステムをトレーニングして生産できるのか? 本稿では,オープンブックの質問応答に対する新たな論理的手法である「markup-and-mask」を提案する。マークアップフェーズでは、節は自由テキストのマークアップで拡張され、各文は談話コンテキストの外側で独立して立つことができる。マスキングフェーズでは、マークアップ通路のサブスパンが選択される。アノテーションを使わずにマークアップ・アンド・マスクの合理性を生成するシステムを訓練するには,文脈内学習を活用する。具体的には,教師として機能する凍結した事前学習言語モデルに一連のプロンプトを送信することで,銀アノテートデータを生成する。次に、正しい答えをもたらす有理数の部分集合をトレーニングすることで、より小さな学生モデルを微調整する。生徒は、それがパイプラインであるという意味では「正直」であり、理性は通路と答えの間のボトルネックとして機能し、「信頼できない」教師はそのような制約を受けない。したがって、エンドタスクアノテーションとフリーズされた事前訓練された言語モデルを組み合わせて、信頼できるパイプラインシステムを構築する新しい方法を提供する。 Explainable question answering systems should produce not only accurate answers but also rationales that justify their reasoning and allow humans to check their work. But what sorts of rationales are useful and how can we train systems to produce them? We propose a new style of rationale for open-book question answering, called markup-and-mask, which combines aspects of extractive and free-text explanations. In the markup phase, the passage is augmented with free-text markup that enables each sentence to stand on its own outside the discourse context. In the masking phase, a sub-span of the marked-up passage is selected. To train a system to produce markup-and-mask rationales without annotations, we leverage in-context learning. Specifically, we generate silver annotated data by sending a series of prompts to a frozen pretrained language model, which acts as a teacher. We then fine-tune a smaller student model by training on the subset of rationales that led to correct answers. 
The student is "honest" in the sense that it is a pipeline: the rationale acts as a bottleneck between the passage and the answer, while the "untrusted" teacher operates under no such constraints. Thus, we offer a new way to build trustworthy pipeline systems from a combination of end-task annotations and frozen pretrained language models.	翻訳日:2024-04-27 00:37:16 公開日:2024-04-24
# スケールにおけるノイズ・ロバストデ複製 Noise-Robust De-Duplication at Scale ( http://arxiv.org/abs/2210.04261v2 ) ライセンス: Link先を確認	Emily Silcock, Luca D'Amico-Wong, Jinglin Yang, Melissa Dell,	(参考訳) 大規模でノイズの多いテキストコーパス内の重複のほぼ特定には、トレーニングデータセットの非重複化、プライバシーリスクの低減、テストセットリークの評価、大規模なコーパス内の再生されたニュース記事や文学の特定など、数多くのアプリケーションがある。これらの多様なアプリケーションの中で、圧倒的な作業はN-gramに依存している。 N-gram法がいかにうまく機能するかを評価するための限定的な努力がなされているが、その理由の一部は、大規模なコーパスに対して、どのように偏りのない評価データセットを作成できるかがはっきりしないためである。本研究は,27,210個の文書データセットと122,876個の正の重複ペアを作成し,ノイズ・ロバスト重複の除去について検討する。ニュースのタイムセンシティブさは、コーパスの全体サイズが大きいにも関わらず、短い日付範囲内で重複が発生するため、包括的ハンドラベリングを可能にする。この研究は、ハッシュとN-gramオーバーラップ(文学において支配的な)、対照的に訓練されたバイエンコーダ、およびバイエンコーダとクロスエンコーダを組み合わせたリランクスタイルアプローチなど、様々な非複製手法を開発し、評価する。神経アプローチはハッシュとN-gramの重なりを著しく上回る。バイエンコーダのスケールは良好で、1つのGPUカードに1000万記事のコーパスを数時間で非重複化する。また、トレーニング済みのモデルをRealNewsやC4(Colossal Clean Crawled Corpus)の特許部分に適用し、ニューラルアプローチは、様々な種類のノイズの存在下で、ハッシュによって欠落した多くのほぼ重複を識別できることを示した。 NEWS-COPYの非重複データセット、コードベース、事前訓練されたモデルのパブリックリリースは、さらなる研究と応用を促進するでしょう。 Identifying near duplicates within large, noisy text corpora has a myriad of applications that range from de-duplicating training datasets, reducing privacy risk, and evaluating test set leakage, to identifying reproduced news articles and literature within large corpora. Across these diverse applications, the overwhelming majority of work relies on N-grams. Limited efforts have been made to evaluate how well N-gram methods perform, in part because it is unclear how one could create an unbiased evaluation dataset for a massive corpus. This study uses the unique timeliness of historical news wires to create a 27,210 document dataset, with 122,876 positive duplicate pairs, for studying noise-robust de-duplication. The time-sensitivity of news makes comprehensive hand labelling feasible - despite the massive overall size of the corpus - as duplicates occur within a narrow date range. 
The study then develops and evaluates a range of de-duplication methods: hashing and N-gram overlap (which predominate in the literature), a contrastively trained bi-encoder, and a re-rank style approach combining a bi- and cross-encoder. The neural approaches significantly outperform hashing and N-gram overlap. We show that the bi-encoder scales well, de-duplicating a 10 million article corpus on a single GPU card in a matter of hours. We also apply our pre-trained model to the RealNews and patent portions of C4 (Colossal Clean Crawled Corpus), illustrating that a neural approach can identify many near duplicates missed by hashing, in the presence of various types of noise. The public release of our NEWS-COPY de-duplication dataset, codebase, and the pre-trained models will facilitate further research and applications.	翻訳日:2024-04-27 00:37:16 公開日:2024-04-24
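To make the hashing and N-gram overlap baselines named above concrete, here is a minimal sketch of exact hashing and character N-gram Jaccard overlap. It is not the NEWS-COPY codebase: the function names, the 5-character N-gram size, the 0.5 threshold, and the example strings are all assumptions chosen for illustration.

```python
import hashlib


def exact_hash(text: str) -> str:
    """Exact-match fingerprint: any single-character difference changes the digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def char_ngrams(text: str, n: int = 5) -> set[str]:
    """Set of character n-grams of the lower-cased, whitespace-normalised text."""
    norm = " ".join(text.lower().split())
    return {norm[i:i + n] for i in range(len(norm) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b| (defined as 1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs: list[str], threshold: float = 0.5, n: int = 5) -> list[tuple[int, int]]:
    """Return index pairs whose n-gram Jaccard similarity reaches the threshold."""
    grams = [char_ngrams(d, n) for d in docs]
    pairs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if jaccard(grams[i], grams[j]) >= threshold:
                pairs.append((i, j))
    return pairs


clean = "the senate passed the farm bill on tuesday"
noisy = "the senate passed the farm bi11 on tuesday"  # OCR-style error: "ll" read as "11"
other = "zzzz qqqq xxxx"
pairs = near_duplicates([clean, noisy, other], threshold=0.5)
```

The all-pairs loop is quadratic and only suits small examples. Note also that a single corrupted word already removes several shared N-grams, so heavier noise can push true duplicates below any fixed threshold, which is the kind of failure the abstract attributes to these baselines.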
# 複数部品の絡み合い生成へのショートカット:ボソン減算へのグラフアプローチ Shortcut to Multipartite Entanglement Generation: A Graph Approach to Boson Subtractions ( http://arxiv.org/abs/2211.04042v5 ) ライセンス: Link先を確認	Seungbeom Chin, Yong-Su Kim, Marcin Karczewski,	(参考訳) 本稿では,線形ボソニック系における多部交絡を生成するスキームを体系的に探索するグラフ手法を提案する。階層型エンタングルメント生成は、ポストセレクトされたタスクよりも量子タスクに対する許容可能なスキームを提供するが、一般的にはマルチパーティイトシステムのための適切な回路を見つけることは困難である。ボソンサブトラクションからのグラフマッピングは,回路設計の限界を克服するための便利な手法であることを示す。本稿では,グラフ手法の実装を通じて限界を緩和する実践的戦略を提案する。我々の物理的な構成は彫刻プロトコルに基づいており、これは1つのボソンの空間的に重なり合ったサブトラクションを1つのボソンのフォック状態に変換するものである。キュービットN-パーティイトGHZおよびW状態の一般的なスキームを特定し、従来のスキームよりもはるかに効率的である。さらに、$N=3$ GHZ と W の絡み合った状態の重ね合わせを生成するためのスキームは、より一般化された絡み合った状態の形式を導出するために我々のアプローチを拡張することができることを示している。さらに,従来の提案よりもかなり少ない粒子を必要とするN-パーティイトGHZ状態生成方式が発見された。これらの結果は,厳密な密接な絡み合った状態を生成するための最適化された解を発見する上で,我々のアプローチの力を示すものである。概念実証として,ベル状態生成のための線形光学スキームを提案する。我々は本手法が多様な絡み合いを生み出す上で有望なツールになることを期待している。 We propose a graph method for systematically searching for schemes that can generate multipartite entanglement in linear bosonic systems with heralding. While heralded entanglement generation offers more tolerable schemes for quantum tasks than postselected ones, it is generally more challenging to find appropriate circuits for multipartite systems. We show that our graph mapping from boson subtractions provides handy tactics to overcome the limitations in circuit designs. We present a practical strategy to mitigate the limitation through the implementation of our graph technique. Our physical setup is based on the sculpting protocol, which utilizes $N$ spatially overlapped subtractions of single bosons to convert Fock states of evenly distributed bosons into entanglement. We have identified general schemes for qubit N-partite GHZ and W states, which are significantly more efficient than previous schemes. 
In addition, our scheme for generating the superposition of $N=3$ GHZ and W entangled states illustrates that our approach can be extended to derive more generalized forms of entangled states. Furthermore, we have found an N-partite GHZ state generation scheme for qudits, which requires substantially fewer particles than previous proposals. These results demonstrate the power of our approach in discovering optimized solutions for the generation of intricate heralded entangled states. As a proof of concept, we propose a linear optical scheme for the generation of the Bell state by heralding detections. We expect our method to serve as a promising tool in generating diverse entanglement.	翻訳日:2024-04-27 00:37:16 公開日:2024-04-24
# 低精度環境下でのリプシッツ連続損失関数に対するSGDの変動 Variants of SGD for Lipschitz Continuous Loss Functions in Low-Precision Environments ( http://arxiv.org/abs/2211.04655v7 ) ライセンス: Link先を確認	Michael R. Metel,	(参考訳) この研究は、低精度算術環境でのニューラルネットワークトレーニングによって動機付けられ、適応的なステップサイズと計算誤差を用いたSGDの変種収束について研究する。一般確率的リプシッツ連続損失関数を考えると、クラーク定常点への漸近収束と近似定常点への非漸近収束が証明される。損失関数の確率勾配の近似のみを計算し、SGDステップ自体の誤差を計算できると仮定する。 SGDの異なる変種を経験的にテストし、2つの画像認識タスクに対してSGDと比較してテストセットの精度が改善された。 Motivated by neural network training in low-precision arithmetic environments, this work studies the convergence of variants of SGD using adaptive step sizes with computational error. Considering a general stochastic Lipschitz continuous loss function, an asymptotic convergence result to a Clarke stationary point is proven as well as the non-asymptotic convergence to an approximate stationary point. It is assumed that only an approximation of the loss function's stochastic gradient can be computed in addition to error in computing the SGD step itself. Different variants of SGD are tested empirically, where improved test set accuracy is observed compared to SGD for two image recognition tasks.	翻訳日:2024-04-27 00:37:16 公開日:2024-04-24
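The setting described above has two error sources: an inexact stochastic gradient and rounding in the update itself. The toy sketch below shows both on a one-dimensional Lipschitz function. It is not one of the paper's SGD variants: the fixed-point `quantize` helper, the test function f(x) = |x - 1|, the uniform noise, and the plain diminishing step (used here in place of the adaptive step sizes the paper studies) are all assumptions for illustration.

```python
import random


def quantize(v: float, frac_bits: int = 6) -> float:
    """Round v to the nearest multiple of 2**-frac_bits (a toy fixed-point format)."""
    scale = 1 << frac_bits
    return round(v * scale) / scale


def noisy_subgradient(x: float, rng: random.Random) -> float:
    """Subgradient of f(x) = |x - 1| plus zero-mean uniform noise."""
    g = (x > 1.0) - (x < 1.0)  # evaluates to -1, 0, or +1
    return g + rng.uniform(-0.5, 0.5)


def low_precision_sgd(x0: float, steps: int, frac_bits: int = 6, seed: int = 0) -> float:
    """SGD where both the gradient and the updated iterate are rounded to the grid."""
    rng = random.Random(seed)
    x = quantize(x0, frac_bits)
    for k in range(steps):
        g = quantize(noisy_subgradient(x, rng), frac_bits)  # error in the gradient
        step = 1.0 / (k + 1) ** 0.5                          # diminishing step size
        x = quantize(x - step * g, frac_bits)                # error in the SGD step
    return x


x_final = low_precision_sgd(0.0, 2000)
```

With many more iterations the product `step * g` can drop below half a grid spacing, at which point the rounded update no longer changes `x`. That stalling is a simple example of why step-size choice matters in low precision.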
# デュアルラベル分布を用いた軽量顔面運動性予測 Lightweight Facial Attractiveness Prediction Using Dual Label Distribution ( http://arxiv.org/abs/2212.01742v2 ) ライセンス: Link先を確認	Shu Liu, Enquan Huang, Ziyu Zhou, Yan Xu, Xiaoyan Kui, Tao Lei, Hongying Meng,	(参考訳) 顔の魅力予測(FAP)は、人間の美的知覚に基づいて顔の魅力を自動的に評価することを目的としている。ディープ畳み込みニューラルネットワークを用いた従来の手法では性能が向上したが、大規模なモデルでは柔軟性が欠如している。さらに、ほとんどのメソッドはデータセットを完全に活用することができません。本稿では,デュアルラベル分布と軽量設計を統合した新しいエンドツーエンドFAP手法を提案する。手動のレーティング、魅力スコア、標準偏差を明示的に集計して、2ラベルの分布を構築し、魅力分布や評価分布を含むデータセットを最大限に活用する。このような分布は,ラベル分散学習(LDL)パラダイムに基づく共同学習フレームワークで最適化される。データ処理は軽量な設計では最小限に単純化され、MobileNetV2がバックボーンとして選択されます。 2つのベンチマークデータセットで大規模な実験を行い、提案手法は有望な結果を達成し、性能と効率のバランスをとることに成功した。アブレーション研究は、繊細に設計された学習モジュールが必須であり、相関していることを示している。さらに, この手法は, 顔の魅力を知覚し, 魅力ある顔領域を捉え, 意味的予測を容易にすることが示唆された。コードはhttps://github.com/enquan/2D_FAPで公開されている。 Facial attractiveness prediction (FAP) aims to assess facial attractiveness automatically based on human aesthetic perception. Previous methods using deep convolutional neural networks have improved the performance, but their large-scale models have led to a deficiency in flexibility. In addition, most methods fail to take full advantage of the dataset. In this paper, we present a novel end-to-end FAP approach that integrates dual label distribution and lightweight design. The manual ratings, attractiveness score, and standard deviation are aggregated explicitly to construct a dual-label distribution to make the best use of the dataset, including the attractiveness distribution and the rating distribution. Such distributions, as well as the attractiveness score, are optimized under a joint learning framework based on the label distribution learning (LDL) paradigm. The data processing is simplified to a minimum for a lightweight design, and MobileNetV2 is selected as our backbone. Extensive experiments are conducted on two benchmark datasets, where our approach achieves promising results and succeeds in balancing performance and efficiency. 
Ablation studies demonstrate that our delicately designed learning modules are indispensable and correlated. Additionally, the visualization indicates that our approach can perceive facial attractiveness and capture attractive facial regions to facilitate semantic predictions. The code is available at https://github.com/enquan/2D_FAP.	翻訳日:2024-04-27 00:37:16 公開日:2024-04-24
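The abstract describes building two label distributions per face from the manual ratings, the mean attractiveness score, and its standard deviation. The sketch below shows one plausible way to construct such targets. The Gaussian shape, the 1 to 5 rating scale, the bin count, and the function names are assumptions for illustration; the abstract does not specify them, and the authors' repository may differ.

```python
import math
from collections import Counter


def rating_distribution(ratings: list[int], levels: tuple[int, ...] = (1, 2, 3, 4, 5)) -> list[float]:
    """Empirical distribution of discrete manual ratings over the rating levels."""
    counts = Counter(ratings)
    total = len(ratings)
    return [counts[level] / total for level in levels]


def attractiveness_distribution(mean: float, std: float, lo: float = 1.0,
                                hi: float = 5.0, bins: int = 40) -> list[float]:
    """Discretised Gaussian centred on the mean score, normalised to sum to 1."""
    width = (hi - lo) / bins
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    weights = [math.exp(-0.5 * ((c - mean) / std) ** 2) for c in centers]
    total = sum(weights)
    return [w / total for w in weights]


rating_target = rating_distribution([3, 4, 4, 5])
score_target = attractiveness_distribution(mean=3.05, std=0.5)
```

Both lists are probability vectors, so a label distribution learning setup could compare them with a network's softmax outputs using a divergence such as KL. That pairing is likewise an assumption here rather than a detail taken from the abstract.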
# 複雑なネットワーク力学のストレッチと計測による神経予測 Stretched and measured neural predictions of complex network dynamics ( http://arxiv.org/abs/2301.04900v4 ) ライセンス: Link先を確認	Vaiva Vasiliauskaite, Nino Antulov-Fantulin,	(参考訳) 微分方程式は、物理的システムから複雑なシステムまで、多くのエージェントが非自明な位相的特徴を持つグラフを通して相互作用する、力学を研究するユビキタスなツールである。微分方程式のデータ駆動近似は、特に明示的な第一原理を欠いた複雑なシステムにおいて、力学系のモデルを明らかにする従来の方法に代わる有望な方法を示す。最近、ダイナミックスを研究する機械学習ツールとしてニューラルネットワークが採用されている。これは、データ駆動型ソリューションの検出や微分方程式の発見に使用できる。特に後者のタスクでは、観測されていない状態空間領域や新しいグラフのダイナミクスを予測するような、未知の設定でディープラーニングモデルをデプロイすることは、急激な結果をもたらす可能性がある。グラフを通して結合された一階微分方程式の系で力学を記述する複雑なシステムに着目し、従来の統計的学習理論の限界を超えてモデルの一般化可能性を拡張することは可能であることを示す。しかし、この高度な一般化を実現するためには、ニューラルネットワークモデルが力学モデルに関する基本的な仮定に従う必要がある。さらに、推論中の予測品質を評価するための統計的意義テストを提案し、その予測においてニューラルネットワークの信頼性レベルを識別できるようにする。 Differential equations are a ubiquitous tool to study dynamics, ranging from physical systems to complex systems, where a large number of agents interact through a graph with non-trivial topological features. Data-driven approximations of differential equations present a promising alternative to traditional methods for uncovering a model of dynamical systems, especially in complex systems that lack explicit first principles. A recently employed machine learning tool for studying dynamics is neural networks, which can be used for data-driven solution finding or discovery of differential equations. Specifically for the latter task, however, deploying deep learning models in unfamiliar settings - such as predicting dynamics in unobserved state space regions or on novel graphs - can lead to spurious results. Focusing on complex systems whose dynamics are described with a system of first-order differential equations coupled through a graph, we show that extending the model's generalizability beyond traditional statistical learning theory limits is feasible. However, achieving this advanced level of generalization requires neural network models to conform to fundamental assumptions about the dynamical model. 
Additionally, we propose a statistical significance test to assess prediction quality during inference, enabling the identification of a neural network's confidence level in its predictions.	翻訳日:2024-04-27 00:27:30 公開日:2024-04-24
# 加速電子からのウンルー放射の測定 Measuring Unruh radiation from accelerated electrons ( http://arxiv.org/abs/2301.06772v5 ) ライセンス: Link先を確認	Gianluca Gregori, Giacomo Marocco, Subir Sarkar, Robert Bingham, Charles Wang,	(参考訳) 加速された電子から熱的ウンルー放射を検出することは、技術的な困難だけでなく、実験室で実際に見られるものに関する概念的明瞭さが欠如しているため、非常に難しい課題となっている。我々は、アンルー効果と2レベル原子系の放射の類似性に基づく、より単純なヒューリスティックな記述とともに、現在の解釈の要約を述べる。加速電子から熱光子の放出があるかどうかを検証する実験を提案する。 Detecting thermal Unruh radiation from accelerated electrons has presented a formidable challenge due not only to technical difficulties but also for lack of conceptual clarity about what is actually seen by a laboratory observer. We give a summary of the current interpretations along with a simpler heuristic description that draws on the analogy between the Unruh effect and radiation from a two-level atomic system. We propose an experiment to test whether there is emission of thermal photons from an accelerated electron.	翻訳日:2024-04-27 00:27:30 公開日:2024-04-24
# 合成負データを用いたハイブリッドオープンセットセグメンテーション Hybrid Open-set Segmentation with Synthetic Negative Data ( http://arxiv.org/abs/2301.08555v3 ) ライセンス: Link先を確認	Matej Grcić, Siniša Šegvić,	(参考訳) 開集合セグメンテーションは、閉集合分類と異常検出を補完することで実現できる。既存の高密度異常検出器の多くは、正規データの生成モデリングや、負のデータに対する識別によって機能する。これらの2つのアプローチは、異なる目的を最適化し、異なる障害モードを示す。そこで本研究では,生成的および識別的手がかりを融合させる新しい異常スコアを提案する。我々のスコアは、データセット後部および非正規化データの密度の高い推定値を持つ任意のクローズドセットセグメンテーションモデルをアップグレードすることで実現できる。結果として得られる密集したハイブリッドなオープンセットモデルには、負のトレーニングイメージが必要で、これは正の負のデータセットから、共同で訓練された生成モデルから、あるいは両方のソースの混合からサンプリングすることができる。我々は,高密度異常検出と開集合セグメンテーションのためのベンチマークへのコントリビューションを評価した。この実験は、計算オーバーヘッドが無視できないにもかかわらず、強力なオープンセット性能を示す。 Open-set segmentation can be conceived by complementing closed-set classification with anomaly detection. Many of the existing dense anomaly detectors operate through generative modelling of regular data or by discriminating with respect to negative data. These two approaches optimize different objectives and therefore exhibit different failure modes. Consequently, we propose a novel anomaly score that fuses generative and discriminative cues. Our score can be implemented by upgrading any closed-set segmentation model with dense estimates of dataset posterior and unnormalized data likelihood. The resulting dense hybrid open-set models require negative training images that can be sampled from an auxiliary negative dataset, from a jointly trained generative model, or from a mixture of both sources. We evaluate our contributions on benchmarks for dense anomaly detection and open-set segmentation. The experiments reveal strong open-set performance in spite of negligible computational overhead.	翻訳日:2024-04-27 00:27:30 公開日:2024-04-24
# リモートセンシング画像を用いた自己教師型学習のためのグローバル・ローカル・ビューアライメントの拡張 Extending global-local view alignment for self-supervised learning with remote sensing imagery ( http://arxiv.org/abs/2303.06670v2 ) ライセンス: Link先を確認	Xinye Wanyan, Sachith Seneviratne, Shuchang Shen, Michael Kirley,	(参考訳) 多数の高品質なリモートセンシング画像が容易にアクセス可能であるため、手動によるアノテーションの少ない画像のコーパスを利用すると注目が集まる。自己教師付きモデルは、大量のラベルのないデータに対して擬似ラベルを生成するプレテキストタスクを定式化し、訓練のための監督を提供することで、一般的な特徴表現を取得する。従来の研究では、リモートセンシング領域における複数の自己教師付き学習手法が検討されてきたが、自然画像に関する最先端の結果が得られたにもかかわらず、局所的な視点のアライメントに基づくプレテキストタスクは未探索のままである。グローバル・ローカル・ビューアライメントに基づく知識蒸留による効果的な表現学習構造を取り入れたDINOに着想を得て,リモートセンシング画像(SSLRS)を用いた自己教師型学習のための2つのプレテキストタスクを定式化した。これらのタスクを用いて、SSLRSのマルチサイズビューと同様に、正の時間的コントラストの有効性について検討する。我々は,DINOを拡張し,DINO-MCを提案する。DINO-MCは,リモートセンシング画像で観測される物体の大きさの限られた変化を緩和するために,単一の固定サイズではなく,様々な大きさの作物の局所的なビューを使用する。我々の実験は、データセットの10%しか事前トレーニングしていない場合でも、DINO-MCは計算資源を少ないまま、複数のリモートセンシングタスクにおいて既存の最先端SSLRSメソッドと同等かそれ以上の性能を発揮することを示した。すべてのコード、モデル、結果はhttps://github.com/WennyXY/DINO-MCで公開される。 Since a large number of high-quality remote sensing images are readily accessible, exploiting the corpus of images with less manual annotation draws increasing attention. Self-supervised models acquire general feature representations by formulating a pretext task that generates pseudo-labels for massive unlabeled data to provide supervision for training. While prior studies have explored multiple self-supervised learning techniques in the remote sensing domain, pretext tasks based on local-global view alignment remain underexplored, despite achieving state-of-the-art results on natural imagery. Inspired by DINO, which employs an effective representation learning structure with knowledge distillation based on global-local view alignment, we formulate two pretext tasks for self-supervised learning on remote sensing imagery (SSLRS). Using these tasks, we explore the effectiveness of positive temporal contrast as well as multi-sized views on SSLRS. 
We extend DINO and propose DINO-MC, which uses local views of variously sized crops instead of a single fixed size in order to alleviate the limited variation in object size observed in remote sensing imagery. Our experiments demonstrate that even when pre-trained on only 10% of the dataset, DINO-MC performs on par with or better than existing state-of-the-art SSLRS methods on multiple remote sensing tasks, while using fewer computational resources. All code, models, and results are released at https://github.com/WennyXY/DINO-MC.	翻訳日:2024-04-27 00:27:30 公開日:2024-04-24
# LatentForensics:StyleGAN潜伏空間におけるFrugal Deepfake検出に向けて LatentForensics: Towards frugal deepfake detection in the StyleGAN latent space ( http://arxiv.org/abs/2303.17222v2 ) ライセンス: Link先を確認	Matthieu Delmas, Amine Kacete, Stephane Paquelet, Simon Leglaive, Renaud Seguier,	(参考訳) 偽ビデオの分類はここ数年、難しい課題だった。ディープフェイク分類器は、ビデオフレームが改ざんされたかどうかを確実に予測できる。しかしながら、それらのパフォーマンスは、トレーニングに使用されるデータセットと、アナリストの計算能力の両方に結びついている。本稿では,高品質な顔画像で訓練された最先端生成逆数ネットワーク(GAN)の潜時空間で動作するディープフェイク検出手法を提案する。提案手法は、StyleGANの潜在空間の構造を利用して、軽量な二項分類モデルを学ぶ。標準データセットに対する実験結果から,提案手法は他の最先端のディープフェイク分類手法よりも優れており,特に新しい操作手法を導入する場合など,モデルのトレーニングに使用可能なデータが稀な状況では,その性能が向上することが明らかとなった。我々の知る限りでは、この研究はStyleGANの潜伏空間の深い分類への関心を示す最初の研究である。この潜伏空間の解釈と操作に関する他の最近の研究と組み合わせて、顔画像の解釈可能な高レベル特性に基づくフラジアルディープフェイク分類法をさらに発展させることができると信じている。 The classification of forged videos has been a challenge for the past few years. Deepfake classifiers can now reliably predict whether or not video frames have been tampered with. However, their performance is tied to both the dataset used for training and the analyst's computational power. We propose a deepfake detection method that operates in the latent space of a state-of-the-art generative adversarial network (GAN) trained on high-quality face images. The proposed method leverages the structure of the latent space of StyleGAN to learn a lightweight binary classification model. Experimental results on standard datasets reveal that the proposed approach outperforms other state-of-the-art deepfake classification methods, especially in contexts where the data available to train the models is rare, such as when a new manipulation method is introduced. To the best of our knowledge, this is the first study showing the usefulness of the StyleGAN latent space for deepfake classification. 
Combined with other recent studies on the interpretation and manipulation of this latent space, we believe that the proposed approach can further help in developing frugal deepfake classification methods based on interpretable high-level properties of face images.	翻訳日:2024-04-27 00:27:30 公開日:2024-04-24
# 深層学習モデル変換器の故障とリスクの分析:ONNXエコシステムを事例として Analysis of Failures and Risks in Deep Learning Model Converters: A Case Study in the ONNX Ecosystem ( http://arxiv.org/abs/2303.17708v3 ) ライセンス: Link先を確認	Purvish Jajal, Wenxin Jiang, Arav Tewari, Erik Kocinare, Joseph Woo, Anusha Sarraf, Yung-Hsiang Lu, George K. Thiruvathukal, James C. Davis,	(参考訳) ソフトウェアエンジニアは、さまざまな開発フレームワークとランタイム環境を使用して、ディープラーニング(DL)モデルを開発、微調整、デプロイします。 DLモデルコンバータは、フレームワークとランタイム環境の間でモデルを移動します。変換エラーによってモデルの品質が損なわれ、デプロイメントが破壊される。しかし、DLモデルコンバータの故障特性は不明であり、DLインターオペラビリティ技術を使用する場合のリスクが増大する。本稿では,DLモデルコンバータの故障解析を行う。我々は,DL相互運用性ツール,ユースケース,痛点(N=92)について,ソフトウェアエンジニアを調査した。次に、メインの相互運用性ツールであるONNX(PyTorchとTensorFlowのN=200問題)に関連するモデルコンバータの障害を特徴付ける。最後に、我々が研究した失敗の構造的原因に関する2つの仮説を定式化し、検証した。モデル変換器のノード変換段階が欠陥の75%を占め、報告された障害の33%が意味的に誤りのあるモデルと関連していることがわかった。意味的に不正確なモデルの原因は解明されているが、振る舞いの不整合のあるモデルは演算子シーケンスを共有する。我々の成果は、DLインターオペラビリティソフトウェアをメンテナンス、拡張、検証をより簡単にするための将来の研究を動機付けています。行動寛容とアーキテクチャカバレッジメトリクスの研究は実りあるかもしれない。 Software engineers develop, fine-tune, and deploy deep learning (DL) models using a variety of development frameworks and runtime environments. DL model converters move models between frameworks and to runtime environments. Conversion errors compromise model quality and disrupt deployment. However, the failure characteristics of DL model converters are unknown, adding risk when using DL interoperability technologies. This paper analyzes failures in DL model converters. We survey software engineers about DL interoperability tools, use cases, and pain points (N=92). Then, we characterize failures in model converters associated with the main interoperability tool, ONNX (N=200 issues in PyTorch and TensorFlow). Finally, we formulate and test two hypotheses about structural causes for the failures we studied. We find that the node conversion stage of a model converter accounts for ~75% of the defects and that 33% of reported failures are related to semantically incorrect models. 
The cause of semantically incorrect models is elusive, but models with behaviour inconsistencies share operator sequences. Our results motivate future research on making DL interoperability software simpler to maintain, extend, and validate. Research into behavioural tolerances and architectural coverage metrics could be fruitful.	翻訳日:2024-04-27 00:27:30 公開日:2024-04-24
# Specialty-Oriented Generalist Medical AI for Chest CT Screening ( http://arxiv.org/abs/2304.02649v4 ) License: see link	Chuang Niu, Qing Lyu, Christopher D. Carothers, Parisa Kaviani, Josh Tan, Pingkun Yan, Mannudeep K. Kalra, Christopher T. Whitlow, Ge Wang,	Modern medical records include a vast amount of multimodal free text clinical data and imaging data from radiology, cardiology, and digital pathology. Fully mining such big data requires multitasking; otherwise, occult but important aspects may be overlooked, adversely affecting clinical management and population healthcare. Despite remarkable successes of AI in individual tasks with single-modal data, the progress in developing generalist medical AI remains relatively slow to combine multimodal data for multitasks because of the dual challenges of data curation and model architecture. The data challenge involves querying and curating multimodal structured and unstructured text, alphanumeric, and especially 3D tomographic scans on an individual patient level for real-time decisions and on a scale to estimate population health statistics. The model challenge demands a scalable and adaptable network architecture to integrate multimodal datasets for diverse clinical tasks. Here we propose the first-of-its-kind medical multimodal-multitask foundation model (M3FM) with application in lung cancer screening (LCS) and related tasks. After we curated a comprehensive multimodal multitask dataset consisting of 49 clinical data types including 163,725 chest CT series and 17 medical tasks involved in LCS, we developed a multimodal question-answering framework as a unified training and inference strategy to synergize multimodal information and perform multiple tasks via free-text prompting. M3FM consistently outperforms the state-of-the-art single-modal task-specific models, identifies multimodal data elements informative for clinical tasks and flexibly adapts to new tasks with a small out-of-distribution dataset. As a specialty-oriented generalist medical AI model, M3FM paves the way for similar breakthroughs in other areas of medicine, closing the gap between specialists and the generalist.	Translated: 2024-04-27 00:27:30 Published: 2024-04-24
# Overload: Latency Attacks on Object Detection for Edge Devices ( http://arxiv.org/abs/2304.05370v3 ) License: see link	Erh-Chung Chen, Pin-Yu Chen, I-Hsin Chung, Che-rung Lee,	Nowadays, the deployment of deep learning-based applications is an essential task owing to the increasing demands on intelligent services. In this paper, we investigate latency attacks on deep learning applications. Unlike common adversarial attacks for misclassification, the goal of latency attacks is to increase the inference time, which may stop applications from responding to the requests within a reasonable time. This kind of attack is ubiquitous for various applications, and we use object detection to demonstrate how this kind of attack works. We also design a framework named Overload to generate latency attacks at scale. Our method is based on a newly formulated optimization problem and a novel technique, called spatial attention. This attack serves to escalate the required computing costs during the inference time, consequently leading to an extended inference time for object detection. It presents a significant threat, especially to systems with limited computing resources. We conducted experiments using YOLOv5 models on Nvidia NX. Compared to existing methods, our method is simpler and more effective. The experimental results show that with latency attacks, the inference time of a single image can be ten times longer than in the normal setting. Moreover, our findings pose a potential new threat to all object detection tasks requiring non-maximum suppression (NMS), as our attack is NMS-agnostic.	Translated: 2024-04-27 00:27:30 Published: 2024-04-24
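To illustrate why NMS-dependent detectors can be exposed to latency attacks, the following is a minimal sketch of plain greedy NMS that also counts how many IoU evaluations it performs. It is not the Overload method from the abstract; the helper `disjoint_boxes` and the comparison counter are assumptions introduced only for illustration. When many candidate boxes survive suppression, the number of pairwise checks grows quadratically.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS. Returns (kept indices, number of IoU evaluations)."""
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept, comparisons = [], 0
    for i in order:
        suppressed = False
        for j in kept:
            comparisons += 1
            if iou(boxes[i], boxes[j]) > iou_threshold:
                suppressed = True
                break
        if not suppressed:
            kept.append(i)
    return kept, comparisons


def disjoint_boxes(n):
    """Hypothetical worst case: n unit boxes that never overlap."""
    return [(2.0 * k, 0.0, 2.0 * k + 1.0, 1.0) for k in range(n)]
```

With `n` non-overlapping boxes nothing is suppressed, so the loop performs `n * (n - 1) / 2` IoU evaluations, whereas `n` identical boxes need only `n - 1`. Real detectors use vectorized NMS, but the dependence on the number of surviving candidates is the same in spirit.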
# Two is Better Than One: Digital Siblings to Improve Autonomous Driving Testing ( http://arxiv.org/abs/2305.08060v2 ) License: see link	Matteo Biagiola, Andrea Stocco, Vincenzo Riccio, Paolo Tonella,	Simulation-based testing represents an important step to ensure the reliability of autonomous driving software. In practice, when companies rely on third-party general-purpose simulators, either for in-house or outsourced testing, the generalizability of testing results to real autonomous vehicles is at stake. In this paper, we enhance simulation-based testing by introducing the notion of digital siblings, a multi-simulator approach that tests a given autonomous vehicle on multiple general-purpose simulators built with different technologies, that operate collectively as an ensemble in the testing process. We exemplify our approach on a case study focused on testing the lane-keeping component of an autonomous vehicle. We use two open-source simulators as digital siblings, and we empirically compare such a multi-simulator approach against a digital twin of a physical scaled autonomous vehicle on a large set of test cases. Our approach requires generating and running test cases for each individual simulator, in the form of sequences of road points. Then, test cases are migrated between simulators, using feature maps to characterize the exercised driving conditions. Finally, the joint predicted failure probability is computed, and a failure is reported only in cases of agreement among the siblings. Our empirical evaluation shows that the ensemble failure predictor by the digital siblings is superior to each individual simulator at predicting the failures of the digital twin. We discuss the findings of our case study and detail how our approach can help researchers interested in automated testing of autonomous driving software.	Translated: 2024-04-27 00:17:35 Published: 2024-04-24
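The final step described in the abstract above (a joint failure probability, with a failure reported only when the siblings agree) could be sketched as follows. The abstract does not say how the probabilities are combined, so the product rule, the `0.5` threshold, and both function names are assumptions made purely for illustration.

```python
def joint_failure_probability(p_sibling_a, p_sibling_b):
    """Joint predicted failure probability of two siblings.

    Assumption: the two simulators are treated as independent, so the
    joint probability is the product. The paper may combine them differently.
    """
    return p_sibling_a * p_sibling_b


def report_failure(p_sibling_a, p_sibling_b, threshold=0.5):
    """Report a failure only when both siblings predict one (agreement).

    Assumption: a sibling "predicts a failure" when its probability
    reaches the hypothetical threshold.
    """
    return p_sibling_a >= threshold and p_sibling_b >= threshold
```

Under these assumptions, a test case that fails in only one simulator (for example probabilities 0.9 and 0.2) is not reported, which is the conservative behaviour the agreement rule is meant to give.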
# ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding ( http://arxiv.org/abs/2305.08275v3 ) License: see link	Le Xue, Ning Yu, Shu Zhang, Junnan Li, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese,	Recent advancements in multimodal pre-training have shown promising efficacy in 3D representation learning by aligning multimodal features across 3D shapes, their 2D counterparts, and language descriptions. However, the methods used by existing frameworks to curate such multimodal data, in particular language descriptions for 3D shapes, are not scalable, and the collected language descriptions are not diverse. To address this, we introduce ULIP-2, a simple yet effective tri-modal pre-training framework that leverages large multimodal models to automatically generate holistic language descriptions for 3D shapes. It only needs 3D data as input, eliminating the need for any manual 3D annotations, and is therefore scalable to large datasets. ULIP-2 is also equipped with scaled-up backbones for better multimodal representation learning. We conduct experiments on two large-scale 3D datasets, Objaverse and ShapeNet, and augment them with tri-modal datasets of 3D point clouds, images, and language for training ULIP-2. Experiments show that ULIP-2 demonstrates substantial benefits in three downstream tasks: zero-shot 3D classification, standard 3D classification with fine-tuning, and 3D captioning (3D-to-language generation). It achieves a new SOTA of 50.6% (top-1) on Objaverse-LVIS and 84.7% (top-1) on ModelNet40 in zero-shot classification. In the ScanObjectNN benchmark for standard fine-tuning, ULIP-2 reaches an overall accuracy of 91.5% with a compact model of only 1.4 million parameters. ULIP-2 sheds light on a new paradigm for scalable multimodal 3D representation learning without human annotations and shows significant improvements over existing baselines. The code and datasets are released at https://github.com/salesforce/ULIP.	Translated: 2024-04-27 00:17:35 Published: 2024-04-24
# Exact Recovery for System Identification with More Corrupt Data than Clean Data ( http://arxiv.org/abs/2305.10506v3 ) License: see link	Baturalp Yalcin, Haixiang Zhang, Javad Lavaei, Murat Arcak,	This paper investigates the system identification problem for linear discrete-time systems under adversaries and analyzes two lasso-type estimators. We examine both asymptotic and non-asymptotic properties of these estimators in two separate scenarios, corresponding to deterministic and stochastic models for the attack times. Since the samples collected from the system are correlated, the existing results on lasso are not applicable. We prove that when the system is stable and attacks are injected periodically, the sample complexity for exact recovery of the system dynamics is linear in terms of the dimension of the states. When adversarial attacks occur at each time instance with probability p, the required sample complexity for exact recovery scales polynomially in the dimension of the states and the probability p. This result implies almost sure convergence to the true system dynamics under the asymptotic regime. As a by-product, our estimators still learn the system correctly even when more than half of the data is compromised. We highlight that the attack vectors are allowed to be correlated with each other in this work, whereas we make some assumptions about the times at which the attacks happen. This paper provides the first mathematical guarantee in the literature on learning from correlated data for dynamical systems in the case when there is less clean data than corrupt data.	Translated: 2024-04-27 00:17:35 Published: 2024-04-24
# Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners ( http://arxiv.org/abs/2305.10722v3 ) License: see link	Xuehai He, Weixi Feng, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang,	Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation. Since text-to-image generation often requires models to generate visual concepts with fine-grained details and attributes specified in text prompts, can we leverage the powerful representations learned by pre-trained diffusion models for discriminative tasks such as image-text matching? To answer this question, we propose a novel approach, Discriminative Stable Diffusion (DSD), which turns pre-trained text-to-image diffusion models into few-shot discriminative learners. Our approach mainly uses the cross-attention score of a Stable Diffusion model to capture the mutual influence between visual and textual information and fine-tune the model via efficient attention-based prompt learning to perform image-text matching. By comparing DSD with state-of-the-art methods on several benchmark datasets, we demonstrate the potential of using pre-trained diffusion models for discriminative tasks with superior results on few-shot image-text matching.	Translated: 2024-04-27 00:17:35 Published: 2024-04-24
# Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation ( http://arxiv.org/abs/2305.10874v4 ) License: see link	Wenjing Wang, Huan Yang, Zixi Tuo, Huiguo He, Junchen Zhu, Jianlong Fu, Jiaying Liu,	With the explosive popularity of AI-generated content (AIGC), video generation has recently received a lot of attention. Generating videos guided by text instructions poses significant challenges, such as modeling the complex relationship between space and time, and the lack of large-scale text-video paired data. Existing text-video datasets suffer from limitations in both content quality and scale, or they are not open-source, rendering them inaccessible for study and use. For model design, previous approaches extend pretrained text-to-image generation models by adding temporal 1D convolution/attention modules for video generation. However, these approaches overlook the importance of jointly modeling space and time, inevitably leading to temporal distortions and misalignment between texts and videos. In this paper, we propose a novel approach that strengthens the interaction between spatial and temporal perceptions. In particular, we utilize a swapped cross-attention mechanism in 3D windows that alternates the "query" role between spatial and temporal blocks, enabling mutual reinforcement. Moreover, to fully unlock model capabilities for high-quality video generation and promote the development of the field, we curate a large-scale and open-source video dataset called HD-VG-130M. This dataset comprises 130 million text-video pairs from the open-domain, ensuring high-definition, widescreen and watermark-free characteristics. A smaller-scale yet more meticulously cleaned subset further enhances the data quality, aiding models in achieving superior performance. Experimental quantitative and qualitative results demonstrate the superiority of our approach in terms of per-frame quality, temporal correlation, and text-video alignment, with clear margins.	Translated: 2024-04-27 00:17:35 Published: 2024-04-24
# LANISTR: Multimodal Learning from Structured and Unstructured Data ( http://arxiv.org/abs/2305.16556v3 ) License: see link	Sayna Ebrahimi, Sercan O. Arik, Yihe Dong, Tomas Pfister,	Multimodal large-scale pretraining has shown impressive performance for unstructured data such as language and image. However, a prevalent real-world scenario involves structured data types, tabular and time-series, along with unstructured data. Such scenarios have been understudied. To bridge this gap, we propose LANISTR, an attention-based framework to learn from LANguage, Image, and STRuctured data. The core of LANISTR's methodology is rooted in masking-based training applied across both unimodal and multimodal levels. In particular, we introduce a new similarity-based multimodal masking loss that enables it to learn cross-modal relations from large-scale multimodal data with missing modalities. On two real-world datasets, MIMIC-IV (from healthcare) and Amazon Product Review (from retail), LANISTR demonstrates remarkable improvements, 6.6% (in AUROC) and 14% (in accuracy) when fine-tuned with 0.1% and 0.01% of labeled data, respectively, compared to the state-of-the-art alternatives. Notably, these improvements are observed even with a very high ratio of samples (35.7% and 99.8% respectively) not containing all modalities, underlining the robustness of LANISTR to the practical missing-modality challenge. Our code and models will be available at https://github.com/google-research/lanistr	Translated: 2024-04-27 00:17:35 Published: 2024-04-24
# Multi-Study R-Learner for Estimating Heterogeneous Treatment Effects Across Studies Using Statistical Machine Learning ( http://arxiv.org/abs/2306.01086v3 ) License: see link	Cathy Shyr, Boyu Ren, Prasad Patil, Giovanni Parmigiani,	Estimating heterogeneous treatment effects (HTEs) is crucial for precision medicine. While multiple studies can improve the generalizability of results, leveraging them for estimation is statistically challenging. Existing approaches often assume identical HTEs across studies, but this may be violated due to various sources of between-study heterogeneity, including differences in study design, study populations, and data collection protocols, among others. To this end, we propose a framework for multi-study HTE estimation that accounts for between-study heterogeneity in the nuisance functions and treatment effects. Our approach, the multi-study R-learner, extends the R-learner to obtain principled statistical estimation with machine learning (ML) in the multi-study setting. It involves a data-adaptive objective function that links study-specific treatment effects with nuisance functions through membership probabilities, which enable information to be borrowed across potentially heterogeneous studies. The multi-study R-learner framework can combine data from randomized controlled trials, observational studies, or a combination of both. It's easy to implement and flexible in its ability to incorporate ML for estimating HTEs, nuisance functions, and membership probabilities. In the series estimation framework, we show that the multi-study R-learner is asymptotically normal and more efficient than the R-learner when there is between-study heterogeneity in the propensity score model under homoscedasticity. We illustrate using cancer data that the proposed method performs favorably compared to existing approaches in the presence of between-study heterogeneity.	Translated: 2024-04-27 00:17:35 Published: 2024-04-24
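For orientation, the single-study R-learner that the abstract above extends is usually written as a penalized residual-on-residual regression. The notation below ($Y_i$ outcome, $W_i$ treatment indicator, $X_i$ covariates, $\hat m$ estimated mean outcome, $\hat e$ estimated propensity score, $\Lambda_n$ a regularizer) is the standard one and is not taken from this listing; the multi-study objective with membership probabilities is not spelled out in the abstract, so it is not reproduced here.

```latex
\hat{\tau}(\cdot) = \operatorname*{arg\,min}_{\tau}
  \left\{
    \frac{1}{n} \sum_{i=1}^{n}
      \Bigl[ \bigl( Y_i - \hat{m}(X_i) \bigr)
           - \bigl( W_i - \hat{e}(X_i) \bigr)\, \tau(X_i) \Bigr]^2
    + \Lambda_n(\tau)
  \right\}
```

Here $\hat m(x)$ estimates $E[Y \mid X = x]$ and $\hat e(x)$ estimates $P(W = 1 \mid X = x)$; these are the nuisance functions the abstract refers to.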
# Lifting Architectural Constraints of Injective Flows ( http://arxiv.org/abs/2306.01843v4 ) License: see link	Peter Sorrenson, Felix Draxler, Armand Rousselot, Sander Hummerich, Lea Zimmermann, Ullrich Köthe,	Normalizing Flows explicitly maximize a full-dimensional likelihood on the training data. However, real data is typically only supported on a lower-dimensional manifold, leading the model to expend significant compute on modeling noise. Injective Flows fix this by jointly learning a manifold and the distribution on it. So far, they have been limited by restrictive architectures and/or high computational cost. We lift both constraints by a new efficient estimator for the maximum likelihood loss, compatible with free-form bottleneck architectures. We further show that naively learning both the data manifold and the distribution on it can lead to divergent solutions, and use this insight to motivate a stable maximum likelihood training objective. We perform extensive experiments on toy, tabular and image data, demonstrating the competitive performance of the resulting model.	Translated: 2024-04-27 00:17:35 Published: 2024-04-24
# WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models ( http://arxiv.org/abs/2306.04744v3 ) License: see link	Changhoon Kim, Kyle Min, Maitreya Patel, Sheng Cheng, Yezhou Yang,	The rapid advancement of generative models, facilitating the creation of hyper-realistic images from textual descriptions, has concurrently escalated critical societal concerns such as misinformation. Although providing some mitigation, traditional fingerprinting mechanisms fall short in attributing responsibility for the malicious use of synthetic images. This paper introduces a novel approach to model fingerprinting that assigns responsibility for the generated images, thereby serving as a potential countermeasure to model misuse. Our method modifies generative models based on each user's unique digital fingerprint, imprinting a unique identifier onto the resultant content that can be traced back to the user. This approach, incorporating fine-tuning into Text-to-Image (T2I) tasks using the Stable Diffusion Model, demonstrates near-perfect attribution accuracy with a minimal impact on output quality. Through extensive evaluation, we show that our method outperforms baseline methods with an average improvement of 11% in handling image post-processes. Our method presents a promising and novel avenue for accountable model distribution and responsible use. Our code is available at https://github.com/kylemin/WOUAF.	Translated: 2024-04-27 00:07:23 Published: 2024-04-24
# Kerr-effect-based quantum logical gates in decoherence-free subspace ( http://arxiv.org/abs/2306.05625v2 ) License: see link	Fang-Fang Du, Gang Fan, Xue-Mei Ren,	The decoherence effect caused by the coupling between the system and the environment undoubtedly leads to errors in efficient implementations of two- (or three-) qubit logical gates in quantum information processing. Fortunately, the introduction of a decoherence-free subspace (DFS) can effectively decrease the influence of the decoherence effect. In this paper, we propose some schemes for setting up a family of quantum control gates, including controlled-NOT (CNOT), Toffoli, and Fredkin gates for two or three logical qubits by means of cross-Kerr nonlinearities in DFS. These three logical gates require neither complicated quantum computational circuits nor auxiliary photons (or entangled states). The success probabilities of the three logical gates are approximately 1 by performing the corresponding classical feed-forward operations based on the different measuring results of the X-homodyne detectors, and their fidelities are robust against the photon loss with the current technology. The proposed logical gates rely on only simple linear-optics elements, available single-qubit operations, and mature measurement methods, making them feasible and efficient in practical applications.	Translated: 2024-04-27 00:07:23 Published: 2024-04-24
# Cyber Key Terrain Identification Using Adjusted PageRank Centrality ( http://arxiv.org/abs/2306.11018v2 ) License: see link	Lukáš Sadlek, Pavel Čeleda,	The cyber terrain contains devices, network services, cyber personas, and other network entities involved in network operations. Designing a method that automatically identifies the network entities that are key to network operations is challenging. However, such a method is essential for determining which cyber assets the cyber defense should focus on. In this paper, we propose an approach for the classification of IP addresses belonging to cyber key terrain according to their network position using the PageRank centrality computation adjusted by machine learning. We used hill climbing and random walk algorithms to distinguish PageRank's damping factors based on source and destination ports captured in IP flows. The one-time learning phase on a static data sample allows near-real-time stream-based classification of key hosts from IP flow data in operational conditions without maintaining a complete network graph. We evaluated the approach on a dataset from a cyber defense exercise and on data from the campus network. The results show that cyber key terrain identification using the adjusted computation of centrality is more precise than its original version.	Translated: 2024-04-27 00:07:23 Published: 2024-04-24
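As a rough illustration of the PageRank centrality mentioned in the abstract above, here is a minimal power-iteration sketch in which each node carries its own damping factor. Giving the damping factor per node is an assumption made for this example; the abstract only says that damping factors are distinguished by source and destination ports, and it does not describe how they enter the computation. The function name and the edge format are likewise hypothetical.

```python
def pagerank(edges, damping, iterations=100):
    """Power-iteration PageRank over directed edges (src, dst).

    `damping` maps every node to a damping factor in [0, 1]. Using one
    shared value for all nodes reproduces classic PageRank.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [d for s, d in edges if s == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            d = damping[n]
            # Dangling nodes spread their rank uniformly over all nodes.
            targets = out_links[n] or nodes
            for t in targets:
                nxt[t] += d * rank[n] / len(targets)
            # Teleportation share of this node's rank.
            for t in nodes:
                nxt[t] += (1.0 - d) * rank[n] / len(nodes)
        rank = nxt
    return rank
```

In a flow-based setting the nodes would be IP addresses and the edges observed flows, for example `pagerank([("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")], {n: 0.85 for n in ("a", "b", "c", "hub")})`, where the host most others talk to receives the highest rank.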
# A Hierarchical Architecture for Neural Materials ( http://arxiv.org/abs/2307.10135v3 ) License: see link	Bowen Xue, Shuang Zhao, Henrik Wann Jensen, Zahra Montazeri,	Neural reflectance models are capable of reproducing the spatially-varying appearance of many real-world materials at different scales. Unfortunately, existing techniques such as NeuMIP have difficulties handling materials with strong shadowing effects or detailed specular highlights. In this paper, we introduce a neural appearance model that offers a new level of accuracy. Central to our model is an inception-based core network structure that captures material appearances at multiple scales using parallel-operating kernels and ensures multi-stage features through specialized convolution layers. Furthermore, we encode the inputs into frequency space, introduce a gradient-based loss, and apply it adaptively as the learning phase progresses. We demonstrate the effectiveness of our method using a variety of synthetic and real examples.	Translated: 2024-04-27 00:07:23 Published: 2024-04-24
# TransFusion: Generating Long, High Fidelity Time Series using Diffusion Models with Transformers ( http://arxiv.org/abs/2307.12667v2 ) License: see link	Md Fahim Sikder, Resmi Ramachandranpillai, Fredrik Heintz,	The generation of high-quality, long-sequenced time-series data is essential due to its wide range of applications. In the past, standalone Recurrent and Convolutional Neural Network-based Generative Adversarial Networks (GAN) were used to synthesize time-series data. However, they are inadequate for generating long sequences of time-series data due to limitations in the architecture. Furthermore, GANs are well known for their training instability and mode collapse problem. To address this, we propose TransFusion, a diffusion- and transformer-based generative model to generate high-quality long-sequence time-series data. We have stretched the sequence length to 384, and generated high-quality synthetic data. Also, we introduce two evaluation metrics to evaluate the quality of the synthetic data as well as its predictive characteristics. We evaluate TransFusion with a wide variety of visual and empirical metrics, and TransFusion outperforms the previous state-of-the-art by a significant margin.	Translated: 2024-04-27 00:07:23 Published: 2024-04-24
# An Empirical Study of Large-Scale Data-Driven Full Waveform Inversion ( http://arxiv.org/abs/2307.15388v2 ) License: see link	Peng Jin, Yinan Feng, Shihang Feng, Hanchen Wang, Yinpeng Chen, Benjamin Consolvo, Zicheng Liu, Youzuo Lin,	This paper investigates the impact of big data on deep learning models to help solve the full waveform inversion (FWI) problem. While it is well known that big data can boost the performance of deep learning models in many tasks, its effectiveness has not been validated for FWI. To address this gap, we present an empirical study that investigates how deep learning models in FWI behave when trained on OpenFWI, a collection of large-scale, multi-structural, synthetic datasets published recently. In particular, we train and evaluate the FWI models on a combination of 10 2D subsets in OpenFWI that contain 470K pairs of seismic data and velocity maps in total. Our experiments demonstrate that training on the combined dataset yields an average improvement of 13.03% in MAE, 7.19% in MSE and 1.87% in SSIM compared to each split dataset, and an average improvement of 28.60%, 21.55% and 8.22% in the leave-one-out generalization test. We further demonstrate that model capacity needs to scale in accordance with data size for optimal improvement, where our largest model yields an average improvement of 20.06%, 13.39% and 0.72% compared to the smallest one.	Translated: 2024-04-27 00:07:23 Published: 2024-04-24
# The Initial Screening Order Problem ( http://arxiv.org/abs/2307.15398v3 ) License: see link	Jose M. Alvarez, Antonio Mastropietro, Salvatore Ruggieri	We investigate the role of the initial screening order (ISO) in candidate screening processes, such as employee hiring and academic admissions. The ISO refers to the order in which the screener evaluates the candidate pool. It has been largely overlooked in the literature, despite its potential impact on the optimality and fairness of the chosen set, especially under a human screener. We define two problem formulations: the best-$k$, where the screener selects the $k$ best candidates, and the good-$k$, where the screener selects the first $k$ good-enough candidates. To study the impact of the ISO, we introduce a human-like screener and compare it to its algorithmic counterpart. The human-like screener is conceived to be inconsistent over time due to fatigue. Our analysis shows that the ISO, particularly under a human-like screener, hinders individual fairness despite meeting group-level fairness. This is due to position bias, where a candidate's evaluation is affected by its position within the ISO. We report extensive simulated experiments exploring the parameters of the best-$k$ and good-$k$ problem formulations for both the algorithmic and human-like screeners. This work is motivated by a real-world candidate screening problem studied in collaboration with a large European company.	Translated: 2024-04-27 00:07:23 Published: 2024-04-24
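The two formulations in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the candidate names, the scores, and the 0.75 threshold are all assumptions. It shows why the screening order matters for good-$k$ but, absent ties, not for best-$k$.

```python
from typing import Callable, List, Sequence


def best_k(order: Sequence[str], score: Callable[[str], float], k: int) -> List[str]:
    """Evaluate every candidate, then keep the k highest-scoring ones."""
    # sorted() is stable, so tied candidates keep their screening order.
    ranked = sorted(order, key=score, reverse=True)
    return ranked[:k]


def good_k(order: Sequence[str], score: Callable[[str], float],
           k: int, threshold: float) -> List[str]:
    """Walk the screening order and stop at the first k good-enough candidates."""
    chosen: List[str] = []
    for candidate in order:
        if score(candidate) >= threshold:
            chosen.append(candidate)
            if len(chosen) == k:
                break  # later candidates are never evaluated
    return chosen


# Hypothetical candidate pool and two different initial screening orders.
scores = {"ana": 0.9, "ben": 0.7, "cai": 0.8, "dee": 0.95}
iso_a = ["ben", "cai", "ana", "dee"]
iso_b = ["dee", "ana", "cai", "ben"]

print(best_k(iso_a, scores.get, 2), best_k(iso_b, scores.get, 2))
print(good_k(iso_a, scores.get, 2, 0.75), good_k(iso_b, scores.get, 2, 0.75))
```

Under these made-up scores, best-$k$ returns the same pair for both orders, while good-$k$ returns a different pair depending on who is screened first.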
# Relaxations and Exact Solutions to Quantum Max Cut via the Algebraic Structure of Swap Operators ( http://arxiv.org/abs/2307.15661v3 ) License: see link	Adam Bene Watts, Anirban Chowdhury, Aidan Epperly, J. William Helton, Igor Klep	The Quantum Max Cut (QMC) problem has emerged as a test problem for designing approximation algorithms for local Hamiltonian problems. In this paper we attack this problem using the algebraic structure of QMC, in particular the relationship between the quantum max cut Hamiltonian and the representation theory of the symmetric group. The first major contribution of this paper is an extension of non-commutative Sum of Squares (ncSoS) optimization techniques to give a new hierarchy of relaxations to Quantum Max Cut. The hierarchy we present is based on optimizations over polynomials in the qubit swap operators. This is in contrast to the "standard" quantum Lasserre Hierarchy, which is based on polynomials expressed in terms of the Pauli matrices. To prove correctness of this hierarchy, we exploit a finite presentation of the algebra generated by the qubit swap operators. This presentation allows for the use of computer algebraic techniques to manipulate and simplify polynomials written in terms of the swap operators, and may be of independent interest. Surprisingly, we find that level 2 of this new hierarchy is numerically exact (up to tolerance $10^{-7}$) on all QMC instances with uniform edge weights on graphs with at most 8 vertices. The second major contribution of this paper is a polynomial-time algorithm that computes (in exact arithmetic) the maximum eigenvalue of the QMC Hamiltonian for certain graphs, including graphs that can be "decomposed" as a signed combination of cliques. A special case of the latter is the family of complete bipartite graphs with uniform edge weights, for which exact solutions are known from the work of Lieb and Mattis. Our methods, which use representation theory of the symmetric group, can be seen as a generalization of the Lieb-Mattis result.	Translated: 2024-04-27 00:07:23 Published: 2024-04-24
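For readers comparing the swap-operator and Pauli-matrix pictures mentioned in the abstract above, the standard identity linking them is written out below. The symbols $h_{ij}$, $H_G$ and $w_{ij}$, and the $1/4$ normalization of the edge term, are assumptions made for illustration; conventions in the literature differ by constant factors.

```latex
\mathrm{SWAP}_{ij} = \tfrac{1}{2}\left(I + X_i X_j + Y_i Y_j + Z_i Z_j\right),
\qquad
h_{ij} = \tfrac{1}{4}\left(I - X_i X_j - Y_i Y_j - Z_i Z_j\right)
       = \tfrac{1}{2}\left(I - \mathrm{SWAP}_{ij}\right),
\qquad
H_G = \sum_{(i,j) \in E} w_{ij}\, h_{ij}.
```

Since $\mathrm{SWAP}_{ij}$ has eigenvalues $\pm 1$, each edge term $h_{ij}$ under this normalization is the projector onto the singlet state of qubits $i$ and $j$, so a single edge of unit weight has maximum eigenvalue 1.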
# Continuous-Time Distributed Dynamic Programming for Networked Multi-Agent Markov Decision Processes ( http://arxiv.org/abs/2307.16706v6 ) License: see link	Donghwan Lee, Han-Dong Lim, Do Wan Kim	The main goal of this paper is to investigate continuous-time distributed dynamic programming (DP) algorithms for networked multi-agent Markov decision problems (MAMDPs). In our study, we adopt a distributed multi-agent framework where individual agents have access only to their own rewards and have no insight into the rewards of other agents. Moreover, each agent can share its parameters with neighboring agents through a communication network, represented by a graph. We first introduce a novel distributed DP, inspired by the distributed optimization method of Wang and Elia. Next, a new distributed DP is introduced through a decoupling process. The convergence of the DP algorithms is proved from a systems and control perspective. The study in this paper sets the stage for new distributed temporal difference learning algorithms.	Translated: 2024-04-27 00:07:23 Published: 2024-04-24
# Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench ( http://arxiv.org/abs/2308.03656v4 ) License: see link	Jen-tse Huang, Man Ho Lam, Eric John Li, Shujie Ren, Wenxuan Wang, Wenxiang Jiao, Zhaopeng Tu, Michael R. Lyu	Evaluating Large Language Models' (LLMs) anthropomorphic capabilities has become increasingly important in contemporary discourse. Utilizing the emotion appraisal theory from psychology, we propose to evaluate the empathy ability of LLMs, i.e., how their feelings change when presented with specific situations. After a careful and comprehensive survey, we collect a dataset containing over 400 situations that have proven effective in eliciting the eight emotions central to our study. Categorizing the situations into 36 factors, we conduct a human evaluation involving more than 1,200 subjects worldwide. With the human evaluation results as references, our evaluation includes five LLMs, covering both commercial and open-source models, including variations in model sizes, featuring the latest iterations, such as GPT-4 and LLaMA-2. We find that, despite several misalignments, LLMs can generally respond appropriately to certain situations. Nevertheless, they fall short in alignment with the emotional behaviors of human beings and cannot establish connections between similar situations. Our collected dataset of situations, the human evaluation results, and the code of our testing framework, dubbed EmotionBench, are made openly accessible via https://github.com/CUHK-ARISE/EmotionBench. We aspire to contribute to the advancement of LLMs regarding better alignment with the emotional behaviors of human beings, thereby enhancing their utility and applicability as intelligent assistants.	Translated: 2024-04-26 23:57:24 Published: 2024-04-24
# Confinement in 1+1D $\mathbb{Z}_2$ Lattice Gauge Theories at Finite Temperature ( http://arxiv.org/abs/2308.08592v2 ) License: see link	Matjaž Kebrič, Jad C. Halimeh, Ulrich Schollwöck, Fabian Grusdt	Confinement is a paradigmatic phenomenon of gauge theories, and its understanding lies at the forefront of high-energy physics. Here, we study confinement in a simple one-dimensional $\mathbb{Z}_2$ lattice gauge theory at finite temperature and filling, which is within the reach of current cold-atom and superconducting-qubit platforms. By employing matrix product states (MPS) calculations, we investigate the decay of the finite-temperature Green's function and uncover a smooth crossover between the confined and deconfined regimes. Furthermore, using the Friedel oscillations and string length distributions obtained from snapshots sampled from MPS, both of which are experimentally readily available, we verify that confined mesons remain well-defined at arbitrary finite temperature. This phenomenology is further supported by probing quench dynamics of mesons with exact diagonalization. Our results shed new light on confinement at finite temperature from an experimentally relevant standpoint.	Translated: 2024-04-26 23:57:24 Published: 2024-04-24
# G3Reg: Pyramid Graph-based Global Registration using Gaussian Ellipsoid Model ( http://arxiv.org/abs/2308.11573v2 ) License: see link	Zhijian Qiao, Zehuan Yu, Binqian Jiang, Huan Yin, Shaojie Shen	This study introduces a novel framework, G3Reg, for fast and robust global registration of LiDAR point clouds. In contrast to conventional complex keypoints and descriptors, we extract fundamental geometric primitives, including planes, clusters, and lines (PCL) from the raw point cloud to obtain low-level semantic segments. Each segment is represented as a unified Gaussian Ellipsoid Model (GEM), using a probability ellipsoid to ensure the ground truth centers are encompassed with a certain degree of probability. Utilizing these GEMs, we present a distrust-and-verify scheme based on a Pyramid Compatibility Graph for Global Registration (PAGOR). Specifically, we establish an upper bound, which can be traversed based on the confidence level for compatibility testing to construct the pyramid graph. Then, we solve multiple maximum cliques (MAC) for each level of the pyramid graph, thus generating the corresponding transformation candidates. In the verification phase, we adopt a precise and efficient metric for point cloud alignment quality, founded on geometric primitives, to identify the optimal candidate. The algorithm's performance is validated on three publicly available datasets and a self-collected multi-session dataset. Parameter settings remained unchanged throughout the experimental evaluations. The results exhibit superior robustness and real-time performance of the G3Reg framework compared to state-of-the-art methods. Furthermore, we demonstrate the potential for integrating individual GEM and PAGOR components into other registration frameworks to enhance their efficacy. Code: https://github.com/HKUST-Aerial-Robotics/G3Reg	Translated: 2024-04-26 23:57:24 Published: 2024-04-24
# Optically Detected Magnetic Resonance of Nitrogen-Vacancy Centers in Diamond under Weak Laser Excitation ( http://arxiv.org/abs/2308.13351v2 ) License: see link	Yong-Hong Yu, Rui-Zhi Zhang, Yue Xu, Xiu-Qi Chen, Huijie Zheng, Quan Li, Ren-Bao Liu, Xin-Yu Pan, Dmitry Budker, Gang-Qin Liu	As promising quantum sensors, nitrogen-vacancy (NV) centers in diamond have been widely used in frontier studies in condensed matter physics, material sciences, and life sciences. In practical applications, weak laser excitation is favorable as it reduces the side effects of laser irradiation, for example, phototoxicity and heating. Here we report a combined theoretical and experimental study of optically detected magnetic resonance (ODMR) of NV-center ensembles under weak 532-nm laser excitation. In this regime, both the width and splitting of ODMR spectra decrease with increasing laser power. This power dependence is reproduced with a model considering laser-induced charge neutralization of NV$^-$-N$^+$ pairs, which alters the local electric field environment. These results are important for understanding and designing NV-based quantum sensing in light-sensitive applications.	Translated: 2024-04-26 23:57:24 Published: 2024-04-24
# POCKET: Pruning Random Convolution Kernels for Time Series Classification from a Feature Selection Perspective ( http://arxiv.org/abs/2309.08499v3 ) License: see link	Shaowu Chen, Weize Sun, Lei Huang, Xiaopeng Li, Qingyuan Wang, Deepu John	In recent years, two competitive time series classification models, namely ROCKET and MINIROCKET, have garnered considerable attention due to their low training cost and high accuracy. However, they require a large number of random 1-D convolutional kernels to comprehensively capture features, which is incompatible with resource-constrained devices. Despite the development of heuristic algorithms designed to recognize and prune redundant kernels, the inherently time-consuming nature of evolutionary algorithms hinders efficient evaluation. To effectively prune models, this paper removes redundant random kernels from a feature selection perspective by eliminating the associated connections in the sequential classifier. Two innovative algorithms are proposed: the first, ADMM-based algorithm formulates the pruning challenge as a group elastic net classification problem, and the second, core algorithm, named POCKET, greatly accelerates the first one by bifurcating the problem into two sequential stages. Stage 1 of POCKET introduces dynamically varying penalties to efficiently implement group-level regularization to delete redundant kernels, and Stage 2 employs element-level regularization on the remaining features to refit a linear classifier for better performance. Experimental results on diverse time series datasets show that POCKET prunes up to 60% of kernels without a significant reduction in accuracy and performs 11 times faster than its counterparts. Our code is publicly available at https://github.com/ShaowuChen/POCKET.	Translated: 2024-04-26 23:57:24 Published: 2024-04-24
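The idea of group-level regularization that removes whole kernels can be illustrated with the textbook group soft-thresholding step for a group-lasso penalty. This is a generic sketch and not POCKET's actual update, which the abstract describes only as an ADMM-based group elastic net with dynamically varying penalties. The function names, the grouping of classifier weights by kernel, and the value of `lam` are assumptions.

```python
import math
from typing import Dict, List


def group_soft_threshold(weights: List[float], lam: float) -> List[float]:
    """Shrink a whole weight group toward zero; zero it out if its norm is at most lam."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= lam:
        return [0.0] * len(weights)
    scale = 1.0 - lam / norm
    return [scale * w for w in weights]


def prune_kernels(groups: Dict[str, List[float]], lam: float) -> Dict[str, List[float]]:
    """Apply the shrinkage per kernel and drop kernels whose weights are all zero."""
    shrunk = {name: group_soft_threshold(w, lam) for name, w in groups.items()}
    return {name: w for name, w in shrunk.items() if any(x != 0.0 for x in w)}


# Hypothetical classifier weights, grouped by the random kernel that produced each feature.
classifier_weights = {
    "kernel_0": [3.0, 4.0],   # group norm 5.0: kept, shrunk
    "kernel_1": [0.1, -0.2],  # group norm below lam: removed
    "kernel_2": [0.0, 2.0],   # group norm 2.0: kept, shrunk
}
kept = prune_kernels(classifier_weights, lam=1.0)
print(sorted(kept))
```

Because the penalty acts on the norm of each group rather than on individual weights, a kernel is either kept with all its features or removed entirely, which is what makes the pruning structured.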
# Uncertainty-driven Exploration Strategies for Online Grasp Learning ( http://arxiv.org/abs/2309.12038v2 ) License: see link	Yitian Shi, Philipp Schillinger, Miroslav Gabriel, Alexander Qualmann, Zohar Feldman, Hanna Ziesche, Ngo Anh Vien	Existing grasp prediction approaches are mostly based on offline learning and ignore exploratory grasp learning during online adaptation to new picking scenarios, i.e., objects that are unseen or out-of-domain (OOD), camera and bin settings, etc. In this paper, we present an uncertainty-based approach for online learning of grasp predictions for robotic bin picking. Specifically, an online learning algorithm with an effective exploration strategy can significantly improve its adaptation performance to unseen environment settings. To this end, we first propose to formulate online grasp learning as an RL problem that will allow us to adapt both grasp reward prediction and grasp poses. We propose various uncertainty estimation schemes based on Bayesian uncertainty quantification and distributional ensembles. We carry out evaluations on real-world bin picking scenes of varying difficulty. The objects in the bin have various challenging physical and perceptual characteristics, such as semi- or total transparency and irregular or curved surfaces. The results of our experiments demonstrate a notable improvement in grasp performance in comparison to conventional online learning methods which incorporate only naive exploration strategies. Video: https://youtu.be/fPKOrjC2QrU	Translated: 2024-04-26 23:57:24 Published: 2024-04-24
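One common way to turn ensemble disagreement into an exploration signal is an upper-confidence-bound rule, sketched below. The abstract does not state which acquisition rule the authors use, so this is an assumption chosen for illustration; `pick_grasp`, `beta`, the grasp names, and the predicted success values are all hypothetical.

```python
import statistics
from typing import Dict, List


def pick_grasp(ensemble_predictions: Dict[str, List[float]], beta: float = 1.0) -> str:
    """Choose the grasp maximizing mean predicted success plus beta times ensemble spread."""
    def ucb(preds: List[float]) -> float:
        # The population standard deviation across ensemble members is the uncertainty proxy.
        return statistics.mean(preds) + beta * statistics.pstdev(preds)

    return max(ensemble_predictions, key=lambda g: ucb(ensemble_predictions[g]))


# Hypothetical success predictions from a three-member ensemble.
preds = {
    "grasp_a": [0.70, 0.71, 0.69],  # members agree: low uncertainty
    "grasp_b": [0.30, 0.95, 0.60],  # members disagree: high uncertainty
}
print(pick_grasp(preds, beta=0.0))  # pure exploitation
print(pick_grasp(preds, beta=1.0))  # uncertainty-driven exploration
```

With `beta=0` the rule reduces to greedy selection on the mean prediction; increasing `beta` shifts attempts toward grasps the ensemble is unsure about, which is where new online data is most informative.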
# RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models ( http://arxiv.org/abs/2310.00746v2 ) License: see link	Zekun Moore Wang, Zhongyuan Peng, Haoran Que, Jiaheng Liu, Wangchunshu Zhou, Yuhan Wu, Hongcheng Guo, Ruitong Gan, Zehao Ni, Jian Yang, Man Zhang, Zhaoxiang Zhang, Wanli Ouyang, Ke Xu, Stephen W. Huang, Jie Fu, Junran Peng	The advent of Large Language Models (LLMs) has paved the way for complex tasks such as role-playing, which enhances user interactions by enabling models to imitate various characters. However, the closed-source nature of state-of-the-art LLMs and their general-purpose training limit role-playing optimization. In this paper, we introduce RoleLLM, a framework to benchmark, elicit, and enhance role-playing abilities in LLMs. RoleLLM comprises four stages: (1) Role Profile Construction for 100 roles; (2) Context-Based Instruction Generation (Context-Instruct) for role-specific knowledge extraction; (3) Role Prompting using GPT (RoleGPT) for speaking style imitation; and (4) Role-Conditioned Instruction Tuning (RoCIT) for fine-tuning open-source models along with role customization. Using Context-Instruct and RoleGPT, we create RoleBench, the first systematic and fine-grained character-level benchmark dataset for role-playing, with 168,093 samples. Moreover, RoCIT on RoleBench yields RoleLLaMA (English) and RoleGLM (Chinese), significantly enhancing role-playing abilities and even achieving results comparable to RoleGPT (using GPT-4).	Translated: 2024-04-26 23:57:24 Published: 2024-04-24
# SEED: Domain-Specific Data Curation With Large Language Models ( http://arxiv.org/abs/2310.00749v3 ) License: see link	Zui Chen, Lei Cao, Sam Madden, Tim Kraska, Zeyuan Shang, Ju Fan, Nan Tang, Zihui Gu, Chunwei Liu, Michael Cafarella	Data curation tasks that prepare data for analytics are critical for turning data into actionable insights. However, due to the diverse requirements of applications in different domains, generic off-the-shelf tools are typically insufficient. As a result, data scientists often have to develop domain-specific solutions tailored to both the dataset and the task, e.g., writing domain-specific code or training machine learning models on a sufficient number of annotated examples. This process is notoriously difficult and time-consuming. We present SEED, an LLM-as-compiler approach that automatically generates domain-specific data curation solutions via Large Language Models (LLMs). Once the user describes a task, input data, and expected output, the SEED compiler produces a hybrid pipeline that combines LLM querying with more cost-effective alternatives, such as vector-based caching, LLM-generated code, and small models trained on LLM-annotated data. SEED features an optimizer that automatically selects from the four LLM-assisted modules and forms a hybrid execution pipeline that best fits the task at hand. To validate this new, revolutionary approach, we conducted experiments on 9 datasets spanning 5 data curation tasks. In comparison to solutions that use the LLM on every data record, SEED achieves state-of-the-art or comparable few-shot performance, while significantly reducing the number of LLM calls.	Translated: 2024-04-26 23:47:37 Published: 2024-04-24
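The idea of a hybrid pipeline that calls the LLM only when cheaper modules cannot answer can be sketched as a simple fallback cascade. This is an illustration only: the abstract does not specify how SEED orders or combines its modules, and every name below (`run_pipeline`, `cache_lookup`, `generated_rule`, `fake_llm`, the sample records) is hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A cheap module returns an answer, or None when it cannot handle the record.
Module = Callable[[str], Optional[str]]


def run_pipeline(record: str, modules: List[Module], llm: Callable[[str], str]) -> str:
    """Try the cheap modules in order; fall back to the LLM only if all of them decline."""
    for module in modules:
        answer = module(record)
        if answer is not None:
            return answer
    return llm(record)


cache: Dict[str, str] = {"N.Y.": "New York"}  # stand-in for a vector-based cache
llm_calls: List[str] = []                     # records which inputs reached the LLM


def cache_lookup(record: str) -> Optional[str]:
    return cache.get(record)


def generated_rule(record: str) -> Optional[str]:
    # Stand-in for LLM-generated code: it only handles all-lowercase names.
    return record.title() if record.islower() else None


def fake_llm(record: str) -> str:
    llm_calls.append(record)
    return "<llm answer for " + record + ">"


results = [run_pipeline(r, [cache_lookup, generated_rule], fake_llm)
           for r in ["N.Y.", "boston", "S.F."]]
print(results, llm_calls)
```

In this toy run only one of the three records reaches the (fake) LLM, which mirrors the abstract's claim that the hybrid pipeline reduces the number of LLM calls relative to querying the LLM on every record.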
# You Only Look at Once for Real-time and Generic Multi-Task ( http://arxiv.org/abs/2310.01641v4 ) License: see link	Jiayuan Wang, Q. M. Jonathan Wu, Ning Zhang	High precision, lightweight design, and real-time responsiveness are three essential requirements for implementing autonomous driving. In this study, we introduce A-YOLOM, an adaptive, real-time, and lightweight multi-task model designed to concurrently address object detection, drivable area segmentation, and lane line segmentation tasks. Specifically, we develop an end-to-end multi-task model with a unified and streamlined segmentation structure. We introduce a learnable parameter that adaptively concatenates features between necks and backbone in segmentation tasks, using the same loss function for all segmentation tasks. This eliminates the need for customizations and enhances the model's generalization capabilities. We also introduce a segmentation head composed only of a series of convolutional layers, which reduces the number of parameters and inference time. We achieve competitive results on the BDD100k dataset, particularly in visualization outcomes. The performance results show a mAP50 of 81.1% for object detection, a mIoU of 91.0% for drivable area segmentation, and an IoU of 28.8% for lane line segmentation. Additionally, we introduce real-world scenarios to evaluate our model's performance in a real scene, where it significantly outperforms competitors. This demonstrates that our model not only exhibits competitive performance but is also more flexible and faster than existing multi-task models. The source codes and pre-trained models are released at https://github.com/JiayuanWang-JW/YOLOv8-multi-task	Translated: 2024-04-26 23:47:37 Published: 2024-04-24
# Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors ( http://arxiv.org/abs/2310.02980v3 ) License: see link	Ido Amos, Jonathan Berant, Ankit Gupta	Modeling long-range dependencies across sequences is a longstanding goal in machine learning and has led to architectures, such as state space models, that dramatically outperform Transformers on long sequences. However, these impressive empirical gains have been by and large demonstrated on benchmarks (e.g. Long Range Arena), where models are randomly initialized and trained to predict a target label from an input sequence. In this work, we show that random initialization leads to gross overestimation of the differences between architectures and that pretraining with standard denoising objectives, using $\textit{only the downstream task data}$, leads to dramatic gains across multiple architectures and to very small gaps between Transformers and state space models (SSMs). In stark contrast to prior works, we find vanilla Transformers to match the performance of S4 on Long Range Arena when properly pretrained, and we improve the best reported results of SSMs on the PathX-256 task by 20 absolute points. Subsequently, we analyze the utility of previously proposed structured parameterizations for SSMs and show they become mostly redundant in the presence of data-driven initialization obtained through pretraining. Our work shows that, when evaluating different architectures on supervised tasks, incorporation of data-driven priors via pretraining is essential for reliable performance estimation, and can be done efficiently.	Translated: 2024-04-26 23:47:37 Published: 2024-04-24
# PST: Improving Quantitative Trading via Program Sketch-based Tuning ( http://arxiv.org/abs/2310.05551v2 ) License: see link	Zhiming Li, Junzhe Jiang, Yushi Cao, Aixin Cui, Bozhi Wu, Bo Li, Yang Liu, Dongning Sun	Deep reinforcement learning (DRL) has revolutionized quantitative finance by achieving decent performance without significant human expert knowledge. Despite its achievements, we observe that the current state-of-the-art DRL models are still ineffective in identifying the market trend, causing them to miss good trading opportunities or suffer from large drawdowns when encountering market crashes. To tackle this limitation, a natural idea is to embed human expert knowledge regarding the market trend. However, such knowledge is abstract and hard to quantify. In this paper, we propose a universal neuro-symbolic tuning framework, called program sketch-based tuning (PST). In particular, PST first uses a novel symbolic program sketch to embed the abstract human expert knowledge of market trends. We then utilize the program sketch to tune a trained DRL policy according to the current market trend. Finally, in order to optimize this neuro-symbolic framework, we propose a novel hybrid optimization method. Extensive evaluations on two popular quantitative trading tasks demonstrate that PST can significantly enhance the performance of previous state-of-the-art DRL strategies while being extremely lightweight.	Translated: 2024-04-26 23:47:37 Published: 2024-04-24
# Large Language Models can Learn Rules ( http://arxiv.org/abs/2310.07064v2 ) License: see link	Zhaocheng Zhu, Yuan Xue, Xinyun Chen, Denny Zhou, Jian Tang, Dale Schuurmans, Hanjun Dai	When prompted with a few examples and intermediate steps, large language models (LLMs) have demonstrated impressive performance in various reasoning tasks. However, prompting methods that rely on implicit knowledge in an LLM often generate incorrect answers when the implicit knowledge is wrong or inconsistent with the task. To tackle this problem, we present Hypotheses-to-Theories (HtT), a framework that learns a rule library for reasoning with LLMs. HtT contains two stages, an induction stage and a deduction stage. In the induction stage, an LLM is first asked to generate and verify rules over a set of training examples. Rules that appear and lead to correct answers sufficiently often are collected to form a rule library. In the deduction stage, the LLM is then prompted to employ the learned rule library to perform reasoning to answer test questions. Experiments on relational reasoning, numerical reasoning and concept learning problems show that HtT improves existing prompting methods, with an absolute gain of 10-30% in accuracy. The learned rules are also transferable to different models and to different forms of the same problem.	Translated: 2024-04-26 23:47:37 Published: 2024-04-24
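The rule-collection step of the induction stage described above ("rules that appear and lead to correct answers sufficiently often") can be sketched as a simple frequency and accuracy filter. The trace format, the function name, and the two thresholds are assumptions made for illustration; the abstract does not give the exact criteria.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One trace = the rules an LLM used on a training example, plus whether its final answer was correct.
Trace = Tuple[List[str], bool]


def build_rule_library(traces: Iterable[Trace], min_count: int, min_accuracy: float) -> List[str]:
    """Keep rules that occur at least min_count times and are right at least min_accuracy of the time."""
    seen: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for rules, answer_is_correct in traces:
        for rule in set(rules):  # count each rule at most once per trace
            seen[rule] += 1
            if answer_is_correct:
                correct[rule] += 1
    return sorted(r for r in seen
                  if seen[r] >= min_count and correct[r] / seen[r] >= min_accuracy)


# Hypothetical traces from a kinship-reasoning task.
traces = [
    (["father's sister -> aunt", "aunt's son -> cousin"], True),
    (["father's sister -> aunt"], True),
    (["aunt's son -> nephew"], False),
    (["aunt's son -> cousin"], True),
    (["father's sister -> aunt", "aunt's son -> nephew"], False),
]
library = build_rule_library(traces, min_count=2, min_accuracy=0.6)
print(library)
```

In the deduction stage, a library like this would be placed in the prompt so the model looks rules up instead of relying on implicit knowledge. Here the incorrect "nephew" rule is filtered out because it never coincides with a correct answer.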
# 不定量子ダイナミクスによる光回転計測におけるナノラディアンスケール精度 Nanoradian-Scale Precision in Light Rotation Measurement via Indefinite Quantum Dynamics ( http://arxiv.org/abs/2310.07125v3 ) ライセンス: Link先を確認	Binke Xia, Jingzheng Huang, Hongjing Li, Zhongyuan Luo, Guihua Zeng,	(参考訳) 光ビームの操作と計測(メトロロジー)は光学科学や応用にとって重要な要素である。特に、光線回転測定における超高精度の達成は、長年にわたる課題である。絡み合った光子のような量子プローブを利用する代わりに、量子パラメータ推定のパラメータ化プロセスに「不定時間方向」と呼ばれる量子戦略を組み込むことで、この問題に対処する。パラメータ化力学のこの量子特性を活用することで、ビームプロファイルの極小角回転を測定するための軌道角運動量(OAM)資源の利用を最大化することができる。特に、ナノラジアンスケールの光回転測定精度が実験で達成され、これは我々の知る限りこれまでで最も高い精度である。さらに、このスキームは光子によって提供される様々な操作可能な資源のために、様々な光学応用において有望である。 The manipulation and metrology of light beams are pivotal for optical science and applications. In particular, achieving ultra-high precision in the measurement of light beam rotations has been a long-standing challenge. Instead of utilizing quantum probes like entangled photons, we address this challenge by incorporating a quantum strategy called "indefinite time direction" into the parameterizing process of quantum parameter estimation. Leveraging this quantum property of the parameterizing dynamics allows us to maximize the utilization of orbital angular momentum (OAM) resources for measuring ultra-small angular rotations of the beam profile. Notably, the experiment achieves nanoradian-scale precision in light rotation measurement, which is, to the best of our knowledge, the highest precision reported so far. Furthermore, this scheme holds promise in various optical applications due to the diverse range of manipulable resources offered by photons.	翻訳日:2024-04-26 23:47:37 公開日:2024-04-24
# ルート付きベル試験による長距離量子相関の証明 Certifying long-range quantum correlations through routed Bell tests ( http://arxiv.org/abs/2310.07484v4 ) ライセンス: Link先を確認	Edwin Peter Lobo, Jef Pauwels, Stefano Pironio,	(参考訳) 透過チャネルの損失は距離とともに増加するが、量子非局所性のフォトニクスの実証とその応用に大きな障害となる。最近、Chaturvedi, Viola, and Pawlowski (CVP) [arXiv:2211.14231] は、量子非局所性を証明できる範囲を拡張することを目的として、標準ベルの実験のバリエーションを導入した。と呼ばれるこれらの実験では、Bobは量子粒子を2つの可能な経路に沿ってルートし、それを2つの異なる場所(近距離と遠距離)で測定することができる。ショートパスにおけるベルの違反は、ロングパスにおける非局所的相関を検出するために必要な条件を弱めるべきである。実際、CVPはルーティングされたベル実験において、検出効率が任意に低い場合でも、リモートデバイスの結果を古典的に規定できないような量子相関が存在することを示した。本稿では,CVPが考慮した相関関係を古典的に規定することはできないが,遠隔デバイスへの量子システムの伝送を必要としないことを示す。これにより、ルート付きベル実験において「短距離」および「長距離」量子相関の概念が定義される。これらの相関は、非可換多項式最適化のための標準半定値プログラム階層によって特徴づけられることを示す。次に、短距離量子相関を除外できる条件について検討する。我々は、遠方装置の臨界検出効率に基本的な低バウンドがあることを指摘し、経路付きベル実験では、任意に大きな距離で長距離量子非局所性を証明できないことを示唆している。しかし,経路付きベル実験により検出効率の閾値が低下することが判明した。しかし、改善はCVPの分析によって示唆されるものよりも大幅に小さい。 Losses in the transmission channel, which increase with distance, pose a major obstacle to photonics demonstrations of quantum nonlocality and its applications. Recently, Chaturvedi, Viola, and Pawlowski (CVP) [arXiv:2211.14231] introduced a variation of standard Bell experiments with the goal of extending the range over which quantum nonlocality can be demonstrated. In these experiments, which we call 'routed Bell experiments', Bob can route his quantum particle along two possible paths and measure it at two distinct locations - one near and another far from the source. The idea is that a Bell violation in the short-path should weaken the conditions required to detect nonlocal correlations in the long-path. Indeed, CVP showed that there are quantum correlations in routed Bell experiments such that the outcomes of the remote device cannot be classically predetermined, even when its detection efficiency is arbitrarily low. 
In this paper, we show that the correlations considered by CVP, though they cannot be classically predetermined, do not require the transmission of quantum systems to the remote device. This leads us to define the concept of 'short-range' and 'long-range' quantum correlations in routed Bell experiments. We show that these correlations can be characterized through standard semidefinite programming hierarchies for non-commutative polynomial optimization. We then explore the conditions under which short-range quantum correlations can be ruled out. We point out that there exist fundamental lower-bounds on the critical detection efficiency of the distant device, implying that routed Bell experiments cannot demonstrate long-range quantum nonlocality at arbitrarily large distances. However, we do find that routed Bell experiments allow for reducing the detection efficiency threshold. The improvements, though, are significantly smaller than those suggested by CVP's analysis.	翻訳日:2024-04-26 23:47:37 公開日:2024-04-24
# 視覚的注意刺激による予測と学習 Visual Attention Prompted Prediction and Learning ( http://arxiv.org/abs/2310.08420v3 ) ライセンス: Link先を確認	Yifei Zhang, Siyi Gu, Bo Pan, Guangji Bai, Meikang Qiu, Xiaofeng Yang, Liang Zhao,	(参考訳) 視覚的説明(注意)誘導学習はラベルだけでなく、モデル推論プロセスのガイドにも用いられる。視覚的注意誘導学習は有望な結果を示しているが、準備に時間を要する多くの説明アノテーションが必要である。しかし、現実の多くの状況では、モデルの再訓練なしに視覚的注意を喚起することが望まれる。例えば、医療画像上でAI支援がん分類を行う場合、利用者(例えば臨床医)は、どの領域が必須で、どの領域が除外されているかという視覚的な注意喚起をAIモデルに提供することができる。その有望な目標にもかかわらず、視覚的な注意を喚起する予測を達成することは、いくつかの大きな課題を提示する。 1) モデル推論プロセスに視覚的プロンプトを効果的に組み込むには,どうすればよいのか? 2) 視覚的なプロンプトを欠いたサンプルをどう扱うべきか? 3)視覚的プロンプトが不完全である場合,モデルのパフォーマンスにどのような影響があるのか? 本稿では,視覚的プロンプトを利用してモデルの推論過程を制御し,注意喚起による予測と学習のための新しい枠組みを提案する。非プロンプト状況における性能向上と、それに伴うシナリオの調整を目的として、非プロンプトモデルとプロンプトモデルの両方に対する協調学習手法を提案し、同様のパラメータとアクティベーションの共有を保証した。さらに、視覚的プロンプトが入力画像全体を包含していない場合、革新的な注意喚起プロンプト改善法が開発されている。これらの手法は、モデルの説明と整合性を維持しながら不完全なプロンプトを補間する。 4つのデータセットに対する大規模な実験により,提案手法の有効性が実証された。 Visual explanation (attention)-guided learning uses not only labels but also explanations to guide model reasoning process. While visual attention-guided learning has shown promising results, it requires a large number of explanation annotations that are time-consuming to prepare. However, in many real-world situations, it is usually desired to prompt the model with visual attention without model retraining. For example, when doing AI-assisted cancer classification on a medical image, users (e.g., clinicians) can provide the AI model with visual attention prompt on which areas are indispensable and which are precluded. Despite its promising objectives, achieving visual attention-prompted prediction presents several major challenges: 1) How can the visual prompt be effectively integrated into the model's reasoning process? 2) How should the model handle samples that lack visual prompts? 3) What is the impact on the model's performance when a visual prompt is imperfect? 
This paper introduces a novel framework for attention-prompted prediction and learning, utilizing visual prompts to steer the model's reasoning process. To improve performance in non-prompted situations and align it with prompted scenarios, we propose a co-training approach for both non-prompted and prompted models, ensuring they share similar parameters and activations. Additionally, for instances where the visual prompt does not encompass the entire input image, we have developed innovative attention prompt refinement methods. These methods interpolate the incomplete prompts while maintaining alignment with the model's explanations. Extensive experiments on four datasets demonstrate the effectiveness of our proposed framework in enhancing predictions for samples both with and without prompt.	翻訳日:2024-04-26 23:47:37 公開日:2024-04-24
# 脳年齢予測へのボクセルレベルのアプローチ:局所的脳老化評価法 A voxel-level approach to brain age prediction: A method to assess regional brain aging ( http://arxiv.org/abs/2310.11385v2 ) ライセンス: Link先を確認	Neha Gianchandani, Mahsa Dibaji, Johanna Ospel, Fernando Vega, Mariana Bento, M. Ethan MacDonald, Roberto Souza,	(参考訳) 脳の老化は局所的な現象であり、機械学習の手法を用いて脳年齢予測研究の領域内では比較的解明されていない。ボクセルレベルの予測は、局所的な脳年齢推定を提供し、局所的な老化過程に関する詳細な洞察を与えることができる。これは,健常者と疾患者における老化軌跡の相違を理解するために不可欠である。本研究では,T1強調磁気共鳴画像からのボクセルレベルの脳年齢予測のために,深層学習に基づくマルチタスクモデルを提案する。提案モデルは文献に存在するモデルより優れており、健康な人口と病気の人口の両方に適用した場合に貴重な臨床所見が得られる。脳の既知の解剖学的領域の老化軌跡を理解するために、ボクセルレベルの脳年齢予測を用いて局所分析を行い、認知症やより具体的にはアルツハイマー病のような基礎疾患の患者と比較して、健常者の地域老化軌跡に相違があることが示されている。私たちのコードはhttps://github.com/nehagianchandani/Voxel-level-brain-age-predictionで公開されています。 Brain aging is a regional phenomenon, a facet that remains relatively under-explored within the realm of brain age prediction research using machine learning methods. Voxel-level predictions can provide localized brain age estimates that can provide granular insights into the regional aging processes. This is essential to understand the differences in aging trajectories in healthy versus diseased subjects. In this work, a deep learning-based multitask model is proposed for voxel-level brain age prediction from T1-weighted magnetic resonance images. The proposed model outperforms the models existing in the literature and yields valuable clinical insights when applied to both healthy and diseased populations. Regional analysis is performed on the voxel-level brain age predictions to understand aging trajectories of known anatomical regions in the brain and show that there exist disparities in regional aging trajectories of healthy subjects compared to ones with underlying neurological disorders such as Dementia and more specifically, Alzheimer's disease. Our code is available at https://github.com/nehagianchandani/Voxel-level-brain-age-prediction.	翻訳日:2024-04-26 23:47:37 公開日:2024-04-24
# ニューラルパーセプション機構を持つ部分観測可能な確率ゲーム Partially Observable Stochastic Games with Neural Perception Mechanisms ( http://arxiv.org/abs/2310.11566v2 ) ライセンス: Link先を確認	Rui Yan, Gabriel Santos, Gethin Norman, David Parker, Marta Kwiatkowska,	(参考訳) 確率ゲームは、不確実性の下でのマルチエージェントシーケンシャル決定のためのよく確立されたモデルである。しかし、現実的な応用では、エージェントは環境の部分的な観察性しか持たないことが多い。さらに、エージェントは、継続的データに基づいてトレーニングされたニューラルネットワークのようなデータ駆動アプローチを使用して、環境をますます知覚する。本稿では,ニューラルシンボリックな部分可観測確率ゲーム(NS-POSG)のモデルを提案する。我々は、離散的データ駆動観察と、完全インフォームドエージェントを用いた部分インフォームドエージェントによる一方的な設定に焦点を当てた。本稿では,片側NS-POSGを近似解として,片側NS-HSVIと呼ばれる新しい手法を提案する。ニューラルネットワークプレイメージ分析を用いて,有限多面体表現と粒子に基づく信念表現を構築し,歩行者車と追従回避シナリオの分析にその実践的適用性を示す。 Stochastic games are a well established model for multi-agent sequential decision making under uncertainty. In practical applications, though, agents often have only partial observability of their environment. Furthermore, agents increasingly perceive their environment using data-driven approaches such as neural networks trained on continuous data. We propose the model of neuro-symbolic partially-observable stochastic games (NS-POSGs), a variant of continuous-space concurrent stochastic games that explicitly incorporates neural perception mechanisms. We focus on a one-sided setting with a partially-informed agent using discrete, data-driven observations and another, fully-informed agent. We present a new method, called one-sided NS-HSVI, for approximate solution of one-sided NS-POSGs, which exploits the piecewise constant structure of the model. Using neural network pre-image analysis to construct finite polyhedral representations and particle-based representations for beliefs, we implement our approach and illustrate its practical applicability to the analysis of pedestrian-vehicle and pursuit-evasion scenarios.	翻訳日:2024-04-26 23:47:37 公開日:2024-04-24
# PopDescentでスケジュールをストラップする Scrap Your Schedules with PopDescent ( http://arxiv.org/abs/2310.14671v2 ) ライセンス: Link先を確認	Abhinav Pomalapally, Bassel El Mabsout, Renato Mansuco,	(参考訳) 現代の機械学習のワークロードでは、多くのハイパーパラメータ探索アルゴリズムが頻繁に使われ、学習や正規化率などのハイパフォーマンスなハイパーパラメータ値を効率的に発見する。その結果、トレーニング中にハイパーパラメータを調整する能力を活用し、損失性能を向上させるために、パラメータスケジュールの幅が設計された。しかし、これらのスケジュールは、探索すべき新しいハイパーパラメータを導入し、トレーニング中のモデルの現在の損失値を考慮しない。これらの課題に対処するため,我々は,人口探索を用いた進捗対応ハイパーパラメータチューニング技術であるPopDescent(PopDescent)を提案する。 PopDescentは進化的および局所的な探索プロセスを統合することで、そのパフォーマンスに基づいてトレーニング中のハイパーパラメータオプションを積極的に探索する。標準的な機械学習ビジョンタスクの試行では、PopDescentは既存の検索手法よりも高速に収束し、テストロス値が最大18%低いモデルパラメータがスケジュールの利用を考慮しても見つかる。さらに,PopDescentの強靭さを,その初期訓練パラメータに強調する。 In contemporary machine learning workloads, numerous hyper-parameter search algorithms are frequently utilized to efficiently discover high-performing hyper-parameter values, such as learning and regularization rates. As a result, a range of parameter schedules have been designed to leverage the capability of adjusting hyper-parameters during training to enhance loss performance. These schedules, however, introduce new hyper-parameters to be searched and do not account for the current loss values of the models being trained. To address these issues, we propose Population Descent (PopDescent), a progress-aware hyper-parameter tuning technique that employs a memetic, population-based search. By merging evolutionary and local search processes, PopDescent proactively explores hyper-parameter options during training based on their performance. Our trials on standard machine learning vision tasks show that PopDescent converges faster than existing search methods, finding model parameters with test-loss values up to 18% lower, even when considering the use of schedules. Moreover, we highlight the robustness of PopDescent to its initial training parameters, a crucial characteristic for hyper-parameter search techniques.	翻訳日:2024-04-26 23:47:37 公開日:2024-04-24
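The PopDescent abstract above describes a memetic, population-based search that combines local (gradient) steps with evolutionary selection and mutation. A minimal sketch in that spirit, on a one-dimensional toy loss, might look like the following. It is not the authors' algorithm: the objective, the function name `population_search`, the mutation scales, and the learning-rate clamp are all assumptions chosen for illustration.

```python
import random


def loss(x):
    # Toy objective standing in for a model's training loss.
    return (x - 3.0) ** 2


def grad(x):
    return 2.0 * (x - 3.0)


def population_search(pop_size=6, iterations=50, seed=0):
    rng = random.Random(seed)
    # Each individual is [parameter, learning_rate]; the learning rate is
    # the hyper-parameter being searched.
    population = [
        [rng.uniform(-10.0, 10.0), 10 ** rng.uniform(-3.0, -0.5)]
        for _ in range(pop_size)
    ]
    history = [min(loss(x) for x, _ in population)]
    half = pop_size // 2
    for _ in range(iterations):
        # Local search: one gradient step per individual with its own rate.
        for individual in population:
            individual[0] -= individual[1] * grad(individual[0])
        # Selection: rank individuals by current loss.
        population.sort(key=lambda ind: loss(ind[0]))
        # Replacement: overwrite the worse half with mutated copies of the
        # better half. Mutation strength shrinks as the parent's loss
        # shrinks, which is the "progress-aware" part of this sketch.
        for i in range(half, pop_size):
            parent_x, parent_lr = population[i - half]
            strength = min(1.0, loss(parent_x))
            child_x = parent_x + rng.gauss(0.0, 0.1) * strength
            child_lr = parent_lr * 10 ** (rng.gauss(0.0, 0.3) * strength)
            child_lr = min(0.4, max(1e-4, child_lr))  # keep steps stable
            population[i] = [child_x, child_lr]
        # Record the loss of the best surviving (unmutated) individual.
        history.append(loss(population[0][0]))
    return population[0], history


best, history = population_search()
```

Because the better half is kept unmutated and every learning rate stays below 0.5, the recorded best loss in `history` cannot increase from one iteration to the next on this quadratic toy problem. Real training losses give no such guarantee.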
# フリーフォームフロー:任意のアーキテクチャを正規化フローにする Free-form Flows: Make Any Architecture a Normalizing Flow ( http://arxiv.org/abs/2310.16624v2 ) ライセンス: Link先を確認	Felix Draxler, Peter Sorrenson, Lea Zimmermann, Armand Rousselot, Ullrich Köthe,	(参考訳) 正規化フローは、可能性を直接最大化する生成モデルである。従来, 正規化フローの設計は解析的可逆性の必要性に大きく制約されていた。この制約を,変数式の変化の勾配を効率的に推定する訓練手法によって克服する。これにより、任意の次元保存ニューラルネットワークが、最大限のトレーニングを通じて生成モデルとして機能することが可能になる。当社のアプローチでは,手元にあるタスクに対して,帰納的バイアスを正確に調整することに重点を置くことが可能です。具体的には、$E(n)$-equivariantネットワークを用いた分子生成ベンチマークにおいて優れた結果を得る。さらに,本手法は,市販のResNetアーキテクチャを採用しながら,逆問題ベンチマークにおいて競合する。 Normalizing Flows are generative models that directly maximize the likelihood. Previously, the design of normalizing flows was largely constrained by the need for analytical invertibility. We overcome this constraint by a training procedure that uses an efficient estimator for the gradient of the change of variables formula. This enables any dimension-preserving neural network to serve as a generative model through maximum likelihood training. Our approach allows placing the emphasis on tailoring inductive biases precisely to the task at hand. Specifically, we achieve excellent results in molecule generation benchmarks utilizing $E(n)$-equivariant networks. Moreover, our method is competitive in an inverse problem benchmark, while employing off-the-shelf ResNet architectures.	翻訳日:2024-04-26 23:37:50 公開日:2024-04-24
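For readers unfamiliar with the "change of variables formula" mentioned in the Free-form Flows abstract above, the standard textbook form is given below. The symbols are assumptions introduced here, not taken from the abstract: $f_\theta$ is the learned map from data $x$ to latent $z$, $p_Z$ is the latent density, and $J_{f_\theta}$ is the Jacobian of $f_\theta$.

```latex
\log p_X(x) = \log p_Z\bigl(f_\theta(x)\bigr) + \log\left|\det J_{f_\theta}(x)\right|,
\qquad
J_{f_\theta}(x) = \frac{\partial f_\theta(x)}{\partial x}
```

Maximum likelihood training maximizes the left-hand side over $\theta$. The abstract states only that the gradient of this expression is estimated efficiently. The estimator itself is not described there and is not reproduced here.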
# UWFormer:半監督型マルチスケール変圧器による水中画像強調 UWFormer: Underwater Image Enhancement via a Semi-Supervised Multi-Scale Transformer ( http://arxiv.org/abs/2310.20210v4 ) ライセンス: Link先を確認	Weiwen Chen, Yingtie Lei, Shenghong Luo, Ziyang Zhou, Mingxian Li, Chi-Man Pun,	(参考訳) 水中画像は、光、水、物体の複雑な複雑な相互作用のため、品質が悪く、色バランスが歪んだり、コントラストが低かったりすることが多い。従来の水中強化技術には大きな貢献があったが、さらなる改善を求める問題がいくつかある。 (i)現在のディープラーニング手法は、マルチスケールの強化を欠いた畳み込みニューラルネットワーク(CNN)に依存しており、グローバルな知覚場も制限されている。 (II)実世界の水中データセットの不足は大きな課題となり、合成画像ペアの利用が過度に適合する可能性がある。上記の問題に対処するため, 半教師付き学習による複数周波数画像の強調を行うUWFormerと呼ばれるマルチスケールトランスフォーマーネットワークを導入し, 低周波数強調のための非線形周波数認識アテンション機構とマルチスケールフュージョンフィードフォワードネットワークを提案する。さらに,水中における半教師付き訓練戦略を導入し,疑似ラベルを生成するためのサブアキュースパーセプティカルロス関数を提案する。完全参照型および非参照型水中ベンチマークを用いた実験により,本手法は,量および視覚的品質の両面で最先端の手法より優れていることが示された。 Underwater images often exhibit poor quality, distorted color balance and low contrast due to the complex and intricate interplay of light, water, and objects. Despite the significant contributions of previous underwater enhancement techniques, there exist several problems that demand further improvement: (i) The current deep learning methods rely on Convolutional Neural Networks (CNNs) that lack the multi-scale enhancement, and global perception field is also limited. (ii) The scarcity of paired real-world underwater datasets poses a significant challenge, and the utilization of synthetic image pairs could lead to overfitting. To address the aforementioned problems, this paper introduces a Multi-scale Transformer-based Network called UWFormer for enhancing images at multiple frequencies via semi-supervised learning, in which we propose a Nonlinear Frequency-aware Attention mechanism and a Multi-Scale Fusion Feed-forward Network for low-frequency enhancement. Besides, we introduce a special underwater semi-supervised training strategy, where we propose a Subaqueous Perceptual Loss function to generate reliable pseudo labels. 
Experiments using full-reference and non-reference underwater benchmarks demonstrate that our method outperforms state-of-the-art methods in terms of both quantitative metrics and visual quality.	翻訳日:2024-04-26 23:37:50 公開日:2024-04-24
# 協調フィルタリングのためのグラフ信号拡散モデル Graph Signal Diffusion Model for Collaborative Filtering ( http://arxiv.org/abs/2311.08744v3 ) ライセンス: Link先を確認	Yunqin Zhu, Chao Wang, Qi Zhang, Hui Xiong,	(参考訳) 協調フィルタリングはレコメンデータシステムにおいて重要な手法である。ユーザフィードバックデータに対する条件付き生成タスクとして,新たな拡散モデルが大きな可能性を秘めている。しかし、既存の拡散モデルの研究では、暗黙のフィードバックをモデル化するための効果的な解決策が欠如している。特に、標準等方拡散過程は、相互作用空間のグラフィカル構造と誤って、アイテム間の相関性を見落としている。一方、ガウスノイズはユーザのインタラクションベクター内のパーソナライズされた情報を破壊し、その再構築が困難になる。本稿では,標準拡散モデルを適用し,協調フィルタリングのためのグラフ信号拡散モデル(GiffCF)を提案する。ユーザ・イテム相互作用の相関分布をよりよく表現するために、アイテム・イテム類似性グラフ上の熱方程式を用いた一般化拡散過程を定義する。我々のフォワードプロセスは、グラフフィルタの高度なファミリとの相互作用信号を円滑にし、グラフ隣接性を推奨のための有益な事前知識として導入する。我々のリバースプロセスは、ノイズのない方法で遅延信号を反復的に洗練・シャープし、ユーザの履歴に基づいて更新を条件付け、慎重に設計された2段階のデノイザから計算し、高品質な再構築をもたらす。最後に、GiffCFは拡散モデルとグラフ信号処理の両方の利点を効果的に活用し、3つのベンチマークデータセットの最先端性能を実現することを示す。 Collaborative filtering is a critical technique in recommender systems. It has been increasingly viewed as a conditional generative task for user feedback data, where newly developed diffusion model shows great potential. However, existing studies on diffusion model lack effective solutions for modeling implicit feedback. Particularly, the standard isotropic diffusion process overlooks correlation between items, misaligned with the graphical structure of the interaction space. Meanwhile, Gaussian noise destroys personalized information in a user's interaction vector, causing difficulty in its reconstruction. In this paper, we adapt standard diffusion model and propose a novel Graph Signal Diffusion Model for Collaborative Filtering (named GiffCF). To better represent the correlated distribution of user-item interactions, we define a generalized diffusion process using heat equation on the item-item similarity graph. Our forward process smooths interaction signals with an advanced family of graph filters, introducing the graph adjacency as beneficial prior knowledge for recommendation. 
Our reverse process iteratively refines and sharpens latent signals in a noise-free manner, where the updates are conditioned on the user's history and computed from a carefully designed two-stage denoiser, leading to high-quality reconstruction. Finally, through extensive experiments, we show that GiffCF effectively leverages the advantages of both diffusion model and graph signal processing, and achieves state-of-the-art performance on three benchmark datasets.	翻訳日:2024-04-26 23:37:50 公開日:2024-04-24
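The GiffCF abstract above defines its forward process through a heat equation on an item-item similarity graph. As a rough illustration of what heat-equation smoothing of an interaction vector means, the sketch below integrates $dx/dt = -Lx$ with explicit Euler steps, where $L = D - A$ is the combinatorial graph Laplacian. This is textbook graph heat diffusion on a hypothetical four-item path graph, not the paper's filter family: `heat_smooth`, `adjacency`, `dt`, and `steps` are illustrative assumptions.

```python
def heat_smooth(signal, adjacency, dt=0.1, steps=10):
    """Explicit-Euler integration of dx/dt = -L x, with L = D - A.

    For each item i, (-L x)_i = sum_j A[i][j] * (x[j] - x[i]), so every step
    moves each value toward the values of its graph neighbours.
    """
    x = list(signal)
    n = len(x)
    for _ in range(steps):
        x = [
            x[i] + dt * sum(adjacency[i][j] * (x[j] - x[i]) for j in range(n))
            for i in range(n)
        ]
    return x


# Hypothetical item-item graph: four items connected in a path 0-1-2-3.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# One user's implicit feedback: the user interacted with item 0 only.
interactions = [1.0, 0.0, 0.0, 0.0]
smoothed = heat_smooth(interactions, adjacency)
```

With a symmetric adjacency matrix the total mass of the signal is conserved, while the single interaction spreads to neighbouring items. That is the sense in which graph adjacency acts as prior knowledge in this kind of smoothing.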
# CARE:臨床文献から実験的発見を抽出する CARE: Extracting Experimental Findings From Clinical Literature ( http://arxiv.org/abs/2311.09736v2 ) ライセンス: Link先を確認	Aakanksha Naik, Bailey Kuehl, Erin Bransom, Doug Downey, Tom Hope,	(参考訳) 文学からきめ細かい実験結果を抽出することは、科学的応用に劇的な有用性をもたらすことができる。それまでの作業では、この問題の限られた側面のためのアノテーションスキーマとデータセットが開発され、現実の複雑さとニュアンスをキャプチャできなかった。バイオメディシンに焦点を当てたこの研究は、臨床所見を抽出するタスクのための新しいIEデータセットであるCAREを提示する。本研究では,非連続的なエンティティスパン,ネスト関係,可変arity n-ary関係,数値結果など,現在のIEシステムにおいて困難な現象を統一する,エンティティと属性間のn-ary関係として微細な発見をキャプチャーする新しいアノテーションスキーマを開発した。臨床治験と症例報告の2つの資料から,700件の抄録を広範囲に収集した。また,コンピュータ科学・材料科学分野へのスキーマの一般化可能性を示す。私たちはCAREで最新のIEシステムをベンチマークし、GPT4のようなモデルでさえ苦労していることを示した。文献を抽出・集約する研究を進めるため、我々の資源を解放する。 Extracting fine-grained experimental findings from literature can provide dramatic utility for scientific applications. Prior work has developed annotation schemas and datasets for limited aspects of this problem, failing to capture the real-world complexity and nuance required. Focusing on biomedicine, this work presents CARE -- a new IE dataset for the task of extracting clinical findings. We develop a new annotation schema capturing fine-grained findings as n-ary relations between entities and attributes, which unifies phenomena challenging for current IE systems such as discontinuous entity spans, nested relations, variable arity n-ary relations and numeric results in a single schema. We collect extensive annotations for 700 abstracts from two sources: clinical trials and case reports. We also demonstrate the generalizability of our schema to the computer science and materials science domains. We benchmark state-of-the-art IE systems on CARE, showing that even models such as GPT4 struggle. We release our resources to advance research on extracting and aggregating literature findings.	翻訳日:2024-04-26 23:37:50 公開日:2024-04-24
# 局所平衡仮定を超えた非平衡温度 The non-equilibrium temperature beyond local equilibrium assumption ( http://arxiv.org/abs/2311.11028v2 ) ライセンス: Link先を確認	Zheng-Chuan Wang,	(参考訳) 本論文では, 環境貯留層を通って輸送される荷電粒子に対する温度依存ヴラソフ方程式を用いて非平衡温度を提案する。ヴラソフ方程式に基づいて新しい減衰力と逆減衰緩和時間が導出され、これらは輸送粒子の外力と緩和時間に明らかな影響を及ぼす。輸送粒子の非平衡温度は, 非平衡状態にある分布関数によって定義され、貯留層の平衡温度とは異なる。輸送粒子全体が非平衡状態にあるため、輸送粒子と貯留層の間には熱伝達が存在する。最後に、外部電界下での1次元荷電粒子輸送を例に、ここで定義した非平衡温度と減衰力を数値的に示す。 In this manuscript, we propose a non-equilibrium temperature defined through a temperature-dependent Vlasov equation for charged particles transported through an environmental reservoir. A new damping force and an inverse damping relaxation time are derived from the Vlasov equation; both have a clear influence on the external force and the relaxation time of the transported particles. The non-equilibrium temperature of the transported particles is defined by their out-of-equilibrium distribution function and differs from the equilibrium temperature of the reservoir. Heat transfer occurs between the transported particles and the reservoir because the transported particles as a whole are in a non-equilibrium state. Finally, we illustrate these quantities with an example of one-dimensional charged-particle transport under an external electric field, for which the non-equilibrium temperature and damping force are shown numerically.	翻訳日:2024-04-26 23:37:50 公開日:2024-04-24
# 大規模言語モデルを用いた視覚的ゼロショット学習の強化 Boosting Audio-visual Zero-shot Learning with Large Language Models ( http://arxiv.org/abs/2311.12268v2 ) ライセンス: Link先を確認	Haoxing Chen, Yaohui Li, Yan Hong, Zizheng Huang, Zhuoer Xu, Zhangxuan Gu, Jun Lan, Huijia Zhu, Weiqiang Wang,	(参考訳) 音声視覚ゼロショット学習は、ペア化された音声視覚シーケンスに基づいて、目に見えないクラスを認識することを目的としている。近年の手法は,クラス名に整合したマルチモーダルな特徴の学習に重点を置いており,カテゴリを発見できないような一般化能力の向上に寄与している。しかし、これらのアプローチはクラス名の不明瞭なイベント概念を無視し、必然的に訓練目的の難しい複雑なネットワーク構造を導入する可能性がある。本稿では,外部知識ベースを活用することで,新たなイベントコンテンツをより効果的に学習する上で有効なKDA(KnowleDge-Augmented Audio-Viual Learning)という,単純かつ効率的なフレームワークを提案する。具体的には、まず、大型言語モデル(LLM)に含まれる知識を利用して、イベントクラスの音声・視覚的特徴を識別する重要な記述文を生成することを提案する。さらに,類似した事象を識別し,未確認クラスへの一般化能力の向上を図るために,知識対応型適応マージン損失を提案する。広汎な実験結果から,提案したKDAは,一般的な3つのゼロショット学習データセットに対して,最先端の手法より優れており,我々のコードは \url{https://github.com/chenhaoxing/KDA} で検証可能であることがわかった。 Audio-visual zero-shot learning aims to recognize unseen classes based on paired audio-visual sequences. Recent methods mainly focus on learning multi-modal features aligned with class names to enhance the generalization ability to unseen categories. However, these approaches ignore the obscure event concepts in class names and may inevitably introduce complex network structures with difficult training objectives. In this paper, we introduce a straightforward yet efficient framework called KnowleDge-Augmented audio-visual learning (KDA), which aids the model in more effectively learning novel event content by leveraging an external knowledge base. Specifically, we first propose to utilize the knowledge contained in large language models (LLMs) to generate numerous descriptive sentences that include important distinguishing audio-visual features of event classes, which helps to better understand unseen categories. Furthermore, we propose a knowledge-aware adaptive margin loss to help distinguish similar events, further improving the generalization ability towards unseen classes. 
Extensive experimental results demonstrate that our proposed KDA can outperform state-of-the-art methods on three popular audio-visual zero-shot learning datasets. Our code will be available at https://github.com/chenhaoxing/KDA.	翻訳日:2024-04-26 23:27:32 公開日:2024-04-24
# グラフの大規模言語モデルに関する調査 - 進展と今後の方向性 A Survey of Graph Meets Large Language Model: Progress and Future Directions ( http://arxiv.org/abs/2311.12399v4 ) ライセンス: Link先を確認	Yuhan Li, Zhixun Li, Peisong Wang, Jia Li, Xiangguo Sun, Hong Cheng, Jeffrey Xu Yu,	(参考訳) グラフは、引用ネットワーク、ソーシャルネットワーク、生物学的データといった現実世界のアプリケーションにおける複雑な関係を表現し分析する上で重要な役割を果たしている。近年,様々な領域で大きな成功を収めたLarge Language Models (LLM) もグラフ関連タスクに活用され,従来のグラフニューラルネットワーク(GNN)ベースの手法を超越し,最先端のパフォーマンスを実現している。本稿ではまず,LLMとグラフを統合する既存手法の総合的なレビューと分析を行う。まず,グラフ関連タスクにおいてLLMが果たす役割(エンハンサー,予測,アライメント)に基づいて,既存の手法を3つのカテゴリに分類する手法を提案する。次に、分類学の3つのカテゴリに沿って、代表的手法を体系的に調査する。最後に,既存の研究の残余の限界について論じ,今後の研究に期待できる道のりを強調した。関連する論文は要約され、一貫して更新される。 https://github.com/yhLeeee/Awesome-LLMs-in-Graph-tasks。 Graph plays a significant role in representing and analyzing complex relationships in real-world applications such as citation networks, social networks, and biological data. Recently, Large Language Models (LLMs), which have achieved tremendous success in various domains, have also been leveraged in graph-related tasks to surpass traditional Graph Neural Networks (GNNs) based methods and yield state-of-the-art performance. In this survey, we first present a comprehensive review and analysis of existing methods that integrate LLMs with graphs. First of all, we propose a new taxonomy, which organizes existing methods into three categories based on the role (i.e., enhancer, predictor, and alignment component) played by LLMs in graph-related tasks. Then we systematically survey the representative methods along the three categories of the taxonomy. Finally, we discuss the remaining limitations of existing studies and highlight promising avenues for future research. The relevant papers are summarized and will be consistently updated at: https://github.com/yhLeeee/Awesome-LLMs-in-Graph-tasks.	翻訳日:2024-04-26 23:27:32 公開日:2024-04-24
# 一次元モット絶縁体における電荷とエネルギー輸送の動的分離 Dynamical separation of charge and energy transport in one-dimensional Mott insulators ( http://arxiv.org/abs/2311.16234v2 ) ライセンス: Link先を確認	Frederik Møller, Botond C. Nagy, Márton Kormos, Gábor Takács,	(参考訳) 一次元モット絶縁体はシン・ゴルドンモデル(英語版)を用いて記述できるが、これは積分可能場の理論で、閉じ込められた超低温原子による最近の実現を含む、いくつかの1次元のギャップを持つ凝縮物質系の低エネルギーな効率的な記述を提供する。一般化流体力学の理論を用いて、このモデルがトポロジカル電荷対エネルギーの輸送の分離を示すことを示した。準粒子力学の解析により、分離の背後にあるメカニズムは、トポロジカルに荷電したキンク/アンチキンクの間の反射散乱であることが明らかになった。これらの散乱現象の影響は、強い結合と低温において最も顕著であり、準粒子の分布は反射散乱振幅と比較して狭い。この効果により、トポロジカル電荷に対する特徴的な形状の「ローヘッド」光円錐が生じる。 One-dimensional Mott insulators can be described using the sine-Gordon model, an integrable quantum field theory that provides the low-energy effective description of several one-dimensional gapped condensed matter systems, including recent realizations with trapped ultra-cold atoms. Employing the theory of Generalized Hydrodynamics, we demonstrate that this model exhibits separation of the transport of topological charge vs. energy. Analysis of the quasiparticle dynamics reveals that the mechanism behind the separation is the reflective scattering between topologically charged kinks/antikinks. The effect of these scattering events is most pronounced at strong coupling and low temperatures, where the distribution of quasiparticles is narrow compared to the reflective scattering amplitude. This effect results in a distinctively shaped "arrowhead" light cone for the topological charge.	翻訳日:2024-04-26 23:27:32 公開日:2024-04-24
# ヴァルシュニ・ヘルマンポテンシャルのエネルギー固有値の決定 Determination of the Energy Eigenvalues of the Varshni-Hellmann Potential ( http://arxiv.org/abs/2401.11151v4 ) ライセンス: Link先を確認	N. Tazimi,	(参考訳) 本稿では,ヴァルシュニ・ヘルマンポテンシャルの束縛状態問題を解く。アンザッツ法を用いて, このポテンシャルに対するシュレディンガー方程式の束縛状態解を求め, エネルギー固有値と対応する固有関数を得る。また、二体系の基底状態と励起状態の両方について、エネルギースペクトルの挙動を図示する。得られた結果が正確な数値とよく一致することは,本手法の効率性を示している。 In this paper, we solve the bound-state problem for the Varshni-Hellmann potential. Using an ansatz method, we obtain the bound-state solutions of the Schrödinger equation for this potential, together with the energy eigenvalues and the corresponding eigenfunctions. The behavior of the energy spectra for both the ground and excited states of two-body systems is also illustrated graphically. The close agreement between our results and accurate numerical values indicates the efficiency of the technique.	翻訳日:2024-04-26 23:27:32 公開日:2024-04-24
# 20年間の血圧データから学ぶ:7500万件の患者受診にわたる人口統計特異的パターン Learning from Two Decades of Blood Pressure Data: Demography-Specific Patterns Across 75 Million Patient Encounters ( http://arxiv.org/abs/2402.01598v3 ) ライセンス: Link先を確認	Seyedeh Somayyeh Mousavi, Yuting Guo, Abeed Sarker, Reza Sameni,	(参考訳) 高血圧は有病率が増加している世界的な健康上の問題であり、血圧(BP)動態の効果的なモニタリングと分析の必要性が強調されている。米国ジョージア州のエモリー・ヘルスケアで2000年から2022年の間に収集された、人口統計学的に多様な集団を代表する2,054,462人の患者の75,636,128件のBP記録を分析した。性別,年齢,人種・民族別に、収縮期BP (SBP) と拡張期BP (DBP) の2変量変化の集団全体の統計を比較検討した。分析の結果,男性は女性よりもBP値が高く,年齢とともに異なるBPプロファイルを示した。特に、平均SBPは年齢とともに一貫して上昇する一方、平均DBPは40代の年齢層でピークとなる。調査された人種・民族集団の中で、黒人はBPがわずかに高く、標準偏差が大きい。また,SBPとDBPの間に集団レベルで有意な相関がみられた。これらの結果は, 臨床診断における人口統計特異的なBP分析の重要性を強調し, パーソナライズされた人口統計特異的な医療介入の開発に有用な知見を提供する。 Hypertension is a global health concern with an increasing prevalence, underscoring the need for effective monitoring and analysis of blood pressure (BP) dynamics. We analyzed a substantial BP dataset comprising 75,636,128 records from 2,054,462 unique patients collected between 2000 and 2022 at Emory Healthcare in Georgia, USA, representing a demographically diverse population. We examined and compared population-wide statistics of bivariate changes in systolic BP (SBP) and diastolic BP (DBP) across sex, age, and race/ethnicity. The analysis revealed that males have higher BP levels than females and exhibit a distinct BP profile with age. Notably, average SBP consistently rises with age, whereas average DBP peaks in the forties age group. Among the ethnic groups studied, Black patients have marginally higher BPs and a greater standard deviation. We also discovered a significant correlation between SBP and DBP at the population level, a phenomenon not previously researched. These results emphasize the importance of demography-specific BP analysis for clinical diagnosis and provide valuable insights for developing personalized, demography-specific healthcare interventions.	翻訳日:2024-04-26 23:27:32 公開日:2024-04-24
# 量子反転:コヒーレント量子吸収器の一般理論 Quantum reversal: a general theory of coherent quantum absorbers ( http://arxiv.org/abs/2402.02502v2 ) ライセンス: Link先を確認	Mankei Tsang,	(参考訳) コヒーレント量子吸収器は、他の系が放出する任意の光子を、その系との絡み合いを保ちながら吸収できるという魅力的な概念であり、開放量子系の理論や量子計測において多様な意味を持つ。この研究は、2つの系に対するいわゆる逆転条件を提案することで、この概念を一般化する。逆転条件の下では、「リバーサー」が、他方の系が場に及ぼすあらゆる効果をコヒーレントに逆転させる。逆転条件は、ペッツ回復写像とクラウス演算子を含む簡潔な公式に厳密に帰着され、既存のコヒーレント吸収器の扱いを一般化するとともに簡素化する。 The fascinating concept of a coherent quantum absorber, which can absorb any photon emitted by another system while maintaining entanglement with that system, has found diverse implications in open quantum system theory and quantum metrology. This work generalizes the concept by proposing the so-called reversal conditions for the two systems, in which a "reverser" coherently reverses any effect of the other system on a field. The reversal conditions are rigorously boiled down to concise formulas involving the Petz recovery map and Kraus operators, thereby generalizing as well as streamlining the existing treatments of coherent absorbers.	翻訳日:2024-04-26 23:27:32 公開日:2024-04-24
# 2次元ライドバーグ原子配列におけるアモルファス量子磁石 Amorphous quantum magnets in a two-dimensional Rydberg atom array ( http://arxiv.org/abs/2402.02852v2 ) ライセンス: Link先を確認	Sergi Julià-Farré, Joseph Vovrosh, Alexandre Dauphin,	(参考訳) アモルファス固体(アモルファス固体、すなわち、明確に定義された短距離特性を持つが、長距離秩序を持たない系)は、凝縮物質において重要な研究トピックである。結晶構造は結晶構造と異なることが知られているが、アモルファス材料における創発的な集団的挙動に関する多くのオープンな疑問がある。これは、数値シミュレーションが極めて困難である量子状態において特にそうである。本稿では,アナログ量子シミュレータを用いたアモルファス量子マグネットの探索を提案する。そこで我々はまず,IsingモデルのRydbergシミュレータに適したアモルファス量子磁石を生成するアルゴリズムを提案する。その後、半古典的手法を用いて、モデルの物理に関する予備的な知見を得る。特に強磁性相互作用では平均磁場位相図を計算し、線形スピン波理論を用いて励起の局在特性と動的構造因子を研究する。反強磁性相互作用では、アモルファス磁石は擬似アニールにより複雑な古典的エネルギー景観を示す。最後に,プログラム可能なツイーザアレイにおけるRydberg原子に基づく実験的な提案を概説し,古典的にシミュレートが難しい状態におけるアモルファス量子マグネットの研究への道を開く。 Amorphous solids, i.e., systems which feature well-defined short-range properties but lack long-range order, constitute an important research topic in condensed matter. While their microscopic structure is known to differ from their crystalline counterpart, there are still many open questions concerning the emergent collective behavior in amorphous materials. This is particularly the case in the quantum regime, where the numerical simulations are extremely challenging. In this article, we instead propose to explore amorphous quantum magnets with an analog quantum simulator. To this end, we first present an algorithm to generate amorphous quantum magnets, suitable for Rydberg simulators of the Ising model. Subsequently, we use semiclassical approaches to get a preliminary insight of the physics of the model. In particular, for ferromagnetic interactions, we calculate mean-field phase diagrams, and use the linear-spin-wave theory to study localization properties and dynamical structure factors of the excitations. For antiferromagnetic interactions, we show that amorphous magnets exhibit a complex classical energy landscape by means of simulated annealing. 
Finally, we outline an experimental proposal based on Rydberg atoms in programmable tweezer arrays, thus opening the road towards the study of amorphous quantum magnets in regimes difficult to simulate classically.	翻訳日:2024-04-26 23:27:32 公開日:2024-04-24
# 量子コンピューティング:ビジョンと課題 Quantum Computing: Vision and Challenges ( http://arxiv.org/abs/2403.02240v2 ) ライセンス: Link先を確認	Sukhpal Singh Gill, Oktay Cetinkaya, Stefano Marrone, Daniel Claudino, David Haunschild, Leon Schlote, Huaming Wu, Carlo Ottaviani, Xiaoyuan Liu, Sree Pragna Machupalli, Kamalpreet Kaur, Priyansh Arora, Ji Liu, Salman Shamshad, Ahmed Farouk, Houbing Herbert Song, Steve Uhlig, Kotagiri Ramamohanarao,	(参考訳) 量子コンピューティングの最近の発展は、絡み合い、重ね合わせ、その他の量子基本概念を用いており、従来の計算よりも大幅に処理上の利点をもたらす。これらの量子的特徴は、従来の計算手法では解けない多くの複雑な問題を解くのに役立つ。これらの問題には、量子力学、ロジスティクス、化学ベースの進歩、薬物設計、統計科学、持続可能なエネルギー、銀行、信頼性のある通信、量子化学工学などが含まれる。ここ数年、量子ソフトウェアやアルゴリズムの作成、量子ハードウェアの研究が目覚ましい進歩を遂げており、量子コンピュータの実現に向けて大きく進歩している。この分野に関する総合的な文献研究を行うことで、現状を把握し、量子コンピューティング業界で働く研究コミュニティからかなりの注意を必要とする未解決の問題を発見できるだろう。本稿では,量子コンピューティングの理解を深めるために,この領域における現在の研究に基づく基礎とビジョンについて考察する。本稿では,量子コンピュータハードウェアの最先端開発と量子暗号,量子ソフトウェア,高スケール性量子コンピュータの今後の進歩について論じる。量子技術の研究と開発における多くの潜在的な課題とエキサイティングな新しいトレンドが、より広範な議論のためにこの論文で強調されている。 The recent development of quantum computing, which uses entanglement, superposition, and other quantum fundamental concepts, can provide substantial processing advantages over traditional computing. These quantum features help solve many complex problems that cannot be solved with conventional computing methods. These problems include modeling quantum mechanics, logistics, chemical-based advances, drug design, statistical science, sustainable energy, banking, reliable communication, and quantum chemical engineering. The last few years have witnessed remarkable advancements in quantum software and algorithm creation and quantum hardware research, which has significantly advanced the prospect of realizing quantum computers. It would be helpful to have comprehensive literature research on this area to grasp the current status and find outstanding problems that require considerable attention from the research community working in the quantum computing industry. 
To better understand quantum computing, this paper examines the foundations and vision based on current research in this area. We discuss cutting-edge developments in quantum computer hardware advancement and subsequent advances in quantum cryptography, quantum software, and high-scalability quantum computers. Many potential challenges and exciting new trends for quantum technology research and development are highlighted in this paper for a broader debate.	翻訳日:2024-04-26 23:27:32 公開日:2024-04-24
# ヘッドマウントセンサを用いた実時間シミュレーションアバター Real-Time Simulated Avatar from Head-Mounted Sensors ( http://arxiv.org/abs/2403.06862v2 ) ライセンス: Link先を確認	Zhengyi Luo, Jinkun Cao, Rawal Khirodkar, Alexander Winkler, Jing Huang, Kris Kitani, Weipeng Xu,	(参考訳) 我々はAR/VRヘッドセットから得られた情報(ヘッドセットポーズとカメラ)からシミュレーションアバターを制御するSimXRを提案する。ヘッドマウントカメラの難易度のため、人間の体は視界から切り離され、従来の画像に基づく自我中心のポーズ推定が困難になる。一方、ヘッドセットのポーズは全身の動きに関する貴重な情報を提供するが、手や足の詳細は明らかになっていない。カメラでヘッドセットのポーズを合成するために、人型ロボットを制御してヘッドセットの動きをトラッキングし、入力画像を分析して身体の動きを決定する。体の一部が見えると、手足の動きは画像によって案内され、見えない場合は物理法則が制御器を誘導して可塑性運動を発生させる。我々は,中間表現に依存しないエンドツーエンドの手法を設計し,画像やヘッドセットのポーズから直接ヒューマノイド制御信号にマップする方法を学習する。また,市販のVRヘッドセット(Quest 2)と互換性のあるカメラ構成を用いて作成した大規模合成データセットを提案する。フレームワークの適用性を実証するため、前方カメラを備えたARヘッドセットでもテストしています。 We present SimXR, a method for controlling a simulated avatar from information (headset pose and cameras) obtained from AR / VR headsets. Due to the challenging viewpoint of head-mounted cameras, the human body is often clipped out of view, making traditional image-based egocentric pose estimation challenging. On the other hand, headset poses provide valuable information about overall body motion, but lack fine-grained details about the hands and feet. To synergize headset poses with cameras, we control a humanoid to track headset movement while analyzing input images to decide body movement. When body parts are seen, the movements of hands and feet will be guided by the images; when unseen, the laws of physics guide the controller to generate plausible motion. We design an end-to-end method that does not rely on any intermediate representations and learns to directly map from images and headset poses to humanoid control signals. To train our method, we also propose a large-scale synthetic dataset created using camera configurations compatible with a commercially available VR headset (Quest 2) and show promising results on real-world captures. 
To demonstrate the applicability of our framework, we also test it on an AR headset with a forward-facing camera.	翻訳日:2024-04-26 23:27:32 公開日:2024-04-24
# Across-Task Transferable Max-Value Entropy Search を用いた多要素ベイズ最適化 Multi-Fidelity Bayesian Optimization With Across-Task Transferable Max-Value Entropy Search ( http://arxiv.org/abs/2403.09570v2 ) ライセンス: Link先を確認	Yunchuan Zhang, Sangwoo Park, Osvaldo Simeone,	(参考訳) 多くのアプリケーションにおいて、ロジスティクスからエンジニアリングまで、設計者は、その目的が評価にコストがかかるブラックボックス関数の形で、一連の最適化タスクに直面している。例えば、デザイナは、時間とともに異なる学習タスクのために、ニューラルネットワークモデルのハイパーパラメータを調整する必要があるかもしれない。各候補解に対する目的関数を評価するのではなく、設計者は目的関数の近似にアクセスでき、高い忠実度評価はより大きなコストを伴う。既存のマルチフィデリティブラックボックス最適化戦略では、現在のタスクの最適値や解に関する情報を最大化することを目的として、候補解とフィデリティレベルを選択する。逐次最適化タスクが関連していると仮定すると,本論文では,現在のタスクに関する情報を取得する必要性と,将来のタスクに転送可能な情報収集の目標とのバランスをとる,新たな情報理論獲得機能を導入する。提案手法は,タスク間で伝達されるタスク間潜伏変数の共有を含む。実世界の実世界の実例にまたがる実験結果から,将来的な課題に適合する提案した提案手法が,十分な数のタスクを処理すれば,最適化効率を大幅に向上できることがわかった。 In many applications, ranging from logistics to engineering, a designer is faced with a sequence of optimization tasks for which the objectives are in the form of black-box functions that are costly to evaluate. For example, the designer may need to tune the hyperparameters of neural network models for different learning tasks over time. Rather than evaluating the objective function for each candidate solution, the designer may have access to approximations of the objective functions, for which higher-fidelity evaluations entail a larger cost. Existing multi-fidelity black-box optimization strategies select candidate solutions and fidelity levels with the goal of maximizing the information accrued about the optimal value or solution for the current task. Assuming that successive optimization tasks are related, this paper introduces a novel information-theoretic acquisition function that balances the need to acquire information about the current task with the goal of collecting information transferable to future tasks. The proposed method includes shared inter-task latent variables, which are transferred across tasks by implementing particle-based variational Bayesian updates. 
Experimental results across synthetic and real-world examples reveal that the proposed provident acquisition strategy that caters to future tasks can significantly improve the optimization efficiency as soon as a sufficient number of tasks is processed.	翻訳日:2024-04-26 23:27:32 公開日:2024-04-24
# NonGEMM Bench: 非GEMMワークロードから最新のMLワークロードの性能の地平を理解する NonGEMM Bench: Understanding the Performance Horizon of the Latest ML Workloads with NonGEMM Workloads ( http://arxiv.org/abs/2404.11788v2 ) ライセンス: Link先を確認	Rachid Karami, Hemanth Kota, Sheng-Chun Kao, Hyoukjun Kwon,	(参考訳) 機械学習(ML)演算子は、さまざまなターゲットアプリケーション向けのMLモデルを設計するためのビルディングブロックである。 GEMM(GEneral Matrix Multiplication)演算子は、MLモデルのバックボーンである。これらは数十億回の積和演算を必要とし、計算コストが高いことで知られる。そのため、MLモデルの実行を高速化するために、GEMM演算子の研究と最適化に多大な努力が払われてきた。 GPUとアクセラレータは、GEMM演算子の実行を最適化することでMLワークロードを高速化するために広く導入されている。それでも、非GEMM演算子の性能はGEMMほど徹底的には研究されていない。そこで本稿では、非GEMM演算子を研究するためのベンチマークである NonGEMM Bench について述べる。まず、さまざまなドメインの代表的なMLワークロードを用いて NonGEMM Bench を構築し、次にさまざまなグレードのGPUプラットフォーム上でケーススタディを行い、GPUアクセラレーションシステムにおける非GEMM演算子の挙動を分析する。最後に、GEMM演算子と非GEMM演算子の間のギャップを埋めるための重要な知見をいくつか提示し、新たな最適化の方向性をコミュニティに提供する。 Machine Learning (ML) operators are the building blocks to design ML models with various target applications. GEneral Matrix Multiplication (GEMM) operators are the backbone of ML models. They are notorious for being computationally expensive, requiring billions of multiply-and-accumulate operations. Therefore, significant effort has been put into studying and optimizing the GEMM operators in order to speed up the execution of ML models. GPUs and accelerators are widely deployed to accelerate ML workloads by optimizing the execution of GEMM operators. Nonetheless, the performance of NonGEMM operators has not been studied as thoroughly as that of GEMMs. Therefore, this paper describes NonGEMM Bench, a benchmark to study NonGEMM operators. We first construct NonGEMM Bench using popular ML workloads from different domains, then perform case studies on GPU platforms of various grades to analyze the behavior of NonGEMM operators in GPU-accelerated systems. Finally, we present some key takeaways to bridge the gap between GEMM and NonGEMM operators and to offer the community potential new optimization directions.	翻訳日:2024-04-26 23:27:32 公開日:2024-04-24
# カオスシステムのシミュレーションのためのハイブリッド量子古典型貯水池計算 Hybrid quantum-classical reservoir computing for simulating chaotic systems ( http://arxiv.org/abs/2311.14105v2 ) ライセンス: Link先を確認	Filip Wudarski, Daniel O'Connor, Shaun Geaney, Ata Akbari Asanjan, Max Wilson, Elena Strbac, P. Aaron Lott, Davide Venturelli,	(参考訳) カオスシステムの予測は特に複雑な作業であり、近年、システムの時空間情報を抽出するために用いられる固定ランダムウェイト(貯水池)を持つ再帰的ネットワークである貯水池コンピューティング(RC)を用いて、合理的に成功している。この研究は、RCの貯水池を量子回路に置き換える、ハイブリッド量子貯水池計算(HQRC)フレームワークを提案する。回路のモジュラ構造と測定フィードバックは、貯水池状態の複雑な系の力学を符号化するために使用され、そこから古典的な学習を行い、将来の力学を予測する。 HQRCのノイズレスシミュレーションは、ロレンツ63とダブルスクロールカオスのパラダイムシステムの両方の最先端の古典的RCモデルに匹敵する有効な予測時間を示し、予測が真実から逸脱してからずっと後のアトラクタダイナミクスに固執する。 Forecasting chaotic systems is a notably complex task, which in recent years has been approached with reasonable success using reservoir computing (RC), a recurrent network with fixed random weights (the reservoir) used to extract the spatio-temporal information of the system. This work presents a hybrid quantum reservoir-computing (HQRC) framework, which replaces the reservoir in RC with a quantum circuit. The modular structure and measurement feedback in the circuit are used to encode the complex system dynamics in the reservoir states, from which classical learning is performed to predict future dynamics. The noiseless simulations of HQRC demonstrate valid prediction times comparable to state-of-the-art classical RC models for both the Lorenz63 and double-scroll chaotic paradigmatic systems and adhere to the attractor dynamics long after the forecasts have deviated from the ground truth.	翻訳日:2024-04-26 23:17:45 公開日:2024-04-24
# DP-NMT:スケーラブルな微分代用機械翻訳 DP-NMT: Scalable Differentially-Private Machine Translation ( http://arxiv.org/abs/2311.14465v2 ) ライセンス: Link先を確認	Timour Igamberdiev, Doan Nam Long Vu, Felix Künnecke, Zhuo Yu, Jannik Holmer, Ivan Habernal,	(参考訳) ニューラルマシン翻訳(NMT)は、広く普及しているテキスト生成タスクであるが、NMTシステムに重大なデータプライバシー上の懸念があるにもかかわらず、プライバシを保存するNMTモデルの開発にはかなりの研究ギャップがある。 DP-SGDは、具体的なプライバシー保証のある機械学習モデルをトレーニングするための一般的な方法であるが、DP-SGDでモデルをトレーニングする実装仕様は、既存のモデルでは常に明確化されていない。これを解決するために,DP-SGDを用いてプライバシー保護NMTの研究を行うオープンソースフレームワークであるDP-NMTを導入し,多数のモデル,データセット,評価指標をひとつのソフトウェアパッケージにまとめる。我々のゴールは、DP-SGDアルゴリズムの具体的詳細を透過的かつ直感的に実装し、プライバシー保護型NMTシステムの開発を進めるためのプラットフォームを提供することです。一般的なドメインとプライバシ関連のドメインのデータセットに関する一連の実験を実施して、使用中のフレームワークを実演しています。フレームワークを公開し、コミュニティからのフィードバックを歓迎します。 Neural machine translation (NMT) is a widely popular text generation task, yet there is a considerable research gap in the development of privacy-preserving NMT models, despite significant data privacy concerns for NMT systems. Differentially private stochastic gradient descent (DP-SGD) is a popular method for training machine learning models with concrete privacy guarantees; however, the implementation specifics of training a model with DP-SGD are not always clarified in existing models, with differing software libraries used and code bases not always being public, leading to reproducibility issues. To tackle this, we introduce DP-NMT, an open-source framework for carrying out research on privacy-preserving NMT with DP-SGD, bringing together numerous models, datasets, and evaluation metrics in one systematic software package. Our goal is to provide a platform for researchers to advance the development of privacy-preserving NMT systems, keeping the specific details of the DP-SGD algorithm transparent and intuitive to implement. We run a set of experiments on datasets from both general and privacy-related domains to demonstrate our framework in use. We make our framework publicly available and welcome feedback from the community.	
翻訳日:2024-04-26 23:17:45 公開日:2024-04-24
# ハミルトニアンシミュレーションによるオープン量子系のシミュレーション Simulating Open Quantum Systems Using Hamiltonian Simulations ( http://arxiv.org/abs/2311.15533v3 ) ライセンス: Link先を確認	Zhiyan Ding, Xiantao Li, Lin Lin,	(参考訳) 我々はリンドブラッド方程式をシミュレートする新しい方法を提案し、リンドブラッド力学、確率微分方程式、ハミルトンシミュレーションの関係を描いている。拡大ヒルベルト空間におけるユニタリ力学の列を導出し、リンドブラッド力学を任意の高次に近似することができる。このユニタリ表現は、ハミルトニアンシミュレーションとアンシラ量子ビットの追跡のみを含む量子回路を用いてシミュレートすることができる。測定結果に追加のポストセレクションは不要であり、各段階での成功確率が保証される。我々の手法は時間に依存した設定に直接一般化することができる。時間に依存しないリンドブレディアン力学と時間に依存しないリンドブレディアン力学の両方を3階まで精度良くシミュレートする数値例を提供する。 We present a novel method to simulate the Lindblad equation, drawing on the relationship between Lindblad dynamics, stochastic differential equations, and Hamiltonian simulations. We derive a sequence of unitary dynamics in an enlarged Hilbert space that can approximate the Lindblad dynamics up to an arbitrarily high order. This unitary representation can then be simulated using a quantum circuit that involves only Hamiltonian simulation and tracing out the ancilla qubits. There is no need for additional postselection in measurement outcomes, ensuring a success probability of one at each stage. Our method can be directly generalized to the time-dependent setting. We provide numerical examples that simulate both time-independent and time-dependent Lindbladian dynamics with accuracy up to the third order.	翻訳日:2024-04-26 23:17:45 公開日:2024-04-24
# ベアメタル組込みデバイスにおける汎用バイナリ計装のためのプロセッサ例外の悪用 Abusing Processor Exception for General Binary Instrumentation on Bare-metal Embedded Devices ( http://arxiv.org/abs/2311.16532v2 ) ライセンス: Link先を確認	Shipei Qu, Xiaolin Zhang, Chi Zhang, Dawu Gu,	(参考訳) 組込みシステムにおけるクローズドソースのドライバとライブラリのセキュリティ分析は、サプライチェーンにおけるその基本的な役割を考えると、非常に重要である。 x86とは異なり、組込みプラットフォームには包括的なバイナリ操作ツールがないため、研究者や開発者がそのようなクローズドソースコンポーネントのセキュリティ問題を効果的に検出し、パッチを当てるのは難しい。既存研究は、本格的なオペレーティングシステム機能に依存するか、面倒なコーナーケースに悩まされており、組込み環境で普及しているベアメタルファームウェアへの適用が制限されている。本稿では、組込みベアメタルファームウェアに対して、汎用的できめ細かな静的バイナリ計装を可能にするPIFER(Practical Instrumenting Framework for Embedded fiRmware)を提案する。組込みプロセッサに内蔵されたハードウェア例外処理機構を悪用することにより、PIFERは任意のターゲットアドレスに対して計装を行うことができる。さらに、パッチ適用後も元のファームウェアが正しく実行されることを保証するための、命令変換に基づく方式を提案する。我々は、Zephyr RTOS、CoreMarkベンチマーク、およびクローズドソースの商用製品を含む、現実世界の複雑なファームウェアに対してPIFERを評価した。その結果、PIFERは命令の98.9%を正しく計装したことが示された。さらに、総合的な性能評価を行い、本研究の実用性と効率性を示した。 Analyzing the security of closed-source drivers and libraries in embedded systems holds significant importance, given their fundamental role in the supply chain. Unlike x86, embedded platforms lack comprehensive binary manipulation tools, making it difficult for researchers and developers to effectively detect and patch security issues in such closed-source components. Existing works either depend on full-fledged operating system features or suffer from tedious corner cases, restricting their application to bare-metal firmware prevalent in embedded environments. In this paper, we present PIFER (Practical Instrumenting Framework for Embedded fiRmware) that enables general and fine-grained static binary instrumentation for embedded bare-metal firmware. By abusing the built-in hardware exception-handling mechanism of the embedded processors, PIFER can perform instrumentation on arbitrary target addresses. Additionally, we propose an instruction translation-based scheme to guarantee the correct execution of the original firmware after patching. 
We evaluate PIFER against real-world, complex firmware, including Zephyr RTOS, the CoreMark benchmark, and a closed-source commercial product. The results indicate that PIFER correctly instrumented 98.9% of the instructions. Further, a comprehensive performance evaluation was conducted, demonstrating the practicality and efficiency of our work.	翻訳日:2024-04-26 23:17:45 公開日:2024-04-24
# EAGLES: 軽量エンコーディングによる効率的な3Dガウスの高速化 EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS ( http://arxiv.org/abs/2312.04564v2 ) ライセンス: Link先を確認	Sharath Girish, Kamal Gupta, Abhinav Shrivastava,	(参考訳) 近年、3次元ガウシアンスプラッティング(3D-GS)が新規視点シーン合成で人気を博している。これは、Neural Radiance Fields(NeRF)に伴う長いトレーニング時間と遅いレンダリング速度という課題に対処する。 3Dガウシアンの高速かつ微分可能なラスタ化により、3D-GSはリアルタイムレンダリングと高速なトレーニングを実現する。しかし、各シーンの点群表現に何百万ものガウシアンを必要とするため、トレーニングとストレージの両方にかなりのメモリリソースを要する。本稿では、量子化埋め込みを利用してポイント単位のメモリ記憶要求を大幅に削減する手法と、ガウシアン点群をより高速かつ安定に最適化するための粗から密へのトレーニング戦略を提案する。提案手法ではさらに枝刈り段階を設け、より少ないガウシアンでシーンを表現することで、トレーニング時間を短縮し、高解像度シーンのリアルタイムレンダリングに向けてレンダリング速度を向上させる。再構成品質を維持しながら、記憶容量を1桁以上削減する。さまざまなデータセットとシーンにおいて、視覚的品質を保ちつつ、メモリ消費を10～20倍削減し、トレーニングと推論を高速化できることを示し、本手法の有効性を検証する。プロジェクトページとコードは https://efficientgaussian.github.io で入手できる。 Recently, 3D Gaussian splatting (3D-GS) has gained popularity in novel-view scene synthesis. It addresses the challenges of lengthy training times and slow rendering speeds associated with Neural Radiance Fields (NeRFs). Through rapid, differentiable rasterization of 3D Gaussians, 3D-GS achieves real-time rendering and accelerated training. 3D-GS methods, however, demand substantial memory resources for both training and storage, as they require millions of Gaussians in their point cloud representation for each scene. We present a technique utilizing quantized embeddings to significantly reduce per-point memory storage requirements and a coarse-to-fine training strategy for a faster and more stable optimization of the Gaussian point clouds. Our approach develops a pruning stage which results in scene representations with fewer Gaussians, leading to faster training times and rendering speeds for real-time rendering of high resolution scenes. We reduce storage memory by more than an order of magnitude, all while preserving the reconstruction quality. We validate the effectiveness of our approach on a variety of datasets and scenes, preserving visual quality while consuming 10-20x less memory and achieving faster training and inference. 
The project page and code are available at https://efficientgaussian.github.io	翻訳日:2024-04-26 23:17:45 公開日:2024-04-24
# Genixer: 強力なデータジェネレータとしてのマルチモーダル大言語モデル Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator ( http://arxiv.org/abs/2312.06731v4 ) ライセンス: Link先を確認	Henry Hengyuan Zhao, Pan Zhou, Mike Zheng Shou,	(参考訳) インストラクションチューニングデータは、MLLM(Multimodal Large Language Models)のトレーニングに不可欠である。しかし、高品質なインストラクションチューニングデータの作成には大きな課題がある。インストラクションチューニングデータのラベル付けを人間に依頼することは、労働集約的で時間を要する。データ生成のためにGPT-4にプロンプトを与えたいくつかの研究は、コストがかかるだけでなく、複雑なタスク(グラウンディングに基づく推論タスク)では十分な性能が得られなかった。データ作成の課題に対処するため、我々は、ユーザの指示に従ってインストラクションチューニングデータを生成する能力をMLLMに持たせる可能性を初めて検討する。具体的には、Common VQA、REC、REG、PointQなど9つの代表的なタスクを含む、多様で高品質なインストラクションチューニングデータを生成する革新的なデータ生成パイプラインGenixerを開発した。 Genixerは、4つの重要なステップからなる統一的なデータ生成ソリューションを提供する。 (i) 命令データの収集、 (ii) 命令テンプレートの設計、 (iii) MLLMの強化、及び (iv) データの生成とフィルタリング。生成データの有効性を検証するため、人手評価とユーザ嗜好調査を行い、生成データの品質を評価した。その後、LLaVA1.5とShikraという2つの代表的なMLLMのトレーニング用に2つのインストラクションチューニングデータセットを生成し、様々なVQAタスクとマルチモーダルベンチマークで一貫した改善を確認した。例えば、VizWizベンチマークの性能は50.0%から53.8%に、ScienceQAでは66.8%から69.7%に向上し、生成したインストラクションチューニングデータの品質が改めて確認された。データ、コード、モデルは公開予定である。 Instruction tuning data is essential for training the Multimodal Large Language Models (MLLMs). However, the creation of high-quality instruction tuning data presents significant challenges. Asking humans to label the instruction tuning data is labor-intensive and time-consuming. Some works that prompted GPT-4 for data generation were not only costly but also lacked satisfactory performance in complex tasks (i.e., grounding-based reasoning tasks). To address the challenges of data creation, we are the first to explore the potential of empowering MLLMs with the ability to generate instruction-tuning data by following user instructions. Specifically, we developed an innovative data generation pipeline, Genixer, to generate various high-quality instruction tuning data, including nine representative tasks, e.g., Common VQA, REC, REG, and PointQ. 
Genixer provides a unified solution for data generation with four key steps: (i) instruction data collection, (ii) instruction template design, (iii) empowering MLLM, and (iv) data generation and filtering. To validate the effectiveness of generated data, we conducted the human evaluation and user preference study to assess the quality of generated data. Subsequently, we generated two instruction-tuning datasets for the training of two representative MLLMs, LLaVA1.5 and Shikra, and noted consistent improvements across various VQA tasks and multimodal benchmarks. For instance, performance on the VizWiz benchmark improved from 50.0% to 53.8%, and on ScienceQA, it increased from 66.8% to 69.7%, reconfirming the quality of the generated instruction tuning data. The data, code, and models will be released.	翻訳日:2024-04-26 23:17:45 公開日:2024-04-24
# データストリームの動的性質を考慮した条件付き教師なし回帰フレームワーク A Conditioned Unsupervised Regression Framework Attuned to the Dynamic Nature of Data Streams ( http://arxiv.org/abs/2312.07682v2 ) ライセンス: Link先を確認	Rene Richard, Nabil Belacel,	(参考訳) リアルタイムラベルの取得が困難である場合、従来の手法では、サブ最適性能が得られる。本稿では,制限付きラベル付きデータを用いたストリーミング環境の最適戦略を提案し,教師なし回帰のための適応手法を提案する。提案手法は,初期ラベルのスパースセットを活用し,データの進化パターンに応答して動的モデル適応を可能にする,革新的なドリフト検出機構を導入する。適応性を高めるために,Adaptive WINdowingアルゴリズムとRoot Mean Square Error (RMSE)に基づく誤り一般化アルゴリズムを統合する。 ADWINはリアルタイムドリフト検出を容易にし、RMSEはモデル予測精度のロバストな測度を提供する。この組み合わせにより、高レベルの予測精度を維持しつつ、パターンの変化に継続的に適応しながら、ストリーミングデータの課題を効果的にナビゲートすることが可能になります。各種公開データセットを対象とした多変量法の性能評価を行い, 適応しないベースラインと比較した。包括的評価を通じて、リアルタイムにラベルを取得することが重要な課題となるタスクに対する適応回帰手法の優れた効果を実証する。その結果、従来のアプローチよりも優れ、ラベルの不足とデータパターンの進化を特徴とするシナリオにおいて、その可能性を強調した。 In scenarios where obtaining real-time labels proves challenging, conventional approaches may result in sub-optimal performance. This paper presents an optimal strategy for streaming contexts with limited labeled data, introducing an adaptive technique for unsupervised regression. The proposed method leverages a sparse set of initial labels and introduces an innovative drift detection mechanism to enable dynamic model adaptations in response to evolving patterns in the data. To enhance adaptability, we integrate the ADWIN (ADaptive WINdowing) algorithm with error generalization based on Root Mean Square Error (RMSE). ADWIN facilitates real-time drift detection, while RMSE provides a robust measure of model prediction accuracy. This combination enables our multivariate method to effectively navigate the challenges of streaming data, continuously adapting to changing patterns while maintaining a high level of predictive precision. We evaluate the performance of our multivariate method across various public datasets, comparing it to non-adapting baselines. 
Through comprehensive assessments, we demonstrate the superior efficacy of our adaptive regression technique for tasks where obtaining labels in real-time is a significant challenge. The results underscore the method's capacity to outperform traditional approaches and highlight its potential in scenarios characterized by label scarcity and evolving data patterns.	翻訳日:2024-04-26 23:17:45 公開日:2024-04-24
# 連続時間動的グラフに対するマルチパースペクティブフィードバック・アテンション結合モデル Multi-perspective Feedback-attention Coupling Model for Continuous-time Dynamic Graphs ( http://arxiv.org/abs/2312.07983v2 ) ライセンス: Link先を確認	Xiaobo Zhu, Yan Wu, Zhipeng Li, Hailong Su, Jin Che, Zhanheng Chen, Liying Wang,	(参考訳) 近年,グラフネットワーク上での表現学習が普及し,様々なモデルが有望な結果を示している。それにもかかわらず、いくつかの課題が続いている。 1) ほとんどのメソッドは静的あるいは離散時間動的グラフ用に設計されている。 2) 既存の連続時間動的グラフアルゴリズムは、単一の進化的な視点に焦点をあてる。 3) 多くの連続時間動的グラフアプローチは、長期依存を捉えるために多くの時間的隣人を必要とします。本稿では,MPFA(Multi-Perspective Feedback-Attention Coupling)モデルを提案する。 MPFAは進化と生の両方の観点から情報を取り入れ、観察されたプロセスのインターリーブされたダイナミクスを効率的に学習する。進化する視点は、情報集約のために継続的に進化する時間的隣人を区別するために、時間的自己意識を用いる。動的更新を通じて、この視点は少数の時間的隣人を使用して長期的な依存関係をキャプチャすることができる。一方、生の視点は生の近傍情報を集約するために、成長特性係数を持つフィードバックアテンションモジュールを利用する。自己組織型データセットと7つの公開データセットの実験結果から,提案モデルの有効性と競争性を検証した。 Recently, representation learning over graph networks has gained popularity, with various models showing promising results. Despite this, several challenges persist: 1) most methods are designed for static or discrete-time dynamic graphs; 2) existing continuous-time dynamic graph algorithms focus on a single evolving perspective; and 3) many continuous-time dynamic graph approaches necessitate numerous temporal neighbors to capture long-term dependencies. In response, this paper introduces the Multi-Perspective Feedback-Attention Coupling (MPFA) model. MPFA incorporates information from both evolving and raw perspectives, efficiently learning the interleaved dynamics of observed processes. The evolving perspective employs temporal self-attention to distinguish continuously evolving temporal neighbors for information aggregation. Through dynamic updates, this perspective can capture long-term dependencies using a small number of temporal neighbors. Meanwhile, the raw perspective utilizes a feedback attention module with growth characteristic coefficients to aggregate raw neighborhood information. 
Experimental results on a self-organizing dataset and seven public datasets validate the efficacy and competitiveness of our proposed model.	翻訳日:2024-04-26 23:17:45 公開日:2024-04-24
# PTT:効率的な時系列3次元物体検出のためのポイントトラジェクトリ変換器 PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection ( http://arxiv.org/abs/2312.08371v2 ) ライセンス: Link先を確認	Kuan-Chih Huang, Weijie Lyu, Ming-Hsuan Yang, Yi-Hsuan Tsai,	(参考訳) 近年の時系列LiDARベースの3次元物体検出器は、2段階の提案ベース手法により有望な性能を達成している。これらは第1段階の高密度検出器から3Dボックス候補を生成し、続いてさまざまな時間的集約手法を適用する。しかしながら、これらのアプローチはフレーム単位のオブジェクトや点群全体を必要とし、メモリバンクの利用に関する課題を生じさせる。さらに、点群と軌跡の特徴は連結のみによって結合されるため、両者の間の効果的な相互作用が無視される可能性がある。本稿では、時系列3次元物体検出を効率的に行うために、長短期記憶を備えたポイントトラジェクトリ変換器を提案する。この目的のために、メモリバンクの記憶要件を最小限に抑えるよう、現在フレームのオブジェクトの点群とその過去の軌跡のみを入力として利用する。さらに、長短期および将来を考慮した視点に着目して軌跡特徴を符号化し、それらを点群特徴と効果的に集約するモジュールを導入する。大規模なWaymoデータセットで広範な実験を行い、我々のアプローチが最先端の手法と比べて良好に機能することを実証した。コードとモデルは https://github.com/kuanchihhuang/PTT で公開される予定である。 Recent temporal LiDAR-based 3D object detectors achieve promising performance based on the two-stage proposal-based approach. They generate 3D box candidates from the first-stage dense detector, followed by different temporal aggregation methods. However, these approaches require per-frame objects or whole point clouds, posing challenges related to memory bank utilization. Moreover, point clouds and trajectory features are combined solely based on concatenation, which may neglect effective interactions between them. In this paper, we propose a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection. To this end, we only utilize point clouds of current-frame objects and their historical trajectories as input to minimize the memory bank storage requirement. Furthermore, we introduce modules to encode trajectory features, focusing on long short-term and future-aware perspectives, and then effectively aggregate them with point cloud features. We conduct extensive experiments on the large-scale Waymo dataset to demonstrate that our approach performs well against state-of-the-art methods. 
Code and models will be made publicly available at https://github.com/kuanchihhuang/PTT.	翻訳日:2024-04-26 23:17:45 公開日:2024-04-24
# 3次元生成モデルのためのモザイクSDF Mosaic-SDF for 3D Generative Models ( http://arxiv.org/abs/2312.09222v2 ) ライセンス: Link先を確認	Lior Yariv, Omri Puny, Natalia Neverova, Oran Gafni, Yaron Lipman,	(参考訳) 現在の3次元形状向けの拡散モデルまたはフローベース生成モデルは、事前学習済みの2次元画像拡散モデルを蒸留するものと、3次元形状で直接学習するものの2つに大別される。拡散モデルやフローモデルを3次元形状で学習する場合、重要な設計上の選択は形状表現である。効果的な形状表現は、次の3つの設計原則に従う必要がある。大規模な3Dデータセットをその表現形式に効率的に変換できること、近似能力とパラメータ数との間で良好なトレードオフを提供すること、そして既存の強力なニューラルネットワークアーキテクチャと互換性のある単純なテンソル形式を持つことである。体積格子や点群のような標準的な3次元形状表現は、これらすべての原則を同時には満たさないが、本稿ではそれらを満たす新しい表現を提唱する。 Mosaic-SDF(M-SDF)は、形状の境界付近に配置した局所格子の集合を用いて、与えられた形状の符号付き距離関数(SDF)を近似する単純な3次元形状表現である。 M-SDF表現は、形状ごとに個別に高速に計算できるため容易に並列化でき、形状の境界付近の空間のみをカバーするためパラメータ効率が良く、Transformerベースのアーキテクチャと互換性のある単純な行列形式を持つ。我々は、M-SDF表現を用いて3次元生成フローモデルを学習することで、その有効性を実証する。これには、3D Warehouseデータセットを用いたクラス条件付き生成と、約60万のキャプションと形状のペアからなるデータセットを用いたテキストからの3D生成が含まれる。 Current diffusion or flow-based generative models for 3D shapes divide into two categories: distilling pre-trained 2D image diffusion models, and training directly on 3D shapes. When training diffusion or flow models on 3D shapes, a crucial design choice is the shape representation. An effective shape representation needs to adhere to three design principles: it should allow an efficient conversion of large 3D datasets to the representation form; it should provide a good tradeoff of approximation power versus number of parameters; and it should have a simple tensorial form that is compatible with existing powerful neural architectures. While standard 3D shape representations such as volumetric grids and point clouds do not adhere to all these principles simultaneously, we advocate in this paper a new representation that does. We introduce Mosaic-SDF (M-SDF): a simple 3D shape representation that approximates the Signed Distance Function (SDF) of a given shape by using a set of local grids spread near the shape's boundary. 
The M-SDF representation is fast to compute for each shape individually making it readily parallelizable; it is parameter efficient as it only covers the space around the shape's boundary; and it has a simple matrix form, compatible with Transformer-based architectures. We demonstrate the efficacy of the M-SDF representation by using it to train a 3D generative flow model including class-conditioned generation with the 3D Warehouse dataset, and text-to-3D generation using a dataset of about 600k caption-shape pairs.	翻訳日:2024-04-26 23:08:01 公開日:2024-04-24
# 2レベル量子システムと物理空間の接続の確立に向けて Towards establishing a connection between two-level quantum systems and physical spaces ( http://arxiv.org/abs/2312.09270v2 ) ライセンス: Link先を確認	V. G. Valle, L. L. Brugger, B. F. Rizzuti, Cristhiano Duarte,	(参考訳) この研究は、ヒルベルト空間における対応する記述(状態として)を用いて、2レベル量子系の準備の間の運用上の接続を明確にすることを目的としている。これは時代遅れに聞こえるかもしれませんが、一般的な感覚以上の関連性があることが、私たちを信じさせます。これら2つの分離された領域(実際の実験室と状態空間)を橋渡しするために、私たちはパラダイム的な数学的対象であるホップフィブレーション(Hopf fibration)に依存している。この接続が簡単な光学的設定で実際にどのように機能するかを説明する。興味深いことに、この光学装置は球体を覆うために2つのチャートを使う必要があることを反映している。別の言い方をすれば、実験的な実現は滑らかな多様体と見なされる球体の2次元性を反映している。 This work seeks to make explicit the operational connection between the preparation of two-level quantum systems with their corresponding description (as states) in a Hilbert space. This may sound outdated, but we show there is more to this connection than common sense may lead us to believe. To bridge these two separated realms -- the actual laboratory and the space of states -- we rely on a paradigmatic mathematical object: the Hopf fibration. We illustrate how this connection works in practice with a simple optical setup. Remarkably, this optical setup also reflects the necessity of using two charts to cover a sphere. Put another way, our experimental realization reflects the bi-dimensionality of a sphere seen as a smooth manifold.	翻訳日:2024-04-26 23:08:00 公開日:2024-04-24
# 意味-数値ギャップのブリッジ:材料特性予測のためのクロスモーダル知識グラフの数値推論法 Bridging the Semantic-Numerical Gap: A Numerical Reasoning Method of Cross-modal Knowledge Graph for Material Property Prediction ( http://arxiv.org/abs/2312.09744v2 ) ライセンス: Link先を確認	Guangxuan Song, Dongmei Fu, Zhongwei Qiu, Zijiang Yang, Jiaxin Dai, Lingwei Ma, Dawei Zhang,	(参考訳) 機械学習(ML)技術を用いて材料特性を予測することが重要な研究トピックである。これらの性質は数値データと意味要因に依存する。小さなサンプルデータセットの制限のため、既存の手法では一般的にMLアルゴリズムを使用して数値特性を回帰したり、トレーニング済みの知識グラフ(KG)を素材に転送する。しかし,これらの手法は意味情報と数値情報を同時に扱うことはできない。本稿では,意味ノードと数値プロキシノードを用いたクロスモーダルKGを構成する材料KG(NR-KG)の数値解析手法を提案する。 KGを標準KGに投影することで、両方のタイプの情報をキャプチャし、グラフニューラルネットワークを使用して材料特性を予測する。このプロセスでは,数値情報から意味的特徴を抽出するために,新しい予測予測損失を提案する。 NR-KGは、小さなサンプルデータセットにおけるクロスモーダルデータ、マイニング関係、クロスモーダル情報のエンドツーエンド処理を容易にし、価値ある実験データを十分に活用して、材料予測を強化する。さらに、意味記述を伴う2つの新しい高エントロピー合金(HEA)特性データセットを提案する。 NR-KGは最先端のSOTA(State-of-the-art)法より優れており、2つの材料データセットに対して25.9%と16.1%の相対的な改善を達成している。さらに、NR-KGは2つの公共物理化学分子データセットのSOTA法を超越し、22.2%と54.3%の改善を示し、その可能性と一般化性を強調している。提案されたデータセット、アルゴリズム、および事前訓練されたモデルが、材料のためのKGとAIのコミュニティを促進することを願っている。 Using machine learning (ML) techniques to predict material properties is a crucial research topic. These properties depend on numerical data and semantic factors. Due to the limitations of small-sample datasets, existing methods typically adopt ML algorithms to regress numerical properties or transfer other pre-trained knowledge graphs (KGs) to the material. However, these methods cannot simultaneously handle semantic and numerical information. In this paper, we propose a numerical reasoning method for material KGs (NR-KG), which constructs a cross-modal KG using semantic nodes and numerical proxy nodes. It captures both types of information by projecting KG into a canonical KG and utilizes a graph neural network to predict material properties. In this process, a novel projection prediction loss is proposed to extract semantic features from numerical information. 
NR-KG facilitates end-to-end processing of cross-modal data, mining relationships and cross-modal information in small-sample datasets, and fully utilizes valuable experimental data to enhance material prediction. We further propose two new High-Entropy Alloys (HEA) property datasets with semantic descriptions. NR-KG outperforms state-of-the-art (SOTA) methods, achieving relative improvements of 25.9% and 16.1% on two material datasets. Besides, NR-KG surpasses SOTA methods on two public physical chemistry molecular datasets, showing improvements of 22.2% and 54.3%, highlighting its potential application and generalizability. We hope the proposed datasets, algorithms, and pre-trained models can facilitate the communities of KG and AI for materials.	翻訳日:2024-04-26 23:08:00 公開日:2024-04-24
# 高次元構成空間の可視化:包括的解析的アプローチ Visualizing High-Dimensional Configuration Spaces: A Comprehensive Analytical Approach ( http://arxiv.org/abs/2312.10918v2 ) ライセンス: Link先を確認	Jorge Ocampo Jimenez, Wael Suleiman,	(参考訳) 構成空間Cの表現は、状態の衝突チェックに計算時間の大半が費やされるサンプリングベースモーションプランナーのための衝突のない経路の発見を加速する上で重要な役割を担っている。伝統的に、プランナーは衝突チェッカーを用いて衝突のない経路を限定的に評価したり、可視化のためにCの次元を小さくすることでCの表現を評価する。しかし、衝突チェッカーは、元のCのサブセットだけが表現されている場合でも高い精度を示すことができ、また、移動プランナーが元のCのパスに匹敵するパスを見つける能力を制限することができる。本稿では,マニピュレータロボットの高次元Cs表現を2次元形式で可視化するための新しい手法を提案する。元の寸法を小さくすることなく高次元Cs近似の定性的評価を行うための新しいツールを提供する。これにより、2つの異なる高次元Cの精度とカバレッジを比較する能力が向上する。マニピュレータロボットのキネマティックチェーンと人間の色知覚を利用して,マニピュレータロボットの7自由度CSを用いて,本手法の有効性を示す。この可視化は、ロボットの関節の境界と衝突状態の組み合わせのカバレッジに関する質的な洞察を、元のデータの次元性を低下させることなく提供する。本主張を支持するために,提案した可視化の数値的な評価を行う。 The representation of a Configuration Space C plays a vital role in accelerating the finding of a collision-free path for sampling-based motion planners where the majority of computation time is spent in collision checking of states. Traditionally, planners evaluate C's representations through limited evaluations of collision-free paths using the collision checker or by reducing the dimensionality of C for visualization. However, a collision checker may indicate high accuracy even when only a subset of the original C is represented; limiting the motion planner's ability to find paths comparable to those in the original C. Additionally, dealing with high-dimensional Cs is challenging, as qualitative evaluations become increasingly difficult in dimensions higher than three, where reduced-dimensional C evaluation may decrease accuracy in cluttered environments. In this paper, we present a novel approach for visualizing representations of high-dimensional Cs of manipulator robots in a 2D format. We provide a new tool for qualitative evaluation of high-dimensional Cs approximations without reducing the original dimension. This enhances our ability to compare the accuracy and coverage of two different high-dimensional Cs. 
Leveraging the kinematic chain of manipulator robots and human color perception, we show the efficacy of our method using a 7-degree-of-freedom CS of a manipulator robot. This visualization offers qualitative insights into the joint boundaries of the robot and the coverage of collision state combinations without reducing the dimensionality of the original data. To support our claim, we conduct a numerical evaluation of the proposed visualization.	翻訳日:2024-04-26 23:08:00 公開日:2024-04-24
# 言語可能な空間オントロジーによる屋内・屋外3次元シーングラフ生成 Indoor and Outdoor 3D Scene Graph Generation via Language-Enabled Spatial Ontologies ( http://arxiv.org/abs/2312.11713v2 ) ライセンス: Link先を確認	Jared Strader, Nathan Hughes, William Chen, Alberto Speranzon, Luca Carlone,	(参考訳) 本稿では,任意の屋内環境と屋外環境に3次元シーングラフを構築する手法を提案する。このような拡張は困難であり、屋外環境を記述する概念の階層は屋内よりも複雑であり、手動でそのような階層を定義するのは時間を要するためスケールしない。さらに、トレーニングデータの欠如は、屋内環境で使用される学習ツールの直接的な適用を妨げている。これらの課題に対処するため、我々は2つの新しい拡張を提案する。まず,室内と屋外のロボット操作に関連する概念と関係を定義する空間オントロジーを構築する手法を開発する。特に、そのようなオントロジーを構築するためにLLM(Large Language Model)を使用します。第2に、論理テンソルネットワーク(LTN)を用いた3次元シーングラフ構築のための空間オントロジーを活用し、論理ルールや公理(例えば「砂を含むビーチ」)を付加することで、トレーニング時に追加の監視信号を提供し、ラベル付きデータの必要性を低減し、より良い予測を提供し、トレーニング時に見つからない概念の予測を可能にする。室内環境,農村環境,沿岸環境など,さまざまなデータセットを用いて本手法を検証した結果,微少な注釈付きデータによる3Dシーングラフ生成の品質向上が確認できた。 This paper proposes an approach to build 3D scene graphs in arbitrary indoor and outdoor environments. Such extension is challenging; the hierarchy of concepts that describe an outdoor environment is more complex than for indoors, and manually defining such hierarchy is time-consuming and does not scale. Furthermore, the lack of training data prevents the straightforward application of learning-based tools used in indoor settings. To address these challenges, we propose two novel extensions. First, we develop methods to build a spatial ontology defining concepts and relations relevant for indoor and outdoor robot operation. In particular, we use a Large Language Model (LLM) to build such an ontology, thus largely reducing the amount of manual effort required. Second, we leverage the spatial ontology for 3D scene graph construction using Logic Tensor Networks (LTN) to add logical rules, or axioms (e.g., "a beach contains sand"), which provide additional supervisory signals at training time thus reducing the need for labelled data, providing better predictions, and even allowing predicting concepts unseen at training time. 
We test our approach in a variety of datasets, including indoor, rural, and coastal environments, and show that it leads to a significant increase in the quality of the 3D scene graph generation with sparsely annotated data.	翻訳日:2024-04-26 23:08:00 公開日:2024-04-24
# 行列積状態を用いたフェルミオン回路の高速エミュレーション Fast emulation of fermionic circuits with matrix product states ( http://arxiv.org/abs/2312.17657v4 ) ライセンス: Link先を確認	Justin Provazza, Klaas Gunst, Huanchen Zhai, Garnet K.-L. Chan, Toru Shiozaki, Nicholas C. Rubin, Alec F. White,	(参考訳) 本稿では,Fermionic Quantum Emulator (FQE)ソフトウェアライブラリのMPS拡張について述べる。本稿では、スピン1/2フェルミオンの多体波動関数を近似するための対称性適応行列積状態の理論について論じ、FQEインタフェース(MPS-FQE)のオープンソース実装について述べる。このソフトウェアは、ほとんどの基本的なテンソル操作にオープンソースのpyblock3とblock2ライブラリを使用し、FQEのドロップイン代替として、より大きなフェルミオン回路のより効率的で近似的なエミュレーションを可能にする。最後に、大規模システムの近似的なエミュレーションが期待できる、短期的および耐故障性量子アルゴリズムの両方に関連するいくつかの応用について、量子位相推定のための状態準備戦略の評価、異なる変分量子固有解法Ansätzeのテスト、トロッター誤差の数値評価、一般的な量子力学問題のシミュレーションを示す。これらすべての例において、MPS-FQEによる近似エミュレーションにより、フルステートベクターエミュレータでアクセス可能なシステムよりもはるかに大きいシステムを扱うことができる。 We describe a matrix product state (MPS) extension for the Fermionic Quantum Emulator (FQE) software library. We discuss the theory behind symmetry-adapted matrix product states for approximating many-body wavefunctions of spin-1/2 fermions, and we present an open-source, MPS-enabled implementation of the FQE interface (MPS-FQE). The software uses the open-source pyblock3 and block2 libraries for most elementary tensor operations, and it can largely be used as a drop-in replacement for FQE that allows for more efficient, but approximate, emulation of larger fermionic circuits. Finally, we show several applications relevant to both near-term and fault-tolerant quantum algorithms where approximate emulation of larger systems is expected to be useful: characterization of state preparation strategies for quantum phase estimation, the testing of different variational quantum eigensolver Ansätze, the numerical evaluation of Trotter errors, and the simulation of general quantum dynamics problems. In all these examples, approximate emulation with MPS-FQE allows us to treat systems that are significantly larger than those accessible with a full statevector emulator.	翻訳日:2024-04-26 23:08:00 公開日:2024-04-24
# NU-Class Net:ビデオ品質向上のための新しいディープラーニングベースのアプローチ NU-Class Net: A Novel Deep Learning-based Approach for Video Quality Enhancement ( http://arxiv.org/abs/2401.01163v2 ) ライセンス: Link先を確認	Parham Zilouchian Moghaddam, Mehdi Modarressi, Mohammad Amin Sadeghi,	(参考訳) ビデオコンテンツの人気は急増しており、インターネットトラフィックとIoT(Internet of Things)ネットワークに対する優位性を主張している。ビデオ圧縮は、ビデオキャプチャー装置が生成する実質的なマルチメディアトラフィックを効率的に管理する主要な手段であると考えられてきた。それでも、ビデオ圧縮アルゴリズムは、かなりの圧縮比を達成するために、かなりの計算要求を必要とする。この複雑さは、IoTエッジノードカメラなどのリソース制限された組み込みシステムにおいて、効率的なビデオコーディング標準を実装する上で、非常に難しい課題となる。そこで本研究では,圧縮コーデックの損失による圧縮アーチファクトの軽減を目的とした,革新的な深層学習モデルであるNU-Class Netを提案する。この拡張により、低ビットレートビデオの品質が著しく向上する。 NU-Class Netを利用することで、ビデオキャプチャノード内のビデオエンコーダは出力品質を低減し、低ビットレートのビデオを生成し、エッジでの計算と帯域幅の要求を効果的に調整することができる。デコーダ側では、典型的にはリソース制限の影響を受けないが、NU-Class Netはビデオデコーダの後に適用され、アーティファクトを補償し、元のビデオの品質を近似する。実験により,低ビットレートでストリーミングされたビデオの知覚品質を高めるためのモデルの有効性が確認された。 Video content has experienced a surge in popularity, asserting its dominance over internet traffic and Internet of Things (IoT) networks. Video compression has long been regarded as the primary means of efficiently managing the substantial multimedia traffic generated by video-capturing devices. Nevertheless, video compression algorithms entail significant computational demands in order to achieve substantial compression ratios. This complexity presents a formidable challenge when implementing efficient video coding standards in resource-constrained embedded systems, such as IoT edge node cameras. To tackle this challenge, this paper introduces NU-Class Net, an innovative deep-learning model designed to mitigate compression artifacts stemming from lossy compression codecs. This enhancement significantly elevates the perceptible quality of low-bit-rate videos. By employing the NU-Class Net, the video encoder within the video-capturing node can reduce output quality, thereby generating low-bit-rate videos and effectively curtailing both computation and bandwidth requirements at the edge. 
On the decoder side, which is typically less encumbered by resource limitations, NU-Class Net is applied after the video decoder to compensate for artifacts and approximate the quality of the original video. Experimental results affirm the efficacy of the proposed model in enhancing the perceptible quality of videos, especially those streamed at low bit rates.	翻訳日:2024-04-26 23:08:00 公開日:2024-04-24
# クローズド述語を用いた既存規則の一貫性クエリー解法 Consistent Query Answering for Existential Rules with Closed Predicates ( http://arxiv.org/abs/2401.05743v2 ) ライセンス: Link先を確認	Lorenzo Marconi, Riccardo Rosati,	(参考訳) Consistent Query Answering (CQA)は、知識ベースとデータベースのデータアクセスに対する一貫性のないアプローチである。 CQAの目標は、一貫性のない情報が存在する場合でも、クエリに意味のある(一貫性のある)回答を提供することである。 CQAのセマンティクスは、修復の概念、すなわち最小限の変更によって得られる初期一貫性のないデータベースの一貫したバージョンに基づいている。既存のルールで表されるデータ依存データベースにおけるCQAについて検討する。より具体的には、タプル生成の依存関係と等価性の生成の依存関係の両方を拡張する、不等式(DED)を伴う、広範囲な結合型依存性のクラスに焦点を当てる。まず、データベース述語がクローズされた場合、すなわち、データベースがそのような述語に関する完全な知識を持っていると仮定し、データベースを修復するタプルの追加は不可能である。このようなシナリオでは、CQAのデータ複雑性と関連するタスク(再チェック)を、異なる意味論(ARとIAR)と異なる存在規則のクラスで詳細に分析する。特に,非巡回型,線形型,完全型,粘着型およびガード型DEDのクラスとその組み合わせについて考察する。 Consistent Query Answering (CQA) is an inconsistency-tolerant approach to data access in knowledge bases and databases. The goal of CQA is to provide meaningful (consistent) answers to queries even in the presence of inconsistent information, e.g. a database whose data conflict with meta-data (typically the database integrity constraints). The semantics of CQA is based on the notion of repair, that is, a consistent version of the initial, inconsistent database that is obtained through minimal modifications. We study CQA in databases with data dependencies expressed by existential rules. More specifically, we focus on the broad class of disjunctive embedded dependencies with inequalities (DEDs), which extend both tuple-generating dependencies and equality-generated dependencies. We first focus on the case when the database predicates are closed, i.e. the database is assumed to have complete knowledge about such predicates, thus no tuple addition is possible to repair the database. In such a scenario, we provide a detailed analysis of the data complexity of CQA and associated tasks (repair checking) under different semantics (AR and IAR) and for different classes of existential rules. 
In particular, we consider the classes of acyclic, linear, full, sticky and guarded DEDs, and their combinations.	翻訳日:2024-04-26 21:08:18 公開日:2024-04-24
# LLMCheckup:解釈可能性ツールと自己説明による大規模言語モデルの会話的検証 LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools and Self-Explanations ( http://arxiv.org/abs/2401.12576v2 ) ライセンス: Link先を確認	Qianli Wang, Tatiana Anikina, Nils Feldhus, Josef van Genabith, Leonhard Hennig, Sebastian Möller,	(参考訳) 対話形式で説明を提供する解釈可能性ツールは,ユーザへの十分な情報提供に不足する可能性があるため,ユーザの理解を高める効果(Slack et al , 2023; Shen et al , 2023)を示した。しかしながら、対話ベースの説明のための現在のソリューションは、しばしば外部ツールやモジュールを必要とし、設計されていないタスクに簡単に転送できない。 LLMCheckupでは、ユーザが最新の大規模言語モデル(LLM)の振る舞いをチャットできる、容易にアクセスできるツールを提供する。特徴属性などのホワイトボックス説明可能性ツールや自己説明(合理生成など)を含む、説明可能なAI(XAI)メソッドを幅広い範囲に接続することにより、LCMが説明を生成し、微調整なしでユーザ意図の認識を可能にする。 LLMベースの(自己)説明は、フォローアップ質問をサポートし、提案を生成する対話対話として提示される。 LLMCheckupprovidesはシステムで利用可能なオペレーションのチュートリアルを公開し、XAIの様々なレベルの専門知識を持つ個人にケアし、複数の入力モダリティをサポートする。 LLMのユーザ意図認識精度を大幅に向上させる新しい解析手法を提案する。最後に,ファクトチェックとコモンセンス質問応答のタスクに対するLLMCheckupを紹介する。 Interpretability tools that offer explanations in the form of a dialogue have demonstrated their efficacy in enhancing users' understanding (Slack et al., 2023; Shen et al., 2023), as one-off explanations may fall short in providing sufficient information to the user. Current solutions for dialogue-based explanations, however, often require external tools and modules and are not easily transferable to tasks they were not designed for. With LLMCheckup, we present an easily accessible tool that allows users to chat with any state-of-the-art large language model (LLM) about its behavior. We enable LLMs to generate explanations and perform user intent recognition without fine-tuning, by connecting them with a broad spectrum of Explainable AI (XAI) methods, including white-box explainability tools such as feature attributions, and self-explanations (e.g., for rationale generation). LLM-based (self-)explanations are presented as an interactive dialogue that supports follow-up questions and generates suggestions. 
LLMCheckup provides tutorials for operations available in the system, catering to individuals with varying levels of expertise in XAI and supporting multiple input modalities. We introduce a new parsing strategy that substantially enhances the user intent recognition accuracy of the LLM. Finally, we showcase LLMCheckup for the tasks of fact checking and commonsense question answering.	翻訳日:2024-04-26 21:08:18 公開日:2024-04-24
# LLMとIDE静的解析による抽出メソッドリファクタリング Together We Go Further: LLMs and IDE Static Analysis for Extract Method Refactoring ( http://arxiv.org/abs/2401.15298v2 ) ライセンス: Link先を確認	Dorin Pomian, Abhiram Bellur, Malinda Dilhara, Zarina Kurbatova, Egor Bogomolov, Timofey Bryksin, Danny Dig,	(参考訳) 単一のメソッドに複数の責任をカプセル化する長いメソッドはメンテナンスが難しい。新しい手法にどの文を抽出するかを選択することが、多くの研究ツールの標的となっている。着実に改善されているにもかかわらず、これらのツールは、開発者の好みや受け入れ基準に沿ってリファクタリングを生成するのに失敗することが多い。大きな言語モデル(LLM)が大規模なコードコーパスでトレーニングされていることを考えると、開発者が関数を作る方法に精通しているなら、開発者が受け入れそうなリファクタリングを提案するかもしれません。本稿では,LLMの知見とIDEのパワーを相乗的に組み合わせて抽出法(EM)を実行することにより,リファクタリングの科学と実践を推し進める。 1752 EMシナリオに関する我々のフォーマティブな研究により、LSMは専門家による提案を行うのに非常に効果的であるが、信頼できないことが判明した。 LLMが提案する候補から幻覚を取り除く新しいアプローチを設計し、プログラムスライシングから静的解析技術に基づいて提案をさらに強化・ランク付けし、最終的にIDEを利用してリファクタリングを正しく実行した。このアプローチは、EM-Assistと呼ばれるIntelliJ IDEAプラグインで実装しました。我々は,オープンソースプロジェクトから1752個の実際のリファクタリングを複製する多種多様なコーパス上でEM-Assistを実証的に評価した。 EM-Assistは、53.4%のケースで、開発者によるリファクタリングを推奨し、以前のベストプラクティスツールの39.4%のリコール率よりも改善した。さらに,16人の産業開発者を対象に,暖炉調査を行い,最近のコミットをリファクタリングすることを提案した。 81.3%がEM-Assistの勧告に賛成した。 Long methods that encapsulate multiple responsibilities within a single method are challenging to maintain. Choosing which statements to extract into new methods has been the target of many research tools. Despite steady improvements, these tools often fail to generate refactorings that align with developers' preferences and acceptance criteria. Given that Large Language Models (LLMs) have been trained on large code corpora, if we harness their familiarity with the way developers form functions, we could suggest refactorings that developers are likely to accept. In this paper, we advance the science and practice of refactoring by synergistically combining the insights of LLMs with the power of IDEs to perform Extract Method (EM). Our formative study on 1752 EM scenarios revealed that LLMs are very effective for giving expert suggestions, yet they are unreliable: up to 76.3% of the suggestions are hallucinations. 
We designed a novel approach that removes hallucinations from the candidates suggested by LLMs, then further enhances and ranks suggestions based on static analysis techniques from program slicing, and finally leverages the IDE to execute refactorings correctly. We implemented this approach in an IntelliJ IDEA plugin called EM-Assist. We empirically evaluated EM-Assist on a diverse corpus that replicates 1752 actual refactorings from open-source projects. We found that EM-Assist outperforms previous state-of-the-art tools: EM-Assist suggests the developer-performed refactoring in 53.4% of cases, improving over the recall rate of 39.4% for previous best-in-class tools. Furthermore, we conducted firehouse surveys with 16 industrial developers and suggested refactorings on their recent commits. 81.3% of them agreed with the recommendations provided by EM-Assist.	翻訳日:2024-04-26 21:08:18 公開日:2024-04-24
# ヒト脳波の表現的アライメントによるより人間の脳に似た視力の獲得 Achieving More Human Brain-Like Vision via Human EEG Representational Alignment ( http://arxiv.org/abs/2401.17231v2 ) ライセンス: Link先を確認	Zitong Lu, Yile Wang, Julie D. Golomb,	(参考訳) 人工知能の進歩にもかかわらず、物体認識モデルは人間の脳における視覚情報処理のエミュレートに遅れを取っている。近年の研究では、脳の処理を模倣するために神経データを使用することの可能性を強調している。非侵襲的脳波に基づく視覚モデル「Re(presentational)Al(ignment)net」を初めて提示した。我々の革新的な画像から脳への多層符号化フレームワークは、複数のモデルレイヤーを最適化し、モデルがオブジェクトカテゴリと異なるモダリティをまたいだ人間の脳の視覚的表現パターンを効率的に学習し模倣できるようにすることにより、人間の神経アライメントを向上させる。我々の発見は、ReAlnetが人工と人間の視覚のギャップを埋め、より脳に似た人工知能システムへの道を歩むブレークスルーを表していることを示唆している。 Despite advancements in artificial intelligence, object recognition models still lag behind in emulating visual information processing in human brains. Recent studies have highlighted the potential of using neural data to mimic brain processing; however, these often rely on invasive neural recordings from non-human subjects, leaving a critical gap in understanding human visual perception. Addressing this gap, we present, for the first time, 'Re(presentational)Al(ignment)net', a vision model aligned with human brain activity based on non-invasive EEG, demonstrating a significantly higher similarity to human brain representations. Our innovative image-to-brain multi-layer encoding framework advances human neural alignment by optimizing multiple model layers and enabling the model to efficiently learn and mimic human brain's visual representational patterns across object categories and different modalities. Our findings suggest that ReAlnet represents a breakthrough in bridging the gap between artificial and human vision, and paving the way for more brain-like artificial intelligence systems.	翻訳日:2024-04-26 21:08:18 公開日:2024-04-24
# FuseFormer: 画像と熱画像の融合のためのトランスフォーマー FuseFormer: A Transformer for Visual and Thermal Image Fusion ( http://arxiv.org/abs/2402.00971v2 ) ライセンス: Link先を確認	Aytekin Erdogan, Erdem Akagündüz,	(参考訳) 画像融合問題に対する決定的な基礎的真理が欠如しているため、損失関数は構造類似度指数測定(SSIM)などの評価指標に基づいて構造化される。しかし、これを行うと、SSIMに対してバイアスが発生し、その結果、入力されたビジュアルバンド画像が生成される。本研究の目的は,古典的評価指標を損失関数として用いた場合の限界を緩和する画像融合問題に対する新しい手法を提案することである。提案手法は,局所的およびグローバルなコンテキスト情報に順応的に対処するトランスフォーマーベースのマルチスケール融合戦略を統合する。この統合により、画像融合プロセスの個々のコンポーネントが洗練されるだけでなく、全体の有効性も大幅に向上する。提案手法は,第1段階において,複数スケールの深部特徴を抽出するオートエンコーダを訓練する2段階の訓練手法に従っている。第2段階では、核融合ブロックを統合し、前述の損失関数を変更する。マルチスケール機能は、畳み込みニューラルネットワーク(CNN)とトランスフォーマーを組み合わせることで融合される。 CNNはローカル機能をキャプチャするために使用され、Transformerは一般的なコンテキスト機能の統合を処理する。種々のベンチマークデータセットに対する広範な実験を通じて,提案手法は新たな損失関数の定義とともに,他の競合融合アルゴリズムと比較して優れた性能を示す。 Due to the lack of a definitive ground truth for the image fusion problem, the loss functions are structured based on evaluation metrics, such as the structural similarity index measure (SSIM). However, in doing so, a bias is introduced toward the SSIM and, consequently, the input visual band image. The objective of this study is to propose a novel methodology for the image fusion problem that mitigates the limitations associated with using classical evaluation metrics as loss functions. Our approach integrates a transformer-based multi-scale fusion strategy that adeptly addresses local and global context information. This integration not only refines the individual components of the image fusion process but also significantly enhances the overall efficacy of the method. Our proposed method follows a two-stage training approach, where an auto-encoder is initially trained to extract deep features at multiple scales in the first stage. For the second stage, we integrate our fusion block and change the loss function as mentioned. The multi-scale features are fused using a combination of Convolutional Neural Networks (CNNs) and Transformers. 
The CNNs are utilized to capture local features, while the Transformer handles the integration of general context features. Through extensive experiments on various benchmark datasets, our proposed method, along with the novel loss function definition, demonstrates superior performance compared to other competitive fusion algorithms.	翻訳日:2024-04-26 21:08:18 公開日:2024-04-24
# より高速かつ軽量なLLM:現状の課題と今後の展望 Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward ( http://arxiv.org/abs/2402.01799v2 ) ライセンス: Link先を確認	Arnav Chavan, Raghav Magazine, Shubham Kushwaha, Mérouane Debbah, Deepak Gupta,	(参考訳) LLMの優れた性能にもかかわらず、その普及は推論中にかなりの計算とメモリの要求のために困難に直面している。モデル圧縮およびシステムレベルの最適化手法の最近の進歩は、LLM推論を強化することを目的としている。この調査はこれらの手法の概要を提供し、最近の発展を強調している。 LLaMA(/2)-7Bの実験を通じて, 各種圧縮技術の評価を行い, 統一された環境下でのLLMの効率的な展開に関する実用的な知見を提供する。 LLaMA(/2)-7Bの実証分析は,これらの手法の有効性を強調した。調査結果から,現在の限界を特定し,LLM推論効率を改善するための今後の方向性について議論する。我々は、この論文で提示された結果を再現するコードベースをhttps://github.com/nyunAI/Faster-LLM-Surveyでリリースします。 Despite the impressive performance of LLMs, their widespread adoption faces challenges due to substantial computational and memory requirements during inference. Recent advancements in model compression and system-level optimization methods aim to enhance LLM inference. This survey offers an overview of these methods, emphasizing recent developments. Through experiments on LLaMA(/2)-7B, we evaluate various compression techniques, providing practical insights for efficient LLM deployment in a unified setting. The empirical analysis on LLaMA(/2)-7B highlights the effectiveness of these methods. Drawing from survey insights, we identify current limitations and discuss potential future directions to improve LLM inference efficiency. We release the codebase to reproduce the results presented in this paper at https://github.com/nyunAI/Faster-LLM-Survey	翻訳日:2024-04-26 21:08:18 公開日:2024-04-24
# Mambaは文脈内学習が可能なのか? Is Mamba Capable of In-Context Learning? ( http://arxiv.org/abs/2402.03170v2 ) ライセンス: Link先を確認	Riccardo Grazzi, Julien Siems, Simon Schrodi, Thomas Brox, Frank Hutter,	(参考訳) GPT-4のような最先端技術基盤モデルは、ニューラルネットワークのフォワードパス中にタスクを解決するための学習能力に関するメタラーニングの変種であるインコンテキストラーニング(ICL)において驚くほどうまく機能し、モデルへの入力として提供されるコンテキスト情報を活用する。この有用な機能は、基礎モデルの大規模な事前訓練の副産物として現れる。現在、トランスモデルはICLの最先端技術であるが、この研究は、入力シーケンス長のトランスフォーマーよりも優れたスケールを持つ新しい状態空間モデルであるMambaが、同様のICL機能を持つという実証的な証拠を提供する。我々は,より複雑な自然言語処理問題だけでなく,単純な関数近似を含むタスクにおいて,Mambaを評価した。以上の結果から,タスクのカテゴリによって,MambaはICLのトランスフォーマーモデルの性能と密に一致していることがわかった。さらなる分析により、Mambaは変換器と同様に内部表現を漸進的に最適化することでICL問題を解くように見える。全体としては,長い入力シーケンスを含むICLタスクのトランスフォーマーの代替として,Mambaが有効である可能性が示唆されている。これはメタ学習におけるエキサイティングな発見であり、コンテキスト内で学習したAutoMLアルゴリズム(TabPFNやOptformerなど)の長い入力シーケンスへの一般化を可能にする可能性がある。 State of the art foundation models such as GPT-4 perform surprisingly well at in-context learning (ICL), a variant of meta-learning concerning the learned ability to solve tasks during a neural network forward pass, exploiting contextual information provided as input to the model. This useful ability emerges as a side product of the foundation model's massive pretraining. While transformer models are currently the state of the art in ICL, this work provides empirical evidence that Mamba, a newly proposed state space model which scales better than transformers w.r.t. the input sequence length, has similar ICL capabilities. We evaluated Mamba on tasks involving simple function approximation as well as more complex natural language processing problems. Our results demonstrate that, across both categories of tasks, Mamba closely matches the performance of transformer models for ICL. Further analysis reveals that, like transformers, Mamba appears to solve ICL problems by incrementally optimizing its internal representations. Overall, our work suggests that Mamba can be an efficient alternative to transformers for ICL tasks involving long input sequences. 
This is an exciting finding in meta-learning and may enable generalizations of in-context learned AutoML algorithms (like TabPFN or Optformer) to long input sequences.	翻訳日:2024-04-26 20:58:26 公開日:2024-04-24
# YOLOv8-AM: YOLOv8 : 小児腰部骨折検出のための注意機構 YOLOv8-AM: YOLOv8 with Attention Mechanisms for Pediatric Wrist Fracture Detection ( http://arxiv.org/abs/2402.09329v4 ) ライセンス: Link先を確認	Chun-Tse Chien, Rui-Yang Ju, Kuang-Yi Chou, Enkaer Xieerke, Jen-Shiun Chiang,	(参考訳) 難治性外傷や骨折は、特に骨折症例のかなりの割合を占める小児において、日常生活において頻繁に起こる。手術の前に、外科医は患者にまずX線撮影を依頼し、放射線医の分析に基づいてそれに備える。ニューラルネットワークの開発に伴い、You Only Look Once (YOLO)シリーズモデルがコンピュータ支援診断(CAD)として骨折検出に広く利用されている。 2023年、UltralyticsはYOLOモデルの最新バージョンを発表した。注意機構は、モデルパフォーマンスを改善する最もホットな方法の1つです。本研究は,本来のYOLOv8アーキテクチャにアテンション機構を組み込んだYOLOv8-AMを提案する。具体的には、4つの注意モジュール、CBAM(Convolutional Block Attention Module)、GAM(Global Attention Mechanism)、ECA(Efficient Channel Attention)、SA(Shuffle Attention)を使用して、改良されたモデルを設計し、GRAZPEDWRI-DXデータセットでトレーニングする。 ResBlock + CBAM (ResCBAM) に基づくYOLOv8-AMモデルのIoU 50(mAP 50)の平均精度は63.6%から65.8%に向上し,SOTAの性能が向上した。逆に、GAMを組み込んだYOLOv8-AMモデルは、mAP 50の64.2%の値を得るが、これは満足のいく拡張ではない。したがって、ResBlockとGAMを組み合わせてResGAMを導入し、新しいYOLOv8-AMモデルを設計し、mAP 50値が65.0%に向上した。この研究の実装コードはGitHubでhttps://github.com/RuiyangJu/Fracture_Detection_Improved_YOLOv8で公開されている。 Wrist trauma and even fractures occur frequently in daily life, particularly among children who account for a significant proportion of fracture cases. Before performing surgery, surgeons often request patients to undergo X-ray imaging first and prepare for it based on the analysis of the radiologist. With the development of neural networks, You Only Look Once (YOLO) series models have been widely used in fracture detection as computer-assisted diagnosis (CAD). In 2023, Ultralytics presented the latest version of the YOLO models, which has been employed for detecting fractures across various parts of the body. Attention mechanism is one of the hottest methods to improve the model performance. This research work proposes YOLOv8-AM, which incorporates the attention mechanism into the original YOLOv8 architecture. 
Specifically, we employ four attention modules, namely the Convolutional Block Attention Module (CBAM), Global Attention Mechanism (GAM), Efficient Channel Attention (ECA), and Shuffle Attention (SA), to design the improved models and train them on the GRAZPEDWRI-DX dataset. Experimental results demonstrate that the mean Average Precision at IoU 50 (mAP 50) of the YOLOv8-AM model based on ResBlock + CBAM (ResCBAM) increased from 63.6% to 65.8%, achieving state-of-the-art (SOTA) performance. In contrast, the YOLOv8-AM model incorporating GAM obtains an mAP 50 of 64.2%, which is not a satisfactory improvement. Therefore, we combine ResBlock and GAM into ResGAM to design another YOLOv8-AM model, whose mAP 50 increases to 65.0%. The implementation code for this study is available on GitHub at https://github.com/RuiyangJu/Fracture_Detection_Improved_YOLOv8.	翻訳日:2024-04-26 20:58:26 公開日:2024-04-24
# MuChin: 音楽分野における言語モデル評価のための中国語の口語記述ベンチマーク MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music ( http://arxiv.org/abs/2402.09871v3 ) ライセンス: Link先を確認	Zihao Wang, Shuyu Li, Tao Zhang, Qi Wang, Pengfei Yu, Jinyang Luo, Yan Liu, Ming Xi, Kejun Zhang,	(参考訳) 急速に発展するマルチモーダル大言語モデル(LLM)は、音楽の理解とテキスト記述において、そのパフォーマンスを均一に評価するために、新しいベンチマークを必要とする。しかし、音楽情報検索(MIR)アルゴリズムと人間の理解、専門家と一般人の相違、注釈の精度の低さなどにより、既存の音楽記述データセットはベンチマークとして機能することができない。そこで本研究では,中国語における最初のオープンソース音楽記述ベンチマークであるMuChinについて述べる。そこで我々は,革新的な多人数多段階保証手法を取り入れたCaiMAP(Caichong Music Annotation Platform)を構築し,アノテーションの精度と一般的な意味論との整合性を確保するために,アマチュアとプロの両方を雇った。この手法を用いて,多次元で高精度な音楽アノテーションを備えたデータセットであるCaichong Music Dataset (CaiMD)を構築し,Muchinのテストセットとして1,000の高品質なエントリを慎重に選択した。 MuChin を用いて,音楽記述の観点からプロとアマチュアの差異を分析し,微調整 LLM における注釈付きデータの有効性を実証的に実証した。最終的に、我々は既存の音楽理解モデルの評価にMuChinを用いて、音楽の口語的記述を提供する能力について検討した。ベンチマークに関連するすべてのデータとスコアコード、詳細な付録がオープンソース化された(https://github.com/CarlWangChina/MuChin/)。 The rapidly evolving multimodal Large Language Models (LLMs) urgently require new benchmarks to uniformly evaluate their performance on understanding and textually describing music. However, due to semantic gaps between Music Information Retrieval (MIR) algorithms and human understanding, discrepancies between professionals and the public, and low precision of annotations, existing music description datasets cannot serve as benchmarks. To this end, we present MuChin, the first open-source music description benchmark in Chinese colloquial language, designed to evaluate the performance of multimodal LLMs in understanding and describing music. We established the Caichong Music Annotation Platform (CaiMAP) that employs an innovative multi-person, multi-stage assurance method, and recruited both amateurs and professionals to ensure the precision of annotations and alignment with popular semantics. 
Utilizing this method, we built a dataset with multi-dimensional, high-precision music annotations, the Caichong Music Dataset (CaiMD), and carefully selected 1,000 high-quality entries to serve as the test set for MuChin. Based on MuChin, we analyzed the discrepancies between professionals and amateurs in terms of music description, and empirically demonstrated the effectiveness of annotated data for fine-tuning LLMs. Ultimately, we employed MuChin to evaluate existing music understanding models on their ability to provide colloquial descriptions of music. All data related to the benchmark, along with the scoring code and detailed appendices, have been open-sourced (https://github.com/CarlWangChina/MuChin/).	翻訳日:2024-04-26 20:58:26 公開日:2024-04-24
# NToP:トップビュー魚眼画像における2次元・3次元人物位置推定のためのNeRFを用いた大規模データセット生成 NToP: NeRF-Powered Large-scale Dataset Generation for 2D and 3D Human Pose Estimation in Top-View Fisheye Images ( http://arxiv.org/abs/2402.18196v2 ) ライセンス: Link先を確認	Jingrui Yu, Dipankar Nandi, Roman Seidel, Gangolf Hirtz,	(参考訳) 魚眼カメラを用いたトップビューでのヒューマンポーズ推定(HPE)は、有望で革新的なアプリケーションドメインを示す。しかし、この視点を捉えたデータセットの可用性は非常に限られており、特に高品質な2Dおよび3Dキーポイントアノテーションがある。このギャップに対処するため、我々はNeural Radiance Fields(NeRF)の技術を活用し、既存の2Dおよび3Dデータセットから人間のポーズデータセットを生成する包括的なパイプラインを構築します。このパイプラインを通じて,魚眼カメラ用の新しいデータセットNToP570K(NeRFを利用した570万枚以上の画像付きトップビューヒューマンポースデータセット)を作成し,そのニューラルネットワークを2次元および3次元のトップビュー人間のポーズ推定のために拡張する効果を広範囲に評価する。事前トレーニングした ViTPose-B モデルでは,トレーニングセットを微調整した後の2次元 HPE の検証セットにおいて,AP が 33.3 % 向上した。同様に微調整されたHybrIK-Transformerモデルは、検証セット上の3D HPEに対してPA-MPJPEを53.7mm削減する。 Human pose estimation (HPE) in the top-view using fisheye cameras presents a promising and innovative application domain. However, the availability of datasets capturing this viewpoint is extremely limited, especially those with high-quality 2D and 3D keypoint annotations. Addressing this gap, we leverage the capabilities of Neural Radiance Fields (NeRF) technique to establish a comprehensive pipeline for generating human pose datasets from existing 2D and 3D datasets, specifically tailored for the top-view fisheye perspective. Through this pipeline, we create a novel dataset NToP570K (NeRF-powered Top-view human Pose dataset for fisheye cameras with over 570 thousand images), and conduct an extensive evaluation of its efficacy in enhancing neural networks for 2D and 3D top-view human pose estimation. A pretrained ViTPose-B model achieves an improvement in AP of 33.3 % on our validation set for 2D HPE after finetuning on our training set. A similarly finetuned HybrIK-Transformer model gains 53.7 mm reduction in PA-MPJPE for 3D HPE on the validation set.	翻訳日:2024-04-26 20:58:26 公開日:2024-04-24
# NiNformer: トケミキシング生成ゲーティング機能を備えたネットワークトランスフォーマーのネットワーク NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function ( http://arxiv.org/abs/2403.02411v2 ) ライセンス: Link先を確認	Abdullah Nazhat Abdullah, Tarkan Aydin,	(参考訳) AttentionメカニズムはTransformerアーキテクチャの主要なコンポーネントであり、導入以来、多くのドメインと複数のタスクにまたがるディープラーニングの大幅な進歩につながっている。アテンションメカニズムはコンピュータビジョンでビジョントランスフォーマーViTとして利用され、その用途は、分類、セグメンテーション、オブジェクト検出、画像生成など、視覚領域の多くのタスクに拡張されている。このメカニズムは非常に表現力があり能力があるが、計算コストが高く、効率的な最適化のためにかなりのサイズのデータセットを必要とするという欠点がある。これらの欠点に対処するために、計算負担を減らし、データサイズ要件を緩和する多くの設計が文献で提案されている。視覚領域におけるこのような試みの例としては、MLP-Mixer、Conv-Mixer、Perciver-IOなどがある。本稿では,MLPミキサーの静的アプローチを強化するネットワーク・イン・ネットワーク構造を,トークン・ミキシング・プロセスにより要素ワイド・ゲーティング関数を学習する動的システムに置き換えることで,通常のViTブロックに代わる新しい計算ブロックを提案する。広汎な実験により,視覚領域の画像分類タスクに適用された複数のデータセットのベースラインアーキテクチャよりも優れた性能が得られた。 The Attention mechanism is the main component of the Transformer architecture, and since its introduction, it has led to significant advancements in Deep Learning that span many domains and multiple tasks. The Attention Mechanism was utilized in Computer Vision as the Vision Transformer ViT, and its usage has expanded into many tasks in the vision domain, such as classification, segmentation, object detection, and image generation. While this mechanism is very expressive and capable, it comes with the drawback of being computationally expensive and requiring datasets of considerable size for effective optimization. To address these shortcomings, many designs have been proposed in the literature to reduce the computational burden and alleviate the data size requirements. Examples of such attempts in the vision domain are the MLP-Mixer, the Conv-Mixer, the Perciver-IO, and many more. 
This paper introduces a new computational block as an alternative to the standard ViT block that reduces the compute burdens by replacing the normal Attention layers with a Network in Network structure that enhances the static approach of the MLP Mixer with a dynamic system of learning an element-wise gating function by a token mixing process. Extensive experimentation shows that the proposed design provides better performance than the baseline architectures on multiple datasets applied in the image classification task of the vision domain.	翻訳日:2024-04-26 20:48:34 公開日:2024-04-24
# WMDPベンチマーク:アンラーニングによる悪意的使用の測定と削減 The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning ( http://arxiv.org/abs/2403.03218v4 ) ライセンス: Link先を確認	Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, Samuel Marks, Oam Patel, Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Liu, Adam A. Hunt, Justin Tienken-Harder, Kevin Y. Shih, Kemper Talley, John Guan, Russell Kaplan, Ian Steneker, David Campbell, Brad Jokubaitis, Alex Levinson, Jean Wang, William Qian, Kallol Krishna Karmakar, Steven Basart, Stephen Fitz, Mindy Levine, Ponnurangam Kumaraguru, Uday Tupakula, Vijay Varadharajan, Yan Shoshitaishvili, Jimmy Ba, Kevin M. Esvelt, Alexandr Wang, Dan Hendrycks,	(参考訳) ホワイトハウス人工知能に関する大統領令は、生物、サイバー、化学兵器の開発において悪意あるアクターに力を与える大きな言語モデル(LLM)のリスクを強調している。悪意のある使用のリスクを測定するために、政府機関と主要なAIラボは、LLMにおける有害な能力の評価を開発している。しかし、現在の評価は非公開であり、リスク軽減のさらなる研究を妨げている。さらに、悪意のある使用のための、非常に特殊な経路にのみ焦点をあてている。これらのギャップを埋めるために、私たちは、バイオセキュリティ、サイバーセキュリティ、化学セキュリティにおける有害な知識のプロキシ測定として機能する、4,157の多重選択質問のデータセットであるWMDP(Weapons of Mass Destruction Proxy)ベンチマークを公開しました。 WMDPは学者と技術コンサルタントのコンソーシアムによって開発され、公開前に機密情報を除去するために厳格にフィルタリングされた。 WMDPは、まず、LLMにおける有害な知識の評価として、そして次に、そのような有害な知識を取り除くための未学習手法のベンチマークとして、2つの役割を果たす。モデル表現の制御に基づく最先端のアンラーニング手法であるCUTを開発した。 CUTは、生物学や計算機科学などの分野における一般的な能力を保ちながら、WMDPのモデル性能を低下させる。私たちはベンチマークとコードをhttps://wmdp.aiで公開しています。 The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. 
To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing further research into mitigating risk. Furthermore, they focus on only a few, highly specific pathways for malicious use. To fill these gaps, we publicly release the Weapons of Mass Destruction Proxy (WMDP) benchmark, a dataset of 4,157 multiple-choice questions that serve as a proxy measurement of hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP was developed by a consortium of academics and technical consultants, and was stringently filtered to eliminate sensitive information prior to public release. WMDP serves two roles: first, as an evaluation for hazardous knowledge in LLMs, and second, as a benchmark for unlearning methods to remove such hazardous knowledge. To guide progress on unlearning, we develop CUT, a state-of-the-art unlearning method based on controlling model representations. CUT reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use from LLMs. We release our benchmark and code publicly at https://wmdp.ai	翻訳日:2024-04-26 20:48:34 公開日:2024-04-24
# 量子因果構造に対するデ・フィネッティの定理 A de Finetti theorem for quantum causal structures ( http://arxiv.org/abs/2403.10316v2 ) ライセンス: Link先を確認	Fabio Costa, Jonathan Barrett, Sally Shrapnel,	(参考訳) 因果構造が'未知'である,という意味は何でしょうか? 因果関係に関する事前の知識のない実験の「繰り返し」についても話せるだろうか? そして、任意の、あるいは不確定な因果構造を持つプロセスの集合が独立かつ同一に分散されていると、どのような条件で言えるだろうか? 古典的確率、量子状態、量子チャネルに関する同様の質問は、「デ・フィネッティの定理(de Finetti theorems)」と呼ばれる、単純で修正が容易な条件(交換下での対称性)と非常に特殊な多部構造(同じ状態とチャネルの混合)を結びつけて、美しく答えられる。ここでは、任意の因果構造を持つプロセスに結果を拡張し、不定因果順序や、雑音量子デバイスに適用可能なマルチ時間非マルコフ過程を含む。この結果はまた、線形制約の大きい量子状態に対する新しいクラスであるデ・フィネッティの定理も意味しており、これは独立な興味を持つことができる。 What does it mean for a causal structure to be `unknown'? Can we even talk about `repetitions' of an experiment without prior knowledge of causal relations? And under what conditions can we say that a set of processes with arbitrary, possibly indefinite, causal structure are independent and identically distributed? Similar questions for classical probabilities, quantum states, and quantum channels are beautifully answered by so-called "de Finetti theorems", which connect a simple and easy-to-justify condition -- symmetry under exchange -- with a very particular multipartite structure: a mixture of identical states/channels. Here we extend the result to processes with arbitrary causal structure, including indefinite causal order and multi-time, non-Markovian processes applicable to noisy quantum devices. The result also implies a new class of de Finetti theorems for quantum states subject to a large class of linear constraints, which can be of independent interest.	翻訳日:2024-04-26 20:48:34 公開日:2024-04-24
# SiMBA: 視覚と多変量時系列のためのシンプルマンバベースアーキテクチャ SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series ( http://arxiv.org/abs/2403.15360v2 ) ライセンス: Link先を確認	Badri N. Patro, Vijay S. Agneeswaran,	(参考訳) トランスフォーマーは、シーケンスミキシングのための注意ネットワークとチャネルミキシングのためのMLPを広く採用しており、ドメイン間のブレークスルーを達成する上で重要な役割を担っている。しかし、近年の文献では、低い帰納バイアスや入力シーケンス長に関する二次的複雑さなど、注意ネットワークの問題が強調されている。 S4などの状態空間モデル(SSM)やその他のモデル(Hippo、Global Convolutions、Liquid S4、LRU、Mega、Mamba)は、上記の問題に対処し、より長いシーケンス長を扱えるようにするために登場した。 Mambaは最先端のSSMだが、コンピュータビジョンデータセットの大規模ネットワークにスケールする場合、安定性に問題がある。我々は,特定の固有値計算によるチャネルモデリングのためのEinstein FFT(EinFFT)を導入し,シーケンスモデリングにMambaブロックを用いる新しいアーキテクチャであるSiMBAを提案する。画像と時系列のベンチマークによる大規模なパフォーマンス調査は、SiMBAが既存のSSMよりも優れており、最先端のトランスフォーマーとのパフォーマンスギャップを埋めていることを示している。特に、SiMBAは、ImageNetとStanford CarやFlowerなどのトランスファーラーニングベンチマーク、タスクラーニングベンチマーク、および7つの時系列ベンチマークデータセットにおいて、最先端のSSMとしての地位を確立している。プロジェクトページは https://github.com/badripatro/Simba で公開されている。 Transformers have widely adopted attention networks for sequence mixing and MLPs for channel mixing, playing a pivotal role in achieving breakthroughs across domains. However, recent literature highlights issues with attention networks, including low inductive bias and quadratic complexity concerning input sequence length. State Space Models (SSMs) like S4 and others (Hippo, Global Convolutions, liquid S4, LRU, Mega, and Mamba), have emerged to address the above issues to help handle longer sequence lengths. Mamba, while being the state-of-the-art SSM, has a stability issue when scaled to large networks for computer vision datasets. We propose SiMBA, a new architecture that introduces Einstein FFT (EinFFT) for channel modeling by specific eigenvalue computations and uses the Mamba block for sequence modeling. Extensive performance studies across image and time-series benchmarks demonstrate that SiMBA outperforms existing SSMs, bridging the performance gap with state-of-the-art transformers. 
Notably, SiMBA establishes itself as the new state-of-the-art SSM on ImageNet, on transfer learning benchmarks such as Stanford Car and Flower, on task learning benchmarks, and on seven time series benchmark datasets. The project page is available at https://github.com/badripatro/Simba.	翻訳日:2024-04-26 20:48:34 公開日:2024-04-24
# 確率モデルを用いた入院電子健康記録の逐次推定 Sequential Inference of Hospitalization Electronic Health Records Using Probabilistic Models ( http://arxiv.org/abs/2403.19011v2 ) ライセンス: Link先を確認	Alan D. Kaplan, Priyadip Ray, John D. Greene, Vincent X. Liu,	(参考訳) ダイナミックな病院環境では、意思決定支援は患者の成果を改善する貴重なツールとなり得る。このダイナミックな環境では、実験室のテストや薬品などの長いシーケンスを頻繁に更新するデータ駆動推論が困難である。これは、データ型と可変長列に含まれる混合シーケンス型の不均一性による部分もある。本研究では,入院電子健康記録(EHR)データに含まれる複数の任意長配列に対する確率的教師なしモデルの設計を行う。このモデルは潜在変数構造を使用し、薬物、診断、実験室のテスト、神経学的評価、薬物の間の複雑な関係を捉えている。損失のある変換や時間ビンニングを必要とせずに、オリジナルのデータでトレーニングすることができる。推論アルゴリズムは、部分的データを用いて、その長さや特定の値の存在を含む完全なシーケンスの特性を推測する。我々は,北カリフォルニアのKaiser Permanente(カイザー・パーマネンテ)統合型ヘルスケアデリバリーシステムにおいて,医療を受ける被験者のデータに基づいて,このモデルをトレーニングする。その結果,入院ベッドにおける集中治療室 (ICU) の長さと存在を予測するための保留データと比較した。提案手法はベースライン手法よりも優れており,これらの実験では,学習したモデルが将来の値を示すシーケンスで情報をキャプチャすることを示す。 In the dynamic hospital setting, decision support can be a valuable tool for improving patient outcomes. Data-driven inference of future outcomes is challenging in this dynamic setting, where long sequences such as laboratory tests and medications are updated frequently. This is due in part to heterogeneity of data types and mixed-sequence types contained in variable length sequences. In this work we design a probabilistic unsupervised model for multiple arbitrary-length sequences contained in hospitalization Electronic Health Record (EHR) data. The model uses a latent variable structure and captures complex relationships between medications, diagnoses, laboratory tests, neurological assessments, and medications. It can be trained on original data, without requiring any lossy transformations or time binning. Inference algorithms are derived that use partial data to infer properties of the complete sequences, including their length and presence of specific values. We train this model on data from subjects receiving medical care in the Kaiser Permanente Northern California integrated healthcare delivery system. 
The results are evaluated against held-out data for predicting the length of sequences and presence of Intensive Care Unit (ICU) in hospitalization bed sequences. Our method outperforms a baseline approach, showing that in these experiments the trained model captures information in the sequences that is informative of their future values.	翻訳日:2024-04-26 20:48:34 公開日:2024-04-24
# ILPO-NET:3次元における任意の体積パターンの不変認識のためのネットワーク ILPO-NET: Network for the invariant recognition of arbitrary volumetric patterns in 3D ( http://arxiv.org/abs/2403.19612v3 ) ライセンス: Link先を確認	Dmitrii Zhemchuzhnikov, Sergei Grudinin,	(参考訳) 空間パターンの効果的な認識とそれらの階層の学習は、現代の空間データ分析において不可欠である。ボリュームデータアプリケーションは、シフトだけでなく、パターンの回転にも不変性を保証する技術を模索している。従来の方法では翻訳的不変性は容易に達成できるが、回転的不変性には複数の課題があり、研究の活発な領域として残っている。本稿では、Wigner行列拡張を用いた局所的な空間パターン配向に本質的に不変な畳み込み操作で任意の形状のパターンを扱う新しいアプローチであるILPO-Net(Invariant to Local Patterns Orientation Network)を提案する。我々のアーキテクチャは新たな畳み込み演算子をシームレスに統合し、MedMNISTやCATHといった多様なボリュームデータセットをベンチマークすると、パラメータ数を大幅に削減したベースラインよりも優れた性能を示し、MedMNISTの1000倍も少ない。これらの実証の他に、ILPO-Netの回転不変性は、複数の分野にわたる他のアプリケーションへの道を開く。私たちのコードはhttps://gricad-gitlab.univ-grenoble-alpes.fr/GruLab/ILPO/-/tree/main/ILPONetで公開されています。 Effective recognition of spatial patterns and learning their hierarchy is crucial in modern spatial data analysis. Volumetric data applications seek techniques ensuring invariance not only to shifts but also to pattern rotations. While traditional methods can readily achieve translational invariance, rotational invariance possesses multiple challenges and remains an active area of research. Here, we present ILPO-Net (Invariant to Local Patterns Orientation Network), a novel approach that handles arbitrarily shaped patterns with the convolutional operation inherently invariant to local spatial pattern orientations using the Wigner matrix expansions. Our architecture seamlessly integrates the new convolution operator and, when benchmarked on diverse volumetric datasets such as MedMNIST and CATH, demonstrates superior performance over the baselines with significantly reduced parameter counts - up to 1000 times fewer in the case of MedMNIST. Beyond these demonstrations, ILPO-Net's rotational invariance paves the way for other applications across multiple disciplines. Our code is publicly available at https://gricad-gitlab.univ-grenoble-alpes.fr/GruLab/ILPO/-/tree/main/ILPONet.	
翻訳日:2024-04-26 20:38:42 公開日:2024-04-24
# UltraLight VM-UNet: Parallel Vision Mamba が皮膚病変セグメンテーションのパラメータを著しく削減 UltraLight VM-UNet: Parallel Vision Mamba Significantly Reduces Parameters for Skin Lesion Segmentation ( http://arxiv.org/abs/2403.20035v3 ) ライセンス: Link先を確認	Renkai Wu, Yinghao Liu, Pengchen Liang, Qing Chang,	(参考訳) 伝統的にモデルのセグメンテーション性能を改善するために、ほとんどのアプローチはより複雑なモジュールを追加することを好む。また,これは医療分野,特にモバイル医療機器には適さない。計算負荷モデルでは,計算資源の制約により実際の臨床環境には適さない。近年、Mambaによって代表される状態空間モデル(SSM)は、従来のCNNやTransformerと強力な競合関係にある。本稿では,マンバにおけるパラメータの影響の鍵となる要素を深く探求し,これに基づくUltraLight Vision Mamba UNet(UltraLight VM-UNet)を提案する。具体的には、処理チャネルの全体数を一定に保ちながら、最小の計算負荷で優れた性能を実現する、PVM Layerという並列ビジョン・マンバの並列処理手法を提案する。以上の結果から,UltraLight VM-UNetは0.049M,GFLOPs 0.060のパラメータと同等の性能を示すことを示した。さらに,本研究では,マンバのパラメータ影響の鍵となる要素を深く研究し,マンバが将来,軽量化のための新たなメインストリームモジュールとなるための理論的基盤となることを示唆する。コードはhttps://github.com/wurenkai/UltraLight-VM-UNetから入手できる。 Traditionally for improving the segmentation performance of models, most approaches prefer to use adding more complex modules. And this is not suitable for the medical field, especially for mobile medical devices, where computationally loaded models are not suitable for real clinical environments due to computational resource constraints. Recently, state-space models (SSMs), represented by Mamba, have become a strong competitor to traditional CNNs and Transformers. In this paper, we deeply explore the key elements of parameter influence in Mamba and propose an UltraLight Vision Mamba UNet (UltraLight VM-UNet) based on this. Specifically, we propose a method for processing features in parallel Vision Mamba, named PVM Layer, which achieves excellent performance with the lowest computational load while keeping the overall number of processing channels constant. 
We conducted comparison and ablation experiments against several state-of-the-art lightweight models on three public skin lesion datasets and demonstrated that UltraLight VM-UNet remains equally competitive in performance with only 0.049M parameters and 0.060 GFLOPs. In addition, this study deeply explores the key elements of parameter influence in Mamba, which lays a theoretical foundation for Mamba to possibly become a new mainstream module for lightweight models in the future. The code is available from https://github.com/wurenkai/UltraLight-VM-UNet.	翻訳日:2024-04-26 20:38:42 公開日:2024-04-24
# マルチモーダル脳画像翻訳のための教師なし腫瘍認識蒸留法 Unsupervised Tumor-Aware Distillation for Multi-Modal Brain Image Translation ( http://arxiv.org/abs/2403.20168v2 ) ライセンス: Link先を確認	Chuan Huang, Jia Wei, Rui Li,	(参考訳) MRIスキャンによるマルチモーダル脳画像は、様々なモダリティから補完的な情報を提供するために臨床診断に広く用いられている。しかし、時間、コスト、アーティファクトといった様々な要因により、実際に完全にペア化されたマルチモーダル画像を得るのは難しいため、モダリティを欠く脳画像が得られる。この問題に対処するために、教師なしマルチモーダル脳画像翻訳が広く研究されている。既存の方法は、画像全体を翻訳する際に腫瘍領域に集中できないため、翻訳中の脳腫瘍の変形の問題に悩まされている。本稿では, 腫瘍領域を正確に知覚・翻訳できる, UTAD-Net と呼ばれる教師なしの腫瘍認識蒸留による教師・学生ネットワークを提案する。具体的には,本モデルは教師ネットワークと学生ネットワークの2つの部分から構成される。教師ネットワークは、まず、未ペア画像と対応する腫瘍マスクを用いて、ソースからターゲットモダリティへのエンドツーエンドマッピングを学習する。そして、翻訳知識を学生ネットワークに蒸留し、マスクなしでより現実的な腫瘍領域と画像全体を生成できるようにする。実験により, 画像品質の定量評価と定性評価の両面において, 最先端の手法と比較して競合性能が得られた。さらに、下流セグメンテーションタスクにおいて生成された画像の有効性を示す。私たちのコードは https://github.com/scut-HC/UTAD-Net で公開されています。 Multi-modal brain images from MRI scans are widely used in clinical diagnosis to provide complementary information from different modalities. However, obtaining fully paired multi-modal images in practice is challenging due to various factors, such as time, cost, and artifacts, resulting in modality-missing brain images. To address this problem, unsupervised multi-modal brain image translation has been extensively studied. Existing methods suffer from the problem of brain tumor deformation during translation, as they fail to focus on the tumor areas when translating the whole images. In this paper, we propose an unsupervised tumor-aware distillation teacher-student network called UTAD-Net, which is capable of perceiving and translating tumor areas precisely. Specifically, our model consists of two parts: a teacher network and a student network. The teacher network learns an end-to-end mapping from source to target modality using unpaired images and corresponding tumor masks first. Then, the translation knowledge is distilled into the student network, enabling it to generate more realistic tumor areas and whole images without masks. 
Experiments show that our model achieves competitive performance on both quantitative and qualitative evaluations of image quality compared with state-of-the-art methods. Furthermore, we demonstrate the effectiveness of the generated images on downstream segmentation tasks. Our code is available at https://github.com/scut-HC/UTAD-Net.	翻訳日:2024-04-26 20:38:42 公開日:2024-04-24
# ピクセルからグラフへ:視覚言語モデルを用いたオープン語彙シーングラフ生成 From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models ( http://arxiv.org/abs/2404.00906v3 ) ライセンス: Link先を確認	Rongjie Li, Songyang Zhang, Dahua Lin, Kai Chen, Xuming He,	(参考訳) シーングラフ生成(SGG)は、下流の推論タスクのための中間グラフ表現に視覚シーンを解析することを目的としている。近年の進歩にもかかわらず、既存の手法は、新しい視覚的関係の概念を持つシーングラフを生成するのに苦労している。この課題に対処するために、シークエンス生成に基づく新しいオープン語彙SGGフレームワークを導入する。我々のフレームワークは、画像からグラフへの生成パラダイムを取り入れた視覚言語事前学習モデル(VLM)を活用している。具体的には,VLMを用いた画像からテキストへの生成によってシーングラフのシーケンスを生成し,これらのシーケンスからシーングラフを構築する。これにより、オープン語彙SGGにおけるVLMの強みを活用し、VLタスクを強化するための明示的リレーショナルモデリングをシームレスに統合する。実験結果から,我々の設計はオープンな語彙で優れた性能を達成できるだけでなく,明示的な関係モデリング知識を通じて,下流の視覚言語タスク性能を向上させることが示唆された。 Scene graph generation (SGG) aims to parse a visual scene into an intermediate graph representation for downstream reasoning tasks. Despite recent advancements, existing methods struggle to generate scene graphs with novel visual relation concepts. To address this challenge, we introduce a new open-vocabulary SGG framework based on sequence generation. Our framework leverages vision-language pre-trained models (VLM) by incorporating an image-to-graph generation paradigm. Specifically, we generate scene graph sequences via image-to-text generation with VLM and then construct scene graphs from these sequences. By doing so, we harness the strong capabilities of VLM for open-vocabulary SGG and seamlessly integrate explicit relational modeling for enhancing the VL tasks. Experimental results demonstrate that our design not only achieves superior performance with an open vocabulary but also enhances downstream vision-language task performance through explicit relation modeling knowledge.	翻訳日:2024-04-26 20:38:42 公開日:2024-04-24
# Poro 34Bと多言語性の祝福 Poro 34B and the Blessing of Multilinguality ( http://arxiv.org/abs/2404.01856v2 ) ライセンス: Link先を確認	Risto Luukkonen, Jonathan Burdge, Elaine Zosa, Aarne Talman, Ville Komulainen, Väinö Hatanpää, Peter Sarlin, Sampo Pyysalo,	(参考訳) 最先端の大規模言語モデルの事前訓練は、今や数兆ワードのテキストを必要としており、これは大多数の言語で利用できるものよりも桁違いに多い。複数の言語のテキストを含めることは、より多くの事前訓練データを取得するための明らかな方法であるが、多言語性はしばしば呪いと見なされ、ほとんどのモデル訓練の取り組みは依然として個々の大規模言語にほぼ限定して焦点を当てている。我々は、多言語性は祝福になり得ると考えており、多言語学習を通じて、小規模言語において単言語モデルの能力を大幅に上回ることが可能であるはずだと信じている。本研究では, フィンランド語, 英語, プログラミング言語の1兆トークンで訓練された340億パラメータのモデルであるPoro 34Bを紹介し, 多言語学習アプローチは, 既存のフィンランド語モデルの能力よりも大幅に進歩するだけでなく, 翻訳に優れ, 英語やプログラミング言語の生成においてそのクラスにおいて競争力を持つモデルを生成することができることを示した。我々は、オープンライセンスの下でモデルパラメータ、スクリプト、データをhttps://huggingface.co/LumiOpen/Poro-34Bでリリースします。 The pretraining of state-of-the-art large language models now requires trillions of words of text, which is orders of magnitude more than available for the vast majority of languages. While including text in more than one language is an obvious way to acquire more pretraining data, multilinguality is often seen as a curse, and most model training efforts continue to focus near-exclusively on individual large languages. We believe that multilinguality can be a blessing and that it should be possible to substantially improve over the capabilities of monolingual models for small languages through multilingual training. In this study, we introduce Poro 34B, a 34 billion parameter model trained for 1 trillion tokens of Finnish, English, and programming languages, and demonstrate that a multilingual training approach can produce a model that not only substantially advances over the capabilities of existing models for Finnish, but also excels in translation and is competitive in its class in generating English and programming languages. We release the model parameters, scripts, and data under open licenses at https://huggingface.co/LumiOpen/Poro-34B.	翻訳日:2024-04-26 20:38:42 公開日:2024-04-24
# DeepFunction:関数データを用いたスレッドパイプ接続欠陥の診断のための深度学習に基づく不均衡分類 DeepFunction: Deep Metric Learning-based Imbalanced Classification for Diagnosing Threaded Pipe Connection Defects using Functional Data ( http://arxiv.org/abs/2404.03329v2 ) ライセンス: Link先を確認	Yukun Xie, Juan Du, Chen Zhang,	(参考訳) 現代の製造業では、ほとんどの製品ラインが適合している。非コンフォーミングな製品はほとんどないが、欠陥タイプが異なる。欠陥型の同定は、生産ラインのさらなる根本原因診断に役立つ。センサの開発により、プロセス変数の信号を高分解能で収集することができ、マルチチャネル機能データと見なすことができる。プロセスの特徴と欠陥のタイプを特定するのに役立つ、豊富な情報があります。パイプの締め付けプロセスの実際の例に触発され、各サンプルが多チャンネル関数データである欠陥分類に焦点をあてる。しかし、各欠陥タイプのサンプルは制限され、不均衡である。また、パイプ締め付け工程の前の密閉前工程が未保存であるため、機能は不完全である。不均衡、マルチチャネル、不完全な機能データに基づいて欠陥サンプルを分類するのは非常に重要であるが困難である。そこで我々は,関数型データ(DeepFunction)を用いたディープメトリック学習に基づく,革新的な分類フレームワークを提案する。このフレームワークは、深いメトリック学習の力を活用して、不均衡なデータセットをトレーニングする。関数データを処理するために特別に設計されたニューラルネットワークも、多チャンネルおよび不完全な関数データを扱うために提案されている。実世界のケーススタディの結果は、既存のベンチマークと比較すると、我々のフレームワークの精度が優れていることを示している。 In modern manufacturing, most of the product lines are conforming. Few products are nonconforming but with different defect types. The identification of defect types can help further root cause diagnosis of production lines. With the sensing development, signals of process variables can be collected in high resolution, which can be regarded as multichannel functional data. They have abundant information to characterize the process and help identify the defect types. Motivated by a real example from the pipe tightening process, we focus on defect classification where each sample is a multichannel functional data. However, the available samples for each defect type are limited and imbalanced. Moreover, the functions are incomplete since the pre-tightening process before the pipe tightening process is unobserved. To classify the defect samples based on imbalanced, multichannel, and incomplete functional data is very important but challenging. Thus, we propose an innovative classification framework based on deep metric learning using functional data (DeepFunction). 
The framework leverages the power of deep metric learning to train on imbalanced datasets. A neural network specially crafted for processing functional data is also proposed to handle multichannel and incomplete functional data. The results from a real-world case study demonstrate the superior accuracy of our framework when compared to existing benchmarks.	翻訳日:2024-04-26 20:38:42 公開日:2024-04-24
# 十分でないなら、そのようにしよう:合成顔を通して顔認識における認証データの需要を減らす If It's Not Enough, Make It So: Reducing Authentic Data Demand in Face Recognition through Synthetic Faces ( http://arxiv.org/abs/2404.03537v3 ) ライセンス: Link先を確認	Andrea Atzori, Fadi Boutros, Naser Damer, Gianni Fenu, Mirko Marras,	(参考訳) 近年の深層顔認識の進歩は、大規模で多様で手動で注釈付けされた顔データセットの需要を増大させてきた。顔認識のための真正で高品質なデータを取得することは、主にプライバシー上の懸念から、困難であることが証明されている。大規模な顔データセットは、主にWebベースのイメージから作成され、明示的なユーザの同意が欠如している。本稿では,合成顔データを用いて実画像に頼らずに効果的な顔認識モデルを訓練し,データ収集の懸念を緩和する方法について検討する。まず,最新の顔認識モデルの性能ギャップについて検討し,合成データのみと認証データのみを用いて訓練した。そこで我々は,最先端のバックボーンを様々な合成データと認証データの組み合わせで訓練することにより,分析をより深め,検証精度の確保のために,後者の限られた使用法を最適化するための洞察を得た。最後に、同じ目的を念頭において、データ拡張アプローチが合成データおよび認証データに与える影響を評価した。以上の結果から,統合データセットでトレーニングしたFRの有効性,特に適切な拡張手法と組み合わせた場合のFRの有効性が明らかとなった。 Recent advances in deep face recognition have spurred a growing demand for large, diverse, and manually annotated face datasets. Acquiring authentic, high-quality data for face recognition has proven to be a challenge, primarily due to privacy concerns. Large face datasets are primarily sourced from web-based images, lacking explicit user consent. In this paper, we examine whether and how synthetic face data can be used to train effective face recognition models with reduced reliance on authentic images, thereby mitigating data collection concerns. First, we explored the performance gap among recent state-of-the-art face recognition models, trained with synthetic data only and authentic (scarce) data only. Then, we deepened our analysis by training a state-of-the-art backbone with various combinations of synthetic and authentic data, gaining insights into optimizing the limited use of the latter for verification accuracy. Finally, we assessed the effectiveness of data augmentation approaches on synthetic and authentic data, with the same goal in mind. Our results highlighted the effectiveness of FR trained on combined datasets, particularly when combined with appropriate augmentation techniques.	
翻訳日:2024-04-26 20:38:42 公開日:2024-04-24
# SAAS:大規模言語モデルにおける数学的推論強化のための問題解決能力向上戦略 SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models ( http://arxiv.org/abs/2404.03887v3 ) ライセンス: Link先を確認	Hyeonwoo Kim, Gyoungjin Gim, Yungi Kim, Jihoo Kim, Byungju Kim, Wonseok Lee, Chanjun Park,	(参考訳) 本研究では,Large Language Models (LLM) の数学的推論と問題解決能力の向上を目的とした,新しい学習手法を提案する。我々は,CoT(Chain-of-Thought)とPoT(Program-of-Thought)の学習を統合することに集中し,数学的推論能力の学習の優先順位付けが問題解決能力の増幅に役立つと仮定した。したがって、CoTによる初期学習は、問題の解決に不可欠である。そこで本研究では,CoT学習からPoT学習へ戦略的に移行する,SAAS(Solving Ability Amplification Strategy)という逐次学習手法を提案する。いくつかのベンチマークによる広範な性能比較を含む実証研究により,SAASがSOTA(State-of-the-art)の性能を達成することを示す。その結果, LLMにおける数学的推論の分野において, 逐次学習手法の有効性が著しく向上していることが示唆された。 This study presents a novel learning approach designed to enhance both mathematical reasoning and problem-solving abilities of Large Language Models (LLMs). We focus on integrating the Chain-of-Thought (CoT) and the Program-of-Thought (PoT) learning, hypothesizing that prioritizing the learning of mathematical reasoning ability is helpful for the amplification of problem-solving ability. Thus, the initial learning with CoT is essential for solving challenging mathematical problems. To this end, we propose a sequential learning approach, named SAAS (Solving Ability Amplification Strategy), which strategically transitions from CoT learning to PoT learning. Our empirical study, involving an extensive performance comparison using several benchmarks, demonstrates that our SAAS achieves state-of-the-art (SOTA) performance. The results underscore the effectiveness of our sequential learning approach, marking a significant advancement in the field of mathematical reasoning in LLMs.	翻訳日:2024-04-26 20:38:42 公開日:2024-04-24
# コード表現の強化によるグラフニューラルネットによる障害位置推定の改善に向けて Towards Better Graph Neural Neural Network-based Fault Localization Through Enhanced Code Representation ( http://arxiv.org/abs/2404.04496v3 ) ライセンス: Link先を確認	Md Nakhla Rafi, Dong Jae Kim, An Ran Chen, Tse-Hsun Chen, Shaowei Wang,	(参考訳) 自動ソフトウェアフォールトローカライゼーションは、デバッグを容易にするために故障箇所をピンポイントすることで、ソフトウェア品質保証において重要な役割を果たす。広く使われている手法であるカバレッジベースのフォールトローカライゼーションでは、被疑点スコアに基づいたコードランク付けにカバレッジスペクトルの統計を用いる。しかし、統計的アプローチの剛性は、学習に基づく技術を要求する。中でもグラフニューラルネットワーク(GNN)に基づくグラフニューラルネットワーク(Grace)は,特徴表現を圧縮する他の学習手法の制限を緩和する,厳密な抽象構文強化グラフ表現として,テストとソースのカバレッジ関係を保存する能力によって,最先端技術を実現している。しかし、そのような表現は、ソフトウェアと関連するカバレッジスペクトルとASTグラフの複雑さの増大によりスケーラビリティに苦慮している。本研究では,ノードやエッジにおけるグラフ表現の複雑さを70%削減する新しいグラフ表現であるDepGraphを提案する。さらに,属性としてグラフ内のコード変更情報などの付加的機能を統合し,そのモデルが豊富な歴史的プロジェクトデータを活用できるようにする。 Defects4j 2.0.0を用いてDepGraphを評価し,Top-1における20%以上の障害の所在と平均一位と平均平均ランク(MAR)を50%以上改善し,GPUメモリ使用率を44%削減し,トレーニング/推論時間を85%向上させた。さらに、クロスプロジェクト環境では、DepGraphは最先端のベースラインを超え、Top-1の精度が42%、MFRとMARが68%、MARが65%向上している。我々の研究は、DepGraphの堅牢性、最先端の精度、将来の拡張と採用のためのスケーラビリティを実証する。 Automatic software fault localization plays an important role in software quality assurance by pinpointing faulty locations for easier debugging. Coverage-based fault localization, a widely used technique, employs statistics on coverage spectra to rank code based on suspiciousness scores. However, the rigidity of statistical approaches calls for learning-based techniques. Amongst all, Grace, a graph-neural network (GNN) based technique has achieved state-of-the-art due to its capacity to preserve coverage spectra, i.e., test-to-source coverage relationships, as precise abstract syntax-enhanced graph representation, mitigating the limitation of other learning-based technique which compresses the feature representation. However, such representation struggles with scalability due to the increasing complexity of software and associated coverage spectra and AST graphs. 
In this work, we proposed a new graph representation, DepGraph, that reduces the complexity of the graph representation by 70% in nodes and edges by integrating interprocedural call graph in the graph representation of the code. Moreover, we integrate additional features such as code change information in the graph as attributes so the model can leverage rich historical project data. We evaluate DepGraph using Defects4j 2.0.0, and it outperforms Grace by locating 20% more faults in Top-1 and improving the Mean First Rank (MFR) and the Mean Average Rank (MAR) by over 50% while decreasing GPU memory usage by 44% and training/inference time by 85%. Additionally, in cross-project settings, DepGraph surpasses the state-of-the-art baseline with a 42% higher Top-1 accuracy, and 68% and 65% improvement in MFR and MAR, respectively. Our study demonstrates DepGraph's robustness, achieving state-of-the-art accuracy and scalability for future extension and adoption.	翻訳日:2024-04-26 20:38:42 公開日:2024-04-24
# MA-LMM:長期ビデオ理解のためのメモリ拡張大型マルチモーダルモデル MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding ( http://arxiv.org/abs/2404.05726v2 ) ライセンス: Link先を確認	Bo He, Hengduo Li, Young Kyun Jang, Menglin Jia, Xuefei Cao, Ashish Shah, Abhinav Shrivastava, Ser-Nam Lim,	(参考訳) 大型言語モデル(LLM)の成功により、ビジョンモデルをLLMに統合してビジョン言語基盤モデルを構築することが、近年大きな注目を集めている。しかし、既存のLLMベースの大規模マルチモーダルモデル(例えば、Video-LLaMA、VideoChat)は、短いビデオの理解のために限られた数のフレームしか取り込めない。本研究では,長期的映像理解のための効率的かつ効果的なモデルの設計に主眼を置いている。既存の多くの研究のようにより多くのフレームを同時に処理しようとするのではなく、オンラインで動画を処理し、過去の映像情報をメモリバンクに保存することを提案する。これにより、LLMのコンテキスト長制約やGPUメモリ制限を超過することなく、長期解析のために過去の映像コンテンツを参照することが可能となる。私たちのメモリバンクは、既存のマルチモーダルLLMにそのままシームレスに統合できます。我々は,長時間映像理解,ビデオ質問応答,ビデオキャプションなど,様々な映像理解タスクに関する広範な実験を行い,そのモデルにより,複数のデータセットにわたる最先端のパフォーマンスを実現することができる。コードはhttps://boheumd.github.io/MA-LMM/で公開されている。 With the success of large language models (LLMs), integrating the vision model into LLMs to build vision-language foundation models has gained much more interest recently. However, existing LLM-based large multimodal models (e.g., Video-LLaMA, VideoChat) can only take in a limited number of frames for short video understanding. In this study, we mainly focus on designing an efficient and effective model for long-term video understanding. Instead of trying to process more frames simultaneously like most existing work, we propose to process videos in an online manner and store past video information in a memory bank. This allows our model to reference historical video content for long-term analysis without exceeding LLMs' context length constraints or GPU memory limits. Our memory bank can be seamlessly integrated into current multimodal LLMs in an off-the-shelf manner. We conduct extensive experiments on various video understanding tasks, such as long-video understanding, video question answering, and video captioning, and our model can achieve state-of-the-art performances across multiple datasets. Code available at https://boheumd.github.io/MA-LMM/.	翻訳日:2024-04-26 20:28:54 公開日:2024-04-24
# 定段非平滑型SAのプリリミット結合と定常収束 Prelimit Coupling and Steady-State Convergence of Constant-stepsize Nonsmooth Contractive SA ( http://arxiv.org/abs/2404.06023v2 ) ライセンス: Link先を確認	Yixuan Zhang, Dongyan Huo, Yudong Chen, Qiaomin Xie,	(参考訳) Q-learningによって動機づけられ, 定常段階の非滑らかな収縮性確率近似 (SA) について検討した。ダイナミクスの2つの重要なクラスに焦点を当てます。 1)付加雑音を有する非平滑な収縮型SA 2) 加法ノイズと乗法ノイズの両方を特徴とする同期および非同期Q-ラーニング。どちらの力学に対しても、ワッサーシュタイン距離の定常極限分布に反復体の弱収束を確立する。さらに,定常収束を確立するためのプリリミット結合手法を提案し,ステップサイズがゼロになるにつれて定常分布の限界を特徴づける。この結果から、非滑らかなSAの漸近バイアスは、滑らかなSAと鋭い対照的なステップサイズの平方根に比例することを示した。このバイアス特性により、非滑らかなSAのバイアス低減にリチャードソン・ロームバーグ外挿を用いることができる。 Motivated by Q-learning, we study nonsmooth contractive stochastic approximation (SA) with constant stepsize. We focus on two important classes of dynamics: 1) nonsmooth contractive SA with additive noise, and 2) synchronous and asynchronous Q-learning, which features both additive and multiplicative noise. For both dynamics, we establish weak convergence of the iterates to a stationary limit distribution in Wasserstein distance. Furthermore, we propose a prelimit coupling technique for establishing steady-state convergence and characterize the limit of the stationary distribution as the stepsize goes to zero. Using this result, we derive that the asymptotic bias of nonsmooth SA is proportional to the square root of the stepsize, which stands in sharp contrast to smooth SA. This bias characterization allows for the use of Richardson-Romberg extrapolation for bias reduction in nonsmooth SA.	翻訳日:2024-04-26 20:28:54 公開日:2024-04-24
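The bias result above (asymptotic bias proportional to the square root of the stepsize) suggests how a Richardson-Romberg combination could be set up. The following is a minimal sketch, not the authors' construction: it assumes the steady-state mean at stepsize `alpha` is `x_star + c * sqrt(alpha)` to leading order, and the names `rr_extrapolate_sqrt`, `est_small`, `est_large`, and `ratio` are hypothetical.

```python
import math


def rr_extrapolate_sqrt(est_small, est_large, ratio):
    """Cancel a leading bias term of the form c * sqrt(stepsize).

    est_small: estimate obtained with stepsize alpha
    est_large: estimate obtained with stepsize ratio * alpha (ratio > 1)
    Assumption: estimate(alpha) = x_star + c * sqrt(alpha) + higher-order terms.
    """
    r = math.sqrt(ratio)
    # r * est_small - est_large = (r - 1) * x_star once the sqrt terms cancel
    return (r * est_small - est_large) / (r - 1.0)


# Synthetic check with an exactly square-root bias (illustrative values only)
x_star, c, alpha = 1.0, 0.3, 0.01
est_a = x_star + c * math.sqrt(alpha)
est_b = x_star + c * math.sqrt(4 * alpha)
corrected = rr_extrapolate_sqrt(est_a, est_b, ratio=4)
```

With a smooth SA, where the leading bias is usually taken to be linear in the stepsize, the weights would differ; any higher-order terms are not removed by this combination.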
# NoiseNCA:ノイズのあるシードがニューラルセルオートマタの時空間連続性を改善する NoiseNCA: Noisy Seed Improves Spatio-Temporal Continuity of Neural Cellular Automata ( http://arxiv.org/abs/2404.06279v2 ) ライセンス: Link先を確認	Ehsan Pajouheshgar, Yitao Xu, Sabine Süsstrunk,	(参考訳) ニューラルセルオートマタ(Neural Cellular Automata、NCA)はセルオートマタの一種で、ニューラルネットワークによって更新ルールをパラメータ化して、勾配降下を用いてトレーニングすることができる。本稿では, 反応拡散系を記述する偏微分方程式 (PDE) に着想を得て, テクスチャ合成に使用されるNCAモデルに着目した。 NCAモデルをトレーニングするために、時空間領域を離散化し、オイラー積分を用いてPDEを数値シミュレーションする。しかし、訓練されたNCAが、対応するPDEによって記述される連続力学を真に学習するかどうか、あるいは単にトレーニングで使用される離散化に過度に適合するだけなのかは、未解決の問題である。時空離散化が連続性に近づく極限において, NCA モデルについて検討する。既存のNCAモデルは、特に「シード」とも呼ばれる初期状態に近い場合、トレーニングの離散化に過度に適合する傾向にある。そこで本研究では,一様雑音を初期条件とする解を提案する。本研究では, NCA の動的一貫性を幅広い時空間的粒度にわたって維持する手法の有効性を実証する。 NCAモデルの改良により、パターン生成速度と合成パターンのスケールを連続的に制御し、2つの新しいテスト時間相互作用が可能となった。インタラクティブなオンラインデモでは、この新しいNCA機能を実演しています。我々の研究は、NCAモデルが連続力学を学習し、動的システムの観点からNCA研究の新たな場を開くことを明らかにしている。 Neural Cellular Automata (NCA) is a class of Cellular Automata where the update rule is parameterized by a neural network that can be trained using gradient descent. In this paper, we focus on NCA models used for texture synthesis, where the update rule is inspired by partial differential equations (PDEs) describing reaction-diffusion systems. To train the NCA model, the spatio-temporal domain is discretized, and Euler integration is used to numerically simulate the PDE. However, whether a trained NCA truly learns the continuous dynamics described by the corresponding PDE or merely overfits the discretization used in training remains an open question. We study NCA models at the limit where space-time discretization approaches continuity. We find that existing NCA models tend to overfit the training discretization, especially in the proximity of the initial condition, also called "seed". To address this, we propose a solution that utilizes uniform noise as the initial condition. 
We demonstrate the effectiveness of our approach in preserving the consistency of NCA dynamics across a wide range of spatio-temporal granularities. Our improved NCA model enables two new test-time interactions by allowing continuous control over the speed of pattern formation and the scale of the synthesized patterns. We demonstrate this new NCA feature in our interactive online demo. Our work reveals that NCA models can learn continuous dynamics and opens new venues for NCA research from a dynamical systems' perspective.	翻訳日:2024-04-26 20:28:54 公開日:2024-04-24
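The NoiseNCA abstract describes Euler integration of a cell-update rule started from a uniform-noise seed. A minimal sketch of that loop follows, under loud assumptions: the real update rule is a trained neural network, which is replaced here by a placeholder 1D diffusion rule (`laplacian_rule`), and all function names are hypothetical. It only illustrates what "consistent across time discretizations" means: running the same total time with two different step sizes should give similar states.

```python
import random


def noise_seed(n_cells, rng):
    """Uniform-noise initial condition (instead of a fixed seed state)."""
    return [rng.uniform(0.0, 1.0) for _ in range(n_cells)]


def laplacian_rule(state):
    """Placeholder update rule: discrete 1D Laplacian with periodic boundary.
    A real NCA would evaluate a small neural network per cell here."""
    n = len(state)
    return [state[(i - 1) % n] - 2 * state[i] + state[(i + 1) % n] for i in range(n)]


def euler_step(state, dt, update_rule):
    """One explicit Euler step: s <- s + dt * f(s)."""
    delta = update_rule(state)
    return [s + dt * d for s, d in zip(state, delta)]


def simulate(state, total_time, dt, update_rule):
    """Integrate for total_time using steps of size dt."""
    for _ in range(round(total_time / dt)):
        state = euler_step(state, dt, update_rule)
    return state


rng = random.Random(0)
seed = noise_seed(16, rng)
coarse = simulate(seed, total_time=1.0, dt=0.1, update_rule=laplacian_rule)
fine = simulate(seed, total_time=1.0, dt=0.05, update_rule=laplacian_rule)
max_gap = max(abs(a - b) for a, b in zip(coarse, fine))
```

For a learned rule, a small `max_gap` across step sizes is the kind of consistency the abstract refers to; the placeholder rule satisfies it trivially because it is a stable linear diffusion.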
# 指数重み付き移動モデル Exponentially Weighted Moving Models ( http://arxiv.org/abs/2404.08136v2 ) ライセンス: Link先を確認	Eric Luxenberg, Stephen Boyd,	(参考訳) ベクトル時系列に対する指数重み付き移動モデル(EWMM)は、過去の観測データに対する指数重み付き損失関数に基づいて、時間毎に新しいデータモデルに適合する。指数重み付き移動平均(EWMA)は、平方損失関数を用いて平均を推定する特殊なケースである。二次損失関数に対して、EWMMは2次関数のパラメータを更新する単純な再帰を用いて適合することができる。他の損失関数の場合、過去の履歴全体が保存されなければならない。本稿では,過去のサンプルの固定数のウィンドウのみを格納するEWMMの近似計算法を提案する。この近似EWMMは凸最適化に依存し、時間とともに成長しない問題を解く。近似から得られた推定値と正確なEWMM法による推定値を比較する。 An exponentially weighted moving model (EWMM) for a vector time series fits a new data model each time period, based on an exponentially fading loss function on past observed data. The well known and widely used exponentially weighted moving average (EWMA) is a special case that estimates the mean using a square loss function. For quadratic loss functions EWMMs can be fit using a simple recursion that updates the parameters of a quadratic function. For other loss functions, the entire past history must be stored, and the fitting problem grows in size as time increases. We propose a general method for computing an approximation of EWMM, which requires storing only a window of a fixed number of past samples, and uses an additional quadratic term to approximate the loss associated with the data before the window. This approximate EWMM relies on convex optimization, and solves problems that do not grow with time. We compare the estimates produced by our approximation with the estimates from the exact EWMM method.	翻訳日:2024-04-26 20:28:54 公開日:2024-04-24
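The EWMM abstract notes that the EWMA is the special case with a square loss and that quadratic losses admit a simple recursion. The sketch below illustrates that special case for a scalar series: the recursive estimate matches the direct minimizer of the exponentially weighted square loss. The forgetting factor `beta` and the function names are assumptions for illustration, not notation taken from the paper.

```python
def ewma_recursive(xs, beta):
    """Running minimizers of sum_tau beta**(t - tau) * (x_tau - mu)**2.

    Only two numbers are carried forward, so no history needs to be stored.
    """
    s = 0.0  # exponentially weighted sum of observations
    w = 0.0  # exponentially weighted sum of weights
    estimates = []
    for x in xs:
        s = beta * s + x
        w = beta * w + 1.0
        estimates.append(s / w)
    return estimates


def ewma_direct(xs, beta):
    """Closed-form minimizer at the final time step, using the full history."""
    t = len(xs)
    weights = [beta ** (t - 1 - i) for i in range(t)]
    return sum(wi * xi for wi, xi in zip(weights, xs)) / sum(weights)


xs = [1.0, 2.0, 4.0, 3.0]
recursive_last = ewma_recursive(xs, beta=0.5)[-1]
direct_last = ewma_direct(xs, beta=0.5)
```

For non-quadratic losses no such two-number summary exists in general, which is the gap the windowed approximation in the abstract is meant to address.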
# 量子エミッタからの時間結合絡み合った光子の理論 Theory of time-bin entangled photons from quantum emitters ( http://arxiv.org/abs/2404.08348v2 ) ライセンス: Link先を確認	Thomas K. Bracht, Florian Kappe, Moritz Cygorek, Tim Seidelmann, Yusuf Karli, Vikas Remesh, Gregor Weihs, Vollrath Martin Axt, Doris E. Reiter,	(参考訳) 絡み合った光子対は、量子通信の領域における多くの応用の基礎となる。絡み合った光子対の光ファイバー移動では、時間ビン符号化は偏光符号化量子ビットに比べて安定性が向上する可能性がある。ここでは、時間双絡光子の測定を記述するための理論的基礎を定めている。我々は、量子状態トモグラフィー測定に対応する時間ビン符号化光子対の多重時間相関関数を導出する。我々の理論は、量子エミッタからの時間ビン絡みの現実的なシミュレーションのために、特定の量子システムに適用されるあらゆる種類の損失やデコヒーレンス効果を含むようにシミュレーションを拡張する出発点となる。 Entangled photon pairs form the foundation for many applications in the realm of quantum communication. For fiber-optic transfer of entangled photon pairs, time-bin encoding can potentially offer an improved stability compared to polarization encoded qubits. Here, we lay the theoretical foundations to describe the measurement of time-bin entangled photons. We derive multi-time correlation functions of the time-bin encoded photon pairs, corresponding to quantum state tomographic measurements. Our theory can be the starting point to extend the simulations to include all kinds of loss or decoherence effects that apply in a specific quantum system for realistic simulation for time-bin entanglement from quantum emitters.	翻訳日:2024-04-26 20:28:54 公開日:2024-04-24
# 大規模言語モデルを用いた次世代データインタラクションシステムDB-GPTの実証 Demonstration of DB-GPT: Next Generation Data Interaction System Empowered by Large Language Models ( http://arxiv.org/abs/2404.10209v3 ) ライセンス: Link先を確認	Siqiao Xue, Danrui Qi, Caigao Jiang, Wenhui Shi, Fangyin Cheng, Keting Chen, Hongjun Yang, Zhiping Zhang, Jianshan He, Hongyang Zhang, Ganglin Wei, Wang Zhao, Fan Zhou, Hong Yi, Shaodong Liu, Hongjun Yang, Faqiang Chen,	(参考訳) 大規模言語モデル(LLM)の最近のブレークスルーは、ソフトウェアの多くの領域を移行する位置にある。データと対話する技術は、特にLLMと重要な絡み合いを持ち、効率的で直感的なデータインタラクションが最重要である。本稿では,従来のデータインタラクションタスクにLLMを統合し,ユーザエクスペリエンスとアクセシビリティを向上させる,革新的で製品対応のPythonライブラリDB-GPTを提案する。 DB-GPTは、自然言語で記述されたデータインタラクションタスクを理解し、LLMによるコンテキスト認識応答を提供するように設計されており、初心者から専門家まで、ユーザにとって必須のツールである。システム設計は、ローカル、分散、およびクラウド環境へのデプロイをサポートする。 LLMでText-to-SQLのような基本的なデータインタラクションタスクを扱うだけでなく、Multi-AgentsフレームワークやAエージェントワークフロー表現言語(AWEL)を通じて生成データ分析のような複雑なタスクを処理できる。サービス指向マルチモデル管理フレームワーク(SMMF)は、データのプライバシとセキュリティを保証する。さらに、DB-GPTは、ユーザがDB-GPTを製品環境に簡単に統合できるように設計された一連の製品対応機能を提供している。 DB-GPTのコードはGithub(https://github.com/eosphoros-ai/DB-GPT)で公開されている。手順(https://github.com/eosphoros-ai/DB-GPT#install)でDB-GPTをインストールし、Youtube(https://youtu.be/n_8RI1ENyl4)で5分間の紹介ビデオを見て、DB-GPTをさらに調査してください。 The recent breakthroughs in large language models (LLMs) are positioned to transition many areas of software. The technologies of interacting with data particularly have an important entanglement with LLMs as efficient and intuitive data interactions are paramount. In this paper, we present DB-GPT, a revolutionary and product-ready Python library that integrates LLMs into traditional data interaction tasks to enhance user experience and accessibility. DB-GPT is designed to understand data interaction tasks described by natural language and provide context-aware responses powered by LLMs, making it an indispensable tool for users ranging from novice to expert. Its system design supports deployment across local, distributed, and cloud environments. 
Beyond handling basic data interaction tasks like Text-to-SQL with LLMs, it can handle complex tasks like generative data analysis through a Multi-Agents framework and the Agentic Workflow Expression Language (AWEL). The Service-oriented Multi-model Management Framework (SMMF) ensures data privacy and security, enabling users to employ DB-GPT with private LLMs. Additionally, DB-GPT offers a series of product-ready features designed to enable users to integrate DB-GPT within their product environments easily. The code of DB-GPT is available on GitHub (https://github.com/eosphoros-ai/DB-GPT), where it already has over 10.7k stars. Please install DB-GPT for your own usage with the instructions (https://github.com/eosphoros-ai/DB-GPT#install) and watch a 5-minute introduction video on YouTube (https://youtu.be/n_8RI1ENyl4) to further investigate DB-GPT.	翻訳日:2024-04-26 20:28:54 公開日:2024-04-24
# 騒音測定の浄化と密閉の忠実な蒸留 Purification of Noisy Measurements and Faithful Distillation of Entanglement ( http://arxiv.org/abs/2404.10538v2 ) ライセンス: Link先を確認	Jaemin Kim, Jiyoung Yun, Joonwoo Bae,	(参考訳) 一般量子演算を構成する量子測度が特にノイズとなるような,ノイズを伴う現実的なシナリオにおけるエンタングルメント蒸留について考察する。本報告では, ノイズ測定を浄化するプロトコルについて述べるとともに, 浄化の助けを借りて, 不完全な局所操作を蒸留に利用できることを示す。提案手法は, 実装時のノイズに対して堅牢であることを示すとともに, 実用化時の浄化を解析し, 測定およびゲート誤差を最大10%まで低減するために, 2つの追加量子ビットによる浄化は, 絡み合わせを蒸留するのに費用対効果があることを示唆する。精製プロトコルは、現在利用可能な量子技術で実現可能であり、絡み合いアプリケーションに容易に適用できる。 We consider entanglement distillation in a realistic scenario with noisy operations in which quantum measurements that constitute a general quantum operation are particularly noisy. We present a protocol for purifying noisy measurements and show that with the help of the purification, imperfect local operations can be used to distill entanglement. We show that the purification protocol is robust against noise in implementation and analyze the purification in a practical realization: for measurement and gate errors up to 10%, we suggest that the purification with two additional qubits is cost-effective for distilling entanglement. The purification protocol is feasible with currently available quantum technologies and readily applied to entanglement applications.	翻訳日:2024-04-26 20:28:54 公開日:2024-04-24
# 異常検出のための散乱変換を用いたグラフニューラルネットワークの統合 Integrating Graph Neural Networks with Scattering Transform for Anomaly Detection ( http://arxiv.org/abs/2404.10800v3 ) ライセンス: Link先を確認	Abdeljalil Zoubir, Badr Missaoui,	(参考訳) 本稿では,グラフニューラルネットワーク(GNN)を用いたネットワーク侵入検知システム(NIDS)における2つの新しい手法を提案する。最初のアプローチであるScattering Transform with E-GraphSAGE (STEG)は、散乱変換を用いてエッジ特徴ベクトルの多重分解能解析を行う。これは、ネットワークトラフィックの微妙な異常を特定するのに不可欠な詳細な表現を提供する。第2のアプローチでは、ノード表現をNode2Vecで開始することで改善し、統一値を使用する標準的な方法から逸脱し、より正確で全体的なネットワーク画像を取得する。提案手法は,ベンチマークNIDSデータセットにおける既存の最先端手法と比較して,性能が大幅に向上した。 In this paper, we present two novel methods in Network Intrusion Detection Systems (NIDS) using Graph Neural Networks (GNNs). The first approach, Scattering Transform with E-GraphSAGE (STEG), utilizes the scattering transform to conduct multi-resolution analysis of edge feature vectors. This provides a detailed representation that is essential for identifying subtle anomalies in network traffic. The second approach improves node representation by initiating with Node2Vec, diverging from standard methods of using uniform values, thereby capturing a more accurate and holistic network picture. Our methods have shown significant improvements in performance compared to existing state-of-the-art methods in benchmark NIDS datasets.	翻訳日:2024-04-26 20:28:54 公開日:2024-04-24
# 高速スパース入力動的ビュー合成のための分解運動場 Factorized Motion Fields for Fast Sparse Input Dynamic View Synthesis ( http://arxiv.org/abs/2404.11669v3 ) ライセンス: Link先を確認	Nagabhushan Somraj, Kapil Choudhary, Sai Harsha Mupparaju, Rajiv Soundararajan,	(参考訳) 高速な最適化とレンダリングのために動的シーンの3D表現を設計することは難しい作業である。最近の明示的な表現は動的放射場を高速に学習しレンダリングすることを可能にするが、それらには深い入力視点が必要である。本研究では,スパースな入力視点を持つ動的放射場に対する高速な表現の学習に焦点をあてる。しかし、スパース入力による最適化は非制約であり、学習を制約するためには、前もって動きを使う必要がある。既存の高速ダイナミックシーンモデルでは、動きを明示的にモデル化することはなく、動きの先行に制約されるのが困難である。運動場の時空間相関を生かし,高速な因子化4次元表現として明示的な動きモデルを設計する。次に、カメラ間のスパースフロー前処理と、カメラ内の密流前処理を組み合わせることで、動作モデルを調整することを含む、信頼性の高いフロー前処理を導入する。我々のモデルは高速でコンパクトであり、スパースな入力視点を持つ人気のあるマルチビュー動的シーンデータセット上で非常に優れた性能を実現している。私たちのモデルのソースコードは、プロジェクトページにある。 https://nagabhushansn95.github.io/publications/2024/RF-DeRF.html。 Designing a 3D representation of a dynamic scene for fast optimization and rendering is a challenging task. While recent explicit representations enable fast learning and rendering of dynamic radiance fields, they require a dense set of input viewpoints. In this work, we focus on learning a fast representation for dynamic radiance fields with sparse input viewpoints. However, the optimization with sparse input is under-constrained and necessitates the use of motion priors to constrain the learning. Existing fast dynamic scene models do not explicitly model the motion, making them difficult to be constrained with motion priors. We design an explicit motion model as a factorized 4D representation that is fast and can exploit the spatio-temporal correlation of the motion field. We then introduce reliable flow priors including a combination of sparse flow priors across cameras and dense flow priors within cameras to regularize our motion model. Our model is fast, compact and achieves very good performance on popular multi-view dynamic scene datasets with sparse input viewpoints. The source code for our model can be found on our project page: https://nagabhushansn95.github.io/publications/2024/RF-DeRF.html.	
翻訳日:2024-04-26 20:28:54 公開日:2024-04-24
# ダミーの格子手術 Lattice Surgery for Dummies ( http://arxiv.org/abs/2404.13202v2 ) ライセンス: Link先を確認	Avimita Chatterjee, Subrata Das, Swaroop Ghosh,	(参考訳) 量子誤り訂正(QEC)は、ノイズの修正とフォールトトレラント量子コンピューティングへの道を開く上で重要な役割を果たす。この分野は大幅に進歩し、新しい量子エラー訂正符号が頻繁に出現し、エラーに効果的に対処している。これらのうち、トポロジ的符号、特に表面符号は、誤差の低いしきい値と大規模量子コンピュータの実装の可能性で際立っている。しかし、これらの符号は1量子ビットの符号化に制限されている。格子手術は、複数の符号化量子ビット間の相互作用や、表面コードの格子間の相互作用を可能にするために重要であり、その高度な誤り訂正機能は、運用上のオーバーヘッドを大幅に増大させることなく維持される。格子手術は、より広範な量子系にまたがるQECCのスケーリングに重要である。その重要な重要性にもかかわらず、格子の手術を理解することは、その固有の複雑さのために困難であり、複雑な量子物理学と数学的概念の深い理解を必要としている。本論文は,格子手術のデミスティフィケーションを試み,量子物理学や数学の深い背景を持たない人にもアクセスできるようにする。この研究は、表面符号を探索し、格子手術の基礎を導入し、量子ゲートの構築とマルチキュービット回路のエミュレートにその応用を実証する。 Quantum error correction (QEC) plays a crucial role in correcting noise and paving the way for fault-tolerant quantum computing. This field has seen significant advancements, with new quantum error correction codes emerging regularly to address errors effectively. Among these, topological codes, particularly surface codes, stand out for their low error thresholds and feasibility for implementation in large-scale quantum computers. However, these codes are restricted to encoding a single qubit. Lattice surgery is crucial for enabling interactions among multiple encoded qubits or between the lattices of a surface code, ensuring that its sophisticated error-correcting features are maintained without significantly increasing the operational overhead. Lattice surgery is pivotal for scaling QECCs across more extensive quantum systems. Despite its critical importance, comprehending lattice surgery is challenging due to its inherent complexity, demanding a deep understanding of intricate quantum physics and mathematical concepts. This paper endeavors to demystify lattice surgery, making it accessible to those without a profound background in quantum physics or mathematics. 
This work explores surface codes, introduces the basics of lattice surgery, and demonstrates its application in building quantum gates and emulating multi-qubit circuits.	翻訳日:2024-04-26 20:19:09 公開日:2024-04-24
# RAW画像からの反射の除去 Removing Reflections from RAW Photos ( http://arxiv.org/abs/2404.14414v2 ) ライセンス: Link先を確認	Eric Kee, Adam Pikielny, Kevin Blackburn-Matzen, Marc Levoy,	(参考訳) 消費者写真用画像から現実世界の反射を除去するシステムについて述べる。本システムでは,リニア(RAW)写真に対して,モバイルデバイス上で自撮りカメラを使用すれば,リフレクション(リフレクション)を不明瞭にするためのコンテキスト写真の追加(オプション)を行う。このシステムは実世界のRAW画像の合成混合物を用いて訓練され、光学的かつ幾何学的に正確な反射シミュレーションを用いて合成される。提案システムは,取得した画像と任意の文脈写真を入力として受け入れ,256pで動作させるベースモデルと,256pで出力された256p画像をフル解像度に変換するアップサンプリングモデルから構成される。このシステムは、MacBookまたはiPhone 14 Proで1Kから4.5～6.5秒でレビュー用の画像を生成することができる。我々は、現場で撮影されたRAW写真をテストし、典型的な消費者向け写真を具現化した。 We describe a system to remove real-world reflections from images for consumer photography. Our system operates on linear (RAW) photos, with the (optional) addition of a contextual photo looking in the opposite direction, e.g., using the selfie camera on a mobile device, which helps disambiguate what should be considered the reflection. The system is trained using synthetic mixtures of real-world RAW images, which are combined using a reflection simulation that is photometrically and geometrically accurate. Our system consists of a base model that accepts the captured photo and optional contextual photo as input, and runs at 256p, followed by an up-sampling model that transforms output 256p images to full resolution. The system can produce images for review at 1K in 4.5 to 6.5 seconds on a MacBook or iPhone 14 Pro. We test on RAW photos that were captured in the field and embody typical consumer photographs.	翻訳日:2024-04-26 20:19:09 公開日:2024-04-24
# 量子計量による非エルミート臨界点の同定 Identifying non-Hermitian critical points with quantum metric ( http://arxiv.org/abs/2404.15628v1 ) ライセンス: Link先を確認	Jun-Feng Ren, Jing Li, Hai-Tao Ding, Dan-Wei Zhang,	(参考訳) 量子状態の幾何学的性質は、量子幾何学テンソルによって完全に符号化される。量子幾何テンソルの実部と虚部は、それぞれヒルベルト空間内の2つの近接量子状態間の距離と位相差を特徴づける量子計量とベリー曲率である。従来のエルミート量子系では、量子計量は忠実度感受性に対応しており、幾何学的な観点からの量子相転移の特定に既に使われている。本研究では、この知恵を非エルミート系に拡張し、非エルミート臨界点を明らかにする。具体的には、数値的厳密な対角化法と解析法を用いることで、2つの非エルミート一般化オーブリー・アンドレモデルと非エルミートクラスタと混合場イジングモデルを含む様々な非エルミートモデルにおける量子計量と対応する順序パラメータを計算する。これらの非エルミートモデルにおける固有状態の量子計量は、それぞれ局在化遷移、移動度端、および多体量子相転移を正確に同定する。さらに、この戦略は有限サイズ効果と異なる境界条件に対して堅牢であることを示す。 The geometric properties of quantum states are fully encoded by the quantum geometric tensor. The real and imaginary parts of the quantum geometric tensor are the quantum metric and Berry curvature, which characterize the distance and phase difference between two nearby quantum states in Hilbert space, respectively. For conventional Hermitian quantum systems, the quantum metric corresponds to the fidelity susceptibility and has already been used to specify quantum phase transitions from the geometric perspective. In this work, we extend this wisdom to non-Hermitian systems for revealing non-Hermitian critical points. To be concrete, by employing numerical exact diagonalization and analytical methods, we calculate the quantum metric and corresponding order parameters in various non-Hermitian models, which include two non-Hermitian generalized Aubry-André models and non-Hermitian cluster and mixed-field Ising models. We demonstrate that the quantum metric of eigenstates in these non-Hermitian models exactly identifies the localization transitions, mobility edges, and many-body quantum phase transitions, respectively. We further show that this strategy is robust against the finite-size effect and different boundary conditions.	翻訳日:2024-04-26 20:19:09 公開日:2024-04-24
# 非Fungibleプログラム:Web3用のプライベートフルスタックアプリケーション Non-Fungible Programs: Private Full-Stack Applications for Web3 ( http://arxiv.org/abs/2404.15632v1 ) ライセンス: Link先を確認	Blake Regalia, Benjamin Adams,	(参考訳) Web3アプリケーションがWeb 2.0に対して提供する最大の利点は、データアクセス層の進化である。ユーザからの信頼を強いる不透明で集中的なサービスは、スマートコントラクトの信頼性のない分散システムに置き換えられる。しかしながら、スマートコントラクトがトランザクションされるブロックチェーンベースのデータベースのパブリックな性質は、データプライバシに依存するアプリケーションや、不完全な情報を持つ参加者に依存するアプリケーションに対して、一般的には課題を提起している。これは、アクティブコントラクトのメモリ状態を暗号化する秘密のスマートコントラクトネットワークの導入と、データベースのオンチェーン保存によって、変わっている。機密性によって、コントラクトは以前実現不可能だった新しいインタラクションメカニズムをより容易に実装できる。一方、Web 2.0とWeb3アプリケーションの両方では、ユーザインターフェイスは、ユーザ意図をアクション可能なリクエストに変換する上で重要な役割を担っています。多くの場合、開発者はインテリジェンスと自律性をクライアント側に移行し、計算、グラフィックス、ネットワーキングにWeb技術を活用しています。しかし、Web3のこのようなフロントエンドへの依存は、アプリケーションに永続的なホストがいなければ、分散化されたアプリケーションがエンドユーザにアクセスできないという問題点を浮き彫りにしている。ここでは、ブロックチェーンを介して分散し、Web技術を活用し、暗号化されたスマートコントラクトで永続化されたプライベートデータベースによってバックアップされる、自己完結型のフロントエンドアプリケーションを開発するための、NFP(Non-Fungible Program)モデルを紹介します。フロントエンドコードへのアクセスとバックエンドサービスへのアクセスは、NFTオーナシップモデルに従ってスマートコントラクトによって制御され、保証される。拡張によって、NFPアプリケーションはトークン所有者に対話性をもたらし、オーラクルの認証機構、補充Webサービス、オーバレイネットワークなどの新機能をセキュアに実現します。また...。 The greatest advantage that Web3 applications offer over Web 2.0 is the evolution of the data access layer. Opaque, centralized services that compelled trust from users are replaced by trustless, decentralized systems of smart contracts. However, the public nature of blockchain-based databases, on which smart contracts transact, has typically presented a challenge for applications that depend on data privacy or that rely on participants having incomplete information. This has changed with the introduction of confidential smart contract networks that encrypt the memory state of active contracts as well as their databases stored on-chain. With confidentiality, contracts can more readily implement novel interaction mechanisms that were previously infeasible. 
Meanwhile, in both Web 2.0 and Web3 applications the user interface continues to play a crucial role in translating user intent into actionable requests. In many cases, developers have shifted intelligence and autonomy into the client-side, leveraging Web technologies for compute, graphics, and networking. Web3's reliance on such frontends has revealed a pain point though, namely that decentralized applications are not accessible to end users without a persistent host serving the application. Here we introduce the Non-Fungible Program (NFP) model for developing self-contained frontend applications that are distributed via blockchain, powered by Web technology, and backed by private databases persisted in encrypted smart contracts. Access to frontend code, as well as backend services, is controlled and guaranteed by smart contracts according to the NFT ownership model, eliminating the need for a separate host. By extension, NFP applications bring interactivity to token owners and enable new functionalities, such as authorization mechanisms for oracles, supplementary Web services, and overlay networks in a secure manner. In addition...	翻訳日:2024-04-26 20:19:09 公開日:2024-04-24
# マルチユニットオークション設計のための人工知能 Artificial Intelligence for Multi-Unit Auction design ( http://arxiv.org/abs/2404.15633v1 ) ライセンス: Link先を確認	Peyman Khezr, Kendall Taylor,	(参考訳) マルチユニットオークションにおける入札行動を理解することは、研究者にとって現在進行中の課題である。広く使われているにもかかわらず、入札行動、収益ランキング、そして一般的な多ユニットオークションの効率に関する理論的洞察は限られている。本稿では,人工知能,特に強化学習をモデル自由学習手法として活用し,実際に使用されている3つの著名なマルチユニットオークションにおける入札をシミュレートする。マルチユニットオークションにおいて,学習と入札に適した6つのアルゴリズムを導入し,実例を用いて比較する。本稿では,人工知能を用いたオークションデザインの重要性,特にマルチユニットオークションの設計の強化について述べる。 Understanding bidding behavior in multi-unit auctions remains an ongoing challenge for researchers. Despite their widespread use, theoretical insights into the bidding behavior, revenue ranking, and efficiency of commonly used multi-unit auctions are limited. This paper utilizes artificial intelligence, specifically reinforcement learning, as a model free learning approach to simulate bidding in three prominent multi-unit auctions employed in practice. We introduce six algorithms that are suitable for learning and bidding in multi-unit auctions and compare them using an illustrative example. This paper underscores the significance of using artificial intelligence in auction design, particularly in enhancing the design of multi-unit auctions.	翻訳日:2024-04-26 20:09:25 公開日:2024-04-24
# 予測エンクロメント時間に基づく非信号区間における歩行者の潜在的リスクのリアルタイム評価フレームワーク A Real-time Evaluation Framework for Pedestrian's Potential Risk at Non-Signalized Intersections Based on Predicted Post-Encroachment Time ( http://arxiv.org/abs/2404.15635v1 ) ライセンス: Link先を確認	Tengfeng Lin, Zhixiong Jin, Seongjin Choi, Hwasoo Yeo,	(参考訳) 交差点での歩行者の安全対策は、交通関連の負傷者や死亡者を減らす緊急性によって引き起こされる交通研究の分野における最重要課題の1つである。コンピュータビジョン技術や予測モデルの発展に伴い、リアルタイムのプロアクティブな保護システムの開発が交差点における歩行者の安全向上に欠かせないものとなっている。これらの保護システムの中核は、歩行者の潜在的なリスクの予測に基づく評価であり、事故の発生を防ぐ重要な役割を担っている。現在の予測に基づく潜在的なリスク評価研究における大きな課題は、歩行者の潜在的なリスクを評価するためのリアルタイムフレームワークを作成するための不十分な進歩、潜在的なリスクを表現するための正確で説明可能な安全指標の欠如、歩行者の各カテゴリーに特有な調整済みの評価基準の欠如、の3つの側面にまとめることができる。これらの課題に対処するために,コンピュータビジョン技術と予測モデルを用いたフレームワークを開発し,歩行者の潜在的なリスクをリアルタイムで評価する。この枠組みと一体化しているのは、歩行者や車両の交差点到着時刻を予測できるディープラーニングモデルから派生した、新しいサロゲート安全対策であるPredicted Post-Encroachment Time (P-PET)である。歩行者のリスク評価の有効性と信頼性をさらに向上するため,歩行者を異なるカテゴリーに分類し,グループ毎に具体的な評価基準を適用した。本研究は,P-PETを用いて潜在的リスクを効果的に識別する能力を示し,リアルタイムアプリケーションの実現可能性と,歩行者の異なるカテゴリーにおけるリスク評価性能の向上を示すものである。 Addressing pedestrian safety at intersections is one of the paramount concerns in the field of transportation research, driven by the urgency of reducing traffic-related injuries and fatalities. With advances in computer vision technologies and predictive models, the pursuit of developing real-time proactive protection systems is increasingly recognized as vital to improving pedestrian safety at intersections. The core of these protection systems lies in the prediction-based evaluation of pedestrian's potential risks, which plays a significant role in preventing the occurrence of accidents. 
The major challenges in the current prediction-based potential risk evaluation research can be summarized into three aspects: the inadequate progress in creating a real-time framework for the evaluation of pedestrians' potential risks, the absence of accurate and explainable safety indicators that can represent the potential risk, and the lack of tailor-made evaluation criteria specifically for each category of pedestrians. To address these research challenges, in this study, a framework with computer vision technologies and predictive models is developed to evaluate the potential risk of pedestrians in real time. Integral to this framework is a novel surrogate safety measure, the Predicted Post-Encroachment Time (P-PET), derived from deep learning models capable of predicting the arrival times of pedestrians and vehicles at intersections. To further improve the effectiveness and reliability of pedestrian risk evaluation, we classify pedestrians into distinct categories and apply specific evaluation criteria for each group. The results demonstrate the framework's ability to effectively identify potential risks through the use of P-PET, indicating its feasibility for real-time applications and its improved performance in risk evaluation across different categories of pedestrians.	翻訳日:2024-04-26 20:09:25 公開日:2024-04-24
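The abstract above describes a predicted post-encroachment time computed from predicted arrival times, evaluated with category-specific criteria. A rough sketch of that idea follows. It is not the authors' definition: conventional PET is measured from when the first road user leaves the conflict area, whereas this simplification uses only the two predicted arrival times. The category names and threshold values are hypothetical, since the abstract gives none.

```python
# Hypothetical per-category thresholds in seconds (not taken from the paper).
RISK_THRESHOLDS_S = {"adult": 2.0, "child": 3.0, "elderly": 3.5}


def predicted_pet(t_pedestrian_s, t_vehicle_s):
    """Gap between the predicted arrival times at the shared conflict point."""
    return abs(t_vehicle_s - t_pedestrian_s)


def is_risky(t_pedestrian_s, t_vehicle_s, category):
    """Flag an interaction when the predicted gap is below the category threshold."""
    return predicted_pet(t_pedestrian_s, t_vehicle_s) < RISK_THRESHOLDS_S[category]


# Same predicted arrivals, different categories, different outcomes
gap = predicted_pet(4.0, 6.5)
adult_flag = is_risky(4.0, 6.5, "adult")
child_flag = is_risky(4.0, 6.5, "child")
```

In the framework described, the two arrival times would come from the deep learning predictors rather than being supplied by hand.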
# PriorNet: 効率的な画像デハージングのための多次元インタラクティブアテンション付き軽量ネットワーク PriorNet: A Novel Lightweight Network with Multidimensional Interactive Attention for Efficient Image Dehazing ( http://arxiv.org/abs/2404.15638v1 ) ライセンス: Link先を確認	Yutong Chen, Zhang Wen, Chao Wang, Lei Gong, Zhongchao Yi,	(参考訳) ヘイズ画像は視覚的品質を低下させ、デハジングはその後の処理タスクにとって重要な前提条件である。現在のデハジング法のほとんどはニューラルネットワークに依存しており、高い計算パラメータ圧力や弱い一般化能力といった課題に直面している。本稿では,過剰な詳細抽出問題を回避しつつ,ヘイズ画像の明瞭さと視覚的品質を大幅に向上させる,新しい,軽量で,適用性の高いデハージングネットワークであるPriorNetを紹介する。 PriorNetのコアは、多次元インタラクティブアテンション(MIA)機構であり、複雑なシステムに関連する計算負荷と一般化の難しさを著しく低減し、様々なヘイズ特性を効果的に捉えている。均一な畳み込みカーネルサイズを利用し、スキップ接続を組み込むことで、特徴抽出プロセスの合理化を実現した。レイヤ数とアーキテクチャの簡略化は、デハージング効率を高めるだけでなく、エッジデバイスへのデプロイを容易にする。複数のデータセットにわたる広範囲なテストは、シングルイメージのデハージングタスクにおいて、イメージの詳細と色の忠実さを維持しながら、デハージングと明快さの回復において、PriorNetの例外的なパフォーマンスを示している。特に、モデルのサイズがたった18KbのPriorNetは、他の方法に比べて優れたデハージング一般化機能を示している。我々の研究は、画像デハージング技術の発展に大きく貢献し、特に普遍性とデプロイ性の向上の重要性を強調しながら、フィールドおよび関連ドメインに対する新たな視点とツールを提供しています。 Hazy images degrade visual quality, and dehazing is a crucial prerequisite for subsequent processing tasks. Most current dehazing methods rely on neural networks and face challenges such as high computational parameter pressure and weak generalization capabilities. This paper introduces PriorNet--a novel, lightweight, and highly applicable dehazing network designed to significantly improve the clarity and visual quality of hazy images while avoiding excessive detail extraction issues. The core of PriorNet is the original Multi-Dimensional Interactive Attention (MIA) mechanism, which effectively captures a wide range of haze characteristics, substantially reducing the computational load and generalization difficulties associated with complex systems. By utilizing a uniform convolutional kernel size and incorporating skip connections, we have streamlined the feature extraction process. Simplifying the number of layers and architecture not only enhances dehazing efficiency but also facilitates easier deployment on edge devices. 
Extensive testing across multiple datasets has demonstrated PriorNet's exceptional performance in dehazing and clarity restoration, maintaining image detail and color fidelity in single-image dehazing tasks. Notably, with a model size of just 18Kb, PriorNet showcases superior dehazing generalization capabilities compared to other methods. Our research makes a significant contribution to advancing image dehazing technology, providing new perspectives and tools for the field and related domains, particularly emphasizing the importance of improving universality and deployability.	翻訳日:2024-04-26 20:09:25 公開日:2024-04-24
# CodeIP: コード向け大規模言語モデルのための文法ガイド付きマルチビット透かし CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code ( http://arxiv.org/abs/2404.15639v1 ) ライセンス: Link先を確認	Batu Guan, Yao Wan, Zhangqian Bi, Zheng Wang, Hongyu Zhang, Yulei Sui, Pan Zhou, Lichao Sun,	(参考訳) 大規模言語モデル(LLM)は、コード生成の自動化にますます使用されているため、特に産業における知的財産権(IP)保護や教育における学術的不正行為の防止といった目的のために、コードがAI生成されているかどうか、またどのモデルによって生成されたかを知ることが望まれる。マシン生成コンテンツに透かしを組み込むことは、コードの出所を示す方法のひとつだが、既存のソリューションは単一のビットに制限されているか、柔軟性が欠如している。我々は,LLMベースのコード生成のための新しい透かし技術であるCodeIPを提案する。 CodeIPは、生成されたコードのセマンティクスを保持しながら、マルチビット情報の挿入を可能にし、挿入された透かしの強度と多様性を向上させる。これは、次のトークンの後の文法型を予測するために型予測器を訓練し、生成されたコードの構文的および意味的正しさを高めることで達成される。 5つのプログラミング言語にまたがる実世界のデータセットの実験では、CodeIPの有効性が示されている。 As Large Language Models (LLMs) are increasingly used to automate code generation, it is often desired to know if the code is AI-generated and by which model, especially for purposes like protecting intellectual property (IP) in industry and preventing academic misconduct in education. Incorporating watermarks into machine-generated content is one way to provide code provenance, but existing solutions are restricted to a single bit or lack flexibility. We present CodeIP, a new watermarking technique for LLM-based code generation. CodeIP enables the insertion of multi-bit information while preserving the semantics of the generated code, improving the strength and diversity of the inserted watermark. This is achieved by training a type predictor to predict the subsequent grammar type of the next token to enhance the syntactical and semantic correctness of the generated code. Experiments on a real-world dataset across five programming languages showcase the effectiveness of CodeIP.	翻訳日:2024-04-26 20:09:25 公開日:2024-04-24
# Building-PCC: ポイントクラウド補完ベンチマークの構築 Building-PCC: Building Point Cloud Completion Benchmarks ( http://arxiv.org/abs/2404.15644v1 ) ライセンス: Link先を確認	Weixiao Gao, Ravi Peters, Jantien Stoter,	(参考訳) 3次元センシング技術の急速な進歩により、物体の3次元形状情報を得るのがますます便利になっている。ライダー技術は、遠距離で物体の3D情報を正確にキャプチャする機能を備えており、都市部の3Dデータの収集に広く応用されている。しかし、収集された点雲データは、閉塞、信号吸収、スペクトル反射などの要因により不完全性を示すことが多い。本稿では,これらの不完全データ処理におけるポイントクラウド補完技術の適用について検討し,都市のビルディングポイントクラウド補完作業における既存のディープラーニング手法の性能を評価するために,ビルディングPCCデータセットを新たに構築する。異なる手法の総合的な評価を通じて,3次元地理情報分野の革新を促進することを目的として,ビルディングポイントクラウドの完成において直面する重要な課題を分析した。ソースコードはhttps://github.com/tudelft3d/Building-PCC-Building-Point-Cloud-Completion-Benchmarks.gitで公開されています。 With the rapid advancement of 3D sensing technologies, obtaining 3D shape information of objects has become increasingly convenient. Lidar technology, with its capability to accurately capture the 3D information of objects at long distances, has been widely applied in the collection of 3D data in urban scenes. However, the collected point cloud data often exhibit incompleteness due to factors such as occlusion, signal absorption, and specular reflection. This paper explores the application of point cloud completion technologies in processing these incomplete data and establishes a new real-world benchmark Building-PCC dataset, to evaluate the performance of existing deep learning methods in the task of urban building point cloud completion. Through a comprehensive evaluation of different methods, we analyze the key challenges faced in building point cloud completion, aiming to promote innovation in the field of 3D geoinformation applications. Our source code is available at https://github.com/tudelft3d/Building-PCC-Building-Point-Cloud-Completion-Benchmarks.git.	翻訳日:2024-04-26 20:09:25 公開日:2024-04-24
# Advance Sharing with Ogawa et al.'s Ramp Quantum Secret Sharing Scheme ( http://arxiv.org/abs/2404.15646v1 ) License: see link	Satoshi Masumori, Ryutaroh Matsumoto,	The ramp quantum secret sharing scheme proposed by Ogawa et al. has the highest possible coding rate given a threshold-type access structure. On the other hand, in some quantum secret sharing schemes, it is known that some shares can be distributed to participants before a secret is given to the dealer. However, it is unclear whether some shares can be distributed before a secret is given in Ogawa et al.'s scheme. In this paper, we propose a method to distribute some shares before a secret is given in Ogawa et al.'s scheme, then determine a necessary and sufficient condition on sets of shares that can be distributed before a given secret.	Translated: 2024-04-26 20:09:25 Published: 2024-04-24
# Affordance Blending Networks ( http://arxiv.org/abs/2404.15648v1 ) License: see link	Hakan Aktas, Yukie Nagai, Minoru Asada, Erhan Oztop, Emre Ugur,	Affordances, a concept rooted in ecological psychology and pioneered by James J. Gibson, have emerged as a fundamental framework for understanding the dynamic relationship between individuals and their environments. Expanding beyond traditional perceptual and cognitive paradigms, affordances represent the inherent effect and action possibilities that objects offer to the agents within a given context. As a theoretical lens, affordances bridge the gap between effect and action, providing a nuanced understanding of the connections between agents' actions on entities and the effect of these actions. In this study, we propose a model that unifies object, action and effect into a single latent representation in a common latent space, shared between all affordances, that we call the affordance space. Using this affordance space, our system is able to generate effect trajectories when action and object are given and is able to generate action trajectories when effect trajectories and objects are given. In the experiments, we showed that our model does not learn the behavior of each object; instead, it learns the affordance relations shared by the objects, which we call equivalences. In addition to simulated experiments, we showed that our model can be used for direct imitation in real-world cases. We also propose affordances as a base for Cross Embodiment transfer to link the actions of different robots. Finally, we introduce selective loss as a solution that allows valid outputs to be generated for indeterministic model inputs.	Translated: 2024-04-26 20:09:25 Published: 2024-04-24
# Return of EM: Entity-driven Answer Set Expansion for QA Evaluation ( http://arxiv.org/abs/2404.15650v1 ) License: see link	Dongryeol Lee, Minwoo Lee, Kyungmin Min, Joonsuk Park, Kyomin Jung,	Recently, directly using large language models (LLMs) has been shown to be the most reliable method to evaluate QA models. However, it suffers from limited interpretability, high cost, and environmental harm. To address these, we propose to use soft EM with entity-driven answer set expansion. Our approach expands the gold answer set to include diverse surface forms, based on the observation that the surface forms often follow particular patterns depending on the entity type. The experimental results show that our method outperforms traditional evaluation methods by a large margin. Moreover, the reliability of our evaluation method is comparable to that of LLM-based ones, while offering the benefits of high interpretability and reduced environmental harm.	Translated: 2024-04-26 20:09:25 Published: 2024-04-24
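The idea of soft exact match over an expanded gold answer set can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the normalization steps and the `expand_person` rule are assumptions made here for the example, and the abstract does not say how the expansion itself is produced.

```python
import re
import string


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def soft_em(prediction, gold_answers):
    """True if any normalized gold answer occurs as a whole-word span in the prediction."""
    pred = f" {normalize(prediction)} "
    return any(f" {normalize(g)} " in pred for g in gold_answers if normalize(g))


def expand_person(name):
    """Hypothetical expansion rule for PERSON entities: also accept the surname alone."""
    parts = name.split()
    return [name] if len(parts) < 2 else [name, parts[-1]]


prediction = "It was written by Tolkien in 1937."
strict = soft_em(prediction, ["J. R. R. Tolkien"])                 # original gold set
expanded = soft_em(prediction, expand_person("J. R. R. Tolkien"))  # expanded gold set
```

With the original gold set the prediction is rejected, while the expanded set accepts it. Every decision can be traced back to a specific surface form, which is the interpretability benefit the abstract refers to.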
# CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data ( http://arxiv.org/abs/2404.15653v1 ) License: see link	Sachin Mehta, Maxwell Horton, Fartash Faghri, Mohammad Hossein Sekhavat, Mahyar Najibi, Mehrdad Farajtabar, Oncel Tuzel, Mohammad Rastegari,	Contrastive learning has emerged as a transformative method for learning effective visual representations through the alignment of image and text embeddings. However, pairwise similarity computation in contrastive loss between image and text pairs poses computational challenges. This paper presents a novel method for weakly supervised pre-training of vision models on web-scale image-text data. The proposed method reframes pre-training on image-text data as a classification task. Consequently, it eliminates the need for pairwise similarity computations in contrastive loss, achieving a remarkable $2.7\times$ acceleration in training speed compared to contrastive learning on web-scale data. Through extensive experiments spanning diverse vision tasks, including detection and segmentation, we demonstrate that the proposed method maintains high representation quality. Our source code, along with pre-trained model weights and training recipes, is available at https://github.com/apple/corenet.	Translated: 2024-04-26 20:09:25 Published: 2024-04-24
# Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering ( http://arxiv.org/abs/2404.15655v1 ) License: see link	Jiawei Yao, Qi Qian, Juhua Hu,	Multiple clustering has gained significant attention in recent years due to its potential to reveal multiple hidden structures of data from different perspectives. The advent of deep multiple clustering techniques has notably advanced the performance by uncovering complex patterns and relationships within large datasets. However, a major challenge arises as users often do not need all the clusterings that algorithms generate, and figuring out the one needed requires a substantial understanding of each clustering result. Traditionally, aligning a user's brief keyword of interest with the corresponding vision components was challenging, but the emergence of multi-modal and large language models (LLMs) has begun to bridge this gap. In response, given unlabeled target visual data, we propose Multi-MaP, a novel method employing a multi-modal proxy learning process. It leverages CLIP encoders to extract coherent text and image embeddings, with GPT-4 integrating users' interests to formulate effective textual contexts. Moreover, a reference word constraint and a concept-level constraint are designed to learn the optimal text proxy according to the user's interest. Multi-MaP not only adeptly captures a user's interest via a keyword but also facilitates identifying relevant clusterings. Our extensive experiments show that Multi-MaP consistently outperforms state-of-the-art methods in all benchmark multi-clustering vision tasks. Our code is available at https://github.com/Alexander-Yao/Multi-MaP.	Translated: 2024-04-26 20:09:25 Published: 2024-04-24
# MISLEAD: Manipulating Importance of Selected features for Learning Epsilon in Evasion Attack Deception ( http://arxiv.org/abs/2404.15656v1 ) License: see link	Vidit Khazanchi, Pavan Kulkarni, Yuvaraj Govindarajulu, Manojkumar Parmar,	Emerging vulnerabilities in machine learning (ML) models due to adversarial attacks raise concerns about their reliability. Specifically, evasion attacks manipulate models by introducing precise perturbations to input data, causing erroneous predictions. To address this, we propose a methodology combining SHapley Additive exPlanations (SHAP) for feature importance analysis with an innovative Optimal Epsilon technique for conducting evasion attacks. Our approach begins with SHAP-based analysis to understand model vulnerabilities, crucial for devising targeted evasion strategies. The Optimal Epsilon technique, employing a Binary Search algorithm, efficiently determines the minimum epsilon needed for successful evasion. Evaluation across diverse machine learning architectures demonstrates the technique's precision in generating adversarial samples, underscoring its efficacy in manipulating model outcomes. This study emphasizes the critical importance of continuous assessment and monitoring to identify and mitigate potential security risks in machine learning systems.	Translated: 2024-04-26 20:09:25 Published: 2024-04-24
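A binary search for the smallest perturbation size that flips a prediction can be sketched as below. This is a toy illustration under stated assumptions rather than the paper's procedure: the linear `predict` model, the choice of feature 0 as the "most important" feature (a stand-in for a SHAP ranking), and the function names are all hypothetical. The search also assumes monotonicity, meaning that if some epsilon evades the model, every larger epsilon does too.

```python
def minimal_epsilon(evades, lo=0.0, hi=1.0, tol=1e-4):
    """Smallest eps in [lo, hi], up to tol, for which evades(eps) is True.

    Returns None if even the largest allowed perturbation does not evade.
    """
    if not evades(hi):
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if evades(mid):
            hi = mid   # mid already succeeds, so the minimum is at or below mid
        else:
            lo = mid   # mid fails, so the minimum is above mid
    return hi


def predict(x):
    """Toy linear classifier (hypothetical weights and bias)."""
    w, b = [2.0, -0.5, 0.1], -1.0
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0


def perturb(x, eps):
    """Shift only feature 0, assumed here to be the top-ranked feature."""
    return [x[0] - eps] + x[1:]


x = [1.0, 1.0, 1.0]
eps = minimal_epsilon(lambda e: predict(perturb(x, e)) != predict(x))
```

For this toy model the decision score is `0.6 - 2 * eps`, so the search converges to a value just above 0.3. Each iteration halves the interval, so the number of model queries grows only logarithmically with the requested precision.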
# FedSI: Federated Subnetwork Inference for Efficient Uncertainty Quantification ( http://arxiv.org/abs/2404.15657v1 ) License: see link	Hui Chen, Hengyu Liu, Zhangkai Wu, Xuhui Fan, Longbing Cao,	While personalized federated learning (PFL) based on deep neural networks (DNNs) is in demand for addressing data heterogeneity and shows promising performance, existing methods for federated learning (FL) struggle to provide efficient systematic uncertainty quantification. Bayesian DNN-based PFL is usually criticized for either over-simplified model structures or high computational and memory costs. In this paper, we introduce FedSI, a novel Bayesian DNN-based subnetwork inference PFL framework. FedSI is simple and scalable, leveraging Bayesian methods to incorporate systematic uncertainties effectively. It implements a client-specific subnetwork inference mechanism, selects network parameters with large variance to be inferred through posterior distributions, and fixes the rest as deterministic ones. FedSI achieves fast and scalable inference while preserving the systematic uncertainties to the fullest extent. Extensive experiments on three different benchmark datasets demonstrate that FedSI outperforms existing Bayesian and non-Bayesian FL baselines in heterogeneous FL scenarios.	Translated: 2024-04-26 20:09:25 Published: 2024-04-24
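The selection step described in the FedSI abstract (treat high-variance parameters as Bayesian, fix the rest) can be illustrated with a small sketch. The parameter names, the top-`k` rule, and the Gaussian sampling below are assumptions made for this example. The abstract does not specify how variances are estimated or how many parameters are selected.

```python
import random


def split_by_variance(variances, k):
    """Return (bayesian, deterministic) sets of parameter names.

    The k parameters with the largest variance are treated as Bayesian.
    """
    ranked = sorted(variances, key=variances.get, reverse=True)
    return set(ranked[:k]), set(ranked[k:])


def sample_weights(means, variances, bayesian, rng):
    """Draw Bayesian parameters from N(mean, variance); keep the others at their mean."""
    return {
        name: rng.gauss(mean, variances[name] ** 0.5) if name in bayesian else mean
        for name, mean in means.items()
    }


# Hypothetical per-parameter posterior means and variances on one client.
means = {"w1": 0.5, "w2": -1.2, "w3": 0.3, "w4": 2.0}
variances = {"w1": 0.9, "w2": 0.01, "w3": 0.4, "w4": 0.05}

bayesian, deterministic = split_by_variance(variances, k=2)
draw = sample_weights(means, variances, bayesian, random.Random(0))
```

Only the selected subnetwork needs posterior bookkeeping, which is where the cost saving over a fully Bayesian network would come from in a scheme of this kind.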
# KS-LLM: Knowledge Selection of Large Language Models with Evidence Document for Question Answering ( http://arxiv.org/abs/2404.15660v1 ) License: see link	Xinxin Zheng, Feihu Che, Jinyang Wu, Shuai Zhang, Shuai Nie, Kang Liu, Jianhua Tao,	Large language models (LLMs) suffer from the hallucination problem and face significant challenges when applied to knowledge-intensive tasks. A promising approach is to leverage evidence documents as extra supporting knowledge, which can be obtained through retrieval or generation. However, existing methods directly leverage the entire contents of the evidence document, which may introduce noisy information and impair the performance of large language models. To tackle this problem, we propose a novel Knowledge Selection of Large Language Models (KS-LLM) method, aiming to identify valuable information from evidence documents. The KS-LLM approach utilizes triples to effectively select knowledge snippets from evidence documents that are beneficial to answering questions. Specifically, we first generate triples based on the input question, then select the evidence sentences most similar to triples from the evidence document, and finally combine the evidence sentences and triples to assist large language models in generating answers. Experimental comparisons on several question answering datasets, such as TriviaQA, WebQ, and NQ, demonstrate that the proposed method surpasses the baselines and achieves the best results.	Translated: 2024-04-26 20:09:25 Published: 2024-04-24
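The three steps named in the KS-LLM abstract (generate triples, select the most similar evidence sentences, combine them into the model input) can be sketched as follows. This is only an illustration under assumptions: the triples are written by hand here rather than generated by an LLM, token-overlap (Jaccard) similarity stands in for whatever similarity measure the authors use, and the prompt layout in `build_prompt` is hypothetical.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_evidence(triples, document, top_k=1):
    """Return the top_k sentences of the document most similar to the triples."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    query = tokens(" ".join(" ".join(t) for t in triples))
    ranked = sorted(sentences, key=lambda s: jaccard(tokens(s), query), reverse=True)
    return ranked[:top_k]


def build_prompt(question, triples, evidence):
    """Combine triples and selected evidence into one prompt (hypothetical layout)."""
    triple_text = "; ".join(f"({h}, {r}, {t})" for h, r, t in triples)
    return f"Triples: {triple_text}\nEvidence: {' '.join(evidence)}\nQuestion: {question}\nAnswer:"


question = "Where is the Eiffel Tower?"
triples = [("Eiffel Tower", "located in", "Paris")]
document = (
    "The Eiffel Tower is located in Paris. It was completed in 1889. "
    "Many tourists visit France every year."
)
evidence = select_evidence(triples, document)
prompt = build_prompt(question, triples, evidence)
```

Only the selected sentence reaches the prompt, so the unrelated sentences in the evidence document never enter the model's context, which is the noise-reduction effect the abstract describes.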
# CWF: Consolidating Weak Features in High-quality Mesh Simplification ( http://arxiv.org/abs/2404.15661v1 ) License: see link	Rui Xu, Longdu Liu, Ningna Wang, Shuangmin Chen, Shiqing Xin, Xiaohu Guo, Zichun Zhong, Taku Komura, Wenping Wang, Changhe Tu,	In mesh simplification, common requirements like accuracy, triangle quality, and feature alignment are often considered as a trade-off. Existing algorithms concentrate on just one or a few specific aspects of these requirements. For example, the well-known Quadric Error Metrics (QEM) approach prioritizes accuracy and can preserve strong feature lines/points as well, but falls short in ensuring high triangle quality and may degrade weak features that are not as distinctive as strong ones. In this paper, we propose a smooth functional that simultaneously considers all of these requirements. The functional comprises a normal anisotropy term and a Centroidal Voronoi Tessellation (CVT) energy term, with the variables being a set of movable points lying on the surface. The former inherits the spirit of QEM but operates in a continuous setting, while the latter encourages even point distribution, allowing various surface metrics. We further introduce a decaying weight to automatically balance the two terms. We selected 100 CAD models from the ABC dataset, along with 21 organic models, to compare the existing mesh simplification algorithms with ours. Experimental results reveal an important observation: the introduction of a decaying weight effectively reduces the conflict between the two terms and enables the alignment of weak features. This distinctive feature sets our approach apart from most existing mesh simplification methods and demonstrates significant potential in shape understanding.	Translated: 2024-04-26 20:09:25 Published: 2024-04-24
# Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability, Composability, and Decomposability from Anatomy via Self-Supervision ( http://arxiv.org/abs/2404.15672v1 ) License: see link	Mohammad Reza Hosseinzadeh Taher, Michael B. Gotway, Jianming Liang,	Humans effortlessly interpret images by parsing them into part-whole hierarchies; deep learning models excel in learning multi-level feature spaces, but they often lack explicit coding of part-whole relations, a prominent property of medical imaging. To overcome this limitation, we introduce Adam-v2, a new self-supervised learning framework extending Adam [79] by explicitly incorporating part-whole hierarchies into its learning objectives through three key branches: (1) Localizability, acquiring discriminative representations to distinguish different anatomical patterns; (2) Composability, learning each anatomical structure in a parts-to-whole manner; and (3) Decomposability, comprehending each anatomical structure in a whole-to-parts manner. Experimental results across 10 tasks, compared to 11 baselines in zero-shot, few-shot transfer, and full fine-tuning settings, showcase Adam-v2's superior performance over large-scale medical models and existing SSL methods across diverse downstream tasks. The higher generality and robustness of Adam-v2's representations originate from its explicit construction of hierarchies for distinct anatomical structures from unlabeled medical images. Adam-v2 preserves a semantic balance of anatomical diversity and harmony in its embedding, yielding representations that are both generic and semantically meaningful, a property overlooked in existing SSL methods. All code and pretrained models are available at https://github.com/JLiangLab/Eden.	Translated: 2024-04-26 19:59:41 Published: 2024-04-24
# Augmented CARDS: A machine learning approach to identifying triggers of climate change misinformation on Twitter ( http://arxiv.org/abs/2404.15673v1 ) License: see link	Cristian Rojas, Frank Algra-Maschio, Mark Andrejevic, Travis Coan, John Cook, Yuan-Fang Li,	Misinformation about climate change poses a significant threat to societal well-being, prompting the urgent need for effective mitigation strategies. However, the rapid proliferation of online misinformation on social media platforms outpaces the ability of fact-checkers to debunk false claims. Automated detection of climate change misinformation offers a promising solution. In this study, we address this gap by developing a two-step hierarchical model, the Augmented CARDS model, specifically designed for detecting contrarian climate claims on Twitter. Furthermore, we apply the Augmented CARDS model to five million climate-themed tweets over a six-month period in 2022. We find that over half of contrarian climate claims on Twitter involve attacks on climate actors or conspiracy theories. Spikes in climate contrarianism coincide with one of four stimuli: political events, natural events, contrarian influencers, or convinced influencers. Implications for automated responses to climate misinformation are discussed.	Translated: 2024-04-26 19:59:40 Published: 2024-04-24
# Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs ( http://arxiv.org/abs/2404.15676v1 ) License: see link	Yu Xia, Rui Wang, Xu Liu, Mingyan Li, Tong Yu, Xiang Chen, Julian McAuley, Shuai Li,	Chain-of-Thought (CoT) has been a widely adopted prompting method, eliciting impressive reasoning abilities of Large Language Models (LLMs). Inspired by the sequential thought structure of CoT, a number of Chain-of-X (CoX) methods have been developed to address various challenges across diverse domains and tasks involving LLMs. In this paper, we provide a comprehensive survey of Chain-of-X methods for LLMs in different contexts. Specifically, we categorize them by taxonomies of nodes, i.e., the X in CoX, and application tasks. We also discuss the findings and implications of existing CoX methods, as well as potential future directions. Our survey aims to serve as a detailed and up-to-date resource for researchers seeking to apply the idea of CoT to broader scenarios.	Translated: 2024-04-26 19:59:40 Published: 2024-04-24
# CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models ( http://arxiv.org/abs/2404.15677v1 ) License: see link	Qinghe Wang, Baolu Li, Xiaomin Li, Bing Cao, Liqian Ma, Huchuan Lu, Xu Jia,	Recent advances in text-to-image models have opened new frontiers in human-centric generation. However, these models cannot be directly employed to generate images with consistent newly coined identities. In this work, we propose CharacterFactory, a framework that allows sampling new characters with consistent identities in the latent space of GANs for diffusion models. More specifically, we consider the word embeddings of celeb names as ground truths for the identity-consistent generation task and train a GAN model to learn the mapping from a latent space to the celeb embedding space. In addition, we design a context-consistent loss to ensure that the generated identity embeddings can produce identity-consistent images in various contexts. Remarkably, the whole model takes only 10 minutes to train, and can sample infinite characters end-to-end during inference. Extensive experiments demonstrate excellent performance of the proposed CharacterFactory on character creation in terms of identity consistency and editability. Furthermore, the generated characters can be seamlessly combined with off-the-shelf image/video/3D diffusion models. We believe that the proposed CharacterFactory is an important step for identity-consistent character generation. The project page is available at: https://qinghew.github.io/CharacterFactory/.	Translated: 2024-04-26 19:59:40 Published: 2024-04-24
# Legitimate Power, Illegitimate Automation: The problem of ignoring legitimacy in automated decision systems ( http://arxiv.org/abs/2404.15680v1 ) License: see link	Jake Stone, Brent Mittelstadt,	Progress in machine learning and artificial intelligence has spurred the widespread adoption of automated decision systems (ADS). An extensive literature explores what conditions must be met for these systems' decisions to be fair. However, questions of legitimacy -- why those in control of ADS are entitled to make such decisions -- have received comparatively little attention. This paper shows that when such questions are raised, theorists often incorrectly conflate legitimacy with either public acceptance or other substantive values such as fairness, accuracy, expertise or efficiency. In search of better theories, we conduct a critical analysis of the philosophical literature on the legitimacy of the state, focusing on consent, public reason, and democratic authorisation. This analysis reveals that the prevailing understanding of legitimacy in analytical political philosophy is also ill-suited to the task of establishing whether and when ADS are legitimate. The paper thus clarifies expectations for theories of ADS legitimacy and charts a path for a future research programme on the topic.	Translated: 2024-04-26 19:59:40 Published: 2024-04-24
# Automated Creation of Source Code Variants of a Cryptographic Hash Function Implementation Using Generative Pre-Trained Transformer Models ( http://arxiv.org/abs/2404.15681v1 ) License: see link	Elijah Pelofske, Vincent Urias, Lorie M. Liebrock,	Generative pre-trained transformers (GPTs) are a type of large language machine learning model that are unusually adept at producing novel and coherent natural language. In this study, the ability of GPT models to generate novel and correct versions, and notably very insecure versions, of implementations of the cryptographic hash function SHA-1 is examined. The GPT models Llama-2-70b-chat-h, Mistral-7B-Instruct-v0.1, and zephyr-7b-alpha are used. The GPT models are prompted to re-write each function using a modified version of the localGPT framework and langchain to provide word embedding context of the full source code and header files to the model, resulting in over 130,000 function re-write GPT output text blocks, approximately 40,000 of which were able to be parsed as C code and subsequently compiled. The generated code is analyzed for compilability, correctness of the algorithm, memory leaks, compiler optimization stability, and character distance to the reference implementation. Remarkably, several generated function variants have a high implementation security risk of being correct for some test vectors, but incorrect for other test vectors. Additionally, many function implementations were not correct to the reference algorithm of SHA-1, but produced hashes that have some of the basic characteristics of hash functions. Many of the function re-writes contained serious flaws such as memory leaks, integer overflows, out-of-bounds accesses, use of uninitialised values, and compiler optimization instability. Compiler optimization settings and SHA-256 hash checksums of the compiled binaries are used to cluster implementations that are equivalent but may not have identical syntax. Using this clustering, over 100,000 novel and correct versions of the SHA-1 codebase were generated in which each component C function of the reference implementation differs from the original code.	Translated: 2024-04-26 19:59:40 Published: 2024-04-24
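The risk of a variant that is correct for some test vectors but not others can be made concrete with a small sketch. The `truncating_variant` function below is a deliberately broken, hypothetical stand-in and is not taken from the paper's generated code. It agrees with SHA-1 on any message of at most 55 bytes and silently diverges on longer ones. The two hard-coded digests are the standard SHA-1 values for the empty message and for `abc`.

```python
import hashlib

# Standard SHA-1 digests for two short, commonly used test messages.
SHORT_VECTORS = {
    b"": "da39a3ee5e6b4b0d3255bfef95601890afd80709",
    b"abc": "a9993e364706816aba3e25717850c26c9cd0d89d",
}


def reference_sha1(message):
    """Reference implementation: Python's built-in SHA-1."""
    return hashlib.sha1(message).hexdigest()


def truncating_variant(message):
    """Hypothetical faulty variant that ignores everything past byte 55."""
    return hashlib.sha1(message[:55]).hexdigest()


def passes(impl, messages):
    """True if impl matches the reference digest on every message."""
    return all(impl(m) == reference_sha1(m) for m in messages)


short_ok = passes(truncating_variant, SHORT_VECTORS)  # only short inputs
long_ok = passes(truncating_variant, [b"a" * 64])     # input longer than 55 bytes
```

A test suite that only contained the short vectors would accept this variant, which is why checking generated code against a broad set of inputs, including multi-block messages, matters.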
# AnoFPDM: Anomaly Segmentation with Forward Process of Diffusion Models for Brain MRI ( http://arxiv.org/abs/2404.15683v1 ) License: see link	Yiming Che, Fazle Rafsani, Jay Shah, Md Mahfuzur Rahman Siddiquee, Teresa Wu,	Weakly-supervised diffusion models (DMs) in anomaly segmentation, leveraging image-level labels, have attracted significant attention for their superior performance compared to unsupervised methods. They eliminate the need for pixel-level labels in training, offering a more cost-effective alternative to supervised methods. However, existing methods are not fully weakly-supervised because they heavily rely on costly pixel-level labels for hyperparameter tuning in inference. To tackle this challenge, we introduce Anomaly Segmentation with Forward Process of Diffusion Models (AnoFPDM), a fully weakly-supervised framework that operates without the need for pixel-level labels. Leveraging the unguided forward process as a reference, we identify suitable hyperparameters, i.e., noise scale and threshold, for each input image. We aggregate anomaly maps from each step in the forward process, enhancing the signal strength of anomalous regions. Remarkably, our proposed method outperforms recent state-of-the-art weakly-supervised approaches, even without utilizing pixel-level labels.	Translated: 2024-04-26 19:59:40 Published: 2024-04-24
# 能動光強度干渉法による超高分解能イメージング Super-resolution imaging based on active optical intensity interferometry ( http://arxiv.org/abs/2404.15685v1 ) ライセンス: Link先を確認	Lu-Chuan Liu, Cheng Wu, Wei Li, Yu-Ao Chen, Frank Wilczek, Xiao-Peng Shao, Feihu Xu, Qiang Zhang, Jian-Wei Pan,	(参考訳) インターフェロメトリーによる長基線回折制限光開口合成技術は、科学的研究や実用化において重要な役割を担っている。振幅(位相)干渉法とは対照的に、熱光の光子束効果を測定するために光の量子的性質を利用する強度干渉法は、大気の乱流や光学的欠陥に対して堅牢である。しかし、熱光源は典型的には大きな発散角を持ち、モード毎の平均光子数も低いため、長距離での適用が妨げられる。そこで本研究では,超高分解能イメージングのための能動強度干渉法を提案し,実演する。本手法では、位相非依存の複数のレーザーエミッタを用いて熱照射を発生させ、精巧な計算アルゴリズムを用いて画像の再構成を行う。屋外環境では、1つの望遠鏡の14倍の回折限界の解像度で、1.36km以上の2次元ミリレベルのターゲットを撮像する。高分解能な光学イメージングとセンシングは、物理学と計測学の一般的な分野に長基線能動強度干渉法を適用することで期待できる。 Long baseline diffraction-limited optical aperture synthesis technology by interferometry plays an important role in scientific study and practical application. In contrast to amplitude (phase) interferometry, intensity interferometry -- which exploits the quantum nature of light to measure the photon bunching effect in thermal light -- is robust against atmospheric turbulence and optical defects. However, a thermal light source typically has a significant divergence angle and a low average photon number per mode, forestalling the applicability over long ranges. Here, we propose and demonstrate active intensity interferometry for super-resolution imaging over the kilometer range. Our scheme exploits phase-independent multiple laser emitters to produce the thermal illumination and uses an elaborate computational algorithm to reconstruct the image. In outdoor environments, we image two-dimensional millimeter-level targets over 1.36 kilometers at a resolution of 14 times the diffraction limit of a single telescope. High-resolution optical imaging and sensing are anticipated by applying long-baseline active intensity interferometry in general branches of physics and metrology.	翻訳日:2024-04-26 19:59:40 公開日:2024-04-24
# 差分プライバシーにおける雑音分散最適化 : インスタンスごとの差分プライバシーによるゲーム理論的アプローチ Noise Variance Optimization in Differential Privacy: A Game-Theoretic Approach Through Per-Instance Differential Privacy ( http://arxiv.org/abs/2404.15686v1 ) ライセンス: Link先を確認	Sehyun Ryu, Jonggyu Jang, Hyun Jong Yang,	(参考訳) 差分プライバシー(DP)の概念は、個人をターゲットデータセットに含めることによる分布の変化を観察することにより、プライバシー損失を定量的に測定することができる。一般的に制約として使用されるDPは、AppleやGoogleのような業界巨人の機械学習におけるデータセットの保護において際立っている。 DPを保証する一般的な手法は、クエリ出力に適切なノイズを組み込むことで、会員推測やリンク攻撃といったプライバシー攻撃に対する統計的防御システムを確立することである。しかし、特に小さなデータセットの場合、既存のDPメカニズムは時にクエリ出力に過剰なノイズを加え、データユーティリティを破棄する。これは、従来のDPが最悪のシナリオ、すなわち統計的外れ値に基づいてプライバシー損失を計算するためである。本研究では、この課題に対処するために、インスタンスごとのDP(pDP)を制約として使用し、各データインスタンスのプライバシ損失を測定し、個々のインスタンスに合わせたノイズを最適化する。簡単に言えば、NVO(Per-instance noise variance Optimization)ゲームは共通の興味のある逐次ゲームとしてフレーム化されており、Nash equilibrium(NE)ポイントが本質的にすべてのデータインスタンスに対してpDPを保証していることを示す。提案したpDPアルゴリズムは, 従来のDPアルゴリズムと比較すると, KLのばらつきから平均99.53%の性能向上を示した。 The concept of differential privacy (DP) can quantitatively measure privacy loss by observing the changes in the distribution caused by the inclusion of individuals in the target dataset. DP, which is generally used as a constraint, has been prominent in safeguarding machine learning datasets at industry giants like Apple and Google. A common methodology for guaranteeing DP is incorporating appropriate noise into query outputs, thereby establishing statistical defense systems against privacy attacks such as membership inference and linkage attacks. However, especially for small datasets, existing DP mechanisms occasionally add an excessive amount of noise to the query output, thereby discarding data utility. This is because traditional DP computes privacy loss based on the worst-case scenario, i.e., statistical outliers. In this work, to tackle this challenge, we utilize per-instance DP (pDP) as a constraint, measuring privacy loss for each data instance and optimizing noise tailored to individual instances. 
In a nutshell, we propose a per-instance noise variance optimization (NVO) game, framed as a common interest sequential game, and show that the Nash equilibrium (NE) points of it inherently guarantee pDP for all data instances. Through extensive experiments, our proposed pDP algorithm demonstrated an average performance improvement of up to 99.53% compared to the conventional DP algorithm in terms of KL divergence.	翻訳日:2024-04-26 19:59:40 公開日:2024-04-24
# 脆弱性検出のためのグラフニューラルネットワークの提案 Graph Neural Networks for Vulnerability Detection: A Counterfactual Explanation ( http://arxiv.org/abs/2404.15687v1 ) ライセンス: Link先を確認	Zhaoyang Chu, Yao Wan, Qian Li, Yang Wu, Hongyu Zhang, Yulei Sui, Guandong Xu, Hai Jin,	(参考訳) 脆弱性検出は、ソフトウェアシステムのセキュリティと信頼性を確保するために不可欠である。最近、Graph Neural Networks(GNN)は、ソースコードの基盤となるセマンティック構造をキャプチャする能力のため、脆弱性検出のための顕著なコード埋め込みアプローチとして登場した。しかし、GNNは本質的にブラックボックスの性質のため、説明可能性において重大な課題に直面している。この目的のために、いくつかの事実推論に基づく説明器が提案されている。これらの説明者は、結果に寄与する主要な特徴を分析することによって、GNNによる予測について説明する。コードグラフを代替構造に変更するならば、GNNの決定はどうなるのか? 人工知能における反ファクト推論の進歩に触発されて、GNNベースの脆弱性検出のための新しい反ファクト説明器CFExplainerを提案する。事実推論ベースの説明器とは異なり、CFExplainerは入力コードグラフに対する最小限の摂動を求め、予測が変更される。検出された脆弱性の根本原因を特定し、開発者が脆弱性を修正するための適切なアクションを実行するための貴重な洞察を与えることができる。 4つのGNNベースの脆弱性検出モデルに対する大規模な実験は、既存の最先端の事実推論に基づく説明器に対するCFExplainerの有効性を示している。 Vulnerability detection is crucial for ensuring the security and reliability of software systems. Recently, Graph Neural Networks (GNNs) have emerged as a prominent code embedding approach for vulnerability detection, owing to their ability to capture the underlying semantic structure of source code. However, GNNs face significant challenges in explainability due to their inherently black-box nature. To this end, several factual reasoning-based explainers have been proposed. These explainers provide explanations for the predictions made by GNNs by analyzing the key features that contribute to the outcomes. We argue that these factual reasoning-based explanations cannot answer critical what-if questions: What would happen to the GNN's decision if we were to alter the code graph into alternative structures? Inspired by advancements of counterfactual reasoning in artificial intelligence, we propose CFExplainer, a novel counterfactual explainer for GNN-based vulnerability detection. 
Unlike factual reasoning-based explainers, CFExplainer seeks the minimal perturbation to the input code graph that leads to a change in the prediction, thereby addressing the what-if questions for vulnerability detection. We term this perturbation a counterfactual explanation, which can pinpoint the root causes of the detected vulnerability and furnish valuable insights for developers to undertake appropriate actions for fixing the vulnerability. Extensive experiments on four GNN-based vulnerability detection models demonstrate the effectiveness of CFExplainer over existing state-of-the-art factual reasoning-based explainers.	翻訳日:2024-04-26 19:59:40 公開日:2024-04-24
# ニューラルプロトランゲージ再建術 Neural Proto-Language Reconstruction ( http://arxiv.org/abs/2404.15690v1 ) ライセンス: Link先を確認	Chenxuan Cui, Ying Chen, Qinxin Wang, David R. Mortensen,	(参考訳) プロトフォームの再構築は言語学者にとって面倒なプロセスだった。近年,このプロセスを自動化するためにRNNやTransformerなどの計算モデルが提案されている。本稿では,データ拡張による失明反射の回復,トランスフォーマーモデルへのVAE構造の追加,再構成作業のためのニューラルマシン翻訳モデルなど,従来の手法を改善するために3つのアプローチを採っている。付加的なVAE構造により、TransformerモデルはWikiHanデータセットのパフォーマンスが向上し、データ拡張ステップがトレーニングを安定化することがわかった。 Proto-form reconstruction has been a painstaking process for linguists. Recently, computational models such as RNN and Transformers have been proposed to automate this process. We take three different approaches to improve upon previous methods, including data augmentation to recover missing reflexes, adding a VAE structure to the Transformer model for proto-to-language prediction, and using a neural machine translation model for the reconstruction task. We find that with the additional VAE structure, the Transformer model has a better performance on the WikiHan dataset, and the data augmentation step stabilizes the training.	翻訳日:2024-04-26 19:59:40 公開日:2024-04-24
# 長期オフポリティ評価と学習 Long-term Off-Policy Evaluation and Learning ( http://arxiv.org/abs/2404.15691v1 ) ライセンス: Link先を確認	Yuta Saito, Himan Abdollahpouri, Jesse Anderton, Ben Carterette, Mounia Lalmas,	(参考訳) アルゴリズムの短期的および長期的な結果はしばしば異なり、下流効果を損なう。クリックベイトアルゴリズムは、短期的なクリックを増加させるが、長期的なユーザーエンゲージメントを損なう可能性がある。長期的な結果を推定する可能な解決策は、潜在的なアルゴリズムに対するオンライン実験またはA/Bテストを実行することであるが、関心の長期的な結果を見るのに数ヶ月またはそれ以上の時間がかかるため、アルゴリズムの選択プロセスは受け入れがたいほど遅くなる。そこで本研究では, 歴史的および短期的な実験データのみを用いて, アルゴリズムの長期的結果の推定を可能かつ正確に行う問題について検討した。既存のアプローチでは、サロガシーと呼ばれる短期的な結果に関する制限的な仮定が必要か、あるいは非効率な短期的な結果を有効に利用することができない。そこで本稿では,報酬関数の分解に基づく長期オフライン評価(LOPE)という新しいフレームワークを提案する。 LOPEは、代理よりもリラックスした仮定の下で機能し、短時間の報酬を効果的に活用して、分散を大幅に減少させる。合成実験により、LOPEは、特にサロゲーシーが厳しく違反し、長期報酬がうるさい場合に、既存のアプローチよりも優れていることが示された。さらに,音楽ストリーミングプラットフォーム上で収集された大規模A/Bテストデータに対する実世界の実験により,LOPEは既存の実現可能な手法よりも,実際のアルゴリズムの長期的な結果をより正確に推定できることを示した。 Short- and long-term outcomes of an algorithm often differ, with damaging downstream effects. A known example is a click-bait algorithm, which may increase short-term clicks but damage long-term user engagement. A possible solution to estimate the long-term outcome is to run an online experiment or A/B test for the potential algorithms, but it takes months or even longer to observe the long-term outcomes of interest, making the algorithm selection process unacceptably slow. This work thus studies the problem of feasibly yet accurately estimating the long-term outcome of an algorithm using only historical and short-term experiment data. Existing approaches to this problem either need a restrictive assumption about the short-term outcomes called surrogacy or cannot effectively use short-term outcomes, which is inefficient. Therefore, we propose a new framework called Long-term Off-Policy Evaluation (LOPE), which is based on reward function decomposition. LOPE works under a more relaxed assumption than surrogacy and effectively leverages short-term rewards to substantially reduce the variance. 
Synthetic experiments show that LOPE outperforms existing approaches particularly when surrogacy is severely violated and the long-term reward is noisy. In addition, real-world experiments on large-scale A/B test data collected on a music streaming platform show that LOPE can estimate the long-term outcome of actual algorithms more accurately than existing feasible methods.	翻訳日:2024-04-26 19:59:40 公開日:2024-04-24
# MRIの高速化とロバスト化のための深層学習 Deep Learning for Accelerated and Robust MRI Reconstruction: a Review ( http://arxiv.org/abs/2404.15692v1 ) ライセンス: Link先を確認	Reinhard Heckel, Mathews Jacob, Akshay Chaudhari, Or Perlman, Efrat Shimron,	(参考訳) 深層学習(DL)は、放射線診断において重要なツールであるMRI(MRI)の強化のための重要な技術として最近登場した。本稿では,MRI再建のためのDLの最近の進歩について概説する。画質を改善し、スキャンを加速し、データ関連の課題に対処するために設計されたDLアプローチとアーキテクチャに焦点を当てている。その中には、エンドツーエンドのニューラルネットワーク、事前訓練されたネットワーク、生成モデル、自己管理手法などが含まれる。また,DLが獲得プロトコルの最適化,分散シフトに対する堅牢性の向上,微妙なバイアスに対処する上で果たす役割についても論じる。広範にわたる文献と実践的洞察に基づいて、MRI再建におけるDLの活用における現在の成功、限界、今後の方向性を概説し、臨床画像の実践に大きな影響を与えるDLの可能性を強調した。 Deep learning (DL) has recently emerged as a pivotal technology for enhancing magnetic resonance imaging (MRI), a critical tool in diagnostic radiology. This review paper provides a comprehensive overview of recent advances in DL for MRI reconstruction. It focuses on DL approaches and architectures designed to improve image quality, accelerate scans, and address data-related challenges. These include end-to-end neural networks, pre-trained networks, generative models, and self-supervised methods. The paper also discusses the role of DL in optimizing acquisition protocols, enhancing robustness against distribution shifts, and tackling subtle bias. Drawing on the extensive literature and practical insights, it outlines current successes, limitations, and future directions for leveraging DL in MRI reconstruction, while emphasizing the potential of DL to significantly impact clinical imaging practices.	翻訳日:2024-04-26 19:59:40 公開日:2024-04-24
# DeepFeatureX Net:Deep Features eXtractors based Network for discrimination synthesis from real image DeepFeatureX Net: Deep Features eXtractors based Network for discriminating synthetic from real images ( http://arxiv.org/abs/2404.15697v1 ) ライセンス: Link先を確認	Orazio Pontorno, Luca Guarnera, Sebastiano Battiato,	(参考訳) ディープラーニングアルゴリズムによって生成された合成画像であるDeepfakesは、Digital Forensicsの分野における最大の課題の1つだ。科学コミュニティは、デジタル画像(リアルまたはAI生成)の起源を識別できるアプローチの開発に取り組んでいる。しかし、これらの手法は、訓練中に見えないアーキテクチャによって生成されたとしても、画像の性質を識別する能力という一般化の課題に直面している。これは通常パフォーマンスの低下につながる。この文脈では,ベースモデルと呼ばれる3つのブロックをベースとした新しいアプローチを提案し,各ブロックは,意図的不均衡なデータセットを活用することによって,特定の画像クラスの識別的特徴(拡散モデル生成,GAN生成,あるいは実)を抽出する。そして、各ブロックから抽出された特徴を連結処理し、入力画像の原点を識別する。実験結果から,この手法はJPEG圧縮に優れたロバスト性を示すだけでなく,いくつかの一般化テストにおいて最先端の手法よりも優れていることが示された。コード、モデル、データセットはhttps://github.com/opontorno/block-based_deepfake-detectionで確認できる。 Deepfakes, synthetic images generated by deep learning algorithms, represent one of the biggest challenges in the field of Digital Forensics. The scientific community is working to develop approaches that can discriminate the origin of digital images (real or AI-generated). However, these methodologies face the challenge of generalization, that is, the ability to discern the nature of an image even if it is generated by an architecture not seen during training. This usually leads to a drop in performance. In this context, we propose a novel approach based on three blocks called Base Models, each of which is responsible for extracting the discriminative features of a specific image class (Diffusion Model-generated, GAN-generated, or real) as it is trained by exploiting deliberately unbalanced datasets. The features extracted from each block are then concatenated and processed to discriminate the origin of the input image. Experimental results showed that this approach not only demonstrates good robust capabilities to JPEG compression but also outperforms state-of-the-art methods in several generalization tests. 
Code, models and dataset are available at https://github.com/opontorno/block-based_deepfake-detection.	翻訳日:2024-04-26 19:59:40 公開日:2024-04-24
# MAS-SAM: 群集した特徴を持つ海洋動物を隔離する MAS-SAM: Segment Any Marine Animal with Aggregated Features ( http://arxiv.org/abs/2404.15700v1 ) ライセンス: Link先を確認	Tianyu Yan, Zifu Wan, Xinhao Deng, Pingping Zhang, Yang Liu, Huchuan Lu,	(参考訳) 近年、SAM(Segment Anything Model)は、高品質なオブジェクトマスクを生成し、ゼロショット画像のセグメンテーションを実現する際、例外的な性能を示す。しかし、多用途視覚モデルとして、SAMは主に大規模な自然光画像で訓練されている。水中のシーンでは、光散乱と吸収により性能が著しく低下する。一方、SAMのデコーダの単純さは、きめ細かいオブジェクトの詳細を損なう可能性がある。以上の課題に対処するため,海洋動物セグメンテーションのためのMAS-SAMという新しい特徴学習フレームワークを提案する。より具体的には、水中シーン用の効果的なアダプタを備えたSAMエンコーダを最初に構築する。次に,ハイパーマップ抽出モジュール (HEM) を導入し,包括的ガイダンスのためのマルチスケール機能を生成する。最後に,マルチスケール特徴を集約し,最終的なセグメンテーション結果を予測するプログレッシブ予測デコーダ(PPD)を提案する。本研究では,Fusion Attention Module (FAM) を移植することにより,グローバルな文脈的手がかりからよりリッチな海洋情報をよりきめ細かな局所的詳細まで抽出することができる。 4つのパブリックMASデータセットに対する大規模な実験により、我々のMAS-SAMは、他の典型的なセグメンテーション手法よりも優れた結果が得られることを示した。ソースコードはhttps://github.com/Drchip61/MAS-SAMで入手できる。 Recently, Segment Anything Model (SAM) shows exceptional performance in generating high-quality object masks and achieving zero-shot image segmentation. However, as a versatile vision model, SAM is primarily trained with large-scale natural light images. In underwater scenes, it exhibits substantial performance degradation due to the light scattering and absorption. Meanwhile, the simplicity of the SAM's decoder might lead to the loss of fine-grained object details. To address the above issues, we propose a novel feature learning framework named MAS-SAM for marine animal segmentation, which involves integrating effective adapters into the SAM's encoder and constructing a pyramidal decoder. More specifically, we first build a new SAM's encoder with effective adapters for underwater scenes. Then, we introduce a Hypermap Extraction Module (HEM) to generate multi-scale features for a comprehensive guidance. Finally, we propose a Progressive Prediction Decoder (PPD) to aggregate the multi-scale features and predict the final segmentation results. 
With the Fusion Attention Module (FAM) grafted in, our method can extract richer marine information, from global contextual cues to fine-grained local details. Extensive experiments on four public MAS datasets demonstrate that our MAS-SAM can obtain better results than other typical segmentation methods. The source code is available at https://github.com/Drchip61/MAS-SAM.	翻訳日:2024-04-26 19:49:57 公開日:2024-04-24
# Nyonic Technical Report Nyonic Technical Report ( http://arxiv.org/abs/2404.15702v1 ) ライセンス: Link先を確認	Junfeng Tian, Rui Wang, Cong Li, Yudong Zhou, Jun Liu, Jun Wang,	(参考訳) 本報告では,カスタムな大規模言語モデル用に設計された最新の言語モデルの開発と重要な成果について詳述する。導入された進歩には、フレキシブルなトレーニングデータ調整とカリキュラム学習をサポートする、新しいオンラインデータスケジューリングが含まれている。モデルのアーキテクチャには、ロータリー位置埋め込み(Rotary Positional Embeddings)、QK-LayerNorm(QK-LayerNorm)などの最先端技術と、安定性と性能を高めるために特別に製作された多言語トークンライザが組み込まれている。さらに、我々の堅牢なトレーニングフレームワークは、最適な効率を確保するために、高度なモニタリングと迅速なリカバリ機能を備えている。我々のWonton 7Bモデルは、多言語および英語のベンチマークで競合性能を示した。今後の開発は、より広範囲にトレーニングされたモデルによるパフォーマンスギャップの縮小を優先し、実際の有効性と適応性を高めるだろう。 This report details the development and key achievements of our latest language model designed for custom large language models. The advancements introduced include a novel Online Data Scheduler that supports flexible training data adjustments and curriculum learning. The model's architecture is fortified with state-of-the-art techniques such as Rotary Positional Embeddings, QK-LayerNorm, and a specially crafted multilingual tokenizer to enhance stability and performance. Moreover, our robust training framework incorporates advanced monitoring and rapid recovery features to ensure optimal efficiency. Our Wonton 7B model has demonstrated competitive performance on a range of multilingual and English benchmarks. Future developments will prioritize narrowing the performance gap with more extensively trained models, thereby enhancing the model's real-world efficacy and adaptability. GitHub: https://github.com/nyonicai/nyonic-public	翻訳日:2024-04-26 19:49:56 公開日:2024-04-24
# 逆相補表現学習を用いた効率的な多モデル融合 Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning ( http://arxiv.org/abs/2404.15704v1 ) ライセンス: Link先を確認	Zuheng Kang, Yayun He, Jianzong Wang, Junqing Peng, Jing Xiao,	(参考訳) 単一モデルシステムは、話者検証(SV)や画像分類といったタスクの欠如に悩まされ、意思決定時に部分的な事前知識に大きく依存する。マルチモデル融合(MMF)はこれらの問題のいくつかを軽減することができるが、学習された表現の冗長性は改善を制限する可能性がある。そこで本稿では,新たにトレーニングされたモデルに対して,事前取得した知識を回避し,各コンポーネントモデルに対して,最大で相補的表現の学習を可能にする,対向的補完的表現学習(ACoRL)フレームワークを提案する。提案手法は従来のMMFよりも効率よく性能を向上することを示す。さらに、属性分析により、ACoRLの下で訓練されたモデルがより補完的な知識を獲得し、タスク間の効率性と堅牢性を高めるためのアプローチの有効性を強調した。 Single-model systems often suffer from deficiencies in tasks such as speaker verification (SV) and image classification, relying heavily on partial prior knowledge during decision-making, resulting in suboptimal performance. Although multi-model fusion (MMF) can mitigate some of these issues, redundancy in learned representations may limit improvements. To this end, we propose an adversarial complementary representation learning (ACoRL) framework that enables newly trained models to avoid previously acquired knowledge, allowing each individual component model to learn maximally distinct, complementary representations. We give three detailed explanations of why this works, and experimental results demonstrate that our method improves performance more efficiently than traditional MMF. Furthermore, attribution analysis validates that the model trained under ACoRL acquires more complementary knowledge, highlighting the efficacy of our approach in enhancing efficiency and robustness across tasks.	翻訳日:2024-04-26 19:49:56 公開日:2024-04-24
# ESR-NeRF:LDR多視点画像を用いた音源再構成 ESR-NeRF: Emissive Source Reconstruction Using LDR Multi-view Images ( http://arxiv.org/abs/2404.15707v1 ) ライセンス: Link先を確認	Jinseo Jeong, Junseo Koo, Qimeng Zhang, Gunhee Kim,	(参考訳) 既存のNeRFベースの逆レンダリング手法では、シーンは遠方の光源によってのみ照らされ、シーン内の放射源の影響を無視する。本研究では,LDRマルチビュー画像に送信源をオン/オフにすることで,この制限に直面している。 2つの重要な問題に対処する必要がある。 1)未知の光の詳細とともに、限られたダイナミックレンジから生じるあいまいさ 2) 最終的な物体色に繋がる経路を後付けするために, ボリュームレンダリングの高価な計算コストがかかる。本稿では,ニューラルネットワークを学習可能な関数として活用し,レイトレーシング場を表現する新しいアプローチであるESR-NeRFを提案する。光輸送セグメントを満たすためにネットワークを訓練することにより、放射源を徐々に特定し、反射領域を認識しながら、発信する放射光を規制する。その結果,ESR-NeRFの質的・定量的な優位性が示された。提案手法は,DTUデータセット上の低CD測定値を達成するため,送信源のないシーンに適用性も拡張する。 Existing NeRF-based inverse rendering methods suppose that scenes are exclusively illuminated by distant light sources, neglecting the potential influence of emissive sources within a scene. In this work, we confront this limitation using LDR multi-view images captured with emissive sources turned on and off. Two key issues must be addressed: 1) ambiguity arising from the limited dynamic range along with unknown lighting details, and 2) the expensive computational cost in volume rendering to backtrace the paths leading to final object colors. We present a novel approach, ESR-NeRF, leveraging neural networks as learnable functions to represent ray-traced fields. By training networks to satisfy light transport segments, we regulate outgoing radiances, progressively identifying emissive sources while being aware of reflection areas. The results on scenes encompassing emissive sources with various properties demonstrate the superiority of ESR-NeRF in qualitative and quantitative ways. Our approach also extends its applicability to the scenes devoid of emissive sources, achieving lower CD metrics on the DTU dataset.	翻訳日:2024-04-26 19:49:56 公開日:2024-04-24
# ViViDex:人間のビデオから視覚に基づく有害な操作を学習する ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos ( http://arxiv.org/abs/2404.15709v1 ) ライセンス: Link先を確認	Zerui Chen, Shizhe Chen, Cordelia Schmid, Ivan Laptev,	(参考訳) 本研究では,多指ロボットによる多様なポーズでさまざまな物体を操作するための統一的な視覚ベースのポリシーを学習することを目的とする。これまでの研究は、人間のビデオが政策学習に有効であることを示したが、ビデオから抽出された物理的に不可解な軌跡によって性能改善は制限されてきた。さらに、接地木オブジェクトのような特権オブジェクト情報への依存は、現実的なシナリオにおける適用性をさらに制限する。これらの制約に対処するため、人間のビデオから視覚に基づくポリシー学習を改善するための新しいフレームワークViViDexを提案する。最初は、強化学習と軌道誘導報酬を使って、各ビデオのステートベースのポリシーを訓練し、ビデオから視覚的に自然と身体的にもっともらしい軌跡の両方を得る。次に、州ベースのポリシーから成功したエピソードをロールアウトし、特権情報を使用しずに統一された視覚ポリシーをトレーニングします。性能を著しく向上させるために座標変換法を提案する。提案手法を3つのデクスタラスな操作タスクで評価し,最先端のアルゴリズムよりも大幅に改善したことを示す。 In this work, we aim to learn a unified vision-based policy for a multi-fingered robot hand to manipulate different objects in diverse poses. Though prior work has demonstrated that human videos can benefit policy learning, performance improvement has been limited by physically implausible trajectories extracted from videos. Moreover, reliance on privileged object information such as ground-truth object states further limits the applicability in realistic scenarios. To address these limitations, we propose a new framework ViViDex to improve vision-based policy learning from human videos. It first uses reinforcement learning with trajectory guided rewards to train state-based policies for each video, obtaining both visually natural and physically plausible trajectories from the video. We then rollout successful episodes from state-based policies and train a unified visual policy without using any privileged information. A coordinate transformation method is proposed to significantly boost the performance. We evaluate our method on three dexterous manipulation tasks and demonstrate a large improvement over state-of-the-art algorithms.	翻訳日:2024-04-26 19:49:56 公開日:2024-04-24
# Ada-DF: 顔表情認識のための適応ラベル分布統合ネットワーク Ada-DF: An Adaptive Label Distribution Fusion Network For Facial Expression Recognition ( http://arxiv.org/abs/2404.15714v1 ) ライセンス: Link先を確認	Shu Liu, Yan Xu, Tongming Wan, Xiaoyan Kui,	(参考訳) 表情認識(FER)は日常生活において重要な役割を担っている。しかし、データセットのアノテーションの曖昧さは、パフォーマンスを著しく損なう可能性がある。本稿では,ferタスクをラベル分散学習パラダイム経由で処理し,デュアルブランチ適応分布融合(Ada-DF)フレームワークを開発する。サンプルのラベル分布を得るために1つの補助枝を構築する。感情のクラス分布は、各感情のラベル分布を通して計算される。最後に、これらの2つの分布は、目標分岐を訓練するための注意重みに応じて適応的に融合する。 RAF-DB、AffectNet、SFEWという3つの実世界のデータセットで大規模な実験が行われています。 Facial expression recognition (FER) plays a significant role in our daily life. However, annotation ambiguity in the datasets could greatly hinder the performance. In this paper, we address FER task via label distribution learning paradigm, and develop a dual-branch Adaptive Distribution Fusion (Ada-DF) framework. One auxiliary branch is constructed to obtain the label distributions of samples. The class distributions of emotions are then computed through the label distributions of each emotion. Finally, those two distributions are adaptively fused according to the attention weights to train the target branch. Extensive experiments are conducted on three real-world datasets, RAF-DB, AffectNet and SFEW, where our Ada-DF shows advantages over the state-of-the-art works.	翻訳日:2024-04-26 19:49:56 公開日:2024-04-24
# 中心スピンモデルにおける精製相転移:双対空間アプローチにおける第二レニイエントロピー Purification phase transition in the central spin model: second Rényi entropy in dual space approach ( http://arxiv.org/abs/2404.15717v1 ) ライセンス: Link先を確認	V. V. Belov, W. V. Pogosov,	(参考訳) 我々は, 中心スピンモデルが測定過程が存在する場合の力学の数値的研究を行う。このモデルは、そのトポロジーにより実験的な探索を約束しており、中心粒子と量子浴を異なるサブシステムとして自然に区別し、絡み合い相転移を調べることができる。この系における測定誘起相転移を特徴づけるために、双対空間における第二Rényiエントロピーに基づく最近開発された手法を用いる。シミュレーションでは、デコヒーレンス、エネルギー緩和、ゲートエラーが説明できる。臨界測定速度を判定し, 相互エントロピーに基づく簡単なアプローチで予測した値とは大きく異なることを示す。 We conduct a numerical investigation of the dynamics of the central spin model in the presence of measurement processes. This model holds promise for experimental exploration due to its topology, which facilitates the natural distinction of a central particle and the quantum bath as different subsystems, allowing for the examination of entanglement phase transitions. To characterize the measurement-induced phase transition in this system, we employ a recently developed method based on second Rényi entropy in dual space. Our simulations account for decoherence, energy relaxation, and gate errors. We determine critical measurement rates and demonstrate that they significantly differ from those predicted by a simple approach based on mutual entropy.	翻訳日:2024-04-26 19:49:56 公開日:2024-04-24
# 不合理な身体領域における偽予測の緩和 Mitigating False Predictions In Unreasonable Body Regions ( http://arxiv.org/abs/2404.15718v1 ) ライセンス: Link先を確認	Constantin Ulrich, Catherine Knobloch, Julius C. Holzschuh, Tassilo Wald, Maximilian R. Rokuss, Maximilian Zenk, Maximilian Fischer, Michael Baumgartner, Fabian Isensee, Klaus H. Maier-Hein,	(参考訳) 3次元医用画像セグメンテーションのためのディープラーニングモデルの開発にかなりの努力を払っているにもかかわらず、多様な画像分布を効果的に一般化するという課題は続いている。ドメインの一般化は、臨床現場での堅牢な応用には不可欠であると認識されているが、限られた視野(FOV)でのトレーニングから生じる課題は未解決のままである。この制限は、トレーニングデータのFOVを超える身体領域に適用した場合、誤った予測につながる。そこで本研究では, 単一データセットと複数データセットの両方のトレーニングスキームに適用可能な, 不確定な身体領域の予測をペナルティ化する新しい損失関数を提案する。軸スライス位置スコアを生成するBody Part Regressionモデルで実現した。様々なFOVを特徴とするテストセットを用いた包括的評価により,本手法は一般化能力の顕著な改善を示す。偽陽性腫瘍予測を85%まで効果的に軽減し、全体のセグメンテーション性能を大幅に向上させる。 Despite considerable strides in developing deep learning models for 3D medical image segmentation, the challenge of effectively generalizing across diverse image distributions persists. While domain generalization is acknowledged as vital for robust application in clinical settings, the challenges stemming from training with a limited Field of View (FOV) remain unaddressed. This limitation leads to false predictions when applied to body regions beyond the FOV of the training data. In response to this problem, we propose a novel loss function that penalizes predictions in implausible body regions, applicable in both single-dataset and multi-dataset training schemes. It is realized with a Body Part Regression model that generates axial slice positional scores. Through comprehensive evaluation using a test set featuring varying FOVs, our approach demonstrates remarkable improvements in generalization capabilities. It effectively mitigates false positive tumor predictions up to 85% and significantly enhances overall segmentation performance.	翻訳日:2024-04-26 19:49:56 公開日:2024-04-24
# 主観的NLP課題に対するアノテータ中心能動学習 Annotator-Centric Active Learning for Subjective NLP Tasks ( http://arxiv.org/abs/2404.15720v1 ) ライセンス: Link先を確認	Michiel van der Meer, Neele Falk, Pradeep K. Murukannaiah, Enrico Liscio,	(参考訳) 主観的NLPタスクに対する人間の判断のばらつきを正確に把握するためには、アノテーションプロセスに幅広い視点を取り入れることが不可欠である。アクティブラーニング(AL)は、最も有益なサンプルを戦略的に注釈付けすることで、人間のアノテーションを収集するコストに対処する。本稿では,データサンプリングに続き,アノテーション選択戦略を取り入れたACAL(Annotator-Centric Active Learning)を提案する。 本研究の目的は,(1)人間の判断の多様性を効率よく近似し,(2)アノテータ中心の指標を用いてモデル性能を評価することである。従来の評価指標と人間中心評価指標の両方を用いて、7つの主観的NLPタスクにまたがる複数のアノテータ選択戦略を実験した。以上の結果から,ACALはデータ効率を向上し,アノテータ中心の性能評価に優れることが示唆された。しかし、その成功は、十分に大きく多様なアノテータのプールがサンプルとして利用できることに依存している。 To accurately capture the variability in human judgments for subjective NLP tasks, incorporating a wide range of perspectives in the annotation process is crucial. Active Learning (AL) addresses the high costs of collecting human annotations by strategically annotating the most informative samples. We introduce Annotator-Centric Active Learning (ACAL), which incorporates an annotator selection strategy following data sampling. Our objective is two-fold: (1) to efficiently approximate the full diversity of human judgments, and (2) to assess model performance using annotator-centric metrics, which emphasize minority perspectives over a majority. We experiment with multiple annotator selection strategies across seven subjective NLP tasks, employing both traditional and novel, human-centered evaluation metrics. Our findings indicate that ACAL improves data efficiency and excels in annotator-centric performance evaluations. However, its success depends on the availability of a sufficiently large and diverse pool of annotators to sample from.	翻訳日:2024-04-26 19:49:56 公開日:2024-04-24
# SPARO: 視覚のためのロバストおよびコンポジショントランスフォーマーエンコーディングのための選択的注意 SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision ( http://arxiv.org/abs/2404.15721v1 ) ライセンス: Link先を確認	Ankit Vani, Bac Nguyen, Samuel Lavoie, Ranjay Krishna, Aaron Courville,	(参考訳) 選択的な注意は、感覚入力の絶え間ない洪水におけるタスク関連側面に焦点を合わせるのに役立ちます。この知覚の制約は、注意を散らし、知覚可能な概念の新しい構成にしっかりと一般化することを可能にする。しかし、CLIPやDINOのようなトランスフォーマーバックボーンを持つ表現学習モデルは、堅牢性や構成性を示すのに失敗することが多い。人間の知覚とは異なり、トランスフォーマーエンコーディングは個々の概念を別々に扱うものではない。そこで本研究では,SPAROを提案する。SPAROは1つのアテンションヘッドによって生成され,エンコーディングを別個のアテンションスロットに分割する読み出し機構である。 CLIPによるSPAROの使用は、視覚とテキストのモダリティが同じ概念を持つ共有構成世界の異なる視点であることを示す帰納的バイアスを与える。 SPAROを用いて、CLIPによる下流認識、ロバスト性、検索、構成性ベンチマークの改善(ImageNetは+14%、SugarCrepeは+4%)、およびDINOによるImageNetの近接および線形プローブ(+3%)について示す。また,各SPARO概念に介入して選択し,下流タスク性能(SugarCrepeでは+4%から+9%まで)をさらに向上させ,SPAROの表現構造の堅牢性について検討する強力な能力についても紹介する。最後に、アブレーション実験と学習概念の可視化を通して洞察を提供する。 Selective attention helps us focus on task-relevant aspects in the constant flood of our sensory input. This constraint in our perception allows us to robustly generalize under distractions and to new compositions of perceivable concepts. Transformers employ a similar notion of attention in their architecture, but representation learning models with transformer backbones like CLIP and DINO often fail to demonstrate robustness and compositionality. We highlight a missing architectural prior: unlike human perception, transformer encodings do not separately attend over individual concepts. In response, we propose SPARO, a read-out mechanism that partitions encodings into separately-attended slots, each produced by a single attention head. Using SPARO with CLIP imparts an inductive bias that the vision and text modalities are different views of a shared compositional world with the same corresponding concepts. 
Using SPARO, we demonstrate improvements on downstream recognition, robustness, retrieval, and compositionality benchmarks with CLIP (up to +14% for ImageNet, +4% for SugarCrepe), and on nearest neighbors and linear probe for ImageNet with DINO (+3% each). We also showcase a powerful ability to intervene and select individual SPARO concepts to further improve downstream task performance (up from +4% to +9% for SugarCrepe) and use this ability to study the robustness of SPARO's representation structure. Finally, we provide insights through ablation experiments and visualization of learned concepts.	翻訳日:2024-04-26 19:49:56 公開日:2024-04-24
# Gradformer: 指数減衰を備えたグラフ変換器 Gradformer: Graph Transformer with Exponential Decay ( http://arxiv.org/abs/2404.15729v1 ) ライセンス: Link先を確認	Chuang Liu, Zelin Yao, Yibing Zhan, Xueqi Ma, Shirui Pan, Wenbin Hu,	(参考訳) グラフトランスフォーマー(GT)は、幅広いタスクでその利点を実証している。しかし、GTsの自己注意機構はグラフの帰納バイアス、特にグラフのタスクに不可欠な構造に関するバイアスを見落としている。位置符号化と注意バイアスを利用して誘導バイアスをモデル化する手法もあるが、その効果はいまだに準最適である。そこで本稿では,GTと本質的帰納バイアスを革新的に統合する手法であるGradformerについて述べる。具体的には、崩壊マスク行列の値は指数関数的に減少し、グラフ構造内のノードの近さの減少に関連している。この設計によりGradformerは、グラフのローカル詳細に集中しながら、遠くのノードから情報をキャプチャする能力を維持することができる。さらに、グラッドフォーマーは減衰マスクに学習可能な制約を導入し、異なる注意頭が異なる減衰マスクを学習できるようにする。このような設計は注目ヘッドを多様化させ、グラフ内の多様な構造情報のより効果的な同化を可能にする。様々なベンチマーク実験により、グラフニューラルネットワークとGTベースラインモデルにおいて、グラフ分類や回帰タスクにおいて、Gradformerは一貫してパフォーマンスが向上していることが示された。さらに、Gradformerは他のGTモデルで観測される顕著な精度低下とは対照的に、ネットワークの深層化に伴って浅部モデルと比較して精度を維持または向上し、深部GTモデルのトレーニングに有効な方法であることが証明されている。 Graph Transformers (GTs) have demonstrated their advantages across a wide range of tasks. However, the self-attention mechanism in GTs overlooks the graph's inductive biases, particularly biases related to structure, which are crucial for the graph tasks. Although some methods utilize positional encoding and attention bias to model inductive biases, their effectiveness is still suboptimal analytically. Therefore, this paper presents Gradformer, a method innovatively integrating GT with the intrinsic inductive bias by applying an exponential decay mask to the attention matrix. Specifically, the values in the decay mask matrix diminish exponentially, correlating with the decreasing node proximities within the graph structure. This design enables Gradformer to retain its ability to capture information from distant nodes while focusing on the graph's local details. Furthermore, Gradformer introduces a learnable constraint into the decay mask, allowing different attention heads to learn distinct decay masks. 
Such a design diversifies the attention heads, enabling a more effective assimilation of diverse structural information within the graph. Extensive experiments on various benchmarks demonstrate that Gradformer consistently outperforms the Graph Neural Network and GT baseline models in various graph classification and regression tasks. Additionally, Gradformer has proven to be an effective method for training deep GT models, maintaining or even enhancing accuracy compared to shallow models as the network deepens, in contrast to the significant accuracy drop observed in other GT models. Code is available at https://github.com/LiuChuang0059/Gradformer .	翻訳日:2024-04-26 19:49:56 公開日:2024-04-24
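The decay-mask idea in the Gradformer abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes hop distance (computed by BFS) as the proximity measure, a fixed decay rate `gamma`, and row renormalization after masking. The function names are hypothetical, and the learnable per-head constraint mentioned in the abstract is omitted.

```python
from collections import deque


def hop_distances(adj, source):
    """BFS hop distance from `source` to every node (None if unreachable)."""
    dist = [None] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def decay_mask(adj, gamma):
    """mask[i][j] = gamma ** hops(i, j); unreachable pairs get 0.0 (assumption)."""
    mask = []
    for i in range(len(adj)):
        dist = hop_distances(adj, i)
        mask.append([gamma ** d if d is not None else 0.0 for d in dist])
    return mask


def apply_decay_mask(attn, mask):
    """Scale attention weights elementwise by the mask, then renormalize rows."""
    out = []
    for attn_row, mask_row in zip(attn, mask):
        weighted = [a * m for a, m in zip(attn_row, mask_row)]
        total = sum(weighted)
        out.append([w / total for w in weighted] if total > 0 else weighted)
    return out


# Path graph 0 - 1 - 2 given as adjacency lists.
adjacency = [[1], [0, 2], [1]]
mask = decay_mask(adjacency, gamma=0.5)
uniform_attention = [[1 / 3] * 3 for _ in range(3)]
masked_attention = apply_decay_mask(uniform_attention, mask)
```

With `gamma` close to 1 the mask is nearly flat and distant nodes keep most of their weight; a smaller `gamma` concentrates attention on the local neighborhood.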
# MD-NOMAD:確率微分方程式と不確実性伝播をエミュレートするための混合密度非線形多様体デコーダ MD-NOMAD: Mixture density nonlinear manifold decoder for emulating stochastic differential equations and uncertainty propagation ( http://arxiv.org/abs/2404.15731v1 ) ライセンス: Link先を確認	Akshay Thakur, Souvik Chakraborty,	(参考訳) 確率シミュレータのためのニューラル演算子フレームワークである混合密度非線形多様体デコーダ(MD-NOMAD)を提案する。提案手法は,ニューラルアーキテクチャの非線形デコーダ(NomaD)と混合密度に基づく手法を併用して,確率的出力関数の条件確率分布を推定する。 MD-NOMADは、確率的混合モデルの複雑な確率と、ポイントワイドニューラル演算子NOMADの高次元スケーラビリティを推定する能力を利用する。本研究では, 確率的常微分方程式と偏微分方程式の広範囲にまたがる実験的な評価を行い, 対応する結果を示し, 提案フレームワークの性能を明らかにする。 We propose a neural operator framework, termed mixture density nonlinear manifold decoder (MD-NOMAD), for stochastic simulators. Our approach leverages an amalgamation of the pointwise operator learning neural architecture nonlinear manifold decoder (NOMAD) with mixture density-based methods to estimate conditional probability distributions for stochastic output functions. MD-NOMAD harnesses the ability of probabilistic mixture models to estimate complex probability and the high-dimensional scalability of pointwise neural operator NOMAD. We conduct empirical assessments on a wide array of stochastic ordinary and partial differential equations and present the corresponding results, which highlight the performance of the proposed framework.	翻訳日:2024-04-26 19:49:56 公開日:2024-04-24
# メトロ原点推定予測のための微粒な時空間MLPアーキテクチャ Fine-grained Spatial-temporal MLP Architecture for Metro Origin-Destination Prediction ( http://arxiv.org/abs/2404.15734v1 ) ライセンス: Link先を確認	Yang Liu, Binglin Chen, Yongsen Zheng, Guanbin Li, Liang Lin,	(参考訳) 都市交通の正確な予測は、地下鉄のスケジューリングを最適化し、全体の輸送効率を向上させるために重要である。駅間の細粒度および包括的関係を効果的に分析することは、メトロオリジン・デスティン化(OD)予測に不可欠である。しかし、既存のメトロODモデルは、駅の視点で複数のODペアからの情報や、ODペアのサブセットにのみ焦点を合わせている。これらのアプローチはODペア間の微細な関係を見落とし、潜在的な異常な状態を予測するのに困難をもたらす可能性がある。これらの課題に対処するために、すべてのODペアの観点からトラフィックの変動を分析し、ODMixerというメトロOD予測のための微粒な時空間MLPアーキテクチャを提案する。具体的には、ODMixerは二重分岐構造を持ち、Channel Mixer、Multi-view Mixer、Bidirectional Trend Learnerを含む。 Channel MixerはODペア間の短期的時間的関係を捉えることを目的としており、Multi-view Mixerは起源と目的地の両方の観点から関係を捉えることに集中している。長期的な時間的関係をモデル化するために,双方向トレンド学習システムを導入する。大規模OD予測データセットHZMODとSHMOの大規模な実験により,ODMixerの利点が示された。コードは利用可能です。 Accurate prediction of metro traffic is crucial for optimizing metro scheduling and enhancing overall transport efficiency. Analyzing fine-grained and comprehensive relations among stations effectively is imperative for metro Origin-Destination (OD) prediction. However, existing metro OD models either mix information from multiple OD pairs from the station's perspective or exclusively focus on a subset of OD pairs. These approaches may overlook fine-grained relations among OD pairs, leading to difficulties in predicting potential anomalous conditions. To address these challenges, we analyze traffic variations from the perspective of all OD pairs and propose a fine-grained spatial-temporal MLP architecture for metro OD prediction, namely ODMixer. Specifically, our ODMixer has double-branch structure and involves the Channel Mixer, the Multi-view Mixer, and the Bidirectional Trend Learner. The Channel Mixer aims to capture short-term temporal relations among OD pairs, the Multi-view Mixer concentrates on capturing relations from both origin and destination perspectives. To model long-term temporal relations, we introduce the Bidirectional Trend Learner. 
Extensive experiments on two large-scale metro OD prediction datasets HZMOD and SHMO demonstrate the advantages of our ODMixer. The code will be available.	翻訳日:2024-04-26 19:49:56 公開日:2024-04-24
# 訓練なしで利得を:訓練不要な言語アダプタ強化のための言語算術 No Train but Gain: Language Arithmetic for training-free Language Adapters enhancement ( http://arxiv.org/abs/2404.15737v1 ) ライセンス: Link先を確認	Mateusz Klimaszewski, Piotr Andruszkiewicz, Alexandra Birch,	(参考訳) モジュール型深層学習は、多言語性の呪いを解き、負の干渉の影響を防ぎ、多言語事前学習言語モデルにおける言語間性能を実現するための最先端のソリューションである。しかし、このアプローチのトレードオフは、密接な関係のある言語からの正転移学習の削減である。そこで本研究では,この制限に対処するためのトレーニング不要なポストプロセッシングを実現する,言語演算と呼ばれる新しい手法を提案する。タスク演算フレームワークにインスパイアされ、言語アダプタに加算による学習を適用し、フレームワークをマルチタスクから多言語設定に移行する。提案手法の有効性は,MAD-Xに基づく言語間スキームの3つの下流タスクにおいて実証され,後処理の手順として機能する。ゼロショットおよび低リソースアプリケーションの最も難しいケースでは、言語演算がベースラインを一貫して改善する。私たちのコードとモデルは https://github.com/mklimasz/language-arithmetic で利用可能です。 Modular deep learning is the state-of-the-art solution for lifting the curse of multilinguality, preventing the impact of negative interference and enabling cross-lingual performance in Multilingual Pre-trained Language Models. However, a trade-off of this approach is the reduction in positive transfer learning from closely related languages. In response, we introduce a novel method called language arithmetic, which enables training-free post-processing to address this limitation. Inspired by the task arithmetic framework, we apply learning via addition to the language adapters, transitioning the framework from a multi-task to a multilingual setup. The effectiveness of the proposed solution is demonstrated on three downstream tasks in a MAD-X-based set of cross-lingual schemes, acting as a post-processing procedure. Language arithmetic consistently improves the baselines with significant gains in the most challenging cases of zero-shot and low-resource applications. Our code and models are available at https://github.com/mklimasz/language-arithmetic .	翻訳日:2024-04-26 19:40:12 公開日:2024-04-24
# ネストニューラルネットワークによるSINDyアプローチの一般化 Generalizing the SINDy approach with nested neural networks ( http://arxiv.org/abs/2404.15742v1 ) ライセンス: Link先を確認	Camilla Fiorini, Clément Flint, Louis Fostier, Emmanuel Franck, Reyhaneh Hashemi, Victor Michel-Dansac, Wassim Tenachi,	(参考訳) シンボリック回帰(英: Symbolic Regression、SR)は、データからシンボリック表現を推論することを目的とした、広く研究されている研究分野である。 SRの一般的なアプローチは、疎回帰を用いてデータから支配方程式を識別する、非線形力学系のスパース同定(SINDy)フレームワークである。本研究では、ネスト構造によりSINDyアプローチの表現性を高めることを目的とした強化手法であるNested SINDyを紹介する。実際、伝統的な記号回帰法やシステム同定法は、分析的に容易に記述できない複雑なシステムでは失敗することが多い。 Nested SINDyはSINDyフレームワーク上に構築されており、コアSINDyレイヤの前後に追加レイヤを導入する。これにより、関数の合成や積を含む、より広い範囲のシステムに対する記号表現を特定できる。我々は、基本的な三角関数やより複雑なシステムに対するスパースな解析的表現など、単純なシステムの記号表現を正確に見つけるNested SINDyアプローチの能力を実証する。この結果から,Nested SINDyが表現性において従来のSINDyアプローチを超越した,シンボリック回帰のツールとしての可能性を強調した。しかし、Nested SINDyの最適化プロセスの課題にも言及し、最適化プロセスのためのより堅牢な方法論の設計を含む今後の研究方向性を提案する。この研究は、Nested SINDyがデータから動的システムの記号表現を効果的に発見できることを証明し、データ駆動手法によって複雑なシステムを理解する新たな機会を提供する。 Symbolic Regression (SR) is a widely studied field of research that aims to infer symbolic expressions from data. A popular approach for SR is the Sparse Identification of Nonlinear Dynamical Systems (SINDy) framework, which uses sparse regression to identify governing equations from data. This study introduces an enhanced method, Nested SINDy, that aims to increase the expressivity of the SINDy approach thanks to a nested structure. Indeed, traditional symbolic regression and system identification methods often fail with complex systems that cannot be easily described analytically. Nested SINDy builds on the SINDy framework by introducing additional layers before and after the core SINDy layer. This allows the method to identify symbolic representations for a wider range of systems, including those with compositions and products of functions. 
We demonstrate the ability of the Nested SINDy approach to accurately find symbolic expressions for simple systems, such as basic trigonometric functions, and sparse (false but accurate) analytical representations for more complex systems. Our results highlight Nested SINDy's potential as a tool for symbolic regression, surpassing the traditional SINDy approach in terms of expressivity. However, we also note the challenges in the optimization process for Nested SINDy and suggest future research directions, including the designing of a more robust methodology for the optimization process. This study proves that Nested SINDy can effectively discover symbolic representations of dynamical systems from data, offering new opportunities for understanding complex systems through data-driven methods.	翻訳日:2024-04-26 19:40:12 公開日:2024-04-24
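For readers unfamiliar with the sparse-regression step that SINDy relies on, the following is a minimal sketch of sequentially thresholded least squares, the optimizer commonly associated with the classic SINDy framework. It does not implement Nested SINDy or its additional layers. The helper names, the candidate library `[1, x, x^2]`, the threshold value, and the toy system dx/dt = -2x are all assumptions chosen for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def least_squares(theta, y, active):
    """Least-squares fit of y on the active library columns (normal equations)."""
    cols = [[row[j] for row in theta] for j in active]
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(a * v for a, v in zip(ci, y)) for ci in cols]
    return solve_linear(gram, rhs)


def stlsq(theta, y, threshold=0.1, iterations=10):
    """Fit, zero out coefficients below `threshold`, refit on the survivors."""
    n_terms = len(theta[0])
    active = list(range(n_terms))
    coef = [0.0] * n_terms
    for _ in range(iterations):
        fitted = least_squares(theta, y, active)
        coef = [0.0] * n_terms
        for j, c in zip(active, fitted):
            coef[j] = c
        kept = [j for j in active if abs(coef[j]) >= threshold]
        if kept == active:
            break
        for j in active:
            if j not in kept:
                coef[j] = 0.0
        active = kept
        if not active:
            break
    return coef


# Toy data for dx/dt = -2 x with candidate library [1, x, x^2].
xs = [0.5, 1.0, 1.5, 2.0]
library = [[1.0, x, x * x] for x in xs]
derivatives = [-2.0 * x for x in xs]
coefficients = stlsq(library, derivatives)
```

On this noise-free toy data only the coefficient of `x` survives the thresholding, which corresponds to the governing equation used to generate the data.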
# SRAGAN: 中国水墨画生成のためのサリエンシ正則化・注目型敵対的生成ネットワーク SRAGAN: Saliency Regularized and Attended Generative Adversarial Network for Chinese Ink-wash Painting Generation ( http://arxiv.org/abs/2404.15743v1 ) ライセンス: Link先を確認	Xiang Gao, Yuqi Zhang,	(参考訳) 本論文は、実際の写真を中国の伝統的な水墨画に変換する問題、すなわち中国水墨画のスタイル変換に対処する。この問題は、幅広い画像間変換モデルによって扱うことができるが、これらすべての手法に共通する注目すべき問題は、水墨画スタイルの要素の転送によって、元の画像内容の詳細が容易に消去または破損しうることである。この問題を解消または改善するために、非ペア画像間変換フレームワークにサリエンシ検出を導入し、生成した絵画のコンテンツ情報を正則化することを提案する。サリエンシマップは、明示的および暗黙的の両面からコンテンツの正則化に利用される。(i) スタイル化前後のサリエンシの一貫性を明示的に正則化するサリエンシIOU(SIOU)損失を提案する。(ii) サリエンシ情報を生成ネットワークに注入して絵画生成を導くことで、生成された絵画のコンテンツの完全性を暗黙的に高めるサリエンシ適応正規化(SANorm)を提案する。また、サリエンシマスクを用いて敵対的注意を画像の顕著な領域に集中させるサリエンシ注目型識別器ネットワークを提案し、画像の顕著なオブジェクトに対してより精細な水墨画風のスタイル化効果を生み出すことに寄与する。定性的かつ定量的な実験は、中国水墨画のスタイル変換に関する関連する先進的な手法よりも、我々のモデルの方が優れていることを一貫して示している。 This paper handles the problem of converting real pictures into traditional Chinese ink-wash paintings, i.e., Chinese ink-wash painting style transfer. Though this problem can be tackled by a wide range of image-to-image translation models, a notable issue with all these methods is that the original image content details can easily be erased or corrupted by the transfer of ink-wash style elements. To solve or ameliorate this issue, we propose to incorporate saliency detection into the unpaired image-to-image translation framework to regularize content information of the generated paintings. 
The saliency map is utilized for content regularization in two ways, explicitly and implicitly: (i) we propose a saliency IOU (SIOU) loss to explicitly regularize saliency consistency before and after stylization; (ii) we propose saliency adaptive normalization (SANorm), which implicitly enhances content integrity of the generated paintings by injecting saliency information into the generator network to guide painting generation. In addition, we propose a saliency attended discriminator network that harnesses the saliency mask to focus generative adversarial attention onto salient image regions, which contributes to a finer ink-wash stylization effect for salient objects in images. Qualitative and quantitative experiments consistently demonstrate the superiority of our model over related advanced methods for Chinese ink-wash painting style transfer.	翻訳日:2024-04-26 19:40:12 公開日:2024-04-24
# グラフベースフェイクニュース検出器に対する一般ブラックボックス攻撃 A General Black-box Adversarial Attack on Graph-based Fake News Detectors ( http://arxiv.org/abs/2404.15744v1 ) ライセンス: Link先を確認	Peican Zhu, Zechen Pan, Yang Liu, Jiwei Tian, Keke Tang, Zhen Wang,	(参考訳) グラフニューラルネットワーク(GNN)をベースとした偽ニュース検出装置は,グラフ構築に様々な手法を適用し,識別のための特徴あるニュース埋め込みを学習することを目的とした。ブラックボックスのシナリオでは、建設の詳細は分かっていないため、特定の隣接行列を必要とする古典的な敵攻撃を実行することは現実的ではない。本稿では,異なるグラフ構造に基づく検出器に対する一般攻撃(GAFSI)を初めて提案する。特に、共有はグラフを構築するためにGNNベースのフェイクニュース検出器にとって重要な社会的相互作用であるので、我々は共有行動をシミュレートして検出器を騙す。まず,ローカルおよびグローバルな情報を活用するユーザを選別するための不正選択モジュールを提案する。さらに、ポストインジェクションモジュールは、選択したユーザに対して、投稿を送信して共有関係を作成するようにガイドする。共有記録はソーシャルコンテキストに追加され、さまざまな検出器に対する一般的な攻撃につながる。実験データを用いた実験の結果,GAFSIの有効性が示された。 Graph Neural Network (GNN)-based fake news detectors apply various methods to construct graphs, aiming to learn distinctive news embeddings for classification. Since the construction details are unknown for attackers in a black-box scenario, it is unrealistic to conduct the classical adversarial attacks that require a specific adjacency matrix. In this paper, we propose the first general black-box adversarial attack framework, i.e., General Attack via Fake Social Interaction (GAFSI), against detectors based on different graph structures. Specifically, as sharing is an important social interaction for GNN-based fake news detectors to construct the graph, we simulate sharing behaviors to fool the detectors. Firstly, we propose a fraudster selection module to select engaged users leveraging local and global information. In addition, a post injection module guides the selected users to create shared relations by sending posts. The sharing records will be added to the social context, leading to a general attack against different detectors. Experimental results on empirical datasets demonstrate the effectiveness of GAFSI.	翻訳日:2024-04-26 19:40:12 公開日:2024-04-24
# メタアナリシスを超えた協調的不均一因果推論 Collaborative Heterogeneous Causal Inference Beyond Meta-analysis ( http://arxiv.org/abs/2404.15746v1 ) ライセンス: Link先を確認	Tianyu Guo, Sai Praneeth Karimireddy, Michael I. Jordan,	(参考訳) 異なるデータセンター間のコラボレーションは、サイト間の異質性によってしばしば問題になる。異質性を考慮するために、最先端の手法は、各部位における共変量分布を再重み付けして、対象個体群の分布に適合させることである。それでも、ある場所が人口全体をカバーできなかったら、この方法は容易に失敗する可能性がある。さらに、分散シフトを調整した後も、従来のメタ分析の概念に依存している。本研究では,不均一データを用いた因果推論のための協調的逆確率スコア重み付け推定器を提案する。分布シフトを個別に調整する代わりに、重み付けされた確率スコアモデルを用いて分布シフトを協調的に調整する。異質性の増加に伴うメタアナリシスに基づく手法に対して,本手法は有意な改善を示した。脆弱な密度推定を考慮し,d<8を用いた非パラメトリック密度推定と,漸近的正規性を保証するフレキシブルな機械学習手法の可能性を示す。プライバシを保ちながら、成果モデルを協調的にトレーニングするフェデレーション学習アルゴリズムを提案する。合成および実データを用いて,本手法の利点を実証する。 Collaboration between different data centers is often challenged by heterogeneity across sites. To account for the heterogeneity, the state-of-the-art method is to re-weight the covariate distributions in each site to match the distribution of the target population. Nevertheless, this method could easily fail when a certain site couldn't cover the entire population. Moreover, it still relies on the concept of traditional meta-analysis after adjusting for the distribution shift. In this work, we propose a collaborative inverse propensity score weighting estimator for causal inference with heterogeneous data. Instead of adjusting the distribution shift separately, we use weighted propensity score models to collaboratively adjust for the distribution shift. Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases. To account for the vulnerable density estimation, we further discuss the double machine method and show the possibility of using nonparametric density estimation with d<8 and a flexible machine learning method to guarantee asymptotic normality. We propose a federated learning algorithm to collaboratively train the outcome model while preserving privacy. Using synthetic and real datasets, we demonstrate the advantages of our method.	
翻訳日:2024-04-26 19:40:12 公開日:2024-04-24
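As background for the estimator discussed in this abstract, here is a minimal sketch of the standard inverse propensity score weighting (IPW) estimate of the average treatment effect on pooled samples. It is not the collaborative estimator proposed in the paper, whose details are not given in the abstract. The function name and the toy numbers are hypothetical, and the propensity scores are assumed to be known rather than fitted.

```python
def ipw_ate(treatment, outcome, propensity):
    """Standard IPW estimate of the average treatment effect.

    treatment:  0/1 indicators T_i
    outcome:    observed outcomes Y_i
    propensity: e(X_i) = P(T_i = 1 | X_i), assumed known here
    """
    if not (len(treatment) == len(outcome) == len(propensity)):
        raise ValueError("inputs must have the same length")
    total = 0.0
    for t, y, e in zip(treatment, outcome, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        # Treated units are weighted by 1/e, control units by 1/(1 - e).
        total += t * y / e - (1 - t) * y / (1 - e)
    return total / len(outcome)


# Toy pooled sample from two hypothetical sites with balanced assignment.
treatments = [1, 0, 1, 0]
outcomes = [3.0, 1.0, 5.0, 1.0]
propensities = [0.5, 0.5, 0.5, 0.5]
estimate = ipw_ate(treatments, outcomes, propensities)
```

When a site covers only part of the population, some propensities approach 0 or 1 and the weights blow up, which is the failure mode the abstract refers to.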
# Guided-SPSA:パラメータシフト則による同時摂動確率近似 Guided-SPSA: Simultaneous Perturbation Stochastic Approximation assisted by the Parameter Shift Rule ( http://arxiv.org/abs/2404.15751v1 ) ライセンス: Link先を確認	Maniraman Periyasamy, Axel Plinge, Christopher Mutschler, Daniel D. Scherer, Wolfgang Mauerer,	(参考訳) 変分量子アルゴリズム(VQC)の研究は近年,量子コンピューティングコミュニティから大きな注目を集めている。これらのハイブリッドアルゴリズムは古典的成分と量子的成分の両方を利用しており、ノイズの多い中間スケールの量子デバイスに適している。パラメータシフト則を用いて正確な勾配を推定してVQCを最適化することは、NISQデバイスでは実現可能だが、より大きな問題サイズではうまくスケールできない。計算複雑性は、パラメータシフト則による勾配推定に必要な回路評価数の観点から、VQCのパラメータ数と線形にスケールする。一方、同時摂動確率近似(SPSA)のようなVQCsの勾配を近似する手法は、パラメータの数に応じてスケールしないが不安定と闘い、しばしば準最適解を得る。本研究では,パラメータシフト則とSPSAに基づく勾配近似を有意に組み合わせた,ガイド-SPSAと呼ばれる新しい勾配推定手法を提案する。 Guided-SPSAは、パラメータシフト則と同等またはより良い解を求めるトレーニング中に必要となる回路評価回数を15%から25%削減する。 Guided-SPSAはすべてのシナリオで標準SPSAより優れており、パラメータの最適下初期化のようなシナリオではパラメータシフトルールより優れている。本稿では、回帰、分類、強化学習などの量子機械学習の様々なパラダイムにおけるガイド-SPSAの性能を数値的に示す。 The study of variational quantum algorithms (VQCs) has received significant attention from the quantum computing community in recent years. These hybrid algorithms, utilizing both classical and quantum components, are well-suited for noisy intermediate-scale quantum devices. Though estimating exact gradients using the parameter-shift rule to optimize the VQCs is realizable in NISQ devices, they do not scale well for larger problem sizes. The computational complexity, in terms of the number of circuit evaluations required for gradient estimation by the parameter-shift rule, scales linearly with the number of parameters in VQCs. On the other hand, techniques that approximate the gradients of the VQCs, such as the simultaneous perturbation stochastic approximation (SPSA), do not scale with the number of parameters but struggle with instability and often attain suboptimal solutions. In this work, we introduce a novel gradient estimation approach called Guided-SPSA, which meaningfully combines the parameter-shift rule and SPSA-based gradient approximation. 
The Guided-SPSA results in a 15% to 25% reduction in the number of circuit evaluations required during training for a similar or better optimality of the solution found compared to the parameter-shift rule. The Guided-SPSA outperforms standard SPSA in all scenarios and outperforms the parameter-shift rule in scenarios such as suboptimal initialization of the parameters. We demonstrate numerically the performance of Guided-SPSA on different paradigms of quantum machine learning, such as regression, classification, and reinforcement learning.	翻訳日:2024-04-26 19:40:12 公開日:2024-04-24
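The scaling argument in this abstract can be made concrete with a small sketch of the two building blocks it names. This is not the Guided-SPSA algorithm itself, since the abstract does not specify how the two estimators are combined. The toy cost `cos(θ0)·cos(θ1)` stands in for a circuit expectation value in which each parameter enters through a single Pauli rotation (an assumption), and the class and function names are hypothetical.

```python
import math
import random


class CountingCost:
    """Wraps a cost function and counts how many times it is evaluated."""

    def __init__(self, f):
        self.f = f
        self.calls = 0

    def __call__(self, theta):
        self.calls += 1
        return self.f(theta)


def toy_cost(theta):
    """Stand-in for a circuit expectation value (assumption)."""
    return math.cos(theta[0]) * math.cos(theta[1])


def parameter_shift_grad(f, theta):
    """Gradient via shifts of +/- pi/2: two evaluations per parameter."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += math.pi / 2
        minus[i] -= math.pi / 2
        grad.append((f(plus) - f(minus)) / 2)
    return grad


def spsa_grad(f, theta, c=0.1, rng=random):
    """SPSA estimate: two evaluations in total, whatever the parameter count."""
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    diff = (f(plus) - f(minus)) / (2 * c)
    return [diff / d for d in delta]


theta = [0.3, 1.1]
exact_cost = CountingCost(toy_cost)
exact_grad = parameter_shift_grad(exact_cost, theta)   # 4 evaluations
approx_cost = CountingCost(toy_cost)
approx_grad = spsa_grad(approx_cost, theta, rng=random.Random(0))  # 2 evaluations
```

The parameter-shift gradient is exact for this cost but needs `2 * len(theta)` evaluations, whereas the SPSA estimate is noisy but always uses two. That trade-off is what a combined scheme would have to balance.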
# GC-IMSデータを用いた感染検出のための機械学習アルゴリズムの探索 Exploring Machine Learning Algorithms for Infection Detection Using GC-IMS Data: A Preliminary Study ( http://arxiv.org/abs/2404.15757v1 ) ライセンス: Link先を確認	Christos Sardianos, Chrysostomos Symvoulidis, Matthias Schlögl, Iraklis Varlamis, Georgios Th. Papadopoulos,	(参考訳) 感染症の診断における高度な診断技術の発達は、現代医療において重要な領域となっている。ガスクロマトグラフィー・イオンモビリティ・スペクトロメトリ(GC-IMS)データを活用し,機械学習アルゴリズムを1つのプラットフォームに組み込むことで,正確な感染識別の課題に対処することを目的とした。これらの困難に触発されて、当社の目標は、強力なデータ分析プロセスの作成、機械学習(ML)モデルの強化、臨床応用の徹底的な検証である。本研究は,ガスクロマトグラフィー・イオンモビリティ・スペクトロメトリ(GC-IMS)データと機械学習アルゴリズムを統合実験室情報管理システム(LIMS)プラットフォームに組み込むことにより,先進的な診断技術の分野に寄与する。プリミティブトライアルでは、さまざまなMLアルゴリズムを使用して感染したサンプルと非感染したサンプルを区別する際の精度の向上が示されている。現在、継続する取り組みは、モデルの有効性を高め、その機能を明らかにするための技術を調査し、病気の早期発見を支援するために様々な種類のデータを統合することに重点を置いている。 The developing field of enhanced diagnostic techniques in the diagnosis of infectious diseases, constitutes a crucial domain in modern healthcare. By utilizing Gas Chromatography-Ion Mobility Spectrometry (GC-IMS) data and incorporating machine learning algorithms into one platform, our research aims to tackle the ongoing issue of precise infection identification. Inspired by these difficulties, our goals consist of creating a strong data analytics process, enhancing machine learning (ML) models, and performing thorough validation for clinical applications. Our research contributes to the emerging field of advanced diagnostic technologies by integrating Gas Chromatography-Ion Mobility Spectrometry (GC-IMS) data and machine learning algorithms within a unified Laboratory Information Management System (LIMS) platform. Preliminary trials demonstrate encouraging levels of accuracy when employing various ML algorithms to differentiate between infected and non-infected samples. 
Continuing endeavors are currently concentrated on enhancing the effectiveness of the model, investigating techniques to clarify its functioning, and incorporating many types of data to further support the early detection of diseases.	翻訳日:2024-04-26 19:40:12 公開日:2024-04-24
# Dot by Dot:変換言語モデルに隠れた計算 Let's Think Dot by Dot: Hidden Computation in Transformer Language Models ( http://arxiv.org/abs/2404.15758v1 ) ライセンス: Link先を確認	Jacob Pfau, William Merrill, Samuel R. Bowman,	(参考訳) 言語モデルの連鎖応答は、ほとんどのベンチマークのパフォーマンスを改善する。しかしながら、これらのパフォーマンス向上が、人間のようなタスクの分解や、追加トークンが許容するより大きい計算にどの程度貢献できるかは、まだ不明である。中間トークンを使わずに応答できない2つの難解なアルゴリズムタスクを解くという考え方の連鎖の代わりに,トランスフォーマーは無意味なフィラートークン(eg, '...')を使用できることを示す。しかし, フィラートークンの学習は困難であり, 集束するためには, 具体的, 密集的な監督が必要であることが実証的に判明した。また、フィラートークンが一階公式の量化器深さの点で有用であるような問題のクラスを理論的に特徴づける。この特徴を満たすために、連鎖トークンはマルチトークン計算に関わる中間計算ステップに関する情報を提供する必要はない。以上の結果から,トークン選択とは無関係に,追加のトークンが計算上のメリットをもたらすことが示唆された。中間トークンがフィラートークンとして機能するという事実は、観測されたチェーンオブソートトークンから次第に分離される、不明瞭で隠れた計算に関わる大きな言語モデルに対する懸念を提起する。 Chain-of-thought responses from language models improve performance across most benchmarks. However, it remains unclear to what extent these performance gains can be attributed to human-like task decomposition or simply the greater computation that additional tokens allow. We show that transformers can use meaningless filler tokens (e.g., '......') in place of a chain of thought to solve two hard algorithmic tasks they could not solve when responding without intermediate tokens. However, we find empirically that learning to use filler tokens is difficult and requires specific, dense supervision to converge. We also provide a theoretical characterization of the class of problems where filler tokens are useful in terms of the quantifier depth of a first-order formula. For problems satisfying this characterization, chain-of-thought tokens need not provide information about the intermediate computational steps involved in multi-token computations. In summary, our results show that additional tokens can provide computational benefits independent of token choice. 
The fact that intermediate tokens can act as filler tokens raises concerns about large language models engaging in unauditable, hidden computations that are increasingly detached from the observed chain-of-thought tokens.	翻訳日:2024-04-26 19:40:12 公開日:2024-04-24
# 逆実例によるデバイアスマシンの非学習 Debiasing Machine Unlearning with Counterfactual Examples ( http://arxiv.org/abs/2404.15760v1 ) ライセンス: Link先を確認	Ziheng Chen, Jia Wang, Jun Zhuang, Abbavaram Gowtham Reddy, Fabrizio Silvestri, Jin Huang, Kaushiki Nag, Kun Kuang, Xin Ning, Gabriele Tolomei,	(参考訳) 忘れられる権利(RTBF)は、機械学習技術を実装することによって、過去の行動の持続的な影響から個人を保護しようとするものである。これらの技術は、広範囲なモデルの再訓練を必要とせずに、以前取得した知識の削除を促進する。しかし、彼らはしばしば重要な問題を見落としている。このバイアスは,(1)不均一なデータ除去を特徴とするデータレベルのバイアス,(2)残りのデータセットを汚染し,モデル精度を低下させるアルゴリズムレベルのバイアスの2つから生じる。本研究では、未学習プロセスの背後にある因果要因を分析し、データレベルとアルゴリズムレベルでバイアスを軽減する。通常、我々は介入に基づくアプローチを導入し、脱バイアスデータセットで忘れるべき知識を消去する。さらに,他のデータセットのパフォーマンスを損なうことなくセマンティックデータの一貫性を維持するため,逆実例を活用することで,忘れる手順を導出する。実験の結果,提案手法は,評価指標に基づく既存の機械学習ベースラインよりも優れていた。 The right to be forgotten (RTBF) seeks to safeguard individuals from the enduring effects of their historical actions by implementing machine-learning techniques. These techniques facilitate the deletion of previously acquired knowledge without requiring extensive model retraining. However, they often overlook a critical issue: unlearning processes bias. This bias emerges from two main sources: (1) data-level bias, characterized by uneven data removal, and (2) algorithm-level bias, which leads to the contamination of the remaining dataset, thereby degrading model accuracy. In this work, we analyze the causal factors behind the unlearning process and mitigate biases at both data and algorithmic levels. Typically, we introduce an intervention-based approach, where knowledge to forget is erased with a debiased dataset. Besides, we guide the forgetting procedure by leveraging counterfactual examples, as they maintain semantic data consistency without hurting performance on the remaining dataset. Experimental results demonstrate that our method outperforms existing machine unlearning baselines on evaluation metrics.	翻訳日:2024-04-26 19:40:12 公開日:2024-04-24
# シリコン中のドナーバウンド電子スピンの高強度スピン軌道結合強度の測定 Measurement of enhanced spin-orbit coupling strength for donor-bound electron spins in silicon ( http://arxiv.org/abs/2404.15762v1 ) ライセンス: Link先を確認	Radha Krishnan, Beng Yee Gan, Yu-Ling Hsueh, A. M. Saffat-Ee Huq, Jonathan Kenny, Rajib Rahman, Teck Seng Koh, Michelle Y. Simmons, Bent Weber,	(参考訳) 伝統的に量子ドットスピン量子ビットにおける有害な効果と考えられてきたが、オンチップ交流電場による高速コヒーレント制御を可能にするため、スピン軌道相互作用は近年再検討されている。バルクシリコン中の電子の場合、SOCは本質的に弱いが、表面や界面、あるいは原子配置によって増強することができる。ここでは、スピン軌道結合の強さは、単一ドナーと比較して多重ドナー量子ドットの多体波動関数において2桁以上局所的に増強できることを示す。これは、これまで正孔や特定の対称性を持つ2ドナー系でのみ報告されていた強度に達する。電気双極子スピン共鳴(EDSR)を用いたシリコン中のドナー結合スピンの全電気的制御の経路を提供する可能性がある。 While traditionally considered a deleterious effect in quantum dot spin qubits, the spin-orbit interaction has recently been revisited as it allows for rapid coherent control by on-chip AC electric fields. For electrons in bulk silicon, spin-orbit coupling (SOC) is intrinsically weak; however, it can be enhanced at surfaces and interfaces, or through atomic placement. Here we show that the strength of the spin-orbit coupling can be locally enhanced by more than two orders of magnitude in the many-body wave functions of multi-donor quantum dots compared to a single donor, reaching strengths so far only reported for holes or two-donor systems with certain symmetry. Our findings may provide a pathway towards all-electrical control of donor-bound spins in silicon using electric dipole spin resonance (EDSR).	翻訳日:2024-04-26 19:40:12 公開日:2024-04-24
# 非デジタルレジストレーションによる3次元顔モフィング攻撃生成 3D Face Morphing Attack Generation using Non-Rigid Registration ( http://arxiv.org/abs/2404.15765v1 ) ライセンス: Link先を確認	Jag Mohan Singh, Raghavendra Ramachandra,	(参考訳) 顔認識システム(FRS)は、現実の環境での精度の高さから、電子商取引や電子バンキングなどの商業環境で広く使われている。しかし、これらのシステムは、異なる被験者の顔色画像が混ざり合った顔形態形成攻撃に弱い。そこで本研究では、2つのボナファイド点雲から3次元顔形態を生成する新しい方法を提案する。提案手法はまず中性表現を用いたボナファイド点雲を選択する。 2つの入力点雲を最適化せずにベイジアンコヒーレントポイントドリフト (BCPD) を用いて登録し, 登録点雲の形状と色を平均化し, 顔変形点雲を生成する。提案手法は,200人のボナファイド被験者から388個の顔変形点雲を生成する。この手法の有効性は、G-MAPが81.61%の既存のSOTAよりも優れている97.93%の一般モルフィング攻撃可能性(G-MAP)を達成し、広範囲にわたる脆弱性実験によって実証された。 Face Recognition Systems (FRS) are widely used in commercial environments, such as e-commerce and e-banking, owing to their high accuracy in real-world conditions. However, these systems are vulnerable to facial morphing attacks, which are generated by blending face color images of different subjects. This paper presents a new method for generating 3D face morphs from two bona fide point clouds. The proposed method first selects bona fide point clouds with neutral expressions. The two input point clouds were then registered using a Bayesian Coherent Point Drift (BCPD) without optimization, and the geometry and color of the registered point clouds were averaged to generate a face morphing point cloud. The proposed method generates 388 face-morphing point clouds from 200 bona fide subjects. The effectiveness of the method was demonstrated through extensive vulnerability experiments, achieving a Generalized Morphing Attack Potential (G-MAP) of 97.93%, which is superior to the existing state-of-the-art (SOTA) with a G-MAP of 81.61%.	翻訳日:2024-04-26 19:40:12 公開日:2024-04-24
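The averaging step described in this abstract can be sketched as follows. The sketch assumes that registration (BCPD in the paper, not shown here) has already put the two point clouds into one-to-one correspondence, so that point `i` in one cloud matches point `i` in the other. The function name, the `(x, y, z, r, g, b)` tuple layout, and the blending weight `alpha` are assumptions for illustration.

```python
def morph_point_clouds(cloud_a, cloud_b, alpha=0.5):
    """Blend two registered point clouds point by point.

    Each point is a tuple (x, y, z, r, g, b). With alpha = 0.5 this is the
    plain average of geometry and color.
    """
    if len(cloud_a) != len(cloud_b):
        raise ValueError("registered clouds must have the same number of points")
    morphed = []
    for point_a, point_b in zip(cloud_a, cloud_b):
        morphed.append(
            tuple(alpha * a + (1 - alpha) * b for a, b in zip(point_a, point_b))
        )
    return morphed


# Two single-point "clouds" with already-matched points.
subject_a = [(0.0, 0.0, 0.0, 255.0, 0.0, 0.0)]
subject_b = [(2.0, 4.0, 6.0, 0.0, 0.0, 255.0)]
morph = morph_point_clouds(subject_a, subject_b)
```

In practice the quality of the morph depends almost entirely on the registration, which this sketch takes as given.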
# 確率微分方程式によるベイズ流の統一と拡散モデル Unifying Bayesian Flow Networks and Diffusion Models through Stochastic Differential Equations ( http://arxiv.org/abs/2404.15766v1 ) ライセンス: Link先を確認	Kaiwen Xue, Yuhao Zhou, Shen Nie, Xu Min, Xiaolu Zhang, Jun Zhou, Chongxuan Li,	(参考訳) ベイズ流ネットワーク (BFN) は, 拡散モデル (DM) のサンプルではなく, ベイズ推定による様々なノイズレベルの分布のパラメータを反復的に改良する。微分可能な性質のため、BFNは連続データと離散データの両方をモデリングし、同時に高速サンプリング機能を維持することを約束している。本稿では,確率微分方程式(SDE)を用いて,BFNをDMに接続することで,BFNの理解と拡張を図る。我々は,BFNの雑音付加過程に対応する線形SDEを同定し,BFNの回帰損失がデノイジングスコアマッチングと一致していることを示し,各逆時間SDEの1次解法としてBFNのサンプラーを検証した。これらの知見と既存のDMにおける高速サンプリングのレシピに基づいて、画像とテキストの両方で機能評価(例、10)が限定されたサンプル品質の観点から、元のBFNサンプリングを著しく上回るBFNの特殊解法を提案する。特に,本研究では,5～20倍の速度を無償で達成している。私たちのコードは https://github.com/ML-GSAI/BFN-Solver から入手可能です。 Bayesian flow networks (BFNs) iteratively refine the parameters, instead of the samples in diffusion models (DMs), of distributions at various noise levels through Bayesian inference. Owing to their differentiable nature, BFNs are promising in modeling both continuous and discrete data, while simultaneously maintaining fast sampling capabilities. This paper aims to understand and enhance BFNs by connecting them with DMs through stochastic differential equations (SDEs). We identify the linear SDEs corresponding to the noise-addition processes in BFNs, demonstrate that BFN's regression losses are aligned with denoising score matching, and validate the sampler in BFN as a first-order solver for the respective reverse-time SDE. Based on these findings and existing recipes of fast sampling in DMs, we propose specialized solvers for BFNs that markedly surpass the original BFN sampler in terms of sample quality with a limited number of function evaluations (e.g., 10) on both image and text datasets. Notably, our best sampler achieves an increase in speed of 5~20 times for free. Our code is available at https://github.com/ML-GSAI/BFN-Solver .	翻訳日:2024-04-26 19:40:12 公開日:2024-04-24
# ChEX:胸部X線におけるインタラクティブな局在と領域記述 ChEX: Interactive Localization and Region Description in Chest X-rays ( http://arxiv.org/abs/2404.15770v1 ) ライセンス: Link先を確認	Philip Müller, Georgios Kaissis, Daniel Rueckert,	(参考訳) レポート生成モデルは、胸部X線のような医療画像のきめ細かいテキスト解釈を提供するが、対話性(すなわち、ユーザクエリを通じて生成プロセスを操る能力)と局所的解釈可能性(すなわち、その予測を視覚的に根拠づけること)が欠如していることが多い。これらの問題に対処する努力はあったが、テキストクエリをサポートしない、あるいはローカライズされた解釈性を提供しないなど、対話性に制限がある。そこで本研究では,解剖学的領域や病理などの多様な側面を対象としたテキストプロンプトとバウンディングボックスを統合した,新しいマルチタスクアーキテクチャとトレーニングパラダイムを提案する。このアプローチをChest X-Ray Explainer (ChEX)と呼ぶ。画像のローカライズされた解釈やレポート生成を含む9つの胸部X線タスクの不均一なセットに対する評価は、SOTAモデルとの競合性を示し、さらなる分析はChEXのインタラクティブ機能を示している。 Report generation models offer fine-grained textual interpretations of medical images like chest X-rays, yet they often lack interactivity (i.e. the ability to steer the generation process through user queries) and localized interpretability (i.e. visually grounding their predictions), which we deem essential for future adoption in clinical practice. While there have been efforts to tackle these issues, they are either limited in their interactivity by not supporting textual queries or fail to also offer localized interpretability. Therefore, we propose a novel multitask architecture and training paradigm integrating textual prompts and bounding boxes for diverse aspects like anatomical regions and pathologies. We call this approach the Chest X-Ray Explainer (ChEX). Evaluations across a heterogeneous set of 9 chest X-ray tasks, including localized image interpretation and report generation, showcase its competitiveness with SOTA models while additional analysis demonstrates ChEX's interactive capabilities.	翻訳日:2024-04-26 19:40:12 公開日:2024-04-24
# DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines ( http://arxiv.org/abs/2404.15771v1 ) License: see link	Xin Jiang, Hao Tang, Rui Yan, Jinhui Tang, Zechao Li	Fine-grained image retrieval (FGIR) aims to learn visual representations that distinguish visually similar objects while maintaining generalization. Existing methods propose to generate discriminative features, but rarely consider the particularity of the FGIR task itself. This paper presents a meticulous analysis leading to the proposal of practical guidelines to identify subcategory-specific discrepancies and generate discriminative features to design effective FGIR models. These guidelines include emphasizing the object (G1), highlighting subcategory-specific discrepancies (G2), and employing an effective training strategy (G3). Following G1 and G2, we design a novel Dual Visual Filtering mechanism for the plain visual transformer, denoted as DVF, to capture subcategory-specific discrepancies. Specifically, the dual visual filtering mechanism comprises an object-oriented module and a semantic-oriented module. These components serve to magnify objects and identify discriminative regions, respectively. Following G3, we implement a discriminative model training strategy to improve the discriminability and generalization ability of DVF. Extensive analysis and ablation studies confirm the efficacy of our proposed guidelines. Without bells and whistles, the proposed DVF achieves state-of-the-art performance on three widely used fine-grained datasets in closed-set and open-set settings.	Translated: 2024-04-26 19:40:12 Published: 2024-04-24
# Bi-Mamba4TS: Bidirectional Mamba for Time Series Forecasting ( http://arxiv.org/abs/2404.15772v1 ) License: see link	Aobo Liang, Xingguo Jiang, Yan Sun, Chang Lu	Long-term time series forecasting (LTSF) provides longer insights into future trends and patterns. In recent years, deep learning models, especially Transformers, have achieved advanced performance in LTSF tasks. However, the quadratic complexity of Transformers raises the challenge of balancing computational efficiency and predictive performance. Recently, a new state space model (SSM) named Mamba has been proposed. With its selective capability on input data and its hardware-aware parallel computing algorithm, Mamba can capture long-term dependencies well while maintaining linear computational complexity. Mamba has shown great ability for long sequence modeling and is a potential competitor to Transformer-based models in LTSF. In this paper, we propose Bi-Mamba4TS, a bidirectional Mamba for time series forecasting. To address the sparsity of time series semantics, we adopt the patching technique to enrich the local information while capturing the evolutionary patterns of time series at a finer granularity. To select a more appropriate modeling method based on the characteristics of the dataset, our model unifies the channel-independent and channel-mixing tokenization strategies and uses a series-relation-aware decider to control the strategy selection process. Extensive experiments on seven real-world datasets show that our model achieves more accurate predictions compared with state-of-the-art methods.	Translated: 2024-04-26 19:30:27 Published: 2024-04-24
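The Bi-Mamba4TS abstract mentions a patching technique without giving details. As a minimal sketch of what patching a univariate series usually means (sliding windows of fixed length and stride), the following may help; the function name `make_patches` and the parameters `patch_len` and `stride` are assumptions for illustration and are not taken from the paper.

```python
def make_patches(series, patch_len, stride):
    """Split a 1-D sequence into fixed-length, possibly overlapping patches.

    Hypothetical helper: trailing values that do not fill a whole
    patch are dropped rather than padded.
    """
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    last_start = len(series) - patch_len
    return [series[i:i + patch_len] for i in range(0, last_start + 1, stride)]


series = list(range(10))
patches = make_patches(series, patch_len=4, stride=2)
for patch in patches:
    print(patch)
```

Each patch then plays the role of one token, so a sequence model sees fewer, locally richer tokens than it would with one token per time step.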
# Toward Physics-Aware Deep Learning Architectures for LiDAR Intensity Simulation ( http://arxiv.org/abs/2404.15774v1 ) License: see link	Vivek Anand, Bharat Lohani, Gaurav Pandey, Rakesh Mishra	Autonomous vehicles (AVs) heavily rely on LiDAR perception for environment understanding and navigation. LiDAR intensity provides valuable information about the reflected laser signals and plays a crucial role in enhancing the perception capabilities of AVs. However, accurately simulating LiDAR intensity remains a challenge due to the unavailability of material properties of the objects in the environment, and complex interactions between the laser beam and the environment. The proposed method aims to improve the accuracy of intensity simulation by incorporating physics-based modalities within the deep learning framework. One of the key entities that captures the interaction between the laser beam and the objects is the angle of incidence. In this work, we demonstrate that the addition of the LiDAR incidence angle as a separate input to the deep neural networks significantly enhances the results. We present a comparative study between two prominent deep learning architectures: U-NET, a Convolutional Neural Network (CNN), and Pix2Pix, a Generative Adversarial Network (GAN). We implemented these two architectures for the intensity prediction task and used the SemanticKITTI and VoxelScape datasets for experiments. The comparative analysis reveals that both architectures benefit from the incidence angle as an additional input. Moreover, the Pix2Pix architecture outperforms U-NET, especially when the incidence angle is incorporated.	Translated: 2024-04-26 19:30:27 Published: 2024-04-24
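The LiDAR abstract relies on the angle of incidence as an extra network input. The abstract does not say how the authors compute it; a minimal sketch under the usual geometric definition (the angle between the laser ray and the surface normal at the hit point) could look like this. The function name and the use of plain tuples for vectors are assumptions.

```python
import math


def incidence_angle(ray_dir, normal):
    """Angle in radians between a laser ray and a surface normal.

    The absolute value of the dot product is used so the result does not
    depend on whether the normal points toward or away from the sensor.
    """
    dot = sum(r * n for r, n in zip(ray_dir, normal))
    norm_r = math.sqrt(sum(r * r for r in ray_dir))
    norm_n = math.sqrt(sum(n * n for n in normal))
    cos_theta = min(1.0, abs(dot) / (norm_r * norm_n))  # clamp rounding error
    return math.acos(cos_theta)


head_on = incidence_angle((0.0, 0.0, -1.0), (0.0, 0.0, 1.0))
oblique = incidence_angle((1.0, 0.0, -1.0), (0.0, 0.0, 1.0))
print(math.degrees(head_on), math.degrees(oblique))
```

In a pipeline like the one described, a per-point angle of this kind would be rasterized into an extra image channel alongside the other inputs; that step is not shown here.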
# A Comprehensive Survey on Evaluating Large Language Model Applications in the Medical Industry ( http://arxiv.org/abs/2404.15777v1 ) License: see link	Yining Huang, Keke Tang, Meilian Chen	Since the inception of the Transformer architecture in 2017, Large Language Models (LLMs) such as GPT and BERT have evolved significantly, impacting various industries with their advanced capabilities in language understanding and generation. These models have shown potential to transform the medical field, highlighting the necessity for specialized evaluation frameworks to ensure their effective and ethical deployment. This comprehensive survey delineates the extensive application and requisite evaluation of LLMs within healthcare, emphasizing the critical need for empirical validation to fully exploit their capabilities in enhancing healthcare outcomes. Our survey is structured to provide an in-depth analysis of LLM applications across clinical settings, medical text data processing, research, education, and public health awareness. We begin by exploring the roles of LLMs in different medical applications, detailing how they are evaluated based on their performance in tasks such as clinical application, medical text data processing, information retrieval, data analysis, medical scientific writing, and educational content generation. The subsequent sections delve into the methodologies employed in these evaluations, discussing the benchmarks and metrics used to assess the models' effectiveness, accuracy, and ethical alignment. Through this survey, we aim to equip healthcare professionals, researchers, and policymakers with a comprehensive understanding of the potential strengths and limitations of LLMs in medical applications. By providing detailed insights into the evaluation processes and the challenges faced in integrating LLMs into healthcare, this survey seeks to guide the responsible development and deployment of these powerful models, ensuring they are harnessed to their full potential while maintaining stringent ethical standards.	Translated: 2024-04-26 19:30:27 Published: 2024-04-24
# BASS: Batched Attention-optimized Speculative Sampling ( http://arxiv.org/abs/2404.15778v1 ) License: see link	Haifeng Qian, Sujan Kumar Gonugondla, Sungsoo Ha, Mingyue Shang, Sanjay Krishna Gouda, Ramesh Nallapati, Sudipta Sengupta, Xiaofei Ma, Anoop Deoras	Speculative decoding has emerged as a powerful method to improve latency and throughput in hosting large language models. However, most existing implementations focus on generating a single sequence. Real-world generative AI applications often require multiple responses, and performing speculative decoding in a batched setting while preserving its latency benefits poses non-trivial challenges. This paper describes a system of batched speculative decoding that sets a new state of the art in multi-sequence generation latency and that demonstrates superior GPU utilization as well as quality of generations within a time budget. For example, for a 7.8B-size model on a single A100 GPU and with a batch size of 8, each sequence is generated at an average speed of 5.8ms per token, the overall throughput being 1.1K tokens per second. These results represent state-of-the-art latency and a 2.15X speed-up over optimized regular decoding. Within a time budget that regular decoding does not finish, our system is able to generate sequences with HumanEval Pass@First of 43% and Pass@All of 61%, far exceeding what's feasible with single-sequence speculative decoding. Our peak GPU utilization during decoding reaches as high as 15.8%, more than 3X the highest utilization of regular decoding and around 10X that of single-sequence speculative decoding.	Translated: 2024-04-26 19:30:27 Published: 2024-04-24
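The BASS abstract builds on speculative decoding without restating its core step. As a minimal sketch, the widely used accept-or-resample rule for a single drafted token can be written as below. This illustrates generic speculative sampling only; it is not the batched, attention-optimized system described in the abstract, and the function name and list-based probability vectors are assumptions.

```python
import random


def accept_or_resample(token, p_target, q_draft, rng=random):
    """Verify one drafted token against the target model's distribution.

    p_target and q_draft are probability lists over the same vocabulary.
    Returns (token_id, accepted). The drafted token is kept with
    probability min(1, p/q); otherwise a replacement is drawn from the
    normalized residual max(0, p - q).
    """
    p, q = p_target[token], q_draft[token]
    if q > 0 and rng.random() < min(1.0, p / q):
        return token, True
    residual = [max(0.0, pt - qd) for pt, qd in zip(p_target, q_draft)]
    if sum(residual) == 0.0:
        residual = list(p_target)  # identical distributions: fall back to target
    new_token = rng.choices(range(len(residual)), weights=residual)[0]
    return new_token, False


same = accept_or_resample(1, [0.5, 0.5], [0.5, 0.5])
rejected = accept_or_resample(1, [1.0, 0.0], [0.0, 1.0])
print(same, rejected)
```

A draft that matches the target distribution is always accepted, which is why a well-aligned small draft model can save many calls to the large model.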
# Real-Time Compressed Sensing for Joint Hyperspectral Image Transmission and Restoration for CubeSat ( http://arxiv.org/abs/2404.15781v1 ) License: see link	Chih-Chung Hsu, Chih-Yu Jian, Eng-Shen Tu, Chia-Ming Lee, Guan-Lin Chen	This paper addresses the challenges associated with hyperspectral image (HSI) reconstruction from miniaturized satellites, which often suffer from stripe effects and are computationally resource-limited. We propose a Real-Time Compressed Sensing (RTCS) network designed to be lightweight and require only relatively few training samples for efficient and robust HSI reconstruction in the presence of the stripe effect and under noisy transmission conditions. The RTCS network features a simplified architecture that reduces the required training samples and allows for easy implementation on integer-8-based encoders, facilitating rapid compressed sensing for stripe-like HSI, which exactly matches the moderate design of miniaturized satellites that use a push-broom scanning mechanism. This contrasts with optimization-based models that demand high-precision floating-point operations, making them difficult to deploy on edge devices. Our encoder employs an integer-8-compatible linear projection for stripe-like HSI data transmission, ensuring real-time compressed sensing. Furthermore, based on the novel two-streamed architecture, an efficient HSI restoration decoder is proposed for the receiver side, allowing for edge-device reconstruction without needing a sophisticated central server. This is particularly crucial as an increasing number of miniaturized satellites necessitates significant computing resources on the ground station. Extensive experiments validate the superior performance of our approach, offering new and vital capabilities for existing miniaturized satellite systems.	Translated: 2024-04-26 19:30:27 Published: 2024-04-24
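The RTCS abstract describes an integer-8-compatible linear projection as the on-board encoder. The abstract gives no matrix sizes or quantization scheme, so the following is only a toy sketch of the general idea: compressing one stripe of samples with an int8 measurement matrix and integer accumulation. All names and values are hypothetical.

```python
INT8_MIN, INT8_MAX = -128, 127


def int8_project(stripe, phi):
    """Compute y = phi @ stripe using only integer arithmetic.

    stripe: list of int8 sample values for one stripe.
    phi:    measurement matrix as a list of int8 rows, with fewer rows
            than len(stripe) so that the output is compressed.
    Accumulation uses Python ints; on real hardware this would be a
    wider accumulator such as int32.
    """
    for value in stripe + [w for row in phi for w in row]:
        if not INT8_MIN <= value <= INT8_MAX:
            raise ValueError("all inputs must fit in int8")
    return [sum(w * x for w, x in zip(row, stripe)) for row in phi]


stripe = [10, 20, 30, 40]
phi = [[1, -1, 0, 2],
       [0, 3, 1, -2]]
measurements = int8_project(stripe, phi)
print(measurements)
```

The point of such a design is that the encoder needs no floating-point unit; reconstruction from the measurements is left to the receiver-side decoder.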
# An Empirical Study of Aegis ( http://arxiv.org/abs/2404.15784v1 ) License: see link	Daniel Saragih, Paridhi Goel, Tejas Balaji, Alyssa Li	Bit flipping attacks are one class of attacks on neural networks, with numerous defense mechanisms invented to mitigate their potency. Due to the importance of ensuring the robustness of these defense mechanisms, we perform an empirical study on the Aegis framework. We evaluate the baseline mechanisms of Aegis on low-entropy data (MNIST), and we evaluate a pre-trained model with the mechanisms fine-tuned on MNIST. We also compare the use of data augmentation to the robustness training of Aegis, and examine how Aegis performs under other adversarial attacks, such as the generation of adversarial examples. We find that both the dynamic-exit strategy and the robustness training of Aegis have some drawbacks. In particular, we see drops in accuracy when testing on perturbed data and on adversarial examples, as compared to baselines. Moreover, we found that the dynamic-exit strategy loses its uniformity when tested on simpler datasets. The code for this project is available on GitHub.	Translated: 2024-04-26 19:30:27 Published: 2024-04-24
# Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer ( http://arxiv.org/abs/2404.15785v1 ) License: see link	Jiaming Lei, Lin Li, Chunping Wang, Jun Xiao, Long Chen	Benefiting from strong generalization ability, pre-trained vision language models (VLMs), e.g., CLIP, have been widely utilized in zero-shot scene understanding. Unlike simple recognition tasks, grounded situation recognition (GSR) requires the model not only to classify the salient activity (verb) in the image, but also to detect all semantic roles that participate in the action. This complex task usually involves three steps: verb recognition, semantic role grounding, and noun recognition. Directly employing class-based prompts with VLMs and grounding models for this task suffers from several limitations, e.g., it struggles to distinguish ambiguous verb concepts, accurately localize roles with fixed verb-centric template input, and achieve context-aware noun predictions. In this paper, we argue that these limitations stem from the model's poor understanding of verb/noun classes. To this end, we introduce a new approach for zero-shot GSR via Language EXplainer (LEX), which significantly boosts the model's comprehensive capabilities through three explainers: 1) verb explainer, which generates general verb-centric descriptions to enhance the discriminability of different verb classes; 2) grounding explainer, which rephrases verb-centric templates for clearer understanding, thereby enhancing precise semantic role localization; and 3) noun explainer, which creates scene-specific noun descriptions to ensure context-aware noun recognition. By equipping each step of the GSR process with an auxiliary explainer, LEX facilitates complex scene understanding in real-world scenarios. Our extensive validations on the SWiG dataset demonstrate LEX's effectiveness and interoperability in zero-shot GSR.	Translated: 2024-04-26 19:30:27 Published: 2024-04-24
# Rethinking Model Prototyping through the MedMNIST+ Dataset Collection ( http://arxiv.org/abs/2404.15786v1 ) License: see link	Sebastian Doerrich, Francesco Di Salvo, Julius Brockmann, Christian Ledig	The integration of deep learning-based systems in clinical practice is often impeded by challenges rooted in limited and heterogeneous medical datasets. In addition, prioritization of marginal performance improvements on a few, narrowly scoped benchmarks over clinical applicability has slowed down meaningful algorithmic progress. This trend often results in excessive fine-tuning of existing methods to achieve state-of-the-art performance on selected datasets rather than fostering clinically relevant innovations. In response, this work presents a comprehensive benchmark for the MedMNIST+ database to diversify the evaluation landscape and conduct a thorough analysis of common convolutional neural networks (CNNs) and Transformer-based architectures for medical image classification. Our evaluation encompasses various medical datasets, training methodologies, and input resolutions, aiming to reassess the strengths and limitations of widely used model variants. Our findings suggest that computationally efficient training schemes and modern foundation models hold promise in bridging the gap between expensive end-to-end training and more resource-refined approaches. Additionally, contrary to prevailing assumptions, we observe that higher resolutions may not consistently improve performance beyond a certain threshold, advocating for the use of lower resolutions, particularly in prototyping stages, to expedite processing. Notably, our analysis reaffirms the competitiveness of convolutional models compared to ViT-based architectures, emphasizing the importance of comprehending the intrinsic capabilities of different model architectures. Moreover, we hope that our standardized evaluation framework will help enhance transparency, reproducibility, and comparability on the MedMNIST+ dataset collection as well as future research within the field. Code will be released soon.	Translated: 2024-04-26 19:30:27 Published: 2024-04-24
# MotionMaster: Training-free Camera Motion Transfer For Video Generation ( http://arxiv.org/abs/2404.15789v1 ) License: see link	Teng Hu, Jiangning Zhang, Ran Yi, Yating Wang, Hongrui Huang, Jieyu Weng, Yabiao Wang, Lizhuang Ma	The emergence of diffusion models has greatly propelled the progress in image and video generation. Recently, some efforts have been made in controllable video generation, including text-to-video generation and video motion control, among which camera motion control is an important topic. However, existing camera motion control methods rely on training a temporal camera module, and necessitate substantial computation resources due to the large number of parameters in video generation models. Moreover, existing methods pre-define camera motion types during training, which limits their flexibility in camera control. Therefore, to reduce training costs and achieve flexible camera control, we propose COMD, a novel training-free video motion transfer model, which disentangles camera motions and object motions in source videos and transfers the extracted camera motions to new videos. We first propose a one-shot camera motion disentanglement method to extract camera motion from a single source video, which separates the moving objects from the background and estimates the camera motion in the moving objects region based on the motion in the background by solving a Poisson equation. Furthermore, we propose a few-shot camera motion disentanglement method to extract the common camera motion from multiple videos with similar camera motions, which employs a window-based clustering technique to extract the common features in temporal attention maps of multiple videos. Finally, we propose a motion combination method to combine different types of camera motions, giving our model more controllable and flexible camera control. Extensive experiments demonstrate that our training-free approach can effectively decouple camera-object motion and apply the decoupled camera motion to a wide range of controllable video generation tasks, achieving flexible and diverse camera motion control.	Translated: 2024-04-26 19:30:27 Published: 2024-04-24
# Leveraging Large Language Models for Multimodal Search ( http://arxiv.org/abs/2404.15790v1 ) License: see link	Oriol Barbany, Michael Huang, Xinliang Zhu, Arnab Dhua	Multimodal search has become increasingly important in providing users with a natural and effective way to express their search intentions. Images offer fine-grained details of the desired products, while text allows for easily incorporating search modifications. However, some existing multimodal search systems are unreliable and fail to address simple queries. The problem becomes harder with the large variability of natural language text queries, which may contain ambiguous, implicit, and irrelevant information. Addressing these issues may require systems with enhanced matching capabilities, reasoning abilities, and context-aware query parsing and rewriting. This paper introduces a novel multimodal search model that achieves a new performance milestone on the Fashion200K dataset. Additionally, we propose a novel search interface integrating Large Language Models (LLMs) to facilitate natural language interaction. This interface routes queries to search systems while conversationally engaging with users and considering previous searches. When coupled with our multimodal search model, it heralds a new era of shopping assistants capable of offering human-like interaction and enhancing the overall search experience.	Translated: 2024-04-26 19:30:27 Published: 2024-04-24
# Converting nonlocality into contextuality ( http://arxiv.org/abs/2404.15793v1 ) License: see link	Karl Svozil	Diagonalization of matrix pencils provides a uniform technique to transcribe operator-based violations of Boole's 'conditions of possible experience' involving multipartite correlations into contextuality. It also provides a structural analysis of the contexts involved, and thereby suggests compact forms of deviations of quantized systems from classical predictions.	Translated: 2024-04-26 19:30:27 Published: 2024-04-24
# Large Language Models as In-context AI Generators for Quality-Diversity ( http://arxiv.org/abs/2404.15794v1 ) License: see link	Bryan Lim, Manon Flageat, Antoine Cully	Quality-Diversity (QD) approaches are a promising direction to develop open-ended processes as they can discover archives of high-quality solutions across diverse niches. While already successful in many applications, QD approaches usually rely on combining only one or two solutions to generate new candidate solutions. As observed in open-ended processes such as technological evolution, wisely combining a large diversity of these solutions could lead to more innovative solutions and potentially boost the productivity of QD search. In this work, we propose to exploit the pattern-matching capabilities of generative models to enable such efficient solution combinations. We introduce In-context QD, a framework of techniques that aim to elicit the in-context capabilities of pre-trained Large Language Models (LLMs) to generate interesting solutions using the QD archive as context. Applied to a series of common QD domains, In-context QD displays promising results compared to both QD baselines and similar strategies developed for single-objective optimization. Additionally, this result holds across multiple values of parameter sizes and archive population sizes, as well as across domains with distinct characteristics, from BBO functions to policy search. Finally, we perform an extensive ablation that highlights the key prompt design considerations that encourage the generation of promising solutions for QD.	Translated: 2024-04-26 19:30:27 Published: 2024-04-24
# Raformer: Redundancy-Aware Transformer for Video Wire Inpainting ( http://arxiv.org/abs/2404.15802v1 ) License: see link	Zhong Ji, Yimu Su, Yan Zhang, Jiacheng Hou, Yanwei Pang, Jungong Han	Video Wire Inpainting (VWI) is a prominent application in video inpainting, aimed at flawlessly removing wires in films or TV series, offering significant time and labor savings compared to manual frame-by-frame removal. However, wire removal poses greater challenges due to the wires being longer and slimmer than objects typically targeted in general video inpainting tasks, and often intersecting with people and background objects irregularly, which adds complexity to the inpainting process. Recognizing the limitations posed by existing video wire datasets, which are characterized by their small size, poor quality, and limited variety of scenes, we introduce a new VWI dataset with a novel mask generation strategy, namely Wire Removal Video Dataset 2 (WRV2) and Pseudo Wire-Shaped (PWS) Masks. The WRV2 dataset comprises over 4,000 videos with an average length of 80 frames, designed to facilitate the development and efficacy of inpainting models. Building upon this, our research proposes the Redundancy-Aware Transformer (Raformer) method that addresses the unique challenges of wire removal in video inpainting. Unlike conventional approaches that indiscriminately process all frame patches, Raformer employs a novel strategy to selectively bypass redundant parts, such as static background segments devoid of valuable information for inpainting. At the core of Raformer is the Redundancy-Aware Attention (RAA) module, which isolates and accentuates essential content through a coarse-grained, window-based attention mechanism. This is complemented by a Soft Feature Alignment (SFA) module, which refines these features and achieves end-to-end feature alignment. Extensive experiments on both the traditional video inpainting datasets and our proposed WRV2 dataset demonstrate that Raformer outperforms other state-of-the-art methods.	Translated: 2024-04-26 19:30:27 Published: 2024-04-24
# GeckOpt: LLM System Efficiency via Intent-Based Tool Selection ( http://arxiv.org/abs/2404.15804v1 ) License: see link	Michael Fore, Simranjit Singh, Dimitrios Stamoulis	In this preliminary study, we investigate a GPT-driven intent-based reasoning approach to streamline tool selection for large language models (LLMs) aimed at system efficiency. By identifying the intent behind user prompts at runtime, we narrow down the API toolset required for task execution, reducing token consumption by up to 24.6%. Early results on a real-world, massively parallel Copilot platform with over 100 GPT-4-Turbo nodes show cost reductions and potential towards improving LLM-based system efficiency.	Translated: 2024-04-26 19:30:27 Published: 2024-04-24
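The GeckOpt abstract describes narrowing the API toolset by first identifying the intent behind a prompt. A minimal sketch of that gating idea follows. In the paper the intent is identified by a GPT model; here a keyword lookup stands in for that call, and every intent name, tool name, and keyword below is hypothetical.

```python
# Hypothetical catalogue: each intent exposes only the tools it needs.
TOOLS_BY_INTENT = {
    "visualize": ["load_table", "plot_chart"],
    "search": ["web_search", "summarize"],
}
ALL_TOOLS = sorted({tool for tools in TOOLS_BY_INTENT.values() for tool in tools})

# Stand-in for the model-based intent classifier described in the abstract.
INTENT_KEYWORDS = {
    "visualize": ("plot", "chart", "graph"),
    "search": ("find", "search", "look up"),
}


def classify_intent(prompt):
    """Return an intent label, or None when no keyword matches."""
    text = prompt.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return None


def select_tools(prompt):
    """Expose the intent's tool subset, falling back to the full toolset."""
    return TOOLS_BY_INTENT.get(classify_intent(prompt), ALL_TOOLS)


print(select_tools("Plot the monthly sales"))
print(select_tools("Tell me a joke"))
```

Because tool descriptions are sent with every request, passing two tools instead of the full catalogue shortens the prompt, which is the kind of saving the abstract reports as reduced token consumption.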
# Beyond ESM2: Graph-Enhanced Protein Sequence Modeling with Efficient Clustering ( http://arxiv.org/abs/2404.15805v1 ) License: see link	Shujian Jiao, Bingxuan Li, Lei Wang, Xiaojin Zhang, Wei Chen, Jiajie Peng, Zhongyu Wei	Proteins are essential to life's processes, underpinning evolution and diversity. Advances in sequencing technology have revealed millions of proteins, underscoring the need for sophisticated pre-trained protein models for biological analysis and AI development. Facebook's ESM2, the most advanced protein language model to date, leverages a masked prediction task for unsupervised learning, crafting amino acid representations with notable biochemical accuracy. Yet, it lacks in delivering functional protein insights, signaling an opportunity for enhancing representation quality. Our study addresses this gap by incorporating protein family classification into ESM2's training. This approach, augmented with a Community Propagation-Based Clustering Algorithm, improves global protein representations, while a contextual prediction task fine-tunes local amino acid accuracy. Significantly, our model achieved state-of-the-art results in several downstream experiments, demonstrating the power of combining global and local methodologies to substantially boost protein representation quality.	Translated: 2024-04-26 19:30:27 Published: 2024-04-24
# マスクの場所:グラフマスクオートエンコーダのための構造誘導型マスキング Where to Mask: Structure-Guided Masking for Graph Masked Autoencoders ( http://arxiv.org/abs/2404.15806v1 ) ライセンス: Link先を確認	Chuang Liu, Yuyao Wang, Yibing Zhan, Xueqi Ma, Dapeng Tao, Jia Wu, Wenbin Hu,	(参考訳) グラフマスク付きオートエンコーダ(GMAE)は、グラフ構造化データに対する自己教師付き事前学習の大幅な進歩として登場した。これまでのGMAEモデルは、トレーニング中にノードやエッジに対して単純なランダムマスキング戦略を主に利用していた。しかし、この戦略はグラフ構造内の各ノードの重要度の違いを考慮できない。本稿では,マスク付き事前学習プロセスにおいて,グラフの構造的構成を基本的かつ固有の事前知識として活用する可能性について検討する。そこで本研究では,既存のGMAEモデルの改良を目的とした,構造誘導型マスキング戦略(StructMAE)を提案する。 StructMAEには2つのステップがある。 1) 構造に基づくスコア付け: 各ノードが評価され,その構造的重要性を反映したスコアが割り当てられる。事前定義と学習可能なスコアリングの2種類のスコアリング方法が提案されている。 2) 構造誘導型マスキング: 得られた評価スコアを用いて, 自己教師付き再構成タスクの構造認識を徐々に高める,易から難へ (easy-to-hard) のマスキング戦略を開発する。具体的には、この戦略はランダムマスキングから始まり、評価スコアに基づいて構造的に情報量の多いノードのマスキングへと進む。この設計は、グラフ構造情報の学習においてモデルを段階的かつ効果的に導く。さらに、広範な実験により、StructMAE法は、教師なし学習と転移学習の両方のタスクにおいて、既存の最先端のGMAEモデルよりも優れていることが一貫して実証されている。コードはhttps://github.com/LiuChuang0059/StructMAEで入手できる。 Graph masked autoencoders (GMAE) have emerged as a significant advancement in self-supervised pre-training for graph-structured data. Previous GMAE models primarily utilize a straightforward random masking strategy for nodes or edges during training. However, this strategy fails to consider the varying significance of different nodes within the graph structure. In this paper, we investigate the potential of leveraging the graph's structural composition as a fundamental and unique prior in the masked pre-training process. To this end, we introduce a novel structure-guided masking strategy (i.e., StructMAE), designed to refine the existing GMAE models. StructMAE involves two steps: 1) Structure-based Scoring: Each node is evaluated and assigned a score reflecting its structural significance. Two distinct types of scoring manners are proposed: predefined and learnable scoring. 
2) Structure-guided Masking: With the obtained assessment scores, we develop an easy-to-hard masking strategy that gradually increases the structural awareness of the self-supervised reconstruction task. Specifically, the strategy begins with random masking and progresses to masking structure-informative nodes based on the assessment scores. This design gradually and effectively guides the model in learning graph structural information. Furthermore, extensive experiments consistently demonstrate that our StructMAE method outperforms existing state-of-the-art GMAE models in both unsupervised and transfer learning tasks. Codes are available at https://github.com/LiuChuang0059/StructMAE.	翻訳日:2024-04-26 19:20:39 公開日:2024-04-24
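The easy-to-hard schedule described in this entry can be illustrated with a minimal Python sketch. It is not the StructMAE implementation: the linear blend between uniform and score-based sampling, the function name `easy_to_hard_mask`, and the `progress` parameter are assumptions made purely for illustration, and the per-node structural scores are taken as given.

```python
import random


def easy_to_hard_mask(scores, mask_ratio, progress, rng=None):
    """Pick node indices to mask (illustrative schedule, not the paper's).

    scores     -- non-negative structural scores, one per node (assumed given)
    mask_ratio -- fraction of nodes to mask
    progress   -- training progress in [0, 1]; 0 means purely random masking,
                  1 means masking driven entirely by the scores
    """
    rng = rng or random.Random()
    n = len(scores)
    total = sum(scores)
    # Blend a uniform distribution with the normalised scores.
    weights = [
        (1.0 - progress) / n
        + (progress * s / total if total > 0 else progress / n)
        for s in scores
    ]
    k = max(1, round(mask_ratio * n))
    candidates = list(range(n))
    masked = []
    # Weighted sampling without replacement, one node at a time.
    for _ in range(k):
        if sum(weights) > 0:
            idx = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        else:
            # All remaining weights are zero: fall back to a uniform pick.
            idx = rng.randrange(len(candidates))
        masked.append(candidates.pop(idx))
        weights.pop(idx)
    return masked
```

Early in training (`progress` near 0) every node is equally likely to be masked; late in training the high-scoring nodes dominate, which is one simple way to move from an easy reconstruction task to a harder one.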
# すべてのための1つの部分グラフ:帰納的知識グラフ補完のためのオープンな部分グラフの効率的な推論 One Subgraph for All: Efficient Reasoning on Opening Subgraphs for Inductive Knowledge Graph Completion ( http://arxiv.org/abs/2404.15807v1 ) ライセンス: Link先を確認	Zhiwen Xie, Yi Zhang, Guangyou Zhou, Jin Liu, Xinhui Tu, Jimmy Xiangji Huang,	(参考訳) 知識グラフ補完(KGC)は近年、膨大な研究関心を集めており、既存のほとんどの手法は、トレーニング中にすべてのエンティティが観察されるトランスダクティブな設定に従って設計されている。トランスダクティブKGCの進歩にもかかわらず、これらの手法は未知のエンティティを含む新しいKGの推論に苦慮している。そのため、未知のエンティティ間の欠落リンクを推論することを目的としたインダクティブKGCが、新たなトレンドとなっている。既存の多くの研究は、各候補トリプルを囲む部分グラフ(enclosing subgraph)を抽出することにより、帰納的KGCをグラフ分類問題に変換する。残念ながら、enclosing subgraphの繰り返し抽出による高い時間コストや、エンティティに依存しない特徴学習の不足など、いくつかの課題に直面している。これらの問題に対処するために、帰納的KGCのためのグローバルローカルアンカー表現(GLAR)学習法を提案する。 enclosing subgraphを利用する従来の方法とは異なり、全ての候補に対して共有のopening subgraphを抽出し、その上で推論を行うことで、より効率的な推論を可能にする。さらに、新たに出現するエンティティのためのリッチなエンティティ非依存の特徴を学ぶために、転移可能なグローバルアンカーとローカルアンカーを設計する。最後に、全ての候補をランク付けするために、opening subgraphにグローバルローカルグラフ推論モデルを適用する。大規模な実験により、GLARは既存の最先端手法のほとんどを上回ることが示された。 Knowledge Graph Completion (KGC) has garnered massive research interest recently, and most existing methods are designed following a transductive setting where all entities are observed during training. Despite the great progress on the transductive KGC, these methods struggle to conduct reasoning on emerging KGs involving unseen entities. Thus, inductive KGC, which aims to deduce missing links among unseen entities, has become a new trend. Many existing studies transform inductive KGC as a graph classification problem by extracting enclosing subgraphs surrounding each candidate triple. Unfortunately, they still face certain challenges, such as the expensive time consumption caused by the repeat extraction of enclosing subgraphs, and the deficiency of entity-independent feature learning. To address these issues, we propose a global-local anchor representation (GLAR) learning method for inductive KGC. 
Unlike previous methods that utilize enclosing subgraphs, we extract a shared opening subgraph for all candidates and perform reasoning on it, enabling the model to perform reasoning more efficiently. Moreover, we design some transferable global and local anchors to learn rich entity-independent features for emerging entities. Finally, a global-local graph reasoning model is applied on the opening subgraph to rank all candidates. Extensive experiments show that our GLAR outperforms most existing state-of-the-art methods.	翻訳日:2024-04-26 19:20:39 公開日:2024-04-24
# Nadir BRDF調整反射率の簡易計算による高度なセンチネル2解析 Facilitating Advanced Sentinel-2 Analysis Through a Simplified Computation of Nadir BRDF Adjusted Reflectance ( http://arxiv.org/abs/2404.15812v1 ) ライセンス: Link先を確認	David Montero, Miguel D. Mahecha, César Aybar, Clemens Mosig, Sebastian Wieneke,	(参考訳) 欧州宇宙機関のコペルニクス計画によるセンチネル2号(S2)ミッションは、地球表面分析に不可欠なデータを提供する。 Level-2Aプロダクトは、MultiSpectral Instrument (MSI)を通して、高から中分解能(10-60 m)の表面反射率(SR)データを提供する。 SRデータの精度と比較可能性を向上するためには、ナディア(直下)視をシミュレートする調整が不可欠である。これらの補正は、SRの異方性と太陽角・観測角の変動に対処し、時間と異なる条件下で一貫した画像の比較を確実にする。単純だが効果的なアルゴリズムである$c$-factor法は、観測されたS2 SRをMODIS BRDFモデルを用いて調整し、Nadir BRDF Adjusted Reflectance(NBAR)を得る。個々の画像への$c$-factorの適用は容易であるにもかかわらず、複数のS2画像やクラウド上のデータから作られるアースシステムデータキューブ(ESDC)に適用するための統一的なPythonフレームワークが不足していた。本稿では,S2 SRデータをNBARに変換するPythonパッケージであるsen2nbarを紹介する。これは個々の画像とクラウド上のデータから得られるESDCの両方をサポートする。本パッケージは、S2 SRデータのNBARへの変換を単一の関数で単純化し、効率的なプロセス管理のためにモジュールに編成されている。 SAFEファイルとSpatioTemporal Asset Catalogs (STAC)のESDCの両方でNBAR変換を容易にすることで、sen2nbarは多様なデータフォーマット要求を処理できる柔軟なツールとして開発されている。 sen2nbarがS2データの標準化と調和に大きく貢献し、様々なアプリケーションにまたがる多様なユーザに対して堅牢なソリューションを提供することを期待している。 sen2nbarはhttps://github.com/ESDS-Leipzig/sen2nbarで入手できるオープンソースツールである。 The Sentinel-2 (S2) mission from the European Space Agency's Copernicus program provides essential data for Earth surface analysis. Its Level-2A products deliver high-to-medium resolution (10-60 m) surface reflectance (SR) data through the MultiSpectral Instrument (MSI). To enhance the accuracy and comparability of SR data, adjustments simulating a nadir viewing perspective are essential. These corrections address the anisotropic nature of SR and the variability in sun and observation angles, ensuring consistent image comparisons over time and under different conditions. The $c$-factor method, a simple yet effective algorithm, adjusts observed S2 SR by using the MODIS BRDF model to achieve Nadir BRDF Adjusted Reflectance (NBAR). 
Despite the straightforward application of the $c$-factor to individual images, a cohesive Python framework for its application across multiple S2 images and Earth System Data Cubes (ESDCs) from cloud-stored data has been lacking. Here we introduce sen2nbar, a Python package crafted to convert S2 SR data to NBAR, supporting both individual images and ESDCs derived from cloud-stored data. This package simplifies the conversion of S2 SR data to NBAR via a single function, organized into modules for efficient process management. By facilitating NBAR conversion for both SAFE files and ESDCs from SpatioTemporal Asset Catalogs (STAC), sen2nbar is developed as a flexible tool that can handle diverse data format requirements. We anticipate that sen2nbar will considerably contribute to the standardization and harmonization of S2 data, offering a robust solution for a diverse range of users across various applications. sen2nbar is an open-source tool available at https://github.com/ESDS-Leipzig/sen2nbar.	翻訳日:2024-04-26 19:20:39 公開日:2024-04-24
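The $c$-factor adjustment named in this entry can be sketched in a few lines of Python. This is a hedged illustration, not the sen2nbar API: it assumes a kernel-driven BRDF model (isotropic, volumetric and geometric terms) of the kind used for MODIS BRDF products, takes the kernel values for the nadir and observed geometries as precomputed inputs, and all function and parameter names are hypothetical.

```python
def brdf(f_iso, f_vol, f_geo, k_vol, k_geo):
    """Kernel-driven reflectance: isotropic + volumetric + geometric terms."""
    return f_iso + f_vol * k_vol + f_geo * k_geo


def c_factor(params, kernels_nadir, kernels_observed):
    """Ratio of modelled reflectance at nadir view to that at the observed view.

    params           -- (f_iso, f_vol, f_geo) BRDF model weights for one band
    kernels_nadir    -- (k_vol, k_geo) evaluated for a nadir view
    kernels_observed -- (k_vol, k_geo) evaluated for the actual view geometry
    """
    return brdf(*params, *kernels_nadir) / brdf(*params, *kernels_observed)


def to_nbar(sr_observed, params, kernels_nadir, kernels_observed):
    """Scale the observed surface reflectance by the c-factor."""
    return c_factor(params, kernels_nadir, kernels_observed) * sr_observed
```

When the observed geometry is already nadir the two kernel tuples coincide, the factor is 1, and the reflectance is returned unchanged.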
# 拡散シュレーディンガー橋による高速アンサンブル Fast Ensembling with Diffusion Schrödinger Bridge ( http://arxiv.org/abs/2404.15814v1 ) ライセンス: Link先を確認	Hyunsu Kim, Jongmin Yoon, Juho Lee,	(参考訳) ディープ・アンサンブル(Deep Ensemble、DE)アプローチは、様々な初期点からニューラルネットワークを訓練し、様々な局所最適点に向かって収束させることにより、ディープ・ニューラルネットワークの性能を高めるための簡単な手法である。しかし、この手法の限界は推論の計算オーバーヘッドが高いことであり、これは多くの学習されたパラメータを格納し、推論段階で各パラメータに対して個別のフォワードパスを実行する必要性から生じる。本稿では,この課題に対処するため,Diffusion Bridge Network (DBN) と呼ばれる新しい手法を提案する。 Schrödinger Bridgeの理論に基づいて、単一アンサンブル部材の出力分布とアンサンブルモデルの出力分布を接続する確率微分方程式(SDE)のシミュレーションを直接学習し、すべてのアンサンブルモデルのフォワードパスを実行することなくアンサンブル予測を得る。重いアンサンブルを、DBNを構成する軽量ニューラルネットワークに置き換えることで,CIFAR-10,CIFAR-100,TinyImageNetなどのベンチマークデータセットで精度と不確実性スコアを維持しつつ,計算コストを削減した推論を実現した。実装はhttps://github.com/kim-hyunsu/dbnで公開しています。 Deep Ensemble (DE) approach is a straightforward technique used to enhance the performance of deep neural networks by training them from different initial points, converging towards various local optima. However, a limitation of this methodology lies in its high computational overhead for inference, arising from the necessity to store numerous learned parameters and execute individual forward passes for each parameter during the inference stage. We propose a novel approach called Diffusion Bridge Network (DBN) to address this challenge. Based on the theory of the Schrödinger bridge, this method directly learns to simulate an Stochastic Differential Equation (SDE) that connects the output distribution of a single ensemble member to the output distribution of the ensembled model, allowing us to obtain ensemble prediction without having to invoke forward pass through all the ensemble models. By substituting the heavy ensembles with this lightweight neural network constructing DBN, we achieved inference with reduced computational cost while maintaining accuracy and uncertainty scores on benchmark datasets such as CIFAR-10, CIFAR-100, and TinyImageNet. Our implementation is available at https://github.com/kim-hyunsu/dbn.	翻訳日:2024-04-26 19:20:39 公開日:2024-04-24
# 単一視点シーン点群からの人間の把持生成 Single-View Scene Point Cloud Human Grasp Generation ( http://arxiv.org/abs/2404.15815v1 ) ライセンス: Link先を確認	Yan-Kang Wang, Chengyi Xing, Yi-Lin Wei, Xiao-Ming Wu, Wei-Shi Zheng,	(参考訳) 本研究では,単一視点のシーン点群に基づいて人間の把持を生成する新しい課題について検討する。これは、物体を一つの視点から観察するという典型的な現実の状況をより正確に反映している。オブジェクト点群の不完全性や多数のシーン点の存在により、生成した手はオブジェクトの見えない部分に侵入しやすく、モデルはシーン点の影響を受けやすい。そこで我々は,2つの重要なモジュールからなるフレームワークS2HGraspを紹介する。 Global Perceptionモジュールは部分的なオブジェクト点群をグローバルに知覚し,DiffuGraspモジュールはシーン点を含む複雑な入力に基づいて高品質な人間の把持を生成するように設計されている。さらに,1,668個のユニークなオブジェクトの、約99,000個の単一オブジェクト・単一視点シーン点群からなり、それぞれに1つの人間の把持がアノテーションされたS2HGDデータセットを導入する。我々の広範な実験により、S2HGraspはシーン点によらず自然な人間の把持を生成できるだけでなく、手と物体の見えない部分との侵入を効果的に防止できることが示された。さらに,本モデルは,未知の物体に適用した場合にも強い一般化能力を示す。私たちのコードとデータセットはhttps://github.com/iSEE-Laboratory/S2HGraspで公開されています。 In this work, we explore a novel task of generating human grasps based on single-view scene point clouds, which more accurately mirrors the typical real-world situation of observing objects from a single viewpoint. Due to the incompleteness of object point clouds and the presence of numerous scene points, the generated hand is prone to penetrating into the invisible parts of the object and the model is easily affected by scene points. Thus, we introduce S2HGrasp, a framework composed of two key modules: the Global Perception module that globally perceives partial object point clouds, and the DiffuGrasp module designed to generate high-quality human grasps based on complex inputs that include scene points. Additionally, we introduce S2HGD dataset, which comprises approximately 99,000 single-object single-view scene point clouds of 1,668 unique objects, each annotated with one human grasp. Our extensive experiments demonstrate that S2HGrasp can not only generate natural human grasps regardless of scene points, but also effectively prevent penetration between the hand and invisible parts of the object. 
Moreover, our model showcases strong generalization capability when applied to unseen objects. Our code and dataset are available at https://github.com/iSEE-Laboratory/S2HGrasp.	翻訳日:2024-04-26 19:20:39 公開日:2024-04-24
# 視覚変換器に基づく敵対的ドメイン適応 Vision Transformer-based Adversarial Domain Adaptation ( http://arxiv.org/abs/2404.15817v1 ) ライセンス: Link先を確認	Yahan Li, Yuan Wu,	(参考訳) Unsupervised domain adaptation (UDA) は、ラベル付きソースドメインからラベルなしターゲットドメインに知識を転送することを目的としている。最新のUDA法は、最先端の結果を得るために常に敵対的訓練に頼っており、既存のUDA法の大多数は、畳み込みニューラルネットワーク(CNN)を特徴抽出器として、ドメイン不変の特徴を学習している。視覚変換器(ViT)は、その出現以来大きな注目を集め、画像分類、オブジェクト検出、セマンティックセグメンテーションなど様々なコンピュータビジョンタスクで広く利用されているが、敵対的ドメイン適応におけるポテンシャルは研究されていない。本稿では,敵対的ドメイン適応における特徴抽出器としてViTを用いることで、このギャップを埋める。さらに,ViTが敵対的ドメイン適応におけるプラグアンドプレイのコンポーネントになりうること、すなわち既存のUDA手法のCNNベースの特徴抽出器をViTベースの特徴抽出器に直接置き換えるだけで性能向上が容易に得られることを実証的に示す。コードはhttps://github.com/LluckyYH/VT-ADAで公開されている。 Unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. The most recent UDA methods always resort to adversarial training to yield state-of-the-art results and a dominant number of existing UDA methods employ convolutional neural networks (CNNs) as feature extractors to learn domain invariant features. Vision transformer (ViT) has attracted tremendous attention since its emergence and has been widely used in various computer vision tasks, such as image classification, object detection, and semantic segmentation, yet its potential in adversarial domain adaptation has never been investigated. In this paper, we fill this gap by employing the ViT as the feature extractor in adversarial domain adaptation. Moreover, we empirically demonstrate that ViT can be a plug-and-play component in adversarial domain adaptation, which means directly replacing the CNN-based feature extractor in existing UDA methods with the ViT-based feature extractor can easily obtain performance improvement. The code is available at https://github.com/LluckyYH/VT-ADA.	翻訳日:2024-04-26 19:20:39 公開日:2024-04-24
# SynthEval: 表形式合成データの詳細なユーティリティとプライバシ評価のためのフレームワーク SynthEval: A Framework for Detailed Utility and Privacy Evaluation of Tabular Synthetic Data ( http://arxiv.org/abs/2404.15821v1 ) ライセンス: Link先を確認	Anton Danholt Lautrup, Tobias Hyrup, Arthur Zimek, Peter Schneider-Kamp,	(参考訳) データ不足、データの公正性、データプライバシといった、機械学習の現代的問題に対処する合成データの需要が高まっているため、データの有用性と潜在的なプライバシリスクを評価するための堅牢なツールが不可欠である。オープンソースの新しい評価フレームワークであるSynthEvalは、特別な種類の事前処理ステップを仮定することなく、カテゴリ属性と数値属性を同等のケアで扱うことで、既存のツールと差別化している。これは、事実上あらゆる表レコードの合成データセットに適用できる。我々のツールは統計的および機械学習技術を利用して、合成データの忠実度とプライバシー保護の整合性を包括的に評価する。 SynthEvalは、独立して、あるいは高度にカスタマイズ可能なベンチマーク設定で使用でき、追加のメトリクスで容易に拡張できる、幅広い種類のメトリクスを統合する。本稿では,SynthEvalについて述べるとともに,その汎用性を例に示す。このフレームワークは、より良いベンチマークとより一貫性のあるモデル機能の比較を促進する。 With the growing demand for synthetic data to address contemporary issues in machine learning, such as data scarcity, data fairness, and data privacy, having robust tools for assessing the utility and potential privacy risks of such data becomes crucial. SynthEval, a novel open-source evaluation framework distinguishes itself from existing tools by treating categorical and numerical attributes with equal care, without assuming any special kind of preprocessing steps. This makes it applicable to virtually any synthetic dataset of tabular records. Our tool leverages statistical and machine learning techniques to comprehensively evaluate synthetic data fidelity and privacy-preserving integrity. SynthEval integrates a wide selection of metrics that can be used independently or in highly customisable benchmark configurations, and can easily be extended with additional metrics. In this paper, we describe SynthEval and illustrate its versatility with examples. The framework facilitates better benchmarking and more consistent comparisons of model capabilities.	翻訳日:2024-04-26 19:20:39 公開日:2024-04-24
# 決定論的環境における再帰的後方Q-Learning Recursive Backwards Q-Learning in Deterministic Environments ( http://arxiv.org/abs/2404.15822v1 ) ライセンス: Link先を確認	Jan Diekhoff, Jörn Fischer,	(参考訳) 強化学習は複雑な問題に対する最適解を見つける一般的な方法である。 Q-learningのようなアルゴリズムは、環境のモデルを使わずに確率的な問題を解決する学習に長けている。しかし、決定論的問題の解決には必要以上に時間がかかる。モデルベースのアプローチを導入することで、決定論的問題をよりうまく解決できるようQ-learningを改善できる。本稿では,環境を探索してそのモデルを構築する再帰的逆向きQ-learning(RBQL)エージェントを紹介する。終端状態に達した後、このモデルを通してその値を後方に再帰的に伝播する。これにより、長い学習プロセスなしに、各状態が最適な値に評価される。迷路を通る最短経路を見つける例では、このエージェントは通常のQ-learningエージェントを大きく上回る。 Reinforcement learning is a popular method of finding optimal solutions to complex problems. Algorithms like Q-learning excel at learning to solve stochastic problems without a model of their environment. However, they take longer to solve deterministic problems than is necessary. Q-learning can be improved to better solve deterministic problems by introducing such a model-based approach. This paper introduces the recursive backwards Q-learning (RBQL) agent, which explores and builds a model of the environment. After reaching a terminal state, it recursively propagates its value backwards through this model. This lets each state be evaluated to its optimal value without a lengthy learning process. In the example of finding the shortest path through a maze, this agent greatly outperforms a regular Q-learning agent.	翻訳日:2024-04-26 19:20:39 公開日:2024-04-24
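The backward propagation step described in this entry can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' RBQL code: the learned model is assumed to be a dictionary of observed deterministic transitions, every step is assumed to carry the same reward (as in a maze), and the name `backward_propagate` is hypothetical. The abstract does not give the exact update rule, so the standard deterministic Bellman backup is used.

```python
from collections import defaultdict, deque


def backward_propagate(model, terminal, gamma=0.9):
    """Propagate values backwards from a terminal state.

    model    -- dict mapping (state, action) -> (next_state, reward), i.e. the
                deterministic transitions the agent has observed so far
    terminal -- the terminal state that was just reached
    Returns a dict mapping (state, action) -> Q-value.
    """
    # Index the model by successor so we can walk it in reverse.
    predecessors = defaultdict(list)
    for (state, action), (next_state, reward) in model.items():
        predecessors[next_state].append((state, action, reward))

    q = {}
    value = {terminal: 0.0}
    queue = deque([terminal])
    while queue:
        next_state = queue.popleft()
        for state, action, reward in predecessors[next_state]:
            # Deterministic backup: no learning rate, no averaging.
            q[(state, action)] = reward + gamma * value[next_state]
            if state not in value:
                # With a uniform step reward, the first breadth-first visit
                # comes from the successor closest to the terminal state,
                # so this is the best action value for `state`.
                value[state] = q[(state, action)]
                queue.append(state)
    return q
```

A single sweep over the observed transitions assigns every explored state-action pair its value, which is the contrast with ordinary Q-learning that the abstract draws. For non-uniform rewards the breadth-first order would no longer guarantee optimal values, and a different traversal would be needed.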
# ニューラルネットワークハードウェアアクセラレータのための構成可能かつ効率的なメモリ階層 A Configurable and Efficient Memory Hierarchy for Neural Network Hardware Accelerator ( http://arxiv.org/abs/2404.15823v1 ) ライセンス: Link先を確認	Oliver Bause, Paul Palomero Bernardo, Oliver Bringmann,	(参考訳) 機械学習アプリケーションが進化を続けるにつれ、ディープニューラルネットワーク(DNN)に特化した効率的なハードウェアアクセラレータの需要はますます重要になっている。本稿では,DNNの層ごとに適応するメモリアクセスパターンに合わせて構成可能なメモリ階層フレームワークを提案する。階層は、アクセラレータの計算ユニットにデータを提供するために、オフチップメモリからオンデマンドでデータを要求する。目的は、必要なメモリ容量を最小化することと、高いアクセラレータ性能を維持することのバランスを最適化することである。このフレームワークは構成可能性を特徴とし、最大5レベルまでの調整されたメモリ階層を作成できる。さらに、このフレームワークは、メモリ管理プロセスの柔軟性を高めるために、オプションのシフトレジスタを最終レベルとして組み込んでいる。 DNN層の包括的なループネスト解析により、このフレームワークがほとんどのループアンロールのアクセスパターンを効率的に実行できることが示されている。 DNNアクセラレータUltraTrailの合成結果とケーススタディは、より小さなメモリモジュールを使用できるため、チップ面積を最大62.2%削減できる可能性を示している。同時に、性能損失は2.4%に抑えられる。 As machine learning applications continue to evolve, the demand for efficient hardware accelerators, specifically tailored for deep neural networks (DNNs), becomes increasingly vital. In this paper, we propose a configurable memory hierarchy framework tailored for per layer adaptive memory access patterns of DNNs. The hierarchy requests data on-demand from the off-chip memory to provide it to the accelerator's compute units. The objective is to strike an optimized balance between minimizing the required memory capacity and maintaining high accelerator performance. The framework is characterized by its configurability, allowing the creation of a tailored memory hierarchy with up to five levels. Furthermore, the framework incorporates an optional shift register as final level to increase the flexibility of the memory management process. A comprehensive loop-nest analysis of DNN layers shows that the framework can efficiently execute the access patterns of most loop unrolls. Synthesis results and a case study of the DNN accelerator UltraTrail indicate a possible reduction in chip area of up to 62.2% as smaller memory modules can be used. 
At the same time, the performance loss can be minimized to 2.4%.	翻訳日:2024-04-26 19:20:39 公開日:2024-04-24
# 量子ゲートのロバストな複雑さ:基礎 Robust Quantum Gate Complexity: Foundations ( http://arxiv.org/abs/2404.15828v1 ) ライセンス: Link先を確認	Johannes Aspman, Vyacheslav Kungurtsev, Jakub Marecek,	(参考訳) クローズド量子システムの最適制御は、量子コンピュータの実装と理解において重要な役割を担っている、幾何学的にエレガントな計算理論と技法の集合である。回路自体の設計は、初期的かつ容易に準備された状態から、ある意味でユーザに対して通知されるもの、例えば、評価が回路の一部であるオラクルへ、キュービットを操るために、適切なゲートセット(制御オペランドとして現れる)を選択する最適制御問題に対応する。しかし、現代のデバイスはノイズが多いことが知られており、回路が意図した動作をするかどうかは定かではない。しかし、より広範な最適制御理論には計算ツールが存在するが、不確実性や誤りに関して量子制御系の適切な操作の堅牢性はまだ研究されていない。本稿では,閉量子最適制御とその幾何学的解釈への関連性から着想を得た新しいアプローチを提案する。この目的のために、量子制御の文脈におけるロバストネスの適切な問題定義を示し、ゲート複雑性に対するより広範な影響に焦点を当てる。 Optimal control of closed quantum systems is a well studied geometrically elegant set of computational theory and techniques that have proven pivotal in the implementation and understanding of quantum computers. The design of a circuit itself corresponds to an optimal control problem of choosing the appropriate set of gates (which appear as control operands) in order to steer a qubit from an initial, easily prepared state, to one that is informative to the user in some sense, for e.g., an oracle whose evaluation is part of the circuit. However, contemporary devices are known to be noisy, and it is not certain that a circuit will behave as intended. Yet, although the computational tools exist in broader optimal control theory, robustness of adequate operation of a quantum control system with respect to uncertainty and errors has not yet been broadly studied in the literature. In this paper, we propose a new approach inspired by the closed quantum optimal control and its connection to geometric interpretations. To this end, we present the appropriate problem definitions of robustness in the context of quantum control, focusing on its broader implications for gate complexity.	翻訳日:2024-04-26 19:20:39 公開日:2024-04-24
# ジェファーソン研究所における深層学習を用いた加速キャビティの故障予測 Accelerating Cavity Fault Prediction Using Deep Learning at Jefferson Laboratory ( http://arxiv.org/abs/2404.15829v1 ) ライセンス: Link先を確認	Monibor Rahman, Adam Carpenter, Khan Iftekharuddin, Chris Tennant,	(参考訳) 加速キャビティはジェファーソン研究所の連続電子ビーム加速器施設(CEBAF)の不可欠な部分である。 CEBAFの400以上のキャビティのいずれかで故障が発生すると、実験ユーザーホールへのビーム供給が妨げられる。本研究では,緩やかに進行するキャビティ故障を予測するための深層学習モデルを提案する。故障前の信号を利用してLSTM-CNNバイナリ分類器を訓練し,通常運転中の高周波(RF)信号と差し迫った故障を示すRF信号とを識別する。故障の信頼度しきい値を調整し、複数の連続するウィンドウに基づく判定基準を導入して故障事象を識別することで、偽陽性率を低く抑えるようモデルを最適化する。運用シナリオを模擬した、加速キャビティから収集された実データセットの分析結果は、モデルが正常な信号を99.99%の精度で識別し、緩やかに進行する故障の80%を正しく予測できることを示している。特に、これらの成果は高度に不均衡なデータセットで達成され、故障の発生の数百ミリ秒前に故障予測が行われた。故障を予測することで、その発生を予防または緩和する事前対策が可能となり、運用効率を向上できる。 Accelerating cavities are an integral part of the Continuous Electron Beam Accelerator Facility (CEBAF) at Jefferson Laboratory. When any of the over 400 cavities in CEBAF experiences a fault, it disrupts beam delivery to experimental user halls. In this study, we propose the use of a deep learning model to predict slowly developing cavity faults. By utilizing pre-fault signals, we train a LSTM-CNN binary classifier to distinguish between radio-frequency (RF) signals during normal operation and RF signals indicative of impending faults. We optimize the model by adjusting the fault confidence threshold and implementing a multiple consecutive window criterion to identify fault events, ensuring a low false positive rate. Results obtained from analysis of a real dataset collected from the accelerating cavities simulating a deployed scenario demonstrate the model's ability to identify normal signals with 99.99% accuracy and correctly predict 80% of slowly developing faults. Notably, these achievements were achieved in the context of a highly imbalanced dataset, and fault predictions were made several hundred milliseconds before the onset of the fault. 
Anticipating faults enables preemptive measures to improve operational efficiency by preventing or mitigating their occurrence.	翻訳日:2024-04-26 19:20:39 公開日:2024-04-24
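The combination of a confidence threshold and a multiple consecutive window criterion mentioned in this entry can be illustrated with a short Python sketch. The threshold value, the number of windows, and the function name `detect_fault` are hypothetical choices for illustration; the abstract does not state the values used at CEBAF, and the per-window confidences are assumed to come from an already trained classifier.

```python
def detect_fault(confidences, threshold=0.9, n_consecutive=3):
    """Return the index of the window at which a fault is declared, or None.

    confidences   -- per-window fault confidences from a classifier, in time order
    threshold     -- minimum confidence for a window to count as fault-like
    n_consecutive -- number of fault-like windows in a row required to raise
                     an alarm (isolated spikes are ignored)
    """
    run = 0
    for index, confidence in enumerate(confidences):
        # Extend the current run, or reset it on a normal-looking window.
        run = run + 1 if confidence >= threshold else 0
        if run >= n_consecutive:
            return index
    return None
```

Requiring several consecutive fault-like windows trades a slightly later alarm for fewer false positives, which matches the low false positive rate the abstract emphasises.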
# OpTC -- AURIX TC3xxマイクロコントローラ上にニューラルネットワークをデプロイするためのツールチェーン OpTC -- A Toolchain for Deployment of Neural Networks on AURIX TC3xx Microcontrollers ( http://arxiv.org/abs/2404.15833v1 ) ライセンス: Link先を確認	Christian Heidorn, Frank Hannig, Dominik Riedelbauch, Christoph Strohmeyer, Jürgen Teich,	(参考訳) AURIX 2xxおよび3xxシリーズのTriCoreマイクロコントローラは、自動車業界や、最近では機械学習タスクを含むアプリケーションでも広く使われている。しかし、これらのアプリケーションは主に手動で設計されており、TriCoreマイクロコントローラにニューラルネットワークをもたらすためのツールサポートはほとんどない。そこで我々は,TC3xxマイクロコントローラ上でのニューラルネットワークの自動圧縮,変換,コード生成,デプロイのためのエンドツーエンドツールチェーンであるOpTCを提案する。 OpTCは、さまざまなタイプのニューラルネットワークをサポートし、与えられたニューラルネットワークの感度分析に基づいて、レイヤワイズプルーニングを使用して圧縮を提供する。マルチ層パーセプトロン(MLP)、畳み込みニューラルネットワーク(CNN)、リカレントニューラルネットワーク(RNN)など、さまざまなタイプのニューラルネットワークをサポートする柔軟性が、TC387マイクロコントローラのケーススタディで示されている。これにより、電気モーターの温度予測と異常検出のための自動車応用を用いて、OpTCがサポートする幅広い応用の有効性と適用範囲を実証する。 The AURIX 2xx and 3xx families of TriCore microcontrollers are widely used in the automotive industry and, recently, also in applications that involve machine learning tasks. Yet, these applications are mainly engineered manually, and only little tool support exists for bringing neural networks to TriCore microcontrollers. Thus, we propose OpTC, an end-to-end toolchain for automatic compression, conversion, code generation, and deployment of neural networks on TC3xx microcontrollers. OpTC supports various types of neural networks and provides compression using layer-wise pruning based on sensitivity analysis for a given neural network. The flexibility in supporting different types of neural networks, such as multi-layer perceptrons (MLP), convolutional neural networks (CNN), and recurrent neural networks (RNN), is shown in case studies for a TC387 microcontroller. Automotive applications for predicting the temperature in electric motors and detecting anomalies are thereby used to demonstrate the effectiveness and the wide range of applications supported by OpTC.	翻訳日:2024-04-26 19:20:39 公開日:2024-04-24
# 2原子エンタングルメントを作業媒体とする量子エンジンを用いたエネルギー変換装置 Energy-conversion device using a quantum engine with the work medium of two-atom entanglement ( http://arxiv.org/abs/2404.15835v1 ) ライセンス: Link先を確認	J. -W. Zhang, B. Wang, W. -F. Yuan, J. -C. Li, J. -T. Bu, G. -Y. Ding, W. -Q. Ding, L. Chen, F. Zhou, M. Feng,	(参考訳) エンタングルメントは量子情報処理に不可欠な資源と考えられているが、量子領域でエンタングルメントがエネルギー変換や出力に役立つかどうかについては、まだ実験的な証拠がない。本稿では,調和ポテンシャルに閉じ込められた2つのエンタングルしたイオンを作業媒体とする量子エンジンとして動作するエネルギー変換装置について報告する。 2つのイオンは、両者が共有する振動モードの1つに仮想的に結合することでエンタングルされ、量子エンジンは、別の共有振動モードである量子負荷に結合する。本研究では, 2つのイオンを異なるエンタングルメントの度合いに調整し, 負荷中のフォノンの変化を検出することにより, 量子エンジンのエネルギー変換効率と, 量子負荷に蓄積される有用エネルギー(すなわち最大抽出可能仕事)を調べる。我々の観測は、エンタングルメントが量子エンジンによって生成される有用なエネルギーを増やす一方で、エネルギー変換効率には寄与しないという、初めての定量的な証拠を提供する。この結果は,最大抽出可能エネルギーが最も重要な指標の一つである量子電池の研究に有用であると考えられる。 Although entanglement is considered as an essential resource for quantum information processing, whether entanglement helps for energy conversion or output in the quantum regime is still lack of experimental witness. Here we report on an energy-conversion device operating as a quantum engine with the working medium acted by two entangled ions confined in a harmonic potential. The two ions are entangled by virtually coupling to one of the vibrational modes shared by the two ions, and the quantum engine couples to a quantum load, which is another shared vibrational mode. We explore the energy conversion efficiency of the quantum engine and investigate the useful energy (i.e., the maximum extractable work) stored in the quantum load by tuning the two ions in different degrees of entanglement as well as detecting the change of the phonons in the load. Our observation provides, for the first time, quantitative evidence that entanglement fuels the useful energy produced by the quantum engine, but not helpful for the energy conversion efficiency. We consider that our results may be useful to the study of quantum batteries for which one of the most indexes is the maximum extractable energy.	翻訳日:2024-04-26 19:20:39 公開日:2024-04-24
# 困難な表形式データストリーム分類への2次元単語埋め込みの利用 Employing Two-Dimensional Word Embedding for Difficult Tabular Data Stream Classification ( http://arxiv.org/abs/2404.15836v1 ) ライセンス: Link先を確認	Paweł Zyblewski,	(参考訳) 急速な技術進歩はデータ量の増加と本質的に結びついており、その大部分はデータストリームとして解釈でき、コンセプトドリフト現象を示し、高い不均衡比を持ちうる。したがって、困難なデータストリームを分類するための新しいアプローチの開発は、急速に成長している研究分野である。同時に、ディープラーニングと転移学習の普及と、コンピュータビジョンタスクにおける畳み込みニューラルネットワークの成功は、表形式データを離散デジタル信号の均質な形式に変換することに焦点を当てた、新しい研究トレンドであるMDE(Multi-Dimensional Encoding)の出現に寄与している。本稿では,SSTML(Streaming Super Tabular Machine Learning)を提案し,困難なデータストリーム分類タスクにおけるMDEの可能性を初めて探る。 SSTMLは、連続するデータチャンクをSTMLアルゴリズムを用いて画像表現にエンコードし、ResNet-18の学習を1エポックだけ実行する。合成データストリームと実データストリームで実施された実験は、SSTMLが、同等の処理時間を維持しながら、最先端のアルゴリズムよりも統計的に有意に優れた分類品質を達成できることを実証した。 Rapid technological advances are inherently linked to the increased amount of data, a substantial portion of which can be interpreted as data stream, capable of exhibiting the phenomenon of concept drift and having a high imbalance ratio. Consequently, developing new approaches to classifying difficult data streams is a rapidly growing research area. At the same time, the proliferation of deep learning and transfer learning, as well as the success of convolutional neural networks in computer vision tasks, have contributed to the emergence of a new research trend, namely Multi-Dimensional Encoding (MDE), focusing on transforming tabular data into a homogeneous form of a discrete digital signal. This paper proposes Streaming Super Tabular Machine Learning (SSTML), thereby exploring for the first time the potential of MDE in the difficult data stream classification task. SSTML encodes consecutive data chunks into an image representation using the STML algorithm and then performs a single ResNet-18 training epoch. Experiments conducted on synthetic and real data streams have demonstrated the ability of SSTML to achieve classification quality statistically significantly superior to state-of-the-art algorithms while maintaining comparable processing time.	翻訳日:2024-04-26 19:20:39 公開日:2024-04-24
# IOHprofilerを用いた動的二項値問題の実証解析 Empirical Analysis of the Dynamic Binary Value Problem with IOHprofiler ( http://arxiv.org/abs/2404.15837v1 ) ライセンス: Link先を確認	Diederick Vermetten, Johannes Lengler, Dimitri Rusin, Thomas Bäck, Carola Doerr,	(参考訳) 動的環境における最適化問題は、近年、いくつかの理論的研究の源泉となっている。これらの問題の1つは単調な動的二項値問題 (Dynamic Binary Value) であり、理論的には異なる遺伝的アルゴリズム (GA) の間で高い判別能力を持つ。この理論的な基礎を踏まえ、この問題のいくつかのバージョンをIOHprofilerベンチマークフレームワークに統合する。この統合を用いて、中程度の次元の問題に関する理論結果を再現するとともに、まだ理論的に研究されていないGAの性能の側面を調べる、複数の大規模ベンチマーク実験を行う。本結果は, 理論とベンチマークの間の多くの相乗効果の一部を浮き彫りにし, 動的最適化問題のさらなる研究を行うためのプラットフォームを提供する。 Optimization problems in dynamic environments have recently been the source of several theoretical studies. One of these problems is the monotonic Dynamic Binary Value problem, which theoretically has high discriminatory power between different Genetic Algorithms. Given this theoretical foundation, we integrate several versions of this problem into the IOHprofiler benchmarking framework. Using this integration, we perform several large-scale benchmarking experiments to both recreate theoretical results on moderate dimensional problems and investigate aspects of GA's performance which have not yet been studied theoretically. Our results highlight some of the many synergies between theory and benchmarking and offer a platform through which further research into dynamic optimization problems can be performed.	翻訳日:2024-04-26 19:20:39 公開日:2024-04-24
# シークエントによる記述論理に対する構成的補間と概念に基づくベス定義可能性 Constructive Interpolation and Concept-Based Beth Definability for Description Logics via Sequents ( http://arxiv.org/abs/2404.15840v1 ) ライセンス: Link先を確認	Tim S. Lyon, Jonas Karge,	(参考訳) 本稿では,多数の記述論理(DL)に適用可能な構成的手法を導入し,シークエント体系に基づいて概念に基づくBeth定義可能性(CBP)を確立する。高い表現力を持つDL RIQをケーススタディとして、RIQオントロジーのための新しいシークエント計算を導入し、暗黙的に定義可能な概念の明示的な定義の抽出を可能にする、ある種の補間子をシークエント計算の証明からどのように計算できるかを示す。我々の知る限りでは、これはDLの文脈で補間子と定義を計算する最初のシークエントに基づくアプローチであり、RIQがCBPを満たすことの最初の証明でもある。さらに, シークエント体系のモジュール性により, 我々の結果はRIQの任意の制限に対しても成り立ち, 適切な修正により他のDLにも適用可能である。 We introduce a constructive method applicable to a large number of description logics (DLs) for establishing the concept-based Beth definability property (CBP) based on sequent systems. Using the highly expressive DL RIQ as a case study, we introduce novel sequent calculi for RIQ-ontologies and show how certain interpolants can be computed from sequent calculus proofs, which permit the extraction of explicit definitions of implicitly definable concepts. To the best of our knowledge, this is the first sequent-based approach to computing interpolants and definitions within the context of DLs, as well as the first proof that RIQ enjoys the CBP. Moreover, due to the modularity of our sequent systems, our results hold for any restriction of RIQ, and are applicable to other DLs by suitable modifications.	翻訳日:2024-04-26 19:10:55 公開日:2024-04-24
# エッセイ採点とフィードバック生成を同時に行うためのLLMプロンプト戦略の探索 Exploring LLM Prompting Strategies for Joint Essay Scoring and Feedback Generation ( http://arxiv.org/abs/2404.15845v1 ) ライセンス: Link先を確認	Maja Stahl, Leon Biermann, Andreas Nehring, Henning Wachsmuth,	(参考訳) 個別のフィードバックは、学生がエッセイを書くスキルを改善するのに役立つ。しかし、そのようなフィードバックを提供するために必要な手作業は、実際には個別化を制限する。自動生成されたエッセイフィードバックは、学生を自身のペース、都合、望ましい頻度で指導する代替手段となりうる。大規模言語モデル(LLM)は、一貫性があり文脈に即したテキストを生成する上で、強力な性能を示している。しかし、役に立つエッセイフィードバックを提供する能力は不明確である。本研究は,LLMに基づくゼロショットおよび少数ショットのエッセイフィードバック生成のためのプロンプト戦略をいくつか検討する。 Chain-of-Thoughtプロンプトに着想を得て、自動エッセイ採点(AES)が生成されるフィードバックの品質にどのように、どの程度寄与するかを調査する。プロンプトのみでLLMが達成できるAES性能と、生成したエッセイフィードバックの有用性の両方を評価した。その結果,AESとフィードバック生成を同時に扱うことで,AESの性能が向上することが示唆された。しかし,我々の手作業による評価では生成したエッセイフィードバックの品質の高さが確認された一方で,エッセイ採点が生成されるフィードバックに与える影響は結局のところ小さいままである。 Individual feedback can help students improve their essay writing skills. However, the manual effort required to provide such feedback limits individualization in practice. Automatically-generated essay feedback may serve as an alternative to guide students at their own pace, convenience, and desired frequency. Large language models (LLMs) have demonstrated strong performance in generating coherent and contextually relevant text. Yet, their ability to provide helpful essay feedback is unclear. This work explores several prompting strategies for LLM-based zero-shot and few-shot generation of essay feedback. Inspired by Chain-of-Thought prompting, we study how and to what extent automated essay scoring (AES) can benefit the quality of generated feedback. We evaluate both the AES performance that LLMs can achieve with prompting only and the helpfulness of the generated essay feedback. Our results suggest that tackling AES and feedback generation jointly improves AES performance. However, while our manual evaluation emphasizes the quality of the generated essay feedback, the impact of essay scoring on the generated feedback remains low ultimately.	翻訳日:2024-04-26 19:10:55 公開日:2024-04-24
# 複雑から単純へ:大規模言語モデルの能力を考慮した多制約複雑命令の強化 From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large Language Models ( http://arxiv.org/abs/2404.15846v1 ) ライセンス: Link先を確認	Qianyu He, Jie Zeng, Qianxi He, Jiaqing Liang, Yanghua Xiao,	(参考訳) 大規模言語モデル(LLM)では、複雑な命令(複雑な命令に従う)で命令に従うことが必須である。しかし、LLMが複数の制約を持つ複雑な命令に従う能力をいかに拡張するかは、まだ解明されていない。このギャップを埋めるために、私たちはまず、能力に追従する複雑な制約を強化するのに有効なトレーニングデータについて研究する。複数の制約を含む命令でLLMを訓練することで、複雑な命令、特に複雑性レベルが低い命令の理解が促進されることが判明した。この改善はドメイン外制約の合成にも応用できる。さらに,有効なトレーニングデータを取得する方法と活用方法についても提案する。最後に,本手法の有効性を,総合的な性能,訓練効率,一般化能力の4つの条件で検証するために,広範囲な実験を行った。 It is imperative for Large language models (LLMs) to follow instructions with elaborate requirements (i.e. Complex Instructions Following). Yet, it remains under-explored how to enhance the ability of LLMs to follow complex instructions with multiple constraints. To bridge the gap, we initially study what training data is effective in enhancing complex constraints following abilities. We found that training LLMs with instructions containing multiple constraints enhances their understanding of complex instructions, especially those with lower complexity levels. The improvement can even generalize to compositions of out-of-domain constraints. Additionally, we further propose methods addressing how to obtain and utilize the effective training data. Finally, we conduct extensive experiments to prove the effectiveness of our methods in terms of overall performance, training efficiency, and generalization abilities under four settings.	翻訳日:2024-04-26 19:10:55 公開日:2024-04-24
# 3D Freehand Ultrasound using Visual Inertial and Deep Inertial Odometry for Measurement of Patellar Tracking (特集:ユビキタス・バイオサイバネティックスとバイオサイバネティックス) 3D Freehand Ultrasound using Visual Inertial and Deep Inertial Odometry for Measuring Patellar Tracking ( http://arxiv.org/abs/2404.15847v1 ) ライセンス: Link先を確認	Russell Buchanan, S. Jack Tu, Marco Camurri, Stephen J. Mellon, Maurice Fallon,	(参考訳) 膝蓋骨関節症(PFJ)は4名中1名に影響を及ぼし、治療にもかかわらず20%が慢性膝関節痛を発症した。膝置換術後の粗悪な結果と痛みは、しばしばパテラーの奇形追跡と結びついている。従来のCTやMRIのような画像技術では、コストや金属のアーチファクトといった課題に直面しています。関節の動きをモニターする新しいシステムは、PFJのダイナミクスの理解を大幅に改善し、より良い患者のケアと結果を支援する。 2次元超音波とモーショントラッキングを組み合わせることで, セマンティックセグメンテーションと位置登録による関節の3次元再構築が可能である。しかし,スキャナの軌跡を推定するための高価な外部インフラの必要性は,ハンドヘルド超音波による3次元骨の再構築を臨床的に行う上での最大の限界である。携帯型超音波スキャナー追跡のためのモーションキャプチャーの代替として,視覚慣性オドメトリー (VIO) と深層学習に基づく慣性オンドメトリー法を提案した。これらの方法で生成された3次元再構成は、PFJの評価と、自由手超音波スキャンによるさらなる測定の可能性を実証している。その結果, 平均復元誤差は1.25mm, 平均1.21mmであった。 VIO法は、外部インフラを必要とする方法に匹敵する精度で、ワイヤレスハンドヘルド超音波スキャンから骨を3次元再構成するための最初のインフラストラクチャフリーな方法である。 Patellofemoral joint (PFJ) issues affect one in four people, with 20% experiencing chronic knee pain despite treatment. Poor outcomes and pain after knee replacement surgery are often linked to patellar mal-tracking. Traditional imaging methods like CT and MRI face challenges, including cost and metal artefacts, and there's currently no ideal way to observe joint motion without issues such as soft tissue artefacts or radiation exposure. A new system to monitor joint motion could significantly improve understanding of PFJ dynamics, aiding in better patient care and outcomes. Combining 2D ultrasound with motion tracking for 3D reconstruction of the joint using semantic segmentation and position registration can be a solution. However, the need for expensive external infrastructure to estimate the trajectories of the scanner remains the main limitation to implementing 3D bone reconstruction from handheld ultrasound scanning clinically. 
We propose Visual-Inertial Odometry (VIO) and deep learning-based inertial-only odometry as alternatives to motion capture for tracking a handheld ultrasound scanner. The 3D reconstructions generated by these methods demonstrate potential for assessing the PFJ and for further measurements from free-hand ultrasound scans. The results show that the VIO method performs as well as the motion capture method, with average reconstruction errors of 1.25 mm and 1.21 mm, respectively. The VIO method is the first infrastructure-free method for 3D reconstruction of bone from wireless handheld ultrasound scanning with an accuracy comparable to methods that require external infrastructure.	翻訳日:2024-04-26 19:10:55 公開日:2024-04-24
# LLMにおける概念抽象化の検出 Detecting Conceptual Abstraction in LLMs ( http://arxiv.org/abs/2404.15848v1 ) ライセンス: Link先を確認	Michaela Regneri, Alhassan Abdelhalim, Sören Laue,	(参考訳) 本稿では,大言語モデル (LLM) 内で名詞の抽象化を検出する新しい手法を提案する。分類学関係における名詞対の心理的動機付けから始めると、ハイパーネミーを示す表面パターンをインスタンス化し、BERTが生成する注意行列を解析する。結果を2つの反事実集合と比較し、名詞対の分布的類似性にのみ関連付けられない抽象機構においてハイパーネミーを検出できることを示す。我々の発見は、LLMにおける概念的抽象性の説明可能性への第一歩である。 We present a novel approach to detecting noun abstraction within a large language model (LLM). Starting from a psychologically motivated set of noun pairs in taxonomic relationships, we instantiate surface patterns indicating hypernymy and analyze the attention matrices produced by BERT. We compare the results to two sets of counterfactuals and show that we can detect hypernymy in the abstraction mechanism, which cannot solely be related to the distributional similarity of noun pairs. Our findings are a first step towards the explainability of conceptual abstraction in LLMs.	翻訳日:2024-04-26 19:10:55 公開日:2024-04-24
# 質問応答のためのモバイルデバイスへの大規模言語モデル移植 Porting Large Language Models to Mobile Devices for Question Answering ( http://arxiv.org/abs/2404.15851v1 ) ライセンス: Link先を確認	Hannes Fassold,	(参考訳) モバイルデバイスにLLM(Large Language Models)をデプロイすることで、デバイス上で自然言語処理のすべての機能が利用できるようになる。 LLMの重要なユースケースは質問応答であり、幅広いユーザクエリに対して正確でコンテキスト的に関連する回答を提供することができる。我々は、どのようにして最先端のLLMをモバイルデバイスに移植し、デバイス上でネイティブに動作させたかを説明する。 LLM推論には、柔軟で自己完結したC++フレームワークであるllama.cppフレームワークを使用します。我々は、30億のパラメータを持つOrca-Mini-3Bモデルの6ビット量子化バージョンを選択し、このモデルの正しいプロンプトフォーマットを提示した。実験結果から,LLM推論はGalaxy S21スマートフォン上で対話的な速度で動作し,政治や地理,歴史など,さまざまな分野の質問に対する高品質な回答が得られた。 Deploying Large Language Models (LLMs) on mobile devices makes all the capabilities of natural language processing available on the device. An important use case of LLMs is question answering, which can provide accurate and contextually relevant answers to a wide array of user queries. We describe how we managed to port state-of-the-art LLMs to mobile devices, enabling them to operate natively on the device. We employ the llama.cpp framework, a flexible and self-contained C++ framework for LLM inference. We selected a 6-bit quantized version of the Orca-Mini-3B model with 3 billion parameters and present the correct prompt format for this model. Experimental results show that LLM inference runs at interactive speed on a Galaxy S21 smartphone and that the model delivers high-quality answers to user queries related to questions from different subjects like politics, geography or history.	翻訳日:2024-04-26 19:10:55 公開日:2024-04-24
# QOPTLib: 組合せ最適化問題のための量子コンピューティング指向ベンチマーク QOPTLib: a Quantum Computing Oriented Benchmark for Combinatorial Optimization Problems ( http://arxiv.org/abs/2404.15852v1 ) ライセンス: Link先を確認	Eneko Osaba, Esther Villar-Rodriguez,	(参考訳) 本稿では,組合せ最適化のための量子コンピューティング指向ベンチマークを提案する。 QOPTLibと呼ばれるこのベンチマークは、トラベルセールスマン問題、車両ルーティング問題、一次元ビンパッケージ問題、最大カット問題という4つのよく知られた問題に均等に分散した40のインスタンスで構成されている。 QOPTLibのインスタンスのサイズは、計算可能なサイズだけでなく、良い結果を得る可能性のない最大長にも対応している。この点において、ハイブリッドアプローチも考慮されている点を強調しておくことが重要である。したがって、このベンチマークは、ユーザに汎用データセットを提供する最初の取り組みである。本稿では,量子アニールに基づく2つの解法を用いたQOPTLibの完全解法について紹介する。私たちの主な目的は、新しい量子ベースのアルゴリズムによって、他の研究者がこれらの結果に勝とうとする、予備的なベースラインを確立することです。 In this paper, we propose a quantum computing oriented benchmark for combinatorial optimization. This benchmark, coined as QOPTLib, is composed of 40 instances equally distributed over four well-known problems: Traveling Salesman Problem, Vehicle Routing Problem, one-dimensional Bin Packing Problem and the Maximum Cut Problem. The sizes of the instances in QOPTLib not only correspond to computationally addressable sizes, but also to the maximum length approachable with non-zero likelihood of getting a good result. In this regard, it is important to highlight that hybrid approaches are also taken into consideration. Thus, this benchmark constitutes the first effort to provide users a general-purpose dataset. Also in this paper, we introduce a first full solving of QOPTLib using two solvers based on quantum annealing. Our main intention with this is to establish a preliminary baseline, hoping to inspire other researchers to beat these outcomes with newly proposed quantum-based algorithms.	翻訳日:2024-04-26 19:10:55 公開日:2024-04-24
# CLAD:コントラスト学習による操作攻撃に対するロバストオーディオディープフェイク検出 CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning ( http://arxiv.org/abs/2404.15854v1 ) ライセンス: Link先を確認	Haolin Wu, Jing Chen, Ruiying Du, Cong Wu, Kun He, Xingcan Shang, Hao Ren, Guowen Xu,	(参考訳) オーディオディープフェイクの普及は、重大なセキュリティ上の脅威を引き起こし、堅牢な検出方法を必要とする。既存の検出システムは将来性を示すが、悪意のあるオーディオ操作に対する堅牢性はまだ未調査である。このギャップを埋めるために、我々は最も広く採用されているオーディオディープフェイク検出器の攻撃に対する感受性について、初めて包括的な研究を行った。驚くべきことに、ボリュームコントロールのような操作でさえ、人間の知覚に影響を与えることなく、検出を著しくバイパスすることができる。そこで我々はCLAD(Contrastive Learning-based Audio Deepfake Detector)を提案する。鍵となる考え方は、操作によってもたらされる変動を最小限に抑えるために、対照的な学習を取り入れることである。さらに,特徴空間内でより密集した実音声をクラスタリングすることで,検出精度の向上を目的とした長さ損失を組み込んだ。我々は,最も広く採用されているオーディオディープフェイク検出モデルと,様々な操作攻撃に対して提案したCLADを総合的に評価した。検出モデルは脆弱性を示し、FARはそれぞれ36.69%、31.23%、そして51.28%まで上昇した。 CLADはロバスト性を高め、ノイズ注入下でFARを0.81%まで減少させ、全てのテストでFARを1.63%以下に維持した。ソースコードとドキュメントはアーティファクトリポジトリ(https://github.com/CLAD23/CLAD)で公開しています。 The increasing prevalence of audio deepfakes poses significant security threats, necessitating robust detection methods. While existing detection systems exhibit promise, their robustness against malicious audio manipulations remains underexplored. To bridge the gap, we undertake the first comprehensive study of the susceptibility of the most widely adopted audio deepfake detectors to manipulation attacks. Surprisingly, even manipulations like volume control can significantly bypass detection without affecting human perception. To address this, we propose CLAD (Contrastive Learning-based Audio deepfake Detector) to enhance the robustness against manipulation attacks. The key idea is to incorporate contrastive learning to minimize the variations introduced by manipulations, therefore enhancing detection robustness. Additionally, we incorporate a length loss, aiming to improve the detection accuracy by clustering real audios more closely in the feature space. 
We comprehensively evaluated the most widely adopted audio deepfake detection models and our proposed CLAD against various manipulation attacks. The detection models exhibited vulnerabilities, with FAR rising to 36.69%, 31.23%, and 51.28% under volume control, fading, and noise injection, respectively. CLAD enhanced robustness, reducing the FAR to 0.81% under noise injection and consistently maintaining an FAR below 1.63% across all tests. Our source code and documentation are available in the artifact repository (https://github.com/CLAD23/CLAD).	翻訳日:2024-04-26 19:10:55 公開日:2024-04-24
# 接続: bIt-rate mOdulatioNを通してChannel NEtwork attaCkを変換する CONNECTION: COvert chaNnel NEtwork attaCk Through bIt-rate mOdulatioN ( http://arxiv.org/abs/2404.15858v1 ) ライセンス: Link先を確認	Simone Soderi, Rocco De Nicola,	(参考訳) カバーチャネルネットワークは、組織が敵の攻撃からネットワークを保護するために設置したセキュリティ対策を回避し、よく知られた方法である。本稿では,広帯域ネットワーク上で接続されたデバイス間の被覆チャネルを実装するためのビットレート変調に基づく新しい手法を提案する。この攻撃は、マシン(シークレット送信機)から機密情報を流出させ、ネットワークのセキュリティ対策や検知システムを避けながら秘密裏にシークレットレシーバーに転送するために利用することができる。本報告では,ネットワーク情報伝達における隠蔽チャネルネットワークとその潜在的なセキュリティリスクに着目して,この脅威を実現する方法について説明する。提案手法はビットレート変調を利用しており、高いビットレートは「1」、低いビットレートは「0」であり、秘密通信を可能にする。我々は、正当性のあるトラフィックや干渉、ビットレート容量、ビットエラー率の存在下での堅牢性など、隠蔽チャネルに関連する重要な指標を分析する。実験では、この攻撃の優れた性能を実証し、5bpsに優れた頑丈さと最大0.9239bps/Hzのチャネル容量を異なるノイズ源下で達成した。そこで,ビットレート変調はネットワークのセキュリティを効果的に侵害し,機密データを侵害することを示す。 Covert channel networks are a well-known method for circumventing the security measures organizations put in place to protect their networks from adversarial attacks. This paper introduces a novel method based on bit-rate modulation for implementing covert channels between devices connected over a wide area network. This attack can be exploited to exfiltrate sensitive information from a machine (i.e., covert sender) and stealthily transfer it to a covert receiver while evading network security measures and detection systems. We explain how to implement this threat, focusing specifically on covert channel networks and their potential security risks to network information transmission. The proposed method leverages bit-rate modulation, where a high bit rate represents a '1' and a low bit rate represents a '0', enabling covert communication. We analyze the key metrics associated with covert channels, including robustness in the presence of legitimate traffic and other interference, bit-rate capacity, and bit error rate. Experiments demonstrate the good performance of this attack, which achieved 5 bps with excellent robustness and a channel capacity of up to 0.9239 bps/Hz under different noise sources. 
Therefore, we show that bit-rate modulation effectively violates network security and compromises sensitive data.	翻訳日:2024-04-26 19:10:55 公開日:2024-04-24
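The bit-rate encoding described in the CONNECTION abstract above (a high rate for '1', a low rate for '0') can be illustrated with a purely numerical toy model. This is a minimal sketch, not the authors' implementation: the rate values, the per-slot noise figures, the midpoint threshold, and all function names are assumptions chosen for illustration, and no network traffic is generated. It also shows what a monitor could look for, since the receiver only needs per-slot rate measurements and a threshold.

```python
def encode(bits, high_rate=1000.0, low_rate=100.0):
    """Map each bit to a target bit rate for one time slot (1 -> high, 0 -> low)."""
    return [high_rate if b else low_rate for b in bits]


def decode(observed_rates, threshold):
    """Recover bits by thresholding the rate observed in each time slot."""
    return [1 if r > threshold else 0 for r in observed_rates]


def bit_error_rate(sent, received):
    """Fraction of positions where the received bit differs from the sent bit."""
    errors = sum(s != r for s, r in zip(sent, received))
    return errors / len(sent)


message = [1, 0, 1, 1, 0, 0, 1, 0]
rates = encode(message)

# Hypothetical per-slot disturbance standing in for legitimate traffic.
noise = [30.0, -20.0, 50.0, -40.0, 10.0, 60.0, -10.0, 25.0]
observed = [r + n for r, n in zip(rates, noise)]

# Assumed decision rule: midpoint between the two nominal rates.
recovered = decode(observed, threshold=(1000.0 + 100.0) / 2)
print(recovered, bit_error_rate(message, recovered))
```

In this toy setting the disturbance is much smaller than the gap between the two rates, so the message is recovered without errors; shrinking that gap or enlarging the noise is a simple way to explore the robustness and bit error rate trade-off the abstract mentions.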
# データ主体権執行のための安全・プライバシー保護認証 Secure and Privacy-Preserving Authentication for Data Subject Rights Enforcement ( http://arxiv.org/abs/2404.15859v1 ) ライセンス: Link先を確認	Malte Hansen, Andre Büttner,	(参考訳) GDPRを考慮して、データコントローラ(DC)は、データ主体(DS)が特定のデータ対象の権利を行使できるようにする必要がある。ここでの重要な要件は、DCがDSを確実に認証できることである。明確な技術的仕様がないため、ID文書のコピー要求やメールアドレスの検証など、様々な方法で実現されている。しかし、以前の研究では、これは様々なセキュリティやプライバシのリスクと関連付けられており、DSの特定は非自明な作業である可能性があることが示されている。本稿では、異なる認証方式をレビューし、属性ベースの認証情報とeIDを利用して、独立したIDプロバイダの助けを借りてDSの認証を可能にするアーキテクチャを提案する。私たちの仕事は、DCとDSの両方に利益をもたらす、DSの認証方法の標準化とプライバシー保護に寄与します。 In light of the GDPR, data controllers (DC) need to allow data subjects (DS) to exercise certain data subject rights. A key requirement here is that DCs can reliably authenticate a DS. Due to a lack of clear technical specifications, this has been realized in different ways, such as by requesting copies of ID documents or by email address verification. However, previous research has shown that this is associated with various security and privacy risks and that identifying DSs can be a non-trivial task. In this paper, we review different authentication schemes and propose an architecture that enables DCs to authenticate DSs with the help of independent Identity Providers in a secure and privacy-preserving manner by utilizing attribute-based credentials and eIDs. Our work contributes to a more standardized and privacy-preserving way of authenticating DSs, which will benefit both DCs and DSs.	翻訳日:2024-04-26 19:10:55 公開日:2024-04-24
# グラフ状態の真の多部非局所性はモデル依存的である The genuinely multipartite nonlocality of graph states is model-dependent ( http://arxiv.org/abs/2404.15861v1 ) ライセンス: Link先を確認	Xavier Coiteux-Roy, Owidiusz Makuta, Fionnuala Curran, Remigiusz Augusiak, Marc-Olivier Renou,	(参考訳) ベルの定理は、いくつかの量子状態相関は古典的でない資源でしか説明できないことを証明している。真に多部的な非局所性(GMNL)の概念は、後に、2つ以上の非古典的資源が非自明な方法で関係しているという事実を概念化するために導入された。本稿ではまず,GMNLの歴史的定義に固有の矛盾を思い出す。第二に、我々はその再定義の一つ、Local-Operations-and-Shared-Randomness GMNL (LOSR-GMNL) に目を向け、全ての母子グラフ状態(クラスター状態を含む)がこの2番目の性質を持つことを示した。最後に,局所操作通信GMNL(Local-Operations-and-Neighbour-Communication GMNL,LONC-GMNL)と呼ばれる,一部の当事者間の短距離通信が生じる可能性のある状況に適応した第3の代替定義を概念化する。クラスター状態は第3の性質を持たないが、GHZ状態はそうである。その技術的内容以外にも、実験的に生成した量子システムの非古典性を評価するために、真のマルチパーティリート非局所性、真のマルチパーティリート絡み込み、あるいは絡み込み深さといった概念を適用する前に、厳密な概念的な作業が必要であることを、我々の書簡は示している。実験的な研究の多くは、これらの概念の歴史的定義に基づいて証人をいまだに用いているが、これは二部類資源に基づくモデルを拒否したことに失敗したことに留意する。 Bell's theorem proves that some quantum state correlations can only be explained by bipartite non-classical resources. The notion of genuinely multipartite nonlocality (GMNL) was later introduced to conceptualize the fact that nonclassical resources involving more than two parties in a nontrivial way may be needed to account for some quantum correlations. In this letter, we first recall the contradictions inherent to the historical definition of GMNL. Second, we turn to one of its redefinitions, called Local-Operations-and-Shared-Randomness GMNL (LOSR-GMNL), proving that all caterpillar graph states (including cluster states) have this second property. Finally, we conceptualize a third, alternative definition, which we call Local-Operations-and-Neighbour-Communication GMNL (LONC-GMNL), that is adapted to situations in which short-range communication between some parties might occur. We show that cluster states do not have this third property, while GHZ states do. 
Beyond its technical content, our letter illustrates that rigorous conceptual work is needed before applying the concepts of genuinely multipartite nonlocality, genuine multipartite entanglement or entanglement depth to benchmark the nonclassicality of some experimentally-produced quantum system. We note that most experimental works still use witnesses based on the historical definitions of these notions, which fail to reject models based on bipartite resources.	翻訳日:2024-04-26 19:10:55 公開日:2024-04-24
# 切断された量子可観測量とその半古典的極限 Truncated quantum observables and their semiclassical limit ( http://arxiv.org/abs/2404.15863v1 ) ライセンス: Link先を確認	Fabio Deelan Cunden, Marilena Ligabò, Maria Caterina Susca,	(参考訳) 階数$N$の直交射影$\Pi_N$の値域に切断された量子可観測量$H$に対し、プランク定数が消える$\hbar\to0$かつ量子数が大きくなる$N\to\infty$($\hbar N$は固定)という半古典的極限において、位相空間上の対応するワイル記号を調べる。ある仮定の下で、ワイル記号が、位相空間の古典的許容領域上に切断された(したがって一般には不連続な)記号へ$L^2$収束することを証明する。一般定理の例として、調和振動子と1次元箱内の自由粒子に対して、切断された可観測量を解析する。後者の場合、古典的許容領域の境界付近における記号の微視的な各点極限も計算する。 For quantum observables $H$ truncated on the range of orthogonal projections $\Pi_N$ of rank $N$, we study the corresponding Weyl symbol in the phase space in the semiclassical limit of vanishing Planck constant $\hbar\to0$ and large quantum number $N\to\infty$, with $\hbar N$ fixed. Under certain assumptions, we prove the $L^2$-convergence of the Weyl symbols to a symbol truncated (hence, in general discontinuous) on the classically permitted region in phase space. As an illustration of the general theorems we analyse truncated observables for the harmonic oscillator and for a free particle in a one-dimensional box. In the latter case, we also compute the microscopic pointwise limit of the symbols near the boundary of the classically permitted region.	翻訳日:2024-04-26 19:10:55 公開日:2024-04-24
# LLM支援インテントベース5Gコアネットワーク管理とオーケストレーションのためのセマンティックルーティング Semantic Routing for Enhanced Performance of LLM-Assisted Intent-Based 5G Core Network Management and Orchestration ( http://arxiv.org/abs/2404.15869v1 ) ライセンス: Link先を確認	Dimitrios Michael Manias, Ali Chouman, Abdallah Shami,	(参考訳) 大規模言語モデル(LLM)は、人工知能(AI)アプリケーション、特に自然言語処理と生成AIの分野で急速に普及している。テキスト生成アプリケーションに限らず、これらのモデルにはプロンプトエンジニアリングを利用する機会があり、そのようなモデルの入力を適切に構造化して、モデルの目的を明確に表現することができる。この顕著な例は、ネットワーク操作と管理の自動化とメンテナンスのための新しいアプローチであるインテントベースのネットワーキングである。本稿では,LLM支援による5Gコアネットワークのインテントベース管理とオーケストレーションの性能を向上させるセマンティックルーティングを提示する。本研究は,エンド・ツー・エンドの意図抽出フレームワークを構築し,サンプルユーザ意図の多様なデータセットを提示するとともに,エンコーダと量子化がシステム全体の性能に与える影響を徹底的に分析する。その結果, セマンティックルータを用いることで, プロンプトアーキテクチャを用いるスタンドアロンのLLMに比べて, LLM配置の精度と効率が向上することがわかった。 Large language models (LLMs) are rapidly emerging in Artificial Intelligence (AI) applications, especially in the fields of natural language processing and generative AI. Not limited to text generation applications, these models inherently possess the opportunity to leverage prompt engineering, where the inputs of such models can be appropriately structured to articulate a model's purpose explicitly. A prominent example of this is intent-based networking, an emerging approach for automating and maintaining network operations and management. This paper presents semantic routing to achieve enhanced performance in LLM-assisted intent-based management and orchestration of 5G core networks. This work establishes an end-to-end intent extraction framework and presents a diverse dataset of sample user intents accompanied by a thorough analysis of the effects of encoders and quantization on overall system performance. The results show that using a semantic router improves the accuracy and efficiency of the LLM deployment compared to stand-alone LLMs with prompting architectures.	翻訳日:2024-04-26 19:10:55 公開日:2024-04-24
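The semantic routing idea in the abstract above can be sketched as nearest-example matching: each route has a few example utterances, and an incoming intent is sent to the route whose example is most similar. The sketch below is only an illustration under stated assumptions. It substitutes a bag-of-words count vector for a learned sentence encoder, and the route names, example utterances, and threshold are hypothetical rather than taken from the paper's dataset.

```python
import math
from collections import Counter

# Hypothetical routes with example utterances (not from the paper's dataset).
ROUTES = {
    "deploy": ["deploy a new network function", "launch an additional upf instance"],
    "monitor": ["show current network load", "report latency of the core network"],
}


def embed(text):
    """Toy stand-in for a sentence encoder: lowercase token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def route(query, threshold=0.3):
    """Return the best-matching route name, or None if nothing is similar enough."""
    q = embed(query)
    best_route, best_score = None, 0.0
    for name, utterances in ROUTES.items():
        for utterance in utterances:
            score = cosine(q, embed(utterance))
            if score > best_score:
                best_route, best_score = name, score
    return best_route if best_score >= threshold else None


print(route("deploy another UPF instance"))
print(route("report the network latency"))
print(route("hello world"))
```

Returning `None` below the threshold leaves room for a fallback, for example handing unmatched intents to a general-purpose LLM prompt; whether the paper's framework does this is not stated in the abstract.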
# 「混合型古典位相空間を持つキックトトップモデルにおける混合固有状態の割合のべき乗則減衰」への補遺 Addendum to "Power-law decay of the fraction of the mixed eigenstates in kicked top model with mixed-type classical phase space" ( http://arxiv.org/abs/2404.15874v1 ) ライセンス: Link先を確認	Hua Yan, Qian Wang, Marko Robnik,	(参考訳) 量子カオス研究のプロトタイプモデルであるキックトトップモデルにおいて、クリロフ部分空間法を用いてスピンコヒーレント状態を生成することにより、固有状態のフシミ関数を調べられるシステムサイズは、文献や我々の以前の研究 Phys. Rev. E 108, 054217 (2023) [arXiv:2308.04824] で報告されたものよりもはるかに大きくできる。完全にカオス的なキックトトップでは、平均Wehrlエントロピー局在化測度が円ユニタリアンサンブルの予測に近づくことが分かる。混合型の場合、フシミ関数と古典的コンパクト位相空間における正則領域およびカオス領域との重なりによって混合固有状態を同定する。数値的に、混合固有状態の割合は$j^{-\zeta}$としてスケールし、システムサイズ$j$の増加に伴うべき乗則減衰がほぼ2桁にわたって成り立つことを示す。これは、フシミ関数の一様半古典的凝縮の原理と、半古典的極限におけるベリー・ロブニク描像を裏付ける証拠を与える。 By using the Krylov subspace technique to generate the spin coherent states in the kicked top model, a prototype model for studying quantum chaos, the accessible system size for studying the Husimi functions of eigenstates can be much larger than that reported in the literature and our previous study Phys. Rev. E 108, 054217 (2023) [arXiv:2308.04824]. In the fully chaotic kicked top, we find that the mean Wehrl entropy localization measure approaches the prediction given by the Circular Unitary Ensemble. In the mixed-type case, we identify mixed eigenstates by the overlap of the Husimi function with regular and chaotic regions in classical compact phase space. Numerically, we show that the fraction of mixed eigenstates scales as $j^{-\zeta}$, a power-law decay as the system size $j$ increases, across nearly two orders of magnitude. This provides supporting evidence for the principle of uniform semiclassical condensation of Husimi functions and the Berry-Robnik picture in the semiclassical limit.	翻訳日:2024-04-26 19:10:55 公開日:2024-04-24
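A scaling of the form $j^{-\zeta}$, as reported in the abstract above, is commonly estimated by a straight-line fit in log-log space, since $\log f = \log c - \zeta \log j$. The following is a minimal sketch of that standard procedure, not the authors' analysis: the data are synthetic, and the exponent 0.5 and prefactor 2.0 are arbitrary values chosen only to check that the fit recovers them.

```python
import math


def fit_power_law(sizes, fractions):
    """Least-squares fit of log(f) = log(c) - zeta * log(j); returns (zeta, c)."""
    xs = [math.log(j) for j in sizes]
    ys = [math.log(f) for f in fractions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Synthetic system sizes and fractions following an exact power law
# (assumed values for illustration, not data from the paper).
sizes = [100, 200, 400, 800, 1600, 3200]
fractions = [2.0 * j ** -0.5 for j in sizes]

zeta, c = fit_power_law(sizes, fractions)
print(zeta, c)
```

With real, noisy data the fitted exponent would carry an uncertainty, and the range of $j$ over which the straight line holds (nearly two decades in the abstract) matters as much as the value of $\zeta$ itself.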
# 摂動マスキングに基づく効果的な教師なし制約テキスト生成 Effective Unsupervised Constrained Text Generation based on Perturbed Masking ( http://arxiv.org/abs/2404.15877v1 ) ライセンス: Link先を確認	Yingwen Fu, Wenjie Ou, Zhou Yu, Yue Lin,	(参考訳) 教師なし制約付きテキスト生成は、教師付きデータなしで与えられた制約セットの下でテキストを生成することを目的としている。現在の最先端の手法は、編集位置と動作を確率的にサンプリングし、不要な探索ステップを引き起こす可能性がある。本稿では,各ステップにおける最適な編集位置と動作を探索することで,効率を向上させるPMCTGを提案する。具体的には、PMCTGは摂動マスク技術を拡張して、編集する最も一貫性のないトークンを効果的に検索する。次に、4つのマルチアスペクトスコアリング機能を導入し、編集アクションを選択して検索の難しさをさらに軽減する。 PMCTGは教師付きデータを必要としないため、異なる生成タスクに適用することができる。 PMCTGは,教師なし環境下で,キーワード・文生成とパラフレーズ生成という2つの代表的なタスクにおいて,新たな最先端結果を実現する。 Unsupervised constrained text generation aims to generate text under a given set of constraints without any supervised data. Current state-of-the-art methods stochastically sample edit positions and actions, which may cause unnecessary search steps. In this paper, we propose PMCTG to improve effectiveness by searching for the best edit position and action in each step. Specifically, PMCTG extends perturbed masking technique to effectively search for the most incongruent token to edit. Then it introduces four multi-aspect scoring functions to select edit action to further reduce search difficulty. Since PMCTG does not require supervised data, it could be applied to different generation tasks. We show that under the unsupervised setting, PMCTG achieves new state-of-the-art results in two representative tasks, namely keywords-to-sentence generation and paraphrasing.	翻訳日:2024-04-26 19:10:55 公開日:2024-04-24
# 超伝導量子プロセッサ上での非定常流体流動のシミュレーション Simulating unsteady fluid flows on a superconducting quantum processor ( http://arxiv.org/abs/2404.15878v1 ) ライセンス: Link先を確認	Zhaoyuan Meng, Jiarun Zhong, Shibo Xu, Ke Wang, Jiachen Chen, Feitong Jin, Xuhao Zhu, Yu Gao, Yaozu Wu, Chuanyu Zhang, Ning Wang, Yiren Zou, Aosai Zhang, Zhengyi Cui, Fanhao Shen, Zehang Bao, Zitian Zhu, Ziqi Tan, Tingting Li, Pengfei Zhang, Shiying Xiong, Hekang Li, Qiujiang Guo, Zhen Wang, Chao Song, H. Wang, Yue Yang,	(参考訳) 近年の中間スケール量子プロセッサの進歩は、実用的な量子優位性の探索に多大な関心を惹き付けている。流体力学のシミュレーションは古典物理学において非常に難しい問題であるが実用上不可欠であり、量子有用性を示す有力な候補である。本稿では、超伝導量子プロセッサを用いて、量子符号化、進化、流れ状態の検出からなる非定常流れのディジタルシミュレーション実験を報告する。量子アルゴリズムは、シュレーディンガー方程式の流体力学的定式化を用いたハミルトニアンシミュレーションに基づいている。並列の1量子ビットゲートと2量子ビットゲートの忠実度の中央値がそれぞれ99.97%と99.67%である10量子ビットを用いて、2次元(2D)圧縮性発散流と2次元減衰渦のダイナミクスをシミュレートする。実験結果は, 平均密度と運動量分布の時間的変化をよく捉え, 適度な雑音を伴う空間流場を定性的に再現した。この研究は、現実的な応用のために乱流のようなより複雑な流れをシミュレートする量子コンピューティングの可能性を示す。 Recent advancements in intermediate-scale quantum processors have triggered tremendous interest in the exploration of practical quantum advantage. The simulation of fluid dynamics, a highly challenging problem in classical physics but vital for practical applications, emerges as a good candidate for showing quantum utility. Here, we report an experiment on the digital simulation of unsteady flows, which consists of quantum encoding, evolution, and detection of flow states, with a superconducting quantum processor. The quantum algorithm is based on the Hamiltonian simulation using the hydrodynamic formulation of the Schrödinger equation. With the median fidelities of 99.97% and 99.67% for parallel single- and two-qubit gates respectively, we simulate the dynamics of a two-dimensional (2D) compressible diverging flow and a 2D decaying vortex with ten qubits. The experimental results well capture the temporal evolution of averaged density and momentum profiles, and qualitatively reproduce spatial flow fields with moderate noise. 
This work demonstrates the potential of quantum computing in simulating more complex flows, such as turbulence, for practical applications.	翻訳日:2024-04-26 19:01:10 公開日:2024-04-24
# LiDARを用いた3次元物体検出における分布外検出の再検討 Revisiting Out-of-Distribution Detection in LiDAR-based 3D Object Detection ( http://arxiv.org/abs/2404.15879v1 ) ライセンス: Link先を確認	Michael Kösel, Marcel Schreiber, Michael Ulrich, Claudius Gläser, Klaus Dietmayer,	(参考訳) LiDARベースの3Dオブジェクト検出は、オブジェクトを正確に3Dでローカライズし分類する能力によって、自動走行の重要な部分となっている。しかし、オブジェクト検出器は、未知のフォアグラウンドオブジェクト、特に元のトレーニングデータに存在しないオブジェクトを扱う場合、重要な課題に直面している。これらのアウト・オブ・ディストリビューション(OOD)オブジェクトは誤分類を引き起こし、自動車両の安全性と信頼性に重大なリスクをもたらす。現在、LiDARを用いたOODオブジェクト検出は十分に研究されていない。我々は、OODオブジェクトの合成学習データを生成し、既知のオブジェクトカテゴリを摂動することで、この問題に対処する。我々の考えでは、これらの合成OODオブジェクトは、分布内(ID)オブジェクトと比較して、対象検出器の特徴マップで異なる応答を生成する。次に、事前訓練された固定オブジェクト検出器を用いて特徴を抽出し、単純な多層パーセプトロン(MLP)を訓練し、各検出をIDまたはOODとして分類する。さらに,ポイントクラウドを変更せずに既存のデータセットを使用できる新しい評価プロトコルを提案し,現実のシナリオをより確実に評価する。提案手法の有効性は,新たに提案したnuScenes OODベンチマークを用いて検証した。ソースコードはhttps://github.com/uulm-mrm/mmood3dで入手できる。 LiDAR-based 3D object detection has become an essential part of automated driving due to its ability to localize and classify objects precisely in 3D. However, object detectors face a critical challenge when dealing with unknown foreground objects, particularly those that were not present in their original training data. These out-of-distribution (OOD) objects can lead to misclassifications, posing a significant risk to the safety and reliability of automated vehicles. Currently, LiDAR-based OOD object detection has not been well studied. We address this problem by generating synthetic training data for OOD objects by perturbing known object categories. Our idea is that these synthetic OOD objects produce different responses in the feature map of an object detector compared to in-distribution (ID) objects. We then extract features using a pre-trained and fixed object detector and train a simple multilayer perceptron (MLP) to classify each detection as either ID or OOD. 
In addition, we propose a new evaluation protocol that allows the use of existing datasets without modifying the point cloud, ensuring a more authentic evaluation of real-world scenarios. The effectiveness of our method is validated through experiments on the newly proposed nuScenes OOD benchmark. The source code is available at https://github.com/uulm-mrm/mmood3d.	翻訳日:2024-04-26 19:01:10 公開日:2024-04-24
# 振動解析を用いた飛行前UAVロータ欠陥検出のための機械学習 Machine Learning for Pre/Post Flight UAV Rotor Defect Detection Using Vibration Analysis ( http://arxiv.org/abs/2404.15880v1 ) ライセンス: Link先を確認	Alexandre Gemayel, Dimitrios Michael Manias, Abdallah Shami,	(参考訳) 無人航空機(UAV)は将来のスマートシティにとって重要なインフラ要素となるだろう。効率的な運用のためには、UAVの信頼性は障害や故障の常時監視によって保証されなければならない。この目的のために,本論文では,信号処理と機械学習を用いて,包括的振動解析データの解析を行い,飛行前および飛行後におけるローターブレードの欠陥の有無を判定する。次元減少技術の助けを借りて、ランダムフォレストアルゴリズムは最高の性能を示し、欠陥のあるローターブレードを完璧に検出した。さらに、様々な特徴部分集合の影響を包括的に分析し、モデルの分類決定プロセスに影響を与える要因について考察する。 Unmanned Aerial Vehicles (UAVs) will be critical infrastructural components of future smart cities. In order to operate efficiently, UAV reliability must be ensured by constant monitoring for faults and failures. To this end, the work presented in this paper leverages signal processing and Machine Learning (ML) methods to analyze the data of a comprehensive vibrational analysis to determine the presence of rotor blade defects during pre and post-flight operation. With the help of dimensionality reduction techniques, the Random Forest algorithm exhibited the best performance and detected defective rotor blades perfectly. Additionally, a comprehensive analysis of the impact of various feature subsets is presented to gain insight into the factors affecting the model's classification decision process.	翻訳日:2024-04-26 19:01:10 公開日:2024-04-24
# 今盗み、後で攻撃する:ブラックボックス敵対的攻撃に対する物体検出のロバスト性の評価 Steal Now and Attack Later: Evaluating Robustness of Object Detection against Black-box Adversarial Attacks ( http://arxiv.org/abs/2404.15881v1 ) ライセンス: Link先を確認	Erh-Chung Chen, Pin-Yu Chen, I-Hsin Chung, Che-Rung Lee,	(参考訳) オブジェクト検出に対する遅延攻撃は、ターゲット画像に追加のゴーストオブジェクトを生成して推論時間を増大させることを目的とした、敵対的攻撃の一種である。しかしながら、ブラックボックスのシナリオでゴーストオブジェクトを生成することは、これらの資格のないオブジェクトに関する情報が不透明であるため、依然として課題である。本研究では, 「今盗み、後で復号する(steal now, decrypt later)」攻撃の概念を拡張することで, 敵対的サンプルにゴーストオブジェクトを生成できることを示す。これらの敵対的サンプルは、一度生成されると、AIサービスの潜在的な脆弱性を悪用するために使用でき、重大なセキュリティ上の懸念を引き起こす。実験結果から,提案した攻撃は,対象モデルに関する事前知識を必要とせずに,様々な一般的なモデルとGoogle Vision APIに対して攻撃を成功させることが示された。さらに、1回の攻撃の平均コストは1ドル未満であり、AIセキュリティに重大な脅威をもたらす。 Latency attacks against object detection represent a variant of adversarial attacks that aim to inflate the inference time by generating additional ghost objects in a target image. However, generating ghost objects in the black-box scenario remains a challenge since information about these unqualified objects remains opaque. In this study, we demonstrate the feasibility of generating ghost objects in adversarial examples by extending the concept of "steal now, decrypt later" attacks. These adversarial examples, once produced, can be employed to exploit potential vulnerabilities in the AI service, giving rise to significant security concerns. The experimental results demonstrate that the proposed attack achieves successful attacks across various commonly used models and Google Vision API without any prior knowledge about the target model. Additionally, the average cost of each attack is less than \$1, posing a significant threat to AI security.	翻訳日:2024-04-26 19:01:10 公開日:2024-04-24
# スプリット・インデックス行列積状態における非オンサイト対称性と量子テレポーテーション Non-onsite symmetries and quantum teleportation in split-index matrix product states ( http://arxiv.org/abs/2404.15883v1 ) ライセンス: Link先を確認	David T. Stephen,	(参考訳) 我々は、新しい物理的および計算的性質を持つスピン鎖のクラスを記述する。物理的側面において、スピン鎖は非オンサイト対称性によって定義される対称性で保護された位相位相の例を与える。これらの位相は弦順パラメータによって検出できるが、特に絡み合いスペクトルの縮退は示さない。計算側では、スピン鎖は、必要な古典的側処理が測定結果の非線形関数であるという新しい性質により、長距離にわたって決定論的に情報をテレポートするために使用できる新しい種類の状態を表す。また、測定に基づく量子計算の普遍的な資源として機能しうる状態の例を示し、絡み合いスペクトルの縮退を伴わずにそのような資源の最初の例を提供する。我々の分析における重要なツールは、スプリットインデックス行列積状態(SIMPS)と呼ばれる新しいテンソルネットワーク表現である。我々はSIMPSの基本形式を開発し、それらを行列積状態と比較し、特定の非オンサイト対称性や量子テレポーテーションを記述するための装置がいかに優れているかを示し、それらが制約されたスピン鎖を記述するのにどのように適しているかを議論する。 We describe a class of spin chains with new physical and computational properties. On the physical side, the spin chains give examples of symmetry-protected topological phases that are defined by non-onsite symmetries, i.e. symmetries that are not a tensor product of single-site operators. These phases can be detected by string-order parameters, but notably do not exhibit entanglement spectrum degeneracy. On the computational side, the spin chains represent a new class of states that can be used to deterministically teleport information across long distances, with the novel property that the necessary classical side processing is a non-linear function of the measurement outcomes. We also give examples of states that can serve as universal resources for measurement-based quantum computation, providing the first examples of such resources without entanglement spectrum degeneracy. The key tool in our analysis is a new kind of tensor network representation which we call split-index matrix product states (SIMPS). We develop the basic formalism of SIMPS, compare them to matrix product states, show how they are better equipped to describe certain kinds of non-onsite symmetries and quantum teleportation, and discuss how they are also well-suited to describing constrained spin chains.	
翻訳日:2024-04-26 19:01:10 公開日:2024-04-24
# 地域エネルギー市場におけるプライバシ保護請求(Long Version) Privacy-Preserving Billing for Local Energy Markets (Long Version) ( http://arxiv.org/abs/2404.15886v1 ) ライセンス: Link先を確認	Eman Alqahtani, Mustafa A. Mustafa,	(参考訳) 本稿では,地域エネルギー市場(PBP-LEMs)に対するプライバシ保護請求プロトコルを提案する。 PBP-LEMは、市場団体が参加者の請求書を、正当性を犠牲にすることなく、分散的かつプライバシー保護的な方法で共同で計算することを可能にする。また、内部共謀の可能性から生じる個人のプライバシーに対するリスクを軽減している。まず,ビルディングブロックとして機能する情報理論のセキュリティを実現する,新しい,効率的で,プライバシ保護の個別請求方式を提案する。 PBP-LEMは、マルチパーティ計算、Pedersenのコミットメント、内部製品機能暗号化といった他の手法とともに、データの機密性と正確性を保証するためにこの方式を利用している。さらに、我々は3つのアプローチを提案し、結果としてプライバシーとパフォーマンスのレベルが異なる。このプロトコルがセキュリティとプライバシの要件を満たしていることを証明し、実際のLEMへのデプロイを可能にする。また、本分析では、全体的な性能の変動も示し、適用されたアプローチに基づいてオーバーヘッドが集中している領域を特定する。 We propose a privacy-preserving billing protocol for local energy markets (PBP-LEMs) that takes into account market participants' energy volume deviations from their bids. PBP-LEMs enables a group of market entities to jointly compute participants' bills in a decentralized and privacy-preserving manner without sacrificing correctness. It also mitigates risks on individuals' privacy arising from any potential internal collusion. We first propose a novel, efficient, and privacy-preserving individual billing scheme, achieving information-theoretic security, which serves as a building block. PBP-LEMs utilizes this scheme, along with other techniques such as multiparty computation, Pedersen commitments and inner product functional encryption, to ensure data confidentiality and accuracy. Additionally, we present three approaches, resulting in different levels of privacy and performance. We prove that the protocol meets its security and privacy requirements and is feasible for deployment in real LEMs. Our analysis also shows variations in overall performance and identifies areas where overhead is concentrated based on the applied approach.	翻訳日:2024-04-26 19:01:10 公開日:2024-04-24
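The abstract above names Pedersen commitments as one of the building blocks of PBP-LEMs. The following is a minimal sketch of a Pedersen commitment and its additive homomorphism, not the paper's protocol. The group parameters `P`, `Q`, `G`, `H` and the function names are assumptions chosen for illustration, and the numbers are far too small to be secure.

```python
# Toy Pedersen commitment in the order-11 subgroup of the integers modulo 23.
# Every constant here is an illustrative assumption, not taken from the paper.
P = 23  # safe prime, P = 2 * Q + 1
Q = 11  # prime order of the subgroup of quadratic residues modulo P
G = 4   # generator of that subgroup (2^2 mod 23)
H = 9   # second generator (3^2 mod 23); in a real setup log_G(H) must be unknown


def commit(value: int, blinding: int) -> int:
    """Return the commitment G^value * H^blinding mod P."""
    return (pow(G, value % Q, P) * pow(H, blinding % Q, P)) % P


def verify(commitment: int, value: int, blinding: int) -> bool:
    """Check that (value, blinding) opens the given commitment."""
    return commitment == commit(value, blinding)


def add_commitments(c1: int, c2: int) -> int:
    """Multiplying two commitments yields a commitment to the sum of their values."""
    return (c1 * c2) % P


if __name__ == "__main__":
    # Two hypothetical per-period values committed separately, then combined.
    c_a = commit(3, 7)
    c_b = commit(5, 6)
    combined = add_commitments(c_a, c_b)
    print(verify(combined, 3 + 5, 7 + 6))
```

The homomorphic property is what makes such commitments useful for checking a total: a verifier can combine individual commitments and check the opening of the sum without learning the individual values.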
# Sketch2Human: 絡み合った幾何学と外観制御を備えた深部人材育成 Sketch2Human: Deep Human Generation with Disentangled Geometry and Appearance Control ( http://arxiv.org/abs/2404.15889v1 ) ライセンス: Link先を確認	Linzi Qu, Jiaxiang Shang, Hui Ye, Xiaoguang Han, Hongbo Fu,	(参考訳) 幾何学的および外観制御されたフルボディ画像生成は興味深いが難しい課題である。既存のソリューションは、無条件または粗い条件(例えば、ポーズ、テキスト)に依存しているため、明示的な幾何学と体と衣服の外観制御が欠如している。スケッチはそのような編集機能を提供し、様々なスケッチベースの顔生成および編集ソリューションで採用されている。しかしながら、スケッチベースの顔生成をフルボディ生成に直接適応させると、ポーズ、体型、衣服の形、テクスチャの複雑さと多様性のために、高忠実で多様な結果が得られないことが多い。最近の幾何学的に制御可能な拡散法は主に外見を生成するプロンプトに依存しており、入力が粗い場合の現実主義と結果の忠実さのバランスをとることは困難である。本研究はSketch2Humanについて述べる。Sketch2Humanは、セマンティックスケッチ(幾何学制御用)と参照イメージ(外観制御用)でガイドされる、フルボディの人体画像生成を制御可能な最初のシステムである。我々の解は、逆幾何と出現潜時符号を入力とするStyleGAN-Humanの潜時空間に基づいている。具体的には,StyleGAN-Humanの潜伏空間からサンプル化した大規模な合成データセットを用いて訓練されたスケッチエンコーダについて述べる。そこで我々は,StyleGAN-Humanにおける部分幾何学とテクスチャの絡み合った情報と,非絡み合ったデータセットが存在しないことを考慮し,ジオメトリー保存および外観変換によるトレーニングデータを作成し,非絡み合った幾何学と外観制御を実現するための新しいトレーニングスキームを設計する。本手法は合成データを用いて訓練されているが,手描きスケッチも扱える。定性的および定量的評価は,本手法の最先端手法に対する優れた性能を示すものである。 Geometry- and appearance-controlled full-body human image generation is an interesting but challenging task. Existing solutions are either unconditional or dependent on coarse conditions (e.g., pose, text), thus lacking explicit geometry and appearance control of body and garment. Sketching offers such editing ability and has been adopted in various sketch-based face generation and editing solutions. However, directly adapting sketch-based face generation to full-body generation often fails to produce high-fidelity and diverse results due to the high complexity and diversity in the pose, body shape, and garment shape and texture. Recent geometrically controllable diffusion-based methods mainly rely on prompts to generate appearance and it is hard to balance the realism and the faithfulness of their results to the sketch when the input is coarse. 
This work presents Sketch2Human, the first system for controllable full-body human image generation guided by a semantic sketch (for geometry control) and a reference image (for appearance control). Our solution is based on the latent space of StyleGAN-Human with inverted geometry and appearance latent codes as input. Specifically, we present a sketch encoder trained with a large synthetic dataset sampled from StyleGAN-Human's latent space and directly supervised by sketches rather than real images. Considering the entangled information of partial geometry and texture in StyleGAN-Human and the absence of disentangled datasets, we design a novel training scheme that creates geometry-preserved and appearance-transferred training data to tune a generator to achieve disentangled geometry and appearance control. Although our method is trained with synthetic data, it can handle hand-drawn sketches as well. Qualitative and quantitative evaluations demonstrate the superior performance of our method to state-of-the-art methods.	翻訳日:2024-04-26 19:01:10 公開日:2024-04-24
# 臨床QAにおける中規模言語モデルの可能性の評価 Assessing The Potential Of Mid-Sized Language Models For Clinical QA ( http://arxiv.org/abs/2404.15894v1 ) ライセンス: Link先を確認	Elliot Bolton, Betty Xiong, Vijaytha Muralidharan, Joel Schamroth, Vivek Muralidharan, Christopher D. Manning, Roxana Daneshjou,	(参考訳) GPT-4 や Med-PaLM のような大規模言語モデルは、臨床上のタスクにおいて顕著なパフォーマンスを示しているが、それらは計算へのアクセスを必要とし、クローズソースであり、デバイスにデプロイすることができない。 BioGPT-large、BioMedLM、LLaMA 2、Mistral 7Bのような中型モデルはこれらの欠点を回避しているが、臨床業務の能力は検討されている。臨床利用の可能性を評価し,どのモデルを使うべきかを研究者が決定するのを助けるために,臨床質問応答(QA)の2つのタスク,MedQAとコンシューマクエリ応答を比較した。 Mistral 7Bは、すべてのベンチマークで優勝し、バイオメディカルドメイン向けに訓練されたモデルよりも優れています。 Mistral 7B の MedQA スコアは 63.0% で、オリジナルの Med-PaLM に近づき、コンシューマー向けヘルスクエリに対するもっともらしい応答を生成することができるが、改善の余地はまだ残っている。本研究は,臨床業務におけるオープンソース中規模モデルの初回評価を行う。 Large language models, such as GPT-4 and Med-PaLM, have shown impressive performance on clinical tasks; however, they require access to compute, are closed-source, and cannot be deployed on device. Mid-size models such as BioGPT-large, BioMedLM, LLaMA 2, and Mistral 7B avoid these drawbacks, but their capacity for clinical tasks has been understudied. To help assess their potential for clinical use and help researchers decide which model they should use, we compare their performance on two clinical question-answering (QA) tasks: MedQA and consumer query answering. We find that Mistral 7B is the best performing model, winning on all benchmarks and outperforming models trained specifically for the biomedical domain. While Mistral 7B's MedQA score of 63.0% approaches the original Med-PaLM, and it often can produce plausible responses to consumer health queries, room for improvement still exists. This study provides the first head-to-head assessment of open source mid-sized models on clinical tasks.	翻訳日:2024-04-26 19:01:10 公開日:2024-04-24
# 暗号通貨規制のグローバルトレンド:概観 Global Trends in Cryptocurrency Regulation: An Overview ( http://arxiv.org/abs/2404.15895v1 ) ライセンス: Link先を確認	Xihan Xiong, Junliang Luo,	(参考訳) 暗号通貨は重要な資産クラスへと発展し、様々な利点を提供している。しかし、市場ボラティリティや違法行為における誤用の可能性など、重大なリスクも生じている。これらのリスクは、消費者の保護、市場の整合性、金融安定を確保するための包括的な規制枠組みの緊急の必要性を浮き彫りにしている。しかし、暗号通貨規制の世界的な状況は依然として複雑であり、各国の規制枠組みが大幅に変化していることが特徴である。本研究の目的は,様々な管轄区域の規制環境を調査することで,これらの違いを解明することである。まず、規制の課題と考察を議論し、その後、国際規制のスタンス、アプローチ、措置の比較分析を行う。我々の研究は、暗号通貨規制におけるグローバルなトレンドの理解を高めるための実践的な洞察を提供してくれることを願っている。 Cryptocurrencies have evolved into an important asset class, providing a variety of benefits. However, they also present significant risks, such as market volatility and the potential for misuse in illegal activities. These risks underline the urgent need for a comprehensive regulatory framework to ensure consumer protection, market integrity, and financial stability. Yet, the global landscape of cryptocurrency regulation remains complex, marked by substantial variations in regulatory frameworks among different countries. This paper aims to study these differences by investigating the regulatory landscapes across various jurisdictions. We first discuss regulatory challenges and considerations, and then conduct a comparative analysis of international regulatory stances, approaches, and measures. We hope our study offers practical insights to enhance the understanding of global trends in cryptocurrency regulation.	翻訳日:2024-04-26 19:01:10 公開日:2024-04-24
# パラメトリック近似を超えた駆動散逸ダウンコンバージョンシステムにおける量子メロロジー Quantum metrology in a driven-dissipation down-conversion system beyond the parametric approximation ( http://arxiv.org/abs/2404.15898v1 ) ライセンス: Link先を確認	Dong Xie, Chunling Xu,	(参考訳) ポンプモードと2つの縮退信号モードからなる縮退型ダウンコンバージョンシステムにおける量子メロロジーについて検討する。従来のパラメトリック近似では、ポンプモードは量子演算子ではなく定数であると仮定される。パラメトリック近似を超える2つの退化信号モードとポンプモードの結合強度の測定精度を得る。散逸がなければ、初期状態が古典状態と量子状態の直積であるときに超ハイゼンベルク極限が得られる。これは、準備が簡単でない絡み合いリソースの使用を必要としない。ポンプモードが単光子散逸に苦しむ場合、結合強度がコヒーレント駆動により0に近づくにつれて、結合強度の測定不確かさは0に近くなる。直接光子検出は最適な測定方法であることが証明された。この結果は、信号モードが2光子散逸に苦しむときに変わっていない。信号モードも単一モードの消散に苦しむ場合、結合強度に関する情報は定常状態で得られる。さらに、結合強度の測定の不確実性も0に近づき、通常の放射相と超放射相の臨界点としてノイズ温度に依存する。最後に、運転強度を測定するための正確な量子センサとして、駆動散逸ダウンコンバージョンシステムを用いることができることを示す。 We investigate quantum metrology in a degenerate down-conversion system composed of a pump mode and two degenerate signal modes. In the conventional parametric approximation, the pump mode is assumed to be constant, not a quantum operator. We obtain the measurement precision of the coupling strength between the pump mode and two degenerate signal modes beyond the parametric approximation. Without a dissipation, the super-Heisenberg limit can be obtained when the initial state is the direct product of classical state and quantum state. This does not require the use of entanglement resources which are not easy to prepare. When the pump mode suffers from a single-photon dissipation, the measurement uncertainty of the coupling strength is close to 0 as the coupling strength approaches 0 with a coherent driving. The direct photon detection is proved to be the optimal measurement. This result has not been changed when the signal modes suffer from the two-photon dissipation. When the signal modes also suffer from the single-mode dissipation, the information of the coupling strength can still be obtained in the steady state. 
In addition, the measurement uncertainty of the coupling strength can also be close to 0 and becomes independent of the noise temperature as the critical point between the normal and superradiant phases is approached. Finally, we show that a driven-dissipation down-conversion system can be used as a precise quantum sensor to measure the driving strength.	翻訳日:2024-04-26 19:01:10 公開日:2024-04-24
# ST-MambaSync: 降雨量予測のためのマンバ構造と時空間変圧器の対応 ST-MambaSync: The Confluence of Mamba Structure and Spatio-Temporal Transformers for Precipitous Traffic Prediction ( http://arxiv.org/abs/2404.15899v1 ) ライセンス: Link先を確認	Zhiqi Shao, Xusheng Yao, Ze Wang, Junbin Gao,	(参考訳) 計算効率と精度のバランスをとることは、特に時空間データセットのような高次元データを扱う場合、機械学習において最重要である。本研究はST-MambaSyncについて紹介する。ST-MambaSyncは、合理化された注意層と単純化された状態空間層を統合する革新的なフレームワークである。このモデルは時空間予測タスクにおける競合精度を実現する。我々は、注意機構とマンバ成分の関係を掘り下げ、マンバ関数が残留ネットワーク構造内の注意に類似していることを明らかにする。この比較分析により、状態空間モデルの効率が向上し、計算コストの削減による優れた性能を実現する能力が解明される。 Balancing accuracy with computational efficiency is paramount in machine learning, particularly when dealing with high-dimensional data, such as spatial-temporal datasets. This study introduces ST-MambaSync, an innovative framework that integrates a streamlined attention layer with a simplified state-space layer. The model achieves competitive accuracy in spatial-temporal prediction tasks. We delve into the relationship between attention mechanisms and the Mamba component, revealing that Mamba functions akin to attention within a residual network structure. This comparative analysis underpins the efficiency of state-space models, elucidating their capability to deliver superior performance at reduced computational costs.	翻訳日:2024-04-26 19:01:10 公開日:2024-04-24
# 線引き:古代エトルリアの鏡からアートを抽出する深層画 Drawing the Line: Deep Segmentation for Extracting Art from Ancient Etruscan Mirrors ( http://arxiv.org/abs/2404.15903v1 ) ライセンス: Link先を確認	Rafael Sterzinger, Simon Brenner, Robert Sablatnig,	(参考訳) エトルリアの鏡はエトルリアの芸術において重要なカテゴリーであり、それゆえ、古代についての洞察を得るために体系的な調査が行われている。彼らの分析の重要な側面は、裏面から手動で彫刻をトレースする労働集約的な作業である。さらに、このタスクは、これらのミラーが持続する損傷のために本質的に困難であり、プロセスに主観性を導入する。これらの課題に対処するためには,手元にある制限データの有効利用を必要とするディープセグメンテーションネットワークと連携して,測光ステレオスキャンによるプロセスの自動化を行う。我々は、パッチ単位の予測と様々なデータ拡張、および自己教師型学習を取り入れることで、これを実現する。ベースラインと比較して,擬似F-Measureの予測性能を約16%向上させる。ヒトのベースラインに対して完全なミラーの性能を評価する際に,人間のアノテータと定量的に類似した性能を示し,既存のバイナライゼーション法を著しく上回る性能を示した。提案手法では,アノテーションのプロセスの合理化,客観性の向上,作業負荷の削減を図り,これらの歴史的遺物や非伝統的文書の検証に貴重な貢献をする。 Etruscan mirrors constitute a significant category within Etruscan art and, therefore, undergo systematic examinations to obtain insights into ancient times. A crucial aspect of their analysis involves the labor-intensive task of manually tracing engravings from the backside. Additionally, this task is inherently challenging due to the damage these mirrors have sustained, introducing subjectivity into the process. We address these challenges by automating the process through photometric-stereo scanning in conjunction with deep segmentation networks which, however, requires effective usage of the limited data at hand. We accomplish this by incorporating predictions on a per-patch level, and various data augmentations, as well as exploring self-supervised learning. Compared to our baseline, we improve predictive performance w.r.t. the pseudo-F-Measure by around 16%. When assessing performance on complete mirrors against a human baseline, our approach yields quantitative similar performance to a human annotator and significantly outperforms existing binarization methods. 
With our proposed methodology, we streamline the annotation process, enhance its objectivity, and reduce overall workload, offering a valuable contribution to the examination of these historical artifacts and other non-traditional documents.	翻訳日:2024-04-26 19:01:10 公開日:2024-04-24
# 周波数可変フォック状態生成のための量子光のハイブリッド光源 A hybrid source of quantum light for generation of frequency tunable Fock states ( http://arxiv.org/abs/2404.15908v1 ) ライセンス: Link先を確認	Aleksa Krstić, Priyanshu Tiwari, Florian Höhe, Frank Setzpfandt, Ulf Peschel, Joachim Ankerhold, Sina Saravi,	(参考訳) 非線形共振器における量子光発生方式を2レベル系にハイブリダイドして提案する。理論的には、一連の制御されたポンプパルスに励起されると、ハイブリッド源は1-および2-光子状態のほぼオンデマンド生成や最大7光子を持つフォック状態の生成確率50%以上といった高い確率でフォック状態を生成することができる。さらに重要なことは、非線形キャビティとポンプの調整可能な性質により、固定された2レベルのシステムであっても任意の周波数でフォック状態を生成することができ、量子技術のあらゆる分野で根本的に新しい機会が生まれることである。 We propose a scheme for quantum-light generation in a nonlinear cavity hybridized with a 2-level system. We theoretically show that, when excited by a series of controlled pump pulses, the hybrid source can generate various Fock states with high probabilities, such as near-on-demand generation of 1- and 2-photon states, and above 50% probability for generation of Fock states with up to 7 photons. More importantly, the tailorable nature of the nonlinear cavity and its pumping allows for generating Fock states with arbitrary frequencies, even with a fixed 2-level system, creating fundamentally new opportunities in all areas of quantum technologies.	翻訳日:2024-04-26 19:01:10 公開日:2024-04-24
# 生成的事前学習による長手ビデオの事前学習 Learning Long-form Video Prior via Generative Pre-Training ( http://arxiv.org/abs/2404.15909v1 ) ライセンス: Link先を確認	Jinheng Xie, Jiajun Feng, Zhaoxu Tian, Kevin Qinghong Lin, Yawen Huang, Xi Xia, Nanxu Gong, Xu Zuo, Jiaqi Yang, Yefeng Zheng, Mike Zheng Shou,	(参考訳) 人、オブジェクト、そしてそれらの相互作用のような長いビデオにかかわる概念は、暗黙の事前に従うものとして見ることができる。それらは特に複雑で、包括的に学ぶための課題を提起し続けています。近年、生成事前学習(GPT)は、視覚的位置さえも、どんな種類のテキストコンテンツでもモデリングできる多彩な能力を示している。この方法は、長めのビデオの学習に役立ちますか? ピクセル空間を操作する代わりに、バウンディングボックスやキーポイントのような視覚的な場所をビデオのキー情報として使うのが効果的である。適切なデータが不足しているため、映画から \textbf{Storyboard20K} と呼ばれる新しいデータセットを作成し、代表として機能させる。シナプス、ショット・バイ・ショットのキーフレーム、一貫したID、バウンディングボックス、ボディキーポイントを含むフィルムセットと文字の細かいアノテーションが含まれる。このようにして、ロングフォームビデオはトークンのセットで表現することができ、生成前のトレーニングを通じて学習することができる。実験結果から,本手法は以前から長編ビデオの学習に有用であることが確認された。コードとデータは \url{https://github.com/showlab/Long-form-Video-Prior} でリリースされる。 Concepts involved in long-form videos such as people, objects, and their interactions, can be viewed as following an implicit prior. They are notably complex and continue to pose challenges to be comprehensively learned. In recent years, generative pre-training (GPT) has exhibited versatile capacities in modeling any kind of text content even visual locations. Can this manner work for learning long-form video prior? Instead of operating on pixel space, it is efficient to employ visual locations like bounding boxes and keypoints to represent key information in videos, which can be simply discretized and then tokenized for consumption by GPT. Due to the scarcity of suitable data, we create a new dataset called \textbf{Storyboard20K} from movies to serve as a representative. It includes synopses, shot-by-shot keyframes, and fine-grained annotations of film sets and characters with consistent IDs, bounding boxes, and whole body keypoints. In this way, long-form videos can be represented by a set of tokens and be learned via generative pre-training. 
Experimental results validate that our approach has great potential for learning a long-form video prior. Code and data will be released at https://github.com/showlab/Long-form-Video-Prior.	翻訳日:2024-04-26 18:51:25 公開日:2024-04-24
# スピン浴に囲まれた中心スピンの正確な力学を用いた強結合非マルコフ量子熱力学と正準ハミルトニアンの役割 Strong coupling non-Markovian quantum thermodynamics using exact dynamics of a central spin surrounded by a spin bath and the role of a canonical Hamiltonian ( http://arxiv.org/abs/2404.15915v1 ) ライセンス: Link先を確認	Devvrat Tiwari, Baibhab Bose, Subhashish Banerjee,	(参考訳) 焦点は強結合非マルコフ量子系の量子熱力学を理解することである。この目的のために、スピン浴に囲まれた中心スピンの非自明な非マルコフ模型を取り上げ、その正確な進化は任意の系-バス結合に導かれる。システムや浴槽の内部エネルギー、仕事、熱、エントロピー生成、エルゴトロピーといった基本的な量子力学量は、力学と元の系(バス)ハミルトン方程式を用いて計算される。作業の明示的な表現として、システムと浴槽内エネルギーのミスマッチが導出される。さらに、中心スピンを量子電池として想定するシナリオにおいて、チャージャーとして作用するスピン浴に関する興味深い観察を行う。上記の熱力学量の計算における標準ハミルトニアンの役割についても検討した。 The focus is on understanding the quantum thermodynamics of strongly coupled non-Markovian quantum systems. To this end, a non-trivial, non-Markovian model of a central spin surrounded by a spin bath is taken up, and its exact evolution is derived for arbitrary system-bath couplings. The fundamental quantum thermodynamic quantities, such as system and bath internal energies, work, heat, entropy production, and ergotropy, are calculated using the dynamics and original system (bath) Hamiltonian. An explicit expression for the work, a mismatch between the system and bath internal energies, is derived. Further, an interesting observation relevant to the spin bath acting as a charger is made in a scenario where the central spin is envisaged as a quantum battery. The role of a canonical Hamiltonian in calculating the above thermodynamic quantities, a recently developed technique, is also investigated.	翻訳日:2024-04-26 18:51:25 公開日:2024-04-24
# 畳み込みニューラルネットワーク, ResNet と Grad-CAM を用いた黄斑変性の知覚と局在 Perception and Localization of Macular Degeneration Applying Convolutional Neural Network, ResNet and Grad-CAM ( http://arxiv.org/abs/2404.15918v1 ) ライセンス: Link先を確認	Tahmim Hossain, Sagor Chandro Bakchy,	(参考訳) 黄斑変性症(Macular Degeneration)は、罹患した患者に視界のぼやけを引き起こす、よく知られた網膜疾患である。本研究は、健康な眼底と黄斑変性の眼底を分類し、眼底の病変部位を特定することを目的とする。2種類の眼底の分類には、CNNアーキテクチャと、ResNetアーキテクチャ(ResNet50、ResNet50v2、ResNet101、ResNet101v2、ResNet152、ResNet152v2)をバックボーンとするCNNを使用する。データは次の3通りに分割する。 (a)トレーニングセット90%、テストセット10% (b)トレーニングセット80%、テストセット20% (c)トレーニングセット50%、テストセット50%。トレーニング後、評価指標に基づいて最良のモデルを選択した。モデルの中では、ResNet50をバックボーンとするCNNが最も性能が高く、トレーニング90%・テスト10%の分割で98.7%のトレーニング精度を示した。このモデルを用いて、眼底の病変部位を把握するためにGrad-CAMによる可視化を行った。 Macular degeneration is a well-known retinal disease that causes blurred vision in affected patients. This research classifies healthy and macular-degeneration fundus images and localizes the affected region of the fundus. A CNN architecture and CNNs with a ResNet backbone (ResNet50, ResNet50v2, ResNet101, ResNet101v2, ResNet152, ResNet152v2) are used to classify the two types of fundus. The data are split in three ways: (a) 90% training and 10% testing, (b) 80% training and 20% testing, and (c) 50% training and 50% testing. After training, the best model is selected using the evaluation metrics. Among the models, the CNN with a ResNet50 backbone performs best, giving a training accuracy of 98.7% for the 90%/10% train/test split. With this model, we perform Grad-CAM visualization to identify the affected region of the fundus.	翻訳日:2024-04-26 18:51:25 公開日:2024-04-24
# フェデレーション学習のための要素重量集約法 An Element-Wise Weights Aggregation Method for Federated Learning ( http://arxiv.org/abs/2404.15919v1 ) ライセンス: Link先を確認	Yi Hu, Hanchi Ren, Chen Hu, Jingjing Deng, Xianghua Xie,	(参考訳) フェデレートラーニング(FL)は強力な機械学習(ML)パラダイムであり、分散クライアントが元のデバイスにデータを保存しながら共有グローバルモデルを共同で学習し、プライバシを保存することができる。 FLにおける中心的な課題は、異なる、潜在的にバランスの取れていないクライアントからの局所的なモデルウェイトを効果的に集約することである。既存のメソッドはしばしば各クライアントを無差別に扱い、ローカルモデル全体に対して単一の比率を適用する。しかし、それぞれの重量が特定の割合に割り当てられるのは経験的に有利である。本稿では,学習性能の最適化と収束速度の高速化を目的とした,新しい要素量集約法(EWWA-FL)を提案する。従来のFLアプローチとは異なり、EWWA-FLは各要素のレベルでグローバルモデルに局所的な重みを集約し、各クライアントが学習プロセスに要素的に貢献できるようにする。各クライアントのユニークなデータセット特性を考慮して、EWWA-FLはグローバルモデルのロバスト性を異なるデータセットに拡張するとともに、迅速な収束を実現している。この方法は様々な重み付け戦略を採用するのに十分な柔軟性がある。総合的な実験を通じて,EWWA-FLの高度な性能を実証し,様々なバックボーンとベンチマークの精度と収束速度の両面で有意な改善を示した。 Federated learning (FL) is a powerful Machine Learning (ML) paradigm that enables distributed clients to collaboratively learn a shared global model while keeping the data on the original device, thereby preserving privacy. A central challenge in FL is the effective aggregation of local model weights from disparate and potentially unbalanced participating clients. Existing methods often treat each client indiscriminately, applying a single proportion to the entire local model. However, it is empirically advantageous for each weight to be assigned a specific proportion. This paper introduces an innovative Element-Wise Weights Aggregation Method for Federated Learning (EWWA-FL) aimed at optimizing learning performance and accelerating convergence speed. Unlike traditional FL approaches, EWWA-FL aggregates local weights to the global model at the level of individual elements, thereby allowing each participating client to make element-wise contributions to the learning process. By taking into account the unique dataset characteristics of each client, EWWA-FL enhances the robustness of the global model to different datasets while also achieving rapid convergence. 
The method is flexible enough to employ various weighting strategies. Through comprehensive experiments, we demonstrate the advanced capabilities of EWWA-FL, showing significant improvements in both accuracy and convergence speed across a range of backbones and benchmarks.	翻訳日:2024-04-26 18:51:25 公開日:2024-04-24
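The contrast drawn in the abstract above, one proportion per client versus one proportion per individual weight, can be sketched as follows. This is only an illustration under stated assumptions. The abstract does not say how EWWA-FL derives its element-wise proportions, so here they are supplied by the caller, and the names `fedavg` and `elementwise_aggregate` are hypothetical.

```python
from typing import List

Vector = List[float]


def fedavg(client_weights: List[Vector], client_sizes: List[int]) -> Vector:
    """Baseline: a single scalar proportion per client (dataset-size weighted mean)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n / total for w, n in zip(client_weights, client_sizes))
        for i in range(dim)
    ]


def elementwise_aggregate(client_weights: List[Vector],
                          proportions: List[Vector]) -> Vector:
    """Element-wise variant: proportions[k][i] is client k's share for parameter i.

    Shares are normalised per parameter so that they sum to 1 across clients.
    How the shares are chosen is left open; that choice is an assumption here.
    """
    dim = len(client_weights[0])
    aggregated = []
    for i in range(dim):
        norm = sum(p[i] for p in proportions)
        if norm == 0:
            raise ValueError(f"proportions for parameter {i} sum to zero")
        weighted = sum(w[i] * p[i] for w, p in zip(client_weights, proportions))
        aggregated.append(weighted / norm)
    return aggregated


if __name__ == "__main__":
    weights = [[1.0, 2.0], [3.0, 6.0]]   # two clients, two parameters each
    shares = [[1.0, 3.0], [1.0, 1.0]]    # per-parameter shares per client
    print(fedavg(weights, [1, 3]))
    print(elementwise_aggregate(weights, shares))
```

With identical shares for every parameter, the element-wise routine reduces to an ordinary weighted average, so the per-client scheme is a special case of it.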
# ショートカットにおける速度とコストの最適トレードオフの単一原子検証 Single-Atom Verification of the Optimal Trade-Off Between Speed and Cost in Shortcuts to Adiabaticity ( http://arxiv.org/abs/2404.15922v1 ) ライセンス: Link先を確認	J. -W. Zhang, J. -T. Bu, J. C. Li, Weiquan Meng, W. -Q. Ding, B. Wang, W. -F. Yuan, H. -J. Du, G. -Y. Ding, W. -J. Chen, L. Chen, F. Zhou, Zhenyu Xu, M. Feng,	(参考訳) 断熱へのショートカットのアプローチは、量子情報処理における断熱力学の効果的な実行を可能にする。動的速度と過渡駆動フィールドに関連するコストとの本質的にのトレードオフのため、任意に高速な演算を実行することは現実的ではない。このプロセスにおける速度とエネルギーコストの正確な相互作用を理解するため、理論と実験的に新しいトレードオフを提案し、これは、$s$-パラメータ化された位相空間内で厳密に最適化された境界によって特徴づけられる。我々の実験は、単一超低温の$^{40}$Ca$^{+}$イオンを調和ポテンシャルに閉じ込めて実施する。イオンの量子状態を正確に操作することにより、Landau-Zenerモデル(英語版)を例として実行し、量子速度制限とコストはスペクトルギャップによって制御される。私たちは、提案されたトレードオフが、当初純粋な状態と初期混合状態の両方を含むシナリオにおいて、確かに密接であるのを目撃します。我々の研究は、断熱性に対するショートカットの基本的な制約を理解するのに役立ち、伝統的に見落とされた未利用位相空間の可能性を照らし出す。 The approach of shortcuts to adiabaticity enables the effective execution of adiabatic dynamics in quantum information processing with enhanced speed. Owing to the inherent trade-off between dynamical speed and the cost associated with the transitionless driving field, executing arbitrarily fast operations becomes impractical. To understand the accurate interplay between speed and energetic cost in this process, we propose theoretically and verify experimentally a new trade-off, which is characterized by a tightly optimized bound within $s$-parameterized phase spaces. Our experiment is carried out in a single ultracold $^{40}$Ca$^{+}$ ion trapped in a harmonic potential. By exactly operating the quantum states of the ion, we execute the Landau-Zener model as an example, where the quantum speed limit as well as the cost are governed by the spectral gap. We witness that our proposed trade-off is indeed tight in scenarios involving both initially pure and initially mixed states. 
Our work helps in understanding the fundamental constraints in shortcuts to adiabaticity and illuminates the potential of under-utilized phase spaces that have traditionally been overlooked.	翻訳日:2024-04-26 18:51:25 公開日:2024-04-24
# KGValidator:知識グラフ構築の自動検証フレームワーク KGValidator: A Framework for Automatic Validation of Knowledge Graph Construction ( http://arxiv.org/abs/2404.15923v1 ) ライセンス: Link先を確認	Jack Boylan, Shashank Mangla, Dominic Thorn, Demian Gholipour Ghalandari, Parsa Ghaffari, Chris Hokamp,	(参考訳) 本研究では,知識グラフ(KG)補完モデルの自動評価にLarge Language Models (LLMs) を用いることを検討した。歴史的に、KGsで情報を検証することは難しい課題であり、大規模な人間のアノテーションを禁止コストで要求してきた。汎用的な生成AIとLLMの出現により、人間のループ検証が生成エージェントに置き換えられる可能性が高まった。生成モデルを用いて知識グラフを検証する場合に,一貫性と検証のためのフレームワークを導入する。我々のフレームワークは、最近のLLM出力の構造的・意味的検証のためのオープンソース開発と、あらゆる種類の外部知識ソースを参照する能力によって支援される事実確認と検証への柔軟なアプローチに基づいている。この設計は適応と拡張が容易であり、モデル固有の知識、ユーザが提供するコンテキスト、外部知識の検索が可能なエージェントを組み合わせることで、どんなグラフ構造化データでも検証することができる。 This study explores the use of Large Language Models (LLMs) for automatic evaluation of knowledge graph (KG) completion models. Historically, validating information in KGs has been a challenging task, requiring large-scale human annotation at prohibitive cost. With the emergence of general-purpose generative AI and LLMs, it is now plausible that human-in-the-loop validation could be replaced by a generative agent. We introduce a framework for consistency and validation when using generative models to validate knowledge graphs. Our framework is based upon recent open-source developments for structural and semantic validation of LLM outputs, and upon flexible approaches to fact checking and verification, supported by the capacity to reference external knowledge sources of any kind. The design is easy to adapt and extend, and can be used to verify any kind of graph-structured data through a combination of model-intrinsic knowledge, user-supplied context, and agents capable of external knowledge retrieval.	翻訳日:2024-04-26 18:51:25 公開日:2024-04-24
# エコー室内:Twitterの誤報の言語的根拠 Inside the echo chamber: Linguistic underpinnings of misinformation on Twitter ( http://arxiv.org/abs/2404.15925v1 ) ライセンス: Link先を確認	Xinyu Wang, Jiayi Li, Sarah Rajtmajer,	(参考訳) ソーシャルメディア利用者は、誤った情報を含む投稿を共有したり、真面目な議論を伴う議論のある話題についてコメントしたりすることで、誤報の拡散をオンラインで推進している。エコーチャンバーの研究は、情報の拡散におけるホモフィリとバイアスによって促進される類似のピアとの繰り返しの相互作用を通じて、ユーザの視点が強化されることを示唆している。社会行動の社会的基盤と言語基盤に対する長年の関心に基づいて、この研究は、誤情報に関する会話が言語利用を通してどのように介在しているかを探求する。会話やユーザコミュニティの話題の中で,言語的尺度,例えば,グループ内/グループ内キュー,可読性,談話接続性などを比較した。誤報の議論において,グループ識別信号の存在が増加し,エコー室内での処理流速が増大することが判明した。本稿では、これらのトピックにわたる傾向の具体的特徴について論じ、文脈的影響について考察する。 Social media users drive the spread of misinformation online by sharing posts that include erroneous information or commenting on controversial topics with unsubstantiated arguments often in earnest. Work on echo chambers has suggested that users' perspectives are reinforced through repeated interactions with like-minded peers, promoted by homophily and bias in information diffusion. Building on long-standing interest in the social bases of language and linguistic underpinnings of social behavior, this work explores how conversations around misinformation are mediated through language use. We compare a number of linguistic measures, e.g., in-/out-group cues, readability, and discourse connectives, within and across topics of conversation and user communities. Our findings reveal increased presence of group identity signals and processing fluency within echo chambers during discussions of misinformation. We discuss the specific character of these broader trends across topics and examine contextual influences.	翻訳日:2024-04-26 18:51:25 公開日:2024-04-24
# ゼロショットクロスリンガル転送の一般化対策 Generalization Measures for Zero-Shot Cross-Lingual Transfer ( http://arxiv.org/abs/2404.15928v1 ) ライセンス: Link先を確認	Saksham Bassi, Duygu Ataman, Kyunghyun Cho,	(参考訳) モデルが未知の入力を異なる特徴で解釈する知識を一般化する能力は、堅牢で信頼性の高い機械学習システムを構築する上で不可欠である。言語モデル評価タスクには、モデル一般化に関する情報メトリクスが欠如しており、新しい設定での適用性は、多くの言語やタスクでしばしば欠落しているタスクと言語固有の下流のパフォーマンスを用いて測定される。本稿では,言語間ゼロショット設定における言語モデルの一般化能力に関する,より効率的な情報計算を支援するための,効率的かつ信頼性の高い尺度のセットについて検討する。学習後のパラメータのばらつきや初期化からの距離といった従来の尺度に加えて、言語間移動の成功を捉えた損失景観のシャープネスの効果も測定し、一般化に相関するモデル最適化のシャープネスを確実に計算する新しい安定アルゴリズムを提案する。 A model's capacity to generalize its knowledge to interpret unseen inputs with different characteristics is crucial to build robust and reliable machine learning systems. Language model evaluation tasks lack information metrics about model generalization and their applicability in a new setting is measured using task and language-specific downstream performance, which is often lacking in many languages and tasks. In this paper, we explore a set of efficient and reliable measures that could aid in computing more information related to the generalization capability of language models in cross-lingual zero-shot settings. In addition to traditional measures such as variance in parameters after training and distance from initialization, we also measure the effectiveness of sharpness in loss landscape in capturing the success in cross-lingual transfer and propose a novel and stable algorithm to reliably compute the sharpness of a model optimum that correlates to generalization.	翻訳日:2024-04-26 18:51:25 公開日:2024-04-24
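The abstract above relates the sharpness of the loss landscape to generalization. The sketch below shows one common, generic proxy for sharpness: the largest loss increase found among random perturbations of fixed norm around the trained parameters. It is not the algorithm proposed in the paper, which the abstract does not specify. The function name `sharpness`, the default radius, and the sampling scheme are all assumptions.

```python
import math
import random
from typing import Callable, List


def sharpness(loss: Callable[[List[float]], float], theta: List[float],
              radius: float = 0.05, samples: int = 100, seed: int = 0) -> float:
    """Largest observed loss increase on a sphere of the given radius around theta."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    base = loss(theta)
    worst = 0.0
    for _ in range(samples):
        # Draw a random direction and rescale it to length `radius`.
        direction = [rng.gauss(0.0, 1.0) for _ in theta]
        norm = math.sqrt(sum(d * d for d in direction))
        if norm == 0.0:
            continue
        perturbed = [t + radius * d / norm for t, d in zip(theta, direction)]
        worst = max(worst, loss(perturbed) - base)
    return worst


if __name__ == "__main__":
    # Two toy quadratic losses with the same minimum but different curvature.
    sharp_loss = lambda th: sum(10.0 * x * x for x in th)
    flat_loss = lambda th: sum(0.1 * x * x for x in th)
    print(sharpness(sharp_loss, [0.0, 0.0]) > sharpness(flat_loss, [0.0, 0.0]))
```

Random sampling only gives a lower bound on the worst-case increase within the radius; gradient-based searches give tighter estimates at higher cost.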
# エルゴディックから多体局在遷移の複雑度測定による診断 Complexity Measure Diagnostics of Ergodic to Many-Body Localization Transition ( http://arxiv.org/abs/2404.15940v1 ) ライセンス: Link先を確認	Khen Cohen, Yaron Oz, De-liang Zhong,	(参考訳) 三対角化ハミルトニアンのランツォス係数の確率分布関数によって定義される複雑性測定に基づいて,エルゴード相と多体局在相の遷移を新たに診断する。相関強度の関数としてこれらの複雑性尺度を用いて, エルゴードの多体遷移を診断するモーメントとエントロピーと, 初期条件の記憶に関する位相の特徴的な特徴を示す。 We introduce new diagnostics of the transition between the ergodic and many-body localization phases, which are based on complexity measures defined via the probability distribution function of the Lanczos coefficients of the tri-diagonalized Hamiltonian. We use these complexity measures to analyze the power-law random banded matrix model as a function of the correlation strength and show that the moments and the entropy of the distribution diagnose the ergodic to many-body transition, as well as the distinctive feature of the phases concerning the memory of the initial conditions.	翻訳日:2024-04-26 18:51:25 公開日:2024-04-24
# PT対称性相転移の兆候としてのSqueezed Displaced Schrödinger-cat状態 Squeezed Displaced Schrödinger-cat state as a signature of the PT-symmetry phase transition ( http://arxiv.org/abs/2404.15942v1 ) ライセンス: Link先を確認	Yuetao Chen, Shoukang Chang, Shaoyan Gao,	(参考訳) パリティ時間(PT)対称系は、例外点(EP)で縮退する非エルミート・ハミルトニアンによってダイナミクスが支配されるゲインロス系であり、様々なフォトニック系、電気系、機械系等で研究されている。しかし、電子の輸送特性が劇的に影響を受ける電子系において、PT対称性の相転移をどのように捉えるかは、まだ未解決の問題である。幸いなことに、光子と電子のハイブリッド化は、材料特性を制御するだけでなく探索するための新しい方法を提供する。本稿では、非エルミートSu-Schrieffer-Heeger(SSH)鎖に結合した空洞を平均場アンザッツの範囲で検討する。PT対称性が破れると、空洞基底状態にSqueezed Displaced Schrödinger cat(SDSc)状態が高い忠実度で現れ、PT対称性が回復すると忠実度はほぼ1から0へ急激に低下することが分かった。さらに、半古典的極限において、半古典的光子ハミルトニアン $H_{\rm eff}(x, p)$ には $x=0$ の両側に局所的な極値が存在し、これは空洞基底状態におけるSDSc状態の出現の明確な兆候である。したがって、SDSc状態の出現は、キャビティモードでは修正できないPT対称性の相転移を捉えるのに利用できる。さらに、空洞基底状態を利用して光干渉計の位相を推定し、量子フィッシャー情報と非古典性がEPで急激に低下することを示す。これは、電子材料におけるPT対称性の破れが、位相推定における量子フィッシャー情報と非古典性によっても捉えられることを示している。 Parity-time (PT) symmetric systems are gain-loss systems whose dynamics are governed by non-Hermitian Hamiltonians with degeneracies at exceptional points (EPs) and have been studied in various photonic, electrical, and mechanical systems. However, it is still an open question how to capture the PT-symmetry phase transition in electronic systems, where the transport properties of electrons are dramatically affected. Fortunately, the hybridization between photons and electrons offers a novel way not only to control but also to probe material properties. Here, we investigate a cavity coupled to a non-Hermitian Su-Schrieffer-Heeger (SSH) chain within a mean-field ansatz. We find that a Squeezed Displaced Schrödinger cat (SDSc) state emerges with high fidelity in the cavity ground state when PT symmetry is broken, and the fidelity drops sharply from almost 1 to 0 as PT symmetry recovers. 
Additionally, in the semiclassical limit, we find that there exist local extrema on both sides of $x=0$ in the semiclassical photon Hamiltonian $H_{\rm eff}(x, p)$, a clear signature of the emergence of the SDSc state in the cavity ground state. Thus, the appearance of the SDSc state can be used to capture the PT-symmetry phase transition, which cannot be modified by the cavity mode. Furthermore, we exploit the cavity ground state to estimate the phase in an optical interferometer and show that the quantum Fisher information and nonclassicality decline sharply at EPs. This reveals that PT-symmetry breaking in electronic materials can also be captured by the quantum Fisher information and nonclassicality in phase estimation.	翻訳日:2024-04-26 18:51:25 公開日:2024-04-24
# Mammo-CLIP:マルチビューマンモグラフィーによる乳がん診断におけるコントラスト言語画像前訓練(CLIP)の活用 Mammo-CLIP: Leveraging Contrastive Language-Image Pre-training (CLIP) for Enhanced Breast Cancer Diagnosis with Multi-view Mammography ( http://arxiv.org/abs/2404.15946v1 ) ライセンス: Link先を確認	Xuxin Chen, Yuheng Li, Mingzhe Hu, Ella Salari, Xiaoqian Chen, Richard L. J. Qiu, Bin Zheng, Xiaofeng Yang,	(参考訳) 乳がん検診の精度を高めるために,マンモグラムの多視点からの情報の融合が重要な役割を担っているが,多視点マンモグラムを用いたコンピュータ支援診断(CAD)手法の開発は依然として課題に直面しており,臨床にはそのようなCAD方式は使われていない。この課題を克服するため, CLIP(Contrastive Language- Image Pre-Training)に基づく新たなアプローチについて検討した。マルチビュー機能融合のためのシングルビューCLIPを効果的に適用し、(2)限られたサンプルと計算資源でこのパラメータ密度モデルを効率的に微調整することで、マルチビューマンモグラムと対応する単純なテキストを処理する最初のマルチモーダルフレームワークであるMammo-CLIPを導入する。 Mammo-CLIPは、早期の機能融合戦略を用いて、左右乳房のCCおよびMLOビューから取得した4つのマンモグラムのマルチビュー関係を学習する。学習効率を向上させるため、CLIPイメージとテキストエンコーダにプラグアンドプレイアダプタを追加して、微調整パラメータを指定し、更新を約1%に制限する。フレームワークの評価には、2つのデータセットを振り返りに組み立てました。悪性470例と良性479例からなる最初のデータセットは、5倍のクロスバリデーションにより提案したMammo-CLIPの微調整および内部評価に使用された。悪性60例,良性294例を含む第2のデータセットを用いて,マンモCLIPの一般化性を検討した。その結果,Mammo-CLIPはAUC (0.841 vs. 0.817, 0.837 vs. 0.807) の最先端のクロスビュー・トランスフォーマーよりも優れていた。また、以前の2つのCLIPベースの手法を20.3%、14.3%上回る。本研究は、乳がんの次世代画像テキストベースのCADスキーム開発に、微調整された視覚言語モデルを適用する可能性を強調した。 Although fusion of information from multiple views of mammograms plays an important role to increase accuracy of breast cancer detection, developing multi-view mammograms-based computer-aided diagnosis (CAD) schemes still faces challenges and no such CAD schemes have been used in clinical practice. To overcome the challenges, we investigate a new approach based on Contrastive Language-Image Pre-training (CLIP), which has sparked interest across various medical imaging tasks. 
By solving the challenges in (1) effectively adapting the single-view CLIP for multi-view feature fusion and (2) efficiently fine-tuning this parameter-dense model with limited samples and computational resources, we introduce Mammo-CLIP, the first multi-modal framework to process multi-view mammograms and corresponding simple texts. Mammo-CLIP uses an early feature fusion strategy to learn multi-view relationships in four mammograms acquired from the CC and MLO views of the left and right breasts. To enhance learning efficiency, plug-and-play adapters are added into the CLIP image and text encoders to fine-tune parameters while limiting updates to about 1% of the parameters. For framework evaluation, we assembled two datasets retrospectively. The first dataset, comprising 470 malignant and 479 benign cases, was used for few-shot fine-tuning and internal evaluation of the proposed Mammo-CLIP via 5-fold cross-validation. The second dataset, including 60 malignant and 294 benign cases, was used to test the generalizability of Mammo-CLIP. Study results show that Mammo-CLIP outperforms the state-of-the-art cross-view transformer in AUC (0.841 vs. 0.817, 0.837 vs. 0.807) on both datasets. It also surpasses two previous CLIP-based methods by 20.3% and 14.3%. This study highlights the potential of applying fine-tuned vision-language models for developing next-generation, image-text-based CAD schemes for breast cancer.	翻訳日:2024-04-26 18:51:25 公開日:2024-04-24
# シーケンスは何を破棄すべきかをひそかに教えてくれる Sequence can Secretly Tell You What to Discard ( http://arxiv.org/abs/2404.15949v1 ) ライセンス: Link先を確認	Jincheng Dai, Zhuowei Huang, Haiyun Jiang, Chen Chen, Deng Cai, Wei Bi, Shuming Shi,	(参考訳) 大規模言語モデル(LLM)は、幅広いタスクにおいて優れた性能を示す一方で、大きなGPUメモリを必要とし、かなりの計算資源を消費する。モデル重みに加えて、KVキャッシュが占有するメモリはシーケンス長とともに線形に増加し、推論の主要なボトルネックとなる。本稿では、メモリフットプリントを大幅に削減するKVキャッシュの新しい最適化手法を提案する。包括的な調査により、LLaMA2シリーズのモデルについて次のことが分かった。 (i)隣接するトークンのクエリベクトルの類似性は非常に高く、 (ii)現在のクエリの注意計算は、先行するクエリのごく一部の注意情報のみに依存できる。これらの観測に基づいて、モデルを微調整することなく、推論に重要なキーと値のペアを動的に保持するKVキャッシュ破棄(eviction)ポリシーであるCORMを提案する。 CORMは、LongBenchの6つのタスクで顕著な性能劣化を伴わずに、KVキャッシュの推論メモリ使用量を最大70%削減することを確認した。 Large Language Models (LLMs), despite their impressive performance on a wide range of tasks, require significant GPU memory and consume substantial computational resources. In addition to model weights, the memory occupied by the KV cache increases linearly with sequence length, becoming a main bottleneck for inference. In this paper, we introduce a novel approach for optimizing the KV cache which significantly reduces its memory footprint. Through a comprehensive investigation, we find that on LLaMA2 series models, (i) the similarity between adjacent tokens' query vectors is remarkably high, and (ii) the current query's attention calculation can rely solely on the attention information of a small portion of the preceding queries. Based on these observations, we propose CORM, a KV cache eviction policy that dynamically retains important key-value pairs for inference without finetuning the model. We validate that CORM reduces the inference memory usage of the KV cache by up to 70% without noticeable performance degradation across six tasks in LongBench.	翻訳日:2024-04-26 18:51:25 公開日:2024-04-24
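The eviction idea in the abstract above (retain only the key-value pairs that recent queries still attend to) can be illustrated with a small sketch. This is not the CORM algorithm itself, whose details are not given in the abstract. The function name `select_kv_to_keep`, the `window` and `threshold` parameters, and the rule "keep a position if any recent query gives it at least `threshold` attention" are all assumptions made for illustration.

```python
def select_kv_to_keep(attn_rows, window, threshold):
    """Return the sorted cache positions to retain.

    attn_rows: attention rows of the most recent queries, oldest first;
               each row is a list of weights over the cached positions
               visible to that query (rows may grow in length).
    window:    number of recent queries consulted (hypothetical parameter);
               the same number of newest positions is always kept.
    threshold: minimum attention weight for a position to count as
               important (hypothetical parameter).
    """
    n = len(attn_rows[-1])                     # current cache length
    keep = set(range(max(0, n - window), n))   # always keep the newest positions
    for row in attn_rows[-window:]:            # only consult recent queries
        for pos, weight in enumerate(row):
            if weight >= threshold:
                keep.add(pos)
    return sorted(keep)


# Toy attention rows for three consecutive queries (hypothetical values).
rows = [
    [0.6, 0.1, 0.1, 0.2],
    [0.5, 0.05, 0.05, 0.1, 0.3],
    [0.05, 0.05, 0.05, 0.05, 0.2, 0.6],
]
kept = select_kv_to_keep(rows, window=2, threshold=0.3)
```

In this toy case positions 1 to 3 receive little attention from the two most recent queries and would be evicted, while position 0 survives because one recent query still attends to it.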
# 推薦のための混合教師付きグラフコントラスト学習 Mixed Supervised Graph Contrastive Learning for Recommendation ( http://arxiv.org/abs/2404.15954v1 ) ライセンス: Link先を確認	Weizhi Zhang, Liangwei Yang, Zihe Song, Henry Peng Zou, Ke Xu, Yuanjie Zhu, Philip S. Yu,	(参考訳) Recommender System(RecSys)は、オンラインプラットフォームにおいて重要な役割を担い、膨大な情報の中でパーソナライズされた提案を提供する。グラフコントラスト学習は、二部グラフの教師なし強化を伴う高次協調フィルタリング信号から学習することを目的としており、これはペアワイズレコメンデーション損失とコントラストロスの両方を含むマルチタスク学習フレームワークに大きく依存している。この分離された設計は、異なる損失から不整合最適化方向を引き起こす可能性があるため、収束時間が長くなり、サブ最適性能さえも生じる。さらに、RecSysは、拡張中に追加の教師付き協調フィルタリング信号を提供することなく、異なるビューからユーザやイテムを区別することを学ぶため、自己監督によるコントラスト損失はRecSysのデータスパシティ問題を緩和するに足らない。本稿では、これらの問題に対処するために、MixSGCL(Mixed Supervised Graph Contrastive Learning for Recommendation)を提案する。 MixSGCLはもともと、推奨と教師なしのコントラスト損失のトレーニングを教師付きコントラスト学習損失に統合し、2つのタスクを1つの最適化方向に整合させる。データの分散性問題に対処するため,既存のユーザ・イテム相互作用に基づいて,より直接的な教師付き協調フィルタリング信号のマイニングを行うノードワイド・エッジワイド・ミックスアップを提案する。 3つの実世界のデータセットに対する大規模な実験は、MixSGCLが最先端の手法を超越し、精度と効率の両方で最高のパフォーマンスを達成していることを示している。教師付きグラフコントラスト学習におけるMixSGCLの有効性を検証する。 Recommender systems (RecSys) play a vital role in online platforms, offering users personalized suggestions amidst vast information. Graph contrastive learning aims to learn from high-order collaborative filtering signals with unsupervised augmentation on the user-item bipartite graph, which predominantly relies on the multi-task learning framework involving both the pair-wise recommendation loss and the contrastive loss. This decoupled design can cause inconsistent optimization direction from different losses, which leads to longer convergence time and even sub-optimal performance. Besides, the self-supervised contrastive loss falls short in alleviating the data sparsity issue in RecSys as it learns to differentiate users/items from different views without providing extra supervised collaborative filtering signals during augmentations. In this paper, we propose Mixed Supervised Graph Contrastive Learning for Recommendation (MixSGCL) to address these concerns. 
MixSGCL originally integrates the training of recommendation and unsupervised contrastive losses into a supervised contrastive learning loss to align the two tasks within one optimization direction. To cope with the data sparsity issue, instead of unsupervised augmentation, we further propose node-wise and edge-wise mixup to mine more direct supervised collaborative filtering signals based on existing user-item interactions. Extensive experiments on three real-world datasets demonstrate that MixSGCL surpasses state-of-the-art methods, achieving top performance in both accuracy and efficiency. This validates the effectiveness of MixSGCL with our coupled design for supervised graph contrastive learning.	翻訳日:2024-04-26 18:51:25 公開日:2024-04-24
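The abstract above mentions node-wise and edge-wise mixup without giving formulas. As a rough illustration only, the generic mixup operation is a convex combination of two vectors, $\tilde{x} = \lambda x_i + (1-\lambda) x_j$ with $\lambda \in [0, 1]$. The sketch below shows that generic operation on two embedding vectors. How MixSGCL selects the pairs and the mixing coefficient is not stated here, so the function name and inputs are hypothetical.

```python
def mixup(emb_a, emb_b, lam):
    """Convex combination of two equal-length embedding vectors.

    lam is the mixing coefficient in [0, 1]; lam = 1 returns emb_a,
    lam = 0 returns emb_b. Generic mixup, not the paper's exact recipe.
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(emb_a, emb_b)]


# Two toy item embeddings (hypothetical values).
mixed = mixup([1.0, 0.0], [0.0, 1.0], lam=0.25)
```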
# ディープフェイク画像を超える:AI生成ビデオの検出 Beyond Deepfake Images: Detecting AI-Generated Videos ( http://arxiv.org/abs/2404.15955v1 ) ライセンス: Link先を確認	Danial Samadi Vahdati, Tai D. Nguyen, Aref Azizpour, Matthew C. Stamm,	(参考訳) 生成AIの最近の進歩は、視覚的にリアルな合成ビデオを生成する技術の開発につながっている。本稿では,AI合成画像を検出するために,多くの技術が開発されているが,合成画像検出装置では合成映像を検出できないことを示す。これは、合成ビデオジェネレータが、画像ジェネレータが残したものとはかなり異なるトレースを導入するためである。それにもかかわらず,H.264再圧縮後においても,合成ビデオトレースを学習し,信頼性の高い合成ビデオ検出や生成元属性の実行に利用できることを示す。さらに,ゼロショット転送性による新しいジェネレータからの映像の検出は困難である一方で,新しいジェネレータからの映像の正確な検出は,数ショットの学習によって達成できることを実証した。 Recent advances in generative AI have led to the development of techniques to generate visually realistic synthetic video. While a number of techniques have been developed to detect AI-generated synthetic images, in this paper we show that synthetic image detectors are unable to detect synthetic videos. We demonstrate that this is because synthetic video generators introduce substantially different traces than those left by image generators. Despite this, we show that synthetic video traces can be learned, and used to perform reliable synthetic video detection or generator source attribution even after H.264 re-compression. Furthermore, we demonstrate that while detecting videos from new generators through zero-shot transferability is challenging, accurate detection of videos from a new generator can be achieved through few-shot learning.	翻訳日:2024-04-26 18:51:25 公開日:2024-04-24
# 視覚マンバに関する調査 A Survey on Visual Mamba ( http://arxiv.org/abs/2404.15956v1 ) ライセンス: Link先を確認	Hanwei Zhang, Ying Zhu, Dan Wang, Lijun Zhang, Tianxiang Chen, Zi Ye,	(参考訳) 選択機構とハードウェア対応アーキテクチャを備えた状態空間モデル(SSM)、すなわちMambaは、最近、長いシーケンスモデリングにおいて大きな可能性を証明している。トランスにおける自己注意機構は、画像サイズと計算要求の増加と2次複雑さを持つため、研究者らは現在、コンピュータビジョンタスクにMambaを適用する方法を模索している。本稿では,コンピュータビジョン分野におけるMambaモデルの詳細分析を目的とした,初めての総合的な調査である。これは、状態空間モデルフレームワーク、選択メカニズム、ハードウェア対応設計など、Mambaの成功に寄与する基本的な概念を探求することから始まる。次に、これらの視覚マンバモデルについて、基礎的なモデルに分類し、その高度化を図るために、畳み込み、再発、注意などのテクニックで強化することでレビューする。さらに、様々な視覚処理におけるバックボーンとしての利用を含む、視覚タスクにおけるMambaの幅広い応用を掘り下げる。これには、一般的な視覚タスク、医療視覚タスク(例えば、2D/3Dセグメンテーション、分類、画像登録など)、リモートセンシング視覚タスクが含まれる。本稿では,高次視覚(オブジェクト検出,セグメンテーション,ビデオ分類など)と低次視覚(画像超解像,画像復元,視覚生成など)の2段階から一般的な視覚タスクを紹介する。この取り組みが、現在の課題に対処し、さらにマンバモデルをコンピュータビジョンに適用するために、コミュニティ内でさらなる関心を喚起することを期待しています。 State space models (SSMs) with selection mechanisms and hardware-aware architectures, namely Mamba, have recently demonstrated significant promise in long-sequence modeling. Since the self-attention mechanism in transformers has quadratic complexity with image size and increasing computational demands, the researchers are now exploring how to adapt Mamba for computer vision tasks. This paper is the first comprehensive survey aiming to provide an in-depth analysis of Mamba models in the field of computer vision. It begins by exploring the foundational concepts contributing to Mamba's success, including the state space model framework, selection mechanisms, and hardware-aware design. Next, we review these vision mamba models by categorizing them into foundational ones and enhancing them with techniques such as convolution, recurrence, and attention to improve their sophistication. We further delve into the widespread applications of Mamba in vision tasks, which include their use as a backbone in various levels of vision processing. 
This encompasses general visual tasks, Medical visual tasks (e.g., 2D / 3D segmentation, classification, and image registration, etc.), and Remote Sensing visual tasks. We specially introduce general visual tasks from two levels: High/Mid-level vision (e.g., Object detection, Segmentation, Video classification, etc.) and Low-level vision (e.g., Image super-resolution, Image restoration, Visual generation, etc.). We hope this endeavor will spark additional interest within the community to address current challenges and further apply Mamba models in computer vision.	翻訳日:2024-04-26 18:41:38 公開日:2024-04-24
# 超低温分子を用いたSU(N)磁性 SU(N) magnetism with ultracold molecules ( http://arxiv.org/abs/2404.15957v1 ) ライセンス: Link先を確認	Bijit Mukherjee, Jeremy M. Hutson, Kaden R. A. Hazzard,	(参考訳) SU($N$)対称性を持つ量子系は、量子多体物理学のパラダイム的な設定である。これらは、複雑な物質への洞察を与えること、およびエキゾチックな基底状態を安定化できることから研究されてきた。超低温アルカリ土類原子は、$N=2I+1=1,2,\ldots,10$に対してSU($N$)対称性を示すと予測された(ここで$I$は核スピンである)。その後の実験により、豊富な多体物理学が明らかになった。しかし、アルカリ土類原子がこの対称性を実現するのは、斥力相互作用を持つフェルミオンの場合に限られる。本稿では、静電場またはマイクロ波によって破壊的な衝突から遮蔽された超低温分子がSU($N$)対称性を示すことを予測する。この対称性が成り立つのは、s波散乱長のスピンフリー値からの偏差が、静電場遮蔽を用いたCaFで約3\%にすぎず、バイアルカリ分子ではさらに小さいと推定されるためである。これらの分子は、ボソンで最大$32$、フェルミオンで最大$36$という大きな$N$への扉を開く。また、ボソン系や引力相互作用など、原子では実現できない重要な特徴を提供する。 Quantum systems with SU($N$) symmetry are paradigmatic settings for quantum many-body physics. They have been studied for the insights they provide into complex materials and their ability to stabilize exotic ground states. Ultracold alkaline-earth atoms were predicted to exhibit SU($N$) symmetry for $N=2I+1=1,2,\ldots,10$, where $I$ is the nuclear spin. Subsequent experiments have revealed rich many-body physics. However, alkaline-earth atoms realize this symmetry only for fermions with repulsive interactions. In this paper, we predict that ultracold molecules shielded from destructive collisions with static electric fields or microwaves exhibit SU($N$) symmetry, which holds because deviations of the s-wave scattering length from the spin-free values are only about 3\% for CaF with static-field shielding and are estimated to be even smaller for bialkali molecules. They open the door to $N$ as large as $32$ for bosons and $36$ for fermions. They offer important features unachievable with atoms, including bosonic systems and attractive interactions.	翻訳日:2024-04-26 18:41:38 公開日:2024-04-24
# 液状化誘起横方向拡散予測のための説明可能なAIモデル Explainable AI models for predicting liquefaction-induced lateral spreading ( http://arxiv.org/abs/2404.15959v1 ) ライセンス: Link先を確認	Cheng-Hsi Hsiao, Krishna Kumar, Ellen Rathje,	(参考訳) 地震によって引き起こされる液状化は、インフラへの脅威として、相当に横方向の拡散を引き起こす可能性がある。マシンラーニング(ML)は、複雑な土壌特性と現場条件をキャプチャすることで、横方向の拡散予測モデルを改善することができる。しかし、MLモデルの"ブラックボックス"の性質は、重要な意思決定における採用を妨げる可能性がある。本研究は,2011年クライストチャーチ地震のデータに基づいて訓練された横方向拡散予測のためのeXtreme Gradient Boosting(XGB)モデルの解釈にSHAP(SHapley Additive ExPlanations)を用いることにより,この制限に対処する。 SHAP分析は、モデルの予測を駆動し、透明性を高め、確立されたエンジニアリング知識との比較を可能にする要因を明らかにする。その結果, コーン浸透試験(CPT)データから得られた土壌特性の重要性をXGBモデルで同定し, 領域理解との整合性を検証した。この研究は、地球工学とハザードアセスメントにおける信頼性とインフォームドな意思決定のための説明可能な機械学習の価値を強調している。 Earthquake-induced liquefaction can cause substantial lateral spreading, posing threats to infrastructure. Machine learning (ML) can improve lateral spreading prediction models by capturing complex soil characteristics and site conditions. However, the "black box" nature of ML models can hinder their adoption in critical decision-making. This study addresses this limitation by using SHapley Additive exPlanations (SHAP) to interpret an eXtreme Gradient Boosting (XGB) model for lateral spreading prediction, trained on data from the 2011 Christchurch Earthquake. SHAP analysis reveals the factors driving the model's predictions, enhancing transparency and allowing for comparison with established engineering knowledge. The results demonstrate that the XGB model successfully identifies the importance of soil characteristics derived from Cone Penetration Test (CPT) data in predicting lateral spreading, validating its alignment with domain understanding. This work highlights the value of explainable machine learning for reliable and informed decision-making in geotechnical engineering and hazard assessment.	翻訳日:2024-04-26 18:41:38 公開日:2024-04-24
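SHAP attributions are based on Shapley values: each feature receives its marginal contribution to the prediction, averaged over all orders in which features can be switched from a baseline to their actual values. The sketch below computes exact Shapley values for a toy model by enumerating permutations. The toy `model`, the input, and the all-zero baseline are hypothetical stand-ins, not the XGB model or the Christchurch data from the abstract, and brute-force enumeration is only practical for a handful of features.

```python
from itertools import permutations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Averages each feature's marginal contribution over every ordering
    in which features are switched from baseline to their value in x.
    Cost grows as n!, so this is for illustration only.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]            # reveal feature i
            value = model(current)
            phi[i] += value - previous   # marginal contribution of i
            previous = value
    return [p / factorial(n) for p in phi]


def model(z):
    """Toy model (hypothetical): one additive term plus one interaction."""
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
```

The additive term is credited entirely to the first feature, the interaction term is split evenly between the two interacting features, and the attributions sum to `model(x) - model(baseline)`.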
# ステップ周波数GPRフィールド計測の機械学習による土壌解析:予備的検討 Soil analysis with machine-learning-based processing of stepped-frequency GPR field measurements: Preliminary study ( http://arxiv.org/abs/2404.15961v1 ) ライセンス: Link先を確認	Chunlei Xu, Michael Pregesbauer, Naga Sravani Chilukuri, Daniel Windhager, Mahsa Yousefi, Pedro Julian, Lothar Ratschbacher,	(参考訳) 地中レーダ(GPR)は、農業や園芸に関連する土壌パラメータを抽出する手段として広く研究されている。機械学習(ML)に基づく手法と組み合わせることで、ステップ周波数連続波(SFCW)レーダの高分解能測定は、根圏の深さを含む深さ分解された土壌パラメータを低コストで取得できる可能性がある。この方向への第一歩として、トラクタ搭載のSFCW GPR機器を用いた広範囲なフィールド調査を行う。MLによるデータ処理を用いて、同時に記録する電磁誘導(EMI)機器で測定された見かけの電気伝導率(ECaR)をGPR機器が予測できるかを検証する。約6600平方メートルに分布する、位置合わせ・地理参照された3472組のGPRおよびEMIデータサンプルからなる大規模フィールド計測キャンペーンを、ゴルフコースで実施した。選択した地形は地表面の均一性が高いという利点がある一方、測定される土壌パラメータの変動が小さく識別が難しいという課題もある。定量的な結果に基づき、農業環境におけるエンドツーエンドMLの性能評価指標としてnugget-to-sill比の使用を提案し、マルチセンサ回帰設定における制限要因について議論する。コードはオープンソースとして https://opensource.silicon-austria.com/xuc/soil-analysis-machine-learning-stepped-frequency-gpr で公開されている。 Ground Penetrating Radar (GPR) has been widely studied as a tool for extracting soil parameters relevant to agriculture and horticulture. When combined with Machine-Learning-based (ML) methods, high-resolution Stepped Frequency Continuous Wave Radar (SFCW) measurements hold the promise to give cost-effective access to depth-resolved soil parameters, including at root-level depth. In a first step in this direction, we perform an extensive field survey with a tractor-mounted SFCW GPR instrument. Using ML data processing we test the GPR instrument's capabilities to predict the apparent electrical conductivity (ECaR) as measured by a simultaneously recording Electromagnetic Induction (EMI) instrument. The large-scale field measurement campaign with 3472 co-registered and geo-located GPR and EMI data samples distributed over ~6600 square meters was performed on a golf course. 
The selected terrain benefits from a high surface homogeneity, but also features the challenge of only small, and hence hard to discern, variations in the measured soil parameter. Based on the quantitative results we suggest the use of nugget-to-sill ratio as a performance metric for the evaluation of end-to-end ML performance in the agricultural setting and discuss the limiting factors in the multi-sensor regression setting. The code is released as open source and available at https://opensource.silicon-austria.com/xuc/soil-analysis-machine-learning-stepped-frequency-gpr.	翻訳日:2024-04-26 18:41:38 公開日:2024-04-24
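For readers unfamiliar with the nugget-to-sill ratio mentioned above: in geostatistics the nugget is the semivariance extrapolated to zero lag (noise and unresolved small-scale variation), and the sill is the plateau the semivariogram reaches at large lags, so a small ratio indicates strongly spatially structured data. The sketch below assumes regularly spaced 1D samples and takes the nugget and sill as given numbers. How the paper fits the variogram and applies the ratio to ML predictions is not described in the abstract, so the function names and inputs are hypothetical.

```python
def semivariance(values, lag):
    """Empirical semivariance of regularly spaced 1D samples at an integer lag.

    gamma(h) = sum((z_i - z_{i+h})^2) / (2 * number_of_pairs)
    """
    if lag < 1 or lag >= len(values):
        raise ValueError("lag must satisfy 1 <= lag < len(values)")
    diffs = [values[i] - values[i + lag] for i in range(len(values) - lag)]
    return sum(d * d for d in diffs) / (2 * len(diffs))


def nugget_to_sill(nugget, sill):
    """Ratio of nugget to total sill (the sill here includes the nugget)."""
    if sill <= 0 or not 0 <= nugget <= sill:
        raise ValueError("require 0 <= nugget <= sill and sill > 0")
    return nugget / sill


# Toy transect of a soil parameter (hypothetical values).
gamma_1 = semivariance([1.0, 2.0, 4.0, 7.0], lag=1)
ratio = nugget_to_sill(nugget=0.2, sill=0.8)
```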
# 量子力学の複素確率最適制御基礎 Complex Stochastic Optimal Control Foundation of Quantum Mechanics ( http://arxiv.org/abs/2404.15964v1 ) ライセンス: Link先を確認	Vasil Yordanov,	(参考訳) 近年の研究では、確率的ハミルトン・ヤコビ・ベルマン方程式(HJB)を用いて量子力学方程式を導出する複雑な変数を含むように拡張されている。しかしながら、これらの研究は通常、HJB方程式を複素数に直接適用することは有効であると仮定する。本稿では,複素変数の文脈においてHJB方程式を適切に適用する方法について述べる。この結果は、コーシー・リーマンの定理に直接的な影響を受け、量子粒子の確率運動を著しく再評価する。これらの知見は量子力学の理解を深めるだけでなく、量子力学に確率論的最適制御を適用するためのフレームワークの数学的厳密性を高める。 Recent studies have expanded the use of the stochastic Hamilton Jacobi Bellman (HJB) equation to include complex variables for deriving quantum mechanical equations. However, these studies typically assume that it is valid to apply the HJB equation directly to complex numbers, an approach that overlooks the fundamental problem of comparing complex numbers to find optimal controls. This paper addresses how to properly apply the HJB equation in the context of complex variables. Our findings significantly reevaluate the stochastic movement of quantum particles, directly influenced by the Cauchy Riemann theorem. These insights not only deepen our understanding of quantum dynamics but also enhance the mathematical rigor of the framework for applying stochastic optimal control in quantum mechanics.	翻訳日:2024-04-26 18:41:38 公開日:2024-04-24
# 対超流体の非局所次数パラメータ Nonlocal order parameter of pair superfluids ( http://arxiv.org/abs/2404.15972v1 ) ライセンス: Link先を確認	Nitya Cuzzuol, Luca Barbiero, Arianna Montorsi,	(参考訳) 順序パラメータは、量子物質を特徴づける基本的な資源を表す。局所密度測定により導出可能な非局所秩序パラメータである奇数パリティ(英語版)を用いて,ペア超流動を厳密に定義できることが示される。研究の例として,1次元と2次元の異なる密度のボース・ハバードモデルについて検討する。ここでは, 相対的に強い相互作用に対して, 対超流動性を求める。奇パリティ作用素は、系の密度とその次元によらず、そのような位相のユニークな順序パラメータとして作用する。我々の発見を強制するために、我々は、超低温原子系において、実験的な実現がタイムリーな話題である2成分のボース・ハバード・ハミルトン系にも、我々のアプローチの一般性を確認する。その結果, 対超流動における相関密度変動の役割に新たな光を当てた。さらに、これらのエキゾチック相を実験的に検出し、正常な超流動相への遷移を特徴づけるための強力なツールを提供する。 Order parameters represent a fundamental resource to characterize quantum matter. We show that pair superfluids can be rigorously defined in terms of a nonlocal order parameter, named odd parity, which derivation is experimentally accessible by local density measurements. As a case of study, we first investigate a constrained Bose-Hubbard model at different densities, both in one and two spatial dimensions. Here, our analysis finds pair superfluidity for relatively strong attractive interactions. The odd parity operator acts as the unique order parameter for such phase irrespectively to the density of the system and its dimensionality. In order to enforce our finding, we confirm the generality of our approach also on a two-component Bose-Hubbard Hamiltonian, which experimental realization represents a timely topic in ultracold atomic systems. Our results shed new light on the role of correlated density fluctuations in pair superfluids. In addition, they provide a powerful tool for the experimental detection of such exotic phases and the characterization of their transition to the normal superfluid phase.	翻訳日:2024-04-26 18:41:38 公開日:2024-04-24
# 電場とそのゆらぎのマクロ計測によるエンタングルメントの検出 Detecting entanglement from macroscopic measurements of the electric field and its fluctuations ( http://arxiv.org/abs/2404.15973v1 ) ライセンス: Link先を確認	Pedro Rosario, Alan C. Santos, Nicola Piovella, Robin Kaiser, André Cidrim, Romain Bachelard,	(参考訳) 大規模量子系におけるエンタングルメントの検出という未解決の課題に対処するため、状態の分離可能性を判定するエンタングルメント・ウィットネスが登場している。しかし、ウィットネスを最適化したり、実験的に測定したりすることは、しばしば困難である。ここでは、電場、すなわちその直交位相成分(quadrature)と全蛍光に基づく、開放量子系に対するエンタングルメント・ウィットネスの族を導入する。これはスピンスクイージング不等式よりも一般的であり、遠方場観測の方向を変えることで状態トモグラフィーを必要とせずに連続的なウィットネスの族が得られるため、新たなクラスのエンタングル状態を検出できる。その有効性は、協調的自発放出によって生じる長寿命状態のような集団的単一光子状態のエンタングルメントを、ほぼあらゆる方向から検出することによって示される。大規模量子系におけるエンタングルメントを検出できるこれらの電場に基づくウィットネスは、原子系(冷却原子や捕捉イオン)、巨大原子、色中心、超伝導量子ビットなど、パウリ群によって記述されるあらゆるエミッターの集合に使用できる。 To address the outstanding task of detecting entanglement in large quantum systems, entanglement witnesses have emerged, addressing the separable nature of a state. Yet optimizing witnesses, or accessing them experimentally, often remains a challenge. We here introduce a family of entanglement witnesses for open quantum systems, based on the electric field -- its quadratures and the total fluorescence. More general than spin-squeezing inequalities, it can detect new classes of entangled states, as changing the direction for far-field observation opens up a continuous family of witnesses, without the need for a state tomography. Their efficiency is demonstrated by detecting, from almost any direction, the entanglement of collective single-photon states, such as long-lived states generated by cooperative spontaneous emission. Able to detect entanglement in large quantum systems, these electric-field-based witnesses can be used on any set of emitters described by the Pauli group, such as atomic systems (cold atoms and trapped ions), giant atoms, color centers, and superconducting qubits.	翻訳日:2024-04-26 18:41:38 公開日:2024-04-24
# 3次元都市データのビジュアル分析における最先端技術 The State of the Art in Visual Analytics for 3D Urban Data ( http://arxiv.org/abs/2404.15976v1 ) ライセンス: Link先を確認	Fabio Miranda, Thomas Ortner, Gustavo Moreira, Maryam Hosseini, Milena Vuckovic, Filip Biljecki, Claudio Silva, Marcos Lage, Nivan Ferreira,	(参考訳) 都市化は、多様な利害関係者にとって大きな関心を持つ幅広い現象に対して、都市環境における3次元構造の重要性を増幅してきた。 3次元都市データの普及に伴い、都市環境の特徴に合わせた視覚分析技術の開発に多くの研究が注がれている。しかし、3次元を視覚分析に組み込むことで、都市データの多様な複雑さに対処する効果的なビジュアルツールを設計する上で、さらなる課題がもたらされる。本稿では,3次元都市データの視覚的分析について述べる。私たちの作業では、ユースケース、分析タスク、データ、視覚化、インタラクションを考慮して、公開作業が3つの主要な側面(なぜ、何、どのように、どのように)に沿って行われるかを特徴付けています。我々は、可視化ジャーナルや会議、都市計画、建築、工学など、無数の都市ドメインからの出版作品のきめ細かい分類を提供する。都市と可視化の専門家の視点を取り入れることで、文献のギャップを識別し、可視化研究者に課題と機会を理解する動機を与え、将来の研究方向性を示す。 Urbanization has amplified the importance of three-dimensional structures in urban environments for a wide range of phenomena that are of significant interest to diverse stakeholders. With the growing availability of 3D urban data, numerous studies have focused on developing visual analysis techniques tailored to the unique characteristics of urban environments. However, incorporating the third dimension into visual analytics introduces additional challenges in designing effective visual tools to tackle urban data's diverse complexities. In this paper, we present a survey on visual analytics of 3D urban data. Our work characterizes published works along three main dimensions (why, what, and how), considering use cases, analysis tasks, data, visualizations, and interactions. We provide a fine-grained categorization of published works from visualization journals and conferences, as well as from a myriad of urban domains, including urban planning, architecture, and engineering. By incorporating perspectives from both urban and visualization experts, we identify literature gaps, motivate visualization researchers to understand challenges and opportunities, and indicate future research directions.	翻訳日:2024-04-26 18:41:38 公開日:2024-04-24
# SO(3)空間におけるフーリエ解析について : EquiLoPOネットワーク On the Fourier analysis in the SO(3) space : EquiLoPO Network ( http://arxiv.org/abs/2404.15979v1 ) ライセンス: Link先を確認	Dmitrii Zhemchuzhnikov, Sergei Grudinin,	(参考訳) 回転不変または等分散を伴う体積データを解析することは、現在の研究において活発なトピックである。既存のディープラーニングアプローチでは、離散的な回転に制限されたグループ畳み込みネットワークまたは制約付きフィルタ構造を持つステアブル畳み込みネットワークを利用する。本研究は, 連続SO(3)群における局所パターンオリエンテーションに対する解析的等価性を実現するとともに, 制約のないトレーニング可能なフィルタであるEquiLoPOネットワークを許容する新しい同変ニューラルネットワークアーキテクチャを提案する。我々の重要な革新は、フーリエ基底として既約表現を活用する群畳み込み演算と、入力から出力関数へのよく定義された写像を提供するSO(3)空間における局所活性化関数であり、等式を保存することである。本稿では,これらの操作をResNetスタイルのアーキテクチャに統合することにより,従来の手法の限界を克服するモデルを提案する。 MedMNIST3Dによる多種多様な3次元医用画像データセットの包括的評価は、我々のアプローチの有効性を示している。この研究は、SO(3) 上の真の回転同値と局所活性化関数によって実現されるフレキシブルな非拘束フィルタの利点を示唆する。私たちのコードは、 \url{https://gricad-gitlab.univ-grenoble-alpes.fr/GruLab/ILPO/-/tree/main/EquiLoPO}で公開されています。 Analyzing volumetric data with rotational invariance or equivariance is an active topic in current research. Existing deep-learning approaches utilize either group convolutional networks limited to discrete rotations or steerable convolutional networks with constrained filter structures. This work proposes a novel equivariant neural network architecture that achieves analytical Equivariance to Local Pattern Orientation on the continuous SO(3) group while allowing unconstrained trainable filters - EquiLoPO Network. Our key innovations are a group convolutional operation leveraging irreducible representations as the Fourier basis and a local activation function in the SO(3) space that provides a well-defined mapping from input to output functions, preserving equivariance. By integrating these operations into a ResNet-style architecture, we propose a model that overcomes the limitations of prior methods. A comprehensive evaluation on diverse 3D medical imaging datasets from MedMNIST3D demonstrates the effectiveness of our approach, which consistently outperforms state of the art. 
This work suggests the benefits of true rotational equivariance on SO(3) and flexible unconstrained filters enabled by the local activation function, providing a flexible framework for equivariant deep learning on volumetric data with potential applications across domains. Our code is publicly available at \url{https://gricad-gitlab.univ-grenoble-alpes.fr/GruLab/ILPO/-/tree/main/EquiLoPO}.	翻訳日:2024-04-26 18:41:38 公開日:2024-04-24
# 合金を用いた分散量子コンピューティングにおけるテレポーテーション数の最小化 Minimizing the Number of Teleportations in Distributed Quantum Computing Using Alloy ( http://arxiv.org/abs/2404.15980v1 ) ライセンス: Link先を確認	Ali Ebnenasir, Kieran Young,	(参考訳) 本稿では,形式的手法を用いて分散量子コンピューティング(DQC)におけるテレポーテーション数を最小化する新しい手法を提案する。量子テレポーテーションは、量子情報の通信において重要な役割を果たしている。そのため、量子マシンのネットワーク上に量子アルゴリズムを分散させる際には、できるだけ少ないテレポーテーションを実行することが望ましい。グラフ理論やヒューリスティック検索技術に頼っている既存の手法とは対照的に,フォーマルな手法を駆使してテレポーテーション数を最小化する手法を提案する。具体的には,アロイにおけるテレポーテーション最小化問題の形式的仕様,提案するアロイ仕様の量子回路への一般化可能性,異なる量子回路やネットワークに対するアロイ仕様の再使用性,ロードバランシングやヘテロジニティといった他の問題の特定および解決の単純さ,提案手法の構成性などについて述べる。我々はまた、量子回路のテキスト記述を入力として、対応するアロイモデルを生成し、最終的にアロイアナライザを用いて最小化問題を解くqcAlloyと呼ばれるソフトウェアツールを開発した。我々は、100量子ビットと1200層以上のRevLibベンチマークのいくつかの回路に対して、qcAlloyを実験的に評価し、テレポーテーション数の最小化の観点から、ほとんどのベンチマーク回路において、qcAlloyが最も効率的な既存手法の1つであることを示した。 This paper presents a novel approach for minimizing the number of teleportations in Distributed Quantum Computing (DQC) using formal methods. Quantum teleportation plays a major role in communicating quantum information. As such, it is desirable to perform as few teleportations as possible when distributing a quantum algorithm on a network of quantum machines. Contrary to most existing methods which rely on graph-theoretic or heuristic search techniques, we propose a drastically different approach for minimizing the number of teleportations through utilizing formal methods. Specifically, the contributions of this paper include: the formal specification of the teleportation minimization problem in Alloy, the generalizability of the proposed Alloy specifications to quantum circuits with $n$-ary gates, the reusability of the Alloy specifications for different quantum circuits and networks, the simplicity of specifying and solving other problems such as load balancing and heterogeneity, and the compositionality of the proposed approach. 
We also develop a software tool, called qcAlloy, that takes as input the textual description of a quantum circuit, generates the corresponding Alloy model, and finally solves the minimization problem using the Alloy analyzer. We have experimentally evaluated qcAlloy for some of the circuits in the RevLib benchmark with more than 100 qubits and 1200 layers, and have demonstrated that qcAlloy outperforms one of the most efficient existing methods for most benchmark circuits in terms of minimizing the number of teleportations.	翻訳日:2024-04-26 18:41:38 公開日:2024-04-24
# トポロジカル符号を用いたヘビーヘックス格子上でのエンタングルした論理量子ビットの生成 Creating entangled logical qubits in the heavy-hex lattice with topological codes ( http://arxiv.org/abs/2404.15989v1 ) ライセンス: Link先を確認	Bence Hetényi, James R. Wootton,	(参考訳) 量子誤り訂正の設計は、量子ビットの接続性に強く依存する。固体量子ビットの場合、最も簡単なアプローチは、接続を平面グラフに制約することである。実用上の考慮により接続性がさらに制限され、現在のIBM Quantumデバイスのヘビーヘックスアーキテクチャのような比較的疎なグラフになることもある。そのような場合、すべての量子ビットをその潜在能力まで活用することは難しい。むしろ、よく知られた量子誤り訂正符号の実装に必要な、より密な接続をエミュレートするために、多くの量子ビットが事実上使われないままになる。本研究では、この欠点を利点に変える方法を示す。一方の符号で使われていない量子ビットを用いてもう一方の符号を実行することで、2つの符号を重ねて実装でき、フォールトトレラントなエンタングリングゲートと測定を容易に適用できる。133量子ビットのIBM Quantumデバイス上で表面符号とBacon-Shor符号を実現することで、これを実証する。トランスバーサルCXゲートと格子手術を用いて、符号距離が最大$d = 4$、スタビライザ測定サイクルが5ラウンドの論理量子ビット間のエンタングルメントを示す。量子ビット間の非平面的な結合により、論理$XX$、$YY$、$ZZ$オブザーバブルを同時に測定できる。これにより、$d=2$の場合はポストセレクションを用いて$94\%$の忠実度で、$d=3$の場合は量子誤り訂正のみを用いて、ベルの不等式の破れを検証する。 Designs for quantum error correction depend strongly on the connectivity of the qubits. For solid state qubits, the most straightforward approach is to have connectivity constrained to a planar graph. Practical considerations may also further restrict the connectivity, resulting in a relatively sparse graph such as the heavy-hex architecture of current IBM Quantum devices. In such cases it is hard to use all qubits to their full potential. Instead, in order to emulate the denser connectivity required to implement well-known quantum error correcting codes, many qubits remain effectively unused. In this work we show how this bug can be turned into a feature. By using the unused qubits of one code to execute another, two codes can be implemented on top of each other, allowing easy application of fault-tolerant entangling gates and measurements. We demonstrate this by realizing a surface code and a Bacon-Shor code on a 133 qubit IBM Quantum device. 
Using transversal CX gates and lattice surgery we demonstrate entanglement between these logical qubits with code distance up to $d = 4$ and five rounds of stabilizer measurement cycles. The nonplanar coupling between the qubits allows us to simultaneously measure the logical $XX$, $YY$, and $ZZ$ observables. With this we verify the violation of Bell's inequality for both the $d=2$ case with post selection featuring a fidelity of $94\%$, and the $d=3$ instance using only quantum error correction.	翻訳日:2024-04-26 18:41:38 公開日:2024-04-24
# HDDGAN:赤外・可視画像融合のための異種二重識別器生成アドバイサルネットワーク HDDGAN: A Heterogeneous Dual-Discriminator Generative Adversarial Network for Infrared and Visible Image Fusion ( http://arxiv.org/abs/2404.15992v1 ) ライセンス: Link先を確認	Guosheng Lu, Zile Fang, Chunming He, Zhigang Zhao,	(参考訳) 赤外線・可視画像融合(IVIF)は、可視画像からテクスチャの詳細を統合しつつ、赤外線画像からの熱放射情報を保存することを目的としており、複雑なシーンや乱れた環境において、重要な特徴や被写体の隠れた詳細を捉えることができる。その結果、IVIFは、ビデオ監視、夜間ナビゲーション、ターゲット認識などの実用的な応用において、明確なアドバンテージを提供する。しかし、赤外線と可視画像の異なる特徴により、熱領域の特徴と詳細な情報を同時に取得する上で、一般的な手法はしばしば課題に直面している。その結果、融合の結果は熱標的領域情報とテクスチャの詳細の間の妥協を頻繁に伴う。本研究では,この問題に対処するために,新しい異種二重識別器生成敵ネットワーク(HDDGAN)を提案する。具体的には、このジェネレータはマルチスケールのスキップ接続構造として構成され、異なるソース画像から必須の特徴の抽出を容易にする。融合結果の情報表現能力を高めるために、ソース画像間の相違を利用して、ジェネレータ内の情報融合層を構築するための注意機構を用いる。さらに、赤外線と可視画像における情報の異なる学習要件を認識し、異なる構造を持つ2つの識別器を設計する。本手法は、可視画像から詳細な情報を同時に取得しながら、赤外線画像から有能な情報を学習するためのモデルを導くことを目的としている。様々な公開データセット上で行った大規模な実験は、提案したHDDGANが他の最先端(SOTA)アルゴリズムよりも優れていることを実証し、実用的な応用の可能性を強調した。 Infrared and visible image fusion (IVIF) aims to preserve thermal radiation information from infrared images while integrating texture details from visible images, enabling the capture of important features and hidden details of subjects in complex scenes and disturbed environments. Consequently, IVIF offers distinct advantages in practical applications such as video surveillance, night navigation, and target recognition. However, prevailing methods often face challenges in simultaneously capturing thermal region features and detailed information due to the disparate characteristics of infrared and visible images. Consequently, fusion outcomes frequently entail a compromise between thermal target area information and texture details. In this study, we introduce a novel heterogeneous dual-discriminator generative adversarial network (HDDGAN) to address this issue. Specifically, the generator is structured as a multi-scale skip-connected structure, facilitating the extraction of essential features from different source images. 
To enhance the information representation ability of the fusion result, an attention mechanism is employed to construct the information fusion layer within the generator, leveraging the disparities between the source images. Moreover, recognizing the distinct learning requirements of information in infrared and visible images, we design two discriminators with differing structures. This approach aims to guide the model to learn salient information from infrared images while simultaneously capturing detailed information from visible images. Extensive experiments conducted on various public datasets demonstrate the superiority of our proposed HDDGAN over other state-of-the-art (SOTA) algorithms, highlighting its enhanced potential for practical applications.	翻訳日:2024-04-26 18:41:38 公開日:2024-04-24
# LLMの不確かさ推定と定量化: 簡単な教師あり手法 Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach ( http://arxiv.org/abs/2404.15993v1 ) ライセンス: Link先を確認	Linyu Liu, Yu Pan, Xiaocheng Li, Guanting Chen,	(参考訳) 大規模言語モデル(LLM)は多くのタスクに対して高い能力を持つが、信頼できないあるいは不正確な出力を生成することがある。この問題に対処するために,LLMの不確実性推定と校正の問題について検討する。まず LLM の不確実性推定問題を定式化し,ラベル付きデータセットを利用して LLM の応答の不確かさを推定する教師ありアプローチを提案する。定式化に基づいて,LLMの不確実性推定と標準MLモデルの不確実性推定の違いを説明し,LLMの隠れアクティベーションが不確実性情報を含んでいる理由を説明する。提案手法は, 各種タスクにわたり不確実性推定の向上に隠れアクティベーションを利用する利点を効果的に示し, 分布外(out-of-distribution)設定におけるロバストな転移可能性を示す。さらに,不確実性推定タスクと不確実性校正タスクを区別し,より良い不確実性推定がより良いキャリブレーション性能をもたらすことを示す。実際には,本手法は実装が容易で,ブラックボックス,グレイボックス,ホワイトボックスなど,さまざまなモデルの透明性レベルに適応でき,それぞれLLMの内部機構へのアクセス可能性に応じた高い性能を示す。 Large language models (LLMs) are highly capable of many tasks but they can sometimes generate unreliable or inaccurate outputs. To tackle this issue, this paper studies the problem of uncertainty estimation and calibration for LLMs. We begin by formulating the uncertainty estimation problem for LLMs and then propose a supervised approach that takes advantage of the labeled datasets and estimates the uncertainty of the LLMs' responses. Based on the formulation, we illustrate the difference between the uncertainty estimation for LLMs and that for standard ML models and explain why the hidden activations of the LLMs contain uncertainty information. Our designed approach effectively demonstrates the benefits of utilizing hidden activations for enhanced uncertainty estimation across various tasks and shows robust transferability in out-of-distribution settings. Moreover, we distinguish the uncertainty estimation task from the uncertainty calibration task and show that a better uncertainty estimation mode leads to a better calibration performance. 
In practice, our method is easy to implement and is adaptable to different levels of model transparency including black box, grey box, and white box, each demonstrating strong performance based on the accessibility of the LLM's internal mechanisms.	翻訳日:2024-04-26 18:41:38 公開日:2024-04-24
# BeSound: クロスモーダル蒸留によるBluetoothによる位置推定 BeSound: Bluetooth-Based Position Estimation Enhancing with Cross-Modality Distillation ( http://arxiv.org/abs/2404.15999v1 ) ライセンス: Link先を確認	Hymalai Bello, Sungho Suh, Bo Zhou, Paul Lukowicz,	(参考訳) スマートファクトリーは、製造プロセスの最適化と効率の向上に先進技術を活用している。主にカメラベースの手法による作業者追跡システムの実装は、正確な監視を保証する。しかしながら、労働者のプライバシと技術保護に関する懸念は、代替アプローチを検討する必要がある。本稿では,Bluetooth Low Energy (BLE) と超音波座標を用いた非視覚的,スケーラブルなソリューションを提案する。 BLEの位置推定は、スマートフォンで利用でき、多くのスマートフォンユーザーのためにスケーラブルであり、労働者のローカライゼーションと安全プロトコルの送信を容易にするため、非常に低消費電力でコスト効率のソリューションを提供する。超音波信号は応答時間と精度を向上するが、カスタムハードウェアが必要であり、コストが増大する。両モダリティの利点を組み合わせるために,超音波信号からBLE RSSIデータへの知識蒸留(KD)を用いる。学生モデルが訓練されると、モデルはBLE-RSSIデータを入力して推論するだけで、ユビキティとBLE RSSIの低コストの利点を保ちます。スマートファクトリテストベッド環境において,12人の参加者による実験から得られたデータを用いて,アプローチを検証した。その結果,F1スコアの11.79%がベースライン(KDのないターゲットモデル,BLE-RSSIデータのみのトレーニング)に比べて増加した。 Smart factories leverage advanced technologies to optimize manufacturing processes and enhance efficiency. Implementing worker tracking systems, primarily through camera-based methods, ensures accurate monitoring. However, concerns about worker privacy and technology protection make it necessary to explore alternative approaches. We propose a non-visual, scalable solution using Bluetooth Low Energy (BLE) and ultrasound coordinates. BLE position estimation offers a very low-power and cost-effective solution, as the technology is available on smartphones and is scalable due to the large number of smartphone users, facilitating worker localization and safety protocol transmission. Ultrasound signals provide faster response times and higher accuracy but require custom hardware, increasing costs. To combine the benefits of both modalities, we employ knowledge distillation (KD) from ultrasound signals to BLE RSSI data. Once the student model is trained, the model only takes as inputs the BLE-RSSI data for inference, retaining the advantages of ubiquity and low cost of BLE RSSI. 
We tested our approach using data from an experiment with twelve participants in a smart factory test bed environment. We obtained an increase of 11.79% in the F1-score compared to the baseline (target model without KD and trained with BLE-RSSI data only).	翻訳日:2024-04-26 18:41:38 公開日:2024-04-24
# 包括的で使いやすいマルチドメイン・マルチタスク医療画像メタデータセット(MedIMeta) A comprehensive and easy-to-use multi-domain multi-task medical imaging meta-dataset (MedIMeta) ( http://arxiv.org/abs/2404.16000v1 ) ライセンス: Link先を確認	Stefano Woerner, Arthur Jaques, Christian F. Baumgartner,	(参考訳) 医療画像分析の分野は機械学習技術の統合によって大きく変化してきたが、これらの技術の主な課題は、大規模で多様かつ適切にアノテーションされたデータセットの不足であることが多い。医療画像はフォーマット、サイズ、その他のパラメータが異なるため、機械学習で使用するには広範な前処理と標準化が必要である。これらの課題に対処するため、新しいマルチドメイン・マルチタスクのメタデータセットであるMedical Imaging Meta-Dataset (MedIMeta)を紹介する。 MedIMetaには、10の異なるドメインにまたがる19の医療画像データセットが含まれており、54の異なる医療タスクを含んでいる。これらはすべて同じフォーマットに標準化されており、PyTorchや他のMLフレームワークですぐに利用できる。我々はMedIMetaの技術的検証を行い、完全教師あり学習およびクロスドメインfew-shot学習のベースラインを通じてその有用性を実証する。 While the field of medical image analysis has undergone a transformative shift with the integration of machine learning techniques, the main challenge of these techniques is often the scarcity of large, diverse, and well-annotated datasets. Medical images vary in format, size, and other parameters and therefore require extensive preprocessing and standardization for usage in machine learning. Addressing these challenges, we introduce the Medical Imaging Meta-Dataset (MedIMeta), a novel multi-domain, multi-task meta-dataset. MedIMeta contains 19 medical imaging datasets spanning 10 different domains and encompassing 54 distinct medical tasks, all of which are standardized to the same format and readily usable in PyTorch or other ML frameworks. We perform a technical validation of MedIMeta, demonstrating its utility through fully supervised and cross-domain few-shot learning baselines.	翻訳日:2024-04-26 18:31:49 公開日:2024-04-24
# Max-Cut問題に対するヒューリスティックなフロケ断熱アルゴリズムのベンチマーク Benchmarking a heuristic Floquet adiabatic algorithm for the Max-Cut problem ( http://arxiv.org/abs/2404.16001v1 ) ライセンス: Link先を確認	Etienne Granet, Henrik Dreyer,	(参考訳) 量子力学の断熱定理によれば、ハミルトニアンの基底状態にある系は、ハミルトニアンをゆっくり変化させても基底状態に留まる。これは原理的に、量子コンピュータで難しい問題を解くのに利用できる。しかし一般には、このハミルトニアン動力学をデジタル量子コンピュータに実装するには、トロッターステップのサイズをシステムサイズとシミュレーション時間に応じてスケーリングする必要があり、ゲート数が大きくなる。本研究では、古典的最適化問題に対しては、固定された有限のトロッターステップで断熱的発展を実行できることを論じる。この「フロケ断熱発展」は、通常の連続時間の断熱発展と比べてゲート数を数桁削減する。行列積状態シミュレーションを用いて、$D=2$という低い結合次元でも、$3$-正則グラフ上のMax-Cut問題を多数のインスタンスにおいて驚くほど短い実行時間で最適に解けるという数値的証拠を示す。数値結果を外挿することで、この特定の問題において量子コンピュータが古典的な厳密解法や近似解法と競合するために必要なリソースを推定する。 According to the adiabatic theorem of quantum mechanics, a system initially in the ground state of a Hamiltonian remains in the ground state if one slowly changes the Hamiltonian. This can be used in principle to solve hard problems on quantum computers. Generically, however, implementation of this Hamiltonian dynamics on digital quantum computers requires scaling Trotter step size with system size and simulation time, which incurs a large gate count. In this work, we argue that for classical optimization problems, the adiabatic evolution can be performed with a fixed, finite Trotter step. This "Floquet adiabatic evolution" reduces by several orders of magnitude the gate count compared to the usual, continuous-time adiabatic evolution. We give numerical evidence using matrix-product-state simulations that it can optimally solve the Max-Cut problem on $3$-regular graphs in a large number of instances, with surprisingly low runtime, even with bond dimensions as low as $D=2$. Extrapolating our numerical results, we estimate the resources needed for a quantum computer to compete with classical exact or approximate solvers for this specific problem.	翻訳日:2024-04-26 18:31:49 公開日:2024-04-24
# 中心とチャネル状態の双対性 Channel-State duality with centers ( http://arxiv.org/abs/2404.16004v1 ) ライセンス: Link先を確認	Simon Langenscheidt, Daniele Oriti, Eugenia Colafranceschi,	(参考訳) 直和構造を持つヒルベルト空間の場合に対して、通常のチャネル状態双対性から生じる写像の拡張について検討する。この設定は、一般に制約と結びついている中心を持つ代数の表現に現れ、量子多体理論からホログラフィーや量子重力まで多くの物理的応用がある。我々は、状態の非分離性と誘導チャネルの等尺性との間には一般的な関係があることを証明した。また、無限次元ヒルベルト空間上のトレースクラス作用素の代数へのアプローチの一般化も提供する。 We study extensions of the mappings arising in usual Channel-State duality to the case of Hilbert spaces with a direct sum structure. This setting arises in representations of algebras with centers, which are commonly associated with constraints, and it has many physical applications from quantum many-body theory to holography and quantum gravity. We establish that there is a general relationship between non-separability of the state and the isometric properties of the induced channel. We also provide a generalisation of our approach to algebras of trace-class operators on infinite dimensional Hilbert spaces.	翻訳日:2024-04-26 18:31:49 公開日:2024-04-24
# ウェアラブル能動認識のための単モーダル・マルチモーダルセンサフュージョン Unimodal and Multimodal Sensor Fusion for Wearable Activity Recognition ( http://arxiv.org/abs/2404.16005v1 ) ライセンス: Link先を確認	Hymalai Bello,	(参考訳) 異なる感覚のモダリティと複数の位置を組み合わせることは、人間の行動のような複雑な状況に対する統一された認識と理解を形成するのに役立つ。したがって、ヒューマンアクティビティ認識(HAR)は、重複情報と補完情報(Unimodal/Multimodal)を組み合わせることで恩恵を受ける。それでも、簡単な作業ではありません。センサー技術、信号処理、データ融合アルゴリズム、ドメイン固有の知識などの専門知識を含む、多分野のアプローチが必要です。このPh.D.の仕事は、慣性、圧力(音響と大気圧)、およびHARのための繊維の容量感覚のような感覚モーダルを取り入れている。探索されたシナリオは、ジェスチャーと手の位置追跡、顔と頭部のパターン認識、身体姿勢とジェスチャー認識である。選択されたウェアラブルデバイスとセンシングモダリティは、マシンラーニングベースのアルゴリズムと完全に統合されており、その一部は組み込みデバイス、エッジに実装され、リアルタイムでテストされる。 Combining different sensing modalities with multiple positions helps form a unified perception and understanding of complex situations such as human behavior. Hence, human activity recognition (HAR) benefits from combining redundant and complementary information (Unimodal/Multimodal). Even so, it is not an easy task. It requires a multidisciplinary approach, including expertise in sensor technologies, signal processing, data fusion algorithms, and domain-specific knowledge. This Ph.D. work employs sensing modalities such as inertial, pressure (audio and atmospheric pressure), and textile capacitive sensing for HAR. The scenarios explored are gesture and hand position tracking, facial and head pattern recognition, and body posture and gesture recognition. The selected wearable devices and sensing modalities are fully integrated with machine learning-based algorithms, some of which are implemented in the embedded device, on the edge, and tested in real-time.	翻訳日:2024-04-26 18:31:49 公開日:2024-04-24
# MMT-Bench:マルチタスクAGIに向けた大規模ビジョンランゲージモデル評価のための総合的マルチモーダルベンチマーク MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI ( http://arxiv.org/abs/2404.16006v1 ) ライセンス: Link先を確認	Kaining Ying, Fanqing Meng, Jin Wang, Zhiqian Li, Han Lin, Yue Yang, Hao Zhang, Wenbo Zhang, Yuqi Lin, Shuo Liu, Jiayi Lei, Quanfeng Lu, Runjian Chen, Peng Xu, Renrui Zhang, Haozhe Zhang, Peng Gao, Yali Wang, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao,	(参考訳) LVLM(Large Vision-Language Models)は、視覚対話や身体性ナビゲーションといった汎用マルチモーダルアプリケーションにおいて大きな進歩を見せている。しかし、既存のマルチモーダル評価ベンチマークは、初歩的な能力をテストする限られた数のマルチモーダルタスクしかカバーしておらず、LVLMの発展を追跡するには不十分である。本研究では、専門知識と、意図的な視覚認識、ローカライゼーション、推論、計画を必要とする大規模なマルチモーダルタスクにわたってLVLMを評価するための総合的なベンチマークであるMMT-Benchを提案する。 MMT-Benchは、車両運転や身体性ナビゲーションなど、さまざまなマルチモーダルシナリオから厳密にキュレートされた$31,325$件の多肢選択式視覚質問で構成され、マルチモーダル理解における$32$のコアメタタスクと$162$のサブタスクをカバーしている。 MMT-Benchはその広範なタスクカバレッジにより、タスクマップを用いたLVLMの評価を可能にし、ドメイン内およびドメイン外タスクの発見を容易にする。プロプライエタリなGPT-4V、GeminiProVision、オープンソースのInternVL-Chatなど$30$のLVLMを対象とした評価結果は、MMT-Benchがもたらす重大な課題を浮き彫りにした。我々は、MMT-Benchがコミュニティに、汎用マルチモーダルインテリジェンスの実現を目指す次世代マルチモーダル基盤モデルの開発を促すことを期待する。 Large Vision-Language Models (LVLMs) show significant strides in general-purpose multimodal applications such as visual dialogue and embodied navigation. However, existing multimodal evaluation benchmarks cover a limited number of multimodal tasks testing rudimentary capabilities, falling short in tracking LVLM development. In this study, we present MMT-Bench, a comprehensive benchmark designed to assess LVLMs across massive multimodal tasks requiring expert knowledge and deliberate visual recognition, localization, reasoning, and planning. MMT-Bench comprises $31,325$ meticulously curated multi-choice visual questions from various multimodal scenarios such as vehicle driving and embodied navigation, covering $32$ core meta-tasks and $162$ subtasks in multimodal understanding. 
Due to its extensive task coverage, MMT-Bench enables the evaluation of LVLMs using a task map, facilitating the discovery of in- and out-of-domain tasks. Evaluation results involving $30$ LVLMs such as the proprietary GPT-4V, GeminiProVision, and open-sourced InternVL-Chat, underscore the significant challenges posed by MMT-Bench. We anticipate that MMT-Bench will inspire the community to develop next-generation multimodal foundation models aimed at achieving general-purpose multimodal intelligence.	翻訳日:2024-04-26 18:31:49 公開日:2024-04-24
# 射影ベル状態測定による二部系機械的猫状態の生成 Generation of bipartite mechanical cat state by performing projective Bell state measurement ( http://arxiv.org/abs/2404.16007v1 ) ライセンス: Link先を確認	Roson Nongthombam, Urmimala Dewan, Amarendra K. Sarma,	(参考訳) フォトニックおよびフォノニックなSchrödinger猫状態の量子状態準備と測定は、量子計算における代替的な符号化方式への示唆から、大きな関心を集めている。これらの方式はコヒーレント状態の重ね合わせを採用し、2準位系とは対照的に、キャビティまたは機械共振器によって提供される拡張されたヒルベルト空間を活用する。さらに、このような猫状態は、巨視的な系における基本的な量子現象をテストするためのプラットフォームとしても機能する。本研究では、2つの超伝導量子ビットに対する射影ベル状態測定によって実現されるエンタングルメントスワッピング方式を用いて、4つの二部フォノニックベル猫状態を生成する。続いて、CHSH形式を用いて二部猫状態のベル不等式テストを行う。エンタングルした猫状態がエンタングルメントスワッピングによって生成されることから、本手法は連続変数系に基づく複雑な量子ネットワークプロセッサの進展に有望な応用が期待できる。 Quantum state preparation and measurement of photonic and phononic Schrödinger cat states have gathered significant interest due to their implications for alternative encoding schemes in quantum computation. These schemes employ coherent state superpositions, leveraging the expanded Hilbert space provided by cavity or mechanical resonators in contrast to two-level systems. Moreover, such cat states also serve as a platform for testing fundamental quantum phenomena in macroscopic systems. In this study, we generate four bipartite phononic Bell cat states using an entanglement swapping scheme achieved through projective Bell state measurements on two superconducting qubits. Subsequently, we conduct a Bell inequality test on the bipartite cat state using the CHSH formulation. Given that the entangled cat states are generated through entanglement swapping, our approach could hold promising applications for the advancement of complex quantum network processors based on continuous variable systems.	翻訳日:2024-04-26 18:31:49 公開日:2024-04-24
# Gated Sparse Autoencodersによる辞書学習の改善 Improving Dictionary Learning with Gated Sparse Autoencoders ( http://arxiv.org/abs/2404.16014v1 ) ライセンス: Link先を確認	Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda,	(参考訳) 最近の研究で、スパースオートエンコーダ(SAE)は、LMアクティベーションのスパースな線形再構成を見つけることで、言語モデル(LM)のアクティベーションにおける解釈可能な特徴を教師なしで発見する有効な手法であることがわかった。我々は、既存の手法による訓練に対してパレート改善を達成するGated Sparse Autoencoder (Gated SAE) を導入する。 SAEでは、スパース性を促進するために使われるL1ペナルティが、収縮(特徴アクティベーションの系統的な過小評価)など多くの望ましくないバイアスをもたらす。 Gated SAEの重要な洞察は、(a) どの方向を使うかを決定する機能と (b) それらの方向の大きさを推定する機能を分離することである。これにより、L1ペナルティを前者のみに適用でき、望ましくない副作用の範囲を制限できる。最大7BパラメータのLM上でSAEを訓練した結果、典型的なハイパーパラメータ範囲において、Gated SAEは収縮を解消し、同程度に解釈可能であり、同等の再構成忠実度を達成するのに必要な発火特徴の数が半分で済むことがわかった。 Recent work has found that sparse autoencoders (SAEs) are an effective technique for unsupervised discovery of interpretable features in language models' (LMs) activations, by finding sparse, linear reconstructions of LM activations. We introduce the Gated Sparse Autoencoder (Gated SAE), which achieves a Pareto improvement over training with prevailing methods. In SAEs, the L1 penalty used to encourage sparsity introduces many undesirable biases, such as shrinkage -- systematic underestimation of feature activations. The key insight of Gated SAEs is to separate the functionality of (a) determining which directions to use and (b) estimating the magnitudes of those directions: this enables us to apply the L1 penalty only to the former, limiting the scope of undesirable side effects. Through training SAEs on LMs of up to 7B parameters we find that, in typical hyper-parameter ranges, Gated SAEs solve shrinkage, are similarly interpretable, and require half as many firing features to achieve comparable reconstruction fidelity.	翻訳日:2024-04-26 18:31:49 公開日:2024-04-24
# 神経オペレーターは磁気流体力学の局所物理学を学ぶ Neural Operators Learn the Local Physics of Magnetohydrodynamics ( http://arxiv.org/abs/2404.16015v1 ) ライセンス: Link先を確認	Taeyoung Kim, Youngsoo Ha, Myungjoo Kang,	(参考訳) 磁気流体力学(MHD)は、プラズマと導電性流体の力学を記述し、恒星や銀河の構造や進化などの現象を理解するのに不可欠であり、理想的なMHD方程式によるプラズマ運動のための核融合において重要な役割を担っている。これらの双曲型PDEを解くには、複雑な構造と高いコストによる計算上の課題を提示する、洗練された数値的な方法が必要である。最近の進歩は、従来の数値解析のための代理モデルとしてフーリエニューラル演算子(FNO)のようなニューラル演算子を導入している。本研究では, 理想的なMHDの数値フラックスを近似する修正されたフラックスフーリエニューラル演算子モデルについて検討し, 連続推論, サンプル分布外への一般化, 古典的な数値スキームよりも高速な計算を行うことにより, 既存のニューラル演算子モデルより優れている手法を提案する。 Magnetohydrodynamics (MHD) plays a pivotal role in describing the dynamics of plasma and conductive fluids, essential for understanding phenomena such as the structure and evolution of stars and galaxies, and in nuclear fusion for plasma motion through ideal MHD equations. Solving these hyperbolic PDEs requires sophisticated numerical methods, presenting computational challenges due to complex structures and high costs. Recent advances introduce neural operators like the Fourier Neural Operator (FNO) as surrogate models for traditional numerical analyses. This study explores a modified Flux Fourier neural operator model to approximate the numerical flux of ideal MHD, offering a novel approach that outperforms existing neural operator models by enabling continuous inference, generalization outside sampled distributions, and faster computation compared to classical numerical schemes.	翻訳日:2024-04-26 18:31:49 公開日:2024-04-24
# RetinaRegNet:網膜画像登録のための汎用的アプローチ RetinaRegNet: A Versatile Approach for Retinal Image Registration ( http://arxiv.org/abs/2404.16017v1 ) ライセンス: Link先を確認	Vishal Balaji Sivaraman, Muhammad Imran, Qingyue Wei, Preethika Muralidharan, Michelle R. Tamplin, Isabella M. Grumbach, Randy H. Kardon, Jui-Kai Wang, Yuyin Zhou, Wei Shao,	(参考訳) 本稿では,様々な網膜画像登録タスクにおいて最先端性能を実現するRetinaRegNetモデルを提案する。 RetinaRegNetは網膜画像でのトレーニングを必要としない。拡散モデルから派生した画像特徴を用いて、2つの網膜画像間の点対応を確立することから始まる。このプロセスでは、SIFTアルゴリズムとランダム点サンプリングを併用して、移動画像から特徴点を選択する。各選択された特徴点について、その点における特徴ベクトルと固定画像中の全ての画素の特徴ベクトルとの類似性を評価することにより、2D相関マップを算出する。相関マップにおける最も類似度の高い画素は、移動画像の特徴点に対応する。推定点対応における外れ値を取り除くために、まず逆整合制約を適用し、次に変換に基づく外れ値検出器を適用した。この手法は、広く使われているランダムサンプルコンセンサス(RANSAC)の外れ値検出器をかなりの差で上回った。大きな変形に対処するために、我々は2段階の画像登録フレームワークを利用した。第1段階ではホモグラフィ変換を用い,第2段階ではより正確な3次多項式変換を用いた。このモデルの有効性は、カラー眼底画像、フルオレセイン血管造影画像、レーザースペックルフローグラフィ画像の3つの網膜画像データセットで実証された。 RetinaRegNetは、現在の最先端メソッドを3つのデータセットすべてで上回った。特に、大きな変位とスケーリング変形を伴う画像対の登録に有効であった。この革新は網膜画像解析における様々な応用を約束する。私たちのコードはhttps://github.com/mirthAI/RetinaRegNetで公開されています。 We introduce the RetinaRegNet model, which can achieve state-of-the-art performance across various retinal image registration tasks. RetinaRegNet does not require training on any retinal images. It begins by establishing point correspondences between two retinal images using image features derived from diffusion models. This process involves the selection of feature points from the moving image using the SIFT algorithm alongside random point sampling. For each selected feature point, a 2D correlation map is computed by assessing the similarity between the feature vector at that point and the feature vectors of all pixels in the fixed image. The pixel with the highest similarity score in the correlation map corresponds to the feature point in the moving image. To remove outliers in the estimated point correspondences, we first applied an inverse consistency constraint, followed by a transformation-based outlier detector. 
This method proved to outperform the widely used random sample consensus (RANSAC) outlier detector by a significant margin. To handle large deformations, we utilized a two-stage image registration framework. A homography transformation was used in the first stage and a more accurate third-order polynomial transformation was used in the second stage. The model's effectiveness was demonstrated across three retinal image datasets: color fundus images, fluorescein angiography images, and laser speckle flowgraphy images. RetinaRegNet outperformed current state-of-the-art methods in all three datasets. It was especially effective for registering image pairs with large displacement and scaling deformations. This innovation holds promise for various applications in retinal image analysis. Our code is publicly available at https://github.com/mirthAI/RetinaRegNet.	翻訳日:2024-04-26 18:31:49 公開日:2024-04-24
# PRISMアライメントプロジェクト:大規模言語モデルの主観的・多文化的アライメントに関する参加的・代表的・個人的フィードバック The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models ( http://arxiv.org/abs/2404.16019v1 ) ライセンス: Link先を確認	Hannah Rose Kirk, Alexander Whitefield, Paul Röttger, Andrew Bean, Katerina Margatina, Juan Ciro, Rafael Mosquera, Max Bartolo, Adina Williams, He He, Bertie Vidgen, Scott A. Hale,	(参考訳) 人間のフィードバックは、大規模言語モデル(LLM)のアライメントにおいて中心的な役割を果たす。しかしながら、人間のフィードバック収集の方法(どのように)、ドメイン(どこで)、人(誰が)、目的(何のために)について、未解決の疑問が残る。これらの疑問に取り組むために、我々はPRISMを紹介する。PRISMは、75か国の1,500人の多様な参加者の社会人口統計学的属性と表明された選好を、21のLLMとの8,011件のライブ会話における文脈的な選好ときめ細かいフィードバックに対応づける新しいデータセットである。 PRISMの貢献は、(i) 人間のフィードバックデータにおける地理的・人口統計的に幅広い参加、(ii) 集団的福祉を理解するための国勢調査を代表する2つのサンプル(英国と米国)、(iii) すべての評価が詳細な参加者プロファイルに紐づけられた個別化されたフィードバックであり、これによりパーソナライゼーションの探究とサンプルアーティファクトの帰属が可能になる。我々は、価値観に関わる話題や論争的な話題について主観的・多文化的な視点を中心に据えた会話の収集に重点を置いており、そこでは対人的・異文化間の意見の相違が最も大きいと予想される。我々は、対話の多様性、選好の多様性、福祉的帰結の3つのケーススタディを通じてPRISMの有用性を実証し、どの人間がアライメント規範を設定するかが重要であることを示す。私たちは、リッチなコミュニティリソースを提供するだけでなく、AI開発へのより幅広い参加と、技術設計に対するより包括的なアプローチを提唱する。 Human feedback plays a central role in the alignment of Large Language Models (LLMs). However, open questions remain about the methods (how), domains (where), people (who) and objectives (to what end) of human feedback collection. To navigate these questions, we introduce PRISM, a new dataset which maps the sociodemographics and stated preferences of 1,500 diverse participants from 75 countries, to their contextual preferences and fine-grained feedback in 8,011 live conversations with 21 LLMs. PRISM contributes (i) wide geographic and demographic participation in human feedback data; (ii) two census-representative samples for understanding collective welfare (UK and US); and (iii) individualised feedback where every rating is linked to a detailed participant profile, thus permitting exploration of personalisation and attribution of sample artefacts. 
We focus on collecting conversations that centre subjective and multicultural perspectives on value-laden and controversial topics, where we expect the most interpersonal and cross-cultural disagreement. We demonstrate the usefulness of PRISM via three case studies of dialogue diversity, preference diversity, and welfare outcomes, showing that it matters which humans set alignment norms. As well as offering a rich community resource, we advocate for broader participation in AI development and a more inclusive approach to technology design.	翻訳日:2024-04-26 18:31:49 公開日:2024-04-24
# Universal Adversarial TriggersはUniversalではない Universal Adversarial Triggers Are Not Universal ( http://arxiv.org/abs/2404.16020v1 ) ライセンス: Link先を確認	Nicholas Meade, Arkil Patel, Siva Reddy,	(参考訳) 最近の研究は、アライメントされた言語モデルから安全でない応答を引き出すことができる、敵対的トリガと呼ばれるトークン列を見つけるための最適化手順を開発した。これらのトリガは普遍的に転移可能であると考えられており、すなわち、あるモデルで最適化されたトリガは他のモデルをジェイルブレイクできるとされる。本稿では、このような敵対的トリガが普遍的でないことを具体的に示す。我々は13個のオープンモデル間のトリガ転移を広範囲に調査し、転移が一貫しないことを観察する。実験ではさらに、選好最適化によってアライメントされたモデル (APO) とファインチューニングによってアライメントされたモデル (AFT) の間で、敵対的トリガに対するロバスト性に有意な差があることが明らかになった。 APOモデルは、トリガがそのモデルに直接最適化されている場合でも、ジェイルブレイクが非常に難しいことが分かった。一方、AFTモデルは、さまざまな安全でない命令を拒否し表面的には安全に見えるが、敵対的トリガに非常に脆弱であることを示す。最後に、AFTモデルで最適化されたほとんどのトリガは、5つの多様なドメインからの新しい安全でない命令にも一般化し、その脆弱性をさらに強調する。全体として、我々の研究は、アライメントされた言語モデルのより包括的な安全性評価の必要性を強調している。 Recent work has developed optimization procedures to find token sequences, called adversarial triggers, which can elicit unsafe responses from aligned language models. These triggers are believed to be universally transferable, i.e., a trigger optimized on one model can jailbreak other models. In this paper, we concretely show that such adversarial triggers are not universal. We extensively investigate trigger transfer amongst 13 open models and observe inconsistent transfer. Our experiments further reveal a significant difference in robustness to adversarial triggers between models Aligned by Preference Optimization (APO) and models Aligned by Fine-Tuning (AFT). We find that APO models are extremely hard to jailbreak even when the trigger is optimized directly on the model. On the other hand, while AFT models may appear safe on the surface, exhibiting refusals to a range of unsafe instructions, we show that they are highly susceptible to adversarial triggers. Lastly, we observe that most triggers optimized on AFT models also generalize to new unsafe instructions from five diverse domains, further emphasizing their vulnerability. Overall, our work highlights the need for more comprehensive safety evaluations for aligned language models.	翻訳日:2024-04-26 18:31:49 公開日:2024-04-24
# PuLID: コントラストアライメントによるPureとLightning IDのカスタマイズ PuLID: Pure and Lightning ID Customization via Contrastive Alignment ( http://arxiv.org/abs/2404.16022v1 ) ライセンス: Link先を確認	Zinan Guo, Yanze Wu, Zhuowei Chen, Lang Chen, Qian He,	(参考訳) 本稿では,PuLID(Pure and Lightning ID customization)を提案する。標準拡散器にLightning T2Iブランチを組み込むことで、PuLIDはコントラストアライメント損失と正確なID損失の両方を導入し、オリジナルのモデルの破壊を最小限に抑え、高いID忠実度を確保する。実験の結果,PuLIDはIDの忠実度と編集性の両方において優れた性能を示した。 PuLIDのもうひとつの魅力は、ID挿入前後のイメージ要素(例えば、背景、照明、構成、スタイル)を可能な限り一貫した状態に保つことである。コードとモデルはhttps://github.com/ToTheBeginning/PuLIDで入手できる。 We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method for text-to-image generation. By incorporating a Lightning T2I branch with a standard diffusion one, PuLID introduces both contrastive alignment loss and accurate ID loss, minimizing disruption to the original model and ensuring high ID fidelity. Experiments show that PuLID achieves superior performance in both ID fidelity and editability. Another attractive property of PuLID is that the image elements (e.g., background, lighting, composition, and style) before and after the ID insertion are kept as consistent as possible. Codes and models will be available at https://github.com/ToTheBeginning/PuLID	翻訳日:2024-04-26 18:31:49 公開日:2024-04-24
# ベイズ行列正規混合回帰を用いた自動車追従行動の学習 Learning Car-Following Behaviors Using Bayesian Matrix Normal Mixture Regression ( http://arxiv.org/abs/2404.16023v1 ) ライセンス: Link先を確認	Chengyuan Zhang, Kehua Chen, Meixin Zhu, Hai Yang, Lijun Sun,	(参考訳) 自動車追従行動(CF)の学習と理解は, 微視的交通シミュレーションにおいて重要である。従来のCFモデルは、単純ではあるが、しばしば一般化機能に欠けるが、多くのデータ駆動方式は、頑丈さにもかかわらず、限定的な解釈性を持つ「ブラックボックス」として機能する。このギャップを埋めるために、この研究は、CFの挙動に固有の特徴相関と時間ダイナミクスを同時に捉えるベイズ行列正規混合回帰(MNMR)モデルを導入する。このアプローチは、モデルフレームワーク内で行と列の共分散行列を別々に学習することで、人間のドライバ決定プロセスに対する洞察力のある視点を提供する。広範囲な実験を通じて、入力の様々な履歴ステップ、出力の予測ステップ、およびモデル複雑度にまたがるモデルの性能を評価する。その結果,CF中に存在する複雑な相関関係と時間的ダイナミクスを効果的に捉える上で,モデルの有効性を一貫して示すことができた。集中的なケーススタディでは、学習平均と共分散行列を通して、異なる操作条件を識別する、モデルがより優れた解釈可能性を示す。これは、CFシナリオにおける複雑な人間の運転行動を理解する上での我々のモデルの有効性を浮き彫りにするだけでなく、交通シミュレーションや自律運転システムにおけるCF動作の解釈可能性を高めるツールとしての可能性も強調する。 Learning and understanding car-following (CF) behaviors are crucial for microscopic traffic simulation. Traditional CF models, though simple, often lack generalization capabilities, while many data-driven methods, despite their robustness, operate as "black boxes" with limited interpretability. To bridge this gap, this work introduces a Bayesian Matrix Normal Mixture Regression (MNMR) model that simultaneously captures feature correlations and temporal dynamics inherent in CF behaviors. This approach is distinguished by its separate learning of row and column covariance matrices within the model framework, offering an insightful perspective into the human driver decision-making processes. Through extensive experiments, we assess the model's performance across various historical steps of inputs, predictive steps of outputs, and model complexities. The results consistently demonstrate our model's adeptness in effectively capturing the intricate correlations and temporal dynamics present during CF. A focused case study further illustrates the model's outperforming interpretability of identifying distinct operational conditions through the learned mean and covariance matrices. 
This not only underlines our model's effectiveness in understanding complex human driving behaviors in CF scenarios but also highlights its potential as a tool for enhancing the interpretability of CF behaviors in traffic simulations and autonomous driving systems.	翻訳日:2024-04-26 18:31:49 公開日:2024-04-24
# 複屈折スピン-光子界面は偏光絡みを発生させる Birefringent spin-photon interface generates polarization entanglement ( http://arxiv.org/abs/2404.16025v1 ) ライセンス: Link先を確認	Nikita Leppenen, Dmitry S. Smirnov,	(参考訳) マイクロピラーキャビティにおける単体電荷量子ドットの発光に基づくスピンフォトン界面は、フォトニック絡み状態の生成を可能にする。現在の装置は共振器複屈折に悩まされており、スピン光子絡みの発生を制限する。本稿では、異方性キャビティとの界面による光吸収と発光を理論的に研究し、最大励起およびスピン光子絡み合い条件を導出する。本研究では, マイクロピラーキャビティに対して, 共振器モード間の量子ドット共鳴を厳密に調整することにより, マイクロピラーキャビティに対して, スピン光子状態の1と完全量子ドット集団インバージョンとを等しく一致させることができることを示す。このスイートスポットは、最大エンタングル状態で3つの三角形と忠実度を計算することで示すように、多光子クラスター状態を生成するのにも有効である。 A spin-photon interface based on the luminescence of a singly charged quantum dot in a micropillar cavity allows for the creation of photonic entangled states. Current devices suffer from cavity birefringence, which limits the generation of spin-photon entanglement. In this paper, we theoretically study the light absorption and emission by the interface with an anisotropic cavity and derive the maximal excitation and spin-photon entanglement conditions. We show that the concurrence of the spin-photon state equal to one and complete quantum dot population inversion can be reached for a micropillar cavity with any degree of birefringence by tuning the quantum dot resonance strictly between the cavity modes. This sweet spot is also valid for generating a multiphoton cluster state, as we demonstrate by calculating the three-tangle and fidelity with the maximally entangled state.	翻訳日:2024-04-26 18:31:49 公開日:2024-04-24
# 編集可能な画像要素と制御可能な合成 Editable Image Elements for Controllable Synthesis ( http://arxiv.org/abs/2404.16029v1 ) ライセンス: Link先を確認	Jiteng Mu, Michaël Gharbi, Richard Zhang, Eli Shechtman, Nuno Vasconcelos, Xiaolong Wang, Taesung Park,	(参考訳) 拡散モデルはテキスト誘導合成タスクに大きな進歩をもたらした。しかし,拡散モデルの高次元ノイズ入力空間は,画像インバージョンや空間編集に自然に適していないため,ユーザが提供する画像の編集は依然として困難である。本研究では,拡散モデルを用いて入力画像の空間的編集を促進する画像表現を提案する。具体的には、入力画像を忠実に再構築できる「イメージ要素」に入力を符号化することを学ぶ。これらの要素はユーザによって直感的に編集することができ、拡散モデルによって現実的な画像にデコードされる。オブジェクトのリサイズ,再配置,ドラッグング,デオクルージョン,除去,変動,画像合成など,画像編集作業における表現の有効性を示す。プロジェクトページ: https://jitengmu.github.io/Editable_Image_Elements/ Diffusion models have made significant advances in text-guided synthesis tasks. However, editing user-provided images remains challenging, as the high dimensional noise input space of diffusion models is not naturally suited for image inversion or spatial editing. In this work, we propose an image representation that promotes spatial editing of input images using a diffusion model. Concretely, we learn to encode an input into "image elements" that can faithfully reconstruct an input image. These elements can be intuitively edited by a user, and are decoded by a diffusion model into realistic images. We show the effectiveness of our representation on various image editing tasks, such as object resizing, rearrangement, dragging, de-occlusion, removal, variation, and image composition. Project page: https://jitengmu.github.io/Editable_Image_Elements/	翻訳日:2024-04-26 18:22:04 公開日:2024-04-24
# MoDE: クラスタリングによるCLIPデータエキスパート MoDE: CLIP Data Experts via Clustering ( http://arxiv.org/abs/2404.16030v1 ) ライセンス: Link先を確認	Jiawei Ma, Po-Yao Huang, Saining Xie, Shang-Wen Li, Luke Zettlemoyer, Shih-Fu Chang, Wen-Tau Yih, Hu Xu,	(参考訳) 対照的な言語画像事前訓練(CLIP)の成功は、画像とキャプションのペアリングから得られる教師信号に依存しているが、ウェブクローリングされたデータではこのペアリングにノイズが多い傾向がある。データエキスパートの混合(Mixture of Data Experts, MoDE)を提示し,クラスタリングによってCLIPデータエキスパートのシステムを学習する。各データエキスパートは1つのデータクラスタでトレーニングされるため、他のクラスタの偽陰性ノイズに対する感度が低い。推論時には,タスクメタデータとクラスタ条件の相関関係から決定される重みを適用して,それらの出力をアンサンブルする。相関関係を正確に推定するには、1つのクラスタ内のサンプルは意味論的に類似しているべきであるが、データエキスパートの数はトレーニングと推論に無理のない範囲に収める必要がある。そこで、人間の言語におけるオントロジーを考慮し、きめ細かいクラスタ中心を用いて各データエキスパートを粗い粒度で表現することを提案する。実験によると、ViT-B/16の4つのCLIPデータエキスパートは、ゼロショット画像分類においてOpenAI CLIPとOpenCLIPのViT-L/14よりも優れており、トレーニングコストはより低い($<$35\%)。一方、MoDEはすべてのデータエキスパートを非同期にトレーニングすることができ、新しいデータエキスパートを柔軟に組み込むことができる。コードはhttps://github.com/facebookresearch/MetaCLIP/tree/main/modeで公開されている。 The success of contrastive language-image pretraining (CLIP) relies on the supervision from the pairing between images and captions, which tends to be noisy in web-crawled data. We present Mixture of Data Experts (MoDE) and learn a system of CLIP data experts via clustering. Each data expert is trained on one data cluster, being less sensitive to false negative noises in other clusters. At inference time, we ensemble their outputs by applying weights determined through the correlation between task metadata and cluster conditions. To estimate the correlation precisely, the samples in one cluster should be semantically similar, but the number of data experts should still be reasonable for training and inference. As such, we consider the ontology in human language and propose to use fine-grained cluster centers to represent each data expert at a coarse-grained level. Experimental studies show that four CLIP data experts on ViT-B/16 outperform the ViT-L/14 by OpenAI CLIP and OpenCLIP on zero-shot image classification but with less ($<$35\%) training cost. 
Meanwhile, MoDE can train all data experts asynchronously and can flexibly include new data experts. The code is available at https://github.com/facebookresearch/MetaCLIP/tree/main/mode.	翻訳日:2024-04-26 18:22:04 公開日:2024-04-24
# 現実的な知識衝突下における大規模言語モデルの振る舞いの研究 Studying Large Language Model Behaviors Under Realistic Knowledge Conflicts ( http://arxiv.org/abs/2404.16032v1 ) ライセンス: Link先を確認	Evgenii Kortukov, Alexander Rubinstein, Elisa Nguyen, Seong Joon Oh,	(参考訳) Retrieval-augmented Generation (RAG) は、時間的劣化、幻覚、根拠の欠如など、完全なパラメトリック言語モデルの多くの問題を緩和する。 RAGでは、コンテキストで提供される文書からモデルの知識を更新することができる。その結果、モデルのパラメトリック知識とコンテキスト情報が矛盾する場合が生じ、モデルがその知識を常に更新するとは限らない。これまでの研究は、モデルの正しいパラメトリック回答と矛盾する合成文書を作成することによって、知識の衝突を研究していた。本稿では,知識の衝突を現実的な設定で研究するための枠組みを提案する。我々は、実際に矛盾する文書を用いて、誤ったパラメトリック知識を更新する。これは、知識の衝突が実際にどのように起こるのかを反映している。この現実的なシナリオでは、知識更新が失敗する頻度は以前報告されていたよりも低いことが分かった。モデルがそれでも回答を更新できない場合にはパラメトリックバイアスが見られる。すなわち、誤ったパラメトリック回答がコンテキスト中に現れると、知識更新が失敗しやすくなる。これらの結果から, LLMの事実に関するパラメトリック知識は, 読解能力や振る舞いに悪影響を及ぼしうることが示唆された。私たちのコードはhttps://github.com/kortukov/realistic_knowledge_conflicts/で利用可能です。 Retrieval-augmented generation (RAG) mitigates many problems of fully parametric language models, such as temporal degradation, hallucinations, and lack of grounding. In RAG, the model's knowledge can be updated from documents provided in context. This leads to cases of conflict between the model's parametric knowledge and the contextual information, where the model may not always update its knowledge. Previous work studied knowledge conflicts by creating synthetic documents that contradict the model's correct parametric answers. We present a framework for studying knowledge conflicts in a realistic setup. We update incorrect parametric knowledge using real conflicting documents. This reflects how knowledge conflicts arise in practice. In this realistic scenario, we find that knowledge updates fail less often than previously reported. In cases where the models still fail to update their answers, we find a parametric bias: the incorrect parametric answer appearing in context makes the knowledge update likelier to fail. These results suggest that the factual parametric knowledge of LLMs can negatively influence their reading abilities and behaviors. 
Our code is available at https://github.com/kortukov/realistic_knowledge_conflicts/.	翻訳日:2024-04-26 18:22:04 公開日:2024-04-24
# Cantor:MLLMのマルチモーダルチェイン・オブ・ソートを引き出す Cantor: Inspiring Multimodal Chain-of-Thought of MLLM ( http://arxiv.org/abs/2404.16033v1 ) ライセンス: Link先を確認	Timin Gao, Peixian Chen, Mengdan Zhang, Chaoyou Fu, Yunhang Shen, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Xing Sun, Liujuan Cao, Rongrong Ji,	(参考訳) チェーン・オブ・ソート(CoT)手法によって強化された大規模言語モデル(LLM)の出現により、視覚的推論問題は通常、扱いやすいサブタスクに分解され、様々な外部ツールで順次取り組まれる。しかし、このようなパラダイムは、視覚情報の不足と、包括的推論に必要な抽象的な要約を提供できない低レベルの知覚ツールの限界により、意思決定における「決定の幻覚」の可能性という課題に直面している。視覚的コンテキストの獲得と論理的推論を収束させることが、視覚的推論タスクに取り組む上で極めて重要であると我々は主張する。本稿では,マルチモーダル大規模言語モデル(MLLM)とその認知能力を用いて複雑な視覚的推論タスクを解くために,マルチモーダルCoTの領域を掘り下げる。そこで我々は、知覚・決定アーキテクチャを特徴とする、Cantorと呼ばれる革新的なマルチモーダルCoTフレームワークを提案する。 Cantorはまず意思決定ジェネレータとして機能し、視覚入力を統合して画像と問題を分析し、実際のコンテキストとの密接な整合性を確保する。さらに、CantorはMLLMの高度な認知機能を活用し、高いレベルの情報を引き出すための多面的専門家として機能し、CoT生成プロセスを強化する。広範な実験により提案フレームワークの有効性を実証し,微調整や正解の根拠を必要とせずに,2つの複雑な視覚的推論データセットにおいてマルチモーダルCoT性能が大幅に向上することを示す。プロジェクトページ: https://ggg0919.github.io/cantor/。 With the advent of large language models (LLMs) enhanced by the chain-of-thought (CoT) methodology, visual reasoning problems are usually decomposed into manageable sub-tasks and tackled sequentially with various external tools. However, such a paradigm faces the challenge of the potential "determining hallucinations" in decision-making due to insufficient visual information and the limitation of low-level perception tools that fail to provide abstract summaries necessary for comprehensive reasoning. We argue that converging visual context acquisition and logical reasoning is pivotal for tackling visual reasoning tasks. This paper delves into the realm of multimodal CoT to solve intricate visual reasoning tasks with multimodal large language models (MLLMs) and their cognitive capability. To this end, we propose an innovative multimodal CoT framework, termed Cantor, characterized by a perception-decision architecture. Cantor first acts as a decision generator and integrates visual inputs to analyze the image and problem, ensuring a closer alignment with the actual context. 
Furthermore, Cantor leverages the advanced cognitive functions of MLLMs to perform as multifaceted experts for deriving higher-level information, enhancing the CoT generation process. Our extensive experiments demonstrate the efficacy of the proposed framework, showing significant improvements in multimodal CoT performance across two complex visual reasoning datasets, without necessitating fine-tuning or ground-truth rationales. Project Page: https://ggg0919.github.io/cantor/ .	翻訳日:2024-04-26 18:22:04 公開日:2024-04-24
# MaGGIe:Masked Guided Gradual Human Instance Matting MaGGIe: Masked Guided Gradual Human Instance Matting ( http://arxiv.org/abs/2404.16035v1 ) ライセンス: Link先を確認	Chuong Huynh, Seoung Wug Oh, Abhinav Shrivastava, Joon-Young Lee,	(参考訳) ヒューマン・マッティング(Human matting)は、画像およびビデオ処理における基礎的なタスクであり、入力から人間の前景ピクセルを抽出する。以前の作業では、追加のガイダンスによって精度を向上させるか、フレーム間の単一インスタンスの時間的一貫性を改善するかのどちらかだった。我々は,計算コスト,精度,整合性を維持しつつ,ヒトのインスタンスごとのα行列を段階的に予測する新しいフレームワークであるMasked Guided Gradual Human Instance Mattingを提案する。提案手法はトランスフォーマーアテンションやスパースコンボリューションなど,現代的なアーキテクチャを活用して,メモリやレイテンシを爆発させることなく,すべてのインスタンスマットを同時に出力する。提案手法は,マルチインスタンスシナリオにおいて一定の推論コストを抑えながら,提案したベンチマーク上で頑健かつ多目的な性能を実現する。高品質な画像とビデオのマッチングベンチマークにより、実世界のシナリオにおけるモデルの一般化を促進するために、公開されているソースからの新規なマルチインスタンス合成アプローチが導入された。 Human matting is a foundation task in image and video processing, where human foreground pixels are extracted from the input. Prior works either improve the accuracy by additional guidance or improve the temporal consistency of a single instance across frames. We propose a new framework MaGGIe, Masked Guided Gradual Human Instance Matting, which predicts alpha mattes progressively for each human instances while maintaining the computational cost, precision, and consistency. Our method leverages modern architectures, including transformer attention and sparse convolution, to output all instance mattes simultaneously without exploding memory and latency. Although keeping constant inference costs in the multiple-instance scenario, our framework achieves robust and versatile performance on our proposed synthesized benchmarks. With the higher quality image and video matting benchmarks, the novel multi-instance synthesis approach from publicly available sources is introduced to increase the generalization of models in real-world scenarios.	翻訳日:2024-04-26 18:22:04 公開日:2024-04-24
# TLA+仕様に対する分散プログラムのトレースの検証 Validating Traces of Distributed Programs Against TLA+ Specifications ( http://arxiv.org/abs/2404.16075v1 ) ライセンス: Link先を確認	Horatiu Cirstea, Markus A. Kuppe, Benjamin Loillier, Stephan Merz,	(参考訳) TLA+は、分散アルゴリズムを含むシステムを特定するための形式言語であり、強力な検証ツールによってサポートされている。本稿では,分散プログラムのトレースをTLA+で記述した高レベル仕様に関連付けるためのフレームワークを提案する。この問題は、TLCモデルチェッカーを用いて実現した制約付きモデルチェック問題に還元される。我々のフレームワークは,実行のトレースを記録するためにJavaプログラムを計測するAPI,それらのトレースを仕様に関連付けるために使用するTLA+演算子のコレクション,モデルチェッカーを実行するためのスクリプトで構成される。重要な点として、トレースには完全な値ではなく仕様変数への更新のみが含まれ、すべての変数に値を与える必要はない。提案手法を複数の分散プログラムに適用し,すべてのケースにおいて仕様と実装の相違を検出した。本稿では,これらの相違の原因,TLCが出力する判定の解釈方法,実装開発においてトレース検証の結果を考慮する方法について論じる。 TLA+ is a formal language for specifying systems, including distributed algorithms, that is supported by powerful verification tools. In this work we present a framework for relating traces of distributed programs to high-level specifications written in TLA+. The problem is reduced to a constrained model checking problem, realized using the TLC model checker. Our framework consists of an API for instrumenting Java programs in order to record traces of executions, of a collection of TLA+ operators that are used for relating those traces to specifications, and of scripts for running the model checker. Crucially, traces only contain updates to specification variables rather than full values, and it is not necessary to provide values for all variables. We have applied our approach to several distributed programs, detecting discrepancies between the specifications and the implementations in all cases. We discuss reasons for these discrepancies, how to interpret the verdict produced by TLC, and how to take into account the results of trace validation for implementation development.	翻訳日:2024-04-26 18:22:04 公開日:2024-04-24
# 噂検出のためのセマンティック進化強化グラフオートエンコーダ Semantic Evolvement Enhanced Graph Autoencoder for Rumor Detection ( http://arxiv.org/abs/2404.16076v1 ) ライセンス: Link先を確認	Xiang Tao, Liang Wang, Qiang Liu, Shu Wu, Liang Wang,	(参考訳) ソーシャルメディア上の噂が急速に広まる中、噂検出は極めて重要な課題となっている。近年,テキスト情報とイベントの伝播構造を利用した多数の噂検出モデルが提案されている。しかし,これらの手法は伝播過程における事象の意味的進化情報の重要性を見落としており,この情報は教師付き訓練パラダイムや従来の噂検出手法では十分に学習することが難しい場合が多い。この問題に対処するため,本稿では,新しい意味進化強化グラフオートエンコーダ(GARD)モデルを提案する。このモデルは、特定のグラフオートエンコーダと再構成戦略を通じて、局所的な意味変化とグローバルな意味進化情報をキャプチャすることで、事象の意味進化情報を学ぶ。セマンティック進化情報と伝搬構造情報を組み合わせることで、イベント伝播の包括的理解を達成し、正確かつ堅牢な検出を行うとともに、初期段階のセマンティック進化情報をキャプチャすることで、より早い段階で噂を検出する。さらに、噂と非噂の異なるパターンを学習するモデルの能力を高めるために、モデルの性能をさらに向上させる一様性正則化項を導入する。 3つの公開ベンチマークデータセットによる実験結果から、GARD法が全体的な性能と早期噂検出の両方において最先端のアプローチよりも優れていることが確認された。 Due to the rapid spread of rumors on social media, rumor detection has become an extremely important challenge. Recently, numerous rumor detection models which utilize textual information and the propagation structure of events have been proposed. However, these methods overlook the importance of semantic evolvement information of events in the propagation process, which is often challenging to be truly learned in supervised training paradigms and traditional rumor detection methods. To address this issue, we propose a novel semantic evolvement enhanced Graph Autoencoder for Rumor Detection (GARD) model in this paper. The model learns semantic evolvement information of events by capturing local semantic changes and global semantic evolvement information through specific graph autoencoder and reconstruction strategies. By combining semantic evolvement information and propagation structure information, the model achieves a comprehensive understanding of event propagation and performs accurate and robust detection, while also detecting rumors earlier by capturing semantic evolvement information in the early stages. 
Moreover, in order to enhance the model's ability to learn the distinct patterns of rumors and non-rumors, we introduce a uniformity regularizer to further improve the model's performance. Experimental results on three public benchmark datasets confirm the superiority of our GARD method over the state-of-the-art approaches in both overall performance and early rumor detection.	翻訳日:2024-04-26 18:22:04 公開日:2024-04-24
# ゼロショット強化学習によるスーパーコンパイラコードの最適化 Supercompiler Code Optimization with Zero-Shot Reinforcement Learning ( http://arxiv.org/abs/2404.16077v1 ) ライセンス: Link先を確認	Jialong Wu, Chaoyi Deng, Jianmin Wang, Mingsheng Long,	(参考訳) コンパイラーにおける効果的なコード最適化は、コンピュータとソフトウェア工学において中心的な役割を果たす。コンパイラはユーザの介入を必要とせずに自動的に最適化空間を検索できるが、検索が遅くて面倒なので、これは標準的プラクティスではない。ここでは、大規模なデータに基づいて広範囲に訓練された人工知能エージェントであるCodeZeroを紹介し、エージェントの1回の試行において、各プログラムの効果的な最適化戦略を即時に生成する。可能なテストプログラムの膨大な範囲を克服するために、品質、自然性、多様性を重視したトレーニングプログラムの大規模なデータセットを作成します。最適化可能な膨大なスペースに対処するため,コンパイラ環境のワールドモデルと対話することで,エージェントをサンプル効率で訓練する深層強化学習を適用した。ベンチマークスイートと本番レベルのコード最適化問題の両方の評価は、エージェントのスーパーコンパイラのパフォーマンスとゼロショットの一般化能力を示し、コンパイラの専門家が設計したビルトイン最適化よりも優れています。われわれの手法は、人工知能の工学的潜在能力を生かし、コード最適化の領域で機械学習技術をスケールする方法を開拓する。 Effective code optimization in compilers plays a central role in computer and software engineering. While compilers can be made to automatically search the optimization space without the need for user interventions, this is not a standard practice since the search is slow and cumbersome. Here we present CodeZero, an artificial intelligence agent trained extensively on large data to produce effective optimization strategies instantly for each program in a single trial of the agent. To overcome the huge range of possible test programs, we prepare a large dataset of training programs that emphasize quality, naturalness, and diversity. To tackle the vast space of possible optimizations, we adapt deep reinforcement learning to train the agent in a sample-efficient manner through interacting with a world model of the compiler environment. Evaluation on both benchmark suites and production-level code optimization problems demonstrates our agent's supercompiler performances and zero-shot generalization abilities, outperforming built-in optimization options designed by compiler experts. 
Our methodology kindles the great potential of artificial intelligence for engineering and paves the way for scaling machine learning techniques in the realm of code optimization.	翻訳日:2024-04-26 18:22:04 公開日:2024-04-24
# 階層的時間的抽象化による世界モデル学習:確率論的視点 Learning World Models With Hierarchical Temporal Abstractions: A Probabilistic Perspective ( http://arxiv.org/abs/2404.16078v1 ) ライセンス: Link先を確認	Vaisakh Shaj,	(参考訳) タイプ2の推論能力を備え、人間の知能を再現できるマシンは、内部世界モデルを用いて、複数のレベルの時空間的抽象化とスケールで推論できるべきである。現実世界のダイナミクスに固有の因果的階層を正確に反映した、そのような内部世界モデルを開発するための形式主義を考案することは、人工知能と機械学習の分野における重要な研究課題である。この論文は、内部世界モデルとして広く使われている状態空間モデル(SSM)のいくつかの限界を特定し、これらの欠点に対処するために、Hidden-Parameter SSMとMulti-Time Scale SSMという2つの新しい確率的形式主義を提案する。両方の形式主義におけるグラフィカルモデルの構造は、信念伝播を用いたスケーラブルで厳密な確率的推論と、時間方向の誤差逆伝播(backpropagation through time)によるエンドツーエンドの学習を容易にする。このアプローチは、複数の時間的抽象化とスケールにわたる非定常力学を表現することができるスケーラブルで適応的な階層的世界モデルの開発を可能にする。さらに、これらの確率論的形式主義は世界状態の不確実性の概念を統合し、現実世界の確率的性質をエミュレートし、その予測に対する確信度を定量化するシステムの能力を向上させる。論文はまた、これらの形式主義がベイズ脳仮説と予測処理に関する関連する神経科学の文献とどのように整合しているかについても論じている。様々な実・模擬ロボットを用いた実験により,我々の形式主義が長距離の将来予測において現代のTransformerの変種の性能に匹敵し,多くの場合それを上回ることが実証された。論文の最後では、現在のモデルの限界を振り返り、今後の研究の方向性を示唆する。 Machines that can replicate human intelligence with type 2 reasoning capabilities should be able to reason at multiple levels of spatio-temporal abstractions and scales using internal world models. Devising formalisms to develop such internal world models, which accurately reflect the causal hierarchies inherent in the dynamics of the real world, is a critical research challenge in the domains of artificial intelligence and machine learning. This thesis identifies several limitations with the prevalent use of state space models (SSMs) as internal world models and proposes two new probabilistic formalisms, namely Hidden-Parameter SSMs and Multi-Time Scale SSMs, to address these drawbacks. The structure of graphical models in both formalisms facilitates scalable exact probabilistic inference using belief propagation, as well as end-to-end learning via backpropagation through time. This approach permits the development of scalable, adaptive hierarchical world models capable of representing nonstationary dynamics across multiple temporal abstractions and scales. 
Moreover, these probabilistic formalisms integrate the concept of uncertainty in world states, thus improving the system's capacity to emulate the stochastic nature of the real world and quantify the confidence in its predictions. The thesis also discusses how these formalisms are in line with related neuroscience literature on the Bayesian brain hypothesis and predictive processing. Our experiments on various real and simulated robots demonstrate that our formalisms can match and in many cases exceed the performance of contemporary transformer variants in making long-range future predictions. We conclude the thesis by reflecting on the limitations of our current models and suggesting directions for future research.	翻訳日:2024-04-26 18:12:21 公開日:2024-04-24
# 反射共焦点顕微鏡のAI駆動解析による診断の強化 Enhancing Diagnosis through AI-driven Analysis of Reflectance Confocal Microscopy ( http://arxiv.org/abs/2404.16080v1 ) ライセンス: Link先を確認	Hong-Jun Yoon, Chris Keum, Alexander Witkowski, Joanna Ludzik, Tracy Petrie, Heidi A. Hanson, Sancy A. Leachman,	(参考訳) 反射共焦点顕微鏡(英: Reflectance Confocal Microscopy、RCM)は、生体医学研究や臨床皮膚学で用いられる非侵襲的イメージング技術である。皮膚と表皮組織の高解像度画像を仮想的に提供し、物理的生検の必要性を減らす。 RCMはレーザー光源を用いて組織を照明し、反射した光を捉え、様々な深さの顕微鏡構造の詳細画像を生成する。近年の研究では、RCM画像の解析のためのAIと機械学習、特にCNNについて研究されている。本研究は, 臨床上重要な領域を同定し, 皮膚科医に効果的な画像解釈と診断信頼性を高めるためのセグメンテーション戦略を提案する。このアプローチは皮膚科の診断と治療を進めることを約束する。 Reflectance Confocal Microscopy (RCM) is a non-invasive imaging technique used in biomedical research and clinical dermatology. It provides virtual high-resolution images of the skin and superficial tissues, reducing the need for physical biopsies. RCM employs a laser light source to illuminate the tissue, capturing the reflected light to generate detailed images of microscopic structures at various depths. Recent studies explored AI and machine learning, particularly CNNs, for analyzing RCM images. Our study proposes a segmentation strategy based on textural features to identify clinically significant regions, empowering dermatologists in effective image interpretation and boosting diagnostic confidence. This approach promises to advance dermatological diagnosis and treatment.	翻訳日:2024-04-26 18:12:20 公開日:2024-04-24
# 適応量子回路を用いた行列積状態の定数深さ準備 Constant-depth preparation of matrix product states with adaptive quantum circuits ( http://arxiv.org/abs/2404.16083v1 ) ライセンス: Link先を確認	Kevin C. Smith, Abid Khan, Bryan K. Clark, S. M. Girvin, Tzu-Chieh Wei,	(参考訳) 局所的なユニタリゲート、ミッドサーキット測定、フィードフォワード演算を組み合わせた適応量子回路は、特に浅い深さの回路に制限された短期量子デバイスにおいて、効率的な状態準備のための有望な経路として最近登場した。行列積状態 (MPS) は多体交絡状態の重要なクラスを構成し、一次元のギャップを持つ局所ハミルトニアンの基底状態を効率的に記述し、近年の多くの量子アルゴリズムにおける応用を見つける。近年、MPSのパラダイム的な例であるAKLT状態は、非ゼロ相関長(Smith et al , PRX Quantum 4, 020315 (2023))による局所的なユニタリゲートの適応量子回路で正確に準備できることが示されている。本研究は,本手法の範囲を広くし,一元回路のみに依存する最適準備プロトコルよりも高い精度で,多種多様なMPSを一定深度適応量子回路で正確に作成できることを実証する。このクラスは、短距離および長距離の絡み合ったMPS、対称性保護トポロジカル(SPT)および対称性破壊状態、有限アベリア、非アベリアおよび連続対称性を持つMPS、MBQCの資源状態、調整可能な相関長を持つ状態の族を含むことを示す。さらに、ランダムMPSや特定のSPTフェーズでMPSを生成するような、一定の深さのサンプリングプロトコルを設計するためのフレームワークの有用性について述べる。我々は、特定のMPSが一定時間で準備できる十分な条件を示し、グローバルなオンサイト対称性が中心的な役割を果たす。この研究は、多体絡み合った状態を効率的に準備するための適応量子回路の膨大な可能性を実証し、既知のプロトコルより優れた明示的なアルゴリズムを提供し、重要な種類の状態を作成する。 Adaptive quantum circuits, which combine local unitary gates, midcircuit measurements, and feedforward operations, have recently emerged as a promising avenue for efficient state preparation, particularly on near-term quantum devices limited to shallow-depth circuits. Matrix product states (MPS) comprise a significant class of many-body entangled states, efficiently describing the ground states of one-dimensional gapped local Hamiltonians and finding applications in a number of recent quantum algorithms. Recently, it was shown that the AKLT state -- a paradigmatic example of an MPS -- can be exactly prepared with an adaptive quantum circuit of constant-depth, an impossible feat with local unitary gates due to its nonzero correlation length [Smith et al., PRX Quantum 4, 020315 (2023)]. 
In this work, we broaden the scope of this approach and demonstrate that a diverse class of MPS can be exactly prepared using constant-depth adaptive quantum circuits, outperforming optimal preparation protocols that rely on unitary circuits alone. We show that this class includes short- and long-ranged entangled MPS, symmetry-protected topological (SPT) and symmetry-broken states, MPS with finite Abelian, non-Abelian, and continuous symmetries, resource states for MBQC, and families of states with tunable correlation length. Moreover, we illustrate the utility of our framework for designing constant-depth sampling protocols, such as for random MPS or for generating MPS in a particular SPT phase. We present sufficient conditions for particular MPS to be preparable in constant time, with global on-site symmetry playing a pivotal role. Altogether, this work demonstrates the immense promise of adaptive quantum circuits for efficiently preparing many-body entangled states and provides explicit algorithms that outperform known protocols to prepare an essential class of states.	翻訳日:2024-04-26 18:12:20 公開日:2024-04-24
# 確率量子制御の統計力学:$d$-adic Rényi回路 Statistical Mechanics of Stochastic Quantum Control: $d$-adic Rényi Circuits ( http://arxiv.org/abs/2404.16087v1 ) ライセンス: Link先を確認	Andrew A. Allocca, Conner LeMaire, Thomas Iadecola, Justin H. Wilson,	(参考訳) オンサイトのヒルベルト空間次元が大きい多体系における量子情報の力学は、有効統計力学モデルによる見通しのよい記述を許す。この事実に動機付けられて、確率的制御を伴う古典的にカオス的な$d$-adic R\'{e}nyi写像、quditに対するこの写像の量子アナログ、ランダムグラフ上のポッツモデルという3つの異なるモデルの間の関係を明らかにする。古典的モデルとその量子アナログは、系を秩序化しようとするランダムに適用される制御写像によって駆動される、カオス相と制御相の間の遷移を共有する。量子モデルでは、制御写像は測定を必要とし、その測定が同時に後期定常状態のエンタングルメント量の相転移を駆動する。制御遷移とエンタングルメント遷移の相互作用を探るため、量子モデルから有効ポッツモデルを導出し、両方の遷移を検出する情報理論的な量の解析に利用する。エンタングルメント遷移は、他の測定誘起相転移と同様にボンドパーコレーション普遍性クラスに属することが分かり、制御遷移は古典的なランダムウォークによって支配される。これら2つの相転移はモデルパラメータの関数として融合し、これは量子モデルに関する以前の小規模な数値的研究で観測された挙動と一致する。 The dynamics of quantum information in many-body systems with large onsite Hilbert space dimension admits an enlightening description in terms of effective statistical mechanics models. Motivated by this fact, we reveal a connection between three separate models: the classically chaotic $d$-adic R\'{e}nyi map with stochastic control, a quantum analog of this map for qudits, and a Potts model on a random graph. The classical model and its quantum analog share a transition between chaotic and controlled phases, driven by a randomly applied control map that attempts to order the system. In the quantum model, the control map necessitates measurements that concurrently drive a phase transition in the entanglement content of the late-time steady state. To explore the interplay of the control and entanglement transitions, we derive an effective Potts model from the quantum model and use it to probe information-theoretic quantities that witness both transitions. The entanglement transition is found to be in the bond-percolation universality class, consistent with other measurement-induced phase transitions, while the control transition is governed by a classical random walk. 
These two phase transitions merge as a function of model parameters, consistent with behavior observed in previous small-size numerical studies of the quantum model.	翻訳日:2024-04-26 18:12:20 公開日:2024-04-24
# 行列積状態経路積分から$J_1$-$J_2$鎖の臨界理論への一般化ハルデン写像 A Generalised Haldane Map from the Matrix Product State Path Integral to the Critical Theory of the $J_1$-$J_2$ Chain ( http://arxiv.org/abs/2404.16088v1 ) ライセンス: Link先を確認	F. Azad, Adam J. McRoberts, Chris Hooley, A. G. Green,	(参考訳) 行列積状態 (MPS) 上に構築された経路積分を用いて, $J_1$-$J_2$ スピン$1/2$鎖について検討した。非自明なエンタングルメント構造のおかげで、MPSアンザッツは半古典的な鞍点レベルでもモデルの主要な相を捉え、変分状態としては、アーベルボソン化によって得られる場の理論とよく一致する。半古典的なレベルを超えると、MPSアンザッツは臨界相の場の理論の物理的に動機付けられた導出を容易にすることを示す。すなわち、連続極限を慎重に取ること(ハルデン写像の一般化)により、MPS経路積分から、正しいトポロジカル項と創発的な$SO(4)$対称性を持つ場の理論を復元し、微視的状態とトポロジカルな場の理論的構造を構成的に結びつける。さらに、二量体化転移はMPSの定式化において特に明瞭であり、明示的な二量体化ポテンシャルがrelevantとなって、磁気的ゆらぎにギャップを開く。 We study the $J_1$-$J_2$ spin-$1/2$ chain using a path integral constructed over matrix product states (MPS). By virtue of its non-trivial entanglement structure, the MPS ansatz captures the key phases of the model even at a semi-classical, saddle-point level, and, as a variational state, is in good agreement with the field theory obtained by abelian bosonisation. Going beyond the semi-classical level, we show that the MPS ansatz facilitates a physically-motivated derivation of the field theory of the critical phase: by carefully taking the continuum limit -- a generalisation of the Haldane map -- we recover from the MPS path integral a field theory with the correct topological term and emergent $SO(4)$ symmetry, constructively linking the microscopic states and topological field-theoretic structures. Moreover, the dimerisation transition is particularly clear in the MPS formulation -- an explicit dimerisation potential becomes relevant, gapping out the magnetic fluctuations.	翻訳日:2024-04-26 18:12:20 公開日:2024-04-24
# 測定誘起遷移近傍の長距離多者間エンタングルメント Long-range multipartite entanglement near measurement-induced transitions ( http://arxiv.org/abs/2404.16095v1 ) ライセンス: Link先を確認	Sebastien J Avakian, T. Pereg-Barnea, William Witczak-Krempa,	(参考訳) 測定は量子システムに大きな影響を与え、非平衡の新しい物質状態を作るのに使用できる。ここでは、ユニタリと測定を含む量子回路に現れる多者間エンタングルメントの構造について検討する。測定とユニタリ発展のバランスによって、多者間エンタングルメントが非監視系で見られるよりもはるかに長い距離にまで広がり、エンタングルメントの通常の運命を回避できることを述べる。本研究では,全域グラフに基づくグラフィカルな表現を導入し,一般的な部分領域に対する真の多者間エンタングルメントの発展を推測できるようにする。 1次元の測定誘起動的相転移を実現する回路を例にとって本研究の知見を示し、そこではあらゆる距離で真の3者間エンタングルメントが見出される。 2者および4者の場合も例によって扱う。最後に,我々のアプローチが量子回路とアーキテクチャの幅広いクラスに対して,エンタングルメントのダイナミクスに関する基本的な知見を提供する方法について論じる。 Measurements profoundly impact quantum systems, and can be used to create new states of matter out of equilibrium. Here, we investigate the multipartite entanglement structure that emerges in quantum circuits involving unitaries and measurements. We describe how a balance between measurements and unitary evolution can lead to multipartite entanglement spreading to distances far greater than what is found in non-monitored systems, thus evading the usual fate of entanglement. We introduce a graphical representation based on spanning graphs that allows one to infer the evolution of genuine multipartite entanglement for general subregions. We exemplify our findings on circuits that realize a 1d measurement-induced dynamical phase transition, where we find genuine 3-party entanglement at all separations. The 2- and 4-party cases are also covered with examples. Finally, we discuss how our approach can provide fundamental insights regarding entanglement dynamics for a wide class of quantum circuits and architectures.	翻訳日:2024-04-26 18:12:20 公開日:2024-04-24
# 半古典ブラックホールの相対的状態カウント Relative state-counting for semiclassical black holes ( http://arxiv.org/abs/2404.16098v1 ) ライセンス: Link先を確認	Chris Akers, Jonathan Sorce,	(参考訳) 摂動量子重力の特定の状態間のエントロピー差は、紫外線の完了を指定せずに計算できることが示されている。これは、エントロピー差が定義されるが絶対エントロピーが定義されない古典的な統計力学の状況と類似している。しかし、古典的な統計力学とは異なり、摂動量子重力で計算されるエントロピー差は明確な物理的解釈を持っていない。ここでは、エントロピー差を状態の相対的数え上げと解釈できる摂動ブラックホール状態の族を構築する。概念的には、この論文は固定されたブラックホール背景の質量ゆらぎの代数から始まり、これはI型代数であるが、これは因子ではなく、従ってエントロピーの標準的定義を持たないことを指摘している。以前の研究と同様に、質量ゆらぎと量子物質を結合することは、エントロピー差(絶対エントロピーではない)がよく定義されるタイプIIに質量代数を埋め込む。すると、質量ゆらぎのマイクロカノニカル波動関数の場合、タイプIIエントロピー差はゲージ不変ユニタリを用いて1つのマイクロカノニカル窓をもう1つにマップするのに必要となる余剰ヒルベルト空間の次元の対数と等しいことが示される。この論文は、フォン・ノイマンのエントロピー差は物理的解釈を持たないが、「ワンショット」エントロピー差は成立する、より一般的な状態のクラスにおけるタイプIIエントロピー差に関するコメントで締めくくっている。 It has been shown that entropy differences between certain states of perturbative quantum gravity can be computed without specifying an ultraviolet completion. This is analogous to the situation in classical statistical mechanics, where entropy differences are defined but absolute entropy is not. Unlike in classical statistical mechanics, however, the entropy differences computed in perturbative quantum gravity do not have a clear physical interpretation. Here we construct a family of perturbative black hole states for which the entropy difference can be interpreted as a relative counting of states. Conceptually, this paper begins with the algebra of mass fluctuations around a fixed black hole background, and points out that while this is a type I algebra, it is not a factor and therefore has no canonical definition of entropy. As in previous work, coupling the mass fluctuations to quantum matter embeds the mass algebra within a type II factor, in which entropy differences (but not absolute entropies) are well defined. 
It is then shown that for microcanonical wavefunctions of mass fluctuation, the type II entropy difference equals the logarithm of the dimension of the extra Hilbert space that is needed to map one microcanonical window to another using gauge-invariant unitaries. The paper closes with comments on type II entropy difference in a more general class of states, where the von Neumann entropy difference does not have a physical interpretation, but "one-shot" entropy differences do.	翻訳日:2024-04-26 18:12:20 公開日:2024-04-24
# 多変量忠実度 Multivariate Fidelities ( http://arxiv.org/abs/2404.16101v1 ) ライセンス: Link先を確認	Theshani Nuradha, Hemant K. Mishra, Felix Leditzky, Mark M. Wilde,	(参考訳) 本稿の主な貢献は、いくつかの多変量量子忠実度を導入し、それらがウルマン忠実度とホレボ忠実度の性質の自然な拡張である望ましい性質を満たすことを示すことである。可換な状態に対して平均ペアワイズ忠実度に帰着する3つの変種、すなわち平均ペアワイズ$z$-忠実度、多変量半定値計画(SDP)忠実度、および既存の秘匿性尺度に着想を得た多変量忠実度を提案する。 2つ目は、ウルマン忠実度のSDP定式化を3つ以上の状態に拡張することで得られる。これら3つの変種はすべて以下の性質を満たす。 (i) 可換な状態に対して多変量古典忠実度に帰着すること、 (ii) データ処理不等式、 (iii) 状態の置換に対する不変性、 (iv) 値が区間$[0,1]$にあり、忠実性(値が1に等しいのはすべての状態が等しい場合に限る)と直交性(値が0に等しいのは状態が互いに直交する場合に限る)を満たすこと、 (v) 直和性、 (vi) 同時凹性、 (vii) 一定の条件下での一様連続性の上界。さらに、これらの異なる変種を関係づける不等式を確立し、これらすべての定義が可換な状態に対して平均ペアワイズ忠実度と一致することを明確にする。最後に、多変量log-Euclidean忠実度という別の多変量忠実度を導入する。これはMatusita多変量忠実度の量子一般化である。また、これが上述の望ましい性質のほとんどを満たし、多変量log-Euclideanダイバージェンスの関数であり、任意に変化する帰無仮説を持つ量子仮説検定の観点から操作的解釈を持つことを示す。 The main contribution of our paper is to introduce a number of multivariate quantum fidelities and show that they satisfy several desirable properties that are natural extensions of those of the Uhlmann and Holevo fidelities. We propose three variants that reduce to the average pairwise fidelity for commuting states: average pairwise $z$-fidelities, the multivariate semi-definite programming (SDP) fidelity, and a multivariate fidelity inspired by an existing secrecy measure. The second one is obtained by extending the SDP formulation of the Uhlmann fidelity to more than two states. All three of these variants satisfy the following properties: (i) reduction to multivariate classical fidelities for commuting states, (ii) the data-processing inequality, (iii) invariance under permutations of the states, (iv) its values are in the interval $[0,1]$; they are faithful, that is, their values are equal to one if and only if all the states are equal, and they satisfy orthogonality, that is their values are equal to zero if and only if the states are mutually orthogonal to each other, (v) direct-sum property, (vi) joint concavity, and (vii) uniform continuity bounds under certain conditions. 
Furthermore, we establish inequalities relating these different variants, indeed clarifying that all these definitions coincide with the average pairwise fidelity for commuting states. Lastly, we introduce another multivariate fidelity called multivariate log-Euclidean fidelity, which is a quantum generalization of the Matusita multivariate fidelity. We also show that it satisfies most of the desirable properties listed above, it is a function of a multivariate log-Euclidean divergence, and has an operational interpretation in terms of quantum hypothesis testing with an arbitrarily varying null hypothesis.	翻訳日:2024-04-26 18:12:20 公開日:2024-04-24
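The abstract above states that all three proposed variants reduce to the average pairwise fidelity when the states commute. A minimal sketch of that classical quantity follows, treating commuting states as probability vectors. The squared-overlap convention and the function names are assumptions made for illustration, not taken from the paper.

```python
from itertools import combinations
from math import sqrt, isclose

def classical_fidelity(p, q):
    """Squared Bhattacharyya overlap of two probability vectors (assumed convention)."""
    return sum(sqrt(a * b) for a, b in zip(p, q)) ** 2

def average_pairwise_fidelity(dists):
    """Mean of classical_fidelity over all unordered pairs of distributions."""
    pairs = list(combinations(dists, 2))
    return sum(classical_fidelity(p, q) for p, q in pairs) / len(pairs)

# Faithfulness: identical distributions give a value of one.
print(average_pairwise_fidelity([[0.5, 0.5]] * 3))
# Orthogonality: mutually disjoint supports give a value of zero.
print(average_pairwise_fidelity([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))
```

Because the value is an average over unordered pairs, it does not depend on the order in which the distributions are listed, which matches property (iii) in the abstract.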
# 日仏音声メディアにおけるジェンダーと年齢間の音声の進化 : ダイアクロニック・パースペクティブ Evolution of Voices in French Audiovisual Media Across Genders and Age in a Diachronic Perspective ( http://arxiv.org/abs/2404.16104v1 ) ライセンス: Link先を確認	Albert Rilliard, David Doukhan, Rémi Uro, Simon Devauchelle,	(参考訳) 本稿では,フランスのメディアアーカイブから1023人の話者の声のダイアクロニック音響解析を行った。話者は、4つの期間(1955/56年、1975/76年、1995/96年、2015/16年)、4つの年齢グループ(20-35年、36-50年、51-65年、65年)と2つの性別に基づいて32のカテゴリーに分散している。基本周波数(F_0$)と第14フォルマント(F1-4)を推定した。不均一なデータに対するこれらの推定の質を保証するために用いられる手順について述べる。各話者の$F_0$分布から、ベース-$F_0$値を計算してレジスタを推定した。ホルマント周波数から平均声道長を推定した。 Base-$F_0$と声道長を線形混合モデルに適合させ,年齢効果を補正した。その結果,性別によらず,低声化傾向にある期間の影響が示唆された。ピッチの低下は女性の年齢とともに観察されるが、男性話者は観察されない。 We present a diachronic acoustic analysis of the voice of 1023 speakers from French media archives. The speakers are spread across 32 categories based on four periods (years 1955/56, 1975/76, 1995/96, 2015/16), four age groups (20-35; 36-50; 51-65, >65), and two genders. The fundamental frequency ($F_0$) and the first four formants (F1-4) were estimated. Procedures used to ensure the quality of these estimations on heterogeneous data are described. From each speaker's $F_0$ distribution, the base-$F_0$ value was calculated to estimate the register. Average vocal tract length was estimated from formant frequencies. Base-$F_0$ and vocal tract length were fit by linear mixed models to evaluate how they may have changed across time periods and genders, corrected for age effects. Results show an effect of the period with a tendency to lower voices, independently of gender. A lowering of pitch is observed with age for female but not male speakers.	翻訳日:2024-04-26 18:12:20 公開日:2024-04-24
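The abstract says that average vocal tract length was estimated from formant frequencies but does not give the estimator. One common textbook approach, shown here only as an illustrative assumption, models the vocal tract as a uniform tube closed at one end, so that the k-th formant satisfies F_k = (2k - 1) c / (4 L). The speed of sound value and the function name are also assumptions.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, humid air

def vocal_tract_length_cm(formants_hz):
    """Average the tube-length estimates implied by each formant F1, F2, ...

    Uses the uniform-tube relation F_k = (2k - 1) * c / (4 * L), solved for L.
    This is an illustrative model, not necessarily the paper's estimator.
    """
    estimates = [
        (2 * k - 1) * SPEED_OF_SOUND_CM_S / (4.0 * f)
        for k, f in enumerate(formants_hz, start=1)
    ]
    return sum(estimates) / len(estimates)

# Formants of an idealised neutral vowel: each one implies the same length.
print(vocal_tract_length_cm([500.0, 1500.0, 2500.0, 3500.0]))
```

Averaging over several formants makes the estimate less sensitive to the vowel-specific displacement of any single formant.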
# 時間ビン符号化フォトニック量子情報プロトコルのロバストなアプローチ A robust approach for time-bin encoded photonic quantum information protocols ( http://arxiv.org/abs/2404.16106v1 ) ライセンス: Link先を確認	Simon J. U. White, Emanuele Polino, Farzad Ghafari, Dominick J. Joch, Luis Villegas-Aguilar, Lynden K. Shalm, Varun B. Verma, Marcus Huber, Nora Tischler,	(参考訳) 光子の時間2自由度で符号化された量子状態は、量子情報プロトコルの基本的なリソースである。従来の時間ビン符号化量子状態の生成と測定方法は、光学的不安定性、複雑な設定、時間分解能の要求により深刻な課題に直面している。ここでは、香港・ウー・マンデル干渉に基づく堅牢なアプローチを活用し、これらの問題を回避できる。まず、短時間の時間分離を伴う時間ビン量子ビットの高忠実な量子状態トモグラフィーを行う。そして,非古典性試験により,単一光子の系内偏光時間絡みを認証する。最後に,単一空間モードにおける高次元時間ビン量子状態の生成と測定を行う,堅牢でスケーラブルなプロトコルを提案する。このプロトコルは、標準的なスキームでは事実上アクセスできない高次元の状態やタスクへのアクセスを可能にし、基本的な量子情報科学を前進させ、量子通信の応用を開放することを約束している。 Quantum states encoded in the time-bin degree of freedom of photons represent a fundamental resource for quantum information protocols. Traditional methods for generating and measuring time-bin encoded quantum states face severe challenges due to optical instabilities, complex setups, and timing resolution requirements. Here, we leverage a robust approach based on Hong-Ou-Mandel interference that allows us to circumvent these issues. First, we perform high-fidelity quantum state tomographies of time-bin qubits with a short temporal separation. Then, we certify intrasystem polarization-time entanglement of single photons through a nonclassicality test. Finally, we propose a robust and scalable protocol to generate and measure high-dimensional time-bin quantum states in a single spatial mode. The protocol promises to enable access to high-dimensional states and tasks that are practically inaccessible with standard schemes, thereby advancing fundamental quantum information science and opening applications in quantum communication.	翻訳日:2024-04-26 18:12:20 公開日:2024-04-24
# 量子プロセッサにおける非安定化性の証明 Certifying nonstabilizerness in quantum processors ( http://arxiv.org/abs/2404.16107v1 ) ライセンス: Link先を確認	Rafael Wagner, Filipa C. R. Peres, Emmanuel Zambrini Cruzeiro, Ernesto F. Galvão,	(参考訳) 非安定化器性(英: Nonstabilizerness)またはマジック(英: magic)は、量子計算において重要な資源である。量子処理ユニット(QPU)の複雑さの増大は、このリソースを特徴づけるために堅牢でスケーラブルな技術を必要とする。集合の集合が、集合の少なくとも1つの状態が非安定化状態である場合、この性質を持つ。我々は、最近、基底非依存コヒーレンス(英語版)の証人として導入されたある二状態重なり合う不等式が、多重量子集合マジックの証人であることを示した。また、複数のQPUにまたがる魔法の存在を、互いに絡み合うことなく証明し、個々のQPUに対する要求を減らすことが可能であることを示す。 Nonstabilizerness, also known as magic, is a crucial resource for quantum computation. The growth in complexity of quantum processing units (QPUs) demands robust and scalable techniques for characterizing this resource. We introduce the notion of set magic: a set of states has this property if at least one state in the set is a non-stabilizer state. We show that certain two-state overlap inequalities, recently introduced as witnesses of basis-independent coherence, are also witnesses of multi-qubit set magic. We also show it is possible to certify the presence of magic across multiple QPUs without the need for entanglement between them and reducing the demands on each individual QPU.	翻訳日:2024-04-26 18:12:20 公開日:2024-04-24
# zkLLM: 大規模言語モデルのためのゼロ知識証明 zkLLM: Zero Knowledge Proofs for Large Language Models ( http://arxiv.org/abs/2404.16109v1 ) ライセンス: Link先を確認	Haochen Sun, Jason Li, Hongyang Zhang,	(参考訳) 近年の人工知能(AI)の急激な増加は、大言語モデル(LLM)の卓越によって特徴づけられ、世界中に根本的変化をもたらした。しかし、これらの進歩とともに、LSMの正当性をめぐる懸念が高まり、その広範な応用に法的課題が生じた。これらの懸念を複雑にし、LSMのパラメータはしばしば知的財産として扱われ、直接の調査が制限される。本研究では,LLMが生成するアウトプットの信頼性を確立することの必要性という,AI法制の領域における根本的な課題に対処する。この問題に対処するため、我々は LLM に最適化された初歩的なゼロ知識証明である zkLLM を提案する。ディープラーニングにおける非算術的操作の永続的課題に対処するため,我々は,非算術的テンソル操作のための並列化されたルックアップ引数であるtlookupを導入し,漸近的オーバーヘッドのないソリューションを提案する。さらに、tlookupの基礎を生かして、注意機構のための特別なゼロ知識証明であるzkAttnを導入し、実行時間、メモリ使用量、正確性について慎重に検討する。完全に並列化されたCUDAの実装によって、zkLLMはLLM上で効率的なゼロ知識検証計算を実現するための重要な一歩として現れます。注目すべきは、LLMが13億のパラメータを持つ場合、我々の手法は推論プロセス全体の正当性証明を15分以内で生成できるということである。結果として得られた証明は、200kB未満のコンパクトなサイズで、モデルのパラメータのプライバシを保ち、不注意な情報漏洩を確実にするように設計されている。 The recent surge in artificial intelligence (AI), characterized by the prominence of large language models (LLMs), has ushered in fundamental transformations across the globe. However, alongside these advancements, concerns surrounding the legitimacy of LLMs have grown, posing legal challenges to their extensive applications. Compounding these concerns, the parameters of LLMs are often treated as intellectual property, restricting direct investigations. In this study, we address a fundamental challenge within the realm of AI legislation: the need to establish the authenticity of outputs generated by LLMs. To tackle this issue, we present zkLLM, which stands as the inaugural specialized zero-knowledge proof tailored for LLMs to the best of our knowledge. Addressing the persistent challenge of non-arithmetic operations in deep learning, we introduce tlookup, a parallelized lookup argument designed for non-arithmetic tensor operations in deep learning, offering a solution with no asymptotic overhead. 
Furthermore, leveraging the foundation of tlookup, we introduce zkAttn, a specialized zero-knowledge proof crafted for the attention mechanism, carefully balancing considerations of running time, memory usage, and accuracy. Empowered by our fully parallelized CUDA implementation, zkLLM emerges as a significant stride towards achieving efficient zero-knowledge verifiable computations over LLMs. Remarkably, for LLMs boasting 13 billion parameters, our approach enables the generation of a correctness proof for the entire inference process in under 15 minutes. The resulting proof, compactly sized at less than 200 kB, is designed to uphold the privacy of the model parameters, ensuring no inadvertent information leakage.	翻訳日:2024-04-26 18:12:20 公開日:2024-04-24
# Mamba-360:Long Sequence Modellingに代わる変圧器としての状態空間モデルの調査:方法、応用、課題 Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges ( http://arxiv.org/abs/2404.16112v1 ) ライセンス: Link先を確認	Badri Narayana Patro, Vijay Srinivas Agneeswaran,	(参考訳) シーケンスモデリングは自然言語処理(NLP)、音声認識、時系列予測、音楽生成、バイオインフォマティクスなど、さまざまな分野において重要な領域である。 Recurrent Neural Networks(RNN)とLong Short Term Memory Networks(LSTM)は歴史的に、機械翻訳、名前付きエンティティ認識(NER)といったシーケンスモデリングタスクを支配してきた。しかし、変圧器の進歩は、優れた性能を考えれば、このパラダイムの変化につながっている。しかし、変換器は$O(N^2)$注目の複雑さと帰納バイアスを扱う際の課題に悩まされる。スペクトルネットワークや畳み込みを使い、様々なタスクでうまく機能するこれらの問題に対処するために、いくつかのバリエーションが提案されている。しかし、それらは長いシーケンスを扱うのに依然として困難である。状態空間モデル(SSM)は、特にS4の出現や、S4nd、Hippo、Hyena、Diagnol State Spaces(DSS)、Gated State Spaces(GSS)、LRU、Liquid-S4、Mambaなどの変種と共に、この文脈におけるシーケンスモデリングパラダイムの有望な代替品として登場した。本稿では,3つのパラダイム,すなわちゲーティングアーキテクチャ,構造アーキテクチャ,リカレントアーキテクチャに基づいて,基本的なSSMを分類する。この調査ではまた、視覚、ビデオ、音声、音声、言語(特に長いシーケンスモデリング)、医学(ゲノミクスを含む)、化学(薬物設計のような)、レコメンデーションシステム、および表データを含む時系列分析など、さまざまな領域におけるSSMの応用についても強調した。さらに,Long Range Arena (LRA), WikiText, Glue, Pile, ImageNet, Kinetics-400, sstv2などのベンチマークデータセットと,Breakfast, COIN, LVU, および各種時系列データセットのSSMの性能を集約した。 Mamba-360のプロジェクトページは、このWebページにある。 https://github.com/badripatro/mamba360}。 Sequence modeling is a crucial area across various domains, including Natural Language Processing (NLP), speech recognition, time series forecasting, music generation, and bioinformatics. Recurrent Neural Networks (RNNs) and Long Short Term Memory Networks (LSTMs) have historically dominated sequence modeling tasks like Machine Translation, Named Entity Recognition (NER), etc. However, the advancement of transformers has led to a shift in this paradigm, given their superior performance. Yet, transformers suffer from $O(N^2)$ attention complexity and challenges in handling inductive bias. 
Several variants that use spectral networks or convolutions have been proposed to address these issues, and they perform well on a range of tasks. However, they still have difficulty dealing with long sequences. State Space Models (SSMs) have emerged as promising alternatives for sequence modeling in this context, especially with the advent of S4 and its variants, such as S4nd, Hippo, Hyena, Diagonal State Spaces (DSS), Gated State Spaces (GSS), Linear Recurrent Unit (LRU), Liquid-S4, Mamba, etc. In this survey, we categorize the foundational SSMs into three paradigms: gating architectures, structural architectures, and recurrent architectures. This survey also highlights diverse applications of SSMs across domains such as vision, video, audio, speech, language (especially long sequence modeling), medical (including genomics), chemical (like drug design), recommendation systems, and time series analysis, including tabular data. Moreover, we consolidate the performance of SSMs on benchmark datasets like Long Range Arena (LRA), WikiText, GLUE, Pile, ImageNet, Kinetics-400, sstv2, as well as video datasets such as Breakfast, COIN, LVU, and various time series datasets. The project page for the Mamba-360 work is available at https://github.com/badripatro/mamba360.	翻訳日:2024-04-26 18:12:20 公開日:2024-04-24
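The contrast the survey draws between $O(N^2)$ attention and state space models can be made concrete with a toy recurrence. The sketch below runs a discrete diagonal linear SSM, x_k = a * x_{k-1} + b * u_k and y_k = c . x_k, in a single pass over the sequence. The parameter names and the diagonal form are illustrative assumptions and do not reproduce any specific model from the survey.

```python
def diagonal_ssm_scan(a, b, c, inputs):
    """Run a diagonal linear state space model over a scalar input sequence.

    a, b, c are per-state-dimension coefficients (hypothetical names).
    The cost is O(N * d) for N inputs and d state dimensions.
    """
    state = [0.0] * len(a)
    outputs = []
    for u in inputs:
        # Elementwise state update, then a linear readout.
        state = [a_i * s_i + b_i * u for a_i, b_i, s_i in zip(a, b, state)]
        outputs.append(sum(c_i * s_i for c_i, s_i in zip(c, state)))
    return outputs

# Impulse response of a one-dimensional model with decay 0.5.
print(diagonal_ssm_scan([0.5], [1.0], [1.0], [1.0, 0.0, 0.0]))
```

The impulse response c * a^k * b is also the kernel of an equivalent convolution, which is why such models can be evaluated either recurrently or convolutionally.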
# ニューラルバンドを用いたWhite-box LLMのオンラインパーソナライズ Online Personalizing White-box LLMs Generation with Neural Bandits ( http://arxiv.org/abs/2404.16115v1 ) ライセンス: Link先を確認	Zekai Chen, Weeden Daniel, Po-yu Chen, Francois Buet-Golfouse,	(参考訳) LLMによるパーソナライズされたコンテンツ生成の出現は、ユーザ毎にユニークなモデルを作成するという持続不可能な要求を伴わずに、個々の嗜好を満たすためにテキストを効率的に適応する方法という、新しい課題を提示している。本研究では,ユーザフィードバックに基づくソフトインストラクション埋め込みを動的に最適化するために,ニューラルバンディットアルゴリズムを用いた革新的なオンライン手法を導入し,ホワイトボックスLLMによるオープンエンドテキスト生成のパーソナライズを強化した。各種タスクの厳密な実験を通じて,ベースライン戦略よりも優れた性能を示す。特にNeuralTSは、パーソナライズされたニュースの見出し生成を大幅に改善し、最高のROUGEスコアの62.9%の改善と、ベースラインに対するLLMエージェント評価の2.76%向上を実現している。 The advent of personalized content generation by LLMs presents a novel challenge: how to efficiently adapt text to meet individual preferences without the unsustainable demand of creating a unique model for each user. This study introduces an innovative online method that employs neural bandit algorithms to dynamically optimize soft instruction embeddings based on user feedback, enhancing the personalization of open-ended text generation by white-box LLMs. Through rigorous experimentation on various tasks, we demonstrate significant performance improvements over baseline strategies. NeuralTS, in particular, leads to substantial enhancements in personalized news headline generation, achieving up to a 62.9% improvement in terms of best ROUGE scores and up to 2.76% increase in LLM-agent evaluation against the baseline.	翻訳日:2024-04-26 18:12:20 公開日:2024-04-24
# ソーシャルメディアにおける人間生成とAI生成の選挙宣言の分類 Classifying Human-Generated and AI-Generated Election Claims in Social Media ( http://arxiv.org/abs/2404.16116v1 ) ライセンス: Link先を確認	Alphaeus Dmonte, Marcos Zampieri, Kevin Lybarger, Massimiliano Albanese,	(参考訳) 政治は、ソーシャルメディアプラットフォーム上で議論される最も一般的なトピックの1つであり、特に主要な選挙サイクルでは、ユーザーが候補者や選挙プロセスについて会話する。悪意ある俳優はこの機会を利用して誤報を広め、選挙プロセスへの信頼を損なうかもしれない。 LLM(Large Language Models)の出現は、悪質なアクターが前例のない規模で誤情報を生成できるようにすることによって、この問題を悪化させる。人工知能(AI)が生成するコンテンツは、真正なユーザーコンテンツとは区別できないことが多く、ソーシャルネットワーク上の情報の完全性に関する懸念を提起する。本稿では,選挙関連主張を特徴付ける新しい分類法を提案する。この分類法は、司法、機器、プロセス、およびクレームの性質に関する粒度のカテゴリを含む選挙関連のクレームを分析するための手段を提供する。 ElectAIは9,900のツイートからなる新しいベンチマークデータセットで、それぞれが人間またはAI生成とラベル付けされている。 AI生成ツイートでは、生成した特定のLLM変種が指定される。我々は提案した分類法を用いて1,550のツイートのサブセットに注釈を付け、選挙関連クレームの特徴を捉えた。分類属性を抽出するLLMの能力について検討し、ElectAIを用いて機械学習モデルを訓練し、人間とAIが生成するポストを識別し、特定のLLM変種を特定する。 Politics is one of the most prevalent topics discussed on social media platforms, particularly during major election cycles, where users engage in conversations about candidates and electoral processes. Malicious actors may use this opportunity to disseminate misinformation to undermine trust in the electoral process. The emergence of Large Language Models (LLMs) exacerbates this issue by enabling malicious actors to generate misinformation at an unprecedented scale. Artificial intelligence (AI)-generated content is often indistinguishable from authentic user content, raising concerns about the integrity of information on social networks. In this paper, we present a novel taxonomy for characterizing election-related claims. This taxonomy provides an instrument for analyzing election-related claims, with granular categories related to jurisdiction, equipment, processes, and the nature of claims. We introduce ElectAI, a novel benchmark dataset that consists of 9,900 tweets, each labeled as human- or AI-generated. For AI-generated tweets, the specific LLM variant that produced them is specified. 
We annotated a subset of 1,550 tweets using the proposed taxonomy to capture the characteristics of election-related claims. We explored the capabilities of LLMs in extracting the taxonomy attributes and trained various machine learning models using ElectAI to distinguish between human- and AI-generated posts and identify the specific LLM variant.	翻訳日:2024-04-26 18:12:20 公開日:2024-04-24
# 極性Bluetooth低エネルギー心拍センサのサイバーセキュリティ評価 Cybersecurity Assessment of the Polar Bluetooth Low Energy Heart-rate Sensor ( http://arxiv.org/abs/2404.16117v1 ) ライセンス: Link先を確認	Simone Soderi,	(参考訳) ウェアラブルとインプラント可能なデバイス間の無線通信は、人体を取り巻く情報交換を実装している。無線ボディエリアネットワーク(WBAN)技術は、日常生活における非侵襲的な応用を可能にする。ワイヤレス接続デバイスは多くのサービスの品質を改善し、手順を簡単にする。一方、大きな攻撃面を開き、潜在的なセキュリティ脆弱性を導入する。 Bluetooth Low Energy (BLE) は、無線パーソナルエリアネットワーク(WPAN)で広く使われている低電力プロトコルである。本稿ではBLE心拍センサのセキュリティ脆弱性を解析する。受信信号強度指標(RSSI)の変動を観測することにより、BLE接続における異常を検出することができる。ケーススタディは、アタッカーがモバイルアプリとBLEデバイス間で送信されたデータを簡単にインターセプトし、操作できることを示しています。この研究により、ワイヤレスボディセンサーから受信できる心拍情報のセキュリティについて、著者は認識を深めるでしょう。 Wireless communications among wearable and implantable devices implement the information exchange around the human body. Wireless body area network (WBAN) technology enables non-invasive applications in our daily lives. Wireless connected devices improve the quality of many services, and they make procedures easier. On the other hand, they open up large attack surfaces and introduce potential security vulnerabilities. Bluetooth low energy (BLE) is a low-power protocol widely used in wireless personal area networks (WPANs). This paper analyzes the security vulnerabilities of a BLE heart-rate sensor. By observing the received signal strength indicator (RSSI) variations, it is possible to detect anomalies in the BLE connection. The case study shows that an attacker can easily intercept and manipulate the data transmitted between the mobile app and the BLE device. With this research, the author aims to raise awareness about the security of the heart-rate information that we can receive from our wireless body sensors.	翻訳日:2024-04-26 18:02:26 公開日:2024-04-24
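The abstract reports that anomalies in a BLE connection can be detected by observing RSSI variations, without specifying a detection rule. The sketch below shows one simple possibility: flag any sample that deviates from the median of a trailing window by more than a fixed number of dB. The window length, the threshold, and the function name are assumptions chosen for illustration only.

```python
from statistics import median

def rssi_anomalies(rssi_dbm, window=5, threshold_db=10.0):
    """Return indices whose RSSI jumps away from the recent median baseline.

    rssi_dbm is a list of RSSI readings in dBm; window and threshold_db
    are hypothetical tuning parameters, not values from the paper.
    """
    flagged = []
    for i in range(window, len(rssi_dbm)):
        baseline = median(rssi_dbm[i - window:i])
        if abs(rssi_dbm[i] - baseline) > threshold_db:
            flagged.append(i)
    return flagged

# A stable link near -60 dBm with one sudden, much stronger reading.
trace = [-60, -61, -59, -60, -62, -60, -40, -61, -60]
print(rssi_anomalies(trace))
```

A median baseline is used here so that a single outlier does not shift the reference level for the samples that follow it.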
# ハネトケン発生器としての行為! 大規模言語モデルを用いたハネトケン生成の検討 Act as a Honeytoken Generator! An Investigation into Honeytoken Generation with Large Language Models ( http://arxiv.org/abs/2404.16118v1 ) ライセンス: Link先を確認	Daniel Reti, Norman Becker, Tillmann Angeli, Anasuya Chattopadhyay, Daniel Schneider, Sebastian Vollmer, Hans D. Schotten,	(参考訳) セキュリティインシデントの増加に伴い、偽装ベースの防衛戦略の採用はサイバーセキュリティにおいて重要な役割を担っている。この研究は、このような防御機構の重要な構成要素であるハネトケンの設計におけるスケーラビリティの課題に対処する。ハネトケンのマニュアル作成は面倒な作業である。自動生成装置は存在するが、汎用性に欠けることが多く、特定の種類のハネトケンに特化しており、適切なトレーニングデータセットに大きく依存している。これらの制約を克服するために、この研究は大規模言語モデル(LLM)を用いて様々なハニトケンを作成するアプローチを体系的に研究する。設定ファイル、データベース、ログファイルなど、この作業で作成された7種類のハネトケンタイプのうち、最適なプロンプトを評価するために2つが使用された。ロボット.txtファイルとハニーワードの生成は、16のプロンプトビルディングブロックに基づいて、210の異なるプロンプト構造を体系的にテストするために使用された。さらに、全てのハニトケンは、異なるモデルの様々な性能を評価するために、異なる最先端のLLMで試験された。 1つの LLM 上で最適に実行されるプロンプトは、必ずしも他の LLM に対してうまく一般化するとは限らない。 GPT-3.5で生成されたハニーワードは、従来の自動ハニーワード生成法に比べて、実際のパスワードと区別しにくいことが判明した。全体として、本研究の成果は、ジェネリックLLMが提示されたプロンプト構造を用いて、幅広いハネトケンを生成可能であることを示している。 With the increasing prevalence of security incidents, the adoption of deception-based defense strategies has become pivotal in cyber security. This work addresses the challenge of scalability in designing honeytokens, a key component of such defense mechanisms. The manual creation of honeytokens is a tedious task. Although automated generators exist, they often lack versatility, being specialized for specific types of honeytokens, and heavily rely on suitable training datasets. To overcome these limitations, this work systematically investigates the approach of utilizing Large Language Models (LLMs) to create a variety of honeytokens. Out of the seven different honeytoken types created in this work, such as configuration files, databases, and log files, two were used to evaluate the optimal prompt. The generation of robots.txt files and honeywords was used to systematically test 210 different prompt structures, based on 16 prompt building blocks. 
Furthermore, all honeytokens were tested across different state-of-the-art LLMs to assess the varying performance of different models. Prompts that perform optimally on one LLM do not necessarily generalize well to another. Honeywords generated by GPT-3.5 were found to be less distinguishable from real passwords compared to previous methods of automated honeyword generation. Overall, the findings of this work demonstrate that generic LLMs are capable of creating a wide array of honeytokens using the presented prompt structures.	翻訳日:2024-04-26 18:02:25 公開日:2024-04-24
# ハイブリッド無線ボディエリアネットワーク(HyWBAN):セマンティックコミュニケーションとジャミング技術の進歩 Securing Hybrid Wireless Body Area Networks (HyWBAN): Advancements in Semantic Communications and Jamming Techniques ( http://arxiv.org/abs/2404.16120v1 ) ライセンス: Link先を確認	Simone Soderi, Mariella Särestöniemi, Syifaul Fuada, Matti Hämäläinen, Marcos Katz, Jari Iinatti,	(参考訳) 本稿では、スマートヘルスケアとIoT(Internet of Things)アプリケーションに不可欠なHybrid Wireless Body Area Networks(HyWBANs)のセキュリティを強化するための新しい戦略を検討する。高度なサイバー攻撃に対するHyWBANの脆弱性を認識し,セマンティックコミュニケーションとジャミングレシーバーの革新的な組み合わせを提案する。この二重層セキュリティ機構は、特にボディ内からオンボディ通信チャネルを含むシナリオにおいて、不正なアクセスやデータ漏洩を防止する。本研究では,生物組織を介したハイブリッド(無線,光)通信の伝搬を理解するために総合的な実験室計測を行い,これらの知見を利用してディープラーニング(DL)モデルを訓練するためのデータセットを洗練する。これらのモデルは次に、ジャミングレシーバーを使用してデータの機密性と整合性を高めるために、暗号鍵にリンクされたセマンティックな概念を生成する。提案モデルでは,特にジャミングを補完する楕円曲線Diffie-Hellman (ECDH) のような従来の暗号手法と比較して,エネルギー消費量の大幅な削減が示されている。われわれのアプローチは, 主要なセキュリティ問題に対処し, 将来安全なバイオメディカル通信システムの基盤となるものとなる。 This paper explores novel strategies to strengthen the security of Hybrid Wireless Body Area Networks (HyWBANs), essential in smart healthcare and Internet of Things (IoT) applications. Recognizing the vulnerability of HyWBAN to sophisticated cyber-attacks, we propose an innovative combination of semantic communications and jamming receivers. This dual-layered security mechanism protects against unauthorized access and data breaches, particularly in scenarios involving in-body to on-body communication channels. We conduct comprehensive laboratory measurements to understand hybrid (radio and optical) communication propagation through biological tissues and utilize these insights to refine a dataset for training a Deep Learning (DL) model. These models, in turn, generate semantic concepts linked to cryptographic keys for enhanced data confidentiality and integrity using a jamming receiver. 
The proposed model demonstrates a significant reduction in energy consumption compared to traditional cryptographic methods, like Elliptic Curve Diffie-Hellman (ECDH), especially when supplemented with jamming. Our approach addresses the primary security concerns and sets the baseline for future secure biomedical communication systems advancements.	翻訳日:2024-04-26 18:02:25 公開日:2024-04-24
# FairDeDup:セマンティックデータセットの重複における視覚領域の公平性の検出と緩和 FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication ( http://arxiv.org/abs/2404.16123v1 ) ライセンス: Link先を確認	Eric Slyman, Stefan Lee, Scott Cohen, Kushal Kafle,	(参考訳) 最近のデータセット復号化技術は、コンテンツ対応のデータセットプルーニングが、オリジナルのデータセットのトレーニングと比較して、大きなパフォーマンス損失を伴わないビジョンランゲージ事前訓練(VLP)モデルのトレーニングコストを劇的に削減できることを実証している。これらの結果は、Webから収集された一般的に使用されている画像キャプチャデータセットのプルーニングに基づいています。本研究は,これらのモデルにおける重複がこれらのバイアスの頻度にどのように影響するかを評価し,最新のSemDeDupアルゴリズムに容易に実装可能な修正を導入し,観測した負の効果を低減できることを示した。 LAION-400Mの非重複変種に基づいてトレーニングされたCLIPスタイルのモデルを調べると、提案したFairDeDupアルゴリズムは、CLIPベンチマークのゼロショット性能を維持しながら、FairFaceおよびFACETデータセット上のSemDeDup上でのフェアネス指標を継続的に改善することがわかった。 Recent dataset deduplication techniques have demonstrated that content-aware dataset pruning can dramatically reduce the cost of training Vision-Language Pretrained (VLP) models without significant performance losses compared to training on the original dataset. These results have been based on pruning commonly used image-caption datasets collected from the web -- datasets that are known to harbor harmful social biases that may then be codified in trained models. In this work, we evaluate how deduplication affects the prevalence of these biases in the resulting trained models and introduce an easy-to-implement modification to the recent SemDeDup algorithm that can reduce the negative effects that we observe. When examining CLIP-style models trained on deduplicated variants of LAION-400M, we find our proposed FairDeDup algorithm consistently leads to improved fairness metrics over SemDeDup on the FairFace and FACET datasets while maintaining zero-shot performance on CLIP benchmarks.	翻訳日:2024-04-26 18:02:25 公開日:2024-04-24
# インドにおけるWhatsApp for Businessの配置 A Situated-Infrastructuring of WhatsApp for Business in India ( http://arxiv.org/abs/2404.16124v1 ) ライセンス: Link先を確認	Ankolika De,	(参考訳) WhatsAppは、インドにおける重要なコミュニケーションツールとなり、文化的境界を超越し、国のデジタルランドスケープに深く統合されている。 MetaのWhatsApp for Businessの導入は、プラットフォームの人気とシームレスに一致し、ビジネスに重要なツールを提供する。しかし、収益化計画は、特に小規模企業にとって、収益目標とアクセシビリティのバランスをとる上で、課題となる。本研究は、談話分析を用いて、メタのインドにおけるWhatsAppのインフラについて検討し、技術的、社会的、文化的側面の動的相互作用を強調した。その結果、WhatsApp for Businessの展開に伴う潜在的なパワー差が強調され、漸進的ではあるが重要な修正が加えられ、研究者は、特に疎外化されたユーザーにとって、急激な技術的変化の影響と倫理について調査するよう促された。 WhatsApp has become a pivotal communication tool in India, transcending cultural boundaries and deeply integrating into the nation's digital landscape. Meta's introduction of WhatsApp for Business aligns seamlessly with the platform's popularity, offering businesses a crucial tool. However, the monetization plans pose challenges, particularly for smaller businesses, in balancing revenue goals with accessibility. This study, employing discourse analysis, examines Meta's infrastructuring of WhatsApp in India, emphasizing the dynamic interplay of technological, social, and cultural dimensions. Consequently, it highlights potential power differences caused by the deployment of WhatsApp for Business followed by its gradual but significant modifications, encouraging scholars to investigate the implications and ethics of rapid technological changes, particularly for marginalized users.	翻訳日:2024-04-26 18:02:25 公開日:2024-04-24
# 競合リスクの存在下でのEHRデータに対する静的および動的ランダム森林モデルの比較:中央線関連血流感染の予測 Comparison of static and dynamic random forests models for EHR data in the presence of competing risks: predicting central line-associated bloodstream infection ( http://arxiv.org/abs/2404.16127v1 ) ライセンス: Link先を確認	Elena Albu, Shan Gao, Pieter Stijnen, Frank Rademakers, Christel Janssens, Veerle Cossey, Yves Debaveye, Laure Wynants, Ben Van Calster,	(参考訳) 病院の入院に関する予後の結果は、一般的に検閲に苦しめられず、分類的にも時間的にもモデル化できる。競合イベントは一般的だが、しばしば無視される。本研究は無作為林(RF)モデルを用いて中央線関連血液ストリーム感染症(CLABSI)の発症リスクを予測した。 27478例(CLABSI, 1466例, 28426例)を対象とし, 静的および動的RFモデルの構築(CLABSI vs. CLABSI), マルチノミアル(CLABSI, 退院, 退院, 退院, 退院), 生存(CLABSIまで), 競合リスク(CLABSIまでの時間, 退院, 退院), 7日間のCLABSIリスクの予測を行った。列車/テストスプリット100回にわたってモデル性能を評価した。 AUROCはベースライン予測では0.74で、カテーテルエピソードでは5日目の予測では0.78まで上昇し、その後低下した。生存モデルはCLABSI(E:O比1.2から1.6)のリスクを過大評価し、AUROCは他のモデルよりも約0.01低かった。二項モデルと多項モデルでは計算時間が低かった。複数の結果イベントを含むモデル(複数のリスクと競合するリスク)は、バイナリやサバイバルモデルとは異なる内部構造を示す。検閲がない場合、複雑なモデリング選択はCLABSI予測のバイナリモデルと比較して予測性能を著しく改善しない。発生時に競合するイベントを検閲する生存モデルは避けるべきである。 Prognostic outcomes related to hospital admissions typically do not suffer from censoring, and can be modeled either categorically or as time-to-event. Competing events are common but often ignored. We compared the performance of random forest (RF) models to predict the risk of central line-associated bloodstream infections (CLABSI) using different outcome operationalizations. We included data from 27478 admissions to the University Hospitals Leuven, covering 30862 catheter episodes (970 CLABSI, 1466 deaths and 28426 discharges) to build static and dynamic RF models for binary (CLABSI vs no CLABSI), multinomial (CLABSI, discharge, death or no event), survival (time to CLABSI) and competing risks (time to CLABSI, discharge or death) outcomes to predict the 7-day CLABSI risk. We evaluated model performance across 100 train/test splits. 
Performance of binary, multinomial and competing risks models was similar: AUROC was 0.74 for baseline predictions, rose to 0.78 for predictions at day 5 in the catheter episode, and decreased thereafter. Survival models overestimated the risk of CLABSI (E:O ratios between 1.2 and 1.6), and had AUROCs about 0.01 lower than other models. Binary and multinomial models had lowest computation times. Models including multiple outcome events (multinomial and competing risks) display a different internal structure compared to binary and survival models. In the absence of censoring, complex modelling choices do not considerably improve the predictive performance compared to a binary model for CLABSI prediction in our studied settings. Survival models censoring the competing events at their time of occurrence should be avoided.	翻訳日:2024-04-26 18:02:25 公開日:2024-04-24
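The abstract reports that survival models overestimated CLABSI risk, with E:O ratios between 1.2 and 1.6. The expected-to-observed ratio is a standard calibration summary: the sum of predicted risks divided by the number of observed events. A minimal sketch follows. The function name and the toy numbers are illustrative and are not taken from the study data.

```python
def expected_observed_ratio(predicted_risks, outcomes):
    """Sum of predicted event probabilities divided by the observed event count.

    A ratio above 1 means the model predicts more events than occurred
    (overestimation); a ratio below 1 means underestimation.
    """
    observed = sum(outcomes)
    if observed == 0:
        raise ValueError("E:O ratio is undefined when no events are observed")
    return sum(predicted_risks) / observed

# Four hypothetical catheter episodes, one of which ends in the event.
print(expected_observed_ratio([0.1, 0.2, 0.3, 0.6], [0, 0, 0, 1]))
```

In this toy case the model expects 1.2 events where 1 occurred, which would fall at the lower end of the overestimation range quoted in the abstract.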
# 二進線形符号近傍のビット列上の量子一様重ね合わせを効率的に構築する Efficiently constructing a quantum uniform superposition over bit strings near a binary linear code ( http://arxiv.org/abs/2404.16129v1 ) ライセンス: Link先を確認	Edward Farhi, Stephen P. Jordan,	(参考訳) 高忠実度近似を、ハミング距離の全てのビット列上の量子重ね合わせである$\| \Psi_b \rangle$, 次元のコードワードの$b$を$\mathbb{Z}_2^n$で表し、その値を$n$, $b$, $k$とする量子回路で効率的に構築できることを実証する。我々は、請求を裏付ける$n=1000$で数値実験を行う。達成可能な半径$b$は、既知の古典的アルゴリズムが最も近いコードワードを効率的に見つけることができる距離よりもはるかに大きい。したがって、これらの状態は、文字列に最も近いコードワードを見つけるために計算を必要としない量子畳み込みによって準備することはできない。 $\mathbb{R}^n$ の格子の類似状態とは異なり、$\|\Psi_b \rangle$ は有界距離復号には役に立たない。さらに、重複計算を復号化することができる。これらの状態は、他のコードの問題を解決するのに使えるかもしれない。これらの状態を構築するのに使用されるテクニックは興味深く、コードを超えたアプリケーションを提供できることを願っています。 We demonstrate that a high fidelity approximation to $\| \Psi_b \rangle$, the quantum superposition over all bit strings within Hamming distance $b$ of the codewords of a dimension-$k$ linear code over $\mathbb{Z}_2^n$, can be efficiently constructed by a quantum circuit for large values of $n$, $b$ and $k$ which we characterize. We do numerical experiments at $n=1000$ which back up our claims. The achievable radius $b$ is much larger than the distance out to which known classical algorithms can efficiently find the nearest codeword. Hence, these states cannot be prepared by quantum constructions that require uncomputing to find the codeword nearest a string. Unlike the analogous states for lattices in $\mathbb{R}^n$, $\|\Psi_b \rangle$ is not a useful resource for bounded distance decoding because the relevant overlap falls off too quickly with distance and known classical algorithms do better. Furthermore, the overlap calculation can be dequantized. Perhaps these states could be used to solve other code problems. The technique used to construct these states is of interest and hopefully will have applications beyond codes.	翻訳日:2024-04-26 18:02:25 publication date:2024-04-24
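To make the object in this abstract concrete, the sketch below enumerates, for a tiny binary linear code, the set of bit strings lying within Hamming distance b of some codeword, which is the set the superposition ranges over. This is a brute-force classical illustration that only works for very small n. It says nothing about the quantum circuit itself, and the function names are hypothetical.

```python
from itertools import product

def codewords(generator_rows):
    """All GF(2) linear combinations of the generator rows."""
    n = len(generator_rows[0])
    words = set()
    for coeffs in product([0, 1], repeat=len(generator_rows)):
        word = [0] * n
        for bit, row in zip(coeffs, generator_rows):
            if bit:
                word = [w ^ r for w, r in zip(word, row)]
        words.add(tuple(word))
    return words

def support_near_code(generator_rows, b):
    """Bit strings whose Hamming distance to the nearest codeword is at most b."""
    n = len(generator_rows[0])
    code = codewords(generator_rows)
    return {
        s for s in product([0, 1], repeat=n)
        if min(sum(x != y for x, y in zip(s, c)) for c in code) <= b
    }

# Length-3 repetition code: radius 0 gives the codewords, radius 1 gives everything.
print(len(support_near_code([[1, 1, 1]], 0)), len(support_near_code([[1, 1, 1]], 1)))
```

The enumeration takes time exponential in n, which is one way to see why an efficient construction at $n=1000$ is a nontrivial claim.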
# ローカルからグローバルへ:クエリに焦点をあてた要約へのグラフRAGアプローチ From Local to Global: A Graph RAG Approach to Query-Focused Summarization ( http://arxiv.org/abs/2404.16130v1 ) ライセンス: Link先を確認	Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Jonathan Larson,	(参考訳) 検索強化生成(RAG)を用いて、外部知識ソースから関連情報を検索することで、大規模言語モデル(LLM)が、プライベートおよび/または未確認の文書コレクションに関する質問に答えることができる。しかしながら、RAGは、明示的な検索タスクではなく、クエリ中心の要約(QFS)タスクであるため、データセットの主なテーマは何か? 一方、以前のQFS法は、典型的なRAGシステムによってインデックスされたテキストの量にスケールできない。これらのコントラスト手法の強みを生かしたグラフRAG手法を提案し,ユーザ質問の一般性とインデックスするソーステキスト量の両方をスケールするプライベートテキストコーパスに対する質問応答を提案する。提案手法は LLM を用いてグラフベースのテキストインデックスを2段階に構築する。まず,資料からエンティティ知識グラフを導出し,近縁なエンティティのすべてのグループに対するコミュニティ要約を事前に生成する。質問があると、各コミュニティの要約は部分的な応答を生成するために使用され、その後、すべての部分的な応答はユーザーへの最終応答で再度要約される。 100万のトークン範囲のデータセットに対するグローバルなセンスメイキング質問のクラスについて、グラフRAGは、生成された回答の包括性と多様性の両方に対して、na\\ive RAGベースラインよりも大幅に改善されていることを示す。グローバルとローカルのGraph RAGアプローチのオープンソースでPythonベースの実装がhttps://aka.ms/graphrag.comで公開される予定だ。 The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables large language models (LLMs) to answer questions over private and/or previously unseen document collections. However, RAG fails on global questions directed at an entire text corpus, such as "What are the main themes in the dataset?", since this is inherently a query-focused summarization (QFS) task, rather than an explicit retrieval task. Prior QFS methods, meanwhile, fail to scale to the quantities of text indexed by typical RAG systems. To combine the strengths of these contrasting methods, we propose a Graph RAG approach to question answering over private text corpora that scales with both the generality of user questions and the quantity of source text to be indexed. Our approach uses an LLM to build a graph-based text index in two stages: first to derive an entity knowledge graph from the source documents, then to pregenerate community summaries for all groups of closely-related entities. 
Given a question, each community summary is used to generate a partial response, before all partial responses are again summarized in a final response to the user. For a class of global sensemaking questions over datasets in the 1 million token range, we show that Graph RAG leads to substantial improvements over a naïve RAG baseline for both the comprehensiveness and diversity of generated answers. An open-source, Python-based implementation of both global and local Graph RAG approaches is forthcoming at https://aka.ms/graphrag.	翻訳日:2024-04-26 18:02:25 公開日:2024-04-24
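The query-time step described in this abstract, where each community summary yields a partial response and the partial responses are then summarized into a final answer, has a map-reduce shape. The sketch below shows only that control flow. The two callables stand in for LLM calls and are hypothetical, as is the choice to drop empty partial responses. It is not the authors' implementation.

```python
def global_answer(question, community_summaries, answer_from_summary, combine):
    """Map each community summary to a partial answer, then reduce to one answer.

    answer_from_summary(question, summary) and combine(question, partials)
    are placeholders for model calls supplied by the caller.
    """
    partials = [answer_from_summary(question, s) for s in community_summaries]
    # Assumption: summaries that yield nothing useful are skipped before the reduce step.
    partials = [p for p in partials if p]
    return combine(question, partials)

# Toy stand-ins for the model calls, to show the data flow only.
summaries = ["theme a", "noise", "theme b"]
print(global_answer(
    "What are the main themes?",
    summaries,
    lambda q, s: s.upper() if "theme" in s else "",
    lambda q, parts: " | ".join(parts),
))
```

Because the map step treats each community summary independently, it can run in parallel, which is part of what lets the approach scale with the amount of indexed text.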
# クラスタ削除のための組合せ近似: よりシンプルで、より速く、より良く Combinatorial Approximations for Cluster Deletion: Simpler, Faster, and Better ( http://arxiv.org/abs/2404.16131v1 ) ライセンス: Link先を確認	Vicente Balmaseda, Ying Xu, Yixin Cao, Nate Veldt,	(参考訳) クラスタ削除はNPハードグラフクラスタリングの目的であり、計算生物学やソーシャルネットワーク分析の応用において、グラフを斜めに分割するために最小限のエッジを削除することが目的である。まず,2つの近似アルゴリズムの厳密な解析を行い,その近似保証を4から3に改善する。さらに、補助グラフにおいて最大等級の頂点を優しく取り、その周囲にクラスタを形成することにより、両アルゴリズムを驚くほど単純な方法でデランドマイズすることができることを示す。これらのアルゴリズムの1つは線形プログラムの解法に依存する。私たちの最後の貢献は、理論と実践においてはるかにスケーラブルになるように、新しく純粋に組み合わせたアプローチを設計することです。 Cluster deletion is an NP-hard graph clustering objective with applications in computational biology and social network analysis, where the goal is to delete a minimum number of edges to partition a graph into cliques. We first provide a tighter analysis of two previous approximation algorithms, improving their approximation guarantees from 4 to 3. Moreover, we show that both algorithms can be derandomized in a surprisingly simple way, by greedily taking a vertex of maximum degree in an auxiliary graph and forming a cluster around it. One of these algorithms relies on solving a linear program. Our final contribution is to design a new and purely combinatorial approach for doing so that is far more scalable in theory and practice.	翻訳日:2024-04-26 18:02:25 公開日:2024-04-24
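The cluster deletion objective in this abstract can be stated in a few lines of code: a clustering is feasible only if every cluster is a clique, and its cost is the number of edges running between different clusters. The sketch below evaluates that objective for a given partition. It does not implement the paper's approximation algorithms, and the function name is hypothetical.

```python
def cluster_deletion_cost(edges, clusters):
    """Number of edges deleted when the graph is partitioned into the given cliques.

    Raises ValueError if some cluster is missing an internal edge,
    because cluster deletion only allows clusters that are cliques.
    """
    edge_set = {frozenset(e) for e in edges}
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    for cluster in clusters:
        members = list(cluster)
        for i, u in enumerate(members):
            for v in members[i + 1:]:
                if frozenset((u, v)) not in edge_set:
                    raise ValueError(f"cluster is not a clique: missing edge {u}-{v}")
    # An edge is deleted exactly when its endpoints carry different cluster labels.
    return sum(1 for e in edge_set if len({label[v] for v in e}) == 2)

# A path a-b-c: keeping {a, b} together forces the edge b-c to be deleted.
print(cluster_deletion_cost([("a", "b"), ("b", "c")], [{"a", "b"}, {"c"}]))
```

An approximation guarantee of 3, as claimed in the abstract, means the returned partition has cost at most three times the minimum value of this function over all feasible partitions.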
# Quantitative Characterization of Retinal Features in Translated OCTA ( http://arxiv.org/abs/2404.16133v1 ) License: see the linked page	Rashadul Hasan Badhon, Atalie Carina Thompson, Jennifer I. Lim, Theodore Leng, Minhaj Nur Alam,	Purpose: This study explores the feasibility of using generative machine learning (ML) to translate Optical Coherence Tomography (OCT) images into Optical Coherence Tomography Angiography (OCTA) images, potentially bypassing the need for specialized OCTA hardware. Methods: The method involved implementing a generative adversarial network framework that includes a 2D vascular segmentation model and a 2D OCTA image translation model. The study utilizes a public dataset of 500 patients, divided into subsets based on resolution and disease status, to validate the quality of translated OCTA (TR-OCTA) images. The validation employs several quality and quantitative metrics to compare the translated images with ground truth OCTAs (GT-OCTA). We then quantitatively characterize vascular features generated in TR-OCTAs with GT-OCTAs to assess the feasibility of using TR-OCTA for objective disease diagnosis. Result: TR-OCTAs showed high image quality in both 3 and 6 mm datasets (high-resolution, moderate structural similarity and contrast quality compared to GT-OCTAs). There were slight discrepancies in vascular metrics, especially in diseased patients. Blood vessel features like tortuosity and vessel perimeter index showed a better trend compared to density features, which are affected by local vascular distortions. Conclusion: This study presents a promising solution to the limitations of OCTA adoption in clinical practice by using vascular features from TR-OCTA for disease detection. Translation relevance: This study has the potential to significantly enhance the diagnostic process for retinal diseases by making detailed vascular imaging more widely available and reducing dependency on costly OCTA equipment.	Translated: 2024-04-26 18:02:25 Published: 2024-04-24
# Power Failure Cascade Prediction using Graph Neural Networks ( http://arxiv.org/abs/2404.16134v1 ) License: see the linked page	Sathwik Chadaga, Xinyu Wu, Eytan Modiano,	We consider the problem of predicting power failure cascades due to branch failures. We propose a flow-free model based on graph neural networks that predicts grid states at every generation of a cascade process given an initial contingency and power injection values. We train the proposed model using a cascade sequence data pool generated from simulations. We then evaluate our model at various levels of granularity. We present several error metrics that gauge the model's ability to predict the failure size, the final grid state, and the failure time steps of each branch within the cascade. We benchmark the graph neural network model against influence models. We show that, in addition to being generic over randomly scaled power injection values, the graph neural network model outperforms multiple influence models that are built specifically for their corresponding loading profiles. Finally, we show that the proposed model reduces the computational time by almost two orders of magnitude.	Translated: 2024-04-26 18:02:25 Published: 2024-04-24
# Performant near-term quantum combinatorial optimization ( http://arxiv.org/abs/2404.16135v1 ) License: see the linked page	Titus D. Morris, Phillip C. Lotshaw,	We present a variational quantum algorithm for solving combinatorial optimization problems with linear-depth circuits. Our algorithm uses an ansatz composed of Hamiltonian generators designed to control each term in the target combinatorial function, along with parameter updates following a modified version of quantum imaginary time evolution. We evaluate this ansatz in numerical simulations that target solutions to the MAXCUT problem. The state evolution is shown to closely mimic imaginary time evolution, and its optimal-solution convergence is further improved using adaptive transformations of the classical Hamiltonian spectrum, while resources are minimized by pruning optimized gates that are close to the identity. With these innovations, the algorithm consistently converges to optimal solutions, with interesting highly-entangled dynamics along the way. This performant and resource-minimal approach is a promising candidate for potential quantum computational advantages on near-term quantum computing hardware.	Translated: 2024-04-26 18:02:25 Published: 2024-04-24
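For readers unfamiliar with the MAXCUT target mentioned in this abstract, the sketch below evaluates the classical objective and finds an optimum by exhaustive search on a tiny graph. It is a classical reference for checking small instances, not the variational quantum algorithm of the paper; the function names are assumptions made for this example.

```python
from itertools import product
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def cut_value(edges: Sequence[Edge], assignment: Sequence[int]) -> int:
    """Number of edges whose endpoints are assigned to different sides."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


def brute_force_maxcut(num_vertices: int, edges: Sequence[Edge]) -> Tuple[int, Tuple[int, ...]]:
    """Try all 2**num_vertices bipartitions and return (best value, best assignment)."""
    best = max(
        product((0, 1), repeat=num_vertices),
        key=lambda bits: cut_value(edges, bits),
    )
    return cut_value(edges, best), best


if __name__ == "__main__":
    square: List[Edge] = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(brute_force_maxcut(4, square))
```

The search is exponential in the number of vertices, so it is only useful as ground truth for small simulated instances.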
# 3D Human Pose Estimation with Occlusions: Introducing BlendMimic3D Dataset and GCN Refinement ( http://arxiv.org/abs/2404.16136v1 ) License: see the linked page	Filipa Lino, Carlos Santiago, Manuel Marques,	In the field of 3D Human Pose Estimation (HPE), accurately estimating human pose, especially in scenarios with occlusions, is a significant challenge. This work identifies and addresses a gap in the current state of the art in 3D HPE concerning the scarcity of data and strategies for handling occlusions. We introduce our novel BlendMimic3D dataset, designed to mimic real-world situations where occlusions occur for seamless integration in 3D HPE algorithms. Additionally, we propose a 3D pose refinement block, employing a Graph Convolutional Network (GCN) to enhance pose representation through a graph model. This GCN block acts as a plug-and-play solution, adaptable to various 3D HPE frameworks without requiring retraining them. By training the GCN with occluded data from BlendMimic3D, we demonstrate significant improvements in resolving occluded poses, with comparable results for non-occluded ones. The project web page is available at https://blendmimic3d.github.io/BlendMimic3D/.	Translated: 2024-04-26 18:02:25 Published: 2024-04-24
# Learned Pulse Shaping Design for PAPR Reduction in DFT-s-OFDM ( http://arxiv.org/abs/2404.16137v1 ) License: see the linked page	Fabrizio Carpi, Soheil Rostami, Joonyoung Cho, Siddharth Garg, Elza Erkip, Charlie Jianzhong Zhang,	High peak-to-average power ratio (PAPR) is one of the main factors limiting cell coverage for cellular systems, especially in the uplink direction. Discrete Fourier transform spread orthogonal frequency-domain multiplexing (DFT-s-OFDM) with spectrally-extended frequency-domain spectrum shaping (FDSS) is one of the efficient techniques deployed to lower the PAPR of the uplink waveforms. In this work, we propose a machine learning-based framework to determine the FDSS filter, optimizing a tradeoff between the symbol error rate (SER), the PAPR, and the spectral flatness requirements. Our end-to-end optimization framework considers multiple important design constraints, including the Nyquist zero-ISI (inter-symbol interference) condition. The numerical results show that learned FDSS filters lower the PAPR compared to conventional baselines, with minimal SER degradation. Tuning the parameters of the optimization also helps us understand the fundamental limitations and characteristics of the FDSS filters for PAPR reduction.	Translated: 2024-04-26 18:02:25 Published: 2024-04-24
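The quantity being reduced here, the peak-to-average power ratio, has a simple standard definition: the maximum instantaneous power of a block of samples divided by its mean power, usually quoted in dB. The sketch below computes it for a list of complex baseband samples; the function name `papr_db` is an assumption for this example, and the learned FDSS filter design itself is not shown.

```python
import math
from typing import Sequence


def papr_db(samples: Sequence[complex]) -> float:
    """Peak-to-average power ratio of a block of samples, in dB.

    Assumes the block is non-empty and not identically zero.
    """
    powers = [abs(x) ** 2 for x in samples]
    mean_power = sum(powers) / len(powers)
    return 10.0 * math.log10(max(powers) / mean_power)


if __name__ == "__main__":
    # A constant-envelope block has a ratio of 1, i.e. 0 dB.
    print(papr_db([1, 1j, -1, -1j]))
    # A single pulse in four samples concentrates all power in one sample.
    print(papr_db([1, 0, 0, 0]))
```

A constant-envelope signal is the best case; the more the power is concentrated in a few samples, the higher the ratio and the more amplifier back-off is needed.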
# A Survey on Intermediate Fusion Methods for Collaborative Perception Categorized by Real World Challenges ( http://arxiv.org/abs/2404.16139v1 ) License: see the linked page	Melih Yazgan, Thomas Graf, Min Liu, J. Marius Zoellner,	This survey analyzes intermediate fusion methods in collaborative perception for autonomous driving, categorized by real-world challenges. We examine various methods, detailing their features and the evaluation metrics they employ. The focus is on addressing challenges like transmission efficiency, localization errors, communication disruptions, and heterogeneity. Moreover, we explore strategies to counter adversarial attacks and defenses, as well as approaches to adapt to domain shifts. The objective is to present an overview of how intermediate fusion methods effectively meet these diverse challenges, highlighting their role in advancing the field of collaborative perception in autonomous driving.	Translated: 2024-04-26 18:02:25 Published: 2024-04-24
# Machine-Learned Closure of URANS for Stably Stratified Turbulence: Connecting Physical Timescales & Data Hyperparameters of Deep Time-Series Models ( http://arxiv.org/abs/2404.16141v1 ) License: see the linked page	Muralikrishnan Gopalakrishnan Meena, Demetri Liousas, Andrew D. Simin, Aditya Kashi, Wesley H. Brewer, James J. Riley, Stephen M. de Bruyn Kops,	We develop time-series machine learning (ML) methods for closure modeling of the Unsteady Reynolds Averaged Navier Stokes (URANS) equations applied to stably stratified turbulence (SST). SST is strongly affected by fine balances between forces and becomes more anisotropic in time for decaying cases. Moreover, there is a limited understanding of the physical phenomena described by some of the terms in the URANS equations. Rather than attempting to model each term separately, it is attractive to explore the capability of machine learning to model groups of terms, i.e., to directly model the force balances. We consider decaying SST which are homogeneous and stably stratified by a uniform density gradient, enabling dimensionality reduction. We consider two time-series ML models: Long Short-Term Memory (LSTM) and Neural Ordinary Differential Equation (NODE). Both models perform accurately and are numerically stable in a posteriori tests. Furthermore, we explore the data requirements of the ML models by extracting physically relevant timescales of the complex system. We find that the ratio of the timescales of the minimum information required by the ML models to accurately capture the dynamics of the SST corresponds to the Reynolds number of the flow. The current framework provides the backbone to explore the capability of such models to capture the dynamics of higher-dimensional complex SST flows.	Translated: 2024-04-26 16:02:40 Published: 2024-04-24
# A Comparative Analysis of Adversarial Robustness for Quantum and Classical Machine Learning Models ( http://arxiv.org/abs/2404.16154v1 ) License: see the linked page	Maximilian Wendlinger, Kilian Tscharke, Pascal Debus,	Quantum machine learning (QML) continues to be an area of tremendous interest from research and industry. While QML models have been shown to be vulnerable to adversarial attacks much in the same manner as classical machine learning models, it is still largely unknown how to compare adversarial attacks on quantum versus classical models. In this paper, we show how to systematically investigate the similarities and differences in adversarial robustness of classical and quantum models using transfer attacks, perturbation patterns and Lipschitz bounds. More specifically, we focus on classification tasks on a handcrafted dataset that allows quantitative analysis for feature attribution. This enables us to get insight, both theoretically and experimentally, on the robustness of classification networks. We start by comparing typical QML model architectures such as amplitude and re-upload encoding circuits with variational parameters to a classical ConvNet architecture. Next, we introduce a classical approximation of QML circuits (originally obtained with Random Fourier Features sampling but adapted in this work to fit a trainable encoding) and evaluate this model, denoted Fourier network, in comparison to other architectures. Our findings show that this Fourier network can be seen as a "middle ground" on the quantum-classical boundary. While adversarial attacks successfully transfer across this boundary in both directions, we also show that regularization helps quantum networks to be more robust, which has direct impact on Lipschitz bounds and transfer attacks.	Translated: 2024-04-26 16:02:40 Published: 2024-04-24
# Does SAM dream of EIG? Characterizing Interactive Segmenter Performance using Expected Information Gain ( http://arxiv.org/abs/2404.16155v1 ) License: see the linked page	Kuan-I Chung, Daniel Moyer,	We introduce an assessment procedure for interactive segmentation models. Based on concepts from Bayesian Experimental Design, the procedure measures a model's understanding of point prompts and their correspondence with the desired segmentation mask. We show that Oracle Dice index measurements are insensitive or even misleading in measuring this property. We demonstrate the use of the proposed procedure on three interactive segmentation models and subsets of two large image segmentation datasets.	Translated: 2024-04-26 16:02:40 Published: 2024-04-24
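As background for the Dice index mentioned in this abstract, the sketch below computes the standard Dice coefficient between a predicted and a target mask, each represented as a set of pixel coordinates. It shows only the plain overlap score; the "Oracle" prompting protocol and the expected-information-gain procedure from the paper are not reproduced, and the convention of returning 1.0 for two empty masks is an assumption made here.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int]


def dice_index(pred: Iterable[Pixel], target: Iterable[Pixel]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) between two pixel sets."""
    pred_set, target_set = set(pred), set(target)
    if not pred_set and not target_set:
        # Convention chosen for this sketch: two empty masks agree perfectly.
        return 1.0
    overlap = len(pred_set & target_set)
    return 2 * overlap / (len(pred_set) + len(target_set))


if __name__ == "__main__":
    predicted = {(0, 0), (0, 1)}
    desired = {(0, 1), (1, 1)}
    print(dice_index(predicted, desired))
```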
# Guardians of the Quantum GAN ( http://arxiv.org/abs/2404.16156v1 ) License: see the linked page	Archisman Ghosh, Debarshi Kundu, Avimita Chatterjee, Swaroop Ghosh,	Quantum Generative Adversarial Networks (qGANs) are at the forefront of image-generating quantum machine learning models. To accommodate the growing demand for Noisy Intermediate-Scale Quantum (NISQ) devices to train and infer quantum machine learning models, the number of third-party vendors offering quantum hardware as a service is expected to rise. This expansion introduces the risk of untrusted vendors potentially stealing proprietary information from the quantum machine learning models. To address this concern, we propose a novel watermarking technique that exploits the noise signature embedded during the training phase of qGANs as a non-invasive watermark. The watermark is identifiable in the images generated by the qGAN, allowing us to trace the specific quantum hardware used during training and hence providing strong proof of ownership. To further enhance the security robustness, we propose the training of qGANs on a sequence of multiple quantum hardware, embedding a complex watermark comprising the noise signatures of all the training hardware that is difficult for adversaries to replicate. We also develop a machine learning classifier to extract this watermark robustly, thereby identifying the training hardware (or the suite of hardware) from the images generated by the qGAN and validating the authenticity of the model. We note that the watermark signature is robust against inferencing on hardware different from the hardware that was used for training. We obtain watermark extraction accuracy of 100% and ~90% for training the qGAN on individual and multiple quantum hardware setups (and inferencing on different hardware), respectively. Since parameter evolution during training is strongly modulated by quantum noise, the proposed watermark can be extended to other quantum machine learning models as well.	Translated: 2024-04-26 16:02:40 Published: 2024-04-24
# The Feasibility of Implementing Large-Scale Transformers on Multi-FPGA Platforms ( http://arxiv.org/abs/2404.16158v1 ) License: see the linked page	Yu Gao, Juan Camilo Vega, Paul Chow,	FPGAs are rarely mentioned when discussing the implementation of large machine learning applications, such as Large Language Models (LLMs), in the data center. There has been much evidence showing that single FPGAs can be competitive with GPUs in performance for some computations, especially for low latency, and often much more efficient when power is considered. This suggests that there is merit to exploring the use of multiple FPGAs for large machine learning applications. The challenge with using multiple FPGAs is that there is no commonly-accepted flow for developing and deploying multi-FPGA applications, i.e., there are no tools to describe a large application, map it to multiple FPGAs and then deploy the application on a multi-FPGA platform. In this paper, we explore the feasibility of implementing large transformers using multiple FPGAs by developing a scalable multi-FPGA platform and some tools to map large applications to the platform. We validate our approach by designing an efficient multi-FPGA version of the I-BERT transformer and implement one encoder using six FPGAs as a working proof-of-concept to show that our platform and tools work. Based on our proof-of-concept prototype and the estimations of performance using the latest FPGAs compared to GPUs, we conclude that there can be a place for FPGAs in the world of large machine learning applications. We demonstrate a promising first step that shows that with the right infrastructure and tools it is reasonable to continue to explore the possible benefits of using FPGAs for applications such as LLMs.	Translated: 2024-04-26 16:02:40 Published: 2024-04-24
# AFU: Actor-Free critic Updates in off-policy RL for continuous control ( http://arxiv.org/abs/2404.16159v1 ) License: see the linked page	Nicolas Perrin-Gilbert,	This paper presents AFU, an off-policy deep RL algorithm addressing in a new way the challenging "max-Q problem" in Q-learning for continuous action spaces, with a solution based on regression and conditional gradient scaling. AFU has an actor but its critic updates are entirely independent from it. As a consequence, the actor can be chosen freely. In the initial version, AFU-alpha, we employ the same stochastic actor as in Soft Actor-Critic (SAC), but we then study a simple failure mode of SAC and show how AFU can be modified to make actor updates less likely to become trapped in local optima, resulting in a second version of the algorithm, AFU-beta. Experimental results demonstrate the sample efficiency of both versions of AFU, marking it as the first model-free off-policy algorithm competitive with state-of-the-art actor-critic methods while departing from the actor-critic perspective.	Translated: 2024-04-26 16:02:40 Published: 2024-04-24
# Domain-Specific Improvement on Psychotherapy Chatbot Using Assistant ( http://arxiv.org/abs/2404.16160v1 ) License: see the linked page	Cheng Kang, Daniel Novak, Katerina Urbanova, Yuqing Cheng, Yong Hu,	Large language models (LLMs) have demonstrated impressive generalization capabilities on specific tasks with human-written instruction data. However, the limited quantity, diversity, and professional expertise of such instruction data raise concerns about the performance of LLMs in psychotherapy tasks when provided with domain-specific instructions. To address this, we first propose Domain-Specific Assistant Instructions based on AlexanderStreet therapy, and second, we use an adaption fine-tuning method and a retrieval augmented generation method to improve pre-trained LLMs. Through quantitative evaluation of linguistic quality using automatic and human evaluation, we observe that pre-trained LLMs on Psychotherapy Assistant Instructions outperform state-of-the-art LLM response baselines. Our Assistant-Instruction approach offers a half-annotation method to align pre-trained LLMs with instructions and provide pre-trained LLMs with more psychotherapy knowledge.	Translated: 2024-04-26 16:02:40 Published: 2024-04-24
# Scaling Lifelong Multi-Agent Path Finding to More Realistic Settings: Research Challenges and Opportunities ( http://arxiv.org/abs/2404.16162v1 ) License: see the linked page	He Jiang, Yulun Zhang, Rishi Veerapaneni, Jiaoyang Li,	Multi-Agent Path Finding (MAPF) is the problem of moving multiple agents from starts to goals without collisions. Lifelong MAPF (LMAPF) extends MAPF by continuously assigning new goals to agents. We present our winning approach to the 2023 League of Robot Runners LMAPF competition, which leads us to several interesting research challenges and future directions. In this paper, we outline three main research challenges. The first challenge is to search for high-quality LMAPF solutions within a limited planning time (e.g., 1s per step) for a large number of agents (e.g., 10,000) or extremely high agent density (e.g., 97.7%). We present future directions such as developing more competitive rule-based and anytime MAPF algorithms and parallelizing state-of-the-art MAPF algorithms. The second challenge is to alleviate congestion and the effect of myopic behaviors in LMAPF algorithms. We present future directions, such as developing moving guidance and traffic rules to reduce congestion, incorporating future prediction and real-time search, and determining the optimal agent number. The third challenge is to bridge the gaps between the LMAPF models used in the literature and real-world applications. We present future directions, such as dealing with more realistic kinodynamic models, execution uncertainty, and evolving systems.	Translated: 2024-04-26 16:02:40 Published: 2024-04-24
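The "without collisions" requirement in MAPF is commonly formalized as the absence of vertex conflicts (two agents in the same cell at the same timestep) and edge conflicts (two agents swapping cells between consecutive timesteps). The sketch below checks a set of paths under that common formalization; it is not taken from the paper, the function name `find_conflicts` is an assumption, and agents are assumed to wait at their last cell once their path ends.

```python
from typing import List, Tuple

Cell = Tuple[int, int]
Conflict = Tuple[str, int, int, int]  # (kind, agent_i, agent_j, timestep)


def find_conflicts(paths: List[List[Cell]]) -> List[Conflict]:
    """Return all vertex and edge (swap) conflicts between pairs of paths."""

    def position(path: List[Cell], t: int) -> Cell:
        # Assumption: an agent stays at its final cell after its path ends.
        return path[min(t, len(path) - 1)]

    conflicts: List[Conflict] = []
    horizon = max(len(path) for path in paths)
    for t in range(horizon):
        for i in range(len(paths)):
            for j in range(i + 1, len(paths)):
                a, b = position(paths[i], t), position(paths[j], t)
                if a == b:
                    conflicts.append(("vertex", i, j, t))
                elif t + 1 < horizon:
                    a_next = position(paths[i], t + 1)
                    b_next = position(paths[j], t + 1)
                    # Edge conflict: the two agents exchange cells.
                    if a_next == b and b_next == a:
                        conflicts.append(("edge", i, j, t))
    return conflicts


if __name__ == "__main__":
    swapping = [[(0, 0), (0, 1)], [(0, 1), (0, 0)]]
    print(find_conflicts(swapping))
```

The pairwise check costs time quadratic in the number of agents per timestep, which is fine for validation but hints at why planning for thousands of agents under a one-second budget is hard.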
# Towards a Holistic Evaluation of LLMs on Factual Knowledge Recall ( http://arxiv.org/abs/2404.16164v1 ) License: see the linked page	Jiaqing Yuan, Lin Pan, Chung-Wei Hang, Jiang Guo, Jiarong Jiang, Bonan Min, Patrick Ng, Zhiguo Wang,	Large language models (LLMs) have shown remarkable performance on a variety of NLP tasks, and are being rapidly adopted in a wide range of use cases. It is therefore of vital importance to holistically evaluate the factuality of their generated outputs, as hallucinations remain a challenging issue. In this work, we focus on assessing LLMs' ability to recall factual knowledge learned from pretraining, and the factors that affect this ability. To that end, we construct FACT-BENCH, a representative benchmark covering 20 domains, 134 property types, 3 answer types, and different knowledge popularity levels. We benchmark 31 models from 10 model families and provide a holistic assessment of their strengths and weaknesses. We observe that instruction-tuning hurts knowledge recall, as pretraining-only models consistently outperform their instruction-tuned counterparts, and positive effects of model scaling, as larger models outperform smaller ones for all model families. However, the best performance from GPT-4 still represents a large gap with the upper bound. We additionally study the role of in-context exemplars using counterfactual demonstrations, which lead to significant degradation of factual knowledge recall for large models. By further decoupling model known and unknown knowledge, we find the degradation is attributed to exemplars that contradict a model's known knowledge, as well as the number of such exemplars. Lastly, we fine-tune LLaMA-7B in different settings of known and unknown knowledge. In particular, fine-tuning on a model's known knowledge is beneficial, and consistently outperforms fine-tuning on unknown and mixed knowledge. We will make our benchmark publicly available.	Translated: 2024-04-26 16:02:40 Published: 2024-04-24
# Fast photon-mediated entanglement of continuously-cooled trapped ions for quantum networking ( http://arxiv.org/abs/2404.16167v1 ) License: see the linked page	Jameson O'Reilly, George Toh, Isabella Goetting, Sagnik Saha, Mikhail Shalaev, Allison Carter, Andrew Risinger, Ashish Kalakuntla, Tingguang Li, Ashrit Verma, Christopher Monroe,	We entangle two co-trapped atomic barium ion qubits by collecting single visible photons from each ion through in-vacuo 0.8 NA objectives, interfering them through an integrated fiber-beamsplitter and detecting them in coincidence. This projects the qubits into an entangled Bell state with an observed fidelity lower bound of F > 94%. We also introduce an ytterbium ion for sympathetic cooling to remove the need for recooling interruptions and achieve a continuous entanglement rate of 250 1/s.	Translated: 2024-04-26 16:02:40 Published: 2024-04-24
# The Over-Certainty Phenomenon in Modern UDA Algorithms ( http://arxiv.org/abs/2404.16168v1 ) License: see the linked page	Fin Amin, Jung-Eun Kim,	When neural networks are confronted with unfamiliar data that deviate from their training set, this signifies a domain shift. While these networks output predictions on their inputs, they typically fail to account for their level of familiarity with these novel observations. This challenge becomes even more pronounced in resource-constrained settings, such as embedded systems or edge devices. To address such challenges, we aim to recalibrate a neural network's decision boundaries in relation to its cognizance of the data it observes, introducing an approach we coin as certainty distillation. While prevailing works navigate unsupervised domain adaptation (UDA) with the goal of curtailing model entropy, they unintentionally birth models that grapple with calibration inaccuracies - a dilemma we term the over-certainty phenomenon. In this paper, we probe the drawbacks of this traditional learning model. As a solution to the issue, we propose a UDA algorithm that not only augments accuracy but also assures model calibration, all while maintaining suitability for environments with limited computational resources.	Translated: 2024-04-26 16:02:40 Published: 2024-04-24
# MiMICRI:心血管画像分類モデルにおける領域中心の対実的説明に向けて MiMICRI: Towards Domain-centered Counterfactual Explanations of Cardiovascular Image Classification Models ( http://arxiv.org/abs/2404.16174v1 ) ライセンス: Link先を確認	Grace Guo, Lifu Deng, Animesh Tandon, Alex Endert, Bum Chul Kwon,	(参考訳) 近年、広くアクセス可能な大規模な医用画像データセットが普及し、心臓血管画像分類と分析のための人工知能(AI)モデルが急増している。同時に、これらのモデルによる潜在的に重大な影響は、特定の画像入力が与えられたモデル予測を説明することを目的とした、説明可能なAI(XAI)メソッドの開発を動機付けている。しかし、これらの手法の多くはドメイン専門家によって開発・評価されておらず、説明は専門知識やドメイン知識の観点からは文脈化されていない。本稿では,心血管画像分類モデルのドメイン中心の対実的説明を提供する,新しいフレームワークとピソンライブラリであるMIMICRIを提案する。 MiMICRIは、ユーザーが形態的構造に対応する医療画像のセグメントをインタラクティブに選択、置換するのに役立つ。生成された偽物から、ユーザーは各セグメントがモデル予測に与える影響を評価し、そのモデルを既知の医療事実に対して検証することができる。私たちはこの図書館を2人の医療専門家と評価した。我々の評価は、ドメイン中心のXAIアプローチがモデル説明の解釈可能性を高め、専門家が関連するドメイン知識の観点からモデルについて推論するのに役立つことを示す。しかし, 副作用の臨床的妥当性についても懸念が浮上した。我々は、MiMICRIフレームワークの汎用性と信頼性に関する議論と、医療場面におけるモデル解釈可能性のためのドメイン中心のXAI手法の開発に関する知見の意義を結論付けた。 The recent prevalence of publicly accessible, large medical imaging datasets has led to a proliferation of artificial intelligence (AI) models for cardiovascular image classification and analysis. At the same time, the potentially significant impacts of these models have motivated the development of a range of explainable AI (XAI) methods that aim to explain model predictions given certain image inputs. However, many of these methods are not developed or evaluated with domain experts, and explanations are not contextualized in terms of medical expertise or domain knowledge. In this paper, we propose a novel framework and python library, MiMICRI, that provides domain-centered counterfactual explanations of cardiovascular image classification models. MiMICRI helps users interactively select and replace segments of medical images that correspond to morphological structures. From the counterfactuals generated, users can then assess the influence of each segment on model predictions, and validate the model against known medical facts. We evaluate this library with two medical experts. 
Our evaluation demonstrates that a domain-centered XAI approach can enhance the interpretability of model explanations and help experts reason about models in terms of relevant domain knowledge. However, concerns also surfaced about the clinical plausibility of the generated counterfactuals. We conclude with a discussion of the generalizability and trustworthiness of the MiMICRI framework, as well as the implications of our findings for the development of domain-centered XAI methods for model interpretability in healthcare contexts.	翻訳日:2024-04-26 16:02:40 公開日:2024-04-24
# シリング攻撃の緩和によるレコメンダシステムの開発 Advancing Recommender Systems by mitigating Shilling attacks ( http://arxiv.org/abs/2404.16177v1 ) ライセンス: Link先を確認	Aditya Chichani, Juzer Golwala, Tejas Gundecha, Kiran Gawande,	(参考訳) 提供商品の数が指数関数的に増加し、ユーザが決定を下す前に同化できるデータ量が比較的小さいという前提を考えると、推奨システムはユーザの好みに応じてコンテンツを分類するのに役立つ。コラボレーティブ・フィルタリングは、優れた性能のため、リコメンデーションの計算に広く使われている手法である。しかし、この方法では、システムはレコメンデーションを偏らせようとする攻撃に対して脆弱になる。これらの攻撃は「シリング攻撃」と呼ばれ、システム内で特定のアイテムを押し上げる（push）、あるいは貶める（nuke）ために行われる。本稿では,システム内のシリングプロファイルを正確に検出するアルゴリズムを提案するとともに,そのようなプロファイルがレコメンデーションに与える影響について検討する。 Considering the premise that the number of products offered grows in an exponential fashion and the amount of data that a user can assimilate before making a decision is relatively small, recommender systems help in categorizing content according to user preferences. Collaborative filtering is a widely used method for computing recommendations due to its good performance. However, this method makes the system vulnerable to attacks that try to bias the recommendations. These attacks, known as 'shilling attacks', are performed to push an item or nuke an item in the system. This paper proposes an algorithm to detect such shilling profiles in the system accurately and also studies the effects of such profiles on the recommendations.	翻訳日:2024-04-26 16:02:40 公開日:2024-04-24
# S2DEVFMAP: 時系列における異常予測の最大化のためのデュアルアンサンブル投票融合を用いた自己教師付き学習フレームワーク S2DEVFMAP: Self-Supervised Learning Framework with Dual Ensemble Voting Fusion for Maximizing Anomaly Prediction in Timeseries ( http://arxiv.org/abs/2404.16179v1 ) ライセンス: Link先を確認	Sarala Naidu, Ning Xiong,	(参考訳) 異常検出は、特に冷却システムの信頼性と最適性能を維持する上で、産業環境において重要な役割を担っている。従来の異常検出手法は、様々なデータ特性やノイズレベルの変動を扱う際の課題に直面することが多く、効果は限られている。しかし、従来の異常検出は、しばしば単一モデルの応用に依存している。この研究は、5つの異種独立モデルと2重アンサンブル融合を用いた新しい頑健なアプローチを提案する。各種モデルは様々なシステムの振る舞いを捉え、融合戦略は検出効率を最大化し、誤報を最小限にする。各ベースオートエンコーダモデルはデータのユニークな表現を学習し、相補的な強度を活用して異常検出性能を向上させる。最終異常予測の有効性と信頼性を高めるため、二重アンサンブル法を適用した。このアプローチは、異常を識別する範囲を最大化するのに優れています。実世界の産業用冷却システムデータのデータセットによる実験結果から,提案手法の有効性が示された。このアプローチは、システムの信頼性を確保し、潜在的な誤動作を防ぐために異常検出が重要である他の産業アプリケーションにも拡張できる。 Anomaly detection plays a crucial role in industrial settings, particularly in maintaining the reliability and optimal performance of cooling systems. Traditional anomaly detection methods often face challenges in handling diverse data characteristics and variations in noise levels, resulting in limited effectiveness. And yet traditional anomaly detection often relies on application of single models. This work proposes a novel, robust approach using five heterogeneous independent models combined with a dual ensemble fusion of voting techniques. Diverse models capture various system behaviors, while the fusion strategy maximizes detection effectiveness and minimizes false alarms. Each base autoencoder model learns a unique representation of the data, leveraging their complementary strengths to improve anomaly detection performance. To increase the effectiveness and reliability of final anomaly prediction, dual ensemble technique is applied. This approach outperforms in maximizing the coverage of identifying anomalies. Experimental results on a real-world dataset of industrial cooling system data demonstrate the effectiveness of the proposed approach. 
This approach can be extended to other industrial applications where anomaly detection is critical for ensuring system reliability and preventing potential malfunctions.	翻訳日:2024-04-26 16:02:40 公開日:2024-04-24
# 初期モデルのないブラインドフェデレーション学習 Blind Federated Learning without initial model ( http://arxiv.org/abs/2404.16180v1 ) ライセンス: Link先を確認	Jose L. Salmeron, Irina Arévalo,	(参考訳) フェデレートラーニング(Federated Learning)は、独自のプライベートデータを持つ複数の参加者間のモデル構築を可能にする、新たな機械学習アプローチである。この方法はセキュアでプライバシ保護であり、病院などの異なるソースからの機密データを使用して機械学習モデルをトレーニングするのに適している。本稿では、プライバシを保護しつつ、粒子群最適化に基づいてファジィ認知マップのフェデレーション学習を行う2つの革新的な手法を提案する。さらに、本研究の重要な貢献として、フェデレーション学習の過程で初期モデルを必要とせず、事実上ブラインドな学習となっている点が挙げられる。この提案は、いくつかのオープンデータセットでテストされており、正解率（accuracy）と適合率（precision）の両方を改善している。 Federated learning is an emerging machine learning approach that allows the construction of a model between several participants who hold their own private data. This method is secure and privacy-preserving, suitable for training a machine learning model using sensitive data from different sources, such as hospitals. In this paper, the authors propose two innovative methodologies for Particle Swarm Optimisation-based federated learning of Fuzzy Cognitive Maps in a privacy-preserving way. In addition, one relevant contribution of this research is the absence of an initial model in the federated learning process, making it effectively blind. This proposal is tested with several open datasets, improving both accuracy and precision.	翻訳日:2024-04-26 16:02:40 公開日:2024-04-24
# ABCD:リスクアセスメントのための信頼強化アテンションベースの畳み込みオートエンコーダ ABCD: Trust enhanced Attention based Convolutional Autoencoder for Risk Assessment ( http://arxiv.org/abs/2404.16183v1 ) ライセンス: Link先を確認	Sarala Naidu, Ning Xiong,	(参考訳) 産業システムにおける異常検出は、機器故障の防止、リスク識別の確保、システム全体の効率の維持に不可欠である。従来の監視方法は、固定されたしきい値と経験則に依存しており、システムの健康状態の微妙な変化を検出し、差し迫った故障を予測するのに十分な感度を持たない場合がある。この制限に対処するため、リスクを検出し、導出したリスク値を保守計画に対応付けるための新しいアテンションベース畳み込みオートエンコーダ(ABCD)を提案する。 ABCDは、実世界の産業用冷却システムの過去データから導電率の正常な挙動を学習し、入力データを再構成し、期待されるパターンから逸脱する異常を識別する。このフレームワークは、予測の信頼性を確保するためにキャリブレーション技術も採用している。評価の結果、ABCDの注意機構により、注意機構なしの場合と比べて性能が57.4%向上し、誤報が9.37%減少した。このアプローチは、リスクを効果的に検出し、リスク優先度ランクを保守に対応付けることで、冷却システム設計者とサービス担当者に貴重な洞察を提供する。 0.03%の校正誤差は、モデルが十分に校正されていることを示し、モデルの信頼性を高め、メンテナンス戦略に関する情報に基づいた決定を可能にする。 Anomaly detection in industrial systems is crucial for preventing equipment failures, ensuring risk identification, and maintaining overall system efficiency. Traditional monitoring methods often rely on fixed thresholds and empirical rules, which may not be sensitive enough to detect subtle changes in system health and predict impending failures. To address this limitation, this paper proposes a novel Attention-based convolutional autoencoder (ABCD) for risk detection and maps the derived risk value to maintenance planning. ABCD learns the normal behavior of conductivity from historical data of a real-world industrial cooling system and reconstructs the input data, identifying anomalies that deviate from the expected patterns. The framework also employs calibration techniques to ensure the reliability of its predictions. Evaluation results demonstrate that the attention mechanism in ABCD yields a 57.4% increase in performance and a 9.37% reduction in false alarms compared to the same model without attention. The approach can effectively detect risks and map the risk priority rank to maintenance, providing valuable insights for cooling system designers and service personnel. 
A calibration error of 0.03% indicates that the model is well-calibrated and enhances the model's trustworthiness, enabling informed decisions about maintenance strategies.	翻訳日:2024-04-26 15:27:26 公開日:2024-04-24
# Pebbleのパール: 自動ラベリングのための信頼性機能の改善 Pearls from Pebbles: Improved Confidence Functions for Auto-labeling ( http://arxiv.org/abs/2404.16188v1 ) ライセンス: Link先を確認	Harit Vishwakarma, Reid, Chen, Sui Jiet Tay, Satya Sai Srinath Namburi, Frederic Sala, Ramya Korlakai Vinayak,	(参考訳) 自動ラベリングは、最小限の手動ラベリングでラベル付きトレーニングセットを生成する重要なテクニックのファミリーである。その代表的な変種であるしきい値に基づく自動ラベル付け(TBAL)は、モデルの信頼度スコアに対して、それを上回るラベルなしデータ点を正確にラベル付けできるようなしきい値を見つけることで機能する。しかし、多くのモデルは自信過剰なスコアを生み出すことが知られており、TBALの性能低下につながる。自信過剰を和らげるために既製のキャリブレーション法を適用するのが自然な発想であるが、そのような方法ではなお不十分である。信頼関数のアドホックな選択を試すのではなく、最適なTBAL信頼関数を研究するための枠組みを提案する。この枠組みの扱いやすい（tractable）バージョンを構築し、TBALシステムの性能最大化に特化した新しいポストホック手法 Colander (Confidence functions for Efficient and Reliable Auto-labeling) を得た。我々は Colander を広範に実験評価し、キャリブレーション用に設計された手法と比較した。 Colander は、ベースラインと同量のラベル付きデータを用い、自動ラベル付け誤りを5%未満に保ちながら、カバレッジをベースライン比で最大60%改善する。 Auto-labeling is an important family of techniques that produce labeled training sets with minimum manual labeling. A prominent variant, threshold-based auto-labeling (TBAL), works by finding a threshold on a model's confidence scores above which it can accurately label unlabeled data points. However, many models are known to produce overconfident scores, leading to poor TBAL performance. While a natural idea is to apply off-the-shelf calibration methods to alleviate the overconfidence issue, such methods still fall short. Rather than experimenting with ad-hoc choices of confidence functions, we propose a framework for studying the optimal TBAL confidence function. We develop a tractable version of the framework to obtain Colander (Confidence functions for Efficient and Reliable Auto-labeling), a new post-hoc method specifically designed to maximize performance in TBAL systems. We perform an extensive empirical evaluation of our method Colander and compare it against methods designed for calibration. 
Colander achieves up to 60% improvements on coverage over the baselines while maintaining auto-labeling error below 5% and using the same amount of labeled data as the baselines.	翻訳日:2024-04-26 15:27:26 公開日:2024-04-24
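A minimal sketch of the plain threshold-based auto-labeling (TBAL) step described in the entry above, not of Colander itself. It picks the lowest confidence threshold whose empirical error on held-out validation points stays within a tolerance, then labels only the points at or above that threshold. The function names, the `max_error` parameter, and the toy data are illustrative assumptions and do not come from the paper.

```python
def find_threshold(confidences, correct, max_error=0.05):
    """Return the lowest confidence threshold whose auto-labeled set has
    empirical error <= max_error on validation data, or None if none exists.

    confidences: model confidence score for each validation point
    correct:     True where the model's prediction matched the true label
    """
    # Visit validation points from most to least confident.
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    best = None
    errors = 0
    for rank, i in enumerate(order, start=1):
        errors += 0 if correct[i] else 1
        # Do not place a threshold in the middle of a group of tied scores.
        if rank < len(order) and confidences[order[rank]] == confidences[i]:
            continue
        if errors / rank <= max_error:
            best = confidences[i]
    return best


def auto_label(confidences, predictions, threshold):
    """Keep the predicted label only for points at or above the threshold."""
    if threshold is None:
        return {}
    return {i: predictions[i]
            for i, c in enumerate(confidences) if c >= threshold}


if __name__ == "__main__":
    val_conf = [0.99, 0.95, 0.90, 0.80, 0.60]
    val_correct = [True, True, True, False, True]
    t = find_threshold(val_conf, val_correct, max_error=0.0)
    print("threshold:", t)
    print(auto_label([0.97, 0.50, 0.92], ["cat", "dog", "cat"], t))
```

Coverage is the fraction of unlabeled points that receive a label. With overconfident scores, the validation error at a given threshold is higher than the scores suggest, so the threshold has to move up and coverage drops; that is the problem the abstract sets out to address.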
# 医用視覚質問応答のためのドメイン適応視覚と言語モデルの融合 Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering ( http://arxiv.org/abs/2404.16192v1 ) ライセンス: Link先を確認	Cuong Nhat Ha, Shima Asaadi, Sanjeev Kumar Karn, Oladimeji Farri, Tobias Heimann, Thomas Runkler,	(参考訳) 視覚言語モデルは、一般的なドメインで有効であり、視覚質問応答(VQA)のような多様なマルチモーダルアプリケーションで強い性能を示すが、より専門的なドメイン、例えば医療において同じレベルの効果を維持するのに苦労する。医療領域に適応した大規模ビジョンと言語モデルを統合する医療ビジョン言語モデルを提案する。このモデルは、3つの異なるバイオメディカル・ラジオロジー・マルチモーダル・ビジュアル・テキスト・データセットを用いてパラメータ効率のトレーニングを行う。提案モデルはSLAKE 1.0の医療用VQA(MedVQA)データセットで87.5%の精度で最先端のパフォーマンスを達成し、他のMedVQAデータセットであるVQA-RADでは73.2%の精度で高い性能を示す。 Vision-language models, while effective in general domains and showing strong performance in diverse multi-modal applications like visual question-answering (VQA), struggle to maintain the same level of effectiveness in more specialized domains, e.g., medical. We propose a medical vision-language model that integrates large vision and language models adapted for the medical domain. This model goes through three stages of parameter-efficient training using three separate biomedical and radiology multi-modal visual and text datasets. The proposed model achieves state-of-the-art performance on the SLAKE 1.0 medical VQA (MedVQA) dataset with an overall accuracy of 87.5% and demonstrates strong performance on another MedVQA dataset, VQA-RAD, achieving an overall accuracy of 73.2%.	翻訳日:2024-04-26 15:27:26 公開日:2024-04-24
# クラス共起確率を用いた複数ラベル認識の改善 Improving Multi-label Recognition using Class Co-Occurrence Probabilities ( http://arxiv.org/abs/2404.16193v1 ) ライセンス: Link先を確認	Samyak Rawlekar, Shubhang Bhatnagar, Vishnuvardhan Pogunulu Srinivasulu, Narendra Ahuja,	(参考訳) マルチラベル認識(MLR)は、画像内の複数のオブジェクトを識別する。この問題のさらなる複雑さに対処するため、近年の研究では、タスクのための大規模なテキスト画像データセットに基づいて訓練された視覚言語モデル(VLM)の情報を活用している。これらの手法は、各オブジェクト(クラス)に対して独立した分類器を学習し、その発生時の相関関係を見渡す。このような共起は、クラス間の条件付き確率としてトレーニングデータから取得することができる。本稿では,独立分類器の性能向上のために,オブジェクトペアの共起情報を組み込んだ独立分類器の拡張フレームワークを提案する。グラフ畳み込みネットワーク(GCN)を用いて,VLMを用いて得られた画像とテキストから得られた推定値を精算することにより,クラス間の条件付き確率を強制する。提案手法を4つのMLRデータセットで検証し,提案手法がすべての最先端手法より優れていることを示す。 Multi-label Recognition (MLR) involves the identification of multiple objects within an image. To address the additional complexity of this problem, recent works have leveraged information from vision-language models (VLMs) trained on large text-images datasets for the task. These methods learn an independent classifier for each object (class), overlooking correlations in their occurrences. Such co-occurrences can be captured from the training data as conditional probabilities between a pair of classes. We propose a framework to extend the independent classifiers by incorporating the co-occurrence information for object pairs to improve the performance of independent classifiers. We use a Graph Convolutional Network (GCN) to enforce the conditional probabilities between classes, by refining the initial estimates derived from image and text sources obtained using VLMs. We validate our method on four MLR datasets, where our approach outperforms all state-of-the-art methods.	翻訳日:2024-04-26 15:27:26 公開日:2024-04-24
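As a rough illustration of the statement above that co-occurrences can be captured from the training data as conditional probabilities between a pair of classes, the sketch below estimates P(class j present | class i present) by counting over multi-label annotations. The function name and the toy label sets are assumptions for illustration. How the paper feeds these probabilities into its GCN is not specified in the abstract and is not shown here.

```python
def cooccurrence_probabilities(label_sets, num_classes):
    """Estimate probs[i][j] = P(class j present | class i present).

    label_sets: one iterable of class indices per training image
    """
    count = [0] * num_classes                       # images containing class i
    joint = [[0] * num_classes for _ in range(num_classes)]  # containing i and j
    for labels in label_sets:
        present = sorted(set(labels))               # ignore duplicate labels
        for i in present:
            count[i] += 1
            for j in present:
                if i != j:
                    joint[i][j] += 1
    probs = [[0.0] * num_classes for _ in range(num_classes)]
    for i in range(num_classes):
        for j in range(num_classes):
            if i != j and count[i] > 0:
                probs[i][j] = joint[i][j] / count[i]
    return probs


if __name__ == "__main__":
    # Toy annotations: class 0 and class 1 often appear together.
    train_labels = [[0, 1], [0, 1], [0], [2]]
    for row in cooccurrence_probabilities(train_labels, num_classes=3):
        print(row)
```

The resulting matrix is generally asymmetric: in the toy data, class 1 never appears without class 0, while class 0 appears without class 1 in one of three images.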
# 認識的異種群を用いた個人差分アルゴリズムのゲーム理論解析 A Game-Theoretic Analysis of Auditing Differentially Private Algorithms with Epistemically Disparate Herd ( http://arxiv.org/abs/2404.16195v1 ) ライセンス: Link先を確認	Ya-Ting Yang, Tao Zhang, Quanyan Zhu,	(参考訳) プライバシを保存するAIアルゴリズムは、さまざまな領域で広く採用されているが、透明性の欠如は説明責任の問題を引き起こす可能性がある。監査アルゴリズムはこの問題に対処できるが、マシンベースの監査アプローチはしばしばコストと時間を要する。一方、Herd auditは、集団知性を活用して代替のソリューションを提供する。にもかかわらず、様々なレベルの専門知識と知識へのアクセスをもたらす監査者間の疫学的な格差の存在は、監査のパフォーマンスに影響を及ぼす可能性がある。効果的な羊飼いの監査は、アルゴリズム開発者の信用可能な説明責任の脅威を確立し、彼らの主張を裏付けるインセンティブを与える。本研究の目的は,Stackelbergのゲームアプローチを用いて,アルゴリズム開発者に対する監査が与える影響を調査する,体系的なフレームワークを開発することである。監査人にとって最適な戦略は、監査プロセスにおける監査人の自信を高めるため、関連する情報への容易なアクセスの重要性を強調している。同様に、ディベロッパにとって最適な選択は、監査人が知識獲得のコストを下げる場合、羊飼いの監査が実行可能であることを示唆している。透明性と説明責任を高めることで、Hed auditはプライバシ保護アルゴリズムの責任ある開発に寄与する。 Privacy-preserving AI algorithms are widely adopted in various domains, but the lack of transparency might pose accountability issues. While auditing algorithms can address this issue, machine-based audit approaches are often costly and time-consuming. Herd audit, on the other hand, offers an alternative solution by harnessing collective intelligence. Nevertheless, the presence of epistemic disparity among auditors, resulting in varying levels of expertise and access to knowledge, may impact audit performance. An effective herd audit will establish a credible accountability threat for algorithm developers, incentivizing them to uphold their claims. In this study, our objective is to develop a systematic framework that examines the impact of herd audits on algorithm developers using the Stackelberg game approach. The optimal strategy for auditors emphasizes the importance of easy access to relevant information, as it increases the auditors' confidence in the audit process. Similarly, the optimal choice for developers suggests that herd audit is viable when auditors face lower costs in acquiring knowledge. 
By enhancing transparency and accountability, herd audit contributes to the responsible development of privacy-preserving algorithms.	翻訳日:2024-04-26 15:27:26 公開日:2024-04-24
# ミツバチにおける小分子毒性の分類のための新しいベンチマークデータセットApisTox ApisTox: a new benchmark dataset for the classification of small molecules toxicity on honey bees ( http://arxiv.org/abs/2404.16196v1 ) ライセンス: Link先を確認	Jakub Adamczyk, Jakub Poziemski, Paweł Siedlecki,	(参考訳) ミツバチのグローバルな減少は、農業、生物多様性、環境安定に重大なリスクをもたらす。既存のデータのギャップを埋めるため,ハチに対する殺虫剤の毒性に着目した包括的データセットであるApisToxを紹介した。このデータセットは、ECOTOXやPPDBといった既存のソースからのデータを組み合わせ、活用することで、以前のデータセットを超える広範囲で一貫性のある、キュレートされたコレクションを提供する。 ApisToxには、化学物質の毒性レベル、論文の出版時期などの詳細、外部の化学物質データベースにリンクする識別子など、幅広いデータが含まれている。このデータセットは、環境・農業研究の重要なツールとして機能するが、ミツバチの個体数に対する害を最小限に抑えるための政策や慣行の開発を支援することもできる。最後に、ApisToxはアグロケミカル化合物の分子特性予測法をベンチマークするためのユニークな資源を提供し、環境科学と化学情報学の両方の進歩を促進する。これは、ミツバチの保護における学術研究と実践的応用の両方に有用な道具である。 The global decline in bee populations poses significant risks to agriculture, biodiversity, and environmental stability. To bridge the gap in existing data, we introduce ApisTox, a comprehensive dataset focusing on the toxicity of pesticides to honey bees (Apis mellifera). This dataset combines and leverages data from existing sources such as ECOTOX and PPDB, providing an extensive, consistent, and curated collection that surpasses the previous datasets. ApisTox incorporates a wide array of data, including toxicity levels for chemicals, details such as time of their publication in literature, and identifiers linking them to external chemical databases. This dataset may serve as an important tool for environmental and agricultural research, but also can support the development of policies and practices aimed at minimizing harm to bee populations. Finally, ApisTox offers a unique resource for benchmarking molecular property prediction methods on agrochemical compounds, facilitating advancements in both environmental science and cheminformatics. This makes it a valuable tool for both academic research and practical applications in bee conservation.	翻訳日:2024-04-26 15:27:26 公開日:2024-04-24
# 臨床治験における効率的な患者補充に向けて : プロンプト学習モデルの適用 Towards Efficient Patient Recruitment for Clinical Trials: Application of a Prompt-Based Learning Model ( http://arxiv.org/abs/2404.16198v1 ) ライセンス: Link先を確認	Mojdeh Rahmanian, Seyed Mostafa Fakhrahmad, Seyedeh Zahra Mousavi,	(参考訳) 目的: 臨床試験は医薬品の介入を進めるのに不可欠であるが、適格な参加者を選ぶ際にボトルネックに直面している。電子健康記録(EHR)を採用に活用することは人気があるが、構造化されていない医療用テキストの複雑な性質は、参加者を効率的に特定する上での課題である。自然言語処理(NLP)技術は最近、トランスフォーマーモデルに焦点を絞ったソリューションとして登場した。本研究では,EHRで収集した非構造化医療ノートから,コホート選択タスクに対するプロンプトベース大規模言語モデルの性能を評価することを目的とした。方法: 医療記録の処理には, 試験に必要な適格基準に最も関連性の高い文章を選択した。それぞれの資格基準に関連するSNOMED CT概念を収集した。 SNOMED CTのオントロジーに基づいてMedCATと診断した。基準関連用語と一致する概念を含む注釈文を抽出した。次に,抽出した文をトレーニングセットとして,プロンプトベース大規模言語モデル(GPT)を用いた。 2018 n2c2 チャレンジのデータセットを用いて,NLP 技術を用いて,13 の資格基準に基づいて 311 人の医療記録を分類することを目的としたモデルの性能評価を行った。結果: 提案モデルでは, マイクロFとマクロFの合計が0.9061, 0.8060であり, 実験結果の最高値となった。結論: 本研究におけるプロンプトベース大規模言語モデルの適用は, 有望な評価基準に基づいて, 患者を分類するものである。また,他の医療用テキストにも適用可能なSNOMED CTオントロジーを用いた抽出要約法を提案する。 Objective: Clinical trials are essential for advancing pharmaceutical interventions, but they face a bottleneck in selecting eligible participants. Although leveraging electronic health records (EHR) for recruitment has gained popularity, the complex nature of unstructured medical texts presents challenges in efficiently identifying participants. Natural Language Processing (NLP) techniques have emerged as a solution with a recent focus on transformer models. In this study, we aimed to evaluate the performance of a prompt-based large language model for the cohort selection task from unstructured medical notes collected in the EHR. Methods: To process the medical records, we selected the most related sentences of the records to the eligibility criteria needed for the trial. The SNOMED CT concepts related to each eligibility criterion were collected. Medical records were also annotated with MedCAT based on the SNOMED CT ontology. Annotated sentences including concepts matched with the criteria-relevant terms were extracted. 
A prompt-based large language model (Generative Pre-trained Transformer (GPT) in this study) was then used with the extracted sentences as the training set. To assess its effectiveness, we evaluated the model's performance using the dataset from the 2018 n2c2 challenge, which aimed to classify medical records of 311 patients based on 13 eligibility criteria through NLP techniques. Results: Our proposed model achieved overall micro and macro F-measures of 0.9061 and 0.8060, respectively, which were among the highest scores achieved in experiments performed with this dataset. Conclusion: The application of a prompt-based large language model in this study to classify patients based on eligibility criteria received promising scores. In addition, we proposed an extractive summarization method based on the SNOMED CT ontology that can also be applied to other medical texts.	翻訳日:2024-04-26 15:27:26 公開日:2024-04-24
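The entry above reports both a micro and a macro F-measure. The sketch below shows the generic definitions of the two averages, under the assumption that per-criterion true positive, false positive, and false negative counts are available. It is not the n2c2 scoring script, and the counts are made up for illustration.

```python
def f1(tp, fp, fn):
    """F1 score from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(counts):
    """counts: list of (tp, fp, fn) tuples, one per eligibility criterion.

    Micro F1 pools the counts over all criteria before scoring.
    Macro F1 scores each criterion separately and averages the scores.
    """
    macro = sum(f1(tp, fp, fn) for tp, fp, fn in counts) / len(counts)
    total_tp = sum(c[0] for c in counts)
    total_fp = sum(c[1] for c in counts)
    total_fn = sum(c[2] for c in counts)
    micro = f1(total_tp, total_fp, total_fn)
    return micro, macro


if __name__ == "__main__":
    # One frequent, well-classified criterion and one rare, poorly classified one.
    micro, macro = micro_macro_f1([(8, 2, 0), (1, 0, 3)])
    print(f"micro F1 = {micro:.4f}, macro F1 = {macro:.4f}")
```

Because the macro average weights every criterion equally, a few rare criteria that are classified poorly can pull it well below the micro average. That is one common reason for a gap like the one reported, although the abstract does not say which criteria were responsible.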
# 機械学習による高次光子状態の最適分類 Optimized higher-order photon state classification by machine learning ( http://arxiv.org/abs/2404.16203v1 ) ライセンス: Link先を確認	Guangpeng Xu, Jeffrey Carvalho, Chiran Wijesundara, Tim Thomay,	(参考訳) 高次光子放出の分類は、決定論的多光子生成のための手法が数多く開発されるにつれて重要になっている。広く使われている2次相関g(2)は、より高次の光子フォック状態の量子純度を決定するのに十分ではない。従来のキャラクタリゼーション手法では、大量の光子検出イベントが必要となり、測定時間と計算時間が増大する。本稿では、2次元畳み込みニューラルネットワーク(CNN)に基づく機械学習モデルを用いて、|3⟩ までの多光子フォック状態を全体で94%の精度で迅速に分類できることを示す。シミュレーションした光子検出イベントでg(3)相関をフィッティングすることで、このモデルは特にスパースな相関データに対して効率的に動作し、800の同時検出イベントで90%の精度を達成する。提案した実験装置を用いることで、このCNN分類器は、量子技術に幅広い応用を持つ高次光子状態の準リアルタイム分類の可能性を開く。 The classification of higher-order photon emission becomes important with more methods being developed for deterministic multiphoton generation. The widely-used second-order correlation g(2) is not sufficient to determine the quantum purity of higher photon Fock states. Traditional characterization methods require a large amount of photon detection events which leads to increased measurement and computation time. Here, we demonstrate a Machine Learning model based on a 2D Convolutional Neural Network (CNN) for rapid classification of multiphoton Fock states up to |3⟩ with an overall accuracy of 94%. By fitting the g(3) correlation with simulated photon detection events, the model exhibits efficient performance particularly with sparse correlation data, with 800 co-detection events to achieve an accuracy of 90%. Using the proposed experimental setup, this CNN classifier opens up the possibility for quasi real-time classification of higher photon states, which holds broad applications in quantum technologies.	翻訳日:2024-04-26 15:27:26 公開日:2024-04-24
# 絡み合いに基づく人工トポロジー:周辺ネットワークノード Entanglement-Based Artificial Topology: Neighboring Remote Network Nodes ( http://arxiv.org/abs/2404.16204v1 ) ライセンス: Link先を確認	Si-Yi Chen, Jessica Illiano, Angela Sara Cacciapuoti, Marcello Caleffi,	(参考訳) 絡み合いは、量子インターネットの鍵となる通信資源として広く認識されている。しかし、絡み合いの特性を活かして新たなネットワーク機能を実現する可能性は、主に二者間の絡み合いに注目が限られてきたため、これまで十分に検討されていない。これに対し本稿では、ネットワーク間リソースとしてマルチパーティ・エンタングルメントを活用することを目指す。具体的には、異なる量子ローカルエリアネットワーク(QLAN)の相互接続を考察し、マルチパーティ・エンタングルメントにより、局所操作のみで、物理的なQLANトポロジの限界を克服するQLAN間人工トポロジを動的に生成できることを示す。そのために、まず各QLAN内に分配するマルチパーティの絡み合った状態を設計する。次に、そのような状態を (i) 異なるQLANに属するノードを相互接続し、(ii) 異なるQLAN間トラフィックパターンに動的に適応するように設計できることを示す。我々の貢献は、ネットワークエンジニアリングコミュニティに、人工トポロジと人工近傍の概念に関する実践的なガイドラインを提供することを目的としている。 Entanglement is unanimously recognized as the key communication resource of the Quantum Internet. Yet, the possibility of implementing novel network functionalities by exploiting the marvels of entanglement has been poorly investigated so far, by mainly restricting the attention to bipartite entanglement. Conversely, in this paper, we aim at exploiting multipartite entanglement as an inter-network resource. Specifically, we consider the interconnection of different Quantum Local Area Networks (QLANs), and we show that multipartite entanglement makes it possible to dynamically generate an inter-QLAN artificial topology, by means of local operations only, that overcomes the limitations of the physical QLAN topologies. To this aim, we first design the multipartite entangled state to be distributed within each QLAN. Then, we show how such a state can be engineered to: i) interconnect nodes belonging to different QLANs, and ii) dynamically adapt to different inter-QLAN traffic patterns. Our contribution aims at providing the network engineering community with a hands-on guideline towards the concept of artificial topology and artificial neighborhood.	翻訳日:2024-04-26 15:27:26 公開日:2024-04-24
# AIS 2024 ユーザ生成コンテンツの映像品質評価に関する課題:方法と結果 AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results ( http://arxiv.org/abs/2404.16205v1 ) ライセンス: Link先を確認	Marcos V. Conde, Saman Zadtootaghaj, Nabajeet Barman, Radu Timofte, Chenlong He, Qi Zheng, Ruoxi Zhu, Zhengzhong Tu, Haiqiang Wang, Xiangguang Chen, Wenhui Meng, Xiang Pan, Huiying Shi, Han Zhu, Xiaozhong Xu, Lei Sun, Zhenzhong Chen, Shan Liu, Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Xiaohong Liu, Weisi Lin, Guangtao Zhai, Wei Sun, Yuqin Cao, Yanwei Jiang, Jun Jia, Zhichao Zhang, Zijian Chen, Weixia Zhang, Xiongkuo Min, Steve Göring, Zihao Qi, Chen Feng,	(参考訳) 本稿では,ユーザ生成コンテンツ(UGC)に着目したAIS 2024ビデオ品質アセスメント(VQA)チャレンジをレビューする。この課題の目的は、UGCビデオの知覚品質を推定できるディープラーニングベースの手法を収集することである。 YouTube UGC Datasetのユーザー生成ビデオには、さまざまなコンテンツ(スポーツ、ゲーム、歌詞、アニメなど)、品質、解像度が含まれている。提案手法では,30FHDフレームを1秒以下で処理する必要がある。チャレンジでは、合計102人の参加者が登録され、15人がコードとモデルを提出した。ユーザ生成コンテンツの効率的な映像品質評価のための多種多様な深層モデルに関する調査として,トップ5投稿のパフォーマンスを概観し,紹介する。 This paper reviews the AIS 2024 Video Quality Assessment (VQA) Challenge, focused on User-Generated Content (UGC). The aim of this challenge is to gather deep learning-based methods capable of estimating the perceptual quality of UGC videos. The user-generated videos from the YouTube UGC Dataset include diverse content (sports, games, lyrics, anime, etc.), quality and resolutions. The proposed methods must process 30 FHD frames under 1 second. In the challenge, a total of 102 participants registered, and 15 submitted code and models. The performance of the top-5 submissions is reviewed and provided here as a survey of diverse deep models for efficient video quality assessment of user-generated content.	翻訳日:2024-04-26 15:27:26 公開日:2024-04-24
# 構造的およびテクスチャ的埋め込みを用いた知識グラフの補完 Knowledge Graph Completion using Structural and Textual Embeddings ( http://arxiv.org/abs/2404.16206v1 ) ライセンス: Link先を確認	Sakher Khalil Alqaaidi, Krzysztof Kochut,	(参考訳) 知識グラフ(KG)は、質問回答やレコメンデーションシステムなど、人工知能アプリケーションに広く使われている。しかしながら、KGは不完全であることがしばしば見出される。既存の文献の多くは、与えられた不完全なKG三重項に対する欠落ノードの予測に重点を置いているが、既存のノード間の関係を探索することでKGを完遂する機会は残っている。本研究では,KG内のテキスト情報と構造情報の両方を利用する関係予測モデルを提案する。本手法では,歩行に基づく埋め込みと言語モデル埋め込みを統合し,ノードを効果的に表現する。本研究では,広く利用されているデータセットで評価した場合,関係予測タスクにおける競合結果が得られたことを実証する。 Knowledge Graphs (KGs) are widely employed in artificial intelligence applications, such as question-answering and recommendation systems. However, KGs are frequently found to be incomplete. While much of the existing literature focuses on predicting missing nodes for given incomplete KG triples, there remains an opportunity to complete KGs by exploring relations between existing nodes, a task known as relation prediction. In this study, we propose a relations prediction model that harnesses both textual and structural information within KGs. Our approach integrates walks-based embeddings with language model embeddings to effectively represent nodes. We demonstrate that our model achieves competitive results in the relation prediction task when evaluated on a widely used dataset.	翻訳日:2024-04-26 15:27:26 公開日:2024-04-24
# GPU-RANC:ニューロモルフィックアーキテクチャのためのCUDA加速シミュレーションフレームワーク GPU-RANC: A CUDA Accelerated Simulation Framework for Neuromorphic Architectures ( http://arxiv.org/abs/2404.16208v1 ) ライセンス: Link先を確認	Sahil Hassan, Michael Inouye, Miguel C. Gonzalez, Ilkin Aliyev, Joshua Mack, Maisha Hafiz, Ali Akoglu,	(参考訳) オープンソースシミュレーションツールは、ニューロモルフィックなアプリケーションエンジニアやハードウェアアーキテクトにとって、パフォーマンスボトルネックを調査し、シリコンにコミットする前に設計最適化を検討する上で重要な役割を果たす。 Reconfigurable Architecture for Neuromorphic Computing (RANC)は、ソフトウェアベースのシミュレーションとFPGAベースのエミュレーションの両方を通じて、統合されたエコシステム内で事前訓練されたスパイキングニューラルネットワーク(SNN)モデルを実行する機能を提供するツールである。 RANCは、実装ボトルネックを調査し、アーキテクチャパラメータをチューニングしたり、アプリケーションの洞察に基づいてニューロンの振る舞いを変更し、ハードウェアの性能とネットワークの正確性に関する貿易空間を研究するために、柔軟でパラメータ化された設計でコミュニティによって利用されてきた。ニューロモルフィックコンピューティングで使用するアーキテクチャの設計には、ニューロン当たりの重みの数と精度、コア当たりのニューロンと軸索の数、ネットワークトポロジ、ニューロンの振る舞いなど、信じられないほど多くの構成パラメータがある。このような研究を加速し、ユーザに生産的な空間探索の合理化を提供するため、本稿では、RANCのGPUベースの実装を紹介する。我々は並列化のアプローチを要約し、GPUベースの様々なユースケースにおけるTick-accurateシミュレーションで達成したスピードアップのゲインを定量化する。 512個のニューロモルフィックコアMNIST推論アプリケーションに基づくRANCシミュレータのシリアルバージョンと比較して,最大780倍の高速化を示した。 RANCエコシステムは、SNNを加速させ、最適化されたニューロモルフィックアーキテクチャに迅速に収束させることにより、よりリッチな研究を行うための様々な最適化を探索する研究において、より実現可能な手段を提供すると考えている。 Open-source simulation tools play a crucial role for neuromorphic application engineers and hardware architects to investigate performance bottlenecks and explore design optimizations before committing to silicon. Reconfigurable Architecture for Neuromorphic Computing (RANC) is one such tool that offers ability to execute pre-trained Spiking Neural Network (SNN) models within a unified ecosystem through both software-based simulation and FPGA-based emulation. RANC has been utilized by the community with its flexible and highly parameterized design to study implementation bottlenecks, tune architectural parameters or modify neuron behavior based on application insights and study the trade space on hardware performance and network accuracy. 
In designing architectures for use in neuromorphic computing, there is a very large number of configuration parameters, such as the number and precision of weights per neuron, neuron and axon counts per core, network topology, and neuron behavior. To accelerate such studies and provide users with a streamlined, productive design space exploration, in this paper we introduce the GPU-based implementation of RANC. We summarize our parallelization approach and quantify the speedup gains achieved with GPU-based tick-accurate simulations across various use cases. We demonstrate up to a 780-fold speedup compared to the serial version of the RANC simulator based on a 512 neuromorphic core MNIST inference application. We believe that the RANC ecosystem now provides a much more feasible avenue in the research of exploring different optimizations for accelerating SNNs and performing richer studies by enabling rapid convergence to optimized neuromorphic architectures.	翻訳日:2024-04-26 15:27:26 公開日:2024-04-24
# 観測可能な量子状態の集合におけるランダム性の検証 Verifying randomness in sets of quantum states via observables ( http://arxiv.org/abs/2404.16211v1 ) ライセンス: Link先を確認	Xavier Bonet-Monroig, Hao Wang, Adrián Pérez-Salinas,	(参考訳) 量子状態の集合とハールランダム分布との整合を、既知の量子可観測性を通して統計的モーメントのマッチングにより予測する計量平均ランダム性を示す。本研究では,Haar-randomnessがディリクレ分布と結びついていることを示し,統計モーメントの閉形式表現と単純な境界を与える。この計量を置換とユニタリ同値な可観測性に一般化し、拡張平均ランダム性がハールランダム分布と互換性があるならば、状態の集合は概してハールランダムである。 We present a metric, average randomness, that predicts the compatibility of a set of quantum states with the Haar-random distribution, by matching of statistical moments, through a known quantum observable. We show that Haar-randomness is connected to the Dirichlet distribution, and provide a closed-form expression, and simple bounds of the statistical moments. We generalize this metric to permutation- and unitary-equivalent observables, ensuring that if the extended average randomness is compatible with a Haar-random distribution, then the set of states is approximately Haar-random.	翻訳日:2024-04-26 15:27:26 公開日:2024-04-24
# 進化する脅威景観におけるディープフェイク画像検出の最近の進歩の分析 An Analysis of Recent Advances in Deepfake Image Detection in an Evolving Threat Landscape ( http://arxiv.org/abs/2404.16212v1 ) ライセンス: Link先を確認	Sifat Muhammad Abdullah, Aravind Cheruvu, Shravya Kanchi, Taejoong Chung, Peng Gao, Murtuza Jadliwala, Bimal Viswanath,	(参考訳) ディープフェイクまたは合成画像は、オンラインプラットフォームに深刻なリスクをもたらす。これにより、Deepfakeイメージを正確に検出し、公開可能なDeepfakeデータセット上での優れたパフォーマンスを実現するためのいくつかの研究活動が引き起こされた。本研究は,8つの最先端検出器について検討し,最近の2つの発展により,配備の準備が整っていないことを論じるものである。まず、大規模な生成モデルをカスタマイズするための軽量な方法の出現により、攻撃者は多数のカスタマイズされたジェネレータ(ディープフェイクを作成する)を作成でき、それによって脅威表面を大幅に増大させることができる。既存のディフェンスは、現在一般に公開されているような user-customized generative model への一般化に失敗していることを示す。本稿では、コンテンツに依存しない特徴に基づく新しい機械学習手法と、ユーザカスタマイズモデルに対する一般化性能を改善するためのアンサンブルモデリングについて論じる。第2に,vision foundation model – 複数の下流タスクに容易に適応可能な広範なデータに基づいてトレーニングされたマシンラーニングモデル – の出現は,攻撃者が既存の防御を回避可能な敵対的ディープフェイクを作るために,誤用される可能性がある。本稿では, 既存の基盤モデルを利用して, 敵対的ノイズを加えることなく, 画像内容のセマンティックな操作を通じて, 敵対的サンプルを作成できる単純な敵対的攻撃を提案する。我々は、攻撃に対するいくつかの防衛の脆弱性を強調し、この新たな脅威に対抗するために、先進的な基盤モデルと敵対的訓練を活用する方向を探る。 Deepfake or synthetic images produced using deep generative models pose serious risks to online platforms. This has triggered several research efforts to accurately detect deepfake images, achieving excellent performance on publicly available deepfake datasets. In this work, we study 8 state-of-the-art detectors and argue that they are far from being ready for deployment due to two recent developments. First, the emergence of lightweight methods to customize large generative models can enable an attacker to create many customized generators (to create deepfakes), thereby substantially increasing the threat surface. We show that existing defenses fail to generalize well to such user-customized generative models that are publicly available today. We discuss new machine learning approaches based on content-agnostic features, and ensemble modeling to improve generalization performance against user-customized models. 
Second, the emergence of vision foundation models -- machine learning models trained on broad data that can be easily adapted to several downstream tasks -- can be misused by attackers to craft adversarial deepfakes that can evade existing defenses. We propose a simple adversarial attack that leverages existing foundation models to craft adversarial samples without adding any adversarial noise, through careful semantic manipulation of the image content. We highlight the vulnerabilities of several defenses against our attack, and explore directions leveraging advanced foundation models and adversarial training to defend against this new threat.	翻訳日:2024-04-26 15:27:26 公開日:2024-04-24
# ActiveRIR:音響環境モデリングのためのアクティブオーディオ-ビジュアル探索 ActiveRIR: Active Audio-Visual Exploration for Acoustic Environment Modeling ( http://arxiv.org/abs/2404.16216v1 ) ライセンス: Link先を確認	Arjun Somayazulu, Sagnik Majumder, Changan Chen, Kristen Grauman,	(参考訳) 環境音響モデルは、任意の音源/受信者の位置について、室内環境の物理的特性によって音がどのように変換されるかを表す。従来の音響モデル構築の方法は、空間の密集した場所にある大量の音響データの高価で時間を要する収集や、音響データサンプリングの場所をインテリジェントに選択するためのシーン幾何学の特権的な知識に依存している。本研究では,視覚・音響センサを備えた移動体エージェントが,環境音響モデルと占有マップを同時に構築する,無人環境の環境音響モデルを構築するための新しいタスクである能動的音響サンプリングを提案する。音声・視覚センサストリームからの情報を活用してエージェントナビゲーションを誘導し、最適な音響データサンプリング位置を判定する強化学習(RL)ポリシーであるActiveRIRを導入し、最小限の音響サンプルから環境の高品質な音響モデルを生成する。環境音響モデルにおける情報ゲインに基づく新しいRL報酬で政策を訓練する。 ActiveRIRは、最先端の音響シミュレーションプラットフォームから、さまざまな目に見えない屋内環境の評価を行い、従来のナビゲーションエージェントと既存の最先端の手法の両方を性能評価する。 An environment acoustic model represents how sound is transformed by the physical characteristics of an indoor environment, for any given source/receiver location. Traditional methods for constructing acoustic models involve expensive and time-consuming collection of large quantities of acoustic data at dense spatial locations in the space, or rely on privileged knowledge of scene geometry to intelligently select acoustic data sampling locations. We propose active acoustic sampling, a new task for efficiently building an environment acoustic model of an unmapped environment in which a mobile agent equipped with visual and acoustic sensors jointly constructs the environment acoustic model and the occupancy map on-the-fly. We introduce ActiveRIR, a reinforcement learning (RL) policy that leverages information from audio-visual sensor streams to guide agent navigation and determine optimal acoustic data sampling positions, yielding a high quality acoustic model of the environment from a minimal set of acoustic samples. We train our policy with a novel RL reward based on information gain in the environment acoustic model. 
Evaluating on diverse unseen indoor environments from a state-of-the-art acoustic simulation platform, ActiveRIR outperforms an array of methods--both traditional navigation agents based on spatial novelty and visual exploration as well as existing state-of-the-art methods.	翻訳日:2024-04-26 15:27:26 公開日:2024-04-24
# 階層空間上でのFADEを用いた効率的なNAS Efficient NAS with FaDE on Hierarchical Spaces ( http://arxiv.org/abs/2404.16218v1 ) ライセンス: Link先を確認	Simon Neumeyer, Julian Stier, Michael Granitzer,	(参考訳) ニューラルアーキテクチャサーチ(NAS)は難しい問題である。階層的な検索空間は、ニューラルネットワークサブモジュールの安価な評価を可能にし、アーキテクチャ評価の代理となる。しかし、階層構造が制限的すぎる場合や、サロゲートが一般化に失敗する場合もあります。階層型NAS空間の有限領域における相対的な性能予測を得るために、微分可能なアーキテクチャ探索を用いるFaDEを提案する。これらのランクの相対的な性質は、メモリレス、バッチワイドな外的探索アルゴリズム(英語版)であり、疑似階調降下の進化的アルゴリズム(英語版)を用いる。 FaDEは特に階層的な多セル探索空間に適しており、指数的なコストではなく線形で探索できるため、プロキシ検索空間は不要である。実験の結果、探索空間の有限領域におけるFaDEランクは、対応するアーキテクチャ性能と相関し、第2に、完全なニューラルネットワーク探索空間における疑似漸進的進化探索に有効であることが示された。 Neural architecture search (NAS) is a challenging problem. Hierarchical search spaces allow for cheap evaluations of neural network sub modules to serve as surrogate for architecture evaluations. Yet, sometimes the hierarchy is too restrictive or the surrogate fails to generalize. We present FaDE which uses differentiable architecture search to obtain relative performance predictions on finite regions of a hierarchical NAS space. The relative nature of these ranks calls for a memory-less, batch-wise outer search algorithm for which we use an evolutionary algorithm with pseudo-gradient descent. FaDE is especially suited on deep hierarchical, respectively multi-cell search spaces, which it can explore by linear instead of exponential cost and therefore eliminates the need for a proxy search space. Our experiments show that firstly, FaDE-ranks on finite regions of the search space correlate with corresponding architecture performances and secondly, the ranks can empower a pseudo-gradient evolutionary search on the complete neural architecture search space.	翻訳日:2024-04-26 15:17:42 公開日:2024-04-24
# 曲がった連結は完備マイオラナ・McFarlandクラスに属さないのか? When does a bent concatenation not belong to the completed Maiorana-McFarland class? ( http://arxiv.org/abs/2404.16220v1 ) ライセンス: Link先を確認	Sadmir Kudin, Enes Pasalic, Alexandr Polujan, Fengrong Zhang,	(参考訳) すべてのブールベント関数 $f$ は、連結化 $f=f_1\|\|f_2$ または連結化 $f=f_1\|\|f_2\|\|f_3\|\|f_4$ と書けるが、これらはすべて同時にベント、半ベント、あるいは5値のスペクトル関数である。曲がった連結$f$ (not) は、完成したMaiorana-McFarland クラス $\mathcal{M}^\#$ に属するのか? 本稿では、この問題を完全に解決するために、$f=f_1\|\|f_2$と$f=f_1\|\|f_2\|\|f_3\|\|f_4$という形の結合に対する$\mathcal{M}$-部分空間の構造の完全な特徴づけを与える。これらの条件に基づき、$f=g\|\|h\|\|g\|\|(h+1)$の場合、特別な場合において、$\mathcal{M}^\#$ の外の曲がり関数を指定するためのいくつかの明示的な設計法を提案する。 Every Boolean bent function $f$ can be written either as a concatenation $f=f_1\|\|f_2$ of two complementary semi-bent functions $f_1,f_2$; or as a concatenation $f=f_1\|\|f_2\|\|f_3\|\|f_4$ of four Boolean functions $f_1,f_2,f_3,f_4$, all of which are simultaneously bent, semi-bent, or 5-valued spectra-functions. In this context, it is essential to ask: When does a bent concatenation $f$ (not) belong to the completed Maiorana-McFarland class $\mathcal{M}^\#$? In this article, we answer this question completely by providing a full characterization of the structure of $\mathcal{M}$-subspaces for the concatenation of the form $f=f_1\|\|f_2$ and $f=f_1\|\|f_2\|\|f_3\|\|f_4$, which allows us to specify the necessary and sufficient conditions so that $f$ is outside $\mathcal{M}^\#$. Based on these conditions, we propose several explicit design methods of specifying bent functions outside $\mathcal{M}^\#$ in the special case when $f=g\|\|h\|\|g\|\|(h+1)$, where $g$ and $h$ are bent functions.	翻訳日:2024-04-26 15:17:42 公開日:2024-04-24
# NeRF-XL:複数のGPUによるNeRFのスケーリング NeRF-XL: Scaling NeRFs with Multiple GPUs ( http://arxiv.org/abs/2404.16221v1 ) ライセンス: Link先を確認	Ruilong Li, Sanja Fidler, Angjoo Kanazawa, Francis Williams,	(参考訳) 我々は、複数のGPUにまたがってニューラルネットワーク場(NeRF)を分散する原理的な方法であるNeRF-XLを提案し、任意の容量でNeRFのトレーニングとレンダリングを可能にする。まず,大規模シーンを独立に訓練された複数のNeRFに分解する既存のマルチGPUアプローチを再検討し,これらの手法の基本的な問題点を特定し,トレーニングにGPU(Advanced Computer Resources)を用いることによって,再構成品質の改善を阻害する。 NeRF-XLはこれらの問題を修正し、単により多くのハードウェアを使用することで、任意の数のパラメータでNeRFのトレーニングとレンダリングを可能にする。提案手法のコアとなる分散トレーニングとレンダリングの定式化は,従来のシングルGPUの場合と数学的に等価であり,GPU間の通信を最小化する。任意のパラメータ数でNeRFをアンロックすることにより、NeRFのマルチGPUスケーリング法則を初めて明らかにし、パラメータ数を大きくした再構成品質の向上とGPUの高速化を実現した。我々は,25km^2の都市部をカバーする258K画像を含む,これまでで最大規模のオープンソースデータセットMatrixCityを含む,さまざまなデータセットに対するNeRF-XLの有効性を実証した。 We present NeRF-XL, a principled method for distributing Neural Radiance Fields (NeRFs) across multiple GPUs, thus enabling the training and rendering of NeRFs with an arbitrarily large capacity. We begin by revisiting existing multi-GPU approaches, which decompose large scenes into multiple independently trained NeRFs, and identify several fundamental issues with these methods that hinder improvements in reconstruction quality as additional computational resources (GPUs) are used in training. NeRF-XL remedies these issues and enables the training and rendering of NeRFs with an arbitrary number of parameters by simply using more hardware. At the core of our method lies a novel distributed training and rendering formulation, which is mathematically equivalent to the classic single-GPU case and minimizes communication between GPUs. By unlocking NeRFs with arbitrarily large parameter counts, our approach is the first to reveal multi-GPU scaling laws for NeRFs, showing improvements in reconstruction quality with larger parameter counts and speed improvements with more GPUs. 
We demonstrate the effectiveness of NeRF-XL on a wide variety of datasets, including the largest open-source dataset to date, MatrixCity, containing 258K images covering a 25km^2 city area.	翻訳日:2024-04-26 15:17:42 公開日:2024-04-24
# インストラクショナルビデオにおけるステップ差 Step Differences in Instructional Video ( http://arxiv.org/abs/2404.16222v1 ) ライセンス: Link先を確認	Tushar Nagarajan, Lorenzo Torresani,	(参考訳) ユーザビデオと参照ハウツービデオを比較することは、ユーザの進捗に合わせてパーソナライズされたアシストを提供するAR/VR技術にとって重要な要件である。しかし、言語ベースの支援に対する現在のアプローチは、単一のビデオに関する質問に答えることしかできない。本論文では,まず,既存のステップアノテーションと付随するナレーションを活用することで,ハウト100Mからビデオのペアを含む大量の視覚的チューニングデータを自動生成し,さらにビデオ条件付き言語モデルを訓練して,複数の生動画を共同で解析する手法を提案する。本モデルでは,これらの違いの重大さに基づいて,ビデオペアとランキングビデオの差分を同定し,複数のビデオに対して一般的な推論を行うための有望な能力を示す。 Comparing a user video to a reference how-to video is a key requirement for AR/VR technology delivering personalized assistance tailored to the user's progress. However, current approaches for language-based assistance can only answer questions about a single video. We propose an approach that first automatically generates large amounts of visual instruction tuning data involving pairs of videos from HowTo100M by leveraging existing step annotations and accompanying narrations, and then trains a video-conditioned language model to jointly reason across multiple raw videos. Our model achieves state-of-the-art performance at identifying differences between video pairs and ranking videos based on the severity of these differences, and shows promising ability to perform general reasoning over multiple videos.	翻訳日:2024-04-26 15:17:42 公開日:2024-04-24
# Deep RAW画像超解像: NTIRE 2024 チャレンジサーベイ Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey ( http://arxiv.org/abs/2404.16223v1 ) ライセンス: Link先を確認	Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhijing Sun, Jiaying Zhu, Liangyan Li, Ke Chen, Yunzhe Li, Yimo Ning, Guanhua Zhao, Jun Chen, Jinyang Yu, Kele Xu, Qisheng Xu, Yong Dou,	(参考訳) 本報告では,NTIRE 2024 RAW Image Super-Resolution Challengeについて概説し,提案手法と結果について述べる。 RAWスーパーリゾリューションのための新しい手法は、現代の画像信号処理(ISP)パイプラインでは必須であるが、RGBドメインのようには研究されていない。この課題の目標は、ノイズやぼやけなどの未知の劣化を考慮して、RAWベイア画像を2倍にスケールアップすることである。この挑戦では、合計230人の参加者が登録され、45人が挑戦期間に結果を提出した。 RAW Image Super-Resolutionの現在の最先端の指標として、トップ5のサブミッションのパフォーマンスをレビューし、ここで提供する。 This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines; however, this problem is not as explored as in the RGB domain. The goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as noise and blur. In the challenge, a total of 230 participants registered, and 45 submitted results during the challenge period. The performance of the top-5 submissions is reviewed and provided here as a gauge for the current state-of-the-art in RAW Image Super-Resolution.	翻訳日:2024-04-26 15:17:42 公開日:2024-04-24
# 痛みの言語に関する計算学的分析--系統的考察 Computational analysis of the language of pain: a systematic review ( http://arxiv.org/abs/2404.16226v1 ) ライセンス: Link先を確認	Diogo A. P. Nunes, Joana Ferreira-Gomes, Fani Neto, David Martins de Matos,	(参考訳) 目的: 本研究の目的は, 患者や医師が生み出す痛みの言葉の計算処理に関する文献を体系的にレビューし, 現状と課題を明らかにすることである。方法: PRISMAガイドラインに従って, 痛みの言語処理に関する関連研究を選択し, あらかじめ定義された研究課題に答えるために, 総合的な文献検索を行った。データ抽出と合成を行い, 主目的と結果, 患者と痛みの集団, テキストデータ, 計算手法, 結果目標に応じて, 選択された研究を分類した。結果: 医師が生成した痛みの言語, 特に臨床記録から得られたものは, 最もよく用いられるデータであった。課題は、患者の診断とトリアージ、痛みの言及の識別、治療反応の予測、バイオメディカルな実体抽出、言語的特徴と臨床状態の相関、痛みの物語の語彙的分析である。 1つの研究は、実験装置における痛みの発話に関する以前の言語知識を含んでいた。ほとんどの研究は、臨床ツールとして、または間接的な知識として、医師の成果を目標にしていた。最も標的にされていない治療段階は、患者が最も関与する自己管理である。最も研究されていない痛みの次元は、感情的で社会文化的であった。提案アルゴリズムを取り入れた2つの研究で、臨床における医師の成績が改善した。考察: 今後の研究は、患者が生み出す痛みの言語分析、患者中心の自己管理とエンパワーメントのためのリソース開発、痛みの感情的・社会的側面の探索、そして、提案ツールによる支援による医師のパフォーマンス改善の測定に焦点をあてるべきである。 Objectives: This study aims to systematically review the literature on the computational processing of the language of pain, whether generated by patients or physicians, identifying current trends and challenges. Methods: Following the PRISMA guidelines, a comprehensive literature search was conducted to select relevant studies on the computational processing of the language of pain and answer pre-defined research questions. Data extraction and synthesis were performed to categorize selected studies according to their primary purpose and outcome, patient and pain population, textual data, computational methodology, and outcome targets. Results: Physician-generated language of pain, specifically from clinical notes, was the most used data. Tasks included patient diagnosis and triaging, identification of pain mentions, treatment response prediction, biomedical entity extraction, correlation of linguistic features with clinical states, and lexico-semantic analysis of pain narratives. Only one study included previous linguistic knowledge on pain utterances in their experimental setup. 
Most studies targeted their outcomes for physicians, either directly as clinical tools or as indirect knowledge. The least targeted stage of clinical pain care was self-management, in which patients are most involved. The least studied dimensions of pain were affective and sociocultural. Only two studies measured how physician performance on clinical tasks improved with the inclusion of the proposed algorithm. Discussion: This study found that future research should focus on analyzing patient-generated language of pain, developing patient-centered resources for self-management and patient-empowerment, exploring affective and sociocultural aspects of pain, and measuring improvements in physician performance when aided by the proposed tools.	翻訳日:2024-04-26 15:17:42 公開日:2024-04-24
# 共分散行列ダイナミクスのクロトフ制御による光学系における最適絡み合い生成 Optimal entanglement generation in optomechanical systems via Krotov control of covariance matrix dynamics ( http://arxiv.org/abs/2404.16227v1 ) ライセンス: Link先を確認	Peng-Ju Chen, Da-Wei Luo, Ting Yu,	(参考訳) 本研究では,Fockベースカットオフを使わずに,オプティメカルシステムにおける絡み合い生成に着目し,連続変数系の最適制御について検討した。 Krotovアルゴリズムを用いて共分散行列のダイナミクスを最適化し、制御対象関数を設計し、システムのダイナミクスを操作して望ましい目標状態を生成する方法を示した。本研究では, 外部レーザ場の変形制御を施すことにより, マクロメカニカルミラーと量子光学キャビティとの絡み合いを確実に生成できることを実証した。この制御は、低周波成分に制限するために外部磁場にスペクトル制約を課す際にも達成される可能性がある。さらに、非マルコフ開系力学に対する量子制御の影響を体系的に研究する。我々は, 環境騒音の有害な影響を緩和する上で, 記憶効果が有効であることを示した。特に、この絡み合いは、これらの記憶効果の存在下での崩壊を減少させる。 We investigated the optimal control of a continuous variable system, focusing on entanglement generation in an optomechanical system without utilizing Fock basis cutoffs. Using the Krotov algorithm to optimize the dynamics of the covariance matrix, we illustrated how to design a control objective function to manipulate the dynamics of the system to generate a desirable target state. We showed that entanglement between the macroscopic mechanical mirror and the quantum optical cavity can be reliably generated through imposing the control on the detuning of the external laser field. It has been shown that the control may still be achieved when imposing spectral constraints on the external field to restrict it to low-frequency components. In addition, we systematically study the effects of quantum control on non-Markovian open system dynamics. We observed that memory effects can play a beneficial role in mitigating the detrimental impact of environmental noises. Specifically, the entanglement generated shows reduced decay in the presence of these memory effects.	翻訳日:2024-04-26 15:17:42 公開日:2024-04-24
# InAsP量子ドットナノワイヤにおける合成スピンワン鎖におけるマクロ一重項三重項量子ビットを持つ2量子ビットゲート Two qubit gate with macroscopic singlet-triplet qubits in synthetic spin-one chains in InAsP quantum dot nanowires ( http://arxiv.org/abs/2404.16229v1 ) ライセンス: Link先を確認	Hassan Allami, Daniel Miravet, Marek Korkusinski, Pawel Hawrylak,	(参考訳) InAsP量子ドットナノワイヤにおける合成スピン-ワン鎖における2量子ゲートの理論について述べる。マクロトポロジカルに保護された一重項三重項量子ビットは、2つのスピン半分のハルデネ準粒子で構築される。ハルダン準粒子は、InAsP量子ドットの鎖で実現された合成スピン1鎖によってホストされ、それぞれ4つの電子を持つInPナノワイヤに埋め込まれている。量子ドットナノワイヤは相互作用する原子モデルから派生したハバード・カナモリ・ハミルトン(HK)によって記述される。正確な対角化と行列積状態(MPS)ツールを用いて、HKハミルトニアンの低エネルギー挙動が反強磁性スピン1鎖ハミルトニアンによって効果的に捕捉されることを示した。次に、2つのマクロ量子ビットについて考察し、その2つの鎖の間に中間制御点を挿入することにより、2つのマクロ量子ビット間の可変結合を生成する方法を提案する。最後に、高い精度の2ST量子ビットゲートを生成するための2つの方法を提案する。(1)各量子ビットの長さを制御し、(2)異なる背景磁場を2つの量子ビットに利用することによって。 We present a theory of a two qubit gate with macroscopic singlet-triplet (ST) qubits in synthetic spin-one chains in InAsP quantum dot nanowires. The macroscopic topologically protected singlet-triplet qubits are built with two spin-half Haldane quasiparticles. The Haldane quasiparticles are hosted by synthetic spin-one chain realized in chains of InAsP quantum dots embedded in an InP nanowire, with four electrons each. The quantum dot nanowire is described by a Hubbard-Kanamori (HK) Hamiltonian derived from an interacting atomistic model. Using exact diagonalization and Matrix Product States (MPS) tools, we demonstrate that the low-energy behavior of the HK Hamiltonian is effectively captured by an antiferromagnetic spin-one chain Hamiltonian. Next we consider two macroscopic qubits and present a method for creating a tunable coupling between the two macroscopic qubits by inserting an intermediate control dot between the two chains. Finally, we propose and demonstrate two approaches for generating highly accurate two-ST qubit gates: (1) by controlling the length of each qubit, and (2) by employing different background magnetic fields for the two qubits.	翻訳日:2024-04-26 15:17:42 公開日:2024-04-24
# SECO: マルチサーバ階層間のモデル分割によるセキュア推論 SECO: Secure Inference With Model Splitting Across Multi-Server Hierarchy ( http://arxiv.org/abs/2404.16232v1 ) ライセンス: Link先を確認	Shuangyi Chen, Ashish Khisti,	(参考訳) 予測・アズ・ア・サービスという文脈では、データとモデルのプライバシに関する懸念が提起され、セキュアな推論プロトコルによって取り組まれている。これらのプロトコルは、さまざまなセキュリティ前提の下で設計された単一または複数の暗号化ツールを使用して構築される。本稿では,ユーザが入力データベクトルと複数のサーバノードを分割ニューラルネットワークモデルで配置して,データのプライバシを損なうことなく,予測を協調的に計算することのできるセキュアな推論プロトコルSECOを紹介する。我々は、ニューラルネットワークモデル全体を単一のサーバノード上に配置する必要のあるセキュアな推論に関する以前の作業を拡張し、マルチサーバ階層に拡張し、ユーザがゲートウェイサーバノードに通信し、リモートサーバノードに通信する。推論タスクはサーバノードに分割され、データベクトルの暗号化されたコピーで実行されなければならない。我々は,マルチパーティの同型暗号とマルチパーティのガーブロード回路方式を採用し,ユーザから部分モデル構造を保護するとともに,半正直なサーバの不正な大多数に対してシステムを保護する。我々は,複数のモデル上でSECOを評価し,ユーザの計算コストと通信コストの低減を実現し,限られたリソースを持つユーザのデバイスに適用可能なプロトコルを提案する。 In the context of prediction-as-a-service, concerns about the privacy of the data and the model have been brought up and tackled via secure inference protocols. These protocols are built up by using single or multiple cryptographic tools designed under a variety of different security assumptions. In this paper, we introduce SECO, a secure inference protocol that enables a user holding an input data vector and multiple server nodes deployed with a split neural network model to collaboratively compute the prediction, without compromising either party's data privacy. We extend prior work on secure inference that requires the entire neural network model to be located on a single server node, to a multi-server hierarchy, where the user communicates to a gateway server node, which in turn communicates to remote server nodes. The inference task is split across the server nodes and must be performed over an encrypted copy of the data vector. We adopt multiparty homomorphic encryption and multiparty garbled circuit schemes, making the system secure against dishonest majority of semi-honest servers as well as protecting the partial model structure from the user. 
We evaluate SECO on multiple models, achieving the reduction of computation and communication cost for the user, making the protocol applicable to user's devices with limited resources.	翻訳日:2024-04-26 15:17:42 公開日:2024-04-24
# AutoGluon-Multimodal (AutoMM): ファンデーションモデルによるマルチモーダルオートMLのスーパーチャージ AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models ( http://arxiv.org/abs/2404.16233v1 ) ライセンス: Link先を確認	Zhiqiang Tang, Haoyang Fang, Su Zhou, Taojiannan Yang, Zihan Zhong, Tony Hu, Katrin Kirchhoff, George Karypis,	(参考訳) AutoGluon-Multimodal(AutoMM)は、マルチモーダル学習に特化したオープンソースのAutoMLライブラリとして導入された。非常に使いやすく、AutoMMは3行のコードで基礎モデルの微調整を可能にする。画像、テキスト、および表データを含む様々なモダリティをサポートするため、ライブラリは、分類、回帰、オブジェクト検出、セマンティックマッチング、イメージセグメンテーションにまたがる、包括的な機能スイートを提供する。さまざまなデータセットやタスクにわたる実験では、既存のAutoMLツールと比較して、基本的な分類や回帰タスクにおけるAutoMMの優れたパフォーマンスを示すと同時に、高度なタスクにおける競合結果を示し、そのような目的のために設計された特殊なツールボックスと整合する。 AutoGluon-Multimodal (AutoMM) is introduced as an open-source AutoML library designed specifically for multimodal learning. Distinguished by its exceptional ease of use, AutoMM enables fine-tuning of foundational models with just three lines of code. Supporting various modalities including image, text, and tabular data, both independently and in combination, the library offers a comprehensive suite of functionalities spanning classification, regression, object detection, semantic matching, and image segmentation. Experiments across diverse datasets and tasks showcase AutoMM's superior performance in basic classification and regression tasks compared to existing AutoML tools, while also demonstrating competitive results in advanced tasks, aligning with specialized toolboxes designed for such purposes.	翻訳日:2024-04-26 15:17:42 公開日:2024-04-24
# 軌道関数の厳密な形式化:非相互作用的$v$-表現可能性問題に対処する Rigorous Formalization of Orbital Functionals: Addressing the Noninteracting $v$-Representability Problem ( http://arxiv.org/abs/2404.16236v1 ) ライセンス: Link先を確認	Neil Qiang Su,	(参考訳) 占有軌道、非占有軌道、または部分占有軌道に明示的に依存する汎関数はクリフォード代数を用いて厳密に定式化され、公式な実装法として軌道(および占有数)最適化を促進する変分原理が確立される。理論的には、これらの手法は、元のコーン・シャムと関連する方法、特に相互作用する系の電子密度がいかなる非相互作用参照系の電子密度とも一致しない場合の限界を回避している。この研究は、軌道(および占有数)汎関数を新しい視点から再定義し、従来の密度汎関数の拡張としてだけでなく、優越的で厳密な代替として位置づける。 Functionals that explicitly depend on occupied, unoccupied, or fractionally-occupied orbitals are rigorously formalized using Clifford algebras, and a variational principle is established that facilitates orbital (and occupation) optimization as a formal implementation method. Theoretically, these methodologies circumvent the limitations encountered in the original Kohn-Sham and related methods, particularly when the interacting system's electron density does not match that of any noninteracting reference system. This work redefines orbital (and occupation) functionals from a novel perspective, positioning them not merely as extensions of traditional density functionals, but as superior, rigorous alternatives.	翻訳日:2024-04-26 15:17:42 公開日:2024-04-24
# 高度な情報理論によるデータ分析におけるプライバシとユーティリティの相乗効果 Synergizing Privacy and Utility in Data Analytics Through Advanced Information Theorization ( http://arxiv.org/abs/2404.16241v1 ) ライセンス: Link先を確認	Zahir Alsulaimawi,	(参考訳) 本研究では、プライバシ保護データ分析のための新しいフレームワークを開発し、データユーティリティとプライバシに関するバランスをとるという重要な課題に対処する。本稿では,高次元画像データに適したノイズ注入技術,高感度属性をマスキングしながら特徴抽出を行う可変オートエンコーダ(VAE),構造化データプライバシに最適化された期待最大化(EM)アプローチの3つの高度なアルゴリズムを紹介する。修正MNISTやCelebrityAなどのデータセットに適用することにより、機密属性と変換データ間の相互情報を著しく低減し、プライバシーを向上する。実験の結果,これらの手法が優れたプライバシ保護を実現し,高いユーティリティを保ち,両面が不可欠である実用的なアプリケーションに有効であることが確認された。この研究は、さまざまなデータタイプにまたがってプライバシ保護アルゴリズムをデプロイするためのフレキシブルで効果的な戦略を提供し、データ分析における実用性と機密性のための新しいベンチマークを確立することで、この分野に貢献する。 This study develops a novel framework for privacy-preserving data analytics, addressing the critical challenge of balancing data utility with privacy concerns. We introduce three sophisticated algorithms: a Noise-Infusion Technique tailored for high-dimensional image data, a Variational Autoencoder (VAE) for robust feature extraction while masking sensitive attributes and an Expectation Maximization (EM) approach optimized for structured data privacy. Applied to datasets such as Modified MNIST and CelebrityA, our methods significantly reduce mutual information between sensitive attributes and transformed data, thereby enhancing privacy. Our experimental results confirm that these approaches achieve superior privacy protection and retain high utility, making them viable for practical applications where both aspects are crucial. The research contributes to the field by providing a flexible and effective strategy for deploying privacy-preserving algorithms across various data types and establishing new benchmarks for utility and confidentiality in data analytics.	翻訳日:2024-04-26 15:17:42 公開日:2024-04-24
# muRelBench: Zonotopeドメイン用のマイクロベンチマーク muRelBench: MicroBenchmarks for Zonotope Domains ( http://arxiv.org/abs/2404.16243v1 ) ライセンス: Link先を確認	Kenny Ballou, Elena Sherman,	(参考訳) 我々は、弱い関係の抽象ドメインとその操作のための合成ベンチマークスイートである muRelBench を提示する。例えば、ベンチマークはドメイン閉鎖のような提案されたアルゴリズムの実験的な評価をサポートすることができる。 We present muRelBench, a suite of synthetic benchmarks for weakly-relational abstract domains and their operations. For example, the benchmarks can support experimental evaluations of proposed algorithms such as domain closure.	翻訳日:2024-04-26 15:17:42 公開日:2024-04-24
# 高度なAIアシスタントの倫理 The Ethics of Advanced AI Assistants ( http://arxiv.org/abs/2404.16244v1 ) ライセンス: Link先を確認	Iason Gabriel, Arianna Manzini, Geoff Keeling, Lisa Anne Hendricks, Verena Rieser, Hasan Iqbal, Nenad Tomašev, Ira Ktena, Zachary Kenton, Mikel Rodriguez, Seliem El-Sayed, Sasha Brown, Canfer Akbulut, Andrew Trask, Edward Hughes, A. Stevie Bergman, Renee Shelby, Nahema Marchal, Conor Griffin, Juan Mateos-Garcia, Laura Weidinger, Winnie Street, Benjamin Lange, Alex Ingerman, Alison Lentz, Reed Enger, Andrew Barakat, Victoria Krakovna, John Oliver Siy, Zeb Kurth-Nelson, Amanda McCroskery, Vijay Bolina, Harry Law, Murray Shanahan, Lize Alberts, Borja Balle, Sarah de Haas, Yetunde Ibitoye, Allan Dafoe, Beth Goldberg, Sébastien Krier, Alexander Reese, Sims Witherspoon, Will Hawkins, Maribeth Rauh, Don Wallace, Matija Franklin, Josh A. Goldstein, Joel Lehman, Michael Klenk, Shannon Vallor, Courtney Biles, Meredith Ringel Morris, Helen King, Blaise Agüera y Arcas, William Isaac, James Manyika,	(参考訳) 本稿では,高度AIアシスタントがもたらす倫理的・社会的リスクについて論じる。我々は、先進的なAIアシスタントを自然言語インタフェースを備えた人工知能エージェントとして定義し、ユーザに代わって、1つ以上のドメインにわたって、ユーザの期待に応えてアクションのシーケンスを計画および実行することが機能する。この論文は、AIアシスタント、その技術基盤、潜在的な応用範囲の概要を提供する、技術自体を考えることから始まる。そして、AIの価値アライメント、幸福、安全、悪意のある使用に関する質問を探索する。次に、高度なAIアシスタントと個人ユーザとの関係をさらに詳細に検討し、操作や説得、人為性、適切な関係、信頼、プライバシといったトピックを探求する。この分析によって、高度なアシスタントの社会規模での展開を考慮し、協力、株式とアクセス、誤情報、経済的影響、環境、先進的なAIアシスタントの評価方法に焦点をあてる。最後に、研究者、開発者、政策立案者、および公共ステークホルダーに対して、さまざまなレコメンデーションを提供することで締めくくります。 This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, providing an overview of AI assistants, their technical foundations and potential range of applications. 
It then explores questions around AI value alignment, well-being, safety and malicious uses. Extending the circle of inquiry further, we next consider the relationship between advanced AI assistants and individual users in more detail, exploring topics such as manipulation and persuasion, anthropomorphism, appropriate relationships, trust and privacy. With this analysis in place, we consider the deployment of advanced assistants at a societal scale, focusing on cooperation, equity and access, misinformation, economic impact, the environment and how best to evaluate advanced AI assistants. Finally, we conclude by providing a range of recommendations for researchers, developers, policymakers and public stakeholders.	翻訳日:2024-04-26 15:17:42 公開日:2024-04-24
# 角運動量量子エンタングルを用いた固体高調波ガウス軌道の計算効率の良い分子積分 Computationally Efficient Molecular Integrals of Solid Harmonic Gaussian Orbitals Using Quantum Entanglement of Angular Momentum ( http://arxiv.org/abs/2404.16245v1 ) ライセンス: Link先を確認	Hang Hu, Gilles Peslherbe, Hsu Kiang Ooi, Anguang Hu,	(参考訳) 角運動量の量子論におけるベクトルカップリングとベクトルアンカップリングスキームは、量子角運動量状態で作用するユニタリなクレブシュ・ゴルダン変換に対応し、それによってそれらの絡み合いの度合いを制御する。この変換から量子角運動量を加えることは、量子角運動量の絡み合いの度合いを減らし、固体調和ガウス軌道(SHGO)の分子積分の単純かつ効果的な計算に繋がる。古典的コンピュータでさえ、SHGOと分子核クーロン積分の評価におけるスピードアップ比は、高い角運動量量子数を持つ原子軌道に対して最大4桁までである。したがって、量子系の絡み合いが小さくなればなるほど、シミュレーションが容易になり、SHGOの分子積分は量子コンピューティングに特に適していることが示される。角運動量状態のClebsch-Gordan変換のユニタリおよびカスケードのために以前に開発された高効率量子回路は、量子化学においてユビキタスな2電子クーロン積分を効率的に計算するために固体調和系の微分および積規則に適用することができる。このような量子回路と変分量子固有解法アルゴリズムを組み合わせることで、この論文で明らかになった固体調和基底における分子積分の高い計算効率は、完全な量子コンピューティング化学を加速するための道を開くかもしれない。 Vector-coupling and vector-uncoupling schemes in the quantum theory of angular momentum correspond to unitary Clebsch-Gordan transformations that operate on quantum angular momentum states and thereby control their degree of entanglement. The addition of quantum angular momentum from this transformation is suitable for reducing the degree of entanglement of quantum angular momentum, leading to simple and effective calculations of the molecular integrals of solid harmonic Gaussian orbitals (SHGO). Even with classical computers, the speed-up ratio in the evaluation of molecular nuclear Coulomb integrals with SHGOs can be up to four orders of magnitude for atomic orbitals with high angular momentum quantum number. Thus, the less entanglement there is for a quantum system the easier it is to simulate, and molecular integrals with SHGOs are shown to be particularly well-suited for quantum computing. 
High-efficiency quantum circuits previously developed for unitary and cascading Clebsch-Gordan transformations of angular momentum states can be applied to the differential and product rules of solid harmonics to efficiently compute two-electron Coulomb integrals ubiquitous in quantum chemistry. Combined with such quantum circuits and variational quantum eigensolver algorithms, the high computational efficiency of molecular integrals in solid harmonic bases unveiled in this paper may open an avenue for accelerating full quantum computing chemistry.	翻訳日:2024-04-26 15:17:42 公開日:2024-04-24
# URL:タスク指示表現圧縮によるユニバーサル参照知識リンク URL: Universal Referential Knowledge Linking via Task-instructed Representation Compression ( http://arxiv.org/abs/2404.16248v1 ) ライセンス: Link先を確認	Zhuoqun Li, Hongyu Lin, Tianshu Wang, Boxi Cao, Yaojie Lu, Weixiang Zhou, Hao Wang, Zhenyu Zeng, Le Sun, Xianpei Han,	(参考訳) 根拠付き参照にクレームをリンクすることは、信頼できる情報に対する人間の要求を満たす重要な能力である。現在の研究は情報検索やセマンティックマッチングのような特定のタスクに限定されており、クレーム-参照関係はユニークで固定的であり、実世界の参照知識リンク(RKL)はより多様で複雑である。本稿では,1つの統一モデルにより多種多様な参照知識リンクタスクを解決することを目的とした,ユニバーサル参照知識リンク(URL)を提案する。そこで本研究では,LLMの命令従順と意味理解能力を参照知識リンクに効果的に適応させるため,LLMによるタスク命令型表現圧縮と多視点学習手法を提案する。さらに,様々なシナリオにまたがる参照知識リンクタスクにおけるモデルの有効性を評価するための新しいベンチマークを構築した。実験により、既存の手法では普遍的なRKLが困難であることが示され、提案したフレームワークは様々なシナリオでタスクを効果的に解決し、従って従来の手法よりも大きなマージンで優れていることが示された。 Linking a claim to grounded references is a critical ability to fulfill human demands for authentic and reliable information. Current studies are limited to specific tasks like information retrieval or semantic matching, where the claim-reference relationships are unique and fixed, while the referential knowledge linking (RKL) in real-world can be much more diverse and complex. In this paper, we propose universal referential knowledge linking (URL), which aims to resolve diversified referential knowledge linking tasks by one unified model. To this end, we propose a LLM-driven task-instructed representation compression, as well as a multi-view learning approach, in order to effectively adapt the instruction following and semantic understanding abilities of LLMs to referential knowledge linking. Furthermore, we also construct a new benchmark to evaluate ability of models on referential knowledge linking tasks across different scenarios. Experiments demonstrate that universal RKL is challenging for existing approaches, while the proposed framework can effectively resolve the task across various scenarios, and therefore outperforms previous approaches by a large margin.	翻訳日:2024-04-26 15:07:57 公開日:2024-04-24
# SemgrexとSsurgeon, 依存グラフの検索と操作 Semgrex and Ssurgeon, Searching and Manipulating Dependency Graphs ( http://arxiv.org/abs/2404.16250v1 ) ライセンス: Link先を確認	John Bauer, Chloe Kiddon, Eric Yeh, Alex Shan, Christopher D. Manning,	(参考訳) 依存性グラフを検索してそれらを操作するのは、正しく行うのに時間がかかり、難しい作業になる可能性がある。本稿では,依存グラフを検索するシステムであるSemgrexを文書化し,Semgrexの出力を操作するシステムであるSsurgeonを紹介する。これらのシステムで使用されるコンパクトな言語は、依存性のコマンドラインやAPI処理を容易にする。さらに、JavaとPythonで公開されたツールキットとの統合により、テキストの関係や属性を自然のテキストで検索できる。 Searching dependency graphs and manipulating them can be a time consuming and challenging task to get right. We document Semgrex, a system for searching dependency graphs, and introduce Ssurgeon, a system for manipulating the output of Semgrex. The compact language used by these systems allows for easy command line or API processing of dependencies. Additionally, integration with publicly released toolkits in Java and Python allows for searching text relations and attributes over natural text.	翻訳日:2024-04-26 15:07:57 公開日:2024-04-24
# マルチターンLDM相互作用における急速漏洩効果とブラックボックス防御の検討 Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions ( http://arxiv.org/abs/2404.16251v1 ) ライセンス: Link先を確認	Divyansh Agarwal, Alexander R. Fabbri, Philippe Laban, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu,	(参考訳) 大規模言語モデル(LLM)のプロンプトリークは、特に検索強化世代(RAG)システムにおいて、重大なセキュリティとプライバシの脅威を引き起こす。しかし, マルチターンLDM相互作用と緩和戦略のリークは, 標準化された方法では研究されていない。本稿では,4つの異なるドメインと10のクローズドおよびオープンソース LLM にまたがる急激なリークに対するLSM 脆弱性について検討する。我々のユニークなマルチターン脅威モデルでは, LLMのサイコファンシー効果を活用し, LLM応答におけるタスク命令と知識リークを識別する。マルチターン環境では,GPT-4およびclaude-1.3による99%のリークを含む平均攻撃成功率(ASR)が86.2%に上昇する。 GeminiのようなブラックボックスのLCMの中には、ドメイン間のリークに対する様々な感受性を示すものもあります - 医療ドメインと比較して、ニュースドメインのコンテキスト知識をリークする傾向があります。実験では,RAGシナリオにおけるクエリリライタを含む6つのブラックボックス防衛戦略の具体的な効果を測定した。提案する多層防御の組み合わせは, ブラックボックスLLMのASRは5.3%であり, LLMセキュリティ研究の強化と今後の方向性を示す余地がある。 Prompt leakage in large language models (LLMs) poses a significant security and privacy threat, particularly in retrieval-augmented generation (RAG) systems. However, leakage in multi-turn LLM interactions along with mitigation strategies has not been studied in a standardized manner. This paper investigates LLM vulnerabilities against prompt leakage across 4 diverse domains and 10 closed- and open-source LLMs. Our unique multi-turn threat model leverages the LLM's sycophancy effect and our analysis dissects task instruction and knowledge leakage in the LLM response. In a multi-turn setting, our threat model elevates the average attack success rate (ASR) to 86.2%, including a 99% leakage with GPT-4 and claude-1.3. We find that some black-box LLMs like Gemini show variable susceptibility to leakage across domains - they are more likely to leak contextual knowledge in the news domain compared to the medical domain. Our experiments measure specific effects of 6 black-box defense strategies, including a query-rewriter in the RAG scenario. 
Our proposed multi-tier combination of defenses still has an ASR of 5.3% for black-box LLMs, indicating room for enhancement and future direction for LLM security research.	翻訳日:2024-04-26 15:07:57 公開日:2024-04-24
# 完全同型暗号化による顔分析のプライバシー向上 Enhancing Privacy in Face Analytics Using Fully Homomorphic Encryption ( http://arxiv.org/abs/2404.16255v1 ) ライセンス: Link先を確認	Bharat Yalavarthi, Arjun Ramesh Kaushik, Arun Ross, Vishnu Boddeti, Nalini Ratha,	(参考訳) 現代の顔認識システムは、ディープニューラルネットワークを使用して顔から有能な特徴を抽出する。これらの特徴は潜在空間への埋め込みを意味し、しばしば顔認識システム内のテンプレートとして保存される。これらの埋め込みはデータ漏洩の影響を受けやすく、場合によっては元の顔画像の再構築にも利用できる。妥協のアイデンティティを防止するため、テンプレート保護スキームが一般的である。しかし、これらのスキームは、年齢、性別、人種などの柔らかい生体情報漏洩を未然に防ぐことはできない。この問題を軽減するために,FHE(Fully Homomorphic Encryption)と,PolyProtectと呼ばれる既存のテンプレート保護スキームを組み合わせた新しい手法を提案する。埋め込みはFHEを用いて圧縮・暗号化され、多項式変換を用いてセキュアなPolyProtectテンプレートに変換され、さらなる保護が可能であることを示す。提案手法の有効性を,複数のデータセットに対する広範な実験により実証する。提案手法は, 認識精度を損なうことなく, ソフトな生体認証属性の漏洩を効果的に防止する。 Modern face recognition systems utilize deep neural networks to extract salient features from a face. These features denote embeddings in latent space and are often stored as templates in a face recognition system. These embeddings are susceptible to data leakage and, in some cases, can even be used to reconstruct the original face image. To prevent compromising identities, template protection schemes are commonly employed. However, these schemes may still not prevent the leakage of soft biometric information such as age, gender and race. To alleviate this issue, we propose a novel technique that combines Fully Homomorphic Encryption (FHE) with an existing template protection scheme known as PolyProtect. We show that the embeddings can be compressed and encrypted using FHE and transformed into a secure PolyProtect template using polynomial transformation, for additional protection. We demonstrate the efficacy of the proposed approach through extensive experiments on multiple datasets. Our proposed approach ensures irreversibility and unlinkability, effectively preventing the leakage of soft biometric attributes from face embeddings without compromising recognition accuracy.	翻訳日:2024-04-26 15:07:57 公開日:2024-04-24
# 低コストかつスケーラブルなローハンマ除去のための確率的トラッカー管理法 Probabilistic Tracker Management Policies for Low-Cost and Scalable Rowhammer Mitigation ( http://arxiv.org/abs/2404.16256v1 ) ライセンス: Link先を確認	Aamer Jaleel, Stephen W. Keckler, Gururaj Saileshwar,	(参考訳) 本稿ではDRAM Rowhammer攻撃の軽減に焦点を当てる。近年、TRRのようなソリューションがDDR4 DRAMにデプロイされ、攻撃者行を追跡し、近隣の犠牲者行をリフレッシュすることで緩和作用が発行されている。残念ながら、そのようなDRAM内ソリューションはリソース制約(攻撃行を追跡するために数十のカウンタしかプロビジョニングできない)であり、それらを騙すのに使われた攻撃をスラッシングする傾向がある。 DRAMトラッカーの安全な代替品は数万のカウンタを必要とする。本研究は,資源制約トラッカーを用いた安全でスケーラブルなローハマー緩和を実証する。私たちのキーとなる考え方は、確率的管理ポリシー(PROTEAS)でこのようなトラッカーを管理することです。 PROTEASには、リクエストストリームサンプリングやランダムな消去のようなコンポーネントポリシーが含まれており、リソース制約されたトラッカーのスラッシュ耐性を可能にする。 Rowhammerのしきい値が500に下がったとしても、ProteASは小さなDRAMトラッカー(DRAMバンクあたり16カウンタ)を確保でき、3%のスローダウンを達成できる。さらに, PROTEAS はSamsung (DSAC) による最近の同様の確率的提案 (DSAC) よりも優れており,Rowhammer に対するレジリエンスは 11X - 19 倍であることを示す。 This paper focuses on mitigating DRAM Rowhammer attacks. In recent years, solutions like TRR have been deployed in DDR4 DRAM to track aggressor rows and then issue a mitigative action by refreshing neighboring victim rows. Unfortunately, such in-DRAM solutions are resource-constrained (only able to provision few tens of counters to track aggressor rows) and are prone to thrashing based attacks, that have been used to fool them. Secure alternatives for in-DRAM trackers require tens of thousands of counters. In this work, we demonstrate secure and scalable rowhammer mitigation using resource-constrained trackers. Our key idea is to manage such trackers with probabilistic management policies (PROTEAS). PROTEAS includes component policies like request-stream sampling and random evictions which enable thrash-resistance for resource-constrained trackers. We show that PROTEAS can secure small in-DRAM trackers (with 16 counters per DRAM bank) even when Rowhammer thresholds drop to 500 while incurring less than 3% slowdown. 
Moreover, we show that PROTEAS significantly outperforms a recent similar probabilistic proposal from Samsung (called DSAC) while achieving 11X - 19X the resilience against Rowhammer.	翻訳日:2024-04-26 15:07:57 公開日:2024-04-24
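The two component policies named in the abstract above, request-stream sampling and random eviction, can be illustrated with a toy tracker. This is a minimal sketch under stated assumptions, not the PROTEAS design: the class name `SampledTracker`, the parameters `sample_prob` and `threshold`, and the rule "mitigate once a counter reaches the threshold" are all hypothetical choices made for illustration. Only the default capacity of 16 counters per bank comes from the abstract.

```python
import random


class SampledTracker:
    """Toy per-bank aggressor tracker (hypothetical, for illustration only)."""

    def __init__(self, capacity=16, sample_prob=0.1, threshold=4, seed=0):
        self.capacity = capacity        # number of counters in the tracker
        self.sample_prob = sample_prob  # assumed probability of observing an activation
        self.threshold = threshold      # assumed count that triggers mitigation
        self.counters = {}              # row id -> sampled activation count
        self.rng = random.Random(seed)

    def on_activation(self, row):
        """Process one row activation; return the row to mitigate, or None."""
        # Request-stream sampling: only a random subset of activations is tracked.
        if self.rng.random() >= self.sample_prob:
            return None
        if row not in self.counters:
            if len(self.counters) >= self.capacity:
                # Random eviction: an attacker cannot predict which entry is
                # dropped, which is the intuition behind thrash resistance.
                victim = self.rng.choice(list(self.counters))
                del self.counters[victim]
            self.counters[row] = 0
        self.counters[row] += 1
        if self.counters[row] >= self.threshold:
            # A real design would refresh the neighboring victim rows here.
            del self.counters[row]
            return row
        return None


if __name__ == "__main__":
    tracker = SampledTracker()
    mitigated = [r for r in (tracker.on_activation(42) for _ in range(1000)) if r is not None]
    print(f"row 42 was flagged {len(mitigated)} times in 1000 activations")
```

Because insertion and eviction are both randomized, the tracker state does not depend deterministically on the access pattern, which is what a thrashing attack relies on. The actual sampling rates and eviction rules in the paper may differ.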
# すべての支払バウンドタスクに対するスワップレグレットの最小化予測 Predict to Minimize Swap Regret for All Payoff-Bounded Tasks ( http://arxiv.org/abs/2404.13503v2 ) ライセンス: Link先を確認	Lunjia Hu, Yifan Wu,	(参考訳) 一連の予測がキャリブレーションされるのは、下流のすべての決定タスクに対してスワップ後悔を誘発しない場合に限られる。本稿では,バイナリイベントの予測の最大スワップレグレット(MSR)について検討する。これまで、MSRを最小化するための最良のオンライン予測アルゴリズムは、MSRの上限である$K_1$校正誤差を一定要素まで最小化することで得られる。しかし、最近の研究 (Qiao and Valiant, 2021) は、最悪のケースで予想される$K_1$キャリブレーション誤差に対して$\Omega(T^{0.528})$低いバウンドを与える。 MSRのいくつかの緩和はこの障壁を克服すると考えられており、外部の後悔(Kleinberg et al., 2023)と、下流のタスクの作用数(Noarov et al., 2023; Roth and Shi, 2024)に多項式的に依存する後悔の限界を通じてである。我々は、この障壁を緩和することなく克服できることを示し、$O(\sqrt{T}\log T)$ expected MSRを保証する効率的なランダム化予測アルゴリズムを提供する。また、MSRを決定論的キャリブレーション誤差指標とみなし、キャリブレーションの経済的有用性についても検討し、既存の指標との関係について検討する。 A sequence of predictions is calibrated if and only if it induces no swap regret to all downstream decision tasks. We study the Maximum Swap Regret (MSR) of predictions for binary events: the swap regret maximized over all downstream tasks with bounded payoffs. Previously, the best online prediction algorithm for minimizing MSR is obtained by minimizing the $K_1$ calibration error, which upper bounds MSR up to a constant factor. However, recent work (Qiao and Valiant, 2021) gives an $\Omega(T^{0.528})$ lower bound for the worst-case expected $K_1$ calibration error incurred by any randomized algorithm in $T$ rounds, presenting a barrier to achieving better rates for MSR. Several relaxations of MSR have been considered to overcome this barrier, via external regret (Kleinberg et al., 2023) and regret bounds depending polynomially on the number of actions in downstream tasks (Noarov et al., 2023; Roth and Shi, 2024). We show that the barrier can be surpassed without any relaxations: we give an efficient randomized prediction algorithm that guarantees $O(\sqrt{T}\log T)$ expected MSR. We also discuss the economic utility of calibration by viewing MSR as a decision-theoretic calibration error metric and study its relationship to existing metrics.	翻訳日:2024-04-26 12:31:48 公開日:2024-04-24
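The $K_1$ calibration error mentioned in the abstract above can be made concrete with a short computation. The sketch below assumes the common unnormalized definition for binary events: group the rounds by predicted value $p$, sum the residuals $y_t - p$ within each group, and add up the absolute values of those sums. The function name `k1_calibration_error` is hypothetical, and the paper may normalize or define the quantity slightly differently.

```python
from collections import defaultdict


def k1_calibration_error(predictions, outcomes):
    """Unnormalized K1 (L1) calibration error for binary outcomes.

    Computes sum over distinct predicted values p of
    | sum_{t : p_t == p} (y_t - p) |.
    Predictions are grouped by exact equality, so they are assumed to
    come from a finite set of values.
    """
    if len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    bias = defaultdict(float)  # predicted value -> accumulated residual
    for p, y in zip(predictions, outcomes):
        bias[p] += y - p
    return sum(abs(b) for b in bias.values())


if __name__ == "__main__":
    # Predicting 0.5 on rounds where half the outcomes are 1 is calibrated.
    print(k1_calibration_error([0.5, 0.5, 1.0, 0.0], [1, 0, 1, 0]))
    # Predicting 0.5 when every outcome is 1 is not.
    print(k1_calibration_error([0.5, 0.5], [1, 1]))
```

Under this definition a perfectly calibrated sequence has error 0, and the $\Omega(T^{0.528})$ lower bound cited in the abstract refers to how slowly this total can be forced to grow over $T$ rounds.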
# MARVEL:視覚的評価と学習による多次元抽象化と推論 MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning ( http://arxiv.org/abs/2404.13591v2 ) ライセンス: Link先を確認	Yifan Jiang, Jiarui Zhang, Kexuan Sun, Zhivar Sourati, Kian Ahrabian, Kaixin Ma, Filip Ilievski, Jay Pujara,	(参考訳) マルチモーダルな大規模言語モデル(MLLM)は、多くの一般的な視覚推論ベンチマークにおいて大きな進歩を示しているが、それらが抽象的な視覚推論能力を持っているかどうかは未解決のままである。スドゥークパズルと同様に、抽象的視覚推論(AVR)問題は、特定のタスク構成(例えば、行列)において入力形状(例えば、桁)を制御する高レベルパターン(例えば、繰り返し制約)を見つける必要がある。しかし、既存のAVRベンチマークでは、パターンの限られたセット(付加、結合)、入力形状(矩形、正方形)、タスク構成(3×3行列)しか考慮されていない。 MLLMの推論能力を総合的に評価するため、MARVELは6つのコア知識パターン、幾何学的および抽象的形状、および5つの異なるタスク構成からなる770個のパズルからなる多次元AVRベンチマークである。モデル精度が知覚と推論の基盤となっているかどうかを調べるため、MARVELは階層的評価フレームワークにおいて、一般的なAVR質問と知覚質問を補完する。我々は9つの代表MLLMをゼロショットおよび少数ショット設定でMARVEL上で包括的実験を行う。実験の結果、AVR質問では、すべてのモデルがほぼランダムなパフォーマンスを示しており、すべてのパターンやタスク構成にまたがる人間と比較して、大きなパフォーマンスギャップ(40%)があることがわかった。知覚的疑問のさらなる分析により、MLLMは視覚的特徴(ほぼランダムなパフォーマンス)を理解するのに苦労し、パズルのパネル(45%)を数えることさえ困難であり、抽象的推論の能力を妨げていることが明らかになった。コードとデータセット全体をリリースします。 While multi-modal large language models (MLLMs) have shown significant progress on many popular visual reasoning benchmarks, whether they possess abstract visual reasoning abilities remains an open question. Similar to the Sudoku puzzles, abstract visual reasoning (AVR) problems require finding high-level patterns (e.g., repetition constraints) that control the input shapes (e.g., digits) in a specific task configuration (e.g., matrix). However, existing AVR benchmarks only considered a limited set of patterns (addition, conjunction), input shapes (rectangle, square), and task configurations (3 by 3 matrices). To evaluate MLLMs' reasoning abilities comprehensively, we introduce MARVEL, a multidimensional AVR benchmark with 770 puzzles composed of six core knowledge patterns, geometric and abstract shapes, and five different task configurations. 
To inspect whether the model accuracy is grounded in perception and reasoning, MARVEL complements the general AVR question with perception questions in a hierarchical evaluation framework. We conduct comprehensive experiments on MARVEL with nine representative MLLMs in zero-shot and few-shot settings. Our experiments reveal that all models show near-random performance on the AVR question, with significant performance gaps (40%) compared to humans across all patterns and task configurations. Further analysis of perception questions reveals that MLLMs struggle to comprehend the visual features (near-random performance) and even count the panels in the puzzle (<45%), hindering their ability for abstract reasoning. We release our entire code and dataset.	翻訳日:2024-04-26 12:31:48 公開日:2024-04-24
# Describe-then-Reason: Visual Comprehension Training によるマルチモーダル数学的推論の改善 Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training ( http://arxiv.org/abs/2404.14604v2 ) ライセンス: Link先を確認	Mengzhao Jia, Zhihan Zhang, Wenhao Yu, Fangkai Jiao, Meng Jiang,	(参考訳) オープンソースのマルチモーダル大言語モデル(MLLM)は、テキスト入力や視覚入力を含む様々なタスクに優れていますが、GPT-4V(ision)やGemini-Proといったプロプライエタリなモデルに遅れを取っている複雑なマルチモーダル数学的推論に苦戦しています。中間段階(すなわち理性)による微調整は、いくつかの数学的推論スキルを引き出すが、結果として得られるモデルは、まだ視覚中心の監督が不十分なため、視覚的理解に乏しく、数学の数字の正確な解釈に繋がる。この問題に対処するために,2段階のトレーニングパイプラインVCARを提案する。まず、視覚的記述生成タスクを通じてMLLMの視覚的理解能力を向上し、次に、説明の助けを借りて合理性を生成するための別の訓練ステップを行う。 2つの人気のあるベンチマーク実験の結果、VCARは、特に高い視覚的要求のある問題において、合理的な監督にのみ依存するベースライン手法を大幅に上回っていることが示された。 Open-source multimodal large language models (MLLMs) excel in various tasks involving textual and visual inputs but still struggle with complex multimodal mathematical reasoning, lagging behind proprietary models like GPT-4V(ision) and Gemini-Pro. Although fine-tuning with intermediate steps (i.e., rationales) elicits some mathematical reasoning skills, the resulting models still fall short in visual comprehension due to inadequate visual-centric supervision, which leads to inaccurate interpretation of math figures. To address this issue, we propose a two-step training pipeline VCAR, which emphasizes the Visual Comprehension training in Addition to mathematical Reasoning learning. It first improves the visual comprehension ability of MLLMs through the visual description generation task, followed by another training step on generating rationales with the assistance of descriptions. Experimental results on two popular benchmarks demonstrate that VCAR substantially outperforms baseline methods solely relying on rationale supervision, especially on problems with high visual demands.	翻訳日:2024-04-26 12:31:48 公開日:2024-04-24
# 小児脳腫瘍切除 : CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDsを中心に The Brain Tumor Segmentation in Pediatrics (BraTS-PEDs) Challenge: Focus on Pediatrics (CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs) ( http://arxiv.org/abs/2404.15009v2 ) ライセンス: Link先を確認	Anahita Fathi Kazerooni, Nastaran Khalili, Deep Gandhi, Xinyang Liu, Zhifan Jiang, Syed Muhammed Anwar, Jake Albrecht, Maruf Adewole, Udunna Anazodo, Hannah Anderson, Sina Bagheri, Ujjwal Baid, Timothy Bergquist, Austin J. Borja, Evan Calabrese, Verena Chung, Gian-Marco Conte, Farouk Dako, James Eddy, Ivan Ezhov, Ariana Familiar, Keyvan Farahani, Anurag Gottipati, Debanjan Haldar, Shuvanjan Haldar, Juan Eugenio Iglesias, Anastasia Janas, Elaine Johansen, Blaise V Jones, Neda Khalili, Florian Kofler, Dominic LaBella, Hollie Anne Lai, Koen Van Leemput, Hongwei Bran Li, Nazanin Maleki, Aaron S McAllister, Zeke Meier, Bjoern Menze, Ahmed W Moawad, Khanak K Nandolia, Julija Pavaine, Marie Piraud, Tina Poussaint, Sanjay P Prabhu, Zachary Reitman, Andres Rodriguez, Jeffrey D Rudie, Mariana Sanchez-Montano, Ibraheem Salman Shaikh, Lubdha M. Shah, Nakul Sheth, Russel Taki Shinohara, Wenxin Tu, Karthik Viswanathan, Chunhao Wang, Jeffrey B Ware, Benedikt Wiestler, Walter Wiggins, Anna Zapaishchykova, Mariam Aboian, Miriam Bornhorst, Peter de Blank, Michelle Deutsch, Maryam Fouladi, Lindsey Hoffman, Benjamin Kann, Margot Lazow, Leonie Mikael, Ali Nabavizadeh, Roger Packer, Spyridon Bakas, Adam Resnick, Brian Rood, Arastoo Vossough, Marius George Linguraru,	(参考訳) 中枢神経系の小児腫瘍は、小児におけるがん関連死の最も一般的な原因である。小児の高次グリオーマの生存率は20%未満である。希少性のため、診断が遅れることが多く、治療は主に歴史的治療の概念に基づいており、臨床試験には複数施設の協力が必要である。 CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDsの課題は、小児脳腫瘍に焦点をあて、小児神経腫瘍学および臨床治験に特化した複数の国際コンソーシアムにまたがるデータを収集することである。 CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDsチャレンジは、臨床治験に役立つ自動セグメンテーション技術の開発と、最終的には脳腫瘍を持つ子供のケアを加速させる。 Pediatric tumors of the central nervous system are the most common cause of cancer-related death in children. 
The five-year survival rate for high-grade gliomas in children is less than 20%. Due to their rarity, the diagnosis of these entities is often delayed, their treatment is mainly based on historic treatment concepts, and clinical trials require multi-institutional collaborations. Here we present the CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs challenge, focused on pediatric brain tumors with data acquired across multiple international consortia dedicated to pediatric neuro-oncology and clinical trials. The CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs challenge brings together clinicians and AI/imaging scientists to lead to faster development of automated segmentation techniques that could benefit clinical trials, and ultimately the care of children with brain tumors.	翻訳日:2024-04-26 12:31:48 公開日:2024-04-24
# 骨格運動自動評価におけるフィードバック生成の探索 : 概観 Exploring Feedback Generation in Automated Skeletal Movement Assessment: A Comprehensive Overview ( http://arxiv.org/abs/2404.09359v3 ) ライセンス: Link先を確認	Tal Hakim,	(参考訳) 近年,スケルトンビデオからの運動評価への機械学習の応用が注目されている。この進歩により、在宅でのリハビリテーションがより容易になり、2Dや3Dビデオから手頃な価格でポーズ検出や分析を行うための移動評価アルゴリズムが利用できるようになった。自動評価タスクの主目的は運動を評価することであるが、重要な運動課題を強調したフィードバックの自動生成は、リハビリテーションプロセスを大幅に強化し、加速する可能性がある。自動動作評価の分野では数多くの研究が存在しているが、アドレスフィードバック生成はごくわずかである。本研究では, 生成可能なフィードバックの種類を説明し, 自動フィードバック生成のための既存のソリューションをレビューし, 今後の研究方向性について議論する。我々の知る限り、骨格運動評価におけるフィードバック生成の総合的なレビューはこれが初めてである。 The application of machine-learning solutions to movement assessment from skeleton videos has attracted significant research attention in recent years. This advancement has made rehabilitation at home more accessible, utilizing movement assessment algorithms that can operate on affordable equipment for human pose detection and analysis from 2D or 3D videos. While the primary objective of automatic assessment tasks is to score movements, the automatic generation of feedback highlighting key movement issues has the potential to significantly enhance and accelerate the rehabilitation process. While numerous research works exist in the field of automatic movement assessment, only a handful address feedback generation. In this study, we explain the types of feedback that can be generated, review existing solutions for automatic feedback generation, and discuss future research directions. To our knowledge, this is the first comprehensive review of feedback generation in skeletal movement assessment.	翻訳日:2024-04-25 20:16:34 公開日:2024-04-24
# AIを使ったプログラミングアシスタントは、どこまで開発者のニーズを満たすことができるのか? How far are AI-powered programming assistants from meeting developers' needs? ( http://arxiv.org/abs/2404.12000v2 ) ライセンス: Link先を確認	Xin Tan, Xiao Long, Xianjun Ni, Yinghao Zhu, Jing Jiang, Li Zhang,	(参考訳) GitHub Copilotのような最近のIDE内AIコーディングアシスタントツール(ACAT)は、開発者のコーディング習慣に大きな影響を与えている。有効性について調べる研究もあるが、実際の支援プロセスについて詳細な調査は行われていない。このギャップを埋めるために、我々は3つの典型的なソフトウェア開発タスクを含む実際の開発シナリオをシミュレートし、27人のコンピュータサイエンス学生を募集し、3つの一般的なACATを用いて彼らの振る舞いを調査する。私たちのゴールは、ACATの有効性を総合的に評価し、推奨コードの特徴を探求し、修正の理由を特定し、ユーザの課題と期待を理解することです。そこで本研究では,VSCode IDE用のデータ収集プラグインと,画面記録機能,コード評価機能,パーソナライズされたインタビュー・調査質問の自動生成機能を備えた実験プラットフォームを開発した。収集したデータを分析することで、ACATは一般的にタスク完了率を高め、時間を短縮し、コード品質を改善し、自己認識の生産性を向上させる。しかし、この改善は、コーディングタスクの性質とユーザエクスペリエンスレベルの両方に影響を受けている。特に、経験豊富な参加者にとって、ACATの使用は完成時間を増加させるかもしれない。また,「編集された行完成」が最も推奨される方法であるのに対し,「構成完了」と「弦完成」は受理率が最も低いことを観察した。推奨コードを変更する主な理由は、出力フォーマットと要求、欠陥のあるロジック、一貫性のないコードスタイルの相違である。課題と期待に関して、サービスアクセスとヘルプドキュメンテーションの最適化は、機能とパフォーマンスを除いて参加者によっても関係しています。本研究は,ACATの有効性とユーザビリティに関する貴重な知見を提供し,その設計と実装のさらなる改善を図っている。 Recent In-IDE AI coding assistant tools (ACATs) like GitHub Copilot have significantly impacted developers' coding habits. While some studies have examined their effectiveness, there lacks in-depth investigation into the actual assistance process. To bridge this gap, we simulate real development scenarios encompassing three typical types of software development tasks and recruit 27 computer science students to investigate their behavior with three popular ACATs. Our goal is to comprehensively assess ACATs' effectiveness, explore characteristics of recommended code, identify reasons for modifications, and understand users' challenges and expectations. To facilitate the study, we develop an experimental platform that includes a data collection plugin for VSCode IDE and provides functions for screen recording, code evaluation, and automatic generation of personalized interview and survey questions. 
Through analysis of the collected data, we find that ACATs generally enhance task completion rates, reduce completion time, improve code quality, and increase self-perceived productivity. However, the improvement is influenced by both the nature of coding tasks and users' experience level. Notably, for experienced participants, the use of ACATs may even increase completion time. We observe that "edited line completion" is the most frequently recommended completion type, while "comments completion" and "string completion" have the lowest acceptance rates. The primary reasons for modifying recommended code are disparities between output formats and requirements, flawed logic, and inconsistent code styles. In terms of challenges and expectations, participants are concerned not only with functionality and performance but also with the optimization of service access and help documentation. Our study provides valuable insights into the effectiveness and usability of ACATs, informing further improvements in their design and implementation.	翻訳日:2024-04-25 20:16:34 公開日:2024-04-24
# フェデレーション学習におけるモデルポジショニングのためのレバレッジ変分グラフ表現 Leverage Variational Graph Representation For Model Poisoning on Federated Learning ( http://arxiv.org/abs/2404.15042v2 ) ライセンス: Link先を確認	Kai Li, Xin Yuan, Jingjing Zheng, Wei Ni, Falko Dressler, Abbas Jamalipour,	(参考訳) 本稿では,フェデレートラーニング(FL)に対するMP(トレーニングデータ不要モデル中毒)攻撃について述べる。新しいMPアタックは、FLのトレーニングデータにアクセスすることなく、悪質なローカルモデルのみに基づいて悪意あるローカルモデルを作成するために、逆変分グラフオートエンコーダ(VGAE)を拡張する。このような進歩はVGAE-MP攻撃に繋がる。 VGAE-MP攻撃は、良性局所モデルと訓練データ特徴間のグラフ構造相関を抽出し、逆向きにグラフ構造を再生し、逆性グラフ構造と良性モデルの特徴を用いて悪意ある局所モデルを生成する。さらに,VGAEを訓練するための良質な局所モデルの最適選択を可能にするとともに,悪質な局所モデルをVGAEと下位段階降下を用いて訓練する新たな攻撃アルゴリズムを提案する。実験では、提案したVGAE-MP攻撃下でのFLの精度が徐々に低下し、既存の防御機構が攻撃の検出に有効でないことが示され、FLに対する深刻な脅威となった。 This paper puts forth a new training data-untethered model poisoning (MP) attack on federated learning (FL). The new MP attack extends an adversarial variational graph autoencoder (VGAE) to create malicious local models based solely on the benign local models overheard without any access to the training data of FL. Such an advancement leads to the VGAE-MP attack that is not only efficacious but also remains elusive to detection. VGAE-MP attack extracts graph structural correlations among the benign local models and the training data features, adversarially regenerates the graph structure, and generates malicious local models using the adversarial graph structure and benign models' features. Moreover, a new attacking algorithm is presented to train the malicious local models using VGAE and sub-gradient descent, while enabling an optimal selection of the benign local models for training the VGAE. Experiments demonstrate a gradual drop in FL accuracy under the proposed VGAE-MP attack and the ineffectiveness of existing defense mechanisms in detecting the attack, posing a severe threat to FL.	翻訳日:2024-04-25 20:16:34 公開日:2024-04-24
# TOP-Nav:Terrin, Obstacle, Proprioception Estimationを統合した脚付きナビゲーション TOP-Nav: Legged Navigation Integrating Terrain, Obstacle and Proprioception Estimation ( http://arxiv.org/abs/2404.15256v2 ) ライセンス: Link先を確認	Junli Ren, Yikai Liu, Yingru Dai, Guijin Wang,	(参考訳) 脚のついたナビゲーションは通常、オープンワールド、オフロード、挑戦的な環境で検査される。これらのシナリオでは、外乱を推定するには、多重モーダル情報の複雑な合成が必要である。これは、主に障害を避けることに焦点を当てた既存の作業において、大きな制限となる。本研究では,包括的パスプランナとTerrain認識,Obstacle回避,クローズループプロプライオセプションを統合した新しい脚付きナビゲーションフレームワークTOP-Navを提案する。 TOP-Navは、経路計画と運動計画の両方において、視覚とプロプレセプションの相乗効果を強調している。経路プランナ内では、障害物を効果的に回避しつつ、高い走行性を有する地形上の経路をロボットが選択できる地形推定器を提示し、統合する。動作計画レベルでは、ナビゲーションコマンドを追跡するために移動制御器を実装できるだけでなく、経路プランナーに動作評価を提供するための受容アドバイザも構築する。クローズループ動作フィードバックに基づいて、視覚に基づく地形と障害物推定のオンライン修正を行う。そのため、TOP-Navは、ロボットが以前の知識の分布を超えて地形や乱れを扱えるように、オープンワールドナビゲーションを実現し、視覚条件によって課される制約を克服する。 TOP-Navは、シミュレーションと実世界の環境の両方で実施された広範な実験に基づいて、既存の手法と比較して、オープンワールドナビゲーションにおいて優れた性能を示す。 Legged navigation is typically examined within open-world, off-road, and challenging environments. In these scenarios, estimating external disturbances requires a complex synthesis of multi-modal information. This underlines a major limitation in existing works that primarily focus on avoiding obstacles. In this work, we propose TOP-Nav, a novel legged navigation framework that integrates a comprehensive path planner with Terrain awareness, Obstacle avoidance and close-loop Proprioception. TOP-Nav underscores the synergies between vision and proprioception in both path and motion planning. Within the path planner, we present and integrate a terrain estimator that enables the robot to select waypoints on terrains with higher traversability while effectively avoiding obstacles. In the motion planning level, we not only implement a locomotion controller to track the navigation commands, but also construct a proprioception advisor to provide motion evaluations for the path planner. 
Based on the closed-loop motion feedback, we make online corrections for the vision-based terrain and obstacle estimations. Consequently, TOP-Nav achieves open-world navigation in which the robot can handle terrains or disturbances beyond the distribution of prior knowledge and overcome constraints imposed by visual conditions. In extensive experiments conducted in both simulation and real-world environments, TOP-Nav demonstrates superior performance in open-world navigation compared to existing methods.	翻訳日:2024-04-25 20:16:34 公開日:2024-04-24
# OODインテント分類のためのオープンワールドロッキーチケット仮説 The Open-World Lottery Ticket Hypothesis for OOD Intent Classification ( http://arxiv.org/abs/2210.07071v3 ) ライセンス: Link先を確認	Yunhua Zhou, Pengyu Wang, Peiju Liu, Yuxin Wang, Xipeng Qiu,	(参考訳) 既存のOOD(Out-of-Domain)の意図的な分類法は、広範囲な補助的なOODコーパスや特定の訓練パラダイムに依存している。しかしながら、モデルがドメイン内および外部の意図に対する信頼を区別するべきだという基本的な原則では、これらは未発達である。本研究では,OOD上でのモデル過信の根本的な原因を明らかにするとともに,過パラメータ化モデルを用いてキャリブレーションされたサブネットを発見できることを実証する。サブネットワークが提供するキャリブレーションされた信頼性は、ほとんどすべてのポストホックメソッドの利点となるIn-とOut-of-ドメインをよりよく区別することができる。基本的な洞察をもたらすことに加えて、Luttery Ticket仮説をオープンワールドのシナリオにも拡張しています。実世界の4つのデータセットに対する広範な実験を行い、我々のアプローチが、競争力のあるベースラインのスイートと比較して一貫した改善を確立することができることを実証します。 Most existing methods of Out-of-Domain (OOD) intent classification rely on extensive auxiliary OOD corpora or specific training paradigms. However, they are underdeveloped in the underlying principle that the models should have differentiated confidence in In- and Out-of-domain intent. In this work, we shed light on the fundamental cause of model overconfidence on OOD and demonstrate that calibrated subnetworks can be uncovered by pruning the overparameterized model. Calibrated confidence provided by the subnetwork can better distinguish In- and Out-of-domain, which can be a benefit for almost all post hoc methods. In addition to bringing fundamental insights, we also extend the Lottery Ticket Hypothesis to open-world scenarios. We conduct extensive experiments on four real-world datasets to demonstrate our approach can establish consistent improvements compared with a suite of competitive baselines.	翻訳日:2024-04-25 16:34:44 公開日:2024-04-24
# 深層強化学習のための低レイテンシ適応型符号化スパイクフレームワーク A Low Latency Adaptive Coding Spiking Framework for Deep Reinforcement Learning ( http://arxiv.org/abs/2211.11760v3 ) ライセンス: Link先を確認	Lang Qin, Rui Yan, Huajin Tang,	(参考訳) 近年,低消費電力化とイベント駆動機能により,強化学習(RL)にスパイクニューラルネットワーク(SNN)が用いられている。しかし、固定符号法に苦しむスパイキング強化学習(SRL)は、高レイテンシと低汎用性の問題に直面している。本稿では,学習可能な行列乗法を用いてスパイクのエンコードとデコードを行い,コーダの柔軟性を改善し,遅延を低減する。一方、直接学習法を用いてSNNを訓練し、オンラインとオフラインのRLアルゴリズムに2つの異なる構造を用いる。超低レイテンシ(他のSRL手法の0.8%以下)と、異なるアルゴリズムと異なる環境下でのエネルギー効率(DNNの最大5倍)で最適性能を実現することを発見した。 In recent years, spiking neural networks (SNNs) have been used in reinforcement learning (RL) due to their low power consumption and event-driven features. However, spiking reinforcement learning (SRL), which suffers from fixed coding methods, still faces the problems of high latency and poor versatility. In this paper, we use learnable matrix multiplication to encode and decode spikes, improving the flexibility of the coders and thus reducing latency. Meanwhile, we train the SNNs using the direct training method and use two different structures for online and offline RL algorithms, which gives our model a wider range of applications. Extensive experiments have revealed that our method achieves optimal performance with ultra-low latency (as low as 0.8% of other SRL methods) and excellent energy efficiency (up to 5X the DNNs) in different algorithms and different environments.	翻訳日:2024-04-25 16:34:44 公開日:2024-04-24
# 画像レベルとオブジェクトレベルのセマンティック判別器を用いた構造ガイド画像補完 Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators ( http://arxiv.org/abs/2212.06310v2 ) ライセンス: Link先を確認	Haitian Zheng, Zhe Lin, Jingwan Lu, Scott Cohen, Eli Shechtman, Connelly Barnes, Jianming Zhang, Qing Liu, Yuqian Zhou, Sohrab Amirghodsi, Jiebo Luo,	(参考訳) 構造誘導画像補完は,ユーザからの入力誘導マップに従って画像の局所領域を描画することを目的としている。このようなタスクはインタラクティブな編集に多くの実用的な応用を可能にするが、既存の手法は複雑な自然の場面で現実的なオブジェクトインスタンスを幻覚させるのに苦労することが多い。このような制限は、部分的にはホール領域内の意味レベルの制約の欠如と、現実的なオブジェクト生成を強制するメカニズムの欠如によるものである。本研究では,複雑なセマンティックスやオブジェクトの生成を改善するために,セマンティック・ディミネータとオブジェクトレベル・ディミネータからなる学習パラダイムを提案する。特に、セマンティック・ディミネーターは、事前学習された視覚的特徴を利用して、生成された視覚概念の現実性を改善する。さらに、オブジェクトレベルの識別器は、個々のオブジェクトのリアリズムを強制するために、整列したインスタンスを入力として取ります。提案手法は生成品質を著しく向上させ,セグメンテーション誘導完了,エッジ誘導操作,Places2データセットのパノプティカル誘導操作など,様々なタスクにおける最先端結果を実現する。さらに、トレーニングされたモデルは柔軟で、オブジェクト挿入、置換、除去、標準塗装など、複数の編集ユースケースをサポートできます。特に、トレーニングされたモデルと新しい自動画像補完パイプラインを組み合わせることで、標準的な塗装作業における最先端の結果が得られます。 Structure-guided image completion aims to inpaint a local region of an image according to an input guidance map from users. While such a task enables many practical applications for interactive editing, existing methods often struggle to hallucinate realistic object instances in complex natural scenes. Such a limitation is partially due to the lack of semantic-level constraints inside the hole region as well as the lack of a mechanism to enforce realistic object generation. In this work, we propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects. Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts. Moreover, the object-level discriminators take aligned instances as inputs to enforce the realism of individual objects. 
Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks, including segmentation-guided completion, edge-guided manipulation and panoptically-guided manipulation on Places2 datasets. Furthermore, our trained model is flexible and can support multiple editing use cases, such as object insertion, replacement, removal and standard inpainting. In particular, our trained model combined with a novel automatic image completion pipeline achieves state-of-the-art results on the standard inpainting task.	翻訳日:2024-04-25 16:34:44 公開日:2024-04-24
# クラス増分学習における効果的な決定境界学習 Effective Decision Boundary Learning for Class Incremental Learning ( http://arxiv.org/abs/2301.05180v3 ) ライセンス: Link先を確認	Chaoyue Ding, Kunchi Li, Jun Wan, Shan Yu,	(参考訳) クラス増分学習(CIL)におけるリハーサルアプローチは、決定境界が新しいクラスに過剰適合するという問題を抱えている。これは主に、知識蒸留(KD)に用いる古いクラスのデータの不足と、記憶容量が限られていることによる学習済みクラスと新しいクラスの間の不均衡なデータ学習という2つの要因によって生じる。本研究では、これら2つの要因に対処するための、単純かつ効果的なアプローチを提案する。まず、再サンプリング戦略とMixup Knowledge Distillation (Re-MKD)を用いてKDの性能を改善し、過剰適合の問題を大幅に緩和する。具体的には、ミックスアップと再サンプリングの戦略を組み合わせ、学習済みクラスと新しいクラスの間の潜在分布とより整合した、KDの訓練に用いる十分なデータを合成する。次に、インフルエンスバランス法をCIL設定に拡張することにより、不均衡データの分類に対処するインクリメンタルインフルエンスバランス(IIB)法を提案する。この手法は、影響度に応じてサンプルを再重み付けし、適切な決定境界を作る。これら2つの改善により、KDの性能を改善し、不均衡なデータ学習を同時に扱う効果的な決定境界学習アルゴリズム(EDBL)を提案する。実験の結果、EDBLはいくつかのCILベンチマークで最先端の性能を達成した。 Rehearsal approaches in class incremental learning (CIL) suffer from decision boundary overfitting to new classes, which is mainly caused by two factors: insufficiency of old classes data for knowledge distillation and imbalanced data learning between the learned and new classes because of the limited storage memory. In this work, we present a simple but effective approach to tackle these two factors. First, we employ a re-sampling strategy and Mixup Knowledge Distillation (Re-MKD) to improve the performances of KD, which would greatly alleviate the overfitting problem. Specifically, we combine mixup and re-sampling strategies to synthesize adequate data used in KD training that are more consistent with the latent distribution between the learned and new classes. Second, we propose a novel incremental influence balance (IIB) method for CIL to tackle the classification of imbalanced data by extending the influence balance method into the CIL setting, which re-weights samples by their influences to create a proper decision boundary. With these two improvements, we present the effective decision boundary learning algorithm (EDBL) which improves the performance of KD and deals with the imbalanced data learning simultaneously. 
Experiments show that the proposed EDBL achieves state-of-the-art performances on several CIL benchmarks.	翻訳日:2024-04-25 16:34:44 公開日:2024-04-24
# 組込み検索アライメント:トランスフォーマーモデルを用いたDNA配列アライメント Embed-Search-Align: DNA Sequence Alignment using Transformer Models ( http://arxiv.org/abs/2309.11087v4 ) ライセンス: Link先を確認	Pavan Holur, K. C. Enevoldsen, Shreyas Rajesh, Lajoyce Mboning, Thalia Georgiou, Louis-S. Bouchard, Matteo Pellegrini, Vwani Roychowdhury,	(参考訳) DNA配列のアライメントは、幅広い参照ゲノム上の最も可能性の高い場所に短いDNA読取を割り当てることを含む。このプロセスは、変異呼び出し、転写学、エピジェノミクスを含む様々なゲノム解析に不可欠である。何十年にもわたって洗練されてきた従来の手法では、ゲノムインデクシングと効率的な検索という2つのステップでこの課題に対処している。テキストを埋め込みに符号化するLarge Language Models (LLM) の成功に基づいて、距離メートル法が意味的類似性を捉え、近年の取り組みでは、同じトランスフォーマーアーキテクチャがDNA配列の数値表現を生成できるかどうかが検討されている。このようなモデルは、コーディングと非コーディング領域の検出、エンハンサーとプロモーター配列の同定など、短いDNA配列の分類を含むタスクにおいて、早期に有望であることが示されている。しかし、配列分類タスクのパフォーマンスは配列アライメントに変換されず、全ての読み出しをうまく整列するためにゲノムワイド検索を行う必要がある。我々は、このオープンな問題をEmbed-Search-Alignタスクとしてフレーミングすることで解決する。この枠組みでは、新しいエンコーダモデルDNA-ESAが参照の読み取りとフラグメントの表現を生成し、リードフラグメント距離をアライメントの代理として使用する共有ベクトル空間に投影する。特にDNA-ESAは,(1)DNA配列表現の自己教師的訓練における対照的な損失,(2)断片を世界規模で探索するためのDNAベクターストアを導入している。 DNA-ESAは、250長の読み取りを3ギガ塩基(単一ハプロイド)のヒト基準ゲノムに合わせると97%正確であり、最近の6つのDNA-トランスフォーマーモデルベースラインのパフォーマンスをはるかに上回り、染色体や種間のタスク転送を示す。 DNA sequence alignment involves assigning short DNA reads to the most probable locations on an extensive reference genome. This process is crucial for various genomic analyses, including variant calling, transcriptomics, and epigenomics. Conventional methods, refined over decades, tackle this challenge in two steps: genome indexing followed by efficient search to locate likely positions for given reads. Building on the success of Large Language Models (LLM) in encoding text into embeddings, where the distance metric captures semantic similarity, recent efforts have explored whether the same Transformer architecture can produce numerical representations for DNA sequences. Such models have shown early promise in tasks involving classification of short DNA sequences, such as the detection of coding vs non-coding regions, as well as the identification of enhancer and promoter sequences. 
Performance at sequence classification tasks does not, however, translate to sequence alignment, where it is necessary to conduct a genome-wide search to successfully align every read. We address this open problem by framing it as an Embed-Search-Align task. In this framework, a novel encoder model DNA-ESA generates representations of reads and fragments of the reference, which are projected into a shared vector space where the read-fragment distance is used as surrogate for alignment. In particular, DNA-ESA introduces: (1) Contrastive loss for self-supervised training of DNA sequence representations, facilitating rich sequence-level embeddings, and (2) a DNA vector store to enable search across fragments on a global scale. DNA-ESA is >97% accurate when aligning 250-length reads onto a human reference genome of 3 gigabases (single-haploid), far exceeds the performance of 6 recent DNA-Transformer model baselines and shows task transfer across chromosomes and species.	翻訳日:2024-04-25 16:34:44 公開日:2024-04-24
# ランダムに結合したパウリスピンのモデル A model of randomly-coupled Pauli spins ( http://arxiv.org/abs/2309.15349v3 ) ライセンス: Link先を確認	Masanori Hanada, Antal Jevicki, Xianlong Liu, Enrico Rinaldi, Masaki Tezuka,	(参考訳) 我々は、SYKモデルにおけるマヨラナフェルミオンをスピン作用素に置き換えることで、全ての4-局所相互作用を持つパウリスピン作用素のモデルを構築する。同様に、フェルミオンをハードコアボソンに置き換える。このモデルを数値的に検討し,その特性をSYKモデルと比較する。我々はスピンモデルとSYKモデルとの顕著な定量的な一致を観察し、このスピンモデルは強いカオスであり、ホログラフィーにおいて何らかの役割を果たす可能性があることを示唆している。また、多局所場を用いた経路積分法と量子シミュレーションの可能性についても論じる。パウリスピンは量子ビットベースの量子デバイス上でのフェルミオンよりも実装が容易であるため、このモデルは量子シミュレーションの興味深いターゲットになるかもしれない。 We construct a model of Pauli spin operators with all-to-all 4-local interactions by replacing Majorana fermions in the SYK model with spin operators. Equivalently, we replace fermions with hard-core bosons. We study this model numerically and compare the properties with those of the SYK model. We observe a striking quantitative coincidence between the spin model and the SYK model, which suggests that this spin model is strongly chaotic and, perhaps, can play some role in holography. We also discuss the path-integral approach with multi-local fields and the possibility of quantum simulations. This model may be an interesting target for quantum simulations because Pauli spins are easier to implement than fermions on qubit-based quantum devices.	翻訳日:2024-04-25 16:34:44 公開日:2024-04-24
# マスクされていないトークンを用いた学習がより強力な視覚学習器をもたらす Learning with Unmasked Tokens Drives Stronger Vision Learners ( http://arxiv.org/abs/2310.13593v2 ) ライセンス: Link先を確認	Taekyung Kim, Sanghyuk Chun, Byeongho Heo, Dongyoon Han,	(参考訳) マスク画像モデリング(MIM)は、自己教師あり学習の主要な戦略となっている。Masked Autoencoder (MAE) のようなMIMは、入力トークンをランダムにマスクしてエンコーダで処理し、デコーダがマスクされたトークンを入力に再構成することで、強力な表現を学習する。しかし、MIMで事前学習されたエンコーダは注意の範囲が限られることが多く、これはMIMがマスクされたトークンの回帰のみに焦点を当てていることに起因するとされ、エンコーダのより広い文脈の学習を妨げる可能性がある。この制限に対処するため、マスクされていないトークンを訓練プロセスに明示的に組み込むことでMIMを改善する。具体的には、エンコーダがより広い文脈の教師信号から学習できるようにし、デコーダがマスクされたトークンを再構成する間に、マスクされていないトークンがより広い文脈を経験できるようにする。これにより、符号化されたマスクされていないトークンは豊富な文脈情報を備え、マスクされたトークンは強化されたマスクされていないトークンをMIMに活用できる。その結果、この単純な改善により、ImageNet-1K上のViT-Bで84.2%のトップ1精度(0.6%ポイントの向上)を達成し、より識別的な表現が学習されることを示した。この成功は、特異値スペクトルと注意の分析が示すように、事前学習手法の強化によるものと考えられる。最後に、下流のセマンティックセグメンテーションと細粒度視覚分類タスク、および多様な頑健性評価指標において、我々のモデルは大きな性能向上を達成する。コードは https://github.com/naver-ai/lut で入手できる。 Masked image modeling (MIM) has become a leading self-supervised learning strategy. MIMs such as Masked Autoencoder (MAE) learn strong representations by randomly masking input tokens for the encoder to process, with the decoder reconstructing the masked tokens to the input. However, MIM pre-trained encoders often exhibit a limited attention span, attributed to MIM's sole focus on regressing masked tokens only, which may impede the encoder's broader context learning. To tackle the limitation, we improve MIM by explicitly incorporating unmasked tokens into the training process. Specifically, our method enables the encoder to learn from broader context supervision, allowing unmasked tokens to experience broader contexts while the decoder reconstructs masked tokens. Thus, the encoded unmasked tokens are equipped with extensive contextual information, empowering masked tokens to leverage the enhanced unmasked tokens for MIM. As a result, our simple remedy trains more discriminative representations revealed by achieving 84.2% top-1 accuracy with ViT-B on ImageNet-1K with 0.6%p gain. 
We attribute the success to the enhanced pre-training method, as evidenced by the singular value spectrum and attention analyses. Finally, our models achieve significant performance gains at the downstream semantic segmentation and fine-grained visual classification tasks; and on diverse robust evaluation metrics. Code is available at https://github.com/naver-ai/lut	翻訳日:2024-04-25 16:25:00 公開日:2024-04-24
# DEFT:教師なしコアセット選択による事前学習言語モデルのためのデータ効率の良い微調整 DEFT: Data Efficient Fine-Tuning for Pre-Trained Language Models via Unsupervised Core-Set Selection ( http://arxiv.org/abs/2310.16776v4 ) ライセンス: Link先を確認	Devleena Das, Vivek Khetan,	(参考訳) 近年の進歩により、多くの事前学習言語モデル(PLM)が利用可能になったが、下流タスク向けにPLMを微調整するには実際にどの程度のデータが必要か、という疑問が残る。本研究では、教師なしコアセット選択を利用したデータ効率のよい微調整フレームワークであるDEFT-UCSを導入し、より小さく代表的なデータセットを特定することで、下流タスク向けのPLMの微調整に必要なデータ量を削減する。テキスト編集LMの文脈でDEFT-UCSの有効性を検証し、最先端のテキスト編集モデルであるCoEDITと比較する。その結果、DEFT-UCSモデルは、6つの編集タスクからなる8つのデータセットにおいて、70%少ないデータで微調整しながら、CoEDITと同程度の精度を達成することがわかった。 Recent advances have led to the availability of many pre-trained language models (PLMs); however, a question that remains is how much data is truly needed to fine-tune PLMs for downstream tasks? In this work, we introduce DEFT-UCS, a data-efficient fine-tuning framework that leverages unsupervised core-set selection to identify a smaller, representative dataset that reduces the amount of data needed to fine-tune PLMs for downstream tasks. We examine the efficacy of DEFT-UCS in the context of text-editing LMs, and compare to the state-of-the art text-editing model, CoEDIT. Our results demonstrate that DEFT-UCS models are just as accurate as CoEDIT, across eight different datasets consisting of six different editing tasks, while finetuned on 70% less data.	翻訳日:2024-04-25 16:25:00 公開日:2024-04-24
# ZeroNVS: 単一画像からのゼロショット360度ビュー合成 ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image ( http://arxiv.org/abs/2310.17994v2 ) ライセンス: Link先を確認	Kyle Sargent, Zizhang Li, Tanmay Shah, Charles Herrmann, Hong-Xing Yu, Yunzhi Zhang, Eric Ryan Chan, Dmitry Lagun, Li Fei-Fei, Deqing Sun, Jiajun Wu,	(参考訳) そこで,本研究では3次元拡散モデルであるZeroNVSを導入し,ワンイメージの新たなビュー合成手法を提案する。既存の手法は,暗黙の背景を持つ単一オブジェクトに対して設計されているが,複雑な背景を持つマルチオブジェクトシーンによってもたらされる課題に対処する新しい手法を提案する。具体的には、オブジェクト中心、屋内、屋外のシーンをキャプチャするデータソースの混合に基づいて、生成をトレーニングする。深度スケールのあいまいさなどのデータ混合問題に対処するため,新しいカメラ条件付パラメータ化と正規化方式を提案する。さらに,SDS (Score Distillation Sampling) は,360度シーンの蒸留時に複雑な背景の分布を小さくする傾向にあり,合成された新規なビューの多様性を向上させるために「SDSアンカー」を提案する。我々のモデルは、DTUデータセット上のLPIPSをゼロショット設定で設定し、DTUで特別に訓練された方法よりも優れた結果を得る。我々はさらに、シングルイメージの新規ビュー合成のための新しいベンチマークとして、挑戦的なMip-NeRF 360データセットを適用し、この設定で強い性能を示す。私たちのコードとデータはhttp://kylesargent.github.io/zeronvs/です。 We introduce a 3D-aware diffusion model, ZeroNVS, for single-image novel view synthesis for in-the-wild scenes. While existing methods are designed for single objects with masked backgrounds, we propose new techniques to address challenges introduced by in-the-wild multi-object scenes with complex backgrounds. Specifically, we train a generative prior on a mixture of data sources that capture object-centric, indoor, and outdoor scenes. To address issues from data mixture such as depth-scale ambiguity, we propose a novel camera conditioning parameterization and normalization scheme. Further, we observe that Score Distillation Sampling (SDS) tends to truncate the distribution of complex backgrounds during distillation of 360-degree scenes, and propose "SDS anchoring" to improve the diversity of synthesized novel views. Our model sets a new state-of-the-art result in LPIPS on the DTU dataset in the zero-shot setting, even outperforming methods specifically trained on DTU. 
We further adapt the challenging Mip-NeRF 360 dataset as a new benchmark for single-image novel view synthesis, and demonstrate strong performance in this setting. Our code and data are at http://kylesargent.github.io/zeronvs/	翻訳日:2024-04-25 16:25:00 公開日:2024-04-24
# 多体非エルミート系のトポロジカル相 Topological phases of many-body non-Hermitian systems ( http://arxiv.org/abs/2311.03043v3 ) ライセンス: Link先を確認	Kui Cao, Su-Peng Kou,	(参考訳) 多体フェルミオン非エルミート系は、エネルギーバンドと量子状態のトポロジーをそれぞれ記述するために2組の異なるトポロジカル不変量を必要とし、後者はまだ探索されていないことを示す。粒子正孔対称性、線形化された時間反転対称性、線形化されたカイラル対称性によって決定される10種類の対称性クラスを同定する。各クラスは各次元に対応するトポロジカル不変量を持ち、これが量子状態のトポロジーを決定する。これらの知見は、多体非エルミート系のトポロジカル相をより深く理解する道を開くものである。 We show that many-body fermionic non-Hermitian systems require two distinct sets of topological invariants to describe the topology of energy bands and quantum states respectively, with the latter yet to be explored. We identify 10 symmetry classes -- determined by particle-hole, linearized time-reversal, and linearized chiral symmetries. Each class has topological invariant associated with each dimension, dictating the topology of quantum states. These findings pave the way for deeper understanding of the topological phases of many-body non-Hermitian systems.	翻訳日:2024-04-25 16:25:00 公開日:2024-04-24
# TransformCode: サブツリー変換によるコード埋め込みのためのコントラスト学習フレームワーク TransformCode: A Contrastive Learning Framework for Code Embedding via Subtree Transformation ( http://arxiv.org/abs/2311.08157v2 ) ライセンス: Link先を確認	Zixiang Xian, Rubing Huang, Dave Towey, Chunrong Fang, Zhenyu Chen,	(参考訳) 人工知能(AI)は、ソフトウェア開発の効率を向上させることによって、ソフトウェアエンジニアリング(SE)に革命をもたらした。トランスファーラーニングを活用した事前学習モデル(PTM)の出現により、SEのためのAIは大幅に進歩した。しかし、個々のコードトークンを操作する既存のPTMには、いくつかの制限がある。それらは、トレーニングと微調整にコストがかかり、タスク固有のデータセットを微調整するためにラベル付きデータに大きく依存している。本稿では,コード埋め込みを対照的な学習方法で学習する新しいフレームワークであるTransformCodeを提案する。我々のフレームワークはエンコーダに依存しない言語に依存しないので、どんなエンコーダモデルでも活用でき、どんなプログラミング言語でも扱える。また,抽象構文木変換(AST)変換と呼ばれる新しいデータ拡張手法を提案し,構文的および意味的変換を元のコードスニペットに適用し,より多様で頑健なサンプルを生成する。既存の手法に対して、柔軟性と適応性があり、コード表現を必要とする他のダウンストリームタスク(コードクローンの検出や分類など)に容易に拡張できるため、2)大規模モデルや大量のトレーニングデータを必要としないため、効率的でスケーラブルであり、あらゆるプログラミング言語をサポートできるため、(3)教師なし学習に限らず、タスク固有のラベルや目的を組み込むことで教師あり学習タスクにも適用できる、(4)コンピュータリソースに基づいたエンコーダパラメータの調整も可能である。我々は,いくつかのコード関連タスクにおけるフレームワークの評価を行い,SourcererCC,Code2vec,InferCodeといった最先端のメソッドよりも有効性と優位性を示す。 Artificial intelligence (AI) has revolutionized software engineering (SE) by enhancing software development efficiency. The advent of pre-trained models (PTMs) leveraging transfer learning has significantly advanced AI for SE. However, existing PTMs that operate on individual code tokens suffer from several limitations: They are costly to train and fine-tune; and they rely heavily on labeled data for fine-tuning on task-specific datasets. In this paper, we present TransformCode, a novel framework that learns code embeddings in a contrastive learning manner. Our framework is encoder-agnostic and language-agnostic, which means that it can leverage any encoder model and handle any programming language. We also propose a novel data-augmentation technique called abstract syntax tree (AST) transformation, which applies syntactic and semantic transformations to the original code snippets, to generate more diverse and robust samples for contrastive learning. 
Our framework has several advantages over existing methods: (1) It is flexible and adaptable, because it can easily be extended to other downstream tasks that require code representation (such as code-clone detection and classification); (2) it is efficient and scalable, because it does not require a large model or a large amount of training data, and it can support any programming language; (3) it is not limited to unsupervised learning, but can also be applied to some supervised learning tasks by incorporating task-specific labels or objectives; and (4) it can also adjust the number of encoder parameters based on computing resources. We evaluate our framework on several code-related tasks, and demonstrate its effectiveness and superiority over the state-of-the-art methods such as SourcererCC, Code2vec, and InferCode.	翻訳日:2024-04-25 16:25:00 公開日:2024-04-24
# 人間の行動異常検出のための多段階誘導探索ネットワークと行動シーンマッチング法 A Multilevel Guidance-Exploration Network and Behavior-Scene Matching Method for Human Behavior Anomaly Detection ( http://arxiv.org/abs/2312.04119v2 ) ライセンス: Link先を確認	Guoqing Yang, Zhiming Luo, Jianzhe Gao, Yingxin Lai, Kun Yang, Yifan He, Shaozi Li,	(参考訳) 人間の行動異常検出は、異常な人間の行動を特定することを目的としており、知的監視などの領域で重要な役割を果たす。現在の主流の手法は、依然として再構成や将来フレーム予測の技術を採用している。しかし、低レベルのピクセル特徴を再構成または予測すると、ネットワークが過度に強い汎化能力を獲得しやすく、異常も正常データと同程度にうまく再構成または予測されてしまう。これらの手法とは異なり、Student-Teacherネットワークに着想を得て、誘導ネットワークと探索ネットワークの高レベル表現の差から異常を検出する、多段階誘導探索ネットワーク(MGENet)と呼ばれる新しいフレームワークを提案する。具体的には、まず骨格キーポイントを入力とする事前学習済みの正規化フローを用いて、マスクされていないRGBフレームを入力とするRGBエンコーダを誘導し、動きの潜在特徴を探索させる。次に、RGBエンコーダが、マスクされたRGBフレームを入力とするマスクエンコーダを誘導し、外観の潜在特徴を探索させる。さらに、シーンに関連する行動異常を検出するための行動シーンマッチングモジュール(BSMM)を設計する。広範な実験により、提案手法はShanghaiTechとUBnormalのデータセットで最先端の性能を達成し、AUCはそれぞれ86.9%と73.5%であった。コードは https://github.com/molu-ggg/GENet で公開される予定である。 Human behavior anomaly detection aims to identify unusual human actions, playing a crucial role in intelligent surveillance and other areas. The current mainstream methods still adopt reconstruction or future frame prediction techniques. However, reconstructing or predicting low-level pixel features easily enables the network to achieve overly strong generalization ability, allowing anomalies to be reconstructed or predicted as effectively as normal data. Different from their methods, inspired by the Student-Teacher Network, we propose a novel framework called the Multilevel Guidance-Exploration Network(MGENet), which detects anomalies through the difference in high-level representation between the Guidance and Exploration network. Specifically, we first utilize the pre-trained Normalizing Flow that takes skeletal keypoints as input to guide an RGB encoder, which takes unmasked RGB frames as input, to explore motion latent features. Then, the RGB encoder guides the mask encoder, which takes masked RGB frames as input, to explore the latent appearance feature. 
Additionally, we design a Behavior-Scene Matching Module(BSMM) to detect scene-related behavioral anomalies. Extensive experiments demonstrate that our proposed method achieves state-of-the-art performance on ShanghaiTech and UBnormal datasets, with AUC of 86.9 % and 73.5 %, respectively. The code will be available on https://github.com/molu-ggg/GENet.	翻訳日:2024-04-25 16:25:00 公開日:2024-04-24
# AIアセスメント尺度(AIAS) : 教育評価におけるジェネレーティブAIの倫理的統合のためのフレームワーク The AI Assessment Scale (AIAS): A Framework for Ethical Integration of Generative AI in Educational Assessment ( http://arxiv.org/abs/2312.07086v2 ) ライセンス: Link先を確認	Mike Perkins, Leon Furze, Jasper Roe, Jason MacVaugh,	(参考訳) ジェネレーティブ・人工知能(GenAI)の最近の進歩は、社会の複数の領域におけるパラダイムシフトを生み出しており、これらの技術の使用は今後数十年で教育の明確な特徴となる可能性が高い。 GenAIは変革的な教育の機会を提供し、同時に倫理的・学術的な課題を提起する。このような背景から、我々はGenAIツールを教育アセスメントに統合するための実用的でシンプルで十分に包括的なツール、AIAS(AI Assessment Scale)を概説した。 AIASは、教育者に対して、彼らが解決しようとしている学習結果に基づいて、評価においてGenAIの使用の適切なレベルを選択する権限を与える。 AIASは、学生や教育者に対してより明確で透明性を提供し、機関が協力し合うための公平で公平なポリシーツールを提供し、GenAIの機会を受け入れつつ、そのようなツールが教育的に適切でなくても必要な場合もあることを認識しながら、ニュアンスなアプローチを提供する。実践的でフレキシブルなアプローチを迅速に実施することで、AIASは、教育におけるGenAIに関する現在の不確実性と不安に対処するための、非常に必要な出発点を形成することができる。第二の目的として、教育におけるGenAIツールについて、現在の学術的不正行為のファシリテーターとしてのGenAIに焦点をあてているのとは対照的に、テクノロジーが教育と学習を支援・強化する上でどのように役立つかという、教育におけるGenAIツールに関する再焦点の言説を提唱する。 Recent developments in Generative Artificial Intelligence (GenAI) have created a paradigm shift in multiple areas of society, and the use of these technologies is likely to become a defining feature of education in coming decades. GenAI offers transformative pedagogical opportunities, while simultaneously posing ethical and academic challenges. Against this backdrop, we outline a practical, simple, and sufficiently comprehensive tool to allow for the integration of GenAI tools into educational assessment: the AI Assessment Scale (AIAS). The AIAS empowers educators to select the appropriate level of GenAI usage in assessments based on the learning outcomes they seek to address. The AIAS offers greater clarity and transparency for students and educators, provides a fair and equitable policy tool for institutions to work with, and offers a nuanced approach which embraces the opportunities of GenAI while recognising that there are instances where such tools may not be pedagogically appropriate or necessary. 
By adopting a practical, flexible approach that can be implemented quickly, the AIAS can form a much-needed starting point to address the current uncertainty and anxiety regarding GenAI in education. As a secondary objective, we engage with the current literature and advocate for a refocused discourse on GenAI tools in education, one which foregrounds how technologies can help support and enhance teaching and learning, which contrasts with the current focus on GenAI as a facilitator of academic misconduct.	翻訳日:2024-04-25 16:15:09 公開日:2024-04-24
# ランダムベクトル関数リンクを用いた軽量ランダム化非線形辞書学習法 A Lightweight Randomized Nonlinear Dictionary Learning Method using Random Vector Functional Link ( http://arxiv.org/abs/2402.03833v2 ) ライセンス: Link先を確認	G. Madhuri, Atul Negi,	(参考訳) カーネルベースの非線形辞書学習法は、暗黙的特徴写像によって得られる特徴空間で動作し、特異値分解(SVD)のような計算コストの高い演算とは独立していない。本稿では,ランダムベクトル関数リンク(RVFL)と呼ばれるランダム化関数リンクを用いて,非線形辞書を学習するためのSVDフリー軽量アプローチを提案する。提案したRVFLに基づく非線形辞書学習(RVFLDL)は,非線形スパース係数から高密度入力特徴へのスパース・トゥ・デンス特徴写像として辞書を学習する。初期乱数辞書のスパース係数 w.r.t は、ホースシュー先行を仮定して導出され、軽量ネットワークとなる入力として使用される。 RVFLは入力から出力層への重みを解析的に生成するので、RVFLベースの辞書のトレーニングはSVD計算から解放される。入力スパース係数と辞書原子との高次依存関係は、スパース係数を非線形に変換し、強化された特徴として付加することにより、トレーニングプロセスに組み込む。したがって、この方法は、辞書に非線形性を誘導しながら、より高次元空間にスパース係数を投影する。 RVFL-netを用いて分類するために、分類行列は非線形スパース係数をラベルにマッピングする変換として学習される。画像分類および再構成アプリケーションで示される手法の実証的証拠は、RVFLDLはスケーラブルであり、他の非線形辞書学習法よりも優れた解を提供することを示している。 Kernel-based nonlinear dictionary learning methods operate in a feature space obtained by an implicit feature map, and they are not independent of computationally expensive operations like Singular Value Decomposition (SVD). This paper presents an SVD-free lightweight approach to learning a nonlinear dictionary using a randomized functional link called a Random Vector Functional Link (RVFL). The proposed RVFL-based nonlinear Dictionary Learning (RVFLDL) learns a dictionary as a sparse-to-dense feature map from nonlinear sparse coefficients to the dense input features. Sparse coefficients w.r.t an initial random dictionary are derived by assuming Horseshoe prior are used as inputs making it a lightweight network. Training the RVFL-based dictionary is free from SVD computation as RVFL generates weights from the input to the output layer analytically. Higher-order dependencies between the input sparse coefficients and the dictionary atoms are incorporated into the training process by nonlinearly transforming the sparse coefficients and adding them as enhanced features. 
Thus the method projects sparse coefficients to a higher dimensional space while inducing nonlinearities into the dictionary. For classification using RVFL-net, a classifier matrix is learned as a transform that maps nonlinear sparse coefficients to the labels. The empirical evidence of the method illustrated in image classification and reconstruction applications shows that RVFLDL is scalable and provides a solution better than those obtained using other nonlinear dictionary learning methods.	翻訳日:2024-04-25 16:15:09 公開日:2024-04-24
# 動作コード:スパース変分多確率過程学習によるロバスト時系列分類と予測 Motion Code: Robust Time series Classification and Forecasting via Sparse Variational Multi-Stochastic Processes Learning ( http://arxiv.org/abs/2402.14081v2 ) ライセンス: Link先を確認	Chandrajit Bajaj, Minh Nguyen,	(参考訳) 広範に研究されているにもかかわらず、ノイズの多いデータの時系列分類と予測は非常に困難である。主な課題は、時系列を記述するのに適した数学的概念を見つけ、真の信号から効果的にノイズを分離することである。時系列を静的ベクトルやデータシーケンスとして扱う代わりに、連続時間確率過程のサンプル化として、必ずしも固定長ではない各時系列を考察する新しいフレームワークを導入する。このような数学的モデルは、複数のタイムスタンプにまたがるデータ依存を明示的に捉え、ノイズから隠れた時間依存信号を検出する。しかし、基礎となるデータはいくつかの異なるダイナミクスで構成されていることが多いため、単一の確率過程を用いたモデリングは不十分である。このような設定に対処するため、まず各ダイナミクスにシグネチャベクトルを割り当てる。次に、割り当てられたベクトルに基づいて個々のダイナミクスのスパース近似を推測する最も情報性の高いタイムスタンプの抽象的概念を提案する。最終的なモデルであるMotion Codeには、さまざまな基盤となるダイナミクスを統合的に完全にキャプチャ可能なパラメータが含まれている。これにより、未混合の分類と特定のサブタイプの予測を同時に生成することができる。センサやデバイスに関する大規模な実験は、時系列の分類と予測ベンチマークに対するモーションコードの競争性を実証している。 Despite being extensively studied, time series classification and forecasting on noisy data remain highly difficult. The main challenges lie in finding suitable mathematical concepts to describe time series and effectively separating noise from the true signals. Instead of treating time series as a static vector or a data sequence as often seen in previous methods, we introduce a novel framework that considers each time series, not necessarily of fixed length, as a sample realization of a continuous-time stochastic process. Such mathematical model explicitly captures the data dependence across several timestamps and detects the hidden time-dependent signals from noise. However, since the underlying data is often composed of several distinct dynamics, modeling using a single stochastic process is not sufficient. To handle such settings, we first assign each dynamics a signature vector. We then propose the abstract concept of the most informative timestamps to infer a sparse approximation of the individual dynamics based on their assigned vectors. 
The final model, referred to as Motion Code, contains parameters that can fully capture different underlying dynamics in an integrated manner. This allows unmixing classification and generation of specific sub-type forecasting simultaneously. Extensive experiments on sensors and devices noisy time series data demonstrate Motion Code's competitiveness against time series classification and forecasting benchmarks.	翻訳日:2024-04-25 16:15:09 公開日:2024-04-24
# ヒト皮膚分解手法の比較 Comparison of Methods in Human Skin Decomposition ( http://arxiv.org/abs/2404.00552v2 ) ライセンス: Link先を確認	Hao Gong, Michel Desvignes,	(参考訳) 皮膚色素の分解は医療分野において重要な役割を担っている。ヒトの皮膚はヘモグロビンとメラニンという2つの基本成分に分解することができる。我々の目標は、これらの結果を皮膚癌の診断に応用することである。本稿では、皮膚色素分解のさまざまな手法を比較検討し、各手法の性能を理論と実験の両面から評価する。さらに、皮膚分解の文脈における次元削減の性能を向上させるために、等長特徴マッピング(Isomap)を導入する。 Decomposition of skin pigment plays an important role in medical fields. Human skin can be decomposed into two primitive components, hemoglobin and melanin. It is our goal to apply these results for diagnosis of skin cancer. In this paper, various methods for skin pigment decomposition are reviewed comparatively and the performance of each method is evaluated both theoretically and experimentally. In addition, isometric feature mapping (Isomap) is introduced in order to improve the dimensionality reduction performance in context of skin decomposition.	翻訳日:2024-04-25 16:15:09 公開日:2024-04-24
# 学習済みガウシアンスプラットのレンダリングと微調整した拡散特徴による少数ショット点群再構成とノイズ除去 Few-shot point cloud reconstruction and denoising via learned Guassian splats renderings and fine-tuned diffusion features ( http://arxiv.org/abs/2404.01112v4 ) ライセンス: Link先を確認	Pietro Bonazzi, Marie-Julie Rakatosaona, Marco Cannici, Federico Tombari, Davide Scaramuzza,	(参考訳) 点群の再構成とノイズ除去のための既存の深層学習手法は、3次元形状の小規模なデータセットに依存している。我々は、何十億もの画像で訓練された深層学習手法を活用することで、この問題を回避する。画像ベースの深層学習モデルから蒸留した事前知識を利用して、少数の画像から点群を再構成し、レンダリングを介して点群のノイズを除去する手法を提案する。制約のある設定での再構成を改善するために、意味的一貫性の教師信号を導入し、表面と外観のハイブリッド表現を持つ微分可能レンダラーの訓練を正則化する。さらに、ノイズの多い点群のレンダリングをノイズ除去するようにStable Diffusionを微調整するパイプラインを提案し、学習されたフィルタを用いて3Dの教師なしで点群のノイズを除去できることを示す。提案手法をDSSおよびPointRadianceと比較し、Sketchfab TestsetとSCUT Datasetでより高品質な3D再構成を達成した。 Existing deep learning methods for the reconstruction and denoising of point clouds rely on small datasets of 3D shapes. We circumvent the problem by leveraging deep learning methods trained on billions of images. We propose a method to reconstruct point clouds from few images and to denoise point clouds from their rendering by exploiting prior knowledge distilled from image-based deep learning models. To improve reconstruction in constraint settings, we regularize the training of a differentiable renderer with hybrid surface and appearance by introducing semantic consistency supervision. In addition, we propose a pipeline to finetune Stable Diffusion to denoise renderings of noisy point clouds and we demonstrate how these learned filters can be used to remove point cloud noise coming without 3D supervision. We compare our method with DSS and PointRadiance and achieved higher quality 3D reconstruction on the Sketchfab Testset and SCUT Dataset.	翻訳日:2024-04-25 16:15:09 公開日:2024-04-24
# シーングラフと自己注意による3次元シーン生成 3D scene generation from scene graphs and self-attention ( http://arxiv.org/abs/2404.01887v3 ) ライセンス: Link先を確認	Pietro Bonazzi, Mengqi Wang, Diego Martin Arroyo, Fabian Manhardt, Nico Messikomer, Federico Tombari, Davide Scaramuzza,	(参考訳) リアルで多様な屋内3Dシーンレイアウトを制御可能な形で合成できれば、シミュレーション上のナビゲーションやバーチャルリアリティーへの応用が開ける。シーングラフはシーンの簡潔で頑健な表現であり、生成されるレイアウトの意味的な制御に適していることが示されている。本稿では、シーングラフとフロアプランから3次元シーンを合成する条件付き変分オートエンコーダ(cVAE)モデルの一種を提案する。自己注意層の特性を利用してシーン内のオブジェクト間の高レベルな関係を捉え、これをモデルの構成要素として用いる。本モデルはグラフトランスフォーマーを利用して、与えられたシーングラフ内の関係を満たしつつ、室内の物体の大きさ、寸法、向きを推定する。実験では、自己注意層によって、よりスパース(Graphto3Dと比べて7.9倍)でより多様な(16%)シーンが得られることが示された。 Synthesizing realistic and diverse indoor 3D scene layouts in a controllable fashion opens up applications in simulated navigation and virtual reality. As concise and robust representations of a scene, scene graphs have proven to be well-suited as the semantic control on the generated layout. We present a variant of the conditional variational autoencoder (cVAE) model to synthesize 3D scenes from scene graphs and floor plans. We exploit the properties of self-attention layers to capture high-level relationships between objects in a scene, and use these as the building blocks of our model. Our model, leverages graph transformers to estimate the size, dimension and orientation of the objects in a room while satisfying relationships in the given scene graph. Our experiments shows self-attention layers leads to sparser (7.9x compared to Graphto3D) and more diverse scenes (16%).	翻訳日:2024-04-25 16:15:08 公開日:2024-04-24
# 暗黒でテキストを見る:アルゴリズムとベンチマーク Seeing Text in the Dark: Algorithm and Benchmark ( http://arxiv.org/abs/2404.08965v3 ) ライセンス: Link先を確認	Chengpei Xu, Hao Fu, Long Ma, Wenjing Jia, Chengqi Zhang, Feng Xia, Xiaoyu Ai, Binghao Li, Wenjie Zhang,	(参考訳) 低照度環境におけるテキストのローカライズは、視覚的劣化のため難しい。簡単な解法は低照度画像強調(LLE)を最初のステップとして検出する2段階のパイプラインを含むが、LLEは主に機械ではなく人間の視覚用に設計されており、エラーを蓄積することができる。そこで本研究では,LLEの必要性を回避するために,暗黒テキストのローカライズのための効率的かつ効果的な単一ステージアプローチを提案する。テキスト検出器の訓練段階において,制約付き学習モジュールを補助機構として導入する。このモジュールは、特徴マップリサイズ中のテキスト空間的特徴を保存するためのテキスト検出器のガイドとして設計されており、低照度の視覚的劣化下でのテキスト中の空間情報の損失を最小限に抑える。具体的には、本モジュール内に空間的再構成と空間的意味制約を組み込んで、テキスト検出器が本質的な位置的・文脈的範囲の知識を取得することを保証する。提案手法は,テキストの局所的トポロジ的特徴を動的ヘビ特徴ピラミッドネットワークを用いて同定し,新しい長方形累積法によるボトムアップ輪郭形成戦略を採用して,テキストの特徴を正確に記述する手法である。さらに,様々な場面や言語を含む任意の字形テキストを対象とした包括的低照度データセットを提案する。特に,本手法は,この低照度データセットの最先端結果を達成し,標準の標準照度データセットに匹敵する性能を示す。コードとデータセットがリリースされる。 Localizing text in low-light environments is challenging due to visual degradations. Although a straightforward solution involves a two-stage pipeline with low-light image enhancement (LLE) as the initial step followed by detector, LLE is primarily designed for human vision instead of machine and can accumulate errors. In this work, we propose an efficient and effective single-stage approach for localizing text in dark that circumvents the need for LLE. We introduce a constrained learning module as an auxiliary mechanism during the training stage of the text detector. This module is designed to guide the text detector in preserving textual spatial features amidst feature map resizing, thus minimizing the loss of spatial information in texts under low-light visual degradations. Specifically, we incorporate spatial reconstruction and spatial semantic constraints within this module to ensure the text detector acquires essential positional and contextual range knowledge. 
Our approach enhances the original text detector's ability to identify text's local topological features using a dynamic snake feature pyramid network and adopts a bottom-up contour shaping strategy with a novel rectangular accumulation technique for accurate delineation of streamlined text features. In addition, we present a comprehensive low-light dataset for arbitrary-shaped text, encompassing diverse scenes and languages. Notably, our method achieves state-of-the-art results on this low-light dataset and exhibits comparable performance on standard normal light datasets. The code and dataset will be released.	翻訳日:2024-04-25 16:05:24 公開日:2024-04-24
# ユーティリティ・フェアネス・トレードオフとその見つけ方 Utility-Fairness Trade-Offs and How to Find Them ( http://arxiv.org/abs/2404.09454v2 ) ライセンス: Link先を確認	Sepehr Dehdashtian, Bashir Sadeghi, Vishnu Naresh Boddeti,	(参考訳) 人口統計的公平性を考慮した分類システムを構築する場合、満足すべき目的が2つある。 1) 特定タスクの実用性の最大化、及び 2) 既知の人口統計属性に関する公正性の確保である。これらの目的はしばしば競合するので、両方の最適化は実用性と公正性のトレードオフにつながる可能性がある。既存の研究はトレードオフを認め、その限界を研究するが、2つの疑問は未解決のままである。 1)実用性と公正性の最適なトレードオフは何か。そして 2)所望の予測タスクと関心のある人口統計属性について、これらのトレードオフをデータから数値的に定量化するにはどうすればよいか。この論文はこれらの疑問に対処する。データ・スペース・トレードオフとラベル・スペース・トレードオフという2つのユーティリティ・フェアネスのトレードオフを紹介します。これらのトレードオフによって、ユーティリティ・フェアネス平面内の3つの領域が明らかになり、完全に可能なもの、部分的に可能なもの、不可能なものが区別される。本稿では,与えられた予測タスクとグループフェアネス定義に対するトレードオフをデータサンプルから数値的に定量化する方法であるU-FaTEを提案する。トレードオフに基づいて、表現を評価するための新しいスキームを導入する。 フェア表現学習手法と1000以上の事前訓練されたモデルからの表現の広範な評価により、現在のアプローチのほとんどは、複数のデータセットや予測タスクにわたって、推定された達成可能なフェアネス・ユーティリティ・トレードオフからかけ離れていることが明らかとなった。 When building classification systems with demographic fairness considerations, there are two objectives to satisfy: 1) maximizing utility for the specific task and 2) ensuring fairness w.r.t. a known demographic attribute. These objectives often compete, so optimizing both can lead to a trade-off between utility and fairness. While existing works acknowledge the trade-offs and study their limits, two questions remain unanswered: 1) What are the optimal trade-offs between utility and fairness? and 2) How can we numerically quantify these trade-offs from data for a desired prediction task and demographic attribute of interest? This paper addresses these questions. We introduce two utility-fairness trade-offs: the Data-Space and Label-Space Trade-off. The trade-offs reveal three regions within the utility-fairness plane, delineating what is fully and partially possible and impossible. We propose U-FaTE, a method to numerically quantify the trade-offs for a given prediction task and group fairness definition from data samples. Based on the trade-offs, we introduce a new scheme for evaluating representations. 
An extensive evaluation of fair representation learning methods and representations from over 1000 pre-trained models revealed that most current approaches are far from the estimated and achievable fairness-utility trade-offs across multiple datasets and prediction tasks.	翻訳日:2024-04-25 16:05:24 公開日:2024-04-24
# プログレッシブ・マルチモーダル・コンディショナル・プロンプトチューニング Progressive Multi-modal Conditional Prompt Tuning ( http://arxiv.org/abs/2404.11864v2 ) ライセンス: Link先を確認	Xiaoyu Qiu, Hao Feng, Yuechen Wang, Wengang Zhou, Houqiang Li,	(参考訳) 事前学習された視覚言語モデル(VLM)は、VLMを知識ベースとして活用し、下流タスクに有用な情報を抽出するプロンプトを通じて、顕著な一般化能力を示す。しかし、既存の手法は主にユニモーダルプロンプトを採用しており、これはユニモーダル分岐のみに作用するため、視覚言語(V-L)の特徴を同時に調整することができない。さらに、VLMエンコーディングにおけるワンパスフォワードパイプラインは、大きなギャップを持つV-L特徴を整合させるのに苦労している。これらの課題に対処するため,新しい手法であるProgressive Multi-modal Conditional Prompt Tuning (ProMPT)を提案する。 ProMPTは再帰的な構造を利用し、画像と現在の符号化情報を反復的に利用することにより、V-L特徴を最適化し整合させる。初期化と多モード反復進化(MIE)モジュールを含む。初期化は、VLMを使用して画像とテキストを符号化し、続いて、画像に似たテキスト特徴を選択する特徴フィルタが続く。 MIEは、クラス条件の視覚プロンプト、インスタンス条件のテキストプロンプト、特徴フィルタリングによるマルチモーダルプロンプトを容易にする。各MIEイテレーションでは、視覚生成器を介してフィルタリングされたテキスト特徴から視覚プロンプトが得られ、視覚プロンプト中に対象物にもっと焦点を合わせるように画像特徴が促進される。エンコードされた画像特徴はテキストジェネレータに入力され、クラスシフトに対してより堅牢なテキストプロンプトを生成する。これにより、V-L特徴は徐々に整列され、粗い予測から正確な予測へと進むことができる。 ProMPTの有効性を評価するために, 広範囲な実験を3つの環境で行った。その結果, ProMPTはすべての設定において平均して既存の手法よりも優れ, より優れた一般化とロバスト性を示すことがわかった。コードはhttps://github.com/qiuxiaoyu9954/ProMPTで入手できる。 Pre-trained vision-language models (VLMs) have shown remarkable generalization capabilities via prompting, which leverages VLMs as knowledge bases to extract information beneficial for downstream tasks. However, existing methods primarily employ uni-modal prompting, which only engages a uni-modal branch, failing to simultaneously adjust vision-language (V-L) features. Additionally, the one-pass forward pipeline in VLM encoding struggles to align V-L features that have a huge gap. Confronting these challenges, we propose a novel method, Progressive Multi-modal conditional Prompt Tuning (ProMPT). ProMPT exploits a recurrent structure, optimizing and aligning V-L features by iteratively utilizing image and current encoding information. It comprises an initialization and a multi-modal iterative evolution (MIE) module. 
Initialization is responsible for encoding images and text using a VLM, followed by a feature filter that selects text features similar to image. MIE then facilitates multi-modal prompting through class-conditional vision prompting, instance-conditional text prompting, and feature filtering. In each MIE iteration, vision prompts are obtained from filtered text features via a vision generator, promoting image features to focus more on target object during vision prompting. The encoded image features are fed into a text generator to produce text prompts that are more robust to class shifts. Thus, V-L features are progressively aligned, enabling advance from coarse to exact prediction. Extensive experiments are conducted in three settings to evaluate the efficacy of ProMPT. The results indicate that ProMPT outperforms existing methods on average across all settings, demonstrating its superior generalization and robustness. Code is available at https://github.com/qiuxiaoyu9954/ProMPT.	翻訳日:2024-04-25 16:05:24 公開日:2024-04-24
# OPTiML: 自己監督型医用画像表現のための最適輸送を用いた高密度セマンティック不変性 OPTiML: Dense Semantic Invariance Using Optimal Transport for Self-Supervised Medical Image Representation ( http://arxiv.org/abs/2404.11868v2 ) ライセンス: Link先を確認	Azad Singh, Vandan Gorade, Deepak Mishra,	(参考訳) 自己教師付き学習(SSL)は、アノテーションなしで学習できることから、医用画像解析の有望な技術として登場した。しかし、有望な可能性にもかかわらず、従来のSSLメソッドでは、セマンティックアライメントの達成や微妙な詳細の取得など、制限に直面している。これは、解剖学的構造や病理的詳細を正確に把握できない、最適下界表現につながる。これらの制約に対応するため,医用画像表現学習におけるSSLの全体的な効果を高めるために,最適なトランスポート(OT)を用いた新しいSSLフレームワークOPTiMLを導入する。中心となる考え方は、OTとクロスビューポイントセマンティクス・インフュージョン・モジュール(CV-SIM)を統合することである。 CV-SIMモジュールに加えて、OPTiMLはOTフレームワーク内での分散と共分散の規則化を強制し、臨床的に関係のある情報に焦点を絞ると同時に、より少ない情報的特徴を破棄する。提案するフレームワークは,様々な医用画像タスクに適用可能な意味豊かな表現を学習する能力を示す。その有効性を検証するために,胸部X線モダリティから利用可能な3つのデータセットについて実験を行った。実験の結果,OPTiMLはすべての評価課題において,最先端の手法よりも優れていることがわかった。 Self-supervised learning (SSL) has emerged as a promising technique for medical image analysis due to its ability to learn without annotations. However, despite the promising potential, conventional SSL methods encounter limitations, including challenges in achieving semantic alignment and capturing subtle details. This leads to suboptimal representations, which fail to accurately capture the underlying anatomical structures and pathological details. In response to these constraints, we introduce a novel SSL framework OPTiML, employing optimal transport (OT), to capture the dense semantic invariance and fine-grained details, thereby enhancing the overall effectiveness of SSL in medical image representation learning. The core idea is to integrate OT with a cross-viewpoint semantics infusion module (CV-SIM), which effectively captures complex, fine-grained details inherent in medical images across different viewpoints. In addition to the CV-SIM module, OPTiML imposes the variance and covariance regularizations within OT framework to force the model focus on clinically relevant information while discarding less informative features. 
Through these, the proposed framework demonstrates its capacity to learn semantically rich representations that can be applied to various medical imaging tasks. To validate its effectiveness, we conduct experimental studies on three publicly available datasets from chest X-ray modality. Our empirical results reveal OPTiML's superiority over state-of-the-art methods across all evaluated tasks.	翻訳日:2024-04-25 16:05:24 公開日:2024-04-24
# FlagVNE: ネットワークリソース割り当てのためのフレキシブルで汎用的な強化学習フレームワーク FlagVNE: A Flexible and Generalizable Reinforcement Learning Framework for Network Resource Allocation ( http://arxiv.org/abs/2404.12633v2 ) ライセンス: Link先を確認	Tianfu Wang, Qilin Fan, Chao Wang, Long Yang, Leilei Ding, Nicholas Jing Yuan, Hui Xiong,	(参考訳) VNE(Virtual Network Embedding)は、仮想ネットワーク要求(VNR)を物理インフラにマッピングすることを目的とした、ネットワーク仮想化における重要なリソース割り当てタスクである。強化学習(RL)は近年,この問題に対する有望な解決策として浮上している。しかし、既存のRLベースのVNE法は、一方向のアクション設計と画一的なトレーニング戦略によって制限されており、探索性や一般化性が制限される。本稿では,FLexible And Generalizable RL framework for VNE(FlagVNE)を提案する。具体的には,仮想ノードと物理ノードの同時選択を可能にする双方向動作に基づくマルコフ決定プロセスモデルを設計し,解空間の探索の柔軟性を向上させる。広範かつダイナミックな動作空間に取り組むために,適応的な動作確率分布を生成し,高い訓練効率を確保する階層型デコーダを設計する。さらに, 様々なVNRサイズに対する一般化問題を克服するために, 各VNRサイズに対する専門的な政策訓練を容易にする, カリキュラムスケジューリング戦略を備えたメタRLベースのトレーニング手法を提案する。最後に、多数の実験結果から、FlagVNEが複数の主要な指標にまたがって有効であることが示されている。私たちのコードはGitHubで入手可能です(https://github.com/GeminiLight/flag-vne)。 Virtual network embedding (VNE) is an essential resource allocation task in network virtualization, aiming to map virtual network requests (VNRs) onto physical infrastructure. Reinforcement learning (RL) has recently emerged as a promising solution to this problem. However, existing RL-based VNE methods are limited by the unidirectional action design and one-size-fits-all training strategy, resulting in restricted searchability and generalizability. In this paper, we propose a FLexible And Generalizable RL framework for VNE, named FlagVNE. Specifically, we design a bidirectional action-based Markov decision process model that enables the joint selection of virtual and physical nodes, thus improving the exploration flexibility of the solution space. To tackle the expansive and dynamic action space, we design a hierarchical decoder to generate adaptive action probability distributions and ensure high training efficiency. 
Furthermore, to overcome the generalization issue for varying VNR sizes, we propose a meta-RL-based training method with a curriculum scheduling strategy, facilitating specialized policy training for each VNR size. Finally, extensive experimental results show the effectiveness of FlagVNE across multiple key metrics. Our code is available at GitHub (https://github.com/GeminiLight/flag-vne).	翻訳日:2024-04-25 16:05:24 公開日:2024-04-24
# 目は欺くことがある:マルチモーダル大規模言語モデルの反実仮想推論能力のベンチマーク Eyes Can Deceive: Benchmarking Counterfactual Reasoning Abilities of Multi-modal Large Language Models ( http://arxiv.org/abs/2404.12966v2 ) ライセンス: Link先を確認	Yian Li, Wentao Tian, Yang Jiao, Jingjing Chen, Yu-Gang Jiang,	(参考訳) 反実仮想推論は、人間の知性の重要な現れとして、確立した事実に基づいて仮定を行い、潜在的な結果を外挿することを指す。既存のマルチモーダルな大規模言語モデル(MLLM)は、様々なビジュアル質問回答(VQA)ベンチマークで検証された、印象的な認知と推論能力を示した。それでも、既存のMLLMは、反実仮想的な質問に直面した場合、どのように機能するのか? この疑問に答えるために,我々はまず,MLLM の反実仮想推論能力を体系的に評価するために,新規な CounterFactual MultiModal reasoning benchmark(CFMM)をキュレートする。我々のCFMMは6つの課題から構成されており、それぞれが多様な側面からMLLMの反実仮想推論能力を評価するために、人手で慎重にラベル付けされた数百の反実仮想的な質問を含む。興味深いことに、実験を通して、既存のMLLMは、自分たちが見ているものを信じることを好み、質問に提示される反実仮想的な前提を無視し、不正確な応答をもたらすことがわかった。さらに,提案するCFMMを用いて,広く普及している多数のMLLMを評価する。 CFMMのパフォーマンスといくつかのVQAベンチマークとの間の大きなギャップは、既存のMLLMが人間レベルのインテリジェンスに近づくための十分な改善の余地があることを示している。一方,今後CFMMにおけるMLLMの性能を向上させることで,高度な知能を持つMLLMの開発に向けた潜在的な道筋を探求することができる。 Counterfactual reasoning, as a crucial manifestation of human intelligence, refers to making presuppositions based on established facts and extrapolating potential outcomes. Existing multimodal large language models (MLLMs) have exhibited impressive cognitive and reasoning capabilities, which have been examined across a wide range of Visual Question Answering (VQA) benchmarks. Nevertheless, how will existing MLLMs perform when faced with counterfactual questions? To answer this question, we first curate a novel CounterFactual MultiModal reasoning benchmark, abbreviated as CFMM, to systematically assess the counterfactual reasoning capabilities of MLLMs. Our CFMM comprises six challenging tasks, each including hundreds of carefully human-labeled counterfactual questions, to evaluate MLLM's counterfactual reasoning capabilities across diverse aspects. 
Through experiments, interestingly, we find that existing MLLMs prefer to believe what they see, but ignore the counterfactual presuppositions presented in the question, thereby leading to inaccurate responses. Furthermore, we evaluate a wide range of prevalent MLLMs on our proposed CFMM. The significant gap between their performance on our CFMM and that on several VQA benchmarks indicates that there is still considerable room for improvement in existing MLLMs toward approaching human-level intelligence. On the other hand, through boosting MLLMs performances on our CFMM in the future, potential avenues toward developing MLLMs with advanced intelligence can be explored.	翻訳日:2024-04-25 16:05:24 公開日:2024-04-24
# スコア変更を超えて:2つの観点からの非参照画像品質評価に対する敵対的攻撃 Beyond Score Changes: Adversarial Attack on No-Reference Image Quality Assessment from Two Perspectives ( http://arxiv.org/abs/2404.13277v2 ) ライセンス: Link先を確認	Chenxi Yang, Yujia Liu, Dingquan Li, Yan Zhong, Tingting Jiang,	(参考訳) ディープニューラルネットワークは、NR-IQA(No-Reference Image Quality Assessment)において驚くべき成功を収めている。しかし、最近の研究は、NR-IQAモデルが微妙な敵対的摂動に対して脆弱であり、モデル予測と主観的評価の不整合をもたらすことを強調している。しかし、現在の敵対的攻撃は、個々の画像の予測スコアの摂動に焦点を合わせ、画像集合全体におけるスコア間の相関関係という重要な側面を無視している。一方、ランキング相関のような相関が、NR-IQAタスクにおいて重要な役割を担っていることに留意する必要がある。 NR-IQAモデルのロバスト性を包括的に探求するために,画像集合内の相関と個々の画像のスコア変化の両方を乱す、相関・誤差ベースの攻撃の新たなフレームワークを導入する。我々の研究は主に、Spearman's Rank-Order Correlation Coefficient (SROCC)のようなランキング関連の相関指標と、Mean Squared Error (MSE)のような予測誤差関連の指標に焦点を当てている。その具体例として、まず画像集合全体に対する目標攻撃スコアを最適化し、次にこれらのスコアに導かれて敵対的サンプルを生成する、実用的な2段階のSROCC-MSE-Attack (SMA) を提案する。実験の結果,SMA法はSROCCを負の値に大きく破壊するだけでなく,個々の画像のスコアにかなりの変化を維持することが明らかとなった。一方、さまざまなカテゴリのメトリクスにまたがって最先端のパフォーマンスを示す。提案手法はNR-IQAモデルのロバスト性に関する新しい視点を提供する。 Deep neural networks have demonstrated impressive success in No-Reference Image Quality Assessment (NR-IQA). However, recent research highlights the vulnerability of NR-IQA models to subtle adversarial perturbations, leading to inconsistencies between model predictions and subjective ratings. Current adversarial attacks, however, focus on perturbing predicted scores of individual images, neglecting the crucial aspect of inter-score correlation relationships within an entire image set. Meanwhile, it is important to note that the correlation, like ranking correlation, plays a significant role in NR-IQA tasks. To comprehensively explore the robustness of NR-IQA models, we introduce a new framework of correlation-error-based attacks that perturb both the correlation within an image set and score changes on individual images. Our research primarily focuses on ranking-related correlation metrics like Spearman's Rank-Order Correlation Coefficient (SROCC) and prediction error-related metrics like Mean Squared Error (MSE). 
As an instantiation, we propose a practical two-stage SROCC-MSE-Attack (SMA) that initially optimizes target attack scores for the entire image set and then generates adversarial examples guided by these scores. Experimental results demonstrate that our SMA method not only significantly disrupts the SROCC to negative values but also maintains a considerable change in the scores of individual images. Meanwhile, it exhibits state-of-the-art performance across metrics with different categories. Our method provides a new perspective on the robustness of NR-IQA models.	翻訳日:2024-04-25 16:05:24 公開日:2024-04-24
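The two metric families named in the abstract above (SROCC for ranking correlation, MSE for per-image error) can be illustrated with a minimal Python sketch. This is not the authors' SMA attack; it only computes the two metrics on made-up scores, and the helper names (`average_ranks`, `pearson`, `srocc`, `mse`) and the sample values are assumptions introduced for illustration.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def srocc(predicted, subjective):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(predicted), average_ranks(subjective))


def mse(predicted, subjective):
    """Mean squared error between predicted and subjective scores."""
    return sum((p - s) ** 2 for p, s in zip(predicted, subjective)) / len(predicted)


# Hypothetical subjective scores and model predictions for five images.
subjective = [1.0, 2.0, 3.0, 4.0, 5.0]
clean_pred = [1.1, 2.0, 2.9, 4.2, 5.0]
attacked_pred = [5.0, 4.0, 3.0, 2.0, 1.0]  # ranking fully reversed

print(srocc(clean_pred, subjective), mse(clean_pred, subjective))
print(srocc(attacked_pred, subjective), mse(attacked_pred, subjective))
```

In this toy case the reversed predictions drive SROCC to a negative value while also changing individual scores, which is the combination of effects the abstract describes as the goal of a correlation-error-based attack.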
# 超低損失集積フォトニクスは明るい狭帯域光子対源を可能にする Ultralow-loss integrated photonics enables bright, narrow-band, photon-pair sources ( http://arxiv.org/abs/2404.13387v2 ) ライセンス: Link先を確認	Ruiyang Chen, Yi-Han Luo, Jinbao Long, Baoqi Shi, Chen Shen, Junqiu Liu,	(参考訳) 光子対光源は、フォトニック量子系にとって重要な構成要素である。カー非線形性とキャビティ強化した自発4波混合を利用して、フォトニック集積回路上に構築されたマイクロ共振器を用いてチップスケール光子対光源を作成することができる。実用化のためには、光子対光源の輝度を増大させ、その線幅を減少させるために、マイクロ共振器の高い品質係数$Q$が必須である。前者は$Q^4$に比例し、後者は$Q$に反比例する。本稿では,マイクロ共振器をベースとした集積型の狭帯域光子対光源を実証する。この集積マイクロ共振器は窒化ケイ素で作製され、標準のCMOSファウンドリプロセスで製造され、$3$ dB/mまでの超低損失と、$10^7$を超える固有$Q$値を特徴とする。光子対光源の輝度は$1.17\times10^9$ Hz/mW$^2$/GHz、線幅は$25.9$ MHzであり、いずれもシリコンフォトニクスベースの量子光源としては記録的な値である。さらに、伝令付き2次相関$g^{(2)}_\mathrm{h}(0)=0.0037(5)$の伝令付き単一光子源と、生の可視度$0.973(9)$の時間ビンエンタングルメント光源も可能となる。我々の研究は、超低損失集積フォトニクスのグローバルポテンシャルを証明し、量子通信やネットワークへの効率的でコンパクトで堅牢なインターフェースを触媒する新しい量子光源と回路を創出する。 Photon-pair sources are critical building blocks for photonic quantum systems. Leveraging Kerr nonlinearity and cavity-enhanced spontaneous four-wave mixing, chip-scale photon-pair sources can be created using microresonators built on photonic integrated circuits. For practical applications, a high microresonator quality factor $Q$ is mandatory to magnify photon-pair sources' brightness and reduce their linewidth. The former is proportional to $Q^4$, while the latter is inversely proportional to $Q$. Here, we demonstrate an integrated, microresonator-based, narrow-band photon-pair source. The integrated microresonator, made of silicon nitride and fabricated using a standard CMOS foundry process, features ultralow loss down to $3$ dB/m and intrinsic $Q$ factor exceeding $10^7$. The photon-pair source has a brightness of $1.17\times10^9$ Hz/mW$^2$/GHz and a linewidth of $25.9$ MHz, both of which are record values for silicon-photonics-based quantum light sources. 
It further enables a heralded single-photon source with heralded second-order correlation $g^{(2)}_\mathrm{h}(0)=0.0037(5)$, as well as a time-bin entanglement source with a raw visibility of $0.973(9)$. Our work evidences the global potential of ultralow-loss integrated photonics to create novel quantum light sources and circuits, catalyzing efficient, compact and robust interfaces to quantum communication and networks.	翻訳日:2024-04-25 16:05:24 公開日:2024-04-24
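The scaling stated in the abstract above (brightness proportional to $Q^4$, linewidth inversely proportional to $Q$) can be made concrete with a small Python sketch. It assumes an idealized case in which only $Q$ changes and every other parameter is held fixed; the function name `scale_with_q` and the factor-of-two example are assumptions for illustration, not results from the paper.

```python
def scale_with_q(brightness, linewidth, q_ratio):
    """Rescale brightness and linewidth when Q is multiplied by q_ratio.

    Uses only the proportionalities quoted in the abstract:
    brightness scales as Q**4 and linewidth scales as 1/Q.
    """
    return brightness * q_ratio ** 4, linewidth / q_ratio


# Starting point: the reported 25.9 MHz linewidth, brightness normalized to 1.
brightness, linewidth_mhz = scale_with_q(1.0, 25.9, q_ratio=2.0)
print(brightness)      # relative brightness after doubling Q
print(linewidth_mhz)   # linewidth in MHz after doubling Q
```

Under these assumptions, doubling $Q$ raises the brightness sixteenfold and halves the linewidth, which is why the abstract calls a high quality factor mandatory for a bright, narrow-band source.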
# 高低周波分解によるブラケット画像の復元と改善 Bracketing Image Restoration and Enhancement with High-Low Frequency Decomposition ( http://arxiv.org/abs/2404.13537v2 ) ライセンス: Link先を確認	Genggeng Chen, Kexin Dai, Kangzhen Yang, Tao Hu, Xiangyu Chen, Yongqing Yang, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan,	(参考訳) 現実のシナリオでは、一連の画像劣化のため、高品質で鮮明な写真を得るのは難しい。高品質な画像の合成には大きな進歩があったが、以前の画像復元と改善の方法は、しばしば異なる劣化の特性を見落としていた。それらは、様々な種類の劣化に対処するために同じ構造を適用しており、その結果、復元結果は理想的とは言えなかった。高周波・低周波の情報がそれぞれ異なる劣化に適用できるという考えから着想を得て,高低周波分解に基づくブラケット画像復元・改善手法HLNetを導入する。具体的には,共有重みモジュールと非共有重みモジュールという,特徴抽出のための2種類のモジュールを用いる。共有重みモジュールでは、SCConvを用いて、異なる劣化から共通特徴を抽出する。非共有重みモジュールでは、高低周波分解ブロック(HLFDB)を導入し、高周波と低周波の情報を異なる方法で処理することで、異なる劣化により効果的に対処できるようにする。本手法は他のネットワークと比較して,異なる劣化の特性を考慮することで,より高品質な画像復元を実現する。 In real-world scenarios, due to a series of image degradations, obtaining high-quality, clear content photos is challenging. While significant progress has been made in synthesizing high-quality images, previous methods for image restoration and enhancement often overlooked the characteristics of different degradations. They applied the same structure to address various types of degradation, resulting in less-than-ideal restoration outcomes. Inspired by the notion that high/low frequency information is applicable to different degradations, we introduce HLNet, a Bracketing Image Restoration and Enhancement method based on high-low frequency decomposition. Specifically, we employ two modules for feature extraction: shared weight modules and non-shared weight modules. In the shared weight modules, we use SCConv to extract common features from different degradations. In the non-shared weight modules, we introduce the High-Low Frequency Decomposition Block (HLFDB), which employs different methods to handle high-low frequency information, enabling the model to address different degradations more effectively. Compared to other networks, our method takes into account the characteristics of different degradations, thus achieving higher-quality image restoration.	翻訳日:2024-04-25 16:05:24 公開日:2024-04-24
# Bt-GAN: Bias-transforming Generative Adversarial Networksによる公正な合成健康データの生成 Bt-GAN: Generating Fair Synthetic Healthdata via Bias-transforming Generative Adversarial Networks ( http://arxiv.org/abs/2404.13634v2 ) ライセンス: Link先を確認	Resmi Ramachandranpillai, Md Fahim Sikder, David Bergström, Fredrik Heintz,	(参考訳) 合成データ生成は、現実的な非識別データを生成することにより、電子医療記録(EHR)の有用性を高めるための有望なソリューションを提供する。しかし、既存の文献は、下流予測における公平性という重要な側面を無視して、合成健康データの品質に重点を置いている。その結果、合成EHRで訓練されたモデルは、目標タスクにおいてバイアスのある結果を生み出すという批判に直面している。これらのバイアスは、特徴間の疑似相関や、モデルがサブグループを正確に表現できないことから生じることがある。これらの問題に対処するために、医療領域向けに設計されたGANベースの合成データ生成装置であるBt-GAN(Bias-transforming Generative Adversarial Networks)を提案する。疑似相関 (i) に対処するために, 情報制約付きデータ生成プロセスを提案し, 明確に定義されたアルゴリズム的公平性の概念に基づいて, ジェネレータが公正な決定論的変換を学習できるようにする。正確なサブグループ表現の取得という課題 (ii) を克服するために, スコアベースの重み付けサンプリングにより, ジェネレータがサブグループ密度を保持するようインセンティブを与える。このアプローチにより、ジェネレータはデータ多様体の過小表現領域から学習せざるを得なくなる。我々はMIMIC-IIIデータベースを用いて広範囲にわたる実験を行った。以上の結果から,Bt-GANはSOTAの精度を達成しつつ,公平性を大幅に向上し,バイアス増幅を最小化できることがわかった。また,本研究の有効性を裏付ける証拠として,詳細な説明可能性分析を行った。結論として、本研究は,医療領域における合成データ生成の限界に対処するための,新規かつ専門的なアプローチを提案する。公平性を考慮し、GANのような高度な技術を活用することで、医療応用における信頼性と偏見のない予測の道を開く。 Synthetic data generation offers a promising solution to enhance the usefulness of Electronic Healthcare Records (EHR) by generating realistic de-identified data. However, the existing literature primarily focuses on the quality of synthetic health data, neglecting the crucial aspect of fairness in downstream predictions. Consequently, models trained on synthetic EHR have faced criticism for producing biased outcomes in target tasks. These biases can arise from either spurious correlations between features or the failure of models to accurately represent sub-groups. To address these concerns, we present Bias-transforming Generative Adversarial Networks (Bt-GAN), a GAN-based synthetic data generator specifically designed for the healthcare domain. 
In order to tackle spurious correlations (i), we propose an information-constrained Data Generation Process that enables the generator to learn a fair deterministic transformation based on a well-defined notion of algorithmic fairness. To overcome the challenge of capturing exact sub-group representations (ii), we incentivize the generator to preserve sub-group densities through score-based weighted sampling. This approach compels the generator to learn from underrepresented regions of the data manifold. We conduct extensive experiments using the MIMIC-III database. Our results demonstrate that Bt-GAN achieves SOTA accuracy while significantly improving fairness and minimizing bias amplification. We also perform an in-depth explainability analysis to provide additional evidence supporting the validity of our study. In conclusion, our research introduces a novel and professional approach to addressing the limitations of synthetic data generation in the healthcare domain. By incorporating fairness considerations and leveraging advanced techniques such as GANs, we pave the way for more reliable and unbiased predictions in healthcare applications.	翻訳日:2024-04-25 16:05:24 公開日:2024-04-24
# データ拡張によるソーシャルネットワーク広告の予測向上に関する比較研究 A Comparative Study on Enhancing Prediction in Social Network Advertisement through Data Augmentation ( http://arxiv.org/abs/2404.13812v2 ) ライセンス: Link先を確認	Qikai Yang, Panfeng Li, Xinhe Xu, Zhicheng Ding, Wenjing Zhou, Yi Nian,	(参考訳) ソーシャルネットワーク広告の世界では、予測モデルのパフォーマンスにおいて、データの量と正確さが重要な役割を担っている。しかし、堅牢な予測アルゴリズムの開発は、しばしば実世界のデータセットに存在する限られたサイズと潜在的なバイアスによって妨げられる。本研究では,ソーシャルネットワーク広告データの生成的拡張フレームワークを提示し,検討する。本フレームワークでは,データ拡張のための生成モデルとして,GAN(Generative Adversarial Networks),VAE(Variational Autoencoders),GMM(Gaussian Mixture Models)の3つを検討した。特徴空間の合成拡張を行うことにより,データ拡張により,様々な分類器の性能が定量的に向上したことがわかった。さらに,各データ拡張手法がもたらす相対的な性能向上を比較し,実践者がモデル性能を向上させる適切なテクニックを選択するための洞察を提供する。本稿では,ソーシャルネットワーク広告分野において,合成データ拡張により,小規模あるいは不均衡なデータセットによる制限が緩和されることを示すことによって文献に寄与する。同時に、本論文は、異なるデータ拡張手法の実用性に関する比較視点も提供し、実践者がモデル性能を向上させるための適切なテクニックを選択する際の指針となる。 In the ever-evolving landscape of social network advertising, the volume and accuracy of data play a critical role in the performance of predictive models. However, the development of robust predictive algorithms is often hampered by the limited size and potential bias present in real-world datasets. This study presents and explores a generative augmentation framework of social network advertising data. Our framework explores three generative models for data augmentation - Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Gaussian Mixture Models (GMMs) - to enrich data availability and diversity in the context of social network advertising analytics effectiveness. By performing synthetic extensions of the feature space, we find that through data augmentation, the performance of various classifiers has been quantitatively improved. Furthermore, we compare the relative performance gains brought by each data augmentation technique, providing insights for practitioners to select appropriate techniques to enhance model performance. 
This paper contributes to the literature by showing that synthetic data augmentation alleviates the limitations imposed by small or imbalanced datasets in the field of social network advertising. At the same time, this article also provides a comparative perspective on the practicality of different data augmentation methods, thereby guiding practitioners to choose appropriate techniques to enhance model performance.	翻訳日:2024-04-25 16:05:24 公開日:2024-04-24
# 領域的なスタイル転送と色転送 Regional Style and Color Transfer ( http://arxiv.org/abs/2404.13880v2 ) ライセンス: Link先を確認	Zhicheng Ding, Panfeng Li, Qikai Yang, Siyang Li, Qingtian Gong,	(参考訳) 本稿では,領域的なスタイル転送の分野への新たな貢献について述べる。既存の手法は、画像全体にわたって均一にスタイルを適用するという欠点に悩まされることが多く、人物像などの前景要素を持つ画像に適用した場合、スタイル上の不整合や前景オブジェクトの歪みが生じる。この制限に対処するために、セグメント化ネットワークを利用して入力画像内の前景オブジェクトを正確に分離する新しいアプローチを提案する。その後、背景領域にのみスタイル転送が適用される。分離された前景オブジェクトは、スタイル変換された背景に慎重に再統合される。前景と背景との視覚的コヒーレンスを高めるために、再統合前の前景要素に色転送ステップを適用する。最後に,フェザリング技術を用いて,前景と背景のシームレスな融合を実現し,視覚的に統一され,美的な最終構成を実現する。広範な評価の結果,提案手法は従来の手法に比べて,はるかに自然なスタイル変換をもたらすことがわかった。 This paper presents a novel contribution to the field of regional style transfer. Existing methods often suffer from the drawback of applying style homogeneously across the entire image, leading to stylistic inconsistencies or twisted foreground objects when applied to images with foreground elements such as human figures. To address this limitation, we propose a new approach that leverages a segmentation network to precisely isolate foreground objects within the input image. Subsequently, style transfer is applied exclusively to the background region. The isolated foreground objects are then carefully reintegrated into the style-transferred background. To enhance the visual coherence between foreground and background, a color transfer step is employed on the foreground elements prior to their reincorporation. Finally, we utilize feathering techniques to achieve a seamless amalgamation of foreground and background, resulting in a visually unified and aesthetically pleasing final composition. Extensive evaluations demonstrate that our proposed approach yields significantly more natural stylistic transformations compared to conventional methods.	翻訳日:2024-04-25 16:05:24 公開日:2024-04-24
# MaterialSeg3D: 3Dアセットのための2D事前知識からの高密度マテリアルセグメンテーション MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets ( http://arxiv.org/abs/2404.13923v2 ) ライセンス: Link先を確認	Zeyu Li, Ruitong Gan, Chuanchen Luo, Yuxi Wang, Jiaheng Liu, Ziwei Zhu, Man Zhang, Qing Li, Xucheng Yin, Zhaoxiang Zhang, Junran Peng,	(参考訳) 強力な画像拡散モデルによって駆動される最近の研究は、テキストや視覚的ガイダンスから3Dオブジェクトを自動生成することに成功した。スコア蒸留サンプリング(SDS)を様々な視点で反復的に行うことにより、これらの手法は2次元の生成的事前知識を3次元空間へ持ち上げることに成功している。しかし、そのような2次元生成画像の事前知識は、照明効果と影をテクスチャに焼き込む。結果として、SDSによって最適化された材料マップは必然的に、疑似的に相関した成分を含んでしまう。正確な物質定義がないため、新しいシーンで生成された資産を合理的にリライトすることは不可能であり、下流のシナリオでの応用を制限する。対照的に、人間はこの曖昧さを、その外見や意味から物体の物質を推定することによって、難なく回避することができる。そこで本研究では,2次元セマンティック事前知識から基礎となる物質を推定する3次元アセット・マテリアル生成フレームワークであるMaterialSeg3Dを提案する。このような事前モデルに基づいて,材料を三次元空間で解析する機構を考案する。われわれはUVスタックを維持しており、それぞれのマップは特定の視点から逆投影されたものである。すべての視点をトラバースした後、重み付けされた投票方式でスタックを融合し、領域統一を用いて対象部品のコヒーレンスを確保する。セマンティック事前知識の学習を促進するために,豊富な画像,多様なカテゴリ,正確なアノテーションを特徴とするMIO(Materialized Individual Objects)という材料データセットを収集した。広範な定量的および定性的実験により,本手法の有効性を実証した。 Driven by powerful image diffusion models, recent research has achieved the automatic creation of 3D objects from textual or visual guidance. By performing score distillation sampling (SDS) iteratively across different views, these methods succeed in lifting the 2D generative prior to 3D space. However, such a 2D generative image prior bakes the effect of illumination and shadow into the texture. As a result, material maps optimized by SDS inevitably involve spuriously correlated components. The absence of precise material definition makes it infeasible to relight the generated assets reasonably in novel scenes, which limits their application in downstream scenarios. In contrast, humans can effortlessly circumvent this ambiguity by deducing the material of the object from its appearance and semantics. 
Motivated by this insight, we propose MaterialSeg3D, a 3D asset material generation framework to infer underlying material from the 2D semantic prior. Based on such a prior model, we devise a mechanism to parse material in 3D space. We maintain a UV stack, each map of which is unprojected from a specific viewpoint. After traversing all viewpoints, we fuse the stack through a weighted voting scheme and then employ region unification to ensure the coherence of the object parts. To fuel the learning of semantics prior, we collect a material dataset, named Materialized Individual Objects (MIO), which features abundant images, diverse categories, and accurate annotations. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our method.	翻訳日:2024-04-25 16:05:24 公開日:2024-04-24
# ジェネレーティブAIの著作権問題に対する経済的解決策 An Economic Solution to Copyright Challenges of Generative AI ( http://arxiv.org/abs/2404.13964v3 ) ライセンス: Link先を確認	Jiachen T. Wang, Zhun Deng, Hiroaki Chiba-Okabe, Boaz Barak, Weijie J. Su,	(参考訳) 生成人工知能(AI)システムは、テキスト、画像、ビデオ、その他のメディアを生成するために、大規模なデータコーパスで訓練されている。このようなシステムは、データコントリビュータのトレーニングに関する著作権権に侵害されるのではないか、という懸念が高まっている。生成AIの著作権問題に対処するため、我々は、AI生成コンテンツ作成への貢献に比例して著作権所有者を補償する枠組みを提案する。コントリビューションの計量は、現代の生成AIモデルの確率的性質を活用し、経済学における協調ゲーム理論の技法を用いて定量的に決定される。このフレームワークは、AI開発者が高品質なトレーニングデータにアクセスすることで、モデルパフォーマンスを向上させるプラットフォームを可能にする。一方、著作権所有者は公正な補償を受け、生成モデルトレーニングのための関連データの継続的な提供を推進している。実験により,本フレームワークは,著作権所有者間の収益の公平かつ解釈可能な分配を確保するため,美術作品生成において最も関連性の高いデータソースの同定に成功していることが示された。 Generative artificial intelligence (AI) systems are trained on large data corpora to generate new pieces of text, images, videos, and other media. There is growing concern that such systems may infringe on the copyright interests of training data contributors. To address the copyright challenges of generative AI, we propose a framework that compensates copyright owners proportionally to their contributions to the creation of AI-generated content. The metric for contributions is quantitatively determined by leveraging the probabilistic nature of modern generative AI models and using techniques from cooperative game theory in economics. This framework enables a platform where AI developers benefit from access to high-quality training data, thus improving model performance. Meanwhile, copyright owners receive fair compensation, driving the continued provision of relevant data for generative model training. Experiments demonstrate that our framework successfully identifies the most relevant data sources used in artwork generation, ensuring a fair and interpretable distribution of revenues among copyright owners.	翻訳日:2024-04-25 15:54:19 公開日:2024-04-24
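The abstract above says that contributions are measured with techniques from cooperative game theory but does not name the exact rule. As a minimal sketch, assuming the Shapley value (a standard choice in that field, not confirmed by the source as the paper's method), revenue could be split among copyright owners as below. The function name `shapley_values`, the owner labels, and the toy utility table are all hypothetical.

```python
from itertools import permutations


def shapley_values(players, utility):
    """Exact Shapley values by averaging marginal gains over all orderings.

    `utility` maps a frozenset of players to a number. This brute-force
    version is only practical for a handful of players.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for ordering in orderings:
        coalition = frozenset()
        for player in ordering:
            extended = coalition | {player}
            # Marginal contribution of `player` when joining `coalition`.
            totals[player] += utility(extended) - utility(coalition)
            coalition = extended
    return {p: total / len(orderings) for p, total in totals.items()}


# Hypothetical utilities: how well a model explains a generated artwork
# when trained on each subset of two owners' data (made-up numbers).
toy_utility = {
    frozenset(): 0.0,
    frozenset({"A"}): 6.0,
    frozenset({"B"}): 2.0,
    frozenset({"A", "B"}): 10.0,
}

shares = shapley_values(["A", "B"], toy_utility.__getitem__)
total = sum(shares.values())
for owner, value in shares.items():
    print(owner, value, f"{value / total:.0%} of revenue")
```

With this toy table, owner A's share is larger because A's data adds more value in every ordering; a real system would replace the lookup table with a utility derived from the generative model, as the abstract describes.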
# 反復的マルチモーダル融合によるコミックのゼロショットキャラクター識別と話者予測 Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion ( http://arxiv.org/abs/2404.13993v2 ) ライセンス: Link先を確認	Yingxuan Li, Ryota Hinami, Kiyoharu Aizawa, Yusuke Matsui,	(参考訳) キャラクターの認識とセリフの話者の予測は、音声生成や翻訳といった漫画処理作業に不可欠である。しかし、キャラクターは漫画のタイトルによって異なるため、漫画のタイトルごとに特定のアノテーションを必要とするキャラクター分類器の訓練のような教師あり学習アプローチは実現不可能である。これを動機として、機械がキャラクターを識別し、注釈のない漫画画像のみに基づいて話者名を予測できる新しいゼロショット方式を提案する。現実の応用において重要であるにもかかわらず、これらのタスクはストーリー理解とマルチモーダル統合の課題のために、ほとんど探索されていないままである。近年の大規模言語モデル (LLM) はテキスト理解と推論に優れた能力を示しているが, マルチモーダルコンテンツ解析への応用は依然として未解決の課題である。そこで本研究では,キャラクター識別と話者予測の両方にマルチモーダル情報を用いる初めての反復型マルチモーダルフレームワークを提案する。実験により提案手法の有効性を実証し,これらの課題に対するロバストなベースラインを確立する。さらに,本手法ではトレーニングデータやアノテーションは必要としないため,どんなコミックシリーズでもそのまま使用することができる。 Recognizing characters and predicting speakers of dialogue are critical for comic processing tasks, such as voice generation or translation. However, because characters vary by comic title, supervised learning approaches like training character classifiers which require specific annotations for each comic title are infeasible. This motivates us to propose a novel zero-shot approach, allowing machines to identify characters and predict speaker names based solely on unannotated comic images. In spite of their importance in real-world applications, these tasks have largely remained unexplored due to challenges in story comprehension and multimodal integration. Recent large language models (LLMs) have shown great capability for text understanding and reasoning, while their application to multimodal content analysis is still an open problem. To address this problem, we propose an iterative multimodal framework, the first to employ multimodal information for both character identification and speaker prediction tasks. Our experiments demonstrate the effectiveness of the proposed framework, establishing a robust baseline for these tasks. 
Furthermore, since our method requires no training data or annotations, it can be used as-is on any comic series.	翻訳日:2024-04-25 15:54:19 公開日:2024-04-24
# 既存のデータセットの検索と変換によるより良い合成データ Better Synthetic Data by Retrieving and Transforming Existing Datasets ( http://arxiv.org/abs/2404.14361v2 ) ライセンス: Link先を確認	Saumya Gandhi, Ritu Gala, Vijay Viswanathan, Tongshuang Wu, Graham Neubig,	(参考訳) 近年の大規模言語モデルの発展にもかかわらず、信頼性が高くデプロイ可能なNLPモデルの構築には、多くの高品質なトレーニングデータが必要である。しかし、多くのユースケースでタスク固有のデータは利用できず、手作業でタスク固有のデータをキュレートするのは労働集約的です。近年の研究では、大規模言語モデルを用いたプロンプト駆動合成データ生成について研究されているが、これらのデータセットは複雑さと多様性に欠ける傾向がある。これらの制限に対処するため、既存の公開データセットをよりよく活用して自動データセット生成を改善するために、DataTuneという手法を導入しました。 DataTuneはデータセット変換を実行することで、公開されているデータセットを、ターゲットタスクの特定の要件と直接整合したフォーマットに再利用することが可能になる。 BIG-Benchベンチマークによる多種多様な言語ベースのタスクでは、DataTuneによる微調整言語モデルは、ベースラインを49%改善し、合成または検索されたトレーニングデータを使用する既存のメソッドを34%改善する。データセット変換は、多くのタスクにおいて生成されたデータの多様性と難易度を著しく向上させる。 DataTuneをオープンソースリポジトリに統合して,このメソッドをコミュニティに公開しています。 Despite recent advances in large language models, building dependable and deployable NLP models typically requires abundant, high-quality training data. However, task-specific data is not available for many use cases, and manually curating task-specific data is labor-intensive. Recent work has studied prompt-driven synthetic data generation using large language models, but these generated datasets tend to lack complexity and diversity. To address these limitations, we introduce a method, DataTune, to make better use of existing, publicly available datasets to improve automatic dataset generation. DataTune performs dataset transformation, enabling the repurposing of publicly available datasets into a format that is directly aligned with the specific requirements of target tasks. On a diverse set of language-based tasks from the BIG-Bench benchmark, we find that finetuning language models via DataTune improves over a few-shot prompting baseline by 49% and improves over existing methods that use synthetic or retrieved training data by 34%. We find that dataset transformation significantly increases the diversity and difficulty of generated data on many tasks. 
We integrate DataTune into an open-source repository to make this method accessible to the community: https://github.com/neulab/prompt2model.	翻訳日:2024-04-25 15:54:19 公開日:2024-04-24
# Smooth Q-Learningアルゴリズムの統一ODE解析 Unified ODE Analysis of Smooth Q-Learning Algorithms ( http://arxiv.org/abs/2404.14442v2 ) ライセンス: Link先を確認	Donghwan Lee,	(参考訳) Q-ラーニングの収束は、過去数十年にわたる広範な研究の焦点となっている。近年,Q-ラーニングのための漸近収束解析をスイッチングシステムフレームワークを用いて導入している。このアプローチは、連続時間スイッチングシステムとしてモデル化された非同期Q-ラーニングの収束を証明するために、いわゆる常微分方程式(ODE)アプローチを適用する。しかし、安定性を証明するためには、準単調性のような制約条件を基礎となるスイッチングシステムに満たさなければならないため、解析方法をスムーズなQ-ラーニング変種など他の強化学習アルゴリズムに容易に一般化することは困難である。本稿では、スイッチングシステムアプローチを改善し、Q-ラーニングとそのスムーズな変形を解析できる、より汎用的で統一的な収束解析を提案する。提案手法は,Lyapunov関数として機能する$p$-normに基づく同期Q-ラーニングの収束に関する過去の研究に動機付けられている。しかし、提案した分析は、より一般的なODEモデルに対処し、非同期Q-ラーニングと、より単純なフレームワークでそのスムーズなバージョンの両方をカバーできる。 Convergence of Q-learning has been the focus of extensive research over the past several decades. Recently, an asymptotic convergence analysis for Q-learning was introduced using a switching system framework. This approach applies the so-called ordinary differential equation (ODE) approach to prove the convergence of the asynchronous Q-learning modeled as a continuous-time switching system, where notions from switching system theory are used to prove its asymptotic stability without using explicit Lyapunov arguments. However, to prove stability, restrictive conditions, such as quasi-monotonicity, must be satisfied for the underlying switching systems, which makes it hard to easily generalize the analysis method to other reinforcement learning algorithms, such as the smooth Q-learning variants. In this paper, we present a more general and unified convergence analysis that improves upon the switching system approach and can analyze Q-learning and its smooth variants. The proposed analysis is motivated by previous work on the convergence of synchronous Q-learning based on $p$-norm serving as a Lyapunov function. However, the proposed analysis addresses more general ODE models that can cover both asynchronous Q-learning and its smooth versions with simpler frameworks.	翻訳日:2024-04-25 15:54:19 公開日:2024-04-24
# より小さく、より高速なデコーダのみのトランスフォーマーを目指して--アーキテクチャ的変異とその意味 Towards smaller, faster decoder-only transformers: Architectural variants and their implications ( http://arxiv.org/abs/2404.14462v2 ) ライセンス: Link先を確認	Sathya Krishnan Suresh, Shunmugapriya P,	(参考訳) 大規模言語モデル(LLMs)の研究は、最近指数関数的な成長をみせており、主にトランスフォーマーベースのアーキテクチャに焦点をあてており、[1]によって導入され、[2]におけるデコーダのみのバリエーションによってさらに進歩している。現代の研究は、アーキテクチャの複雑さとトレーニングデータの量の両方を増大させることで、モデル機能を改善することを目的としている。しかし、性能を維持しながらモデルのサイズを小さくする方法を研究する研究は限られている。本稿では,デコーダのみのトランスアーキテクチャであるParallelGPT(p-gpt),LinearlyCompressedGPT(lc-gpt),ConvCompressedGPT(cc-gpt)の3つの変更点を紹介する。これらの変種は、モデルのサイズを減らし、トレーニング時間を短縮することで、コード生成タスクにおける従来のアーキテクチャと同等のパフォーマンスを実現する。私たちは、この領域における将来の研究開発をサポートするために、モデルの重みとコードベースをオープンソースにしています。 Research on Large Language Models (LLMs) has recently seen exponential growth, largely focused on transformer-based architectures, as introduced by [1] and further advanced by the decoder-only variations in [2]. Contemporary studies typically aim to improve model capabilities by increasing both the architecture's complexity and the volume of training data. However, research exploring how to reduce model sizes while maintaining performance is limited. This study introduces three modifications to the decoder-only transformer architecture: ParallelGPT (p-gpt), LinearlyCompressedGPT (lc-gpt), and ConvCompressedGPT (cc-gpt). These variants achieve comparable performance to conventional architectures in code generation tasks while benefiting from reduced model sizes and faster training times. We open-source the model weights and codebase to support future research and development in this domain.	翻訳日:2024-04-25 15:54:19 公開日:2024-04-24
# FlashSpeech:効率的なゼロショット音声合成 FlashSpeech: Efficient Zero-Shot Speech Synthesis ( http://arxiv.org/abs/2404.14700v2 ) ライセンス: Link先を確認	Zhen Ye, Zeqian Ju, Haohe Liu, Xu Tan, Jianyi Chen, Yiwen Lu, Peiwen Sun, Jiahao Pan, Weizhen Bian, Shulin He, Qifeng Liu, Yike Guo, Wei Xue,	(参考訳) 大規模ゼロショット音声合成は、近年、言語モデルや拡散モデルによって著しく進歩している。しかし、両手法の生成プロセスは遅く、計算集約的である。従来の作業に匹敵する品質を、より少ない計算予算で実現する効率的な音声合成は,依然として大きな課題である。本稿では,従来手法の約5%の推論時間で動作する大規模ゼロショット音声合成システムであるFlashSpeechを提案する。 FlashSpeechは潜在一貫性モデルに基づいて構築されており、教師としてトレーニング済みの拡散モデルを必要としない、スクラッチからトレーニング可能な、新しい敵対的一貫性トレーニングアプローチを採用している。さらに、新しい韻律生成モジュールは、韻律の多様性を高め、音声のリズムをより自然にする。 FlashSpeechの生成プロセスは、ゼロショット音声生成のための音声プロンプトに対する高い類似性と高い音質を維持しつつ、1つか2つのサンプリングステップで効率よく実現できる。実験の結果,FlashSpeechの優れた性能が示された。特に、FlashSpeechは、他のゼロショット音声合成システムよりも約20倍高速で、音声品質と類似性の点で同等の性能を維持している。さらに、FlashSpeechは、音声変換、音声編集、多様な音声サンプリングといったタスクを効率的に実行することで、その汎用性を示す。オーディオサンプルはhttps://flashspeech.github.io/で確認できる。 Large-scale zero-shot speech synthesis has recently been significantly advanced by language models and diffusion models. However, the generation process of both methods is slow and computationally intensive. Efficient speech synthesis using a lower computing budget to achieve quality on par with previous work remains a significant challenge. In this paper, we present FlashSpeech, a large-scale zero-shot speech synthesis system with approximately 5% of the inference time compared with previous work. FlashSpeech is built on the latent consistency model and applies a novel adversarial consistency training approach that can train from scratch without the need for a pre-trained diffusion model as the teacher. Furthermore, a new prosody generator module enhances the diversity of prosody, making the rhythm of the speech sound more natural. The generation processes of FlashSpeech can be achieved efficiently with one or two sampling steps while maintaining high audio quality and high similarity to the audio prompt for zero-shot speech generation.
Our experimental results demonstrate the superior performance of FlashSpeech. Notably, FlashSpeech can be about 20 times faster than other zero-shot speech synthesis systems while maintaining comparable performance in terms of voice quality and similarity. Furthermore, FlashSpeech demonstrates its versatility by efficiently performing tasks like voice conversion, speech editing, and diverse speech sampling. Audio samples can be found in https://flashspeech.github.io/.	翻訳日:2024-04-25 15:54:19 公開日:2024-04-24
# 話しすぎ - トークン制限下での大規模言語モデルへのポイズニング Talk Too Much: Poisoning Large Language Models under Token Limit ( http://arxiv.org/abs/2404.14795v2 ) ライセンス: Link先を確認	Jiaming He, Wenbo Jiang, Guanyu Hou, Wenshu Fan, Rui Zhang, Hongwei Li,	(参考訳) 大規模言語モデル(LLM)に対するメインストリームの中毒攻撃は、通常、入力インスタンスに固定されたトリガと、トリガクエリに対する特定のレスポンスを設定する。しかし、固定的なトリガー設定(例:異常な単語)は、人間に容易に検出される可能性があり、現実のシナリオにおける有効性と実用性が制限される。トリガのステルス性を高めるため,コスト削減のためにユーザが一般的に採用する戦略であり、生成・出力条件でもあるトークン制限によって引き起こされるLLMに対する中毒攻撃を提案する。汚染されたモデルは、トークン制限のない出力では正常に動作するが、トークン制限のある出力では有害となる。この目的を達成するために、効率的な攻撃フレームワークであるBrieFoolを紹介します。効率的な命令サンプリングと中毒データ生成により, 生成制限の特性を活用し, 目標条件下でのLLMの挙動に影響を与える。実験の結果,BrieFoolは安全領域や知識領域にまたがって有効であることがわかった。例えば、GPT-3.5-turboに対して生成した中毒例がわずか20件でも、BrieFoolは100%のアタック成功率(ASR)と9.28/10の平均有害性スコア(HS)をトークン制限条件下で達成し、良性時の性能を維持している。 Mainstream poisoning attacks on large language models (LLMs) typically set a fixed trigger in the input instance and specific responses for triggered queries. However, a fixed trigger setting (e.g., unusual words) may be easily detected by humans, limiting its effectiveness and practicality in real-world scenarios. To enhance the stealthiness of the trigger, we present a poisoning attack against LLMs that is triggered by a generation/output condition, token limitation, which is a strategy commonly adopted by users to reduce costs. The poisoned model performs normally for output without token limitation, but becomes harmful for output with limited tokens. To achieve this objective, we introduce BrieFool, an efficient attack framework. It leverages the characteristics of generation limitation by efficient instruction sampling and poisoning data generation, thereby influencing the behavior of LLMs under target conditions. Our experiments demonstrate that BrieFool is effective across safety domains and knowledge domains.
For instance, with only 20 generated poisoning examples against GPT-3.5-turbo, BrieFool achieves a 100% Attack Success Rate (ASR) and a 9.28/10 average Harmfulness Score (HS) under token limitation conditions while maintaining the benign performance.	翻訳日:2024-04-25 15:54:19 公開日:2024-04-24
# ニューラルネットワークに基づくアンサンブルを用いた電力系統不均衡の確率予測 Probabilistic forecasting of power system imbalance using neural network-based ensembles ( http://arxiv.org/abs/2404.14836v2 ) ライセンス: Link先を確認	Jonas Van Gompel, Bert Claessens, Chris Develder,	(参考訳) 発電と消費のバランスを維持することは、主に再生可能エネルギー、電気自動車、ヒートポンプのシェアが増加し、産業プロセスの電化によって、ますます困難でコストがかかる。正確な不均衡予測と確実な不確実性推定は、送信システムオペレーター(TSO)が適切な予約ボリュームをディスパッチし、バランスコストを低減させる。さらに、市場関係者はこれらの確率的予測を使用して、資産の柔軟性を利用してグリッドのバランスを保ち、既知のリスクを伴う収益を生み出す戦略を設計することができる。その重要性にもかかわらず、システム不均衡(SI)予測に関する文献は限られている。さらに、既存の手法は、TSOと市場関係者の双方にとって正確な予測が不可欠である、高度不均衡な状況に重点を置いていない。そこで我々は,変数選択ネットワーク(VSN)の適応であるC-VSNのアンサンブルを提案する。毎分、我々のモデルは現在の2四半期のバランスと今後の2四半期のバランスを予測し、これらの予測の不確実さを推定する。ベルギーでは、高い不均衡度が$\|$SI$\| > 500\,$MW(ベルギーでは1.3%)と定義される。高い不均衡大局面において、我々のモデルは、確率的予測を評価するCRPS(Continuous Rank probability score)において、23.4%の性能向上と、CRPS全体の6.5%の改善を実現している。同様の改善はルート平均二乗誤差の点で達成される。さらに、モデルに制限された履歴を持つ新しい入力を効果的に組み込むための微調整手法を開発した。この研究は、Elia(ベルギーのTSO)と共同で実施され、彼らの不均衡予測をさらに改善し、我々の研究の妥当性を実証した。 Keeping the balance between electricity generation and consumption is becoming increasingly challenging and costly, mainly due to the rising share of renewables, electric vehicles and heat pumps and electrification of industrial processes. Accurate imbalance forecasts, along with reliable uncertainty estimations, enable transmission system operators (TSOs) to dispatch appropriate reserve volumes, reducing balancing costs. Further, market parties can use these probabilistic forecasts to design strategies that exploit asset flexibility to help balance the grid, generating revenue with known risks. Despite its importance, literature regarding system imbalance (SI) forecasting is limited. Further, existing methods do not focus on situations with high imbalance magnitude, which are crucial to forecast accurately for both TSOs and market parties. Hence, we propose an ensemble of C-VSNs, which are our adaptation of variable selection networks (VSNs). 
Each minute, our model predicts the imbalance of the current and upcoming two quarter-hours, along with uncertainty estimations on these forecasts. We evaluate our approach by forecasting the imbalance of Belgium, where high imbalance magnitude is defined as $\|$SI$\| > 500\,$MW (occurs 1.3% of the time in Belgium). For high imbalance magnitude situations, our model outperforms the state-of-the-art by 23.4% (in terms of continuous ranked probability score (CRPS), which evaluates probabilistic forecasts), while also attaining a 6.5% improvement in overall CRPS. Similar improvements are achieved in terms of root-mean-squared error. Additionally, we developed a fine-tuning methodology to effectively include new inputs with limited history in our model. This work was performed in collaboration with Elia (the Belgian TSO) to further improve their imbalance forecasts, demonstrating the relevance of our work.	翻訳日:2024-04-25 15:54:19 公開日:2024-04-24
# 強化学習によるクリフォード+T回路の単元合成 Unitary Synthesis of Clifford+T Circuits with Reinforcement Learning ( http://arxiv.org/abs/2404.14865v2 ) ライセンス: Link先を確認	Sebastian Rietsch, Abhishek Y. Dubey, Christian Ufrecht, Maniraman Periyasamy, Axel Plinge, Christopher Mutschler, Daniel D. Scherer,	(参考訳) 本稿では,量子回路にユニタリを合成する深層強化学習手法を提案する。ユニタリ合成は、回路深さ、総ゲート数、特定のゲート数、またはこれらの組み合わせを最小化しながら、与えられたユニタリを表す量子回路を特定することを目的としている。過去の研究は主に連続ゲート集合に焦点を当ててきたが、パラメータフリーなクリフォード+Tゲート集合からユニタリを合成することは依然として困難である。このタスクの時間的複雑さは、一般的なユニタリーのキュービット数では必然的に指数関数的であり続けるが、単純な問題インスタンスのランタイムを減らすことは、依然として大きな課題である。本研究では,木探索法であるGumbel AlphaZeroを用いて,正確に合成可能なClifford+Tユニタリの部分集合の問題を解く。提案手法では,最大60ゲートのランダム化量子回路の集合から最大5キュービットのユニタリを合成できる。さらに、我々の推論時間は、平均して1つのGPU上で30秒程度であり、より高い量子ビット数に対して、最先端のアルゴリズムであるQuantumCircuitOptとMIN-T-SYNTHを上回っている。我々の研究は、今後数年で開発される合成アルゴリズムの競争ベースラインを提供する。 This paper presents a deep reinforcement learning approach for synthesizing unitaries into quantum circuits. Unitary synthesis aims to identify a quantum circuit that represents a given unitary while minimizing circuit depth, total gate count, a specific gate count, or a combination of these factors. While past research has focused predominantly on continuous gate sets, synthesizing unitaries from the parameter-free Clifford+T gate set remains a challenge. Although the time complexity of this task will inevitably remain exponential in the number of qubits for general unitaries, reducing the runtime for simple problem instances still poses a significant challenge. In this study, we apply the tree-search method Gumbel AlphaZero to solve the problem for a subset of exactly synthesizable Clifford+T unitaries. Our approach can synthesize unitaries for up to five qubits generated from the set of randomized quantum circuits with up to 60 gates. Furthermore, our inference times are around 30 seconds on a single GPU on average, surpassing state-of-the-art algorithms QuantumCircuitOpt and MIN-T-SYNTH for higher qubit numbers. 
Our work provides a competitive baseline for synthesis algorithms to be developed in the upcoming years.	翻訳日:2024-04-25 15:54:19 公開日:2024-04-24
# ニューロイメージング前処理戦略がその後の統計解析に与える影響を定量化するための感度解析 A sensitivity analysis to quantify the impact of neuroimaging preprocessing strategies on subsequent statistical analyses ( http://arxiv.org/abs/2404.14882v2 ) ライセンス: Link先を確認	Brice Ozenne, Martin Norgaard, Cyril Pernet, Melanie Ganz,	(参考訳) 新しいイメージング技術は脳の構造と機能を研究するのに成功しているが、計測された生物学的信号は、スキャンされた個人のegヘッドの動き、空間分解能の制限、または各イメージング技術に特有の他の問題によって生じる複数のノイズ源によって汚染されることが多い。したがって、データ前処理(例えばデノイング)が重要である。前処理パイプラインは長年にわたって複雑化してきたが、柔軟性も向上しており、この柔軟性は、与えられた研究の最終結果と結論に重大な影響を与える可能性がある。この大きなパラメータ空間は、しばしば多値解析(multiverse analysis)と呼ばれる。ここでは、複数のパイプライン結果を集約する統計解析のための概念的および実践的なツールと、"すべてのパイプラインに影響を及ぼさない"や"影響のない少なくとも1つのパイプライン"といったパイプラインにまたがる仮説に対する新たな感度分析テストを提供する。提案するフレームワークは汎用的で,任意の多面的シナリオに適用可能であるが,ポジトロン放射トモグラフィーデータに基づく利用例を示す。 Even though novel imaging techniques have been successful in studying brain structure and function, the measured biological signals are often contaminated by multiple sources of noise, arising due to e.g. head movements of the individual being scanned, limited spatial/temporal resolution, or other issues specific to each imaging technology. Data preprocessing (e.g. denoising) is therefore critical. Preprocessing pipelines have become increasingly complex over the years, but also more flexible, and this flexibility can have a significant impact on the final results and conclusions of a given study. This large parameter space is often referred to as multiverse analyses. Here, we provide conceptual and practical tools for statistical analyses that can aggregate multiple pipeline results along with a new sensitivity analysis testing for hypotheses across pipelines such as "no effect across all pipelines" or "at least one pipeline with no effect". The proposed framework is generic and can be applied to any multiverse scenario, but we illustrate its use based on positron emission tomography data.	翻訳日:2024-04-25 15:54:19 公開日:2024-04-24
# DAWN:クロスタスクインタラクションによるドメイン適応型弱教師付き核セグメンテーション DAWN: Domain-Adaptive Weakly Supervised Nuclei Segmentation via Cross-Task Interactions ( http://arxiv.org/abs/2404.14956v2 ) ライセンス: Link先を確認	Ye Zhang, Yifeng Wang, Zijie Fang, Hao Bian, Linghan Cai, Ziyue Wang, Yongbing Zhang,	(参考訳) モデルトレーニングにおけるコストの高いピクセルレベルのアノテーションへの依存を減らすために,弱教師付きセグメンテーション手法が注目されている。しかし、現在の弱教師付き核セグメンテーションアプローチは、通常、2段階の擬似ラベル生成とネットワークトレーニングプロセスに従う。核セグメンテーションの性能は生成した擬似ラベルの品質に大きく依存しているため、その有効性は制限される。本稿では,擬似ラベル生成の課題を克服するために,クロスタスクインタラクション戦略を用いたドメイン適応型弱教師付き核セグメンテーションフレームワークを提案する。具体的には、弱い注釈付きデータを用いて補助的な検出タスクを訓練し、セグメンテーションネットワークのドメイン適応を支援する。ドメイン適応の効率を高めるために、ソースドメインからの事前知識を統合する一貫した機能制約モジュールを設計する。さらに,ドメイン転送能力を向上させるために,擬似ラベル最適化と対話型トレーニング手法を開発した。提案手法の有効性を検証するため,6つのデータセットに対して広範囲な比較・アブレーション実験を行った。その結果、既存の弱教師付きアプローチよりも、我々のアプローチの方が優れていることが示された。注目すべきは,本手法が完全教師付き手法と同等あるいはそれ以上の性能を実現することである。私たちのコードは https://github.com/zhangye-zoe/DAWN でリリースされます。 Weakly supervised segmentation methods have gained significant attention due to their ability to reduce the reliance on costly pixel-level annotations during model training. However, the current weakly supervised nuclei segmentation approaches typically follow a two-stage pseudo-label generation and network training process. The performance of the nuclei segmentation heavily relies on the quality of the generated pseudo-labels, thereby limiting its effectiveness. This paper introduces a novel domain-adaptive weakly supervised nuclei segmentation framework using cross-task interaction strategies to overcome the challenge of pseudo-label generation. Specifically, we utilize weakly annotated data to train an auxiliary detection task, which assists the domain adaptation of the segmentation network. To enhance the efficiency of domain adaptation, we design a consistent feature constraint module integrating prior knowledge from the source domain. Furthermore, we develop pseudo-label optimization and interactive training methods to improve the domain transfer capability.
To validate the effectiveness of our proposed method, we conduct extensive comparative and ablation experiments on six datasets. The results demonstrate the superiority of our approach over existing weakly supervised approaches. Remarkably, our method achieves comparable or even better performance than fully supervised methods. Our code will be released in https://github.com/zhangye-zoe/DAWN.	翻訳日:2024-04-25 15:54:19 公開日:2024-04-24
# BotDGT:動的グラフ変換器を用いた動的ソーシャルボット検出 BotDGT: Dynamicity-aware Social Bot Detection with Dynamic Graph Transformers ( http://arxiv.org/abs/2404.15070v2 ) ライセンス: Link先を確認	Buyun He, Yingguang Yang, Qi Wu, Hao Liu, Renyu Yang, Hao Peng, Xiang Wang, Yong Liao, Pengyuan Zhou,	(参考訳) ソーシャルボットの検出は、誤情報の拡散とオンラインインタラクションの真正性を維持することを目的とした、重要かつ複雑なタスクへと進化してきた。初期のグラフベースのアプローチは、ソーシャルネットワークのトポロジ的構造を利用して、顕著な結果をもたらしたが、ソーシャルネットワークの本質的なダイナミクスを見落としていた。ダイナミック性モデリングが欠如しているため、特に高度なソーシャルボットが他のユーザと対話し、カモフラージュのアイデンティティとエスケープ検出を行う場合、このようなアプローチは回避に脆弱である。これらの課題に対処するために,トポロジ的構造だけでなく,ネットワークの動的性質を効果的に取り入れた新しいフレームワークであるBotDGTを提案する。具体的には,ソーシャルネットワークを動的グラフとして特徴付ける。各歴史的スナップショットからトポロジ情報を取得するために構造モジュールが使用される。さらに、歴史的文脈の統合と、社会的ボットや正当なユーザによって表される進化する行動パターンをモデル化するために、時間モジュールを提案する。実験結果は,ソーシャルネットワークのダイナミックな性質を,精度,リコール,F1スコアの観点から無視する主要な手法に対するBotDGTの優位性を実証した。 Detecting social bots has evolved into a pivotal yet intricate task, aimed at combating the dissemination of misinformation and preserving the authenticity of online interactions. While earlier graph-based approaches, which leverage topological structure of social networks, yielded notable outcomes, they overlooked the inherent dynamicity of social networks -- In reality, they largely depicted the social network as a static graph and solely relied on its most recent state. Due to the absence of dynamicity modeling, such approaches are vulnerable to evasion, particularly when advanced social bots interact with other users to camouflage identities and escape detection. To tackle these challenges, we propose BotDGT, a novel framework that not only considers the topological structure, but also effectively incorporates dynamic nature of social network. Specifically, we characterize a social network as a dynamic graph. A structural module is employed to acquire topological information from each historical snapshot. 
Additionally, a temporal module is proposed to integrate historical context and model the evolving behavior patterns exhibited by social bots and legitimate users. Experimental results demonstrate the superiority of BotDGT against the leading methods that neglected the dynamic nature of social networks in terms of accuracy, recall, and F1-score.	翻訳日:2024-04-25 15:54:19 公開日:2024-04-24
# キャビティ・マグノン・オプトメカニクスにおける可変絡み合い Tunable Entanglement in Cavity-Magnon Optomechanics ( http://arxiv.org/abs/2404.15111v2 ) ライセンス: Link先を確認	Ming-Yue Liu, Xian-Xian Huang, Wei Xiong,	(参考訳) キャビティオプトメカニクスは、光子とフォノンの間に本質的に非線形な相互作用を与え、マクロな量子絡み合いの生成において大きな可能性を示している。本稿では,キャビティ-マグノンオプトメカニクスにおける多様な二部および三部の絡み合いを実現することを提案する。標準的なキャビティオプトメカニクスにマグノンを導入することにより、調節可能なオプトメカニカルエンタングルメントやマグノン-マグノンエンタングルメントだけでなく、マグノン-光子-フォノンエンタングルメント、マグノン-マグノン-光子および-フォノンエンタングルメントを含む柔軟な三部エンタングルメントを生成することができる。さらに、最適な二部および三部の絡み合いは、パラメータのチューニングによって達成できる。さらに,マグノン-光子結合の工学的手法によりすべての絡み合いを向上できることが示され,生存温度内の浴槽温度に対して堅牢であることが証明された。さらに, 減衰率の大きい悪いマグノンによって, オプトメカニカルな絡み合いを保護したり, 復元したりできるのに対し, 他の絡み合いは著しく減少することがわかった。その結果,本提案は、ハイブリッドキャビティ-マグノンオプトメカニクスにおける調節可能なマクロな量子効果の探索と制御のための新しい手法を提供することが示唆された。 Cavity optomechanics, providing an inherently nonlinear interaction between photons and phonons, has shown enormous potential in generating macroscopic quantum entanglement. Here we propose to realize diverse bipartite and tripartite entanglement in cavity-magnon optomechanics. By introducing magnons to standard cavity optomechanics, not only can tunable optomechanical entanglement and magnon-magnon entanglement be achieved, but flexible tripartite entanglement, including magnon-photon-phonon entanglement and magnon-magnon-photon and -phonon entanglement, can also be generated. Moreover, optimal bipartite and tripartite entanglement can be achieved by tuning parameters. We further show that all entanglement can be enhanced by engineering the magnon-photon coupling, and is robust against the bath temperature within the survival temperature. Besides, we find that the optomechanical entanglement can be protected or restored by bad magnons with a large decay rate, while other entanglement is severely reduced. The results indicate that our proposal provides a novel avenue to explore and control tunable macroscopic quantum effects in hybrid cavity-magnon optomechanics.	翻訳日:2024-04-25 15:54:19 公開日:2024-04-24
# CORE-BEHRT: 慎重に最適化され、厳格に評価されるBEHRT CORE-BEHRT: A Carefully Optimized and Rigorously Evaluated BEHRT ( http://arxiv.org/abs/2404.15201v2 ) ライセンス: Link先を確認	Mikkel Odgaard, Kiril Vadimovic Klein, Sanne Møller Thysen, Espen Jimenez-Solem, Martin Sillesen, Mads Nielsen,	(参考訳) BERTベースのElectronic Health Records(EHR)モデルはBEHRTとMed-BERTのリリース以降、人気が高まっている。その後のモデルは主にこれらの基礎の上に構築されてきたが、これらの先駆的なモデルの基本設計選択は未調査のままである。この問題に対処するために、ケアリー・オプティマイズとリゴリズ・評価されたBEHRTであるCORE-BEHRTを紹介する。インクリメンタルな最適化を通じて、重要な設計選択のための改善の源泉を分離し、データ表現と個々の技術コンポーネントがパフォーマンスに与える影響について洞察する。一連の総合的な課題(死、痛み治療、一般感染)で評価した結果、データ表現の改善は、主に薬品やタイムスタンプを含む場合、平均下流性能を0.785AUROCから0.797AUROCに向上させることができることがわかった。アーキテクチャとトレーニングプロトコルの改善により、平均ダウンストリーム性能は0.801 AUROCに向上した。次に,25種類の臨床予測課題に対して厳密な評価を行うことで,最適化の整合性を実証した。その結果,25タスク中17タスクが顕著に向上し,24タスクが改善した。本研究は,今後の研究の基盤となるとともに,BERTベースのEHRモデルの信頼性向上をめざすものである。 BERT-based models for Electronic Health Records (EHR) have surged in popularity following the release of BEHRT and Med-BERT. Subsequent models have largely built on these foundations despite the fundamental design choices of these pioneering models remaining underexplored. To address this issue, we introduce CORE-BEHRT, a Carefully Optimized and Rigorously Evaluated BEHRT. Through incremental optimization, we isolate the sources of improvement for key design choices, giving us insights into the effect of data representation and individual technical components on performance. Evaluating this across a set of generic tasks (death, pain treatment, and general infection), we showed that improving data representation can increase the average downstream performance from 0.785 to 0.797 AUROC, primarily when including medication and timestamps. Improving the architecture and training protocol on top of this increased average downstream performance to 0.801 AUROC. We then demonstrated the consistency of our optimization through a rigorous evaluation across 25 diverse clinical prediction tasks. 
We observed significant performance increases in 17 out of 25 tasks and improvements in 24 tasks, highlighting the generalizability of our findings. Our findings provide a strong foundation for future work and aim to increase the trustworthiness of BERT-based EHR models.	翻訳日:2024-04-25 15:54:19 公開日:2024-04-24
# 深層学習を用いた点拡散関数と拡張画像からのZernike係数の直接予測 Direct Zernike Coefficient Prediction from Point Spread Functions and Extended Images using Deep Learning ( http://arxiv.org/abs/2404.15231v2 ) ライセンス: Link先を確認	Yong En Kok, Alexander Bentley, Andrew Parkes, Amanda J. Wright, Michael G. Somekh, Michael Pound,	(参考訳) 光画像の品質は、システムとサンプル誘起収差によって著しく劣化する。既存の適応光学系は通常、収差を補正し、画像を改善するために反復探索アルゴリズムに依存している。本研究では,2枚から3枚の位相多様性をもつ光学画像からZernike係数を直接予測することにより,畳み込みニューラルネットワークによる光収差の特徴づけを実証する。我々は,最初の25のゼルニケ係数を用いて,-1から1ラジアンの範囲でランダムに生成された600,000個のシミュレーションポイントスプレッド関数(PSF)データセットを用いてネットワークを評価した。その結果,1の振幅で焦点面の上、下、および焦点面上で取得した3枚の位相多様性画像のみを用いて,シミュレーションされたPSFデータセット上で0.10ラジアンの低いRMSEが得られることがわかった。さらに、このアプローチは、シミュレートされた拡張2Dサンプルに対してもゼルニケモードを直接予測し、0.15ラジアンという同等のRMSEを維持する。このアプローチは,単一の予測ステップのみを用いて効果的であること,あるいは数回反復可能であることを実証する。このシンプルで簡単な手法は、3枚以下の位相多様性画像を用いて収差補正を迅速かつ正確に予測する方法を提供し、実世界のデータセットでの評価への道を開く。 Optical imaging quality can be severely degraded by system and sample induced aberrations. Existing adaptive optics systems typically rely on iterative search algorithms to correct for aberrations and improve images. This study demonstrates the application of convolutional neural networks to characterise the optical aberration by directly predicting the Zernike coefficients from two to three phase-diverse optical images. We evaluated our network on 600,000 simulated Point Spread Function (PSF) datasets randomly generated within the range of -1 to 1 radians using the first 25 Zernike coefficients. The results show that using only three phase-diverse images captured above, below and at the focal plane with an amplitude of 1 achieves a low RMSE of 0.10 radians on the simulated PSF dataset. Furthermore, this approach directly predicts Zernike modes for simulated extended 2D samples, while maintaining a comparable RMSE of 0.15 radians. We demonstrate that this approach is effective using only a single prediction step, or can be iterated a small number of times.
This simple and straightforward technique provides a rapid and accurate method for predicting the aberration correction using three or fewer phase-diverse images, paving the way for evaluation on real-world datasets.	翻訳日:2024-04-25 15:54:19 公開日:2024-04-24
# 検索ヘッドは長文脈における事実性を機構的に説明する Retrieval Head Mechanistically Explains Long-Context Factuality ( http://arxiv.org/abs/2404.15574v1 ) ライセンス: Link先を確認	Wenhao Wu, Yizhong Wang, Guangxuan Xiao, Hao Peng, Yao Fu,	(参考訳) 近年のロングコンテキスト言語モデルの発展にもかかわらず、トランスフォーマーベースのモデルが、ロングコンテキスト内の任意の場所から関連情報を検索する能力をどのように示すのかは、いまだ解明されていない。本稿ではこの問題に対処することを目的とする。広範囲のモデルを対象とした系統的な調査により、検索ヘッドと呼ぶ特別なタイプのアテンションヘッドが情報検索を主に担っていることが判明した。検索ヘッドの興味ある特性を以下に示す:(1) 普遍性: 調査した長文脈能力を持つすべてのモデルに一組の検索ヘッドがある; (2) スパース: アテンションヘッドのごく一部(5%未満)のみが検索ヘッドである。 (3)本質的:検索ヘッドは、短い文脈で事前訓練されたモデルにすでに存在する。コンテクスト長を継続事前学習で拡張する場合でも、情報検索を行うのは同じヘッドの集合である。 (4)動的に活性化: Llama-2 7Bを例にとると、12の検索ヘッドは、コンテキストが変更されても常に必要な情報に注意を向ける。検索ヘッドの残りの部分は、異なるコンテキストでアクティベートされる。 (5)因果: 検索ヘッドを完全に刈り込むと、関連する情報の検索に失敗して幻覚が生じる一方、ランダムな非検索ヘッドを刈り込んでもモデルの検索能力には影響しない。さらに、検索ヘッドは、モデルが頻繁に質問や以前生成されたコンテキストを参照する必要がある思考の連鎖(CoT)推論に強く影響を及ぼすことを示す。逆に、本質的な知識を用いてモデルが直接回答を生成するタスクは、検索ヘッドをマスキングしても影響を受けにくい。これらの観察は、モデル内部のどの部分が入力トークンから情報を探すのかを総合的に説明する。我々は、これらの知見が幻覚の低減、推論の改善、KVキャッシュの圧縮に関する今後の研究を促進すると信じている。 Despite the recent progress in long-context language models, it remains elusive how transformer-based models exhibit the capability to retrieve relevant information from arbitrary locations within the long context. This paper aims to address this question. Our systematic investigation across a wide spectrum of models reveals that a special type of attention head, which we dub retrieval heads, is largely responsible for retrieving information. We identify intriguing properties of retrieval heads: (1) universal: all the explored models with long-context capability have a set of retrieval heads; (2) sparse: only a small portion (less than 5%) of the attention heads are retrieval heads; (3) intrinsic: retrieval heads already exist in models pretrained with short context. When extending the context length by continual pretraining, it is still the same set of heads that perform information retrieval; (4) dynamically activated: taking Llama-2 7B as an example, 12 retrieval heads always attend to the required information no matter how the context is changed.
The rest of the retrieval heads are activated in different contexts. (5) causal: completely pruning retrieval heads leads to failure in retrieving relevant information and results in hallucination, while pruning random non-retrieval heads does not affect the model's retrieval ability. We further show that retrieval heads strongly influence chain-of-thought (CoT) reasoning, where the model needs to frequently refer back to the question and previously generated context. Conversely, tasks where the model directly generates the answer using its intrinsic knowledge are less impacted by masking out retrieval heads. These observations collectively explain which internal part of the model seeks information from the input tokens. We believe our insights will foster future research on reducing hallucination, improving reasoning, and compressing the KV cache.	翻訳日:2024-04-25 15:03:25 公開日:2024-04-24
# 医薬品製造調査の実施を支援する基礎的大規模言語モデル Can Foundational Large Language Models Assist with Conducting Pharmaceuticals Manufacturing Investigations? ( http://arxiv.org/abs/2404.15578v1 ) ライセンス: Link先を確認	Hossein Salami, Brandye Smith-Goettler, Vijay Yadav,	(参考訳) 近年,GPT(Generative Pretrained Transformer)やLLaMA(Large Language Model Meta AI)といった汎用の大規模言語モデル(LLM)が注目されている。これらのモデルが様々な自然言語処理タスクにおいて顕著に機能することを示す強い証拠がある。しかし、ドメイン固有のユースケースにアプローチし、価値を駆動するためにそれらをどのように活用するかは、未解決の問題である。本研究は, 特定のユースケース, 医薬品製造調査に焦点をあて, 組織における製造事故や逸脱の歴史的記録を活用することは, 新規事例の解決・閉鎖, 新規製造キャンペーンの廃止に有用である, と提案する。異なる製品ラインから選択した製造逸脱の小さいが多様なデータセットを用いて、上記の目標に関連するタスクの実行において、3つの汎用LCM(GPT-3.5, GPT-4, Claude-2)のパワーを評価し、定量化する。特に,(1) ケースの根本原因などの特定情報を非構造化データから抽出するプロセスを自動化するLLMの能力,(2) 履歴データベース上で意味探索を行うことにより類似または関連するずれを識別する可能性について検討した。その結果,情報抽出作業における GPT-4 と Claude-2 の精度が向上していることが示唆された。さらに, 差分記述のベクトル埋め込みに関する意味探索は, 類似した種類の欠陥があるような類似した記録を高い精度で識別することができることを示す。我々は、類似したレコード識別の精度を高めるためのさらなる改善について論じる。 General purpose Large Language Models (LLM) such as the Generative Pretrained Transformer (GPT) and Large Language Model Meta AI (LLaMA) have attracted much attention in recent years. There is strong evidence that these models can perform remarkably well in various natural language processing tasks. However, how to leverage them to approach domain-specific use cases and drive value remains an open question. In this work, we focus on a specific use case, pharmaceutical manufacturing investigations, and propose that leveraging historical records of manufacturing incidents and deviations in an organization can be beneficial for addressing and closing new cases, or de-risking new manufacturing campaigns. Using a small but diverse dataset of real manufacturing deviations selected from different product lines, we evaluate and quantify the power of three general purpose LLMs (GPT-3.5, GPT-4, and Claude-2) in performing tasks related to the above goal. 
In particular, we examine (1) the ability of LLMs to automate the process of extracting specific information, such as the root cause of a case, from unstructured data, and (2) the possibility of identifying similar or related deviations by performing semantic search on the database of historical records. While our results point to the high accuracy of GPT-4 and Claude-2 in the information extraction task, we discuss cases of complex interplay between the apparent reasoning and hallucination behavior of LLMs as a risk factor. Furthermore, we show that semantic search on vector embeddings of deviation descriptions can be used to identify similar records, such as those with a similar type of defect, with a high level of accuracy. We discuss further improvements to enhance the accuracy of similar record identification.	翻訳日:2024-04-25 15:03:25 公開日:2024-04-24
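The semantic search step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that each deviation description has already been turned into a vector by some embedding model and that similarity is measured with cosine similarity; the abstract does not state which embedding model or similarity measure the authors used. The record IDs and the 3-dimensional vectors below are hypothetical toy values, not data from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, records, k=2):
    """Return the IDs of the k records whose embeddings are closest to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), rec_id)
              for rec_id, vec in records.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rec_id for _, rec_id in scored[:k]]


# Hypothetical embeddings of historical deviation descriptions (toy values).
HISTORICAL = {
    "DEV-001": [0.9, 0.1, 0.0],
    "DEV-002": [0.1, 0.9, 0.1],
    "DEV-003": [0.8, 0.2, 0.1],
}

if __name__ == "__main__":
    # Hypothetical embedding of a new deviation description.
    print(top_k_similar([1.0, 0.0, 0.0], HISTORICAL, k=2))
```

In a real setting the vectors would come from an embedding model and the linear scan would likely be replaced by a vector index; the ranking logic stays the same.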
# エンタングルメント測定を用いたフォトニック変量量子固有解法 Photonic variational quantum eigensolver using entanglement measurements ( http://arxiv.org/abs/2404.15579v1 ) ライセンス: Link先を確認	Jinil Lee, Wooyeong Song, Donghwa Lee, Yosep Kim, Seung-Woo Lee, Hyang-Tag Lim, Hojoong Jung, Sang-Wook Han, Yong-Su Kim,	(参考訳) 量子システムと古典的な計算能力を組み合わせた変分量子固有解法(VQE)が、短期量子コンピューティング応用の候補として提案されている。しかしながら、VQEを実装するための測定値の数などの実験資源は、ハミルトン問題のサイズが大きくなるにつれて急速に増加する。エンタングルメント測定を適用して測定装置の数を減らし、この問題に対処することが提案されているが、エンタングルメント測定自体が追加のリソース要求をもたらす可能性がある。ここでは、偏光と光子の経路自由度を利用したフォトニックVQEに絡み合いの測定を適用した。我々のフォトニックVQEでは、エンタングルメント測定は線形光学を用いて決定的に実装できるので、さらなる実験的要求なしにエンタングルメント測定を導入することができる。さらに、そのような設定は、特定のハミルトニアンの測定装置における誤差を軽減することができることを示す。 The variational quantum eigensolver (VQE), which combines quantum systems with classical computational power, has emerged as a promising candidate for near-term quantum computing applications. However, the experimental resources needed to implement VQE, such as the number of measurements, increase rapidly as the Hamiltonian problem size grows. Applying entanglement measurements to reduce the number of measurement setups has been proposed to address this issue; however, entanglement measurements themselves can introduce additional resource demands. Here, we apply entanglement measurements to the photonic VQE utilizing the polarization and path degrees of freedom of a single photon. In our photonic VQE, entanglement measurements can be deterministically implemented using linear optics, so it takes full advantage of introducing entanglement measurements without additional experimental demands. Moreover, we show that such a setup can mitigate errors in the measurement apparatus for a certain Hamiltonian.	翻訳日:2024-04-25 14:53:37 公開日:2024-04-24
# MiM: 3次元医用画像解析のためのマスク自己監督前トレーニングのマスク MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis ( http://arxiv.org/abs/2404.15580v1 ) ライセンス: Link先を確認	Jiaxin Zhuang, Linshan Wu, Qiong Wang, Varut Vardhanabhuti, Lin Luo, Hao Chen,	(参考訳) ビジョントランスフォーマー (ViT) は, 3次元医用画像解析のための自己監督学習 (SSL) において顕著な性能を示した。 Mask AutoEncoder (MAE) は、様々な医療ビジョンタスクにおいて、ViTの可能性をさらに解き放つことができる。しかし、3次元の医療画像の次元がはるかに大きい大きな空間的サイズのため、MAEの階層設計の欠如は下流タスクの性能を損なう可能性がある。本稿では、3次元医用画像のための新しい事前学習フレームワーク「Mask in Mask」(MiM)を提案する。音量からマスクされた入力に対して,複数レベルの粒度を導入し,さらに細粒度と粗粒度を同時に再現する。さらに、隣接するレベルボリュームにクロスレベルアライメント機構を適用して、解剖学的類似性を階層的に強制する。さらに,事前学習中に階層表現学習の効率化を図るために,ハイブリッドバックボーンを採用する。 MiMは、様々な身体部位を含むCT(Computerd Tomography)画像を用いて、利用可能な3Dボリューム画像の大規模な事前トレーニングを行った。 13の公開データセットに対する大規模な実験は、臓器/病変/腫瘍のセグメンテーションと疾患分類において、他のSSLメソッドよりもMiMの方が優れていることを示した。さらに、MiMを10k以上のボリュームを持つ大規模な事前学習データセットにスケールアップし、大規模な事前学習が下流タスクの性能をさらに向上させることを示す。この改善により、研究コミュニティは3D医療画像の医療基盤モデルに向けた事前トレーニングデータセットの規模にもっと注意を払うべきだと結論付けている。 The Vision Transformer (ViT) has demonstrated remarkable performance in Self-Supervised Learning (SSL) for 3D medical image analysis. Mask AutoEncoder (MAE) for feature pre-training can further unleash the potential of ViT on various medical vision tasks. However, due to large spatial sizes with much higher dimensions of 3D medical images, the lack of hierarchical design for MAE may hinder the performance of downstream tasks. In this paper, we propose a novel \textit{Mask in Mask (MiM)} pre-training framework for 3D medical images, which aims to advance MAE by learning discriminative representation from hierarchical visual tokens across varying scales. We introduce multiple levels of granularity for masked inputs from the volume, which are then reconstructed simultaneously ranging at both fine and coarse levels. Additionally, a cross-level alignment mechanism is applied to adjacent level volumes to enforce anatomical similarity hierarchically. 
Furthermore, we adopt a hybrid backbone to enhance hierarchical representation learning efficiently during pre-training. MiM was pre-trained on a large collection of available 3D volumetric images, i.e., Computed Tomography (CT) images containing various body parts. Extensive experiments on thirteen public datasets demonstrate the superiority of MiM over other SSL methods in organ/lesion/tumor segmentation and disease classification. We further scale up MiM to large pre-training datasets with more than 10k volumes, showing that large-scale pre-training can further enhance the performance of downstream tasks. This improvement also suggests that the research community should pay more attention to the scale of the pre-training dataset when working towards healthcare foundation models for 3D medical images.	翻訳日:2024-04-25 14:53:37 公開日:2024-04-24
# PKIのArmored Core: 物理的に不可避な機能によってCAの署名キーを削除する Armored Core of PKI: Remove Signing Keys for CA via Physically Unclonable Function ( http://arxiv.org/abs/2404.15582v1 ) ライセンス: Link先を確認	Xiaolin Zhang, Chenghao Chen, Kailun Qin, Chi Zhang, Shipei Qu, Tengfei Wang, Yuxuan Wang, Dawu Gu,	(参考訳) CAの署名キーの保護は、PKIにおける最も重要なセキュリティ上の懸念の1つである。しかし、これらのキーは、人間のエラーや、慎重に設計された様々な攻撃によって、今日でも露出することができる。 TEEやHSMのような従来の保護は、熟練した攻撃者によってバイパスされるため、このリスクを排除できない。このジレンマは、CAの署名キーを除去することを検討する動機となり、CAのためにPKI(Physically Unclonable Function, PUF)が提供する物理的に信頼されたバインディングを適用した、PKIセキュリティ拡張であるArmored Coreを提案する。 CAs in Armored CoreはPUFベースのX509v3 TLS証明書を発行し、署名アルゴリズムの代わりにPUFを使用してドメイン公開鍵の承認を生成する。 CT上に構築された新しい透過ロギングメカニズムは、CAのPUF呼び出し動作を記録し、PUFの使用状況の監視を保証する。我々はArmored Coreの主要な機能に関する公式な暗号証明を提供する。また、現実世界のPKIコードベースにも実装しています。その結果、Armored Coreを元のシステムに組み込むことでオーバーヘッドは発生せず、代わりに計算効率を4.9%向上し、証明書ストレージの20%を節約できることがわかった。 The protection of a CA's signing keys is one of the most crucial security concerns in PKI. However, these keys can still be exposed today through human error or various carefully designed attacks. Traditional protections like TEE and HSM fail to eliminate this risk since they can be bypassed by skilled attackers. This dilemma motivates us to consider removing the CA's signing keys and to propose Armored Core, a PKI security extension that applies the physically trusted binding provided by a Physically Unclonable Function (PUF) to the CA. CAs in Armored Core issue PUF-based X509v3 TLS certificates, where they use a PUF instead of signing algorithms to generate endorsements for domain public keys. The new transparency logging mechanism, built upon CT, will record the PUF calling behaviors of the CA, ensuring the monitoring of PUF usage. We provide a formal cryptographic proof of Armored Core's main functions. We also implement it on a real-world PKI codebase. 
The results show that incorporating Armored Core into the original systems does not cause any extra overhead, but instead improves computing efficiency by >4.9% and saves >20% of certificate storage.	翻訳日:2024-04-25 14:53:37 公開日:2024-04-24
# エネルギーネットワークのためのマルチエージェント強化学習:計算問題、進展とオープン問題 Multi-Agent Reinforcement Learning for Energy Networks: Computational Challenges, Progress and Open Problems ( http://arxiv.org/abs/2404.15583v1 ) ライセンス: Link先を確認	Sarah Keren, Chaimaa Essayeh, Stefano V. Albrecht, Thomas Mortsyn,	(参考訳) 急速に変化する電気ネットワークのアーキテクチャと機能、および再生可能および分散エネルギー資源の浸透が、様々な技術的および管理上の課題を引き起こしている。これらは、ネットワークの動的で進化的な性質をサポートすることができないため、伝統的な中央集権的なエネルギー市場パラダイムを不十分にしている。本研究では,マルチエージェント強化学習(MARL)がエネルギーネットワークの分散化と脱炭を支援し,12の課題を緩和する方法について検討する。これは、エネルギーネットワークの管理における重要な計算上の課題を特定し、それらに対処する最近の研究の進捗をレビューし、MARLを使って対処する可能性のあるオープンな課題を強調することで達成される。 The rapidly changing architecture and functionality of electrical networks and the increasing penetration of renewable and distributed energy resources have resulted in various technological and managerial challenges. These have rendered traditional centralized energy-market paradigms insufficient due to their inability to support the dynamic and evolving nature of the network. This survey explores how multi-agent reinforcement learning (MARL) can support the decentralization and decarbonization of energy networks and mitigate the 12 associated challenges. This is achieved by specifying key computational challenges in managing energy networks, reviewing recent research progress on addressing them, and highlighting open challenges that may be addressed using MARL.	翻訳日:2024-04-25 14:53:37 公開日:2024-04-24
# 脳ストーム最適化に基づく糖尿病網膜症画像分類のための群学習 Brain Storm Optimization Based Swarm Learning for Diabetic Retinopathy Image Classification ( http://arxiv.org/abs/2404.15585v1 ) ライセンス: Link先を確認	Liang Qu, Cunze Wang, Yuhui Shi,	(参考訳) 近年, 医用画像分類タスクに畳み込みニューラルネットワークを適用するなど, 深層学習技術の応用が注目されている。しかし、医療分野のデータはしばしば非常にプライベートであり、異なる病院が正確なモデルを訓練するためにデータを共有することを妨げている。フェデレートラーニング(Federated Learning)は、プライバシを保存する機械学習アーキテクチャとして、クライアント側でプライベートデータを保持し、中央サーバを使用して、アップロードされたモデルパラメータを集約することで、モデルトレーニングのセットを調整することで、データのプライバシとモデルのユーティリティのバランスをとる上で、有望なパフォーマンスを示している。しかし、このアーキテクチャは信頼できるサードパーティサーバーに大きく依存している。 Swarm Learningは、中央サーバーを必要としない特殊な分散化されたフェデレーション学習アーキテクチャとして、ブロックチェーン技術を使用して、クライアント間の直接的なパラメータ交換を可能にする。しかし、ブロックの採掘にはかなりの計算資源が必要であり、その拡張性は制限される。そこで本研究では,ブレインストーム最適化アルゴリズムを,BSO-SLという名称のSwarm学習フレームワークに統合する。このアプローチは、類似のクライアントをモデル分布に基づいて異なるグループにクラスタ化する。さらに、BSOのアーキテクチャを利用すると、クライアントはクラスタ内とクラスタ外のクライアントの両方で協調学習を行う確率が与えられ、モデルが収束してローカルな最適化ができない。提案手法は, 現実の糖尿病網膜症画像分類データセットで検証され, 提案手法の有効性を実験的に実証した。 The application of deep learning techniques to medical problems has garnered widespread research interest in recent years, such as applying convolutional neural networks to medical image classification tasks. However, data in the medical field is often highly private, preventing different hospitals from sharing data to train an accurate model. Federated learning, as a privacy-preserving machine learning architecture, has shown promising performance in balancing data privacy and model utility by keeping private data on the client's side and using a central server to coordinate a set of clients for model training through aggregating their uploaded model parameters. Yet, this architecture heavily relies on a trusted third-party server, which is challenging to achieve in real life. Swarm learning, as a specialized decentralized federated learning architecture that does not require a central server, utilizes blockchain technology to enable direct parameter exchanges between clients. 
However, the mining of blocks requires significant computational resources, limiting its scalability. To address this issue, this paper integrates the brain storm optimization (BSO) algorithm into the swarm learning framework; the resulting method is named BSO-SL. This approach clusters similar clients into different groups based on their model distributions. Additionally, leveraging the architecture of BSO, clients are given a probability of engaging in collaborative learning both within their cluster and with clients outside their cluster, preventing the model from converging to local optima. The proposed method has been validated on a real-world diabetic retinopathy image classification dataset, and the experimental results demonstrate the effectiveness of the proposed approach.	翻訳日:2024-04-25 14:53:37 公開日:2024-04-24
# WiFiを用いたセンシングシステムのセキュリティ分析:摂動攻撃の脅威 Security Analysis of WiFi-based Sensing Systems: Threats from Perturbation Attacks ( http://arxiv.org/abs/2404.15587v1 ) ライセンス: Link先を確認	Hangcheng Cao, Wenbin Huang, Guowen Xu, Xianhao Chen, Ziyang He, Jingyang Hu, Hongbo Jiang, Yuguang Fang,	(参考訳) 深層学習技術は、WiFiベースの無線センシングシステムの性能向上に重要である。しかし、本質的には敵の摂動攻撃に対して脆弱であり、残念なことに、WiFiセンシングコミュニティ内のこのセキュリティ問題に深刻な注意が払われている。本稿では,WiIntruderと呼ばれる攻撃を,既存のWi-Fiセンサシステムのセキュリティを評価する触媒として機能する,普遍性,堅牢性,ステルスネスと区別する。本攻撃は, センサモデル間でのユーザ状態固有の特徴空間の差別化による伝達性の最大化, 共通の用途に適用可能な汎用的な摂動攻撃, 2) ヒューリスティック粒子群駆動摂動生成アルゴリズムにより臨界パラメータが最適化された場合に, デバイス同期や無線伝搬による摂動信号の歪みに対処すること, (3) 生成逆数ネットワークによって生じる摂動サロゲートのランダムな切替による攻撃パターンの多様性とステルス性の向上。広範にわたる実験結果から、ユーザ認証や呼吸監視を含む、一般的なWiFiベースのサービスに対する摂動攻撃の実践的脅威が確認された。 Deep learning technologies are pivotal in enhancing the performance of WiFi-based wireless sensing systems. However, they are inherently vulnerable to adversarial perturbation attacks, and regrettably, there is lacking serious attention to this security issue within the WiFi sensing community. In this paper, we elaborate such an attack, called WiIntruder, distinguishing itself with universality, robustness, and stealthiness, which serves as a catalyst to assess the security of existing WiFi-based sensing systems. This attack encompasses the following salient features: (1) Maximizing transferability by differentiating user-state-specific feature spaces across sensing models, leading to a universally effective perturbation attack applicable to common applications; (2) Addressing perturbation signal distortion caused by device synchronization and wireless propagation when critical parameters are optimized through a heuristic particle swarm-driven perturbation generation algorithm; and (3) Enhancing attack pattern diversity and stealthiness through random switching of perturbation surrogates generated by a generative adversarial network. 
Extensive experimental results confirm the practical threats of perturbation attacks to common WiFi-based services, including user authentication and respiratory monitoring.	翻訳日:2024-04-25 14:53:37 公開日:2024-04-24
# クレーム検証のための最小証拠群同定 Minimal Evidence Group Identification for Claim Verification ( http://arxiv.org/abs/2404.15588v1 ) ライセンス: Link先を確認	Xiangci Li, Sihao Chen, Rajvi Kapadia, Jessica Ouyang, Fan Zhang,	(参考訳) 現実の環境でのクレーム検証(例えば、Webから取得した多数の候補証拠に対して)は通常、クレームに対する完全なサポートを提供する証拠の完全なセットを特定し、集約する必要がある。この問題は、異なる視点からクレームを検証するために使用できる、異なる証拠のセットが存在する場合、特に困難になる。本稿では,クレーム検証のための最小限のエビデンスグループ(MEG)を正式に定義し,検討する。本稿では, 証拠群が主張に完全/部分的支持を与えるか否かの詳細な推測に基づいて, MEG の識別を集合被覆問題から減らすことができることを示す。提案手法は,WiCEおよびSciFactデータセットのLLMプロンプトによる絶対的な改善を18.4%,34.8%達成する。最後に、クレーム生成のような下流アプリケーションにおけるMEGの利点を実証する。 Claim verification in real-world settings (e.g., against a large collection of candidate evidence retrieved from the web) typically requires identifying and aggregating a complete set of evidence pieces that collectively provide full support to the claim. The problem becomes particularly challenging when there exist distinct sets of evidence that could be used to verify the claim from different perspectives. In this paper, we formally define and study the problem of identifying such minimal evidence groups (MEGs) for claim verification. We show that MEG identification can be reduced from the Set Cover problem, based on entailment inference of whether a given evidence group provides full/partial support to a claim. Our proposed approach achieves 18.4% and 34.8% absolute improvements on the WiCE and SciFact datasets over LLM prompting. Finally, we demonstrate the benefits of MEGs in downstream applications such as claim generation.	翻訳日:2024-04-25 14:53:37 公開日:2024-04-24
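To make the connection between evidence groups and Set Cover in the abstract above more concrete, the following is a small brute-force sketch, not the authors' method. It rests on two assumptions that the abstract does not confirm: that a claim can be split into sub-claims, with each evidence piece supporting some subset of them, and that a group is "minimal" when it gives full support and no proper subset of it does. The sub-claim and evidence names are hypothetical.

```python
from itertools import combinations


def minimal_evidence_groups(required, supports):
    """Enumerate evidence groups that cover every required sub-claim
    and contain no smaller group that already covers them.

    required: set of sub-claim labels (hypothetical decomposition of a claim)
    supports: dict mapping evidence ID -> set of sub-claims it supports
    """
    evidence_ids = sorted(supports)
    found = []  # minimal groups found so far, as frozensets
    # Visit groups by increasing size so every minimal group is seen
    # before any of its supersets.
    for size in range(1, len(evidence_ids) + 1):
        for group in combinations(evidence_ids, size):
            group_set = frozenset(group)
            if any(prev <= group_set for prev in found):
                continue  # contains a smaller full-support group: not minimal
            covered = set().union(*(supports[e] for e in group))
            if required <= covered:
                found.append(group_set)
    return [sorted(g) for g in found]


if __name__ == "__main__":
    # Hypothetical toy instance: claim split into sub-claims "a" and "b".
    toy_supports = {
        "e1": {"a"},
        "e2": {"b"},
        "e3": {"a", "b"},
        "e4": set(),
    }
    print(minimal_evidence_groups({"a", "b"}, toy_supports))
```

The enumeration is exponential in the number of evidence pieces, which is consistent with the hardness suggested by the Set Cover connection; it is meant only for tiny illustrative inputs.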
# 建設環境における複雑度が車両の経路行動に及ぼす影響:中国北京におけるタクシーの移動に関する実証的研究から The impact of complexity in the built environment on vehicular routing behavior: Insights from an empirical study of taxi mobility in Beijing, China ( http://arxiv.org/abs/2404.15589v1 ) ライセンス: Link先を確認	Chaogui Kang, Zheren Liu,	(参考訳) 都市交通の介入と都市最適化計画の立案には, 分散車体移動のモデル化と周辺都市建設環境との関連が不可欠である。しかし、確立された車両経路選択モデルでは、運転者の走行経路選択に影響を及ぼす境界的行動合理性と都市建設環境の複雑な特性を十分に考慮できなかった。そのため、車体移動パターンの時空間特性は十分に説明されず、関連する輸送介入の粒度を制限した。この制限に対処するため,運転時のアンカー効果と露出嗜好を模倣する車両経路選択モデルを提案した。提案モデルにより, 既往の研究でほとんど無視されてきた車両の経路挙動に対する構築環境の影響を定量的に検証することができる。その結果,提案モデルでは,最短経路原理に基づく従来の車両経路選択モデルよりも12%高い性能を示した。北京のタクシー運転手の経路行動パターンを実証分析したところ、ドライバーは短い時間で経路を選択する傾向にあり、交差点での損失も少ないことが判明した。反対に、ドライバーは最短の経路よりも意外に長い乗客を輸送するために、環状道路や高速道路に強く依存していることもわかりました。さらに,道路の偏心性,中心性,平均道路長,土地利用の多様性,空の視認性,建築範囲などの都市環境の特徴は,提案モデルの性能向上の約5%を占め,運転者の経路選択行動に影響を与える可能性がある。また,出発時間,旅行距離,職業状況が異なる旅行のモデル化結果に基づいて,上記の調査を精査する。 The modeling of disaggregated vehicular mobility and its associations with the ambient urban built environment is essential for developing operative transport intervention and urban optimization plans. However, established vehicular route choice models failed to fully consider the bounded behavioral rationality and the complex characteristics of the urban built environment affecting drivers' route choice preference. Therefore, the spatio-temporal characteristics of vehicular mobility patterns were not fully explained, which limited the granular implementation of relevant transport interventions. To address this limitation, we proposed a vehicular route choice model that mimics the anchoring effect and the exposure preference while driving. The proposed model enables us to quantitatively examine the impact of the built environment on vehicular routing behavior, which has been largely neglected in previous studies. Results show that the proposed model performs 12% better than the conventional vehicular route choice model based on the shortest path principle. 
Our empirical analysis of taxi drivers' routing behavior patterns in Beijing, China uncovers that drivers are inclined to choose routes with shorter time duration and with less loss at traversal intersections. Counterintuitively, we also found that drivers heavily rely on circuitous ring roads and expressways to deliver passengers, which are unexpectedly longer than the shortest paths. Moreover, characteristics of the urban built environment including road eccentricity, centrality, average road length, land use diversity, sky visibility, and building coverage can affect drivers' route choice behaviors, accounting for about 5% of the increase in the proposed model's performance. We also refine the above explorations according to the modeling results of trips that differ in departure time, travel distance, and occupation status.	翻訳日:2024-04-25 14:53:37 公開日:2024-04-24
# 教師付き適応器を用いた学習画像圧縮のための領域適応 Domain Adaptation for Learned Image Compression with Supervised Adapters ( http://arxiv.org/abs/2404.15591v1 ) ライセンス: Link先を確認	Alberto Presta, Gabriele Spadaro, Enzo Tartaglione, Attilio Fiandrotti, Marco Grangetto,	(参考訳) Learned Image Compression (lic)では、モデルはソースドメインからサンプリングされた画像の符号化と復号を訓練し、多くの場合、自然画像の伝統的なコーデックよりも優れている。本研究では,デコーダにアダプタモジュールを挿入することで,事前学習されたモデルを複数のターゲットドメインに適用する問題に取り組む。各アダプタは、トレーニング時にイメージを忘れることなく、特定のドメイン上のデコーダのパフォーマンスを改善する。ゲートネットワークは、ビットストリームの復号化時にアダプタからのコントリビューションを最適にブレンドするために重みを演算する。提案手法を2つの最先端事前訓練モデルに対して実験的に検証し, 対象領域の速度歪み効率の改善を, ソース領域のペナルティを伴わずに観測した。さらに、学習対象領域と類似性を見出す能力により、外部の画像に対しても符号化効率が向上する。 In Learned Image Compression (LIC), a model is trained at encoding and decoding images sampled from a source domain, often outperforming traditional codecs on natural images; yet its performance may be far from optimal on images sampled from different domains. In this work, we tackle the problem of adapting a pre-trained model to multiple target domains by plugging into the decoder an adapter module for each of them, including the source one. Each adapter improves the decoder performance on a specific domain, without the model forgetting about the images seen at training time. A gate network computes the weights to optimally blend the contributions from the adapters when the bitstream is decoded. We experimentally validate our method over two state-of-the-art pre-trained models, observing improved rate-distortion efficiency on the target domains without penalties on the source domain. Furthermore, the gate's ability to find similarities with the learned target domains enables better encoding efficiency also for images outside them.	翻訳日:2024-04-25 14:53:37 公開日:2024-04-24
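The gate-and-adapter idea in the abstract above can be sketched as a weighted blend of per-domain adapter outputs. This is only an illustration under assumptions: the abstract says a gate network computes weights to blend adapter contributions, but it does not say the weights come from a softmax, and the function names and toy numbers here are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of gate scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blend_adapter_outputs(gate_logits, adapter_outputs):
    """Blend per-adapter feature vectors using softmax-normalized gate weights.

    gate_logits: one score per adapter (assumed to come from a gate network)
    adapter_outputs: one equal-length feature vector per adapter
    """
    weights = softmax(gate_logits)
    length = len(adapter_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, adapter_outputs))
        for i in range(length)
    ]


if __name__ == "__main__":
    # Two hypothetical adapters with equal gate scores: the blend is their mean.
    print(blend_adapter_outputs([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]]))
```

With a strongly dominant gate score the blend approaches a single adapter's output, which is one way such a gate could fall back to the source-domain adapter for in-domain images.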
# ImplicitAVE: インプシット属性値抽出のためのオープンソースデータセットとマルチモーダルLCMベンチマーク ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction ( http://arxiv.org/abs/2404.15592v1 ) ライセンス: Link先を確認	Henry Peng Zou, Vinay Samuel, Yue Zhou, Weizhi Zhang, Liancheng Fang, Zihe Song, Philip S. Yu, Cornelia Caragea,	(参考訳) 既存の属性値抽出(AVE)データセットは、暗黙の属性を無視しながら、明示的な属性値に重点を置いている。これらの制限に対処するため、暗黙的な属性値抽出のための最初の公開マルチモーダルデータセットであるImplicitAVEを提案する。 MAVEデータセットからソースされたImplicitAVEは、暗黙のAVEとマルチモダリティを含むように慎重にキュレーションされ、結果として5つのドメインにわたる68kトレーニングと1.6kテストデータの洗練されたデータセットが生成される。また,マルチモーダル大言語モデル(MLLM)を暗黙AVEに適用し,ImplicitAVEデータセット上でMLLMの包括的なベンチマークを確立する。 11種類のMLLMを持つ最近の6つのMLLMは、さまざまな設定で評価されており、暗黙的な値抽出がMLLMにとって難しい課題であることを示している。この研究の貢献には、ImplicitAVEの開発とリリース、暗黙のAVEのための様々なMLLMの探索とベンチマークが含まれ、貴重な洞察と将来の研究方向性を提供する。データセットとコードはhttps://github.com/HenryPengZou/ImplicitAVEで入手できる。 Existing datasets for attribute value extraction (AVE) predominantly focus on explicit attribute values while neglecting the implicit ones, lack product images, are often not publicly available, and lack an in-depth human inspection across diverse domains. To address these limitations, we present ImplicitAVE, the first, publicly available multimodal dataset for implicit attribute value extraction. ImplicitAVE, sourced from the MAVE dataset, is carefully curated and expanded to include implicit AVE and multimodality, resulting in a refined dataset of 68k training and 1.6k testing data across five domains. We also explore the application of multimodal large language models (MLLMs) to implicit AVE, establishing a comprehensive benchmark for MLLMs on the ImplicitAVE dataset. Six recent MLLMs with eleven variants are evaluated across diverse settings, revealing that implicit value extraction remains a challenging task for MLLMs. 
The contributions of this work include the development and release of ImplicitAVE, and the exploration and benchmarking of various MLLMs for implicit AVE, providing valuable insights and potential future research directions. Dataset and code are available at https://github.com/HenryPengZou/ImplicitAVE	翻訳日:2024-04-25 14:53:37 公開日:2024-04-24
# 深部ロングテール分類の進歩に関する調査研究 A Survey of Deep Long-Tail Classification Advancements ( http://arxiv.org/abs/2404.15593v1 ) ライセンス: Link先を確認	Charika de Alvis, Suranga Seneviratne,	(参考訳) 実世界の多くのデータ分布は、ほとんど均一ではない。代わりに、様々な種類の歪んだ長い尾の分布がよく見られる。これは機械学習にとって興味深い問題であり、ほとんどのアルゴリズムが均一に分散されたデータを想定したり、うまく機能する。この問題は、大量のトレーニングデータを必要とする現在の最先端のディープラーニングモデルによってさらに悪化している。このように、不均衡なデータから学ぶことは難しい研究問題であり、ディープラーニングのより現実的な応用に向けて進む際には、解決しなければならない課題である。クラス不均衡の文脈では、標準ベンチマークデータセットに対するSOTA(State-of-the-art)の精度は通常、CIFAR100のようなより困難なデータセットであっても75%以下に低下する。それでも、このニッチなディープラーニング分野には進歩があった。そこで本研究では,過去数年間に1つの数学的枠組みの下で発生した研究に焦点をあて,長い尾の分類の問題に対処するために提案された様々な手法の分類法を提案する。また、標準的な性能指標、収束研究、特徴分布、分類器分析についても論じる。また、異なるSOTA法の性能を定量的に比較し、残りの課題と今後の研究方向性を議論して調査を締めくくる。 Many data distributions in the real world are hardly uniform. Instead, skewed and long-tailed distributions of various kinds are commonly observed. This poses an interesting problem for machine learning, where most algorithms assume or work well with uniformly distributed data. The problem is further exacerbated by current state-of-the-art deep learning models requiring large volumes of training data. As such, learning from imbalanced data remains a challenging research problem and a problem that must be solved as we move towards more real-world applications of deep learning. In the context of class imbalance, state-of-the-art (SOTA) accuracies on standard benchmark datasets for classification typically fall less than 75%, even for less challenging datasets such as CIFAR100. Nonetheless, there has been progress in this niche area of deep learning. To this end, in this survey, we provide a taxonomy of various methods proposed for addressing the problem of long-tail classification, focusing on works that happened in the last few years under a single mathematical framework. We also discuss standard performance metrics, convergence studies, feature distribution and classifier analysis. 
We also provide a quantitative comparison of the performance of different SOTA methods and conclude the survey by discussing the remaining challenges and future research directions.	翻訳日:2024-04-25 14:53:37 公開日:2024-04-24
# 変量的深層生存機械--知覚的アウトカムによる生存回帰- Variational Deep Survival Machines: Survival Regression with Censored Outcomes ( http://arxiv.org/abs/2404.15595v1 ) ライセンス: Link先を確認	Qinxin Wang, Jiayuan Huang, Junhui Li, Jiaming Liu,	(参考訳) サバイバル・レグレッション(Survival regression)とは、ある出来事がいつ起こるか、通常は死か失敗かを予測することを目的としている。完全パラメトリック法[18]は,検閲の有無による個人パラメトリック分布の混合として生存関数を推定するために提案される。本稿では,サバイバルデータをクラスタリングし,プリミティブ分布を組み合わせることにより,生存時間を予測できる新しい手法を提案する。本稿では,2種類の変分自動エンコーダ(VAE)を提案する。モデルは、VAE損失と回帰損失を共同最適化することにより、エンドツーエンドで訓練される。データセット支援とFLCHAINに関する詳細な実験により,本手法はクラスタリング結果を効果的に改善し,従来手法と競合するスコアを得られることを示した。長期的には,モデル予測の優れた結果を示す。私たちのコードはhttps://github.com/qinzzz/auton-survival-785で公開されています。 Survival regression aims to predict the time when an event of interest will take place, typically a death or a failure. A fully parametric method [18] has been proposed to estimate the survival function as a mixture of individual parametric distributions in the presence of censoring. In this paper, we present a novel method to predict the survival time by better clustering the survival data and combining primitive distributions. We propose two variants of the variational auto-encoder (VAE), discrete and continuous, to generate the latent variables for clustering input covariates. The model is trained end to end by jointly optimizing the VAE loss and the regression loss. Thorough experiments on the SUPPORT and FLCHAIN datasets show that our method can effectively improve the clustering result and reach scores competitive with previous methods. We demonstrate the superior results of our model's predictions in the long term. Our code is available at https://github.com/qinzzz/auton-survival-785.	翻訳日:2024-04-25 14:53:37 公開日:2024-04-24
# VulEval: ソフトウェア脆弱性検出のリポジトリレベル評価を目指して VulEval: Towards Repository-Level Evaluation of Software Vulnerability Detection ( http://arxiv.org/abs/2404.15596v1 ) ライセンス: Link先を確認	Xin-Cheng Wen, Xinchen Wang, Yujia Chen, Ruida Hu, David Lo, Cuiyun Gao,	(参考訳) ディープラーニング(DL)ベースの手法は、ソフトウェア脆弱性の検出に有効であることが証明されている。現在の手法は、主に単一機能(すなわち、手続き内脆弱性)の検出に重点を置いており、実際はより複雑な手続き間脆弱性検出シナリオを無視している。例えば、開発者は定期的にプログラム分析を行い、リポジトリ内の複数の機能にまたがる脆弱性を検出する。さらに、広く使用されているベンチマークデータセットは一般的に、プロデュース内脆弱性のみを含んでおり、プロデュース間脆弱性検出機能の評価は未調査のままである。そこで本稿では,システム間の脆弱性の検出性能を同時に評価することを目的とした,レポジトリレベルの評価システムであるtextbf{VulEval}を提案する。特に、VulEvalは、3つの相互接続評価タスクで構成されている。 \textbf{(1) Function-Level Vulnerability Detection}は、コードスニペットが与えられたプロシージャ内脆弱性を検出することを目的としており、 \textbf{(2) Vulnerability-Related Dependency Prediction}は、脆弱性に関する説明を提供するためにコールグラフから最も関連性の高い依存関係を取得することを目的としており、また、 \textbf{(3) Repository-Level Vulnerability Detection}は、2番目のタスクで特定された依存関係と組み合わせることで、プロセス間脆弱性を検出することを目的としている。 VulEvalはまた、C/C++プログラミング言語の合計4,196のCVEエントリ、232,239の関数、および対応する4,699のリポジトリレベルのソースコードからなる大規模なデータセットで構成されている。我々の分析は、ソフトウェア脆弱性検出の現在の進歩と今後の方向性を強調している。 Deep Learning (DL)-based methods have proven to be effective for software vulnerability detection, with a potential for substantial productivity enhancements for detecting vulnerabilities. Current methods mainly focus on detecting single functions (i.e., intra-procedural vulnerabilities), ignoring the more complex inter-procedural vulnerability detection scenarios in practice. For example, developers routinely engage with program analysis to detect vulnerabilities that span multiple functions within repositories. In addition, the widely-used benchmark datasets generally contain only intra-procedural vulnerabilities, leaving the assessment of inter-procedural vulnerability detection capabilities unexplored. To mitigate the issues, we propose a repository-level evaluation system, named \textbf{VulEval}, aiming at evaluating the detection performance of inter- and intra-procedural vulnerabilities simultaneously. 
Specifically, VulEval consists of three interconnected evaluation tasks: (1) Function-Level Vulnerability Detection, aiming at detecting intra-procedural vulnerability given a code snippet; (2) Vulnerability-Related Dependency Prediction, aiming at retrieving the most relevant dependencies from call graphs for providing developers with explanations about the vulnerabilities; and (3) Repository-Level Vulnerability Detection, aiming at detecting inter-procedural vulnerabilities by combining with the dependencies identified in the second task. VulEval also consists of a large-scale dataset, with a total of 4,196 CVE entries, 232,239 functions, and corresponding 4,699 repository-level source code in C/C++ programming languages. Our analysis highlights the current progress and future directions for software vulnerability detection.	翻訳日:2024-04-25 14:53:37 公開日:2024-04-24
# GRSN:PMDPとMARLのためのGated Recurrent Spiking Neurons GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL ( http://arxiv.org/abs/2404.15597v1 ) ライセンス: Link先を確認	Lang Qin, Ziming Wang, Runhao Jiang, Rui Yan, Huajin Tang,	(参考訳) スパイキングニューラルネットワーク(SNN)は、エネルギー効率と高速推論能力のため、様々な分野に広く応用されている。 SNNを強化学習(RL)に適用することで、エージェントの計算リソース要件を大幅に削減し、リソース制約のある条件下でのアルゴリズムの性能を向上させることができる。しかし、現在のスパイキング強化学習(SRL)アルゴリズムでは、複数の時間ステップのシミュレーション結果がRLの単一ステップ決定にしか対応しない。これは脳の実際の時間的ダイナミクスとは大きく異なり、また、時間的データを処理するSNNの能力を完全に活用することができない。この時間的ミスマッチ問題に対処し、スパイキングニューロン固有の時間的ダイナミクスを更に活用するために、スパイキングニューロンの単一ステップ更新を利用してRLの履歴情報を蓄積する新しい時間的アライメントパラダイム(TAP)を提案し、スパイキングニューロンのメモリ容量を向上させるゲートユニットを導入した。実験の結果,再帰型ニューラルネットワーク(RNN)と同等の性能を持つが,約50%の消費電力で,部分的に観測可能なマルコフ決定過程(POMDP)とマルチエージェント協調問題を解くことができることがわかった。 Spiking neural networks (SNNs) are widely applied in various fields due to their energy-efficient and fast-inference capabilities. Applying SNNs to reinforcement learning (RL) can significantly reduce the computational resource requirements for agents and improve the algorithm's performance under resource-constrained conditions. However, in current spiking reinforcement learning (SRL) algorithms, the simulation results of multiple time steps can only correspond to a single-step decision in RL. This is quite different from the real temporal dynamics in the brain and also fails to fully exploit the capacity of SNNs to process temporal data. In order to address this temporal mismatch issue and further take advantage of the inherent temporal dynamics of spiking neurons, we propose a novel temporal alignment paradigm (TAP) that leverages the single-step update of spiking neurons to accumulate historical state information in RL and introduces gated units to enhance the memory capacity of spiking neurons. 
Experimental results show that our method can solve partially observable Markov decision processes (POMDPs) and multi-agent cooperation problems with performance similar to that of recurrent neural networks (RNNs) but with about 50% of the power consumption.	翻訳日:2024-04-25 14:53:37 公開日:2024-04-24
# ラベル相関の探索による正のラベルのみによるフェデレーション学習 Federated Learning with Only Positive Labels by Exploring Label Correlations ( http://arxiv.org/abs/2404.15598v1 ) ライセンス: Link先を確認	Xuming An, Dui Wang, Li Shen, Yong Luo, Han Hu, Bo Du, Yonggang Wen, Dacheng Tao,	(参考訳) フェデレートラーニングは、プライバシー上の制約の下で複数のユーザのデータを使用することで、モデルを協調的に学習することを目的としている。本稿では,各クライアントに1つのクラスラベルのみを付加した場合に,自明な解法と極めて低い性能が得られるような,フェデレート学習環境下でのマルチラベル分類問題について検討する。この問題は、サーバサイドに特別に設計された正規化子を追加することで解決できる。有効ではあるが、ラベル相関は単純に無視されるため、準最適性能が得られる。さらに、特に対照的な方法でのトレーニングモデルでは、サーバとクライアントの間でユーザのプライベートな埋め込みを頻繁に交換するのは高価で安全ではない。これらの欠点を解消するために,ラベル相関(FedALC)を探索し,フェデレート平均化(Federated Averaging)と呼ばれる新しい手法を提案する。特に、FedALCは、異なるラベルペアに対するクラス埋め込み学習におけるラベル相関を推定し、モデルトレーニングを改善するためにそれを利用する。安全性をさらに向上させ,通信オーバーヘッドを低減するため,サーバとクライアントが一度だけクラス埋め込みを交換できるように,各クライアントに対して固定クラス埋め込みを学習するための変種を提案する。複数の一般的なデータセットに対する大規模な実験は、我々のFedALCが既存のデータセットを著しく上回っていることを示している。 Federated learning aims to collaboratively learn a model by using the data from multiple users under privacy constraints. In this paper, we study the multi-label classification problem under the federated learning setting, where trivial solution and extremely poor performance may be obtained, especially when only positive data w.r.t. a single class label are provided for each client. This issue can be addressed by adding a specially designed regularizer on the server-side. Although effective sometimes, the label correlations are simply ignored and thus sub-optimal performance may be obtained. Besides, it is expensive and unsafe to exchange user's private embeddings between server and clients frequently, especially when training model in the contrastive way. To remedy these drawbacks, we propose a novel and generic method termed Federated Averaging by exploring Label Correlations (FedALC). Specifically, FedALC estimates the label correlations in the class embedding learning for different label pairs and utilizes it to improve the model training. 
To further improve the safety and also reduce the communication overhead, we propose a variant to learn fixed class embedding for each client, so that the server and clients only need to exchange class embeddings once. Extensive experiments on multiple popular datasets demonstrate that our FedALC can significantly outperform existing counterparts.	翻訳日:2024-04-25 14:53:37 公開日:2024-04-24
# 動的混雑ゲームのためのHuman-in-the-loop学習 Human-in-the-loop Learning for Dynamic Congestion Games ( http://arxiv.org/abs/2404.15599v1 ) ライセンス: Link先を確認	Hongbo Li, Lingjie Duan,	(参考訳) 今日、モバイルユーザーはクラウドソーシングプラットフォーム(Wazeなど)を通じて交通状況の観測を学習し、共有している。しかし、こうしたプラットフォームは、利己的なユーザーの近視眼的な利益に応じて最短経路を推奨するだけであり、将来の他のユーザーのために十分な数のユーザーが他の経路を通行して学習することを促していない。先行研究はユーザーの情報学習を考慮せずにワンショットの混雑ゲームに焦点を当てているのに対し、本研究は、ユーザーが確率的経路上の交通状況をhuman-in-the-loop方式でどのように学習し、変化させるかを研究する。解析の結果、近視眼的な経路選択方策は確率的経路の深刻な探索不足につながることが明らかになった。その結果、長期的な社会的コストを最小化する社会的最適方策と比較して、無秩序の代償(PoA)は $2$ より大きくなる。さらに、近視眼的方策では、ユーザーの交通ハザードに関する信念について正しい学習の収束が保証されない。これに対処するため、価格付けよりも実装が容易な情報的(非金銭的)メカニズムに注目する。まず、ベイズ的説得の文献における既存の情報隠蔽メカニズムと決定論的経路推薦メカニズムが、$\text{PoA}=\infty$ とさえなり機能しないことを示す。そこで、選択したユーザーグループからすべての情報を隠蔽し、他のユーザーグループには状態依存の確率的推薦を提供する、新しい隠蔽と確率的推薦の組み合わせ(CHAR)メカニズムを提案する。CHARはPoAを $\frac{5}{4}$ 未満に抑えることに成功し、これは他のいかなる情報的(非金銭的)メカニズムによってもそれ以上低減できない。並列ネットワークに加えて、解析とCHARを複数の中間ノードを持つより一般的な線形経路グラフに拡張し、PoAの結果が変わらないことを証明する。さらに、実世界のデータセットを用いた実験を行い、経路グラフをさらに拡張してCHARの最適に近い性能を検証する。 Today mobile users learn and share their traffic observations via crowdsourcing platforms (e.g., Waze). Yet such platforms simply cater to selfish users' myopic interests to recommend the shortest path, and do not encourage enough users to travel and learn other paths for future others. Prior studies focus on one-shot congestion games without considering users' information learning, while our work studies how users learn and alter traffic conditions on stochastic paths in a human-in-the-loop manner. Our analysis shows that the myopic routing policy leads to severe under-exploration of stochastic paths. This results in a price of anarchy (PoA) greater than $2$, as compared to the socially optimal policy in minimizing the long-term social cost. Besides, the myopic policy fails to ensure the correct learning convergence about users' traffic hazard beliefs. To address this, we focus on informational (non-monetary) mechanisms as they are easier to implement than pricing. 
We first show that existing information-hiding mechanisms and deterministic path-recommendation mechanisms in the Bayesian persuasion literature do not work, yielding even $\text{PoA}=\infty$. Accordingly, we propose a new combined hiding and probabilistic recommendation (CHAR) mechanism to hide all information from a selected user group and provide state-dependent probabilistic recommendations to the other user group. Our CHAR successfully ensures PoA less than $\frac{5}{4}$, which cannot be further reduced by any other informational (non-monetary) mechanism. Besides the parallel network, we further extend our analysis and CHAR to more general linear path graphs with multiple intermediate nodes, and we prove that the PoA results remain unchanged. Additionally, we carry out experiments with real-world datasets to further extend our routing graphs and verify the close-to-optimal performance of our CHAR.	翻訳日:2024-04-25 14:43:50 公開日:2024-04-24
# ディープフェイクと高等教育 Deepfakes and Higher Education: A Research Agenda and Scoping Review of Synthetic Media ( http://arxiv.org/abs/2404.15601v1 ) ライセンス: Link先を確認	Jasper Roe, Mike Perkins,	(参考訳) 説得力のある合成メディアを生み出すソフトウェアが利用できることは、第三次教育に脅威と恩恵をもたらす。この研究は、他のタイプの合成メディアが存在する一方で、現実人の高度な生成AI(GenAI)偽物であるディープフェイクに焦点を当てている。本論文は182冊の査読論文の初回スコーピングレビューを行うことにより,複数の分野にわたるディープフェイクに関する現在の文献を評価するものである。このレビューでは、検出方法、悪意のあるアプリケーション、潜在的な利益の3つの主要な傾向が明らかになったが、第三次教育の文脈におけるディープフェイクに関する具体的な研究は見つからなかった。これらの傾向の議論に続いて、本研究では、高等教育におけるディープフェイク技術の主要なリスクと潜在的な緩和戦略を仮定し、ディープフェイクと合成メディアの両方の教育と学習を支援するための潜在的有用性について考察する。このことは、高等教育におけるディープフェイクを調査するための包括的かつ異文化的なアプローチを構築するための研究課題の提案に終止符を打つ。 The availability of software which can produce convincing yet synthetic media poses both threats and benefits to tertiary education globally. While other forms of synthetic media exist, this study focuses on deepfakes, which are advanced Generative AI (GenAI) fakes of real people. This conceptual paper assesses the current literature on deepfakes across multiple disciplines by conducting an initial scoping review of 182 peer-reviewed publications. The review reveals three major trends: detection methods, malicious applications, and potential benefits, although no specific studies on deepfakes in the tertiary educational context were found. Following a discussion of these trends, this study applies the findings to postulate the major risks and potential mitigation strategies of deepfake technologies in higher education, as well as potential beneficial uses to aid the teaching and learning of both deepfakes and synthetic media. This culminates in the proposal of a research agenda to build a comprehensive, cross-cultural approach to investigate deepfakes in higher education.	翻訳日:2024-04-25 14:43:50 公開日:2024-04-24
# ボソンサンプリングのためのパターン認識バリデーションの開発 Development of Pattern Recognition Validation for Boson Sampling ( http://arxiv.org/abs/2404.15603v1 ) ライセンス: Link先を確認	Yang Ji, Yongzheng Wu, Shi Wang, Jie Hou, Meiling Chen, Ming Ni,	(参考訳) ボソンサンプリングは、量子計算の優位性を示すための最も魅力的な量子計算モデルの一つである。しかし、光子の識別可能性などのノイズ源を考慮すると、この目的の実現は困難となりうる。光子の識別可能性が量子計算の優位性を示すには高すぎるかどうかを評価するために開発されたベイズ検証法に着想を得て、ボソンサンプリングのためのパターン認識検証法を開発する。 k-means++法で構築したクラスタに基づくと、テスト値の分布は光子の識別不能性に応じてほぼ単調に変化し、特に光子が識別不能に近い場合にその傾向が顕著である。ソートした出力の確率分布と平均2ノルム距離を計算することで、データに内在する構造を解析する。また、近似アルゴリズムを用いて、光子の識別可能性に伴うデータ構造の変化を示す。 Boson sampling is one of the most attractive quantum computation models to demonstrate the quantum computational advantage. However, this aim may be hard to realize considering noise sources such as photon distinguishability. Inspired by the Bayesian validation developed to evaluate whether photon distinguishability is too high to demonstrate the quantum computational advantage, we develop the pattern recognition validation for boson sampling. Based on clusters constructed with the k-means++ method, the distribution of test values changes nearly monotonically with the photon indistinguishability, especially when photons are close to being indistinguishable. We analyze the intrinsic data structure by calculating probability distributions and mean 2-norm distances of the sorted outputs. Approximation algorithms are also used to show how the data structure changes with photon distinguishability.	翻訳日:2024-04-25 14:43:50 公開日:2024-04-24
# 構造化データからのビジネスインサイト生成のためのハイブリッドLLM/ルールベースアプローチ Hybrid LLM/Rule-based Approaches to Business Insights Generation from Structured Data ( http://arxiv.org/abs/2404.15604v1 ) ライセンス: Link先を確認	Aliaksei Vertsel, Mikhail Rumiantsau,	(参考訳) ビジネスデータ分析の分野では、膨大で多様なデータセットから実行可能な洞察を抽出する能力は、十分な情報に基づく意思決定と競争力の維持に不可欠である。従来のルールベースのシステムは信頼できるが、現代のビジネスデータの複雑さとダイナミズムに直面すると、しばしば力不足となる。逆に、人工知能(AI)モデル、特にLarge Language Models(LLM)は、パターン認識と予測分析において大きな可能性を秘めているが、特定のビジネスアプリケーションに必要な精度を欠く可能性がある。本稿では、実行可能なビジネスインサイトの生成において、ルールベースシステムのロバスト性とLLMの適応力を統合するハイブリッドアプローチの有効性について検討する。 In the field of business data analysis, the ability to extract actionable insights from vast and varied datasets is essential for informed decision-making and maintaining a competitive edge. Traditional rule-based systems, while reliable, often fall short when faced with the complexity and dynamism of modern business data. Conversely, Artificial Intelligence (AI) models, particularly Large Language Models (LLMs), offer significant potential in pattern recognition and predictive analytics but can lack the precision necessary for specific business applications. This paper explores the efficacy of hybrid approaches that integrate the robustness of rule-based systems with the adaptive power of LLMs in generating actionable business insights.	翻訳日:2024-04-25 14:43:50 公開日:2024-04-24
# 複素構造テンソルによるCNNの理解と改善:バイオメトリクス研究 Understanding and Improving CNNs with Complex Structure Tensor: A Biometrics Study ( http://arxiv.org/abs/2404.15608v1 ) ライセンス: Link先を確認	Kevin Hernandez-Diaz, Josef Bigun, Fernando Alonso-Fernandez,	(参考訳) 本研究は、CNNが方向特徴を効果的に抽出することに苦慮している証拠を提供する。確信度を伴うコンパクトな方向特徴を含む複素構造テンソル(Complex Structure Tensor)をCNNへの入力として用いると、グレースケール入力のみを用いる場合と比較して、識別精度が一貫して向上することを示す。また、小規模な複素畳み込みネットワークによって得られる本入力を、サイズを削減したCNNと組み合わせると、現在主流の本格的なCNNアーキテクチャを上回ることも実験で示された。このことは、哺乳類の視覚に見られる戦略である方向特徴の事前利用が、CNNの制限を緩和するだけでなく、その説明可能性やシンクライアントへの適合性も高めることを示唆している。実験は、生体識別と照合(クローズドワールドおよびオープンワールド)のための眼周囲(periocular)画像からなる公開データセット上で、6つの最先端CNNアーキテクチャを用いて行った。データやシナリオに応じて、PolyUデータセットにおける最先端の等誤り率(EER)を5～26%削減した。 Our study provides evidence that CNNs struggle to effectively extract orientation features. We show that the use of Complex Structure Tensor, which contains compact orientation features with certainties, as input to CNNs consistently improves identification accuracy compared to using grayscale inputs alone. Experiments also demonstrated that our inputs, which were provided by mini complex conv-nets, combined with reduced CNN sizes, outperformed full-fledged, prevailing CNN architectures. This suggests that the upfront use of orientation features in CNNs, a strategy seen in mammalian vision, not only mitigates their limitations but also enhances their explainability and relevance to thin-clients. Experiments were done on publicly available data sets comprising periocular images for biometric identification and verification (Close and Open World) using 6 State of the Art CNN architectures. We reduced SOA Equal Error Rate (EER) on the PolyU dataset by 5-26% depending on data and scenario.	翻訳日:2024-04-25 14:43:50 公開日:2024-04-24
# PoisonedFL:マルチラウンド一貫性によるフェデレーション学習に対するモデルポイズニング攻撃 PoisonedFL: Model Poisoning Attacks to Federated Learning via Multi-Round Consistency ( http://arxiv.org/abs/2404.15611v1 ) ライセンス: Link先を確認	Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong,	(参考訳) モデルポイズニング攻撃は、フェデレートラーニング(FL)にとって重大なセキュリティ上の脅威である。既存のモデルポイズニング攻撃には2つの重要な制限がある。 1) 防御が配備されている場合には準最適な効果しか達成できない、及び/又は 2) 正規クライアントのモデル更新やローカルトレーニングデータに関する知識を必要とする。本研究では、その準最適な効果が、個々のトレーニングラウンド内における悪意のあるクライアント間のモデル更新の一貫性のみを利用していることに起因し、その結果、攻撃効果がトレーニングラウンド間で自己相殺されるという重要な観察を行う。そこで本研究では、正規クライアントに関する知識を一切必要とせずに、悪意のあるクライアントのモデル更新に複数ラウンドにわたる一貫性を課すPoisonedFLを提案する。 5つのベンチマークデータセットに対する実証的な評価は、PoisonedFLが8つの最先端の防御を破り、既存の7つのモデルポイズニング攻撃を上回っていることを示している。さらに、PoisonedFLに合わせた新たな防御策も検討したが、その結果、PoisonedFLを適応させることでそれらも破れることが分かった。本研究は、FLシステムが従来考えられていたよりもかなり堅牢性が低いことを示し、新しい防御機構の開発の緊急性を明らかにした。 Model poisoning attacks are critical security threats to Federated Learning (FL). Existing model poisoning attacks suffer from two key limitations: 1) they achieve suboptimal effectiveness when defenses are deployed, and/or 2) they require knowledge of the model updates or local training data on genuine clients. In this work, we make a key observation that their suboptimal effectiveness arises from only leveraging model-update consistency among malicious clients within individual training rounds, making the attack effect self-cancel across training rounds. In light of this observation, we propose PoisonedFL, which enforces multi-round consistency among the malicious clients' model updates while not requiring any knowledge about the genuine clients. Our empirical evaluation on five benchmark datasets shows that PoisonedFL breaks eight state-of-the-art defenses and outperforms seven existing model poisoning attacks. Moreover, we also explore new defenses that are tailored to PoisonedFL, but our results show that we can still adapt PoisonedFL to break them. 
Our study shows that FL systems are considerably less robust than previously thought, underlining the urgency for the development of new defense mechanisms.	翻訳日:2024-04-25 14:43:50 公開日:2024-04-24
# 混合量子-古典力学におけるユニタリ基底変換 Unitary Basis Transformations in Mixed Quantum-Classical Dynamics ( http://arxiv.org/abs/2404.15614v1 ) ライセンス: Link先を確認	Ken Miyazaki, Alex Krotz, Roel Tempelaar,	(参考訳) 量子計算のコストを最小化するための一般的なアプローチは、量子システムを最適に切り詰められる基底に変換することである。ここでは、類似のユニタリ変換を受ける運動の古典方程式を導出し、これらを混合量子古典力学に統合することを提案し、量子座標と古典座標の両方に対して任意の基底内でこの方法を適用することができる。この目的のために、標準位置とモータは、ユニタリ変換に対応可能な複素数値の古典座標の集合に結合される。フォノンの存在下では、電子キャリアが単一不純物に散乱する表面ホッピング計算により、結果として生じるアプローチの可能性を示す。不純物の局所化と高エネルギー励起の非局在化の両方を捉える適切な基底変換は、古典的基底集合と量子的基底集合のごく一部で、力学を忠実に捉えることが示されている。 A common approach to minimizing the cost of quantum computations is by transforming a quantum system into a basis that can be optimally truncated. Here, we derive classical equations of motion subjected to similar unitary transformations, and propose their integration into mixed quantum-classical dynamics, enabling this class of methods to be applied within arbitrary bases for both the quantum and classical coordinates. To this end, canonical positions and momenta are combined into a set of complex-valued classical coordinates amenable to unitary transformations. We demonstrate the potential of the resulting approach by means of surface hopping calculations of an electronic carrier scattering onto a single impurity in the presence of phonons. Appropriate basis transformations, capturing both the localization of the impurity and the delocalization of higher-energy excitations, are shown to faithfully capture the dynamics within a fraction of the classical and quantum basis sets.	翻訳日:2024-04-25 14:43:50 公開日:2024-04-24
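The abstract above says that canonical positions and momenta are combined into complex-valued classical coordinates that can be transformed unitarily. The following is only a minimal sketch of that idea, not the authors' formulation: it assumes the textbook harmonic-oscillator convention $z = \sqrt{m\omega/2}\,q + i\,p/\sqrt{2m\omega}$ (with $\hbar = 1$), and every function name here is hypothetical.

```python
import math


def to_complex(q, p, m=1.0, omega=1.0):
    """Combine position q and momentum p into one complex coordinate z.

    Assumed convention (not taken from the paper):
    z = sqrt(m*omega/2) * q + i * p / sqrt(2*m*omega).
    """
    return math.sqrt(m * omega / 2.0) * q + 1j * p / math.sqrt(2.0 * m * omega)


def from_complex(z, m=1.0, omega=1.0):
    """Invert to_complex: recover (q, p) from z."""
    q = z.real * math.sqrt(2.0 / (m * omega))
    p = z.imag * math.sqrt(2.0 * m * omega)
    return q, p


def apply_matrix(U, z):
    """Matrix-vector product U @ z for a small list-of-lists matrix."""
    return [sum(U[i][j] * z[j] for j in range(len(z))) for i in range(len(U))]


def dagger(U):
    """Conjugate transpose of a square matrix."""
    n = len(U)
    return [[complex(U[j][i]).conjugate() for j in range(n)] for i in range(n)]


def rotation(theta):
    """A real 2x2 rotation, used here as a simple example of a unitary."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def harmonic_energy(z, omega=1.0):
    """Energy omega * sum |z_j|^2 of degenerate harmonic modes."""
    return omega * sum(abs(zj) ** 2 for zj in z)
```

Under this convention the harmonic energy $\omega \sum_j |z_j|^2$ equals $\sum_j \left(p_j^2/2m + m\omega^2 q_j^2/2\right)$, so a unitary change of basis among degenerate modes leaves it unchanged, and applying the conjugate transpose returns the original coordinates.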
# MDDD:クロスオブジェクトおよびクロスセッション脳波を用いた感情認識における非深度伝達学習のための動的分布を用いたマニフォールド型ドメイン適応 MDDD: Manifold-based Domain Adaptation with Dynamic Distribution for Non-Deep Transfer Learning in Cross-subject and Cross-session EEG-based Emotion Recognition ( http://arxiv.org/abs/2404.15615v1 ) ライセンス: Link先を確認	Ting Luo, Jing Zhang, Yingwei Qiu, Li Zhang, Yaohua Hu, Zhuliang Yu, Zhen Liang,	(参考訳) 脳波(EEG)に基づく感情脳-コンピュータインタフェースを用いた感情デコーディングは、感情コンピューティングの分野において重要な領域である。本研究では,動的分布を用いたマニフォールド型ドメイン適応(MDDD)と呼ばれる,新しい非深度伝達学習手法を提案する。提案したMDDDには,多様体特徴変換,動的分布アライメント,分類器学習,アンサンブル学習の4つの主要モジュールが含まれている。データは最適グラスマン多様体空間への変換を行い、ソースとターゲット領域の動的アライメントを可能にする。このプロセスは、その重要性に応じて境界分布と条件分布の両方を優先順位付けし、様々な種類のデータにまたがる適応効率を向上する。分類器学習では、構造的リスク最小化の原理が統合され、ロバストな分類モデルが開発される。これは動的分布アライメントによって補われ、分類器を反復的に洗練する。さらに、アンサンブル学習モジュールは最適化プロセスの異なる段階で得られた分類器を集約し、分類器の多様性を活用して全体的な予測精度を向上させる。実験結果から,MDDDは従来の非深層学習法よりも優れ,3.54%の平均的な改善を実現し,深層学習法に匹敵する結果を得た。これは、MDDDが現実のシナリオにおけるABCIの有用性と適用性を高めるための有望な方法である可能性を示唆している。 Emotion decoding using Electroencephalography (EEG)-based affective brain-computer interfaces represents a significant area within the field of affective computing. In the present study, we propose a novel non-deep transfer learning method, termed as Manifold-based Domain adaptation with Dynamic Distribution (MDDD). The proposed MDDD includes four main modules: manifold feature transformation, dynamic distribution alignment, classifier learning, and ensemble learning. The data undergoes a transformation onto an optimal Grassmann manifold space, enabling dynamic alignment of the source and target domains. This process prioritizes both marginal and conditional distributions according to their significance, ensuring enhanced adaptation efficiency across various types of data. In the classifier learning, the principle of structural risk minimization is integrated to develop robust classification models. This is complemented by dynamic distribution alignment, which refines the classifier iteratively. 
Additionally, the ensemble learning module aggregates the classifiers obtained at different stages of the optimization process, which leverages the diversity of the classifiers to enhance the overall prediction accuracy. The experimental results indicate that MDDD outperforms traditional non-deep learning methods, achieving an average improvement of 3.54%, and is comparable to deep learning methods. This suggests that MDDD could be a promising method for enhancing the utility and applicability of aBCIs in real-world scenarios.	翻訳日:2024-04-25 14:43:50 公開日:2024-04-24
# 双方向量子探索アルゴリズム A Bi-directional Quantum Search Algorithm ( http://arxiv.org/abs/2404.15616v1 ) ライセンス: Link先を確認	Debanjan Konar, Zain Hafeez, Vaneet Aggarwal,	(参考訳) 各種の部分グローバー探索を含むグローバーの探索アルゴリズムは、量子ビット数の増加に伴って反復回数が増え、実装の計算コストが高くなるというスケーリングの問題を抱えている。本稿では、部分グローバー探索アルゴリズムと双方向探索を組み合わせ、Bi-Directional Grover Search(BDGS)と呼ぶ高速なグローバー量子探索アルゴリズムを構築する。部分グローバー探索に双方向探索の手法を組み込み、初期状態と1つのマーク付き状態の双方から並列に探索を開始する。本稿では、通常のグローバー探索および $\frac{\pi}{4}\sqrt{N}\sqrt{1-\frac{1}{b}}$ 回の反復を要する部分グローバー探索(PGS)に対し、提案手法が $\frac{\pi}{4\sqrt{2}}\sqrt{N}(1-\sqrt{\frac{1}{b^{r/2k}}})$ 回の反復を要することを示した(ここで、要素数は $N=2^r$、$b$ は部分探索の分岐係数、$k=\lceil\log_2 b\rceil$ である)。提案したBDGSアルゴリズムを、2〜20量子ビットについて最先端のDepth-First Grover's Search(DFGS)および一般的なGrover's Search(GS)の実装と比較評価し、有望な結果を得た。提案したBDGSアルゴリズムのQiskit Python実装はGitHubで公開されている(https://github.com/hafeezzwiz21/DFGS-BDGS)。 Grover's search algorithms, including various partial Grover searches, experience scaling problems as the number of iterations rises with increased qubits, making implementation more computationally expensive. This paper combines Partial Grover's search algorithm and Bi-directional Search to create a fast Grover's quantum search algorithm, referred to as Bi-Directional Grover Search (BDGS). We incorporated a bi-directional search tactic with a partial Grover search, starting from an initial state and a single marked state in parallel. We have shown in this article that our novel approach requires $\frac{\pi}{4\sqrt{2}}\sqrt{N}(1-\sqrt{\frac{1}{b^{r/2k}}})$ iterations over regular Grover Search and Partial Grover Search (PGS), which takes $\frac{\pi}{4}\sqrt{N}\sqrt{1-\frac{1}{b}}$ (here, $N=2^r$ elements, $b$ is the branching factor of partial search, and $k=\lceil\log_2 b\rceil$). 
The proposed BDGS algorithm is benchmarked against the state-of-the-art Depth-First Grover's Search (DFGS) and generic Grover's Search (GS) implementations for $2$ to $20$ qubits and provides promising results. The Qiskit Python implementation of the proposed BDGS algorithm is available on GitHub (https://github.com/hafeezzwiz21/DFGS-BDGS).	翻訳日:2024-04-25 14:43:50 公開日:2024-04-24
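To get a feel for the iteration counts quoted in this abstract, the sketch below simply evaluates the three expressions as printed. It is not the authors' Qiskit implementation. Two readings are assumptions: the exponent $b^{r/2k}$ is taken to mean $b^{r/(2k)}$, and the standard Grover count is taken as $\frac{\pi}{4}\sqrt{N}$. The function names are hypothetical.

```python
import math


def grover_iterations(r):
    """Standard Grover estimate (pi/4) * sqrt(N) with N = 2**r."""
    return (math.pi / 4.0) * math.sqrt(2 ** r)


def pgs_iterations(r, b):
    """Partial Grover Search expression as printed in the abstract."""
    return (math.pi / 4.0) * math.sqrt(2 ** r) * math.sqrt(1.0 - 1.0 / b)


def bdgs_iterations(r, b):
    """BDGS expression as printed, reading b^{r/2k} as b**(r / (2*k)).

    Requires b >= 2 so that k = ceil(log2(b)) is at least 1.
    """
    k = math.ceil(math.log2(b))
    shrink = 1.0 - math.sqrt(1.0 / b ** (r / (2.0 * k)))
    return (math.pi / (4.0 * math.sqrt(2.0))) * math.sqrt(2 ** r) * shrink


if __name__ == "__main__":
    for r in (4, 10, 20):
        print(r, grover_iterations(r), pgs_iterations(r, 4), bdgs_iterations(r, 4))
```

For example, with $r = 10$ and $b = 4$ the BDGS expression evaluates below the PGS expression, which in turn is below the standard Grover estimate. This checks only the arithmetic of the formulas, not the behaviour of any quantum circuit.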
# DPO:差分強化学習と最適構成探索への応用 DPO: Differential reinforcement learning with application to optimal configuration search ( http://arxiv.org/abs/2404.15617v1 ) ライセンス: Link先を確認	Chandrajit Bajaj, Minh Nguyen,	(参考訳) 連続状態と行動空間を持つ強化学習(RL)は、この分野で最も難しい問題の一つである。現在の学習手法の多くは、学習者にとって最適な戦略を導き出すために、値関数のような積分的アイデンティティに焦点を当てている。そこで本論文では,従来のRL定式化の二重形式について検討し,限られたトレーニングサンプルと短いエピソードで設定を処理できる最初の微分RLフレームワークを提案する。本手法では,局所移動演算子によって符号化されたポリシーを最適化する,ポイントワイドかつステージワイドな反復手法である差分ポリシー最適化(DPO)を導入する。 DPO に対する点収束推定を証明し、現在の理論的研究に匹敵する後悔を与える。このようなポイントワイズ推定は、学習されたポリシーが異なるステップで最適な経路に均一に一致することを保証します。次に、DPOをラグランジアン報酬を用いた最適構成を求める実用的RL問題のクラスに適用する。 DPOは実装が容易で、拡張性があり、いくつかのRL手法に対するベンチマーク実験の競合結果を示す。 Reinforcement learning (RL) with continuous state and action spaces remains one of the most challenging problems within the field. Most current learning methods focus on integral identities such as value functions to derive an optimal strategy for the learning agent. In this paper, we instead study the dual form of the original RL formulation to propose the first differential RL framework that can handle settings with limited training samples and short-length episodes. Our approach introduces Differential Policy Optimization (DPO), a pointwise and stage-wise iteration method that optimizes policies encoded by local-movement operators. We prove a pointwise convergence estimate for DPO and provide a regret bound comparable with current theoretical works. Such pointwise estimate ensures that the learned policy matches the optimal path uniformly across different steps. We then apply DPO to a class of practical RL problems which search for optimal configurations with Lagrangian rewards. DPO is easy to implement, scalable, and shows competitive results on benchmarking experiments against several popular RL methods.	翻訳日:2024-04-25 14:43:50 公開日:2024-04-24
# ニューラル演算子によるパラメトリック偏微分方程式の確率解のためのガウス過程の枠組み Neural Operator induced Gaussian Process framework for probabilistic solution of parametric partial differential equations ( http://arxiv.org/abs/2404.15618v1 ) ライセンス: Link先を確認	Sawan Kumar, Rajdip Nayek, Souvik Chakraborty,	(参考訳) ニューラル作用素の研究は、従来の方法と比べて効率的に偏微分方程式(PDE)を解く手法の開発に道を開いた。しかしながら、既存のニューラル作用素のほとんどは予測に対する不確実性の尺度を提供できず、これは利用可能なデータが限られたデータ駆動のシナリオでは特に重要な側面である。本研究では、ガウス過程(GP)の確率的特性を活用しつつ作用素学習の学習能力を生かす、新しいニューラル作用素誘導ガウス過程(NOGaP)を提案する。提案フレームワークは予測精度を向上させ、不確実性の定量的な尺度を提供する。提案フレームワークは、バーガース方程式、ダルシー流、非同次ポアソン方程式、波動移流方程式など、さまざまなPDEの例を用いた実験を通じて広く評価されている。さらに、NOGaPの利点を明らかにするために、最先端の作用素学習アルゴリズムとの比較検討を行った。その結果、優れた精度と期待どおりの不確実性特性が示され、提案フレームワークの有望な可能性が示唆された。 The study of neural operators has paved the way for the development of efficient approaches for solving partial differential equations (PDEs) compared with traditional methods. However, most of the existing neural operators lack the capability to provide uncertainty measures for their predictions, a crucial aspect, especially in data-driven scenarios with limited available data. In this work, we propose a novel Neural Operator-induced Gaussian Process (NOGaP), which exploits the probabilistic characteristics of Gaussian Processes (GPs) while leveraging the learning prowess of operator learning. The proposed framework leads to improved prediction accuracy and offers a quantifiable measure of uncertainty. The proposed framework is extensively evaluated through experiments on various PDE examples, including Burgers' equation, Darcy flow, non-homogeneous Poisson, and wave-advection equations. Furthermore, a comparative study with state-of-the-art operator learning algorithms is presented to highlight the advantages of NOGaP. The results demonstrate superior accuracy and expected uncertainty characteristics, suggesting the promising potential of the proposed framework.	翻訳日:2024-04-25 14:43:50 公開日:2024-04-24
# メムリスタに基づくニューラルネットワーク性能向上のためのレイヤアンサンブル平均化 Layer Ensemble Averaging for Improving Memristor-Based Artificial Neural Network Performance ( http://arxiv.org/abs/2404.15621v1 ) ライセンス: Link先を確認	Osama Yousuf, Brian Hoskins, Karthick Ramu, Mitchell Fream, William A. Borders, Advait Madhavan, Matthew W. Daniels, Andrew Dienstfrey, Jabez J. McClelland, Martin Lueker-Boden, Gina C. Adam,	(参考訳) 人工ニューラルネットワークは、スケーリング次元によって進歩してきたが、フォン・ノイマンのボトルネックにより、従来の計算は非効率に直面する。 memristorsのようなインメモリの計算アーキテクチャは、ハードウェアの非理想性によって、将来性はあるが課題に直面している。この研究は、ソフトウェアから新興メモリデバイスの欠陥ハードウェアクロスバーに事前学習されたニューラルネットワークソリューションをマッピングし、推論でほぼソフトウェアに近い性能を確実に達成する手法であるレイヤアンサンブル平均化を提案し、実験的に実証する。この手法は、ネットワークが学習した情報を破滅的に忘れることなく新しいタスクを学習しなければならない連続学習問題に対して、カスタム2万台のハードウェアプロトタイピングプラットフォームを用いて検討する。その結果、レイヤマッピングに必要なデバイス数を交換することで、レイヤアンサンブル平均化は、ソフトウェアベースラインに欠陥のあるメモリネットワーク性能を確実に向上させることができることがわかった。本研究では,提案手法を用いて,平均マルチタスク分類精度を61 %から72 %(ソフトウェアベースラインの1 %)に改善する。 Artificial neural networks have advanced due to scaling dimensions, but conventional computing faces inefficiency due to the von Neumann bottleneck. In-memory computation architectures, like memristors, offer promise but face challenges due to hardware non-idealities. This work proposes and experimentally demonstrates layer ensemble averaging, a technique to map pre-trained neural network solutions from software to defective hardware crossbars of emerging memory devices and reliably attain near-software performance on inference. The approach is investigated using a custom 20,000-device hardware prototyping platform on a continual learning problem where a network must learn new tasks without catastrophically forgetting previously learned information. Results demonstrate that by trading off the number of devices required for layer mapping, layer ensemble averaging can reliably boost defective memristive network performance up to the software baseline. 
For the investigated problem, the average multi-task classification accuracy improves from 61 % to 72 % (< 1 % of software baseline) using the proposed approach.	翻訳日:2024-04-25 14:43:50 公開日:2024-04-24
# FR-NAS: 効率的なニューラルネットワーク探索のためのフォワード・アンド・リバースグラフ予測器 FR-NAS: Forward-and-Reverse Graph Predictor for Efficient Neural Architecture Search ( http://arxiv.org/abs/2404.15622v1 ) ライセンス: Link先を確認	Haoming Zhang, Ran Cheng,	(参考訳) ニューラルネットワーク探索(NAS)は、特定のタスクに適したディープニューラルネットワークの最適構成を特定するための重要なツールとして登場した。しかし、多数のアーキテクチャのトレーニングと評価は、かなりの計算オーバーヘッドをもたらす。これを軽減する方法のひとつにパフォーマンス予測器があり、徹底的なトレーニングをせずにアーキテクチャのポテンシャルを見積もる手段を提供する。ニューラルネットワークは、基本的にDAG(Directed Acyclic Graphs)に類似しているため、グラフニューラルネットワーク(GNN)はそのような予測タスクにおいて明らかな選択となる。それでも、トレーニングデータの不足は、GNNベースの予測器の精度に影響を与える可能性がある。そこで我々はNASのための新しいGNN予測器を提案する。この予測器は、従来のグラフビューと逆グラフビューを組み合わせることで、ニューラルネットワークをベクトル表現に変換する。さらに、GNN予測器にカスタマイズされたトレーニング損失を組み込んで、両タイプの表現の効率的な利用を確実にする。その後,NAS-Bench-101,NAS-Bench-201,DARTS検索空間などのベンチマークデータセットを用いて,50から400サンプルのトレーニングデータセットを用いて評価を行った。先行するGNN予測器と比較して、実験結果は予測精度が3%～16%向上し、予測精度が大幅に向上した。ソースコードはhttps://github.com/EMI-Group/fr-nasで入手できる。 Neural Architecture Search (NAS) has emerged as a key tool in identifying optimal configurations of deep neural networks tailored to specific tasks. However, training and assessing numerous architectures introduces considerable computational overhead. One method to mitigating this is through performance predictors, which offer a means to estimate the potential of an architecture without exhaustive training. Given that neural architectures fundamentally resemble Directed Acyclic Graphs (DAGs), Graph Neural Networks (GNNs) become an apparent choice for such predictive tasks. Nevertheless, the scarcity of training data can impact the precision of GNN-based predictors. To address this, we introduce a novel GNN predictor for NAS. This predictor renders neural architectures into vector representations by combining both the conventional and inverse graph views. Additionally, we incorporate a customized training loss within the GNN predictor to ensure efficient utilization of both types of representations. 
We subsequently assessed our method through experiments on benchmark datasets including NAS-Bench-101, NAS-Bench-201, and the DARTS search space, with a training dataset ranging from 50 to 400 samples. Benchmarked against leading GNN predictors, the experimental results showcase a significant improvement in prediction accuracy, with a 3% to 16% increase in Kendall-tau correlation. Source code is available at https://github.com/EMI-Group/fr-nas.	翻訳日:2024-04-25 14:43:50 公開日:2024-04-24
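Two ingredients named in this abstract are easy to illustrate: the reverse view of an architecture graph, and the Kendall-tau rank correlation used to score a predictor. The sketch below is not the FR-NAS code. It assumes the "inverse graph view" is the DAG with every edge direction flipped (the transposed adjacency matrix), and it uses the simple tau-a variant that ignores ties. Both function names are hypothetical.

```python
def reverse_view(adj):
    """Return the adjacency matrix of the graph with all edges reversed.

    adj[i][j] == 1 means an edge i -> j; the reverse view is the transpose.
    """
    n = len(adj)
    return [[adj[j][i] for j in range(n)] for i in range(n)]


def kendall_tau(predicted, actual):
    """Kendall tau-a between two equally long score lists.

    Counts concordant minus discordant pairs over all n*(n-1)/2 pairs;
    tied pairs contribute zero.
    """
    n = len(predicted)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (predicted[i] - predicted[j]) * (actual[i] - actual[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    # A 3-node chain 0 -> 1 -> 2 and its reversed counterpart.
    chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
    print(reverse_view(chain))
    # Predicted vs. measured accuracies for four hypothetical architectures.
    print(kendall_tau([0.91, 0.88, 0.93, 0.90], [0.92, 0.87, 0.94, 0.89]))
```

A predictor that ranks architectures in exactly the true order scores 1, and one that ranks them in exactly the opposite order scores -1. A rank correlation suits this setting because a search procedure mainly needs the relative ordering of candidate architectures rather than exact accuracy values.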
# 分子グラフにおけるOOD検出の最適化:拡散モデルを用いた新しいアプローチ Optimizing OOD Detection in Molecular Graphs: A Novel Approach with Diffusion Models ( http://arxiv.org/abs/2404.15625v1 ) ライセンス: Link先を確認	Xu Shen, Yili Wang, Kaixiong Zhou, Shirui Pan, Xin Wang,	(参考訳) オープンワールドのテストデータセットには、デプロイされたモデルが正確な予測に苦労する分布外(OOD)サンプルが混在していることが多い。従来の検出方法は、同じ表現学習モデルを共有するため、OOD検出と分布内(ID)分類の性能をトレードオフする必要がある。本研究では、入力分子と再構成グラフの類似性を比較する、補助的な拡散モデルに基づくフレームワークを用いてOOD分子を検出することを提案する。生成モデルにはIDトレーニングサンプルを再構成する方向へのバイアスがあるため、OOD分子の類似度スコアははるかに低くなり、検出が容易になる。概念的には単純であるが、この素朴なフレームワークを実用的な検出アプリケーションに拡張することは、2つの重要な課題によって制限されている。第1に、ユークリッド距離に基づく一般的な類似度指標は、複雑なグラフ構造を考慮できない。第2に、反復的なデノイジングステップを含む生成モデルは、特に膨大な薬物候補のプールに対して実行する場合に時間を要する。これらの課題に対処するため、本研究はPGR-MOODと呼ぶ分子OOD検出のためのプロトタイプグラフ再構成のアプローチを開拓し、次の3つのイノベーションに基づいている。 i) 入力分子と再構成分子の一致度を総合的に定量化する有効な指標、 ii) IDに沿いつつOODからは離れたプロトタイプグラフを構築する創造的なグラフ生成器、 iii) テストサンプルと事前に構築したプロトタイプグラフとの類似性を比較し、新しい分子ごとの生成過程を省略する、効率的でスケーラブルなOOD検出器。 10のベンチマークデータセットと6つのベースラインを用いた大規模な実験を行い、提案手法の優位性を実証した。 The open-world test dataset is often mixed with out-of-distribution (OOD) samples, where the deployed models will struggle to make accurate predictions. Traditional detection methods need to trade off OOD detection and in-distribution (ID) classification performance since they share the same representation learning model. In this work, we propose to detect OOD molecules by adopting an auxiliary diffusion model-based framework, which compares similarities between input molecules and reconstructed graphs. Due to the generative bias towards reconstructing ID training samples, the similarity scores of OOD molecules will be much lower to facilitate detection. Although it is conceptually simple, extending this vanilla framework to practical detection applications is still limited by two significant challenges. First, the popular similarity metrics based on Euclidean distance fail to consider the complex graph structure. Second, the generative model involving iterative denoising steps is time-consuming especially when it runs on the enormous pool of drugs. 
To address these challenges, we pioneer an approach called Prototypical Graph Reconstruction for Molecular OOD Detection, dubbed PGR-MOOD, which hinges on three innovations: i) an effective metric to comprehensively quantify the matching degree of input and reconstructed molecules; ii) a creative graph generator to construct prototypical graphs that are in line with ID but away from OOD; iii) an efficient and scalable OOD detector that compares the similarity between test samples and pre-constructed prototypical graphs, omitting the generative process for every new molecule. Extensive experiments on ten benchmark datasets and six baselines are conducted to demonstrate our superiority.	翻訳日:2024-04-25 14:43:50 公開日:2024-04-24
# ロバストスパイクニューラルネットワークトレーニングのための生物学的インフォームド励起と抑制バランス Biologically-Informed Excitatory and Inhibitory Balance for Robust Spiking Neural Network Training ( http://arxiv.org/abs/2404.15627v1 ) ライセンス: Link先を確認	Joseph A. Kilgore, Jeffrey D. Kopsick, Giorgio A. Ascoli, Gina C. Adam,	(参考訳) 脳の生物学的制約からインスピレーションを得るニューラルネットワークは、人工知能のエネルギー効率の良いパラダイムを約束する。しかし、これらのネットワークを堅牢にトレーニングする指針の特定には課題がある。さらに、興奮性および抑制性接続の生物学的制約を組み込む場合、トレーニングはさらに難しい問題となる。本研究では,AI関連データセット上で,刺激ニューロンと抑制ニューロンの比率の異なるスパイキングネットワークを訓練する全体的な能力を決定する,低初期発火率や多様な抑制スパイキングパターンなど,いくつかの重要な要因を同定する。その結果、生物学的に現実的な80:20の興奮性:抑制的バランスを持つネットワークは、低活動レベルおよび騒音環境下で確実にトレーニングできることがわかった。さらに、スパイクトレイン同期の尺度であるヴァン・ロッシャム距離は、ノイズに対するネットワークの堅牢性を高めるために阻害ニューロンの重要性についての洞察を与える。この研究は、生物学的にインフォームドされた大規模ネットワークとエネルギー効率の良いハードウェアの実装をサポートする。 Spiking neural networks drawing inspiration from biological constraints of the brain promise an energy-efficient paradigm for artificial intelligence. However, challenges exist in identifying guiding principles to train these networks in a robust fashion. In addition, training becomes an even more difficult problem when incorporating biological constraints of excitatory and inhibitory connections. In this work, we identify several key factors, such as low initial firing rates and diverse inhibitory spiking patterns, that determine the overall ability to train spiking networks with various ratios of excitatory to inhibitory neurons on AI-relevant datasets. The results indicate networks with the biologically realistic 80:20 excitatory:inhibitory balance can reliably train at low activity levels and in noisy environments. Additionally, the Van Rossum distance, a measure of spike train synchrony, provides insight into the importance of inhibitory neurons to increase network robustness to noise. This work supports further biologically-informed large-scale networks and energy efficient hardware implementations.	翻訳日:2024-04-25 14:43:50 公開日:2024-04-24
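The abstract above uses the Van Rossum distance as a measure of how similar two spike trains are. As a rough illustration only (the paper's kernel, time constant, and normalization are not given here, so those choices are assumptions), the sketch below uses the common definition with a causal exponential kernel $e^{-t/\tau}$ and the closed form $D^2 = \frac{1}{2}\left[\sum_{i,j} e^{-|t_i - t_j|/\tau} + \sum_{k,l} e^{-|s_k - s_l|/\tau} - 2\sum_{i,k} e^{-|t_i - s_k|/\tau}\right]$. The function name is hypothetical.

```python
import math


def van_rossum_distance(train_a, train_b, tau=10.0):
    """Van Rossum distance between two lists of spike times.

    Assumes a causal exponential kernel with time constant tau (same units
    as the spike times) and the normalization D^2 = (1/tau) * integral of
    the squared difference of the filtered trains, evaluated in closed form.
    """
    def cross(u, v):
        # Sum of kernel overlaps between every spike in u and every spike in v.
        return sum(math.exp(-abs(s - t) / tau) for s in u for t in v)

    d_squared = 0.5 * (
        cross(train_a, train_a) + cross(train_b, train_b) - 2.0 * cross(train_a, train_b)
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(d_squared, 0.0))


if __name__ == "__main__":
    print(van_rossum_distance([5.0, 20.0, 42.0], [6.0, 21.0, 40.0]))
    print(van_rossum_distance([5.0, 20.0, 42.0], [70.0, 95.0]))
```

With this normalization, identical trains have distance 0, adding or removing a single isolated spike gives $D^2 = 1/2$, and two single spikes separated by much more than $\tau$ give $D \approx 1$. Small $\tau$ makes the measure sensitive to precise spike timing, while large $\tau$ makes it behave more like a comparison of spike counts.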
# SQIAsignHD: SQIsignHDアダプタ署名 SQIAsignHD: SQIsignHD Adaptor Signature ( http://arxiv.org/abs/2404.09026v3 ) ライセンス: Link先を確認	Farzin Renan, Péter Kutas,	(参考訳) 適応シグネチャは、秘密のランダム性をシグネチャ内に隠した標準的なデジタルシグネチャスキームの一般化形式と見なすことができる。アダプタシグネチャは最近の暗号プリミティブであり、暗号通貨などのブロックチェーンアプリケーションにおいて、オンチェーンコストを削減し、ファジビリティを改善し、支払いチャネルネットワーク、決済チャネルハブ、アトミックスワップにおけるオフチェーン形式の支払いに寄与する重要なツールになりつつある。しかし、現在使われているアダプタシグネチャ構造は、ショアのアルゴリズムにより量子逆数に対して脆弱である。本研究では,超特異楕円曲線の等質性に基づく新しい量子抵抗型アダプタシグネチャスキームである$\mathsf{SQIAsignHD}$を導入し,その基礎となるシグネチャスキームとしてSQIsignHDを用い,超特異なDiffie-Hellmanキー交換プロトコルであるSIDHの人工配向の考え方を活用する。さらに、量子ランダムオラクルモデル(QROM)において、我々のスキームが安全であることを示します。 Adaptor signatures can be viewed as a generalized form of the standard digital signature schemes where a secret randomness is hidden within a signature. Adaptor signatures are a recent cryptographic primitive and are becoming an important tool for blockchain applications such as cryptocurrencies to reduce on-chain costs, improve fungibility, and contribute to off-chain forms of payment in payment-channel networks, payment-channel hubs, and atomic swaps. However, currently used adaptor signature constructions are vulnerable to quantum adversaries due to Shor's algorithm. In this work, we introduce $\mathsf{SQIAsignHD}$, a new quantum-resistant adaptor signature scheme based on isogenies of supersingular elliptic curves, using SQIsignHD - as the underlying signature scheme - and exploiting the idea of the artificial orientation on the supersingular isogeny Diffie-Hellman key exchange protocol, SIDH, as the underlying hard relation. We, furthermore, show that our scheme is secure in the Quantum Random Oracle Model (QROM).	翻訳日:2024-04-25 12:48:39 公開日:2024-04-24
# シュロディンガー化によるフォッカー・プランク方程式の量子シミュレーション Quantum simulation of the Fokker-Planck equation via Schrodingerization ( http://arxiv.org/abs/2404.13585v2 ) ライセンス: Link先を確認	Shi Jin, Nana Liu, Yue Yu,	(参考訳) 本稿では,Fokker-Planck方程式を解くための量子シミュレーション手法について述べる。従来の半離散化法は、基礎となるハミルトン力学の保存に失敗することが多く、特に境界条件を組み込んだ場合、ハミルトン構造を変更することもある。我々は、シュロディンガー化法(Schrodingerization method)を用いて、非エルミート力学を持つ任意の線型偏微分方程式をシュロディンガー型方程式系に変換する。この応用をフォッカー・プランク方程式の2つの異なる形式で検討する。保存形態について、半離散化に基づくシュロディンガー化は特に非周期境界条件を扱う際に好ましいことを示す。さらに、係数行列や微分作用素の実部において正の固有値を持つ不安定系に対するシュロディンガー化法を解析する。本分析により,シュロディンガー化の直接的利用は安定化法と同じ効果を有することが明らかとなった。熱方程式の形式として,時間分割法に基づく量子シミュレーション手法を提案する。シュロディンガー化法における演算子分割と元の問題への直接適用の関係を考察し、シュロディンガー化法が各ステップにおける時間分割解を正確に再現する方法について述べる。さらに、シフト演算子を用いた熱方程式形式の有限差分離散化について検討する。フーリエ基底を用いてシフト演算子を対角化し、周波数空間の効率的なシミュレーションを可能にする。対角ユニタリ作用素の実装に関する追加のガイダンスを提供することで、ベル基底とフーリエ基底における対角化の比較分析を行い、前者は後者よりも一般に高い効率を示すことを示す。 This paper studies a quantum simulation technique for solving the Fokker-Planck equation. Traditional semi-discretization methods often fail to preserve the underlying Hamiltonian dynamics and may even modify the Hamiltonian structure, particularly when incorporating boundary conditions. We address this challenge by employing the Schrodingerization method-it converts any linear partial and ordinary differential equation with non-Hermitian dynamics into systems of Schrodinger-type equations. We explore the application in two distinct forms of the Fokker-Planck equation. For the conservation form, we show that the semi-discretization-based Schrodingerization is preferable, especially when dealing with non-periodic boundary conditions. Additionally, we analyze the Schrodingerization approach for unstable systems that possess positive eigenvalues in the real part of the coefficient matrix or differential operator. Our analysis reveals that the direct use of Schrodingerization has the same effect as a stabilization procedure. 
For the heat equation form, we propose a quantum simulation procedure based on the time-splitting technique. We discuss the relationship between operator splitting in the Schrodingerization method and its application directly to the original problem, illustrating how the Schrodingerization method accurately reproduces the time-splitting solutions at each step. Furthermore, we explore finite difference discretizations of the heat equation form using shift operators. Utilizing Fourier bases, we diagonalize the shift operators, enabling efficient simulation in the frequency space. Providing additional guidance on implementing the diagonal unitary operators, we conduct a comparative analysis between diagonalizations in the Bell and the Fourier bases, and show that the former generally exhibits greater efficiency than the latter.	翻訳日:2024-04-25 12:48:39 公開日:2024-04-24
# テキスト意味論に基づく適応型プロンプト学習とユニバーサルマルチソースドメイン適応のための不確実性モデリング Adaptive Prompt Learning with Negative Textual Semantics and Uncertainty Modeling for Universal Multi-Source Domain Adaptation ( http://arxiv.org/abs/2404.14696v2 ) ライセンス: Link先を確認	Yuxiang Yang, Lu Wen, Yuanyuan Xu, Jiliu Zhou, Yan Wang,	(参考訳) Universal Multi-source Domain Adaptation (UniMDA)は、複数のラベル付きソースドメインからの知識を、ドメインシフト(差分データ分散)とクラスシフト(未知のターゲットクラス)の下でラベル付けされていないターゲットドメインに転送する。既存のソリューションでは、未知のサンプルを検出するための画像特徴の発掘に重点を置いており、テキストセマンティクスに含まれる豊富な情報を無視している。本論文では,UniMDA分類タスクに対して,言語-画像事前学習(APNE-CLIP)に基づく負のテキストセマンティクスと不確実性モデリングを用いた適応型プロンプト学習を提案する。具体的には、CLIPを利用して、クラスセマンティクスとドメイン表現のテキスト情報を活用することで、未知のサンプルを特定し、ドメインシフトに対処する。さらに、より正確な画像とテキストのペアアライメントを実現するために、負のテキストセマンティクスを利用して、新しいグローバルなインスタンスレベルのアライメントを設計する。さらに,未知試料と未知試料とのマージン距離を拡大するエネルギーベース不確実性モデリング手法を提案する。大規模実験により提案手法の優位性を実証した。 Universal Multi-source Domain Adaptation (UniMDA) transfers knowledge from multiple labeled source domains to an unlabeled target domain under domain shifts (different data distribution) and class shifts (unknown target classes). Existing solutions focus on excavating image features to detect unknown samples, ignoring abundant information contained in textual semantics. In this paper, we propose an Adaptive Prompt learning with Negative textual semantics and uncErtainty modeling method based on Contrastive Language-Image Pre-training (APNE-CLIP) for UniMDA classification tasks. Concretely, we utilize the CLIP with adaptive prompts to leverage textual information of class semantics and domain representations, helping the model identify unknown samples and address domain shifts. Additionally, we design a novel global instance-level alignment objective by utilizing negative textual semantics to achieve more precise image-text pair alignment. Furthermore, we propose an energy-based uncertainty modeling strategy to enlarge the margin distance between known and unknown samples. Extensive experiments demonstrate the superiority of our proposed method.	
翻訳日:2024-04-25 12:48:39 公開日:2024-04-24

# Review Helpfulness Scores vs. Review Unhelpfulness Scores: Two Sides of the Same Coin or Different Coins?

Review Helpfulness Scores vs. Review Unhelpfulness Scores: Two Sides of the Same Coin or Different Coins? ( http://arxiv.org/abs/2407.05207v1 )

License: see the linked page

Yinan Yu, Dominik Gutt, Warut Khern-am-nuai,

(参考訳) オンラインレビューの有用性を評価することは、大量のオンラインレビューを精査しなければならない消費者を支援する。オンラインレビュープラットフォームは、レビューが有用かどうかをユーザが評価できるレビュー評価システムを採用しており、これらの評価はレビュー読者を支援し、レビューコントリビュータを奨励する。本報告では, 文献的有用度スコアは広く研究されているが, 文献的知識が乏しく, 不健康度スコアが欠落している。文献のこのギャップに対処することが重要であるのは、研究者や実践者が、不完全なスコアは本質的なレビューの特徴によって駆動され、そのようなスコアは品質の低いレビューと関連していると仮定しているからである。本研究では、この従来の知恵を、不健康スコアに影響を与える要因を調べることによって検証する。本研究は, 検査有用度スコアとは違って, 内在的な評価特性によって, 不完全性スコアが引き起こされることはないこと, ほぼ誰も統計学的に有意な不完全性スコアの予測因子ではないこと, などを見出した。また、レビュー不満足な投票を受けたユーザーは、他のレビューに対して不健康な投票をする傾向にあることもわかりました。最後に、不愉快な有権者は、役立たずの有権者よりもプラットフォームとの関わりがはるかに少ない。以上の結果から,本態性評価は本態性評価の特徴によるものではないことが示唆された。したがって、同じ貨幣の両面として、有益さと無益さのスコアを考慮すべきではない。

Evaluating the helpfulness of online reviews supports consumers who must sift through large volumes of online reviews. Online review platforms have increasingly adopted review evaluation systems, which let users rate whether reviews are helpful or not; in turn, these evaluations assist review readers and encourage review contributors. Although review helpfulness scores have been studied extensively in the literature, our knowledge regarding their counterpart, review unhelpfulness scores, is lacking. Addressing this gap in the literature is important because researchers and practitioners have assumed that unhelpfulness scores are driven by intrinsic review characteristics and that such scores are associated with low-quality reviews. This study tests this conventional wisdom by examining factors that influence unhelpfulness scores. We find that, unlike review helpfulness scores, unhelpfulness scores are generally not driven by intrinsic review characteristics, as almost none of these characteristics are statistically significant predictors of an unhelpfulness score. We also find that users who receive review unhelpfulness votes are more likely to cast unhelpfulness votes for other reviews. Finally, unhelpfulness voters engage much less with the platform than helpfulness voters do. In summary, our findings suggest that review unhelpfulness scores are not driven by intrinsic review characteristics. Therefore, helpfulness and unhelpfulness scores should not be considered as two sides of the same coin.

Translated: 2024-07-22 14:29:03 Published: 2024-04-24

# Leveraging AI for Climate Resilience in Africa: Challenges, Opportunities, and the Need for Collaboration

Leveraging AI for Climate Resilience in Africa: Challenges, Opportunities, and the Need for Collaboration ( http://arxiv.org/abs/2407.05210v1 )

License: see the linked page

Rendani Mbuvha, Yassine Yaakoubi, John Bagiliko, Santiago Hincapie Potes, Amal Nammouchi, Sabrina Amrouche,

(参考訳) 気候変動の問題がより厳しくなるにつれて、アフリカにおける彼らの影響は、大陸の固有の課題に合わせた緊急で革新的な解決策を求めている。人工知能(AI)は気候変動の適応と緩和のための重要かつ価値のあるツールとして出現する一方で、その有効性と潜在性は、データの不足、インフラストラクチャのギャップ、限定的なローカルAI開発といった重要な課題を克服する上で欠かせないものである。本稿では,アフリカにおける気候変動適応と緩和におけるAIの役割について考察する。キャパシティの構築、オープンソースのデータレポジトリの開発、文化的にもコンテキスト的にも関係のあるコンテキスト対応で堅牢なAI駆動型気候ソリューションの作成に協力的なアプローチを提唱している。

As climate change issues become more pressing, their impact in Africa calls for urgent, innovative solutions tailored to the continent's unique challenges. While Artificial Intelligence (AI) emerges as a critical and valuable tool for climate change adaptation and mitigation, its effectiveness and potential are contingent upon overcoming significant challenges such as data scarcity, infrastructure gaps, and limited local AI development. This position paper explores the role of AI in climate change adaptation and mitigation in Africa. It advocates for a collaborative approach to build capacity, develop open-source data repositories, and create context-aware, robust AI-driven climate solutions that are culturally and contextually relevant.

Translated: 2024-07-22 14:29:03 Published: 2024-04-24

# Using Artificial Intelligence to Unlock Crowdfunding Success for Small Businesses

Using Artificial Intelligence to Unlock Crowdfunding Success for Small Businesses ( http://arxiv.org/abs/2407.09480v1 )

License: see the linked page

Teng Ye, Jingnan Zheng, Junhui Jin, Jingyi Qiu, Wei Ai, Qiaozhu Mei,

(参考訳) 中小企業はオンラインのクラウドファンディングプラットホームに本質的な資金を提供しようとしているが、これらのキャンペーンの40%以上は、特に低社会経済分野からの資金調達に失敗している。我々は、AI技術の最新の進歩を利用して、クラウドファンディングキャンペーンの成功に影響を及ぼす重要な要因を特定し、これらの要因を戦略的に最適化することで資金調達結果を改善する。我々の最高の機械学習モデルは、主にテキスト記述に基づいて、キャンペーンの81.0%の資金調達結果を正確に予測する。機械学習モデルを解釈することで、キャンペーンを開始する前にテキスト記述を改善するための実用的な提案ができる。大規模な言語モデルを用いて物語の3つの側面を増大させることで、83%の人的評価者よりもキャンペーンの方が好まれるようになり、金融支援を確保する可能性も11.9%向上することを示した。本研究は,中小企業の資金調達キャンペーンにおける説明書作成の効果的な戦略を明らかにするとともに,大規模言語モデルをクラウドファンディング手法に統合する新たな領域を開拓するものである。

While small businesses are increasingly turning to online crowdfunding platforms for essential funding, over 40% of these campaigns may fail to raise any money, especially those from low socio-economic areas. We utilize the latest advancements in AI technology to identify crucial factors that influence the success of crowdfunding campaigns and to improve their fundraising outcomes by strategically optimizing these factors. Our best-performing machine learning model accurately predicts the fundraising outcomes of 81.0% of campaigns, primarily based on their textual descriptions. Interpreting the machine learning model allows us to provide actionable suggestions on improving the textual description before launching a campaign. We demonstrate that by augmenting just three aspects of the narrative using a large language model, a campaign becomes preferred by 83% of human evaluators, and its likelihood of securing financial support increases by 11.9%. Our research uncovers effective strategies for crafting descriptions for small business fundraising campaigns and opens up a new realm in integrating large language models into crowdfunding methodologies.

Translated: 2024-07-22 13:48:17 Published: 2024-04-24

# Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices

Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices ( http://arxiv.org/abs/2406.02562v1 )

License: see the linked page

Gwantae Kim, Bokyeung Lee, Donghyeon Kim, Hanseok Ko,

(参考訳) 近年、モバイルやCPU専用デバイスなどの低スペックデバイスでパーソナライズされた大規模モデルの利用に対する関心が高まっている。しかし、オンデバイスでパーソナライズされた大規模モデルを利用することは非効率であり、時には計算コストのために制限される。そこで本研究では,パラメータ効率のよい微調整法を用いて,デバイス上のモデル重みを最小化する重み分離手法を提案する。さらに、コードスイッチング(code-switching)として知られる発話で複数の言語を話す人もいるため、このようなケースに対処するにはパーソナライズされたASRモデルが必要である。しかし、現在の多言語音声認識モデルは、発話毎に単一の言語を認識することに限定されている。この問題に対処するため,単言語モデルと多言語音声認識モデルを組み合わせたコードスイッチング音声認識モデルを提案する。さらに,パラメータ効率のよい微調整のためのゲートローランク適応(GLoRA)を導入し,性能劣化を最小限に抑えた。韓国語と英語のコードスイッチングデータセットを用いて実験を行い、コードスイッチングのための微調整音声認識モデルが、スクラッチから訓練された従来のコードスイッチング音声認識モデルの性能を上回ることを示した。さらに、GLoRAは従来のLoRAと比較してパラメータ効率の良い微調整性能を向上させる。

In recent times, there has been growing interest in utilizing personalized large models on low-spec devices, such as mobile and CPU-only devices. However, running a personalized large model on-device is inefficient and is sometimes limited by computational cost. To tackle this problem, this paper presents a weight separation method that minimizes on-device model weights using parameter-efficient fine-tuning methods. Moreover, some people speak multiple languages within a single utterance, a practice known as code-switching, so a personalized ASR model is needed to handle such cases. However, current multilingual speech recognition models are limited to recognizing a single language within each utterance. To tackle this problem, we propose code-switching speech recognition models that incorporate fine-tuned monolingual and multilingual speech recognition models. Additionally, we introduce gated low-rank adaptation (GLoRA) for parameter-efficient fine-tuning with minimal performance degradation. Our experiments, conducted on Korean-English code-switching datasets, demonstrate that fine-tuning speech recognition models for code-switching surpasses the performance of traditional code-switching speech recognition models trained from scratch. Furthermore, GLoRA enhances parameter-efficient fine-tuning performance compared to conventional LoRA.

Translated: 2024-07-01 08:10:07 Published: 2024-04-24
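
The idea of keeping a large frozen weight matrix and adding a small gated low-rank update can be sketched in a few lines of plain Python. The abstract does not say how the GLoRA gate is computed, so the scalar sigmoid gate, the names `matvec` and `glora_forward`, and the toy sizes below are all assumptions made for illustration; this is not the authors' implementation.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def glora_forward(x, W, A, B, gate_logit):
    """Frozen projection W @ x plus a gated low-rank update B @ (A @ x).

    The scalar sigmoid gate is an assumed form, chosen only for illustration.
    """
    gate = 1.0 / (1.0 + math.exp(-gate_logit))
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + gate * u for b, u in zip(base, update)]

# Toy sizes (hypothetical): a 64x64 frozen layer with a rank-4 adapter.
d_in, d_out, rank = 64, 64, 4
W = [[0.01 * (i + j) for j in range(d_in)] for i in range(d_out)]
A = [[0.1] * d_in for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]  # zero init: adapter starts as a no-op
x = [1.0] * d_in

y = glora_forward(x, W, A, B, gate_logit=0.0)

frozen_params = d_in * d_out                      # fixed weights of the base layer
adapter_params = rank * d_in + d_out * rank + 1   # low-rank factors plus the gate
```

In this toy setting the adapter holds 513 values against 4,096 frozen ones, and with `B` initialised to zero the layer output is unchanged until the adapter is trained.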

# MatFusion: A Generative Diffusion Model for SVBRDF Capture

MatFusion: A Generative Diffusion Model for SVBRDF Capture ( http://arxiv.org/abs/2406.06539v1 )

License: see the linked page

Sam Sartor, Pieter Peers,

(参考訳) 画像からのSVBRDF推定を拡散タスクとして定式化する。空間的に変化する材料の分布をモデル化するために,我々はまず,空間的に変化する材料の大集合312,165個の非条件SVBRDF拡散バックボーンモデルを訓練する。このSVBRDF拡散バックボーンモデルであるMatFusionは、条件付き拡散モデルを精製し、制御または制御されていない照明下での写真から物質特性を推定する基礎となる。私たちのバックボーンMatFusionモデルは反射率特性の損失のみを用いてトレーニングされるので、トレーニング中にバックプロパゲーションを必要とせずに、より高価なレンダリング手法と組み合わせることができる。条件付きSVBRDF拡散モデルでは,複数のSVBRDF推定値を合成することができる。本手法の柔軟性は,様々な種類の入射光に条件付き異なるSVBRDF拡散モデルを精製することにより実証し,光の同時照射による1枚の写真の場合,既存のSVBRDF推定法と同等かそれ以上の精度が得られることを示す。

We formulate SVBRDF estimation from photographs as a diffusion task. To model the distribution of spatially varying materials, we first train a novel unconditional SVBRDF diffusion backbone model on a large set of 312,165 synthetic spatially varying material exemplars. This SVBRDF diffusion backbone model, named MatFusion, can then serve as a basis for refining a conditional diffusion model to estimate the material properties from a photograph under controlled or uncontrolled lighting. Our backbone MatFusion model is trained using only a loss on the reflectance properties, and therefore refinement can be paired with more expensive rendering methods without the need for backpropagation during training. Because the conditional SVBRDF diffusion models are generative, we can synthesize multiple SVBRDF estimates from the same input photograph from which the user can select the one that best matches the users' expectation. We demonstrate the flexibility of our method by refining different SVBRDF diffusion models conditioned on different types of incident lighting, and show that for a single photograph under colocated flash lighting our method achieves equal or better accuracy than existing SVBRDF estimation methods.

Translated: 2024-07-01 08:00:19 Published: 2024-04-24

# Sign Language Recognition based on YOLOv5 Algorithm for the Telugu Sign Language

Sign Language Recognition based on YOLOv5 Algorithm for the Telugu Sign Language ( http://arxiv.org/abs/2406.10231v1 )

License: see the linked page

Vipul Reddy. P, Vishnu Vardhan Reddy. B, Sukriti,

(参考訳) 手話認識(SLR)技術は、難聴者に対するコミュニケーションとアクセシビリティを向上させるという大きな可能性を秘めている。本稿では、YOLOv5オブジェクト識別フレームワークを用いて、TSL内のジェスチャーを識別する新しい手法を提案する。主な目標は、聴覚障害者コミュニティがslrを使用できるように、TSLジェスチャを特定するための正確で成功した方法を作ることである。その後、YOLOv5を使ってジェスチャーを認識し分類するディープラーニングモデルが作成された。このモデルはYOLOv5アーキテクチャの複雑な手話機能を扱うための高い精度、速度、能力の恩恵を受けている。転送学習のアプローチを利用して、YOLOv5モデルはTSLジェスチャーにカスタマイズされた。最高の結果を得るために、トレーニング中に慎重にパラメータとハイパーパラメータを調整した。 F1スコアと平均平均精度 (mAP) は90.5%と98.1%であり、YOLOv5-mediumモデルは卓越したパフォーマンス指標で際立っている。驚くべきことに、このモデルは計算複雑性とトレーニング時間の間に許容可能なバランスを取り、これらの驚くべき結果を生み出す。精度と効率の十分なブレンドを提供するため、200エポックでトレーニングされたYOLOv5-mediumモデルは、現実のデプロイメントに推奨される選択肢として現れます。各種のTSLジェスチャーおよび設定に対するシステムの安定性と一般化性は厳密なテストと検証によって評価され、精度は著しく向上した。本研究は、深層学習とコンピュータビジョン技術の最先端の応用をTSLジェスチャ識別に適用することにより、言語コミュニティにおけるアクセス可能な技術の発展の基盤となるものである。また、手話認識の分野に対する洞察力のある視点と新しいアプローチも提供する。

Sign language recognition (SLR) technology has enormous promise to improve communication and accessibility for people who are hard of hearing. This paper presents a novel approach for identifying gestures in Telugu Sign Language (TSL) using the YOLOv5 object identification framework. The main goal is to create an accurate and effective method for identifying TSL gestures so that the deaf community can use SLR. A deep learning model was then created that uses YOLOv5 to recognize and classify gestures. This model benefits from the YOLOv5 architecture's high accuracy, speed, and capacity to handle complex sign language features. Using transfer learning, the YOLOv5 model was customized to TSL gestures. To attain the best outcomes, parameters and hyperparameters were carefully adjusted during training. With an F1-score of 90.5% and a mean Average Precision (mAP) of 98.1%, the YOLOv5-medium model stands out for its performance, demonstrating its efficacy in Telugu sign language identification tasks. Notably, the model produces these results while striking an acceptable balance between computational complexity and training time. Because it offers a convincing blend of accuracy and efficiency, the YOLOv5-medium model, trained for 200 epochs, emerges as the recommended choice for real-world deployment. The system's stability and generalizability across various TSL gestures and settings were evaluated through rigorous testing and validation, which yielded outstanding accuracy. This research lays the foundation for future advancements in accessible technology for linguistic communities by providing a cutting-edge application of deep learning and computer vision techniques to TSL gesture identification. It also offers insightful perspectives and novel approaches to the field of sign language recognition.

Translated: 2024-07-01 07:50:27 Published: 2024-04-24
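
For readers unfamiliar with the two metrics quoted above, the following minimal sketch shows how an F1-score and a mean Average Precision are usually computed from simpler quantities. The function names and the input numbers are hypothetical and are not taken from the paper; computing each per-class average precision from a precision-recall curve is left out.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def mean_average_precision(ap_per_class):
    """mAP as the unweighted mean of per-class average precision values."""
    return sum(ap_per_class) / len(ap_per_class)

# Hypothetical inputs, not taken from the paper.
example_f1 = f1_score(precision=0.9, recall=0.6)
example_map = mean_average_precision([1.0, 0.5, 0.75])
```

Because F1 is a harmonic mean, it is pulled toward the lower of precision and recall, which is why a detector can show a high mAP and a noticeably lower F1 at a fixed confidence threshold.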

# BERT vs GPT for financial engineering

BERT vs GPT for financial engineering ( http://arxiv.org/abs/2405.12990v1 )

License: see the linked page

Edward Sharkey, Philip Treleaven,

(参考訳) この論文は、これらのモデルがニュースイベントからどのように感情を判断できるかを示すために、いくつかのTransformerモデル[4]をベンチマークする。この信号は下流のモデリングや商品取引の信号識別に使用できる。細調整されたBERTモデルは,細調整されたGPTモデルやバニラGPTモデルよりも優れていることがわかった。近年、トランスフォーマーモデルは自然言語処理(NLP)の分野に革命をもたらし、機械翻訳、テキスト要約、質問応答、自然言語生成といった様々なタスクで最先端の成果を上げている。最も顕著なトランスモデルとしては、BERT (Bidirectional Encoder Representations from Transformers) とGPT (Generative Pre-Traited Transformer) がある。 CopBERTモデルトレーニングデータとプロセス概要を提供する。 CopBERTモデルはFinBERTのような類似のドメイン固有BERTトレーニングモデルより優れている。以下の混乱行列は、それぞれCopBERTとCopGPTのパフォーマンスを示している。 CopBERT対GPT4ではf1_scoreが約10%増加し、CopGPTでは16%増加しています。 GPT4が主流である一方で、金融工学的なタスクに対するGPTモデルの代替案を検討することの重要性、幻覚のリスク、解釈可能性に関わる課題が強調されている。当然のことながら、より大きなLLMがBERTモデルより優れており、予測能力がある。要約すると、BERTは部分的に新しいXGboostであり、高いレベルの解釈可能性を提供する予測能力に欠けている。 BERTモデルは次のXGboost [2]ではなく、解釈可能性と精度の混合を必要とする金融工学タスクの興味深い代替案である。

The paper benchmarks several Transformer models [4] to show how these models can judge sentiment from a news event. This signal can then be used for downstream modelling and signal identification for commodity trading. We find that fine-tuned BERT models outperform fine-tuned or vanilla GPT models on this task. Transformer models have revolutionized the field of natural language processing (NLP) in recent years, achieving state-of-the-art results on various tasks such as machine translation, text summarization, question answering, and natural language generation. Among the most prominent transformer models are Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer (GPT), which differ in their architectures and objectives. An overview of the CopBERT model's training data and process is provided. The CopBERT model outperforms similar domain-specific BERT models such as FinBERT. The confusion matrices below show the performance of CopBERT and CopGPT respectively. We see a ~10 percent increase in f1_score when comparing CopBERT with GPT4 and a 16 percent increase over CopGPT. While GPT4 is dominant, this highlights the importance of considering alternatives to GPT models for financial engineering tasks, given the risks of hallucinations and the challenges with interpretability. We unsurprisingly see the larger LLMs outperform the BERT models in predictive power. In summary, BERT is partially the new XGBoost: what it lacks in predictive power it makes up for with higher levels of interpretability. We conclude that BERT models might not be the next XGBoost [2], but they represent an interesting alternative for financial engineering tasks that require a blend of interpretability and accuracy.

Translated: 2024-05-27 03:08:05 Published: 2024-04-24

# Lessons from the Use of Natural Language Inference (NLI) in Requirements Engineering Tasks

Lessons from the Use of Natural Language Inference (NLI) in Requirements Engineering Tasks ( http://arxiv.org/abs/2405.05135v1 )

License: see the linked page

Mohamad Fazelnia, Viktoria Koscinski, Spencer Herzog, Mehdi Mirakhorli,

(参考訳) 要求工学タスクの自動化における自然言語推論(NLI)の利用について検討する。特に、要求分類、要求仕様の欠陥の特定、利害関係者の要求における矛盾の検出という3つのタスクに重点を置いています。従来の研究は、自然言語処理タスクの幅広い範囲において、NLIを普遍的な手法として使用するという大きな利点を示してきたが、これらの利点は、ソフトウェア要件工学の文脈内では研究されていない。そこで我々は,要求分析におけるNLIの利用を評価する実験を設計した。我々は,NLIの性能を,プロンプトベースモデル,従来の移動学習,Large Language Models(LLM)を利用したチャットボットモデル,確率モデルなど,様々なアプローチと比較する。従来の学習やゼロショットなど,様々な学習環境下で実施された実験を通じて,NLI法が要求仕様の分析において従来のNLP法や,その他のLLMに基づくチャットボットモデルを上回ることを確定的に実証した。さらに、NLIを要求工学タスクの自動化に適したアプローチにする学習環境を特徴付けるための教訓を共有した。

We investigate the use of Natural Language Inference (NLI) in automating requirements engineering tasks. In particular, we focus on three tasks: requirements classification, identification of requirements specification defects, and detection of conflicts in stakeholders' requirements. While previous research has demonstrated significant benefit in using NLI as a universal method for a broad spectrum of natural language processing tasks, these advantages have not been investigated within the context of software requirements engineering. Therefore, we design experiments to evaluate the use of NLI in requirements analysis. We compare the performance of NLI with a spectrum of approaches, including prompt-based models, conventional transfer learning, chatbot models powered by Large Language Models (LLMs), and probabilistic models. Through experiments conducted under various learning settings, including conventional learning and zero-shot, we demonstrate conclusively that our NLI method surpasses classical NLP methods as well as other LLM-based and chatbot models in the analysis of requirements specifications. Additionally, we share lessons learned that characterize the learning settings that make NLI a suitable approach for automating requirements engineering tasks.

Translated: 2024-05-12 15:40:48 Published: 2024-04-24
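
One common way to use NLI for a classification task such as requirements classification is to turn each candidate label into a hypothesis sentence and keep the label whose hypothesis is best supported by the requirement text. The sketch below illustrates only that framing. The hypothesis templates, the function names, and above all `toy_entailment_score` (a word-overlap stand-in for a trained NLI model) are assumptions for illustration and do not reproduce the paper's setup.

```python
import re

def tokens(text):
    """Lowercase alphabetic tokens of a sentence."""
    return set(re.findall(r"[a-z]+", text.lower()))

def toy_entailment_score(premise, hypothesis):
    """Stand-in scorer: fraction of hypothesis tokens found in the premise.

    A real setup would call a trained NLI model here instead.
    """
    hyp = tokens(hypothesis)
    return len(tokens(premise) & hyp) / len(hyp)

def classify_with_nli(requirement, label_hypotheses, score_fn):
    """Pick the label whose hypothesis is best supported by the requirement."""
    scores = {label: score_fn(requirement, hyp)
              for label, hyp in label_hypotheses.items()}
    return max(scores, key=scores.get), scores

# Hypothetical labels and hypothesis templates.
label_hypotheses = {
    "security": "This requirement is about security.",
    "performance": "This requirement is about performance.",
}
requirement = "The system shall encrypt stored passwords for security."
label, scores = classify_with_nli(requirement, label_hypotheses, toy_entailment_score)
```

Because labels enter only through the hypothesis sentences, new classes can be added without retraining the scorer, which is what makes this framing usable in zero-shot settings.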

# Integrating LSTM and BERT for Long-Sequence Data Analysis in Intelligent Tutoring Systems

Integrating LSTM and BERT for Long-Sequence Data Analysis in Intelligent Tutoring Systems ( http://arxiv.org/abs/2405.05136v1 )

License: see the linked page

Zhaoxing Li, Jujie Yang, Jindi Wang, Lei Shi, Sebastian Stein,

(参考訳) 知識追跡の分野は、学生が過去の行動データを分析することによって、時間とともに学習し、知識をマスターする方法を理解することを目的としている。この目標を達成するために、多くの研究者が、Intelligent Tutoring Systemsのデータを使って学生のその後の行動を予測する知識追跡モデルを提案している。しかし、Intelligent Tutoring Systemsの開発に伴い、長いシーケンスデータを含む大規模データセットが出現し始めた。最近のディープラーニングベースの知識追跡モデルでは、長いシーケンスデータを含む大規模データセットを扱う際に、低効率、低精度、低解釈可能性といった障害に直面している。これらの課題に対処し,LSTM BERT をベースとした長周期データ処理のための知識追跡モデル LBKT を提案する。 LBKTは、ACCとAUCのメトリクス上で、ほとんどのベンチマークデータセット上で最高のパフォーマンスを達成する。さらに,LBKTの全体的な性能に対する各成分の影響を分析するためのアブレーション研究を行った。さらに、モデルの埋め込み戦略を示すために、可視化ツールとしてt-SNEを使用しました。その結果、LBKTはより高速で解釈可能であり、従来のディープラーニングベースの知識追跡手法よりもメモリコストが低いことが示唆された。

The field of Knowledge Tracing aims to understand how students learn and master knowledge over time by analyzing their historical behaviour data. To achieve this goal, many researchers have proposed Knowledge Tracing models that use data from Intelligent Tutoring Systems to predict students' subsequent actions. However, with the development of Intelligent Tutoring Systems, large-scale datasets containing long-sequence data began to emerge. Recent deep learning based Knowledge Tracing models face obstacles such as low efficiency, low accuracy, and low interpretability when dealing with large-scale datasets containing long-sequence data. To address these issues and promote the sustainable development of Intelligent Tutoring Systems, we propose an LSTM- and BERT-based Knowledge Tracing model for long-sequence data processing, namely LBKT, which uses a BERT-based architecture with a Rasch model-based embeddings block to handle information about different difficulty levels and an LSTM block to process the sequential characteristics of students' actions. LBKT achieves the best performance on most benchmark datasets on the metrics of ACC and AUC. Additionally, an ablation study is conducted to analyse the impact of each component on LBKT's overall performance. Moreover, we use t-SNE as the visualisation tool to demonstrate the model's embedding strategy. The results indicate that LBKT is faster, more interpretable, and has a lower memory cost than the traditional deep learning based Knowledge Tracing methods.

Translated: 2024-05-12 15:40:48 Published: 2024-04-24
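
The Rasch model mentioned above is the classical one-parameter item response model, in which the chance of a correct answer depends only on the gap between a student's ability and an item's difficulty. The sketch below shows that classical formula only; how LBKT turns it into an embeddings block is not described in the abstract, and the function name and example values are hypothetical.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical values: one student (ability 0.5) facing three items
# of increasing difficulty.
probabilities = [rasch_probability(0.5, b) for b in (-1.0, 0.5, 2.0)]
```

When ability equals difficulty the probability is exactly 0.5, and it falls as the item gets harder, which is the kind of difficulty-level information the abstract says the embeddings block is meant to handle.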

# A Hybrid Probabilistic Battery Health Management Approach for Robust Inspection Drone Operations

A Hybrid Probabilistic Battery Health Management Approach for Robust Inspection Drone Operations ( http://arxiv.org/abs/2405.00055v1 )

License: see the linked page

Jokin Alcibar, Jose I. Aizpurua, Ekhi Zugastia, Oier Penagarikano,

(参考訳) リモートクリティカルインフラストラクチャのヘルスモニタリングは、インフラのアクセシビリティが制限されているため、複雑で高価な活動である。検査ドローンは、アクセシビリティを改善して重要なインフラの信頼性を高めるユビキタスな資産である。しかし、厳しい運用環境のため、検査を成功させるためには、健康状態を監視することが不可欠である。バッテリーは、検査ドローンの全体的な信頼性を決定する重要なコンポーネントであり、適切な健康管理アプローチにより、信頼性と堅牢な検査に寄与する。本稿では,Li-Po電池の放電終端電圧予測のためのハイブリッド確率的手法を提案する。このハイブリダイゼーションは、物理に基づく放電と確率論的誤差補正モデルを組み合わせた誤差補正構成で達成され、アレタリックおよびエピステミックの不確かさを定量化する。負荷条件の異なるEOD電圧を含むデータセット上で,ハイブリッド確率的手法の性能を実験的に評価した。データセットは、オフショア風力タービンの検査に焦点を当てた、異なる飛行で作動する実際の検査ドローンから得られた。提案手法は様々な確率的手法で検証され、最高の確率的手法と比較して14.8%の確率的精度が向上したことを示す。さらに, 動脈およびてんかんの不確実性は, 電池の健康状態の診断を高めるために, 頑健な評価を提供する。

Health monitoring of remote critical infrastructure is a complex and expensive activity due to the limited infrastructure accessibility. Inspection drones are ubiquitous assets that enhance the reliability of critical infrastructures through improved accessibility. However, due to the harsh operating environment, it is crucial to monitor their health to ensure successful inspection operations. The battery is a key component that determines the overall reliability of the inspection drones and, with an appropriate health management approach, contributes to reliable and robust inspections. In this context, this paper presents a novel hybrid probabilistic approach for end-of-discharge (EOD) voltage prediction of Li-Po batteries. The hybridization is achieved in an error-correction configuration, which combines physics-based discharge and probabilistic error-correction models to quantify the aleatoric and epistemic uncertainty. The performance of the hybrid probabilistic methodology was empirically evaluated on a dataset comprising EOD voltage under varying load conditions. The dataset was obtained from real inspection drones operated on different flights, focused on offshore wind turbine inspections. The proposed approach has been tested with different probabilistic methods and demonstrates a 14.8% improvement in probabilistic accuracy compared to the best probabilistic method. In addition, aleatoric and epistemic uncertainties provide robust estimations to enhance the diagnosis of battery health states.

Translated: 2024-05-05 17:54:32 Published: 2024-04-24
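
An error-correction configuration of the kind described above can be pictured as a physics-based prediction plus a data-driven model fitted to the physics model's residuals. The sketch below is a deliberately simplified, deterministic version of that idea: the ohmic-drop formula, its constants, the synthetic measurements, and all function names are assumptions for illustration, and the paper's probabilistic error-correction models and uncertainty quantification are not reproduced.

```python
import statistics

def physics_eod_voltage(load_amps, v_open=3.7, resistance=0.05):
    """Hypothetical physics-based model: open-circuit voltage minus ohmic drop."""
    return v_open - resistance * load_amps

def fit_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic measurements (made up): the physics model misses an extra
# load-dependent drop of 0.01 V per ampere.
loads = [1.0, 2.0, 3.0, 4.0, 5.0]
measured = [physics_eod_voltage(load) - 0.01 * load for load in loads]

# Error-correction step: learn the residual the physics model leaves behind.
residuals = [m - physics_eod_voltage(load) for load, m in zip(loads, measured)]
slope, intercept = fit_linear(loads, residuals)

def hybrid_eod_voltage(load_amps):
    """Physics-based prediction plus the learned residual correction."""
    return physics_eod_voltage(load_amps) + slope * load_amps + intercept
```

Replacing the linear residual fit with a probabilistic regressor would be the natural next step toward the setting in the abstract, since the residual model is where predictive uncertainty would be estimated.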

# Homonym Sense Disambiguation in the Georgian Language

Homonym Sense Disambiguation in the Georgian Language ( http://arxiv.org/abs/2405.00710v1 )

License: see the linked page

Davit Melikidze, Alexander Gamkrelidze,

(参考訳) 本研究では,ジョージアの共通crawlsコーパスをフィルタリングしたデータセットに基づいて,事前学習した大規模言語モデル(LLM)の教師付き微調整に基づいて,ジョージア語における単語センス曖昧化(WSD)タスクに対する新しいアプローチを提案する。データセットは、複数の感覚を持つ単語の分類器を訓練するために使用される。さらに,WSDにLSTMを用いた実験結果について報告する。正確な曖昧な同義語は自然言語処理において不可欠である。グルジア語はカルトヴェリア語族に属する不可解な言語であり、この文脈で固有の課題を提示している。本研究の目的は、グルジア語における同義語曖昧化に関する特定の問題を強調し、その解決に向けた我々のアプローチを示すことである。本稿で論じる手法は、7500以上の文を手書き分類したデータセットを用いて、同義語の語彙的意味を予測するための95%の精度を達成している。

This research proposes a novel approach to the Word Sense Disambiguation (WSD) task in the Georgian language, based on supervised fine-tuning of a pre-trained Large Language Model (LLM) on a dataset formed by filtering the Georgian Common Crawls corpus. The dataset is used to train a classifier for words with multiple senses. Additionally, we present experimental results of using LSTM for WSD. Accurately disambiguating homonyms is crucial in natural language processing. Georgian, an agglutinative language belonging to the Kartvelian language family, presents unique challenges in this context. The aim of this paper is to highlight the specific problems concerning homonym disambiguation in the Georgian language and to present our approach to solving them. The techniques discussed in the article achieve 95% accuracy for predicting lexical meanings of homonyms using a hand-classified dataset of over 7500 sentences.

Translated: 2024-05-05 17:44:45 Published: 2024-04-24

# Exploring Remote Hands-on Support for Collaborative Embedded Systems Development

Exploring Remote Hands-on Support for Collaborative Embedded Systems Development ( http://arxiv.org/abs/2404.17604v1 )

License: see the linked page

Yan Chen, Jasmine Jones,

(参考訳) 組み込みシステム開発は複雑なタスクであり、しばしばチームコラボレーションを必要とします。フリーランサーの市場が拡大し、リモートワークへの世界的シフトを考えると、多くの開発者やクライアントにとってリモートコラボレーションが不可欠になっている。既存のコミュニケーションとコーディネーションツールは、ユーザが共同でコードを共有し、議論し、編集するのに役立つが、これらのツールはハードウェア開発ではなく、ソフトウェア用に特別に設計されている。本研究の目的は,組込みシステム開発のための遠隔支援ツールの設計空間を探ることである。これを実現するため、私たちは12人の経験豊富な組み込みシステム開発者に対して、現在のリモートワークプラクティス、課題、ニーズについてインタビューしました。次に, 遠隔操作エージェントHandyを用いて, 共同作業者から支援開発者の欲求のタイプを抽出するための仮説的アシスタントとして, ユーザ・エコメンテーション・スタディを行った。本研究は,リモートワークのシナリオと戦略,開発者によるサポートニーズ,情報提供,調整,実装の課題,開発者がリモート物理操作ツールを使用してプロジェクトに取り組む際のプライバシ,コントロール,信頼に関する懸念について述べる。この研究は、リモートでオンデマンドなコラボレーションと、ソフトウェア環境におけるヘルプ・シーキングに沿った組み込みシステム開発を提供することによって、文献に寄与する。この研究の実証的基盤は、遠隔操作エージェントにおける将来の作業の基盤となり、組込みシステム開発における協調サポートを強化する、ドキュメント化されたニーズ、好み、欲求の豊富な基盤を提供する。

Embedded systems development is a complex task that often requires team collaboration. Given the growing market of freelancers and the global shift to remote work, remote collaboration has become a necessity for many developers and clients. While existing communication and coordination tools help users share, discuss, and edit code collaboratively, these tools were specifically designed for software rather than hardware development. In this work, our goal is to explore the design space of remote support tools for embedded systems development. To do this, we interviewed 12 seasoned embedded systems developers regarding their current remote work practices, issues, and needs. We then conducted a user enactment study with a bespoke remote manipulation agent, Handy, as a hypothetical assistant to elicit the types of support developers desire from a collaborator. Our findings describe the scenarios and strategies in which remote work takes place; the support needs and information, coordination, and implementation challenges expressed by developers; and the privacy, control, and trust concerns that developers have when working on their projects with remote physical manipulation tools. This research contributes to the literature by bringing embedded systems development in line with remote, on-demand collaboration and help-seeking in software environments. The empirical basis of this work provides a rich foundation of documented needs, preferences, and desires that can ground future work on remote manipulation agents and enhance collaboration support in the domain of embedded systems development.

Translated: 2024-04-30 20:10:08 Published: 2024-04-24

# Autonomous LLM-driven research from data to human-verifiable research papers

Autonomous LLM-driven research from data to human-verifiable research papers ( http://arxiv.org/abs/2404.17605v1 )

License: see the linked page

Tal Ifargan, Lukas Hafner, Maor Kern, Ori Alcalay, Roy Kishony,

(参考訳) AIが科学的発見を加速することを約束しているため、完全なAI駆動型研究が可能であるか、透明性、トレーサビリティ、検証可能性といった重要な科学的価値に準拠できるかどうかは不明だ。人間の科学的実践を模倣して、私たちは、完全な段階的な研究プロセスを通じて、LLMエージェント間のインタラクションをガイドする自動化プラットフォームであるData-to-paperを構築しました。自動操縦モードでは、注釈付きデータだけで、データ・ツー・ペーパーの仮説を立て、研究計画を設計し、分析コードを書き、デバッグし、結果を生成して解釈し、完全な情報追跡可能な研究論文を作成した。研究の新規性は比較的限られていたが、このプロセスはデータからデ・ノボの定量的洞察を自律的に生成することを示した。単純な研究目的のために、完全に自律的なサイクルは、80～90%の誤差を伴わずにピアレビューされた出版物を再カプセル化する原稿を作成することができるが、目標の複雑さが増大するにつれて、人間の共同操縦は精度を測るために重要になる。プロセス自体を超えて、作成された原稿も本質的に検証可能であり、情報追跡によって結果、方法、データをプログラム的に連鎖することができる。我々の研究は、危険、トレーサビリティ、透明性、検証可能性ではなく、AIによる科学的発見の加速の可能性を示している。

As AI promises to accelerate scientific discovery, it remains unclear whether fully AI-driven research is possible and whether it can adhere to key scientific values, such as transparency, traceability and verifiability. Mimicking human scientific practices, we built data-to-paper, an automation platform that guides interacting LLM agents through a complete stepwise research process, while programmatically back-tracing information flow and allowing human oversight and interactions. In autopilot mode, provided with annotated data alone, data-to-paper raised hypotheses, designed research plans, wrote and debugged analysis code, generated and interpreted results, and created complete and information-traceable research papers. Even though research novelty was relatively limited, the process demonstrated autonomous generation of de novo quantitative insights from data. For simple research goals, a fully autonomous cycle can create manuscripts that recapitulate peer-reviewed publications without major errors in about 80-90% of cases, yet as goal complexity increases, human co-piloting becomes critical for assuring accuracy. Beyond the process itself, the created manuscripts are also inherently verifiable, as information tracing makes it possible to programmatically chain results, methods and data. Our work thereby demonstrates a potential for AI-driven acceleration of scientific discovery while enhancing, rather than jeopardizing, traceability, transparency and verifiability.

Translated: 2024-04-30 20:10:08 Published: 2024-04-24

# Review of Data-centric Time Series Analysis from Sample, Feature, and Period

Review of Data-centric Time Series Analysis from Sample, Feature, and Period ( http://arxiv.org/abs/2404.16886v1 )

License: see the linked page

Chenxi Sun, Hongyan Li, Yaliang Li, Shenda Hong,

(参考訳) データは、古典的モデルであれ、今日の大規模言語モデルであれ、機械学習アプローチを利用した時系列分析を実行する上で不可欠である。優れた時系列データセットは、タスクの結果とコストだけでなく、モデルの正確性、堅牢性、収束性にも有利です。データ中心AIの出現は、モデルの改良からデータ品質の優先順位付けへの展望の変化を表している。時系列データ処理手法は、広範囲の研究分野に頻繁に現れるが、特定のトピックとしてはあまり研究されていない。このギャップを埋めるために、本稿では、時系列分析における様々なデータ中心の手法を体系的にレビューし、幅広い研究トピックを取り上げる。本稿では,サンプル,特徴,期間における時系列データの特徴に基づいて,レビューしたデータ選択手法の分類法を提案する。時系列データを対象とした特徴,利益,欠点を論じ,要約することに加えて,推奨事項やオープン問題,可能な研究トピックを提案することで,課題や機会も紹介する。

Data is essential to performing time series analysis with machine learning approaches, whether for classic models or today's large language models. A good time-series dataset benefits the model's accuracy, robustness, and convergence, as well as task outcomes and costs. The emergence of data-centric AI represents a shift in the landscape from model refinement to prioritizing data quality. Even though time-series data processing methods frequently come up in a wide range of research fields, they have not been well investigated as a specific topic. To fill this gap, we systematically review different data-centric methods in time series analysis, covering a wide range of research topics. Based on the characteristics of time-series data at the sample, feature, and period levels, we propose a taxonomy for the reviewed data selection methods. In addition to discussing and summarizing their characteristics, benefits, and drawbacks for time-series data, we also introduce the challenges and opportunities by proposing recommendations, open problems, and possible research topics.

Translated: 2024-04-29 15:03:56 Published: 2024-04-24

# Anomaly Detection for Incident Response at Scale

Anomaly Detection for Incident Response at Scale ( http://arxiv.org/abs/2404.16887v1 )

License: see the linked page

Hanzhang Wang, Gowtham Kumar Tangirala, Gilkara Pranav Naidu, Charles Mayville, Arighna Roy, Joanne Sun, Ramesh Babu Mandava,

(参考訳) 本稿では,機械学習に基づく異常検出製品であるAIDR(AI Detect and Respond)について紹介する。 3ヶ月にわたる検証の間に、製品は3000以上のモデルから25以上のアプリケーション、プラットフォーム、運用チームへの予測を提供し、主要なインシデントのうち63%をカバーし、平均時間検出(MTTD)を7分以上短縮した。従来の異常検出手法とは異なり、我々のソリューションは統計的、ML、ディープラーニングモデルを活用しながら、ルールベースの静的しきい値を導入し、ドメイン固有の知識を取り入れている。単変量および多変量MLモデルは、スケーラビリティと高可用性のために、分散サービスを通じてデプロイされ、メンテナンスされる。 AIDRには、ドリフト検出アルゴリズムと顧客のフィードバックを組み合わせたモデル品質を評価するフィードバックループがある。また、セルフオンボーディング機能とカスタマイズ性も備えている。 AIDRは、検出にかかる時間が少なく、従来の方法よりも偽陽性が少ない、さまざまな社内チームで成功している。前進するにつれて、インシデントカバレッジと防止を拡張し、ノイズを低減し、根本原因推奨(RCR)とさらに統合して、エンドツーエンドのAIDRエクスペリエンスの実現を目指しています。

We present a machine learning-based anomaly detection product, AI Detect and Respond (AIDR), that monitors Walmart's business and system health in real time. During a validation period of 3 months, the product served predictions from over 3000 models to more than 25 application, platform, and operation teams, covering 63% of major incidents and reducing the mean-time-to-detect (MTTD) by more than 7 minutes. Unlike previous anomaly detection methods, our solution leverages statistical, ML and deep learning models while retaining rule-based static thresholds to capture domain-specific knowledge. Both univariate and multivariate ML models are deployed and maintained through distributed services for scalability and high availability. AIDR has a feedback loop that assesses model quality with a combination of drift detection algorithms and customer feedback. It also offers self-onboarding capabilities and customizability. AIDR has achieved success with various internal teams, with lower time to detection and fewer false positives than previous methods. As we move forward, we aim to expand incident coverage and prevention, reduce noise, and integrate further with root cause recommendation (RCR) to enable an end-to-end AIDR experience.
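
The combination of statistical detection with rule-based static thresholds described above can be illustrated with a minimal sketch. This is not AIDR's implementation: the function name `detect_anomalies`, the rolling z-score rule, and the parameters `window`, `z_limit` and `static_max` are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def detect_anomalies(series, window=5, z_limit=3.0, static_max=None):
    """Return indices flagged by a static rule or by a rolling z-score."""
    flagged = []
    for i, value in enumerate(series):
        # Rule-based check: a hard ceiling supplied by a domain expert
        # (hypothetical parameter, not taken from the paper).
        if static_max is not None and value > static_max:
            flagged.append(i)
            continue
        history = series[max(0, i - window):i]
        if len(history) < window:
            continue  # not enough history for a statistical check
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(value - mu) / sigma > z_limit:
            flagged.append(i)
    return flagged


latency_ms = [100, 102, 99, 101, 100, 103, 180, 175, 101, 100]
print(detect_anomalies(latency_ms))                  # statistical check only
print(detect_anomalies(latency_ms, static_max=150))  # with the static rule
```

In this toy series the second spike (175) follows the first one, so the rolling statistics are already inflated and the z-score alone misses it; the static rule still catches it, which is one reason a detector might keep both.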

Translated: 2024-04-29 15:03:56 Published: 2024-04-24

# NEPENTHE: Entropy-Based Pruning as a Neural Network Depth's Reducer

NEPENTHE: Entropy-Based Pruning as a Neural Network Depth's Reducer ( http://arxiv.org/abs/2404.16890v1 )

License: see the linked page

Zhu Liao, Victor Quétu, Van-Tam Nguyen, Enzo Tartaglione,

(参考訳) ディープニューラルネットワークは複雑なタスクを解くのに非常に効果的であるが、その計算要求はリアルタイムアプリケーションや限られたリソースシステムにおいてその有用性を妨げうる。さらに、多くのタスクにおいてこれらのモデルが過度にパラメータ化されていることが知られている。本稿では,nEural Network depTH の rEducer (NEPENTHE) として eNtropy-basEd Pruning を提案する。我々の理論的発見に基づいて、NEPENTHEは、完全に除去するために低いエントロピーを持つ層で非構造的に切断される接続に焦点を当てている。我々はMobileNetやSwin-Tのような一般的なアーキテクチャに対するアプローチを検証し、過度なパラメータ化体制に遭遇すると、いくつかのレイヤを効果的に線形化できることを示した。コードは記事の受理時に公開される。

While deep neural networks are highly effective at solving complex tasks, their computational demands can hinder their usefulness in real-time applications and on resource-limited systems. Moreover, these models are known to be over-parametrized for many tasks: recent works have broadly focused on reducing the width of these networks, rather than their depth. In this paper, we aim to reduce the depth of over-parametrized deep neural networks: we propose eNtropy-basEd Pruning as a nEural Network depTH's rEducer (NEPENTHE) to alleviate deep neural networks' computational burden. Based on our theoretical finding, NEPENTHE focuses on unstructured pruning of connections in layers with low entropy, in order to remove those layers entirely. We validate our approach on popular architectures such as MobileNet and Swin-T, showing that in an over-parametrization regime it can effectively linearize some layers (hence reducing the model's depth) with little to no performance loss. The code will be publicly available upon acceptance of the article.
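
The abstract does not define its entropy measure, so the following is only a sketch of one plausible reading: the binary entropy of each ReLU unit's on/off state over a batch, averaged per layer. The function names and the toy pre-activation values are assumptions. A unit that is always on (or always off) has zero entropy and acts linearly on the data, which is the intuition behind treating low-entropy layers as candidates for removal.

```python
import math


def state_entropy(pre_activations):
    """Binary entropy (in bits) of a ReLU unit's on/off state over a batch."""
    p_on = sum(1 for z in pre_activations if z > 0) / len(pre_activations)
    if p_on in (0.0, 1.0):
        return 0.0  # always off or always on: the unit behaves linearly
    return -(p_on * math.log2(p_on) + (1 - p_on) * math.log2(1 - p_on))


def layer_entropy(layer_pre_activations):
    """Mean state entropy over the units of one layer."""
    per_unit = [state_entropy(unit) for unit in layer_pre_activations]
    return sum(per_unit) / len(per_unit)


# Each inner list holds one unit's pre-activations over a batch of 4 inputs.
always_on = [[0.5, 1.2, 0.1, 2.0], [3.0, 0.4, 0.9, 1.1]]
mixed = [[0.5, -1.2, 0.1, -2.0], [-3.0, 0.4, -0.9, 1.1]]
print(layer_entropy(always_on), layer_entropy(mixed))
```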

Translated: 2024-04-29 15:03:56 Published: 2024-04-24

# Attacks on Third-Party APIs of Large Language Models

Attacks on Third-Party APIs of Large Language Models ( http://arxiv.org/abs/2404.16891v1 )

License: see the linked page

Wanru Zhao, Vidit Khazanchi, Haodi Xing, Xuanli He, Qiongkai Xu, Nicholas Donald Lane,

(参考訳) 大規模言語モデル(LLM)サービスは最近、サードパーティのAPIサービスと対話するプラグインエコシステムの提供を開始した。このイノベーションはLLMの能力を高めるが、様々なサードパーティが開発したプラグインが容易に信頼できないため、リスクも伴う。本稿では,サードパーティサービスを含むLDMプラットフォームにおけるセキュリティと安全性の脆弱性を調査する新たな攻撃フレームワークを提案する。フレームワークを広く使われているLLMに適用し、LLM出力を許容不能に修正可能なサードパーティAPI上で、さまざまなドメインにわたる現実世界の悪意のある攻撃を識別する。本稿は,サードパーティのAPI統合によって引き起こされるユニークな課題について論じ,今後のLCMエコシステムのセキュリティと安全性を改善するための戦略的可能性を提供する。私たちのコードは、https://github.com/vk0812/Third-Party-Attacks-on-LLMsでリリースされています。

Large language model (LLM) services have recently begun offering a plugin ecosystem to interact with third-party API services. This innovation enhances the capabilities of LLMs, but it also introduces risks, as these plugins developed by various third parties cannot be easily trusted. This paper proposes a new attacking framework to examine security and safety vulnerabilities within LLM platforms that incorporate third-party services. Applying our framework specifically to widely used LLMs, we identify real-world malicious attacks across various domains on third-party APIs that can imperceptibly modify LLM outputs. The paper discusses the unique challenges posed by third-party API integration and offers strategic possibilities to improve the security and safety of LLM ecosystems moving forward. Our code is released at https://github.com/vk0812/Third-Party-Attacks-on-LLMs.

Translated: 2024-04-29 15:03:56 Published: 2024-04-24

# Photon correlation time-asymmetry and dynamical coherence in multichromophoric systems

Photon correlation time-asymmetry and dynamical coherence in multichromophoric systems ( http://arxiv.org/abs/2404.16892v1 )

License: see the linked page

Charlie Nation, Hallmann Oskar Gestsson, Alexandra Olaya-Castro,

(参考訳) 実効エキシトン-フォノン相互作用下で励起輸送を受けるマルチクロモフォリック系により放射される光の偏光フィルターによる2光子相関を理論的に検討し,連続的不整合照明を受ける。本研究では、FMO(Fenna-Matthews Olson)光合成複合体のような生体分子集合体において、異なる偏光に対応する光子の相互相関における時間-対称性を利用して、ゼロ遅延相関で観測されない量子コヒーレント輸送機構と定常状態コヒーレンス特性の両方を探索できることを示す。相関非対称性の古典的境界が得られ、FMOは正確な数値計算によって破られる。これらの光子交叉相関における時間非対称性への支配的な寄与は、フレンケル・エクシトンモデルに対するコヒーレンス移動の集団であることを示す。その結果、分子集合体や他の多部位量子エミッタにおける励起状態のダイナミクスに対するコヒーレントな寄与を研究するために、光子相関非対称性を有望なアプローチとして提案した。

We theoretically investigate polarization-filtered two-photon correlations for the light emitted by a multichromophoric system undergoing excitation transport under realistic exciton-phonon interactions, and subject to continuous incoherent illumination. We show that for a biomolecular aggregate, such as the Fenna-Matthews-Olson (FMO) photosynthetic complex, time-asymmetries in the cross-correlations of photons corresponding to different polarizations can be exploited to probe both quantum coherent transport mechanisms and steady-state coherence properties, which are not witnessed by zero-delay correlations. A classical bound on correlation asymmetry is obtained, which FMO is shown to violate using exact numerical calculations. Our analysis indicates that the dominant contributions to time-asymmetry in such photon cross-correlations are population-to-coherence transfers for Frenkel exciton models. Our results therefore put forward photon correlation asymmetry as a promising approach to investigate coherent contributions to excited-state dynamics in molecular aggregates and other many-site quantum emitters.

Translated: 2024-04-29 15:03:56 Published: 2024-04-24

# Automatic AI controller that can drive with confidence: steering vehicle with uncertainty knowledge

Automatic AI controller that can drive with confidence: steering vehicle with uncertainty knowledge ( http://arxiv.org/abs/2404.16893v1 )

License: see the linked page

Neha Kumari, Sumit Kumar, Sneha Priya, Ayush Kumar, Akash Fogla,

(参考訳) 現実世界と対話する安全クリティカルなシステムでは、意思決定における不確実性の役割が特に機械学習モデルにおいて重要である。 CPS(Cyber-Physical Systems)のセキュアな機能のためには、このような不確実性を適切に管理することが不可欠である。本研究では,機械学習フレームワークを用いた車両の横方向制御システムの開発に焦点をあてる。具体的には、確率論的学習モデルであるベイズニューラルネットワーク(BNN)を用いて不確実性定量化に対処する。この能力により、モデルの予測における信頼度や不確実性のレベルを測定することができます。 BNNベースのコントローラは、単一のトラックを横断する車両から収集されたシミュレーションデータを使用して訓練され、その後、他の様々なトラックでテストされる。まず、トレーニングされたモデルは、複数の類似したトラック上で車両を適応し、効果的に制御する能力を示します。第二に、制御器に組み込まれた予測信頼性の定量化は早期警戒システムとして機能し、アルゴリズムが予測に対する信頼を欠き、したがって失敗に陥りやすいことを示唆する。信頼しきい値を確立することで、手動による介入をトリガーし、安全なパラメータの外で動作した場合に、制御がアルゴリズムから解放されることを保証できます。

In safety-critical systems that interface with the real world, the role of uncertainty in decision-making is pivotal, particularly in the context of machine learning models. For the secure functioning of Cyber-Physical Systems (CPS), it is imperative to manage such uncertainty adeptly. In this research, we focus on the development of a vehicle's lateral control system using a machine learning framework. Specifically, we employ a Bayesian Neural Network (BNN), a probabilistic learning model, to address uncertainty quantification. This capability allows us to gauge the level of confidence or uncertainty in the model's predictions. The BNN-based controller is trained using simulated data gathered from the vehicle traversing a single track and is subsequently tested on various other tracks. We report two significant results. First, the trained model demonstrates the ability to adapt and effectively control the vehicle on multiple similar tracks. Second, the quantification of prediction confidence integrated into the controller serves as an early-warning system, signaling when the algorithm lacks confidence in its predictions and is therefore susceptible to failure. By establishing a confidence threshold, we can trigger manual intervention, ensuring that control is relinquished from the algorithm when it operates outside of safe parameters.
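
The confidence-threshold handover described above can be sketched as follows. The sketch assumes that the BNN is queried several times per frame and that the spread of the sampled steering angles is used as the uncertainty estimate; the function name `steer_or_handover`, the threshold `max_std`, and the sample values are hypothetical and not taken from the paper.

```python
from statistics import mean, stdev


def steer_or_handover(sampled_angles, max_std=0.05):
    """Return (steering command, handover flag) from sampled predictions."""
    command = mean(sampled_angles)
    uncertainty = stdev(sampled_angles)
    # Hand control back to the driver when the spread exceeds the threshold.
    return command, uncertainty > max_std


# Steering angles (radians) from repeated stochastic forward passes.
confident = [0.10, 0.11, 0.09, 0.10, 0.10]
uncertain = [0.10, 0.35, -0.20, 0.40, -0.05]
print(steer_or_handover(confident))
print(steer_or_handover(uncertain))
```

The first set of samples agrees closely, so the controller keeps steering; the second set is widely spread, so the handover flag is raised.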

Translated: 2024-04-29 15:03:56 Published: 2024-04-24

# Spatial best linear unbiased prediction: A computational mathematics approach for high dimensional massive datasets

Spatial best linear unbiased prediction: A computational mathematics approach for high dimensional massive datasets ( http://arxiv.org/abs/1701.00285v3 )

License: see the linked page

Julio E. Castrillon-Candas,

(参考訳) 膨大なデータセットの出現により、計算科学とエンジニアリングのコミュニティの多くは、回帰と分類におけるデータ集約的なアプローチに向かっている。しかし、これらの課題は、問題の規模、複雑さ、次元性の増加によるものである。特に、多くの場合、共分散行列は数値的に不安定であり、線形代数はそのような行列を有限精度のコンピュータ上で正確に逆転することはできないことを示す。行列の安定化に対する一般的なアドホックなアプローチは、いわゆるナゲットの応用である。しかし、これはモデルを変更し、元のソリューションにエラーをもたらす可能性がある。不条件行列を正確に逆転することはできないことは、数値解析からよく知られている。本稿では,観測値や次元数とよく一致したマルチレベル計算法を提案する。マルチレベル基底は、観測のkD木分割に適合する。条件数が大きい数値的に不安定な共分散行列は、精度を損なうことなく、良好な条件付きマルチレベル行列に変換することができる。さらに, 最適線形不偏予測 (BLUP) モデルと一般化最小正方形 (GLS) モデルを正確に解くが, 数値的に安定であることを示す。最大25次元の数値的不安定な問題に対して, マルチレベル法を検証した。 BLUP問題を解くために最大42,050倍の高速化が得られたが、従来の反復法と同じ精度である。非常に不条件の場合、スピードアップは無限である。さらに,多値共分散行列の減衰推定は数値解析の分野から高次元補間法に基づいて導出される。この研究は統計学、不確実量化、高性能計算、計算応用数学の交差点にある。

With the advent of massive data sets, much of the computational science and engineering community has moved toward data-intensive approaches in regression and classification. However, these present significant challenges due to the increasing size, complexity and dimensionality of the problems. In particular, covariance matrices are in many cases numerically unstable, and linear algebra shows that such matrices often cannot be inverted accurately on a finite-precision computer. A common ad hoc approach to stabilizing a matrix is the application of a so-called nugget. However, this can change the model and introduce error into the original solution. It is well known from numerical analysis that ill-conditioned matrices cannot be accurately inverted. In this paper we develop a multilevel computational method that scales well with the number of observations and dimensions. A multilevel basis is constructed that is adapted to a kD-tree partitioning of the observations. Numerically unstable covariance matrices with large condition numbers can be transformed into well-conditioned multilevel ones without compromising accuracy. Moreover, it is shown that the multilevel prediction exactly solves the Best Linear Unbiased Predictor (BLUP) and Generalized Least Squares (GLS) model, but is numerically stable. The multilevel method is tested on numerically unstable problems of up to 25 dimensions. Numerical results show speedups of up to 42,050 times for solving the BLUP problem, with the same accuracy as the traditional iterative approach. For very ill-conditioned cases the speedup is infinite. In addition, decay estimates of the multilevel covariance matrices are derived based on high-dimensional interpolation techniques from the field of numerical analysis. This work lies at the intersection of statistics, uncertainty quantification, high-performance computing and computational applied mathematics.
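
For orientation, the textbook form of the GLS estimate and the BLUP that the abstract refers to can be written as below. The notation is an assumption and not taken from the paper: $Y$ is the vector of observations, $F$ the design matrix, $C$ the covariance matrix of the observations, $c(x_0)$ the vector of covariances between the observations and a new location $x_0$, and $\delta > 0$ a nugget.

```latex
\begin{align}
  \hat{\beta} &= \left(F^{\top} C^{-1} F\right)^{-1} F^{\top} C^{-1} Y
    && \text{(GLS estimate)} \\
  \hat{Y}(x_0) &= f(x_0)^{\top} \hat{\beta}
    + c(x_0)^{\top} C^{-1} \left(Y - F \hat{\beta}\right)
    && \text{(BLUP)} \\
  \kappa(C) &= \frac{\lambda_{\max}(C)}{\lambda_{\min}(C)},
  \qquad C_{\delta} = C + \delta I
    && \text{(condition number and nugget)}
\end{align}
```

Both estimators require applying $C^{-1}$, so a large $\kappa(C)$ degrades them directly. Replacing $C$ with $C_{\delta}$ lowers the condition number, but it also changes the model being solved, which is the drawback the abstract points out.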

Translated: 2024-04-28 14:58:07 Published: 2024-04-24

# Broadcasting of non-locality

Broadcasting of non-locality ( http://arxiv.org/abs/1909.12565v2 )

License: see the linked page

Dhrumil Patel, Arup Roy, Indranil Chakrabarty, Nirman Ganguly,

(参考訳) ベル非局所性とステアリング(英: Bell nonlocality and steering)は、従来の古典的概念から大きく離れている量子力学の根本的特徴である。基本的には、非局所的不等式に反する分離系間の量子相関の存在を指し、古典的相関のみに制限すれば、その違反はあり得ない。このようなユニークな相関性の重要性を考慮すると、放送と呼ばれるプロトコルである、少数の非局所性を示すより多くの状態を生成することに興味があるかもしれない。しかし、本論文では、ブゼック・ヒラリー(BH)量子クローニング機を用いて、局所的な量子クローニングによるブロードキャストを制限すると、そのような非局所性はブロードキャストできないことを示す。本稿ではCJWR(E.G.Cavalcanti,S.J. Jones,H.M Wiseman,M.D. Reid, Phys.Rev.A 80,032112(2009))の不等式について検討する。測定条件が6以上であれば, 局所最適B-Hクローンを適用すれば, 出力状態の一部が制御可能であることが観察された。いくつかの制限の下で、Werner と Bell の対角状態がそのような手順に従うと、結果として得られる状態は計算不能となる。我々はこの研究を3つの量子ビット系に拡張し、Svetlichnyの不等式を考えると、真の三部体非局所性は普遍的なBH局所量子クローンでは放送できないことを発見した。 2つの量子ビット系に対して、Werner と Bell の対角線上の一般局所ユニタリを10^5$ のシミュレーションで検討した。

Bell nonlocality and steering are archetypal characteristics of quantum mechanics that mark a significant departure from conventional classical notions. They refer to the presence of quantum correlations between separated systems that violate a nonlocal inequality, a violation that is not possible if we restrict ourselves to classical correlations. Given the importance of such correlations, one may be interested in generating more states exhibiting nonlocality starting from a few, a protocol termed broadcasting. However, in the present submission, we show using the universal Buzek-Hillery (BH) quantum cloning machine that, if one restricts to broadcasting through local quantum cloning, then such nonlocality cannot be broadcast. Our study is done in the purview of the Bell-CHSH inequality and the CJWR (E. G. Cavalcanti, S. J. Jones, H. M. Wiseman and M. D. Reid, Phys. Rev. A 80, 032112 (2009)) steering inequality. It is observed that when the number of measurement settings is greater than 6, some of the output states are steerable after the application of local optimal BH cloners. We find that, under some restrictions, if the Werner and Bell-diagonal states are subjected to such procedures, then the resultant state is rendered unsteerable. We extend this study to three-qubit systems and find that genuine tripartite nonlocality cannot be broadcast using universal BH local quantum cloners when we consider Svetlichny's inequality. For two-qubit systems, we have considered $10^5$ simulated general local unitaries over Werner and Bell-diagonal states and find that broadcasting of nonlocality and 3-steerability is possible for none of these states.
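
As background for the Bell-CHSH inequality mentioned above, the sketch below evaluates the maximal CHSH value of a two-qubit Werner state using the standard Horodecki criterion. The parameterization $\rho = p\,|\psi^-\rangle\langle\psi^-| + (1-p)\,I/4$ with $0 \le p \le 1$ and the function names are assumptions for illustration; the cloning step studied in the paper is not modeled here.

```python
import math


def max_chsh_werner(p):
    """Maximal CHSH value of the two-qubit Werner state with visibility p."""
    # Horodecki criterion: the correlation matrix is -p * I, so the two
    # largest eigenvalues of T^T T are both p**2 and the maximum is
    # 2 * sqrt(p**2 + p**2) = 2 * sqrt(2) * p for 0 <= p <= 1.
    return 2 * math.sqrt(2) * p


def violates_chsh(p):
    """True when the state exceeds the local hidden-variable bound of 2."""
    return max_chsh_werner(p) > 2


for p in (0.5, 0.7, 0.8, 1.0):
    print(p, round(max_chsh_werner(p), 3), violates_chsh(p))
```

The violation threshold is $p > 1/\sqrt{2} \approx 0.707$, so any operation that lowers the visibility below this value removes the CHSH violation.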

Translated: 2024-04-28 14:58:07 Published: 2024-04-24

# When does MAML Work the Best? An Empirical Study on Model-Agnostic Meta-Learning in NLP Applications

When does MAML Work the Best? An Empirical Study on Model-Agnostic Meta-Learning in NLP Applications ( http://arxiv.org/abs/2005.11700v2 )

License: see the linked page

Zequn Liu, Ruiyi Zhang, Yiping Song, Wei Ju, Ming Zhang,

(参考訳) モデルに依存しないメタラーニング手法であるモデル非依存メタラーニング(MAML)は、少数ショットテキスト分類やマルチドメイン低リソース言語生成を含むNLPアプリケーションに成功している。データ量、タスク間の類似性、一般的な言語モデルとタスク固有の適応のバランスなど、多くの影響要因がNLPにおけるMAMLの性能に影響を与えるが、それらを徹底的に研究する研究は少ない。本稿では,これらの影響要因を調査し,MAMLが最適に機能するかどうかを実験的に検討する。

Model-Agnostic Meta-Learning (MAML) has been successfully employed in NLP applications including few-shot text classification and multi-domain low-resource language generation. Many factors, including data quantity, similarity among tasks, and the balance between the general language model and task-specific adaptation, can affect the performance of MAML in NLP, but few works have studied them thoroughly. In this paper, we conduct an empirical study of these factors and, based on the experimental results, conclude when MAML works best.
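
For readers unfamiliar with MAML, the sketch below shows its inner and outer loop on a deliberately tiny problem, not on an NLP task. Each task is assumed to have the scalar loss $(\theta - t_i)^2$, which lets the gradient through the inner step be written by hand; the function name, learning rates and task targets are all illustrative assumptions.

```python
def maml_1d(task_targets, theta=0.0, inner_lr=0.1, outer_lr=0.1, steps=200):
    """MAML on toy tasks with loss_i(theta) = (theta - t_i) ** 2."""
    for _ in range(steps):
        meta_grad = 0.0
        for t in task_targets:
            # Inner loop: one gradient step adapts theta to task t.
            adapted = theta - inner_lr * 2 * (theta - t)
            # Outer gradient of (adapted - t) ** 2 w.r.t. theta, including
            # the derivative of the inner step, d(adapted)/d(theta).
            meta_grad += 2 * (adapted - t) * (1 - 2 * inner_lr)
        theta -= outer_lr * meta_grad / len(task_targets)
    return theta


print(maml_1d([-1.0, 0.0, 4.0]))
```

On this toy problem the learned initialization converges to the mean of the task targets (here 1.0), the point from which a single adaptation step does equally well across tasks on average. How similar the tasks are, and how much data each one supplies, are exactly the kinds of factors the study examines.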

Translated: 2024-04-28 14:58:07 Published: 2024-04-24

# Deep Variational Network Toward Blind Image Restoration

Deep Variational Network Toward Blind Image Restoration ( http://arxiv.org/abs/2008.10796v5 )

License: see the linked page

Zongsheng Yue, Hongwei Yong, Qian Zhao, Lei Zhang, Deyu Meng, Kwan-Yee K. Wong,

(参考訳) ブラインド画像復元(IR)はコンピュータビジョンにおいて一般的な問題であるが難しい問題である。古典的モデルに基づく手法と最近のディープラーニング(DL)に基づく手法は、この問題に対する2つの異なる方法論を表現している。本稿では,その両方の利点を統合することを目的とした,新しいブラインド画像復元手法を提案する。具体的には、劣化過程を明示したブラインドIRのための一般的なベイズ生成モデルを構築する。提案したモデルでは,画像ノイズに適合するために,ガウス分布を画素単位で非i-d-ガウス分布とする。従来のほとんどの方法で採用されている単純なガウス分布やラプラシア分布よりも柔軟性があり、画像劣化に含まれるより複雑なノイズタイプを扱うことができる。モデル解くために,予測されるすべての後部分布をディープニューラルネットワークとしてパラメータ化してモデル能力を向上する変分推論アルゴリズムを設計する。特に、このような推論アルゴリズムは、劣化推定と画像復元のタスクを共同で扱う統一的なフレームワークを誘導する。また、前処理で推定される劣化情報を利用して後者のIRプロセスを導出する。画像デノイングと超解像という2つの典型的なブラインド赤外線タスクの実験により,提案手法が現状よりも優れた性能を達成できることが実証された。

Blind image restoration (IR) is a common yet challenging problem in computer vision. Classical model-based methods and recent deep learning (DL)-based methods represent two different methodologies for this problem, each with its own merits and drawbacks. In this paper, we propose a novel blind image restoration method that aims to integrate the advantages of both. Specifically, we construct a general Bayesian generative model for blind IR, which explicitly depicts the degradation process. In the proposed model, a pixel-wise non-i.i.d. Gaussian distribution is employed to fit the image noise. It is more flexible than the simple i.i.d. Gaussian or Laplacian distributions adopted in most conventional methods, and can therefore handle the more complicated noise types contained in the image degradation. To solve the model, we design a variational inference algorithm in which all the expected posterior distributions are parameterized as deep neural networks to increase their model capability. Notably, such an inference algorithm induces a unified framework that jointly deals with the tasks of degradation estimation and image restoration. Further, the degradation information estimated in the former task is utilized to guide the latter IR process. Experiments on two typical blind IR tasks, namely image denoising and super-resolution, demonstrate that the proposed method achieves superior performance over current state-of-the-art methods.
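
To make the contrast between i.i.d. and pixel-wise non-i.i.d. Gaussian noise concrete, the sketch below evaluates the negative log-likelihood of a noisy signal when every pixel is allowed its own variance. It is an illustration of the noise model only, not the paper's variational algorithm; the function name and the three-pixel example are assumptions.

```python
import math


def gaussian_nll(noisy, clean, variances):
    """Negative log-likelihood of noisy pixels under per-pixel Gaussian noise."""
    total = 0.0
    for y, x, var in zip(noisy, clean, variances):
        total += 0.5 * (math.log(2 * math.pi * var) + (y - x) ** 2 / var)
    return total


clean = [0.2, 0.5, 0.8]
noisy = [0.25, 0.1, 0.8]  # the middle pixel is heavily corrupted

iid = gaussian_nll(noisy, clean, [0.01, 0.01, 0.01])
pixel_wise = gaussian_nll(noisy, clean, [0.01, 0.16, 0.01])
print(iid, pixel_wise)
```

Giving the corrupted pixel a larger variance yields a lower negative log-likelihood than forcing one shared variance, which is the flexibility the abstract attributes to the non-i.i.d. model.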

Translated: 2024-04-28 14:58:07 Published: 2024-04-24

# Appearance-based Gaze Estimation With Deep Learning: A Review and Benchmark

Appearance-based Gaze Estimation With Deep Learning: A Review and Benchmark ( http://arxiv.org/abs/2104.12668v2 )

License: see the linked page

Yihua Cheng, Haofei Wang, Yiwei Bao, Feng Lu,

(参考訳) 人間の視線は人間の焦点や意図に関する貴重な情報を提供しており、重要な研究領域となっている。近年,深層学習は外見に基づく視線推定に革命をもたらした。しかし、2次元視線位置と3次元視線ベクトルの不公平な比較や、異なる前処理と後処理の方法など、視線推定研究の独特な特徴から、深層学習に基づく視線推定アルゴリズムを開発するための決定的なガイドラインが欠如している。本稿では,ディープラーニングを用いた外見に基づく視線推定手法の体系的レビューを行う。まず,従来の視線推定アルゴリズムを,深い特徴抽出,深層学習モデル設計,個人キャリブレーション,プラットフォームなど,典型的な視線推定パイプラインに沿って調査する。次に, 顔・目検出, データ修正, 2D/3D視線変換, 視線原点変換などのデータ前処理と後処理の手法を概説する。最後に、深層学習に基づく視線推定のための総合的なベンチマークを設定した。我々は、すべての公開データセットを特徴付け、典型的な視線推定アルゴリズムのソースコードを提供する。本稿では,深層学習に基づく視線推定手法の開発への参考となるだけでなく,将来の視線推定研究の指針となる。プロジェクトのWebページはhttps://phi-ai.buaa.edu.cn/Gazehub.orgにある。

Human gaze provides valuable information on human focus and intentions, making it a crucial area of research. Recently, deep learning has revolutionized appearance-based gaze estimation. However, due to the unique features of gaze estimation research, such as the unfair comparison between 2D gaze positions and 3D gaze vectors and the different pre-processing and post-processing methods, there is a lack of a definitive guideline for developing deep learning-based gaze estimation algorithms. In this paper, we present a systematic review of the appearance-based gaze estimation methods using deep learning. Firstly, we survey the existing gaze estimation algorithms along the typical gaze estimation pipeline: deep feature extraction, deep learning model design, personal calibration and platforms. Secondly, to fairly compare the performance of different approaches, we summarize the data pre-processing and post-processing methods, including face/eye detection, data rectification, 2D/3D gaze conversion and gaze origin conversion. Finally, we set up a comprehensive benchmark for deep learning-based gaze estimation. We characterize all the public datasets and provide the source code of typical gaze estimation algorithms. This paper serves not only as a reference to develop deep learning-based gaze estimation methods, but also a guideline for future gaze estimation research. The project web page can be found at https://phi-ai.buaa.edu.cn/Gazehub.

Translated: 2024-04-28 14:58:07 Published: 2024-04-24

# Maxwell Demon and Einstein-Podolsky-Rosen Steering

Maxwell Demon and Einstein-Podolsky-Rosen Steering ( http://arxiv.org/abs/2105.05656v4 )

License: see the linked page

Meng-Jun Hu, Xiao-Min Hu, Yong-Sheng Zhang,

(参考訳) マクスウェルの悪魔と量子絡み合いの研究は、物理学における基礎的な重要性と量子情報への潜在的な応用のために重要である。マクスウェルのデーモンに関するこれまでの研究は、主に量子相関を考慮した熱力学に焦点を当てていた。ここでは、別の観点から考察し、量子非局所性相関が作業によってシミュレートできるかどうかを問う。このため、マックスウェルの悪魔支援型アインシュタイン・ポドルスキー・ローゼン(EPR)ステアリングが提案され、新しいタイプの抜け穴が示唆された。ランダウアーの消去原理の適用は、操舵作業中にこの抜け穴を閉じる唯一の方法は、参加者による局所環境の熱変動を継続的に監視することであることを示している。我々は、超伝導量子コンピュータのような現在のプログラマブル量子プロセッサで実証できる、マックスウェルのデモンアシスト型EPRステアリングの量子回路モデルを構築した。この量子回路モデルに基づいて、デーモンの作用によるエネルギー散逸と量子非局所性相関の関係を記述する定量的な式を得る。この結果は、量子非局所性、情報、熱力学の関係を探索し理解する新しい方法を提供するため、非常に物理的に興味深い。

The study of the Maxwell demon and quantum entanglement is important because of its foundational significance in physics and its potential applications in quantum information. Previous research on the Maxwell demon has primarily focused on thermodynamics, taking into account quantum correlations. Here we take another perspective and ask whether quantum non-locality correlations can be simulated by performing work. Maxwell demon-assisted Einstein-Podolsky-Rosen (EPR) steering is thus proposed, which implies a new type of loophole. The application of Landauer's erasure principle suggests that the only way to close this loophole during a steering task is for the participant to continuously monitor the heat fluctuation of the local environment. We construct a quantum circuit model of Maxwell demon-assisted EPR steering, which can be demonstrated by current programmable quantum processors, such as superconducting quantum computers. Based on this quantum circuit model, we obtain a quantitative formula describing the relationship between energy dissipation due to the work of the demon and quantum non-locality correlation. The result is of great physical interest because it provides a new way to explore and understand the relationship between quantum non-locality, information, and thermodynamics.
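
Landauer's erasure principle, invoked above, bounds the heat dissipated when information is erased by $k_B T \ln 2$ per bit. The short sketch below simply evaluates that bound; the function name and the example temperature are illustrative, and the paper's own formula relating dissipation to non-locality is not reproduced here.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact value in the SI


def landauer_limit(temperature_k, bits=1):
    """Minimum heat (joules) dissipated when erasing `bits` bits."""
    return bits * BOLTZMANN_J_PER_K * temperature_k * math.log(2)


print(landauer_limit(300))  # roughly 2.87e-21 J per bit at room temperature
```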

Translated: 2024-04-27 00:45:56 Published: 2024-04-24

# The False COVID-19 Narratives That Keep Being Debunked: A Spatiotemporal Analysis

The False COVID-19 Narratives That Keep Being Debunked: A Spatiotemporal Analysis ( http://arxiv.org/abs/2107.12303v3 )

License: see the linked page

Iknoor Singh, Kalina Bontcheva, Carolina Scarton,

(参考訳) 新型コロナウイルス(COVID-19)のパンデミックが始まり、世界のインフォデミックは市民、メディア、ファクトチェッカーに前例のない挑戦をもたらした。この課題に対処するため、世界中の100以上のファクトチェックイニシアチブが、彼らの国の情報空間を監視し、新型コロナウイルス(COVID-19)の物語を定期的に公開してきた。本研究では、さまざまなファクトチェック組織によって複数の言語で公開された新型コロナウイルスに関連する10,381件の文書を含むCoronaVirusFacts Allianceのデータベースを調査した。我々の時空間分析では、類似またはほぼ重複した偽の新型コロナウイルスの物語が、様々な国の様々なソーシャルメディアプラットフォームで拡散していることが明らかとなり、時にはその物語の最初の一節が国際ファクトチェックネットワーク(IFCN)のファクトチェッカーによって公表されてから数ヶ月も経っている。また、一般的な医療アドバイスを含む誤報が複数の国に広まっていることもわかりました。さらに、手動のファクトチェックはそれ自体が厄介な作業であるため、異なる国で同じ物語を繰り返す必要性は、時間とともに、ファクトチェックリソースのかなりの無駄に導かれる。この目的のために我々は,ファクトチェックパイプラインに多言語デバンク検索ツールを組み込むことを提案し,また,不足するファクトチェックリソースを最大限に活用するために,ソーシャルメディアプラットフォームが大規模に同じ技術を採用する必要があることを強く推奨する。

The onset of the COVID-19 pandemic led to a global infodemic that has brought unprecedented challenges for citizens, media, and fact-checkers worldwide. To address this challenge, over a hundred fact-checking initiatives worldwide have been monitoring the information space in their countries and publishing regular debunks of viral false COVID-19 narratives. This study examines the database of the CoronaVirusFacts Alliance, which contains 10,381 debunks related to COVID-19 published in multiple languages by different fact-checking organisations. Our spatiotemporal analysis reveals that similar or nearly duplicate false COVID-19 narratives have been spreading in multiple modalities and on various social media platforms in different countries, sometimes as much as several months after the first debunk of that narrative was published by an International Fact-checking Network (IFCN) fact-checker. We also find that misinformation involving general medical advice has spread across multiple countries and hence accounts for the highest proportion of false COVID-19 narratives that keep being debunked. Furthermore, as manual fact-checking is an onerous task in itself, the need to repeatedly debunk the same narrative in different countries leads, over time, to a significant waste of fact-checker resources. To this end, we propose including a multilingual debunk search tool in the fact-checking pipeline, and we strongly recommend that social media platforms adopt the same technology at scale, so as to make the best use of scarce fact-checker resources.

Translated: 2024-04-27 00:45:56 Published: 2024-04-24

# Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection

Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection ( http://arxiv.org/abs/2107.13931v2 )

License: see the linked page

Yinmin Zhang, Xinzhu Ma, Shuai Yi, Jun Hou, Zhihui Wang, Wanli Ouyang, Dan Xu,

(参考訳) 自動運転の重要な課題として、近年3Dオブジェクト検出は大きな進歩を遂げている。しかし, 深度推定における不満足な性能のため, 単分子3次元物体検出は依然として困難な問題である。既存のモノクラー法は、通常、シーンの深さを直接回帰するが、深さと様々な幾何学的要素(例えば、境界箱のサイズ、3Dオブジェクトの寸法、オブジェクトのポーズ)の間の重要な関係を無視している。本稿では,投影モデルを用いて幾何学誘導深度推定を学習し,モノクル3次元物体検出を推し進めることを提案する。具体的には,モノクロ3次元物体検出ネットワークにおける2次元および3次元深度予測の投影モデルを用いた原理的幾何式を考案した。さらに,提案式の実装と組込みにより,幾何を考慮した深部表現学習が可能となり,深部推定の促進に有効な2次元および3次元インタラクションが可能となった。さらに,2次元アノテーションと投影ボックスの相違に対処し,幾何学式による頑健な学習を確保することで,強力なベースラインを提供する。 KITTIデータセットを用いた実験により, 適度なテスト設定において, 余分なデータを必要としない最先端単分子法の検出性能を2.80%向上することを確認した。モデルとコードはhttps://github.com/YinminZhang/MonoGeo.comでリリースされる。

As a crucial task of autonomous driving, 3D object detection has made great progress in recent years. However, monocular 3D object detection remains a challenging problem due to the unsatisfactory performance in depth estimation. Most existing monocular methods typically directly regress the scene depth while ignoring important relationships between the depth and various geometric elements (e.g. bounding box sizes, 3D object dimensions, and object poses). In this paper, we propose to learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection. Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised. We further implement and embed the proposed formula to enable geometry-aware deep representation learning, allowing effective 2D and 3D interactions for boosting the depth estimation. Moreover, we provide a strong baseline through addressing substantial misalignment between 2D annotation and projected boxes to ensure robust learning with the proposed geometric formula. Experiments on the KITTI dataset show that our method remarkably improves the detection performance of the state-of-the-art monocular-based method without extra data by 2.80% on the moderate test setting. The model and code will be released at https://github.com/YinminZhang/MonoGeo.
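
The abstract does not state its geometry formula, so the sketch below shows only the basic pinhole-camera relation that links depth to an object's physical height and its projected height, $z = f H / h$. Whether the paper uses exactly this form is an assumption, and the function name, focal length and object sizes are hypothetical values chosen for illustration.

```python
def depth_from_height(focal_px, height_3d_m, height_2d_px):
    """Pinhole-camera depth (metres): z = f * H / h."""
    if height_2d_px <= 0:
        raise ValueError("projected height must be positive")
    return focal_px * height_3d_m / height_2d_px


# A 1.5 m tall car that spans 36 pixels, seen with a 720-pixel focal length.
print(depth_from_height(focal_px=720.0, height_3d_m=1.5, height_2d_px=36.0))
```

Because the projected height sits in the denominator, a small error in the 2D box height produces a large depth error for distant objects, which is one reason the alignment between 2D annotations and projected boxes matters.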

Translated: 2024-04-27 00:45:56 Published: 2024-04-24

# Oxygen Vacancies in Niobium Pentoxide as a Source of Two-Level System Losses in Superconducting Niobium

Oxygen Vacancies in Niobium Pentoxide as a Source of Two-Level System Losses in Superconducting Niobium ( http://arxiv.org/abs/2108.13352v3 )

License: see the linked page

Daniel Bafia, Akshay Murthy, Anna Grassellino, Alexander Romanenko,

(参考訳) 酸化ニオブからなる3次元超伝導無線周波数共振器と2次元トランスモン量子ビットの量子デコヒーレンスの主源を同定した。時空二次イオン質量分析法 (ToF-SIMS) を用いて, バルクNb SRF共振器のRF特性および代表Nb試料の酸化物構造に及ぼすシーケンシャル \textit{in situ} 真空焼成処理の影響を調べたところ, Nb\textsubscript{2}O\textsubscript{5} の空隙発生と酸化物厚みの減少に相関する空洞品質係数$Q_0$の非単調進化が認められた。この効果を酸化膜自体に局在させ, 酸化膜を酸化膜に再成長させることにより, TLS損失の緩和を図り, Nb中での拡散間質酸素の役割を明らかにした。我々は、一酸化炭素中のこれらの空孔が磁気不純物であり、TLSによるRF損失の原因であると仮定する。

We identify a major source of quantum decoherence in three-dimensional superconducting radio-frequency (SRF) resonators and two-dimensional transmon qubits composed of oxidized niobium: oxygen vacancies in the niobium pentoxide which drive two-level system (TLS) losses. By probing the effect of sequential in situ vacuum baking treatments on the RF performance of bulk Nb SRF resonators and on the oxide structure of a representative Nb sample using time-of-flight secondary ion mass spectrometry (ToF-SIMS), we find a non-monotonic evolution of cavity quality factor $Q_0$ which correlates with the interplay of Nb$_2$O$_5$ vacancy generation and oxide thickness reduction. We localize this effect to the oxide itself and present the insignificant role of diffused interstitial oxygen in the underlying Nb by regrowing a new oxide via wet oxidation which reveals a mitigation of aggravated TLS losses. We hypothesize that such vacancies in the pentoxide serve as magnetic impurities and are a source of TLS-driven RF loss.

Translated: 2024-04-27 00:45:56 Published: 2024-04-24

# ICDM 2020 Knowledge Graph Contest: Consumer Event-Cause Extraction

ICDM 2020 Knowledge Graph Contest: Consumer Event-Cause Extraction ( http://arxiv.org/abs/2110.15722v2 )

License: see the linked page

Congqing He, Jie Zhang, Xiangyu Zhu, Huan Liu, Yukun Huang,

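(Reference translation) Consumer Event-Cause Extraction, the task of extracting the potential causes behind specific events in text, has attracted attention in recent years because of its wide range of applications. ICDM 2020 hosts an evaluation competition whose goal is to extract events, and the causes of those events, for a specified subject (a brand or product). In this task we focus mainly on how to build an end-to-end model that extracts multiple event types and event causes simultaneously. To this end, rather than extracting event types and event causes separately, we introduce a new perspective on the relational event-cause extraction task and propose a novel sequence tagging framework. Experiments show that the framework outperforms baseline methods even when its encoder module uses an initialized pre-trained BERT encoder, which demonstrates the power of the new tagging framework. In the competition, our team ranked 1st on the first-stage leaderboard and 3rd on the final-stage leaderboard.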

Consumer Event-Cause Extraction, the task aimed at extracting the potential causes behind certain events in the text, has gained much attention in recent years due to its wide applications. The ICDM 2020 conference sets up an evaluation competition that aims to extract events and the causes of the extracted events with a specified subject (a brand or product). In this task, we mainly focus on how to construct an end-to-end model, and extract multiple event types and event-causes simultaneously. To this end, we introduce a fresh perspective to revisit the relational event-cause extraction task and propose a novel sequence tagging framework, instead of extracting event types and events-causes separately. Experiments show our framework outperforms baseline methods even when its encoder module uses an initialized pre-trained BERT encoder, showing the power of the new tagging framework. In this competition, our team achieved 1st place in the first stage leaderboard, and 3rd place in the final stage leaderboard.

Translated: 2024-04-27 00:45:56 Published: 2024-04-24

# The Acrobatics of BQP

The Acrobatics of BQP ( http://arxiv.org/abs/2111.10409v4 )

License: see the linked page

Scott Aaronson, DeVon Ingram, William Kretschmer,

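(Reference translation) One can fix the randomness used by a randomized algorithm, but there is no analogous notion of fixing the quantumness used by a quantum algorithm. Underscoring this fundamental difference, we show that, in the black-box setting, the behavior of quantum polynomial time ($\mathsf{BQP}$) can be remarkably decoupled from that of classical complexity classes such as $\mathsf{NP}$. Specifically:
- There exists an oracle relative to which $\mathsf{NP^{BQP}}\not\subset\mathsf{BQP^{PH}}$, resolving a 2005 problem of Fortnow. As a corollary, there exists an oracle relative to which $\mathsf{P}=\mathsf{NP}$ but $\mathsf{BQP}\neq\mathsf{QCMA}$.
- Conversely, there exists an oracle relative to which $\mathsf{BQP^{NP}}\not\subset\mathsf{PH^{BQP}}$.
- Relative to a random oracle, $\mathsf{PP}=\mathsf{PostBQP}$ is not contained in the "$\mathsf{QMA}$ hierarchy" $\mathsf{QMA}^{\mathsf{QMA}^{\mathsf{QMA}^{\cdots}}}$.
- Relative to a random oracle, $\mathsf{\Sigma}_{k+1}^\mathsf{P}\not\subset\mathsf{BQP}^{\mathsf{\Sigma}_{k}^\mathsf{P}}$ for every $k$.
- There exists an oracle relative to which $\mathsf{BQP}=\mathsf{P^{\# P}}$ and yet $\mathsf{PH}$ is infinite.
- There exists an oracle relative to which $\mathsf{P}=\mathsf{NP}\neq\mathsf{BQP}=\mathsf{P^{\# P}}$.
To achieve these results, we build on the 2018 achievement by Raz and Tal of an oracle relative to which $\mathsf{BQP}\not \subset \mathsf{PH}$, and on associated results about the Forrelation problem. We also introduce new tools that may be of independent interest, including a "quantum-aware" version of the random restriction method, a concentration theorem for the block sensitivity of $\mathsf{AC^0}$ circuits, and a (provable) analogue of the Aaronson-Ambainis Conjecture for sparse oracles.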

One can fix the randomness used by a randomized algorithm, but there is no analogous notion of fixing the quantumness used by a quantum algorithm. Underscoring this fundamental difference, we show that, in the black-box setting, the behavior of quantum polynomial-time ($\mathsf{BQP}$) can be remarkably decoupled from that of classical complexity classes like $\mathsf{NP}$. Specifically:
- There exists an oracle relative to which $\mathsf{NP^{BQP}}\not\subset\mathsf{BQP^{PH}}$, resolving a 2005 problem of Fortnow. As a corollary, there exists an oracle relative to which $\mathsf{P}=\mathsf{NP}$ but $\mathsf{BQP}\neq\mathsf{QCMA}$.
- Conversely, there exists an oracle relative to which $\mathsf{BQP^{NP}}\not\subset\mathsf{PH^{BQP}}$.
- Relative to a random oracle, $\mathsf{PP}=\mathsf{PostBQP}$ is not contained in the "$\mathsf{QMA}$ hierarchy" $\mathsf{QMA}^{\mathsf{QMA}^{\mathsf{QMA}^{\cdots}}}$.
- Relative to a random oracle, $\mathsf{\Sigma}_{k+1}^\mathsf{P}\not\subset\mathsf{BQP}^{\mathsf{\Sigma}_{k}^\mathsf{P}}$ for every $k$.
- There exists an oracle relative to which $\mathsf{BQP}=\mathsf{P^{\# P}}$ and yet $\mathsf{PH}$ is infinite.
- There exists an oracle relative to which $\mathsf{P}=\mathsf{NP}\neq\mathsf{BQP}=\mathsf{P^{\# P}}$.
To achieve these results, we build on the 2018 achievement by Raz and Tal of an oracle relative to which $\mathsf{BQP}\not \subset \mathsf{PH}$, and associated results about the Forrelation problem. We also introduce new tools that might be of independent interest. These include a "quantum-aware" version of the random restriction method, a concentration theorem for the block sensitivity of $\mathsf{AC^0}$ circuits, and a (provable) analogue of the Aaronson-Ambainis Conjecture for sparse oracles.

Translated: 2024-04-27 00:45:56 Published: 2024-04-24

# Partial recovery and weak consistency in the non-uniform hypergraph Stochastic Block Model

Partial recovery and weak consistency in the non-uniform hypergraph Stochastic Block Model ( http://arxiv.org/abs/2112.11671v3 )

License: see the linked page

Ioana Dumitriu, Haixiao Wang, Yizhe Zhu,

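(Reference translation) We study the community detection problem in sparse random hypergraphs under the non-uniform hypergraph stochastic block model (HSBM). When the random hypergraph has bounded expected degrees, we provide a spectral algorithm that outputs a partition in which at least a $\gamma$ fraction of the vertices are classified correctly, where $\gamma\in (0.5,1)$ depends on the signal-to-noise ratio (SNR) of the model. When the SNR grows slowly as the number of vertices goes to infinity, our algorithm achieves weak consistency, improving on the earlier results of Ghoshdastidar and Dukkipati (2017) for non-uniform HSBMs. The spectral algorithm consists of three major steps: (1) hyperedge selection: select hyperedges of certain sizes to maximize the signal-to-noise ratio of the induced sub-hypergraph; (2) spectral partition: construct a regularized adjacency matrix and obtain an approximate partition from its singular vectors; (3) correction and merging: incorporate hyperedge information from the adjacency tensors to upgrade the error rate guarantee. The theoretical analysis of the algorithm relies on the concentration and regularization of the adjacency matrix for sparse non-uniform random hypergraphs.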

We consider the community detection problem in sparse random hypergraphs under the non-uniform hypergraph stochastic block model (HSBM), a general model of random networks with community structure and higher-order interactions. When the random hypergraph has bounded expected degrees, we provide a spectral algorithm that outputs a partition with at least a $\gamma$ fraction of the vertices classified correctly, where $\gamma\in (0.5,1)$ depends on the signal-to-noise ratio (SNR) of the model. When the SNR grows slowly as the number of vertices goes to infinity, our algorithm achieves weak consistency, which improves the previous results in Ghoshdastidar and Dukkipati (2017) for non-uniform HSBMs. Our spectral algorithm consists of three major steps: (1) Hyperedge selection: select hyperedges of certain sizes to provide the maximal signal-to-noise ratio for the induced sub-hypergraph; (2) Spectral partition: construct a regularized adjacency matrix and obtain an approximate partition based on singular vectors; (3) Correction and merging: incorporate the hyperedge information from adjacency tensors to upgrade the error rate guarantee. The theoretical analysis of our algorithm relies on the concentration and regularization of the adjacency matrix for sparse non-uniform random hypergraphs, which can be of independent interest.
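Step (2) of the algorithm described above can be illustrated on a toy hypergraph. The following is a minimal sketch under stated assumptions, not the authors' algorithm: it skips hyperedge selection, regularization, and the correction step, it uses the second eigenvector of a plain co-occurrence adjacency matrix instead of singular vectors of a regularized one, and every function name and the toy hyperedge list are hypothetical.

```python
from itertools import combinations

def adjacency_from_hyperedges(n, hyperedges):
    """A[i][j] counts the hyperedges that contain both vertices i and j."""
    A = [[0.0] * n for _ in range(n)]
    for edge in hyperedges:
        for i, j in combinations(sorted(set(edge)), 2):
            A[i][j] += 1.0
            A[j][i] += 1.0
    return A

def normalize(v):
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]

def top_eigenvector(A, shift, orthogonal_to=(), iters=1000):
    """Power iteration on A + shift*I, kept orthogonal to the given unit vectors."""
    n = len(A)
    v = normalize([float(i + 1) for i in range(n)])
    for _ in range(iters):
        # Project out previously found eigenvectors (deflation).
        for u in orthogonal_to:
            dot = sum(a * b for a, b in zip(u, v))
            v = [a - dot * b for a, b in zip(v, u)]
        w = [sum(a * x for a, x in zip(row, v)) for row in A]
        v = normalize([wi + shift * vi for wi, vi in zip(w, v)])
    return v

def spectral_bipartition(n, hyperedges):
    """Split vertices into two groups by the sign of the second eigenvector."""
    A = adjacency_from_hyperedges(n, hyperedges)
    # By Gershgorin's theorem, adding the largest row sum makes A + shift*I
    # positive semidefinite, so power iteration finds the algebraically largest eigenvalues.
    shift = max(sum(row) for row in A)
    v1 = top_eigenvector(A, shift)
    v2 = top_eigenvector(A, shift, orthogonal_to=[v1])
    return [0 if x < 0 else 1 for x in v2]

# Two groups of four vertices with hyperedges of sizes 2 and 3, joined by one edge (3, 4).
hyperedges = [(0, 1, 2), (1, 2, 3), (0, 3), (4, 5, 6), (5, 6, 7), (4, 7), (3, 4)]
labels = spectral_bipartition(8, hyperedges)
```

On this toy input the sign pattern separates vertices 0 to 3 from vertices 4 to 7; which group receives label 0 is arbitrary.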

Translated: 2024-04-27 00:45:56 Published: 2024-04-24

# FIRST: FrontrunnIng Resilient Smart ConTracts

FIRST: FrontrunnIng Resilient Smart ConTracts ( http://arxiv.org/abs/2204.00955v3 )

License: see the linked page

Emrah Sariboz, Gaurav Panwar, Roopa Vishwanathan, Satyajayant Misra,

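(Reference translation) With the rapid rise in cryptocurrency usage, traditional financial applications such as lending, borrowing, and margin trading have been widely adapted to the cryptocurrency world. In some cases, the inherently transparent and unregulated nature of cryptocurrencies leads to attacks on the users of these applications. One such attack is frontrunning, in which a malicious entity exploits its knowledge of users' financial transactions that have not yet been processed and tries to get its own transaction(s) executed ahead of them. The consequences can include financial loss, inaccurate transactions, and exposure to further attacks. We propose FIRST, a framework that prevents frontrunning attacks and is built from cryptographic protocols including verifiable delay functions (VDFs) and aggregate signatures. Our design uses a federated setup to generate the public parameters of the VDF, which removes the need for a single trusted setup. We formally analyze FIRST, prove its security in the Universal Composability framework, and experimentally demonstrate its effectiveness.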

Owing to the meteoric rise in the usage of cryptocurrencies, there has been a widespread adaptation of traditional financial applications such as lending, borrowing, margin trading, and more, to the cryptocurrency realm. In some cases, the inherently transparent and unregulated nature of cryptocurrencies leads to attacks on users of these applications. One such attack is frontrunning, where a malicious entity leverages the knowledge of currently unprocessed financial transactions submitted by users and attempts to get its own transaction(s) executed ahead of the unprocessed ones. The consequences of this can be financial loss, inaccurate transactions, and even exposure to more attacks. We propose FIRST, a framework that prevents frontrunning attacks, and is built using cryptographic protocols including verifiable delay functions and aggregate signatures. In our design, we have a federated setup for generating the public parameters of the VDF, thus removing the need for a single trusted setup. We formally analyze FIRST, prove its security using the Universal Composability framework and experimentally demonstrate the effectiveness of FIRST.
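To give a feel for the "delay" part of a verifiable delay function, here is a toy sketch based on repeated modular squaring. It is an assumption-laden illustration and not the construction used in FIRST: the abstract does not say which VDF is used, a real VDF also ships a short proof that anyone can verify quickly, and the tiny primes and function names below are hypothetical. The sketch also shows why parameter generation matters in this kind of group: whoever knows the factors of the modulus can skip the delay.

```python
from math import gcd

def delay_eval(x, T, N):
    """Compute x^(2^T) mod N by T squarings, each depending on the previous one."""
    y = x % N
    for _ in range(T):
        y = (y * y) % N
    return y

def trapdoor_eval(x, T, p, q):
    """Shortcut for a party that knows the factors p and q of N (needs gcd(x, N) == 1)."""
    N, phi = p * q, (p - 1) * (q - 1)
    assert gcd(x, N) == 1
    # Euler's theorem lets the exponent 2^T be reduced modulo phi(N).
    return pow(x, pow(2, T, phi), N)

p, q = 1009, 1013            # toy primes, far too small for real use
N = p * q
slow = delay_eval(5, 10_000, N)       # 10,000 sequential squarings
fast = trapdoor_eval(5, 10_000, p, q)  # two modular exponentiations
```

Both calls return the same value; only the second one needs the secret factorization.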

Translated: 2024-04-27 00:37:16 Published: 2024-04-24

# Unsupervised Representation Learning for 3D MRI Super Resolution with Degradation Adaptation

Unsupervised Representation Learning for 3D MRI Super Resolution with Degradation Adaptation ( http://arxiv.org/abs/2205.06891v5 )

License: see the linked page

Jianan Liu, Hao Li, Tao Huang, Euijoon Ahn, Kang Han, Adeel Razi, Wei Xiang, Jinman Kim, David Dagan Feng,

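(Reference translation) High-resolution (HR) magnetic resonance imaging is critical in helping doctors with diagnosis and image-guided treatment. However, acquiring HR images is time-consuming and costly. Deep learning-based super-resolution reconstruction (SRR) has therefore emerged as a promising way to generate super-resolution (SR) images from low-resolution (LR) images. Unfortunately, training such neural networks requires aligned authentic HR and LR image pairs, which are hard to obtain because patients move during and between image acquisitions. Rigid movements of hard tissues can be corrected with image registration, but aligning deformed soft tissues is complex, which makes it impractical to train neural networks on authentic HR and LR image pairs. Previous studies have focused on SRR using authentic HR images and down-sampled synthetic LR images. However, the difference in degradation representations between synthetic and authentic LR images limits the quality of SR images reconstructed from authentic LR images. To address this issue, we propose the Unsupervised Degradation Adaptation Network (UDEAN). The network consists of a degradation learning network and an SRR network. The degradation learning network down-samples the HR images using the degradation representation learned from misaligned or unpaired LR images. The SRR network then learns the mapping from the down-sampled HR images back to the original ones. Experimental results show that our method outperforms state-of-the-art networks and is a promising solution to the challenges of clinical settings.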

High-resolution (HR) magnetic resonance imaging is critical in aiding doctors in their diagnoses and image-guided treatments. However, acquiring HR images can be time-consuming and costly. Consequently, deep learning-based super-resolution reconstruction (SRR) has emerged as a promising solution for generating super-resolution (SR) images from low-resolution (LR) images. Unfortunately, training such neural networks requires aligned authentic HR and LR image pairs, which are challenging to obtain due to patient movements during and between image acquisitions. While rigid movements of hard tissues can be corrected with image registration, aligning deformed soft tissues is complex, making it impractical to train neural networks with authentic HR and LR image pairs. Previous studies have focused on SRR using authentic HR images and down-sampled synthetic LR images. However, the difference in degradation representations between synthetic and authentic LR images suppresses the quality of SR images reconstructed from authentic LR images. To address this issue, we propose a novel Unsupervised Degradation Adaptation Network (UDEAN). Our network consists of a degradation learning network and an SRR network. The degradation learning network downsamples the HR images using the degradation representation learned from the misaligned or unpaired LR images. The SRR network then learns the mapping from the down-sampled HR images to the original ones. Experimental results show that our method outperforms state-of-the-art networks and is a promising solution to the challenges in clinical settings.

Translated: 2024-04-27 00:37:16 Published: 2024-04-24

# Calibrated One-class Classification for Unsupervised Time Series Anomaly Detection

Calibrated One-class Classification for Unsupervised Time Series Anomaly Detection ( http://arxiv.org/abs/2207.12201v2 )

License: see the linked page

Hongzuo Xu, Yijie Wang, Songlei Jian, Qing Liao, Yongjun Wang, Guansong Pang,

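(Reference translation) Time series anomaly detection is important for maintaining system availability in various domains. Current work in this line of research focuses mainly on learning data normality deeply and comprehensively by devising advanced neural network structures and new reconstruction/prediction learning objectives. However, their one-class learning process can be misled by latent anomalies in the training data (i.e., anomaly contamination) under the unsupervised paradigm. Their learning process also lacks knowledge about anomalies. As a result, they often learn a biased, inaccurate normality boundary. To tackle these problems, we propose calibrated one-class classification for anomaly detection, which achieves contamination-tolerant, anomaly-informed learning of data normality through uncertainty modeling-based calibration and native anomaly-based calibration. Specifically, our approach adaptively penalizes uncertain predictions to restrain irregular samples in the anomaly contamination during optimization, while encouraging confident predictions on regular samples to ensure effective normality learning. This largely alleviates the negative impact of anomaly contamination. Our approach also creates native anomaly examples via perturbation to simulate abnormal time series behaviors. By discriminating these dummy anomalies, the one-class learning is further calibrated to form a more precise normality boundary. Extensive experiments on ten real-world datasets show that our model achieves substantial improvement over sixteen state-of-the-art contenders.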

Time series anomaly detection is instrumental in maintaining system availability in various domains. Current work in this research line mainly focuses on learning data normality deeply and comprehensively by devising advanced neural network structures and new reconstruction/prediction learning objectives. However, their one-class learning process can be misled by latent anomalies in training data (i.e., anomaly contamination) under the unsupervised paradigm. Their learning process also lacks knowledge about the anomalies. Consequently, they often learn a biased, inaccurate normality boundary. To tackle these problems, this paper proposes calibrated one-class classification for anomaly detection, realizing contamination-tolerant, anomaly-informed learning of data normality via uncertainty modeling-based calibration and native anomaly-based calibration. Specifically, our approach adaptively penalizes uncertain predictions to restrain irregular samples in anomaly contamination during optimization, while simultaneously encouraging confident predictions on regular samples to ensure effective normality learning. This largely alleviates the negative impact of anomaly contamination. Our approach also creates native anomaly examples via perturbation to simulate time series abnormal behaviors. Through discriminating these dummy anomalies, our one-class learning is further calibrated to form a more precise normality boundary. Extensive experiments on ten real-world datasets show that our model achieves substantial improvement over sixteen state-of-the-art contenders.

Translated: 2024-04-27 00:37:16 Published: 2024-04-24

# Honest Students from Untrusted Teachers: Learning an Interpretable Question-Answering Pipeline from a Pretrained Language Model

Honest Students from Untrusted Teachers: Learning an Interpretable Question-Answering Pipeline from a Pretrained Language Model ( http://arxiv.org/abs/2210.02498v3 )

License: see the linked page

Jacob Eisenstein, Daniel Andor, Bernd Bohnet, Michael Collins, David Mimno,

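(Reference translation) Explainable question answering systems should produce not only accurate answers but also rationales that justify their reasoning and let humans check their work. But what kinds of rationales are useful, and how can we train systems to produce them? We propose a new style of rationale for open-book question answering, called \emph{markup-and-mask}. In the markup phase, the passage is augmented with free-text markup so that each sentence can stand on its own outside the discourse context. In the masking phase, a sub-span of the marked-up passage is selected. To train a system to produce markup-and-mask rationales without annotations, we leverage in-context learning. Specifically, we generate silver annotated data by sending a series of prompts to a frozen pretrained language model, which acts as a teacher. We then fine-tune a smaller student model by training on the subset of rationales that led to correct answers. The student is "honest" in the sense that it is a pipeline: the rationale acts as a bottleneck between the passage and the answer, whereas the "untrusted" teacher operates under no such constraint. We thus offer a new way to build trustworthy pipeline systems from a combination of end-task annotations and frozen pretrained language models.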

Explainable question answering systems should produce not only accurate answers but also rationales that justify their reasoning and allow humans to check their work. But what sorts of rationales are useful and how can we train systems to produce them? We propose a new style of rationale for open-book question answering, called \emph{markup-and-mask}, which combines aspects of extractive and free-text explanations. In the markup phase, the passage is augmented with free-text markup that enables each sentence to stand on its own outside the discourse context. In the masking phase, a sub-span of the marked-up passage is selected. To train a system to produce markup-and-mask rationales without annotations, we leverage in-context learning. Specifically, we generate silver annotated data by sending a series of prompts to a frozen pretrained language model, which acts as a teacher. We then fine-tune a smaller student model by training on the subset of rationales that led to correct answers. The student is "honest" in the sense that it is a pipeline: the rationale acts as a bottleneck between the passage and the answer, while the "untrusted" teacher operates under no such constraints. Thus, we offer a new way to build trustworthy pipeline systems from a combination of end-task annotations and frozen pretrained language models.

Translated: 2024-04-27 00:37:16 Published: 2024-04-24

# Noise-Robust De-Duplication at Scale

Noise-Robust De-Duplication at Scale ( http://arxiv.org/abs/2210.04261v2 )

License: see the linked page

Emily Silcock, Luca D'Amico-Wong, Jinglin Yang, Melissa Dell,

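(Reference translation) Identifying near duplicates within large, noisy text corpora has many applications, including de-duplicating training datasets, reducing privacy risk, evaluating test set leakage, and identifying reproduced news articles and literature within large corpora. Across these diverse applications, the overwhelming majority of work relies on N-grams. Only limited efforts have been made to evaluate how well N-gram methods perform, in part because it is unclear how to create an unbiased evaluation dataset for a massive corpus. This study creates a dataset of 27,210 documents with 122,876 positive duplicate pairs for studying noise-robust de-duplication. The time-sensitivity of news makes comprehensive hand labeling feasible despite the massive overall size of the corpus, because duplicates occur within a narrow date range. The study then develops and evaluates a range of de-duplication methods: hashing and N-gram overlap (which predominate in the literature), a contrastively trained bi-encoder, and a re-rank style approach that combines a bi-encoder and a cross-encoder. The neural approaches significantly outperform hashing and N-gram overlap. The bi-encoder scales well, de-duplicating a corpus of 10 million articles on a single GPU card in a matter of hours. We also apply our pre-trained model to the RealNews and patent portions of C4 (Colossal Clean Crawled Corpus), showing that a neural approach can identify many near duplicates missed by hashing in the presence of various types of noise. The public release of the NEWS-COPY de-duplication dataset, codebase, and pre-trained models will facilitate further research and applications.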

Identifying near duplicates within large, noisy text corpora has a myriad of applications that range from de-duplicating training datasets, reducing privacy risk, and evaluating test set leakage, to identifying reproduced news articles and literature within large corpora. Across these diverse applications, the overwhelming majority of work relies on N-grams. Limited efforts have been made to evaluate how well N-gram methods perform, in part because it is unclear how one could create an unbiased evaluation dataset for a massive corpus. This study uses the unique timeliness of historical news wires to create a 27,210 document dataset, with 122,876 positive duplicate pairs, for studying noise-robust de-duplication. The time-sensitivity of news makes comprehensive hand labelling feasible - despite the massive overall size of the corpus - as duplicates occur within a narrow date range. The study then develops and evaluates a range of de-duplication methods: hashing and N-gram overlap (which predominate in the literature), a contrastively trained bi-encoder, and a re-rank style approach combining a bi- and cross-encoder. The neural approaches significantly outperform hashing and N-gram overlap. We show that the bi-encoder scales well, de-duplicating a 10 million article corpus on a single GPU card in a matter of hours. We also apply our pre-trained model to the RealNews and patent portions of C4 (Colossal Clean Crawled Corpus), illustrating that a neural approach can identify many near duplicates missed by hashing, in the presence of various types of noise. The public release of our NEWS-COPY de-duplication dataset, codebase, and the pre-trained models will facilitate further research and applications.
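The N-gram overlap baseline named in the abstract can be sketched in a few lines. This is a minimal illustration assuming word trigrams and Jaccard similarity; the tokenizer, the threshold, the function names, and the three toy documents are all assumptions made here and are not taken from the NEWS-COPY codebase.

```python
import re

def word_ngrams(text, n=3):
    """Return the set of word n-grams of a lowercased, punctuation-stripped text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets, defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def near_duplicate_pairs(docs, n=3, threshold=0.5):
    """Return index pairs (i, j), i < j, whose n-gram similarity meets the threshold."""
    grams = [word_ngrams(d, n) for d in docs]
    return [
        (i, j)
        for i in range(len(docs))
        for j in range(i + 1, len(docs))
        if jaccard(grams[i], grams[j]) >= threshold
    ]

docs = [
    "The senate passed the farm bill on Tuesday after a long debate.",
    "The senate passed the farm bi11 on Tuesday after a long debate.",  # OCR-style noise
    "Local team wins the championship game in overtime.",
]
similarity = jaccard(word_ngrams(docs[0]), word_ngrams(docs[1]))
pairs = near_duplicate_pairs(docs)
```

In this toy case a single corrupted token removes three of the ten trigrams from the overlap, so the similarity of the first two documents is 7/13 (about 0.54) even though they are the same sentence. That sensitivity to noise is the kind of weakness the abstract attributes to N-gram methods.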

Translated: 2024-04-27 00:37:16 Published: 2024-04-24

# Shortcut to Multipartite Entanglement Generation: A Graph Approach to Boson Subtractions

Shortcut to Multipartite Entanglement Generation: A Graph Approach to Boson Subtractions ( http://arxiv.org/abs/2211.04042v5 )

License: see the linked page

Seungbeom Chin, Yong-Su Kim, Marcin Karczewski,

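(Reference translation) We propose a graph method for systematically searching for schemes that can generate multipartite entanglement in linear bosonic systems with heralding. Heralded entanglement generation offers more tolerable schemes for quantum tasks than postselected generation, but finding suitable circuits for multipartite systems is generally difficult. We show that our graph mapping from boson subtractions is a convenient technique for overcoming the limitations in circuit design, and we present a practical strategy for mitigating these limitations by implementing the graph technique. Our physical setup is based on the sculpting protocol, which uses $N$ spatially overlapped subtractions of single bosons to convert Fock states of evenly distributed bosons into entanglement. We identify general schemes for qubit $N$-partite GHZ and W states that are significantly more efficient than previous schemes. In addition, our scheme for generating the superposition of $N=3$ GHZ and W entangled states shows that the approach can be extended to derive more general forms of entangled states. Furthermore, we find an $N$-partite GHZ state generation scheme for qudits that requires substantially fewer particles than previous proposals. These results demonstrate the power of our approach for discovering optimized solutions that generate intricate heralded entangled states. As a proof of concept, we propose a linear optical scheme for generating the Bell state by heralding detections. We expect the method to serve as a promising tool for generating diverse entanglement.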

We propose a graph method for systematically searching for schemes that can generate multipartite entanglement in linear bosonic systems with heralding. While heralded entanglement generation offers more tolerable schemes for quantum tasks than postselected ones, it is generally more challenging to find appropriate circuits for multipartite systems. We show that our graph mapping from boson subtractions provides handy tactics to overcome the limitations in circuit designs. We present a practical strategy to mitigate the limitation through the implementation of our graph technique. Our physical setup is based on the sculpting protocol, which utilizes $N$ spatially overlapped subtractions of single bosons to convert Fock states of evenly distributed bosons into entanglement. We have identified general schemes for qubit $N$-partite GHZ and W states, which are significantly more efficient than previous schemes. In addition, our scheme for generating the superposition of $N=3$ GHZ and W entangled states illustrates that our approach can be extended to derive more generalized forms of entangled states. Furthermore, we have found an $N$-partite GHZ state generation scheme for qudits, which requires substantially fewer particles than previous proposals. These results demonstrate the power of our approach in discovering optimized solutions for the generation of intricate heralded entangled states. As a proof of concept, we propose a linear optical scheme for the generation of the Bell state by heralding detections. We expect our method to serve as a promising tool in generating diverse entanglement.

Translated: 2024-04-27 00:37:16 Published: 2024-04-24

# Variants of SGD for Lipschitz Continuous Loss Functions in Low-Precision Environments

Variants of SGD for Lipschitz Continuous Loss Functions in Low-Precision Environments ( http://arxiv.org/abs/2211.04655v7 )

License: see the linked page

Michael R. Metel,

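(Reference translation) Motivated by neural network training in low-precision arithmetic environments, this work studies the convergence of variants of SGD that use adaptive step sizes in the presence of computational error. For a general stochastic Lipschitz continuous loss function, we prove asymptotic convergence to a Clarke stationary point and non-asymptotic convergence to an approximate stationary point. We assume that only an approximation of the loss function's stochastic gradient can be computed, and that the SGD step itself is also computed with error. Different variants of SGD are tested empirically, and improved test set accuracy over SGD is observed on two image recognition tasks.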

Motivated by neural network training in low-precision arithmetic environments, this work studies the convergence of variants of SGD using adaptive step sizes with computational error. Considering a general stochastic Lipschitz continuous loss function, an asymptotic convergence result to a Clarke stationary point is proven as well as the non-asymptotic convergence to an approximate stationary point. It is assumed that only an approximation of the loss function's stochastic gradient can be computed in addition to error in computing the SGD step itself. Different variants of SGD are tested empirically, where improved test set accuracy is observed compared to SGD for two image recognition tasks.
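The setting described above (an approximate stochastic gradient plus error in the update itself) can be mimicked on a one-dimensional toy problem. The sketch below is only an illustration under assumptions chosen here: the loss $f(x)=|x-1|$, rounding to a fixed grid as the source of computational error, bounded uniform gradient noise, and a plain diminishing step size rather than the adaptive step sizes studied in the paper. None of the names come from the paper.

```python
import random

def quantize(value, grid=2 ** -8):
    """Round to the nearest multiple of `grid`, mimicking low-precision storage."""
    return round(value / grid) * grid

def noisy_subgradient(x, rng):
    """Stochastic subgradient of the Lipschitz loss f(x) = |x - 1| with bounded noise."""
    sign = (x > 1.0) - (x < 1.0)
    return sign + rng.uniform(-0.25, 0.25)

def low_precision_sgd(x0, iters=200, lr0=0.5, seed=0):
    rng = random.Random(seed)
    x = quantize(x0)
    for t in range(1, iters + 1):
        g = quantize(noisy_subgradient(x, rng))  # gradient known only approximately
        step = lr0 / t ** 0.5                     # diminishing step size (assumption)
        x = quantize(x - step * g)                # the update itself is rounded too
    return x

x_final = low_precision_sgd(3.0)
```

Despite the rounding in both the gradient and the update, the iterate ends close to the minimizer $x=1$ of this non-smooth loss; the residual distance is governed by the final step size and the grid spacing.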

Translated: 2024-04-27 00:37:16 Published: 2024-04-24

# Lightweight Facial Attractiveness Prediction Using Dual Label Distribution

Lightweight Facial Attractiveness Prediction Using Dual Label Distribution ( http://arxiv.org/abs/2212.01742v2 )

License: see the linked page

Shu Liu, Enquan Huang, Ziyu Zhou, Yan Xu, Xiaoyan Kui, Tao Lei, Hongying Meng,

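(Reference translation) Facial attractiveness prediction (FAP) aims to assess facial attractiveness automatically based on human aesthetic perception. Previous methods using deep convolutional neural networks have improved performance, but their large-scale models lack flexibility. In addition, most methods fail to take full advantage of the dataset. In this paper, we present a novel end-to-end FAP approach that integrates dual label distribution and lightweight design. The manual ratings, attractiveness score, and standard deviation are aggregated explicitly to construct a dual-label distribution, comprising the attractiveness distribution and the rating distribution, so as to make the best use of the dataset. These distributions are optimized in a joint learning framework based on the label distribution learning (LDL) paradigm. Data processing is simplified to a minimum for the lightweight design, and MobileNetV2 is selected as the backbone. Extensive experiments on two benchmark datasets show that the approach achieves promising results and succeeds in balancing performance and efficiency. Ablation studies show that the carefully designed learning modules are indispensable and correlated. In addition, visualization indicates that the approach can perceive facial attractiveness and capture attractive facial regions to facilitate semantic predictions. The code is available at https://github.com/enquan/2D_FAP.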

Facial attractiveness prediction (FAP) aims to assess facial attractiveness automatically based on human aesthetic perception. Previous methods using deep convolutional neural networks have improved the performance, but their large-scale models have led to a deficiency in flexibility. In addition, most methods fail to take full advantage of the dataset. In this paper, we present a novel end-to-end FAP approach that integrates dual label distribution and lightweight design. The manual ratings, attractiveness score, and standard deviation are aggregated explicitly to construct a dual-label distribution to make the best use of the dataset, including the attractiveness distribution and the rating distribution. Such distributions, as well as the attractiveness score, are optimized under a joint learning framework based on the label distribution learning (LDL) paradigm. The data processing is simplified to a minimum for a lightweight design, and MobileNetV2 is selected as our backbone. Extensive experiments are conducted on two benchmark datasets, where our approach achieves promising results and succeeds in balancing performance and efficiency. Ablation studies demonstrate that our delicately designed learning modules are indispensable and correlated. Additionally, the visualization indicates that our approach can perceive facial attractiveness and capture attractive facial regions to facilitate semantic predictions. The code is available at https://github.com/enquan/2D_FAP.
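One plausible way to turn a mean score, a standard deviation, and raw ratings into two label distributions is sketched below. The abstract does not give the construction, so this is an assumption throughout: the attractiveness distribution is taken to be a Gaussian discretized over five rating levels, the rating distribution is taken to be a normalized histogram of manual ratings, and the function names and sample numbers are hypothetical.

```python
import math

LEVELS = (1, 2, 3, 4, 5)  # assumed rating scale

def attractiveness_distribution(mean, std, levels=LEVELS):
    """Discretized Gaussian over the rating levels, centred on the mean score."""
    weights = [math.exp(-0.5 * ((k - mean) / std) ** 2) for k in levels]
    total = sum(weights)
    return [w / total for w in weights]

def rating_distribution(ratings, levels=LEVELS):
    """Normalized histogram of the individual manual ratings."""
    return [sum(1 for r in ratings if r == k) / len(ratings) for k in levels]

att = attractiveness_distribution(mean=3.2, std=0.8)
rat = rating_distribution([3, 3, 4, 2, 3])
```

Both outputs are probability vectors over the same levels, so a label distribution learning loss such as a divergence between predicted and target distributions could be applied to each; which loss the authors use is not stated in the abstract.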

Translated: 2024-04-27 00:37:16 Published: 2024-04-24

# Stretched and measured neural predictions of complex network dynamics

Stretched and measured neural predictions of complex network dynamics ( http://arxiv.org/abs/2301.04900v4 )

License: see the linked page

Vaiva Vasiliauskaite, Nino Antulov-Fantulin,

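(Reference translation) Differential equations are a ubiquitous tool for studying dynamics, from physical systems to complex systems in which a large number of agents interact through a graph with non-trivial topological features. Data-driven approximations of differential equations are a promising alternative to traditional methods for uncovering a model of a dynamical system, especially in complex systems that lack explicit first principles. Neural networks have recently been adopted as a machine learning tool for studying dynamics; they can be used for data-driven solution finding or for the discovery of differential equations. For the latter task in particular, deploying deep learning models in unfamiliar settings, such as predicting dynamics in unobserved regions of state space or on novel graphs, can produce spurious results. Focusing on complex systems whose dynamics are described by a system of first-order differential equations coupled through a graph, we show that it is feasible to extend a model's generalizability beyond the limits of traditional statistical learning theory. Achieving this advanced level of generalization, however, requires neural network models to conform to fundamental assumptions about the dynamical model. We also propose a statistical significance test for assessing prediction quality during inference, which makes it possible to identify a neural network's confidence level in its predictions.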

Differential equations are a ubiquitous tool to study dynamics, ranging from physical systems to complex systems, where a large number of agents interact through a graph with non-trivial topological features. Data-driven approximations of differential equations present a promising alternative to traditional methods for uncovering a model of dynamical systems, especially in complex systems that lack explicit first principles. A recently employed machine learning tool for studying dynamics is neural networks, which can be used for data-driven solution finding or discovery of differential equations. Specifically for the latter task, however, deploying deep learning models in unfamiliar settings - such as predicting dynamics in unobserved state space regions or on novel graphs - can lead to spurious results. Focusing on complex systems whose dynamics are described with a system of first-order differential equations coupled through a graph, we show that extending the model's generalizability beyond traditional statistical learning theory limits is feasible. However, achieving this advanced level of generalization requires neural network models to conform to fundamental assumptions about the dynamical model. Additionally, we propose a statistical significance test to assess prediction quality during inference, enabling the identification of a neural network's confidence level in its predictions.

Translated: 2024-04-27 00:27:30 Published: 2024-04-24

# Measuring Unruh radiation from accelerated electrons

Measuring Unruh radiation from accelerated electrons ( http://arxiv.org/abs/2301.06772v5 )

License: see the linked page

Gianluca Gregori, Giacomo Marocco, Subir Sarkar, Robert Bingham, Charles Wang,

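(Reference translation) Detecting thermal Unruh radiation from accelerated electrons has been a formidable challenge, owing not only to technical difficulties but also to a lack of conceptual clarity about what a laboratory observer actually sees. We summarize the current interpretations and give a simpler heuristic description based on the analogy between the Unruh effect and radiation from a two-level atomic system. We propose an experiment to test whether an accelerated electron emits thermal photons.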

Detecting thermal Unruh radiation from accelerated electrons has presented a formidable challenge due not only to technical difficulties but also for lack of conceptual clarity about what is actually seen by a laboratory observer. We give a summary of the current interpretations along with a simpler heuristic description that draws on the analogy between the Unruh effect and radiation from a two-level atomic system. We propose an experiment to test whether there is emission of thermal photons from an accelerated electron.

Translated: 2024-04-27 00:27:30 Published: 2024-04-24

# Hybrid Open-set Segmentation with Synthetic Negative Data

Hybrid Open-set Segmentation with Synthetic Negative Data ( http://arxiv.org/abs/2301.08555v3 )

License: see the linked page

Matej Grcić, Siniša Šegvić,

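(Reference translation) Open-set segmentation can be achieved by complementing closed-set classification with anomaly detection. Many existing dense anomaly detectors operate through generative modelling of regular data or by discriminating with respect to negative data. These two approaches optimize different objectives and therefore exhibit different failure modes. We therefore propose a novel anomaly score that fuses generative and discriminative cues. The score can be implemented by upgrading any closed-set segmentation model with dense estimates of the dataset posterior and the unnormalized data likelihood. The resulting dense hybrid open-set models require negative training images, which can be sampled from an auxiliary negative dataset, from a jointly trained generative model, or from a mixture of both sources. We evaluate our contributions on benchmarks for dense anomaly detection and open-set segmentation. The experiments reveal strong open-set performance with negligible computational overhead.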

Open-set segmentation can be conceived by complementing closed-set classification with anomaly detection. Many of the existing dense anomaly detectors operate through generative modelling of regular data or by discriminating with respect to negative data. These two approaches optimize different objectives and therefore exhibit different failure modes. Consequently, we propose a novel anomaly score that fuses generative and discriminative cues. Our score can be implemented by upgrading any closed-set segmentation model with dense estimates of dataset posterior and unnormalized data likelihood. The resulting dense hybrid open-set models require negative training images that can be sampled from an auxiliary negative dataset, from a jointly trained generative model, or from a mixture of both sources. We evaluate our contributions on benchmarks for dense anomaly detection and open-set segmentation. The experiments reveal strong open-set performance in spite of negligible computational overhead.

Translated: 2024-04-27 00:27:30 Published: 2024-04-24

# Extending global-local view alignment for self-supervised learning with remote sensing imagery

Extending global-local view alignment for self-supervised learning with remote sensing imagery ( http://arxiv.org/abs/2303.06670v2 )

License: see the linked page

Xinye Wanyan, Sachith Seneviratne, Shuchang Shen, Michael Kirley,

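(Reference translation) Because a large number of high-quality remote sensing images are readily accessible, exploiting such image corpora with less manual annotation is attracting increasing attention. Self-supervised models acquire general feature representations by formulating a pretext task that generates pseudo-labels for massive unlabeled data, which provide supervision for training. Prior studies have explored several self-supervised learning techniques in the remote sensing domain, but pretext tasks based on local-global view alignment remain underexplored despite achieving state-of-the-art results on natural imagery. Inspired by DINO, which employs an effective representation learning structure with knowledge distillation based on global-local view alignment, we formulate two pretext tasks for self-supervised learning on remote sensing imagery (SSLRS). Using these tasks, we explore the effectiveness of positive temporal contrast as well as multi-sized views for SSLRS. We extend DINO and propose DINO-MC, which uses local views from crops of various sizes instead of a single fixed size in order to alleviate the limited variation in object size observed in remote sensing imagery. Our experiments show that, even when pre-trained on only 10% of the dataset, DINO-MC performs on par with or better than existing state-of-the-art SSLRS methods on multiple remote sensing tasks while using fewer computational resources. All code, models, and results are released at https://github.com/WennyXY/DINO-MC.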

Since a large number of high-quality remote sensing images are readily accessible, exploiting the corpus of images with less manual annotation draws increasing attention. Self-supervised models acquire general feature representations by formulating a pretext task that generates pseudo-labels for massive unlabeled data to provide supervision for training. While prior studies have explored multiple self-supervised learning techniques in the remote sensing domain, pretext tasks based on local-global view alignment remain underexplored, despite achieving state-of-the-art results on natural imagery. Inspired by DINO, which employs an effective representation learning structure with knowledge distillation based on global-local view alignment, we formulate two pretext tasks for self-supervised learning on remote sensing imagery (SSLRS). Using these tasks, we explore the effectiveness of positive temporal contrast as well as multi-sized views on SSLRS. We extend DINO and propose DINO-MC, which uses local views of various sized crops instead of a single fixed size in order to alleviate the limited variation in object size observed in remote sensing imagery. Our experiments demonstrate that even when pre-trained on only 10% of the dataset, DINO-MC performs on par with or better than existing state-of-the-art SSLRS methods on multiple remote sensing tasks, while using fewer computational resources. All code, models, and results are released at https://github.com/WennyXY/DINO-MC.

翻訳日:2024-04-27 00:27:30 公開日:2024-04-24
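
The multi-size local views described in the DINO-MC abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sample_multi_size_crops`, the 224-pixel image, and the crop sizes are all assumptions chosen for illustration, and the paper's actual crop settings are not stated in the abstract.

```python
import random


def sample_multi_size_crops(image_size, crop_sizes, seed=None):
    """Return one random square crop box (left, top, size) per requested size.

    Hypothetical helper: instead of cutting every local view at one fixed
    size, each local view gets its own side length.
    """
    rng = random.Random(seed)
    width, height = image_size
    boxes = []
    for size in crop_sizes:
        if size > min(width, height):
            raise ValueError(f"crop size {size} does not fit in {image_size}")
        # Pick a top-left corner so the whole crop stays inside the image.
        left = rng.randint(0, width - size)
        top = rng.randint(0, height - size)
        boxes.append((left, top, size))
    return boxes


# Illustrative values only; these are not the settings used in the paper.
boxes = sample_multi_size_crops((224, 224), [96, 112, 128, 144], seed=0)
for left, top, size in boxes:
    print(f"local view: left={left}, top={top}, side={size}")
```

In a real pipeline each box would be cut from the image and typically resized before being passed to the student network; that step is omitted here.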

# LatentForensics: Towards Frugal Deepfake Detection in the StyleGAN Latent Space

LatentForensics: Towards frugal deepfake detection in the StyleGAN latent space ( http://arxiv.org/abs/2303.17222v2 )

License: see the linked page

Matthieu Delmas, Amine Kacete, Stephane Paquelet, Simon Leglaive, Renaud Seguier,

The classification of forged videos has been a challenge for the past few years. Deepfake classifiers can now reliably predict whether or not video frames have been tampered with. However, their performance is tied to both the dataset used for training and the analyst's computational power. We propose a deepfake detection method that operates in the latent space of a state-of-the-art generative adversarial network (GAN) trained on high-quality face images. The proposed method leverages the structure of the latent space of StyleGAN to learn a lightweight binary classification model. Experimental results on standard datasets reveal that the proposed approach outperforms other state-of-the-art deepfake classification methods, especially in contexts where the data available to train the models is scarce, such as when a new manipulation method is introduced. To the best of our knowledge, this is the first study to demonstrate the value of the StyleGAN latent space for deepfake classification. Combined with other recent studies on the interpretation and manipulation of this latent space, we believe that the proposed approach can further help in developing frugal deepfake classification methods based on interpretable high-level properties of face images.

Translated: 2024-04-27 00:27:30 Published: 2024-04-24

# Analysis of Failures and Risks in Deep Learning Model Converters: A Case Study in the ONNX Ecosystem

Analysis of Failures and Risks in Deep Learning Model Converters: A Case Study in the ONNX Ecosystem ( http://arxiv.org/abs/2303.17708v3 )

License: see the linked page

Purvish Jajal, Wenxin Jiang, Arav Tewari, Erik Kocinare, Joseph Woo, Anusha Sarraf, Yung-Hsiang Lu, George K. Thiruvathukal, James C. Davis,

Software engineers develop, fine-tune, and deploy deep learning (DL) models using a variety of development frameworks and runtime environments. DL model converters move models between frameworks and to runtime environments. Conversion errors compromise model quality and disrupt deployment. However, the failure characteristics of DL model converters are unknown, adding risk when using DL interoperability technologies. This paper analyzes failures in DL model converters. We survey software engineers about DL interoperability tools, use cases, and pain points (N=92). Then, we characterize failures in model converters associated with the main interoperability tool, ONNX (N=200 issues in PyTorch and TensorFlow). Finally, we formulate and test two hypotheses about structural causes for the failures we studied. We find that the node conversion stage of a model converter accounts for ~75% of the defects and that 33% of reported failures are related to semantically incorrect models. The cause of semantically incorrect models is elusive, but models with behavioural inconsistencies share operator sequences. Our results motivate future research on making DL interoperability software simpler to maintain, extend, and validate. Research into behavioural tolerances and architectural coverage metrics could be fruitful.

Translated: 2024-04-27 00:27:30 Published: 2024-04-24
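
The notion of a "behavioural tolerance" for converted models can be made concrete with a small sketch. This is an assumption-laden illustration, not the paper's methodology: the helper name `outputs_match` and the tolerance values are hypothetical, and the outputs are plain Python lists standing in for what the original and converted models would return on the same input.

```python
def outputs_match(reference, converted, rtol=1e-4, atol=1e-5):
    """Check element-wise that |ref - conv| <= atol + rtol * |ref|.

    Hypothetical helper: a converted model whose outputs fall outside the
    tolerance would count as behaviourally inconsistent with the original.
    """
    if len(reference) != len(converted):
        raise ValueError("outputs must have the same length")
    return all(
        abs(ref - conv) <= atol + rtol * abs(ref)
        for ref, conv in zip(reference, converted)
    )


# Stand-in outputs; in practice these would come from running both models.
original_logits = [0.12, 2.50, -1.30]
faithful_conversion = [0.12000001, 2.5000001, -1.3000001]
faulty_conversion = [0.12, 2.10, -1.30]

print(outputs_match(original_logits, faithful_conversion))
print(outputs_match(original_logits, faulty_conversion))
```

Choosing `rtol` and `atol` is exactly the open question the abstract points to: too strict and harmless floating-point drift is flagged, too loose and semantically incorrect models slip through.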

# Specialty-Oriented Generalist Medical AI for Chest CT Screening

Specialty-Oriented Generalist Medical AI for Chest CT Screening ( http://arxiv.org/abs/2304.02649v4 )

License: see the linked page

Chuang Niu, Qing Lyu, Christopher D. Carothers, Parisa Kaviani, Josh Tan, Pingkun Yan, Mannudeep K. Kalra, Christopher T. Whitlow, Ge Wang,

Modern medical records include a vast amount of multimodal free-text clinical data and imaging data from radiology, cardiology, and digital pathology. Fully mining such big data requires multitasking; otherwise, occult but important aspects may be overlooked, adversely affecting clinical management and population healthcare. Despite the remarkable successes of AI in individual tasks with single-modal data, progress in developing generalist medical AI that combines multimodal data for multiple tasks remains relatively slow because of the dual challenges of data curation and model architecture. The data challenge involves querying and curating multimodal structured and unstructured text, alphanumeric data, and especially 3D tomographic scans, both at the level of an individual patient for real-time decisions and at a scale sufficient to estimate population health statistics. The model challenge demands a scalable and adaptable network architecture to integrate multimodal datasets for diverse clinical tasks. Here we propose the first-of-its-kind medical multimodal-multitask foundation model (M3FM) with application in lung cancer screening (LCS) and related tasks. After curating a comprehensive multimodal multitask dataset consisting of 49 clinical data types, including 163,725 chest CT series, and 17 medical tasks involved in LCS, we developed a multimodal question-answering framework as a unified training and inference strategy to synergize multimodal information and perform multiple tasks via free-text prompting. M3FM consistently outperforms the state-of-the-art single-modal task-specific models, identifies multimodal data elements informative for clinical tasks, and flexibly adapts to new tasks with a small out-of-distribution dataset. As a specialty-oriented generalist medical AI model, M3FM paves the way for similar breakthroughs in other areas of medicine, closing the gap between specialists and the generalist.

Translated: 2024-04-27 00:27:30 Published: 2024-04-24

# Overload: Latency Attacks on Object Detection for Edge Devices

Overload: Latency Attacks on Object Detection for Edge Devices ( http://arxiv.org/abs/2304.05370v3 )

License: see the linked page

Erh-Chung Chen, Pin-Yu Chen, I-Hsin Chung, Che-rung Lee,

Nowadays, the deployment of deep learning-based applications is an essential task owing to the increasing demand for intelligent services. In this paper, we investigate latency attacks on deep learning applications. Unlike common adversarial attacks that aim at misclassification, the goal of latency attacks is to increase the inference time, which may stop applications from responding to requests within a reasonable time. This kind of attack is applicable to a wide range of applications, and we use object detection to demonstrate how it works. We also design a framework named Overload to generate latency attacks at scale. Our method is based on a newly formulated optimization problem and a novel technique called spatial attention. The attack escalates the computing cost required at inference time, consequently extending the inference time for object detection. It presents a significant threat, especially to systems with limited computing resources. We conducted experiments using YOLOv5 models on Nvidia NX. Compared to existing methods, our method is simpler and more effective. The experimental results show that under latency attacks, the inference time for a single image can be ten times longer than in the normal setting. Moreover, our findings pose a potential new threat to all object detection tasks requiring non-maximum suppression (NMS), as our attack is NMS-agnostic.

Translated: 2024-04-27 00:27:30 Published: 2024-04-24
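
To see why detectors that rely on NMS are exposed to this kind of latency attack, the following sketch counts the pairwise IoU evaluations performed by a plain greedy NMS. It does not implement the Overload attack; it only illustrates, under the assumption that an attacker manages to produce many non-overlapping candidate boxes, how the post-processing cost grows with the number of candidates. All names and values are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS that also reports how many IoU evaluations it performed."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept, iou_evaluations = [], 0
    for i in order:
        suppressed = False
        for j in kept:
            iou_evaluations += 1
            if iou(boxes[i], boxes[j]) > iou_threshold:
                suppressed = True
                break
        if not suppressed:
            kept.append(i)
    return kept, iou_evaluations


def disjoint_boxes(n):
    """n non-overlapping 5x5 boxes: the worst case, since none is suppressed."""
    return [(10.0 * i, 0.0, 10.0 * i + 5.0, 5.0) for i in range(n)]


for n in (10, 100):
    kept, evaluations = greedy_nms(disjoint_boxes(n), [1.0] * n)
    print(n, len(kept), evaluations)
```

When no box suppresses another, every candidate is compared against all previously kept boxes, so the work grows quadratically (`n * (n - 1) / 2` evaluations). On a device with limited compute, inflating the candidate count is therefore enough to stretch inference time.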

# Two is Better Than One: Digital Siblings to Improve Autonomous Driving Testing

Two is Better Than One: Digital Siblings to Improve Autonomous Driving Testing ( http://arxiv.org/abs/2305.08060v2 )

License: see the linked page

Matteo Biagiola, Andrea Stocco, Vincenzo Riccio, Paolo Tonella,

Simulation-based testing represents an important step to ensure the reliability of autonomous driving software. In practice, when companies rely on third-party general-purpose simulators, either for in-house or outsourced testing, the generalizability of testing results to real autonomous vehicles is at stake. In this paper, we enhance simulation-based testing by introducing the notion of digital siblings, a multi-simulator approach that tests a given autonomous vehicle on multiple general-purpose simulators built with different technologies, which operate collectively as an ensemble in the testing process. We exemplify our approach on a case study focused on testing the lane-keeping component of an autonomous vehicle. We use two open-source simulators as digital siblings, and we empirically compare such a multi-simulator approach against a digital twin of a physical scaled autonomous vehicle on a large set of test cases. Our approach requires generating and running test cases for each individual simulator, in the form of sequences of road points. Then, test cases are migrated between simulators, using feature maps to characterize the exercised driving conditions. Finally, the joint predicted failure probability is computed, and a failure is reported only in cases of agreement among the siblings. Our empirical evaluation shows that the ensemble failure predictor formed by the digital siblings is superior to each individual simulator at predicting the failures of the digital twin. We discuss the findings of our case study and detail how our approach can help researchers interested in automated testing of autonomous driving software.

Translated: 2024-04-27 00:17:35 Published: 2024-04-24
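
The final step of the digital-siblings approach, reporting a failure only when the siblings agree, can be sketched as follows. The abstract does not give the formula for the joint predicted failure probability, so the product used here assumes independent simulators and is purely illustrative; the function names and the 0.5 threshold are also assumptions.

```python
def joint_failure_probability(sibling_probabilities):
    """Product of per-simulator failure probabilities.

    Assumption for illustration: the simulators fail independently. The
    paper's actual combination rule is not stated in the abstract.
    """
    joint = 1.0
    for probability in sibling_probabilities:
        joint *= probability
    return joint


def report_failure(sibling_probabilities, threshold=0.5):
    """Report a failure only if every sibling predicts one (agreement)."""
    return all(p >= threshold for p in sibling_probabilities)


# Hypothetical predictions for one driving condition from two simulators.
agreeing = [0.9, 0.8]
disagreeing = [0.9, 0.2]

print(report_failure(agreeing), joint_failure_probability(agreeing))
print(report_failure(disagreeing), joint_failure_probability(disagreeing))
```

Requiring agreement trades recall for precision: a failure that shows up in only one simulator is treated as a likely simulator artifact rather than a defect of the driving software.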

# ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding ( http://arxiv.org/abs/2305.08275v3 )

License: see the linked page

Le Xue, Ning Yu, Shu Zhang, Junnan Li, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese,

Recent advancements in multimodal pre-training have shown promising efficacy in 3D representation learning by aligning multimodal features across 3D shapes, their 2D counterparts, and language descriptions. However, the methods used by existing frameworks to curate such multimodal data, in particular language descriptions for 3D shapes, are not scalable, and the collected language descriptions are not diverse. To address this, we introduce ULIP-2, a simple yet effective tri-modal pre-training framework that leverages large multimodal models to automatically generate holistic language descriptions for 3D shapes. It only needs 3D data as input, eliminating the need for any manual 3D annotations, and is therefore scalable to large datasets. ULIP-2 is also equipped with scaled-up backbones for better multimodal representation learning. We conduct experiments on two large-scale 3D datasets, Objaverse and ShapeNet, and augment them with tri-modal datasets of 3D point clouds, images, and language for training ULIP-2. Experiments show that ULIP-2 demonstrates substantial benefits in three downstream tasks: zero-shot 3D classification, standard 3D classification with fine-tuning, and 3D captioning (3D-to-language generation). It achieves a new SOTA of 50.6% (top-1) on Objaverse-LVIS and 84.7% (top-1) on ModelNet40 in zero-shot classification. In the ScanObjectNN benchmark for standard fine-tuning, ULIP-2 reaches an overall accuracy of 91.5% with a compact model of only 1.4 million parameters. ULIP-2 sheds light on a new paradigm for scalable multimodal 3D representation learning without human annotations and shows significant improvements over existing baselines. The code and datasets are released at https://github.com/salesforce/ULIP.

Translated: 2024-04-27 00:17:35 Published: 2024-04-24

# Exact Recovery for System Identification with More Corrupt Data than Clean Data

Exact Recovery for System Identification with More Corrupt Data than Clean Data ( http://arxiv.org/abs/2305.10506v3 )

License: see the linked page

Baturalp Yalcin, Haixiang Zhang, Javad Lavaei, Murat Arcak,

This paper investigates the system identification problem for linear discrete-time systems under adversaries and analyzes two lasso-type estimators. We examine both asymptotic and non-asymptotic properties of these estimators in two separate scenarios, corresponding to deterministic and stochastic models for the attack times. Since the samples collected from the system are correlated, the existing results on lasso are not applicable. We prove that when the system is stable and attacks are injected periodically, the sample complexity for exact recovery of the system dynamics is linear in the dimension of the states. When adversarial attacks occur at each time instance with probability p, the required sample complexity for exact recovery scales polynomially in the dimension of the states and the probability p. This result implies almost sure convergence to the true system dynamics in the asymptotic regime. As a by-product, our estimators still learn the system correctly even when more than half of the data is compromised. We highlight that the attack vectors are allowed to be correlated with each other in this work, whereas we make some assumptions about the times at which the attacks happen. This paper provides the first mathematical guarantee in the literature on learning from correlated data for dynamical systems in the case when there is less clean data than corrupt data.

Translated: 2024-04-27 00:17:35 Published: 2024-04-24
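
The abstract does not spell out the two lasso-type estimators, so the following is only a sketch of what an estimator of this kind can look like. The notation is hypothetical: $x_t$ is the state, $\bar{A}$ the true dynamics, and $d_t$ the attack vector, assumed to be zero at the clean time steps; control inputs are omitted for brevity.

```latex
\begin{align*}
  % Assumed system model: the attack d_t is non-zero only at attacked times.
  x_{t+1} &= \bar{A}\, x_t + d_t, \qquad t = 0, \dots, T-1, \\
  % A non-smooth, lasso-type estimator: the norm is NOT squared.
  \hat{A} &\in \arg\min_{A} \sum_{t=0}^{T-1} \bigl\lVert x_{t+1} - A\, x_t \bigr\rVert_2 .
\end{align*}
```

The intuition behind such an objective is that an unsquared norm penalizes residuals linearly, so a residual sequence that is zero at clean time steps and arbitrary at attacked ones can still be optimal. A least-squares objective, by contrast, would be pulled toward the corrupted samples.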

# Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners

Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners ( http://arxiv.org/abs/2305.10722v3 )

License: see the linked page

Xuehai He, Weixi Feng, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang,

Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation. Since text-to-image generation often requires models to generate visual concepts with fine-grained details and attributes specified in text prompts, can we leverage the powerful representations learned by pre-trained diffusion models for discriminative tasks such as image-text matching? To answer this question, we propose a novel approach, Discriminative Stable Diffusion (DSD), which turns pre-trained text-to-image diffusion models into few-shot discriminative learners. Our approach mainly uses the cross-attention score of a Stable Diffusion model to capture the mutual influence between visual and textual information and fine-tunes the model via efficient attention-based prompt learning to perform image-text matching. By comparing DSD with state-of-the-art methods on several benchmark datasets, we demonstrate the potential of using pre-trained diffusion models for discriminative tasks, with superior results on few-shot image-text matching.

Translated: 2024-04-27 00:17:35 Published: 2024-04-24

# Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation

Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation ( http://arxiv.org/abs/2305.10874v4 )

License: see the linked page

Wenjing Wang, Huan Yang, Zixi Tuo, Huiguo He, Junchen Zhu, Jianlong Fu, Jiaying Liu,

With the explosive popularity of AI-generated content (AIGC), video generation has recently received a lot of attention. Generating videos guided by text instructions poses significant challenges, such as modeling the complex relationship between space and time, and the lack of large-scale text-video paired data. Existing text-video datasets suffer from limitations in both content quality and scale, or they are not open-source, rendering them inaccessible for study and use. For model design, previous approaches extend pretrained text-to-image generation models by adding temporal 1D convolution/attention modules for video generation. However, these approaches overlook the importance of jointly modeling space and time, inevitably leading to temporal distortions and misalignment between texts and videos. In this paper, we propose a novel approach that strengthens the interaction between spatial and temporal perceptions. In particular, we utilize a swapped cross-attention mechanism in 3D windows that alternates the "query" role between spatial and temporal blocks, enabling mutual reinforcement. Moreover, to fully unlock model capabilities for high-quality video generation and promote the development of the field, we curate a large-scale and open-source video dataset called HD-VG-130M. This dataset comprises 130 million text-video pairs from the open domain, ensuring high-definition, widescreen, and watermark-free characteristics. A smaller-scale yet more meticulously cleaned subset further enhances the data quality, aiding models in achieving superior performance. Experimental quantitative and qualitative results demonstrate the superiority of our approach in terms of per-frame quality, temporal correlation, and text-video alignment, with clear margins.

Translated: 2024-04-27 00:17:35 Published: 2024-04-24

# LANISTR: Multimodal Learning from Structured and Unstructured Data

LANISTR: Multimodal Learning from Structured and Unstructured Data ( http://arxiv.org/abs/2305.16556v3 )

License: see the linked page

Sayna Ebrahimi, Sercan O. Arik, Yihe Dong, Tomas Pfister,

Multimodal large-scale pretraining has shown impressive performance for unstructured data such as language and images. However, a prevalent real-world scenario involves structured data types, tabular and time-series, along with unstructured data. Such scenarios have been understudied. To bridge this gap, we propose LANISTR, an attention-based framework to learn from LANguage, Image, and STRuctured data. The core of LANISTR's methodology is rooted in masking-based training applied across both unimodal and multimodal levels. In particular, we introduce a new similarity-based multimodal masking loss that enables it to learn cross-modal relations from large-scale multimodal data with missing modalities. On two real-world datasets, MIMIC-IV (from healthcare) and Amazon Product Review (from retail), LANISTR demonstrates remarkable improvements, 6.6% (in AUROC) and 14% (in accuracy) when fine-tuned with 0.1% and 0.01% of labeled data, respectively, compared to the state-of-the-art alternatives. Notably, these improvements are observed even with a very high ratio of samples (35.7% and 99.8%, respectively) not containing all modalities, underlining the robustness of LANISTR to the practical challenge of missing modalities. Our code and models will be available at https://github.com/google-research/lanistr

Translated: 2024-04-27 00:17:35 Published: 2024-04-24

# Multi-Study R-Learner for Estimating Heterogeneous Treatment Effects Across Studies Using Statistical Machine Learning

Multi-Study R-Learner for Estimating Heterogeneous Treatment Effects Across Studies Using Statistical Machine Learning ( http://arxiv.org/abs/2306.01086v3 )

License: see the linked page

Cathy Shyr, Boyu Ren, Prasad Patil, Giovanni Parmigiani,

Estimating heterogeneous treatment effects (HTEs) is crucial for precision medicine. While multiple studies can improve the generalizability of results, leveraging them for estimation is statistically challenging. Existing approaches often assume identical HTEs across studies, but this may be violated due to various sources of between-study heterogeneity, including differences in study design, study populations, and data collection protocols, among others. To this end, we propose a framework for multi-study HTE estimation that accounts for between-study heterogeneity in the nuisance functions and treatment effects. Our approach, the multi-study R-learner, extends the R-learner to obtain principled statistical estimation with machine learning (ML) in the multi-study setting. It involves a data-adaptive objective function that links study-specific treatment effects with nuisance functions through membership probabilities, which enable information to be borrowed across potentially heterogeneous studies. The multi-study R-learner framework can combine data from randomized controlled trials, observational studies, or a combination of both. It is easy to implement and flexible in its ability to incorporate ML for estimating HTEs, nuisance functions, and membership probabilities. In the series estimation framework, we show that the multi-study R-learner is asymptotically normal and more efficient than the R-learner when there is between-study heterogeneity in the propensity score model under homoscedasticity. We illustrate using cancer data that the proposed method performs favorably compared to existing approaches in the presence of between-study heterogeneity.

Translated: 2024-04-27 00:17:35 Published: 2024-04-24
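
For readers unfamiliar with the single-study R-learner that this work extends, its objective is commonly written as below. The notation is the usual one rather than anything taken from this abstract: $Y_i$ is the outcome, $W_i$ the treatment indicator, $X_i$ the covariates, $\hat{m}(x)$ an estimate of $E[Y \mid X = x]$, $\hat{e}(x)$ an estimate of the propensity score $P(W = 1 \mid X = x)$, and $\Lambda_n$ a regularizer. The multi-study objective with membership probabilities is not given in the abstract and is not reproduced here.

```latex
\begin{equation*}
  \hat{\tau} = \arg\min_{\tau} \; \frac{1}{n} \sum_{i=1}^{n}
    \Bigl[ \bigl( Y_i - \hat{m}(X_i) \bigr)
         - \bigl( W_i - \hat{e}(X_i) \bigr)\, \tau(X_i) \Bigr]^2
    + \Lambda_n(\tau)
\end{equation*}
```

The nuisance functions $\hat{m}$ and $\hat{e}$ are fitted first, and the treatment effect $\tau$ is then learned from the residualized outcome and treatment. According to the abstract, the multi-study version lets these nuisance functions and treatment effects differ between studies.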

# Lifting Architectural Constraints of Injective Flows

Lifting Architectural Constraints of Injective Flows ( http://arxiv.org/abs/2306.01843v4 )

License: see the linked page

Peter Sorrenson, Felix Draxler, Armand Rousselot, Sander Hummerich, Lea Zimmermann, Ullrich Köthe,

Normalizing Flows explicitly maximize a full-dimensional likelihood on the training data. However, real data is typically only supported on a lower-dimensional manifold, leading the model to expend significant compute on modeling noise. Injective Flows fix this by jointly learning a manifold and the distribution on it. So far, they have been limited by restrictive architectures and/or high computational cost. We lift both constraints with a new efficient estimator for the maximum likelihood loss, compatible with free-form bottleneck architectures. We further show that naively learning both the data manifold and the distribution on it can lead to divergent solutions, and use this insight to motivate a stable maximum likelihood training objective. We perform extensive experiments on toy, tabular, and image data, demonstrating the competitive performance of the resulting model.

Translated: 2024-04-27 00:17:35 Published: 2024-04-24

# WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models

WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models ( http://arxiv.org/abs/2306.04744v3 )

License: see the linked page

Changhoon Kim, Kyle Min, Maitreya Patel, Sheng Cheng, Yezhou Yang,

The rapid advancement of generative models, facilitating the creation of hyper-realistic images from textual descriptions, has concurrently escalated critical societal concerns such as misinformation. Although providing some mitigation, traditional fingerprinting mechanisms fall short in attributing responsibility for the malicious use of synthetic images. This paper introduces a novel approach to model fingerprinting that assigns responsibility for the generated images, thereby serving as a potential countermeasure to model misuse. Our method modifies generative models based on each user's unique digital fingerprint, imprinting a unique identifier onto the resultant content that can be traced back to the user. This approach, incorporating fine-tuning into Text-to-Image (T2I) tasks using the Stable Diffusion Model, demonstrates near-perfect attribution accuracy with a minimal impact on output quality. Through extensive evaluation, we show that our method outperforms baseline methods with an average improvement of 11% in handling image post-processes. Our method presents a promising and novel avenue for accountable model distribution and responsible use. Our code is available at https://github.com/kylemin/WOUAF.

Translated: 2024-04-27 00:07:23 Published: 2024-04-24

# Kerr-effect-based quantum logical gates in decoherence-free subspace

Kerr-effect-based quantum logical gates in decoherence-free subspace ( http://arxiv.org/abs/2306.05625v2 )

License: see the linked page

Fang-Fang Du, Gang Fan, Xue-Mei Ren,

The decoherence effect caused by the coupling between the system and the environment leads to errors in efficient implementations of two- (or three-) qubit logical gates in quantum information processing. Fortunately, the introduction of a decoherence-free subspace (DFS) can effectively reduce the influence of the decoherence effect. In this paper, we propose schemes for setting up a family of quantum control gates, including controlled-NOT (CNOT), Toffoli, and Fredkin gates, for two or three logical qubits by means of cross-Kerr nonlinearities in DFS. These three logical gates require neither complicated quantum computational circuits nor auxiliary photons (or entangled states). The success probabilities of the three logical gates are approximately 1 when the corresponding classical feed-forward operations are performed based on the different measurement results of the X-homodyne detectors, and their fidelities are robust against photon loss with current technology. The proposed logical gates rely only on simple linear-optics elements, available single-qubit operations, and mature measurement methods, making them feasible and efficient in practical applications.

Translated: 2024-04-27 00:07:23 Published: 2024-04-24

# Adjusted PageRank Centrality を用いたサイバーキー地形識別

Cyber Key Terrain Identification Using Adjusted PageRank Centrality ( http://arxiv.org/abs/2306.11018v2 )

ライセンス: Link先を確認

Lukáš Sadlek, Pavel Čeleda,

(参考訳) サイバー地形には、デバイス、ネットワークサービス、サイバーペルソナ、その他ネットワーク運用に関わるネットワークエンティティが含まれる。ネットワーク運用にとって重要なネットワークエンティティを自動的に識別する手法の設計は困難である。しかし、サイバー防衛がどのサイバー資産に注力すべきかを決定するためには、このような手法が不可欠である。本稿では、機械学習によって調整したPageRank中心性の計算を用いて、サイバーキー地形に属するIPアドレスをネットワーク上の位置に応じて分類する手法を提案する。IPフローで捕捉された送信元ポートと宛先ポートに基づいてPageRankの減衰係数を区別するために、山登り法とランダムウォークのアルゴリズムを用いた。静的なデータサンプルに対する一度限りの学習フェーズにより、完全なネットワークグラフを維持することなく、運用環境のIPフローデータから重要なホストをほぼリアルタイムにストリームベースで分類できる。サイバー防衛演習のデータセットとキャンパスネットワークのデータで本手法を評価した。その結果、調整した中心性計算によるサイバーキー地形の識別は、元のバージョンよりも精度が高いことがわかった。

The cyber terrain contains devices, network services, cyber personas, and other network entities involved in network operations. Designing a method that automatically identifies network entities that are key to network operations is challenging. However, such a method is essential for determining which cyber assets the cyber defense should focus on. In this paper, we propose an approach for the classification of IP addresses belonging to cyber key terrain according to their network position using the PageRank centrality computation adjusted by machine learning. We used hill climbing and random walk algorithms to distinguish PageRank's damping factors based on source and destination ports captured in IP flows. The one-time learning phase on a static data sample allows near-real-time stream-based classification of key hosts from IP flow data in operational conditions without maintaining a complete network graph. We evaluated the approach on a dataset from a cyber defense exercise and on data from the campus network. The results show that cyber key terrain identification using the adjusted computation of centrality is more precise than its original version.
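
As a rough illustration of the baseline that the abstract above adjusts, the following is a minimal sketch of plain PageRank by power iteration with a single damping factor. It is not the authors' method: the per-port damping factors and the learning phase are not reproduced, and the function name, the IP addresses, and the flow list are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Plain PageRank by power iteration over a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical flows (source IP, destination IP): three clients contact one
# server, and the server replies to one of them.
flows = [
    ("10.0.0.2", "10.0.0.1"),
    ("10.0.0.3", "10.0.0.1"),
    ("10.0.0.4", "10.0.0.1"),
    ("10.0.0.1", "10.0.0.2"),
]
scores = pagerank(flows)
top_host = max(scores, key=scores.get)
print(top_host)
```

In this toy graph the host that receives the most flows ends up with the highest score. Lowering `damping` pulls all scores toward the uniform value, which is the knob the abstract describes tuning per port.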

翻訳日:2024-04-27 00:07:23 公開日:2024-04-24

# ニューラルマテリアルのための階層的アーキテクチャ

A Hierarchical Architecture for Neural Materials ( http://arxiv.org/abs/2307.10135v3 )

ライセンス: Link先を確認

Bowen Xue, Shuang Zhao, Henrik Wann Jensen, Zahra Montazeri,

(参考訳) ニューラル反射率モデルは、多くの現実世界の材質について、空間的に変化する外観を異なるスケールで再現できる。残念なことに、NeuMIPのような既存の技術は、強いシャドーイング効果や詳細なスペキュラハイライトを持つ材質の扱いが困難である。本稿では、新しい水準の精度を提供するニューラル外観モデルを提案する。本モデルの中心は、並列に動作するカーネルを用いて複数のスケールで材質の外観を捉え、特殊な畳み込み層によって多段階の特徴を確保する、インセプションベースのコアネットワーク構造である。さらに、入力を周波数空間に符号化し、勾配に基づく損失を導入して、学習フェーズの進行に応じて適応的に適用する。提案手法の有効性を、様々な合成例と実例を用いて実証する。

Neural reflectance models are capable of reproducing the spatially-varying appearance of many real-world materials at different scales. Unfortunately, existing techniques such as NeuMIP have difficulties handling materials with strong shadowing effects or detailed specular highlights. In this paper, we introduce a neural appearance model that offers a new level of accuracy. Central to our model is an inception-based core network structure that captures material appearances at multiple scales using parallel-operating kernels and ensures multi-stage features through specialized convolution layers. Furthermore, we encode the inputs into frequency space, introduce a gradient-based loss, and apply it adaptively according to the progress of the learning phase. We demonstrate the effectiveness of our method using a variety of synthetic and real examples.

翻訳日:2024-04-27 00:07:23 公開日:2024-04-24

# TransFusion: トランスフォーマーを用いた拡散モデルによる長く高忠実な時系列の生成

TransFusion: Generating Long, High Fidelity Time Series using Diffusion Models with Transformers ( http://arxiv.org/abs/2307.12667v2 )

ライセンス: Link先を確認

Md Fahim Sikder, Resmi Ramachandranpillai, Fredrik Heintz,

(参考訳) 高品質で系列長の長い時系列データの生成は、その幅広い応用のために不可欠である。過去には、リカレントおよび畳み込みニューラルネットワークに基づく単独のGenerative Adversarial Networks (GAN) が時系列データの合成に用いられていた。しかし、アーキテクチャ上の制約のため、長い系列の時系列データを生成するには不十分である。さらに、GANはトレーニングの不安定性とモード崩壊の問題でよく知られている。そこで本稿では、高品質で長い系列の時系列データを生成するために、拡散とトランスフォーマーに基づく生成モデルTransFusionを提案する。系列長を384まで伸ばし、高品質な合成データを生成した。また、合成データの品質と予測特性を評価するための2つの評価指標を導入する。TransFusionを様々な視覚的・経験的な指標で評価したところ、TransFusionは従来の最先端手法を大幅に上回った。

The generation of high-quality, long-sequenced time-series data is essential due to its wide range of applications. In the past, standalone Recurrent and Convolutional Neural Network-based Generative Adversarial Networks (GAN) were used to synthesize time-series data. However, they are inadequate for generating long sequences of time-series data due to limitations in the architecture. Furthermore, GANs are well known for their training instability and mode collapse problem. To address this, we propose TransFusion, a diffusion- and transformer-based generative model, to generate high-quality long-sequence time-series data. We have stretched the sequence length to 384 and generated high-quality synthetic data. Also, we introduce two evaluation metrics to evaluate the quality of the synthetic data as well as its predictive characteristics. We evaluate TransFusion with a wide variety of visual and empirical metrics, and TransFusion outperforms the previous state-of-the-art by a significant margin.

翻訳日:2024-04-27 00:07:23 公開日:2024-04-24

# 大規模データ駆動フルウェーブフォームインバージョンに関する実証的研究

An Empirical Study of Large-Scale Data-Driven Full Waveform Inversion ( http://arxiv.org/abs/2307.15388v2 )

ライセンス: Link先を確認

Peng Jin, Yinan Feng, Shihang Feng, Hanchen Wang, Yinpeng Chen, Benjamin Consolvo, Zicheng Liu, Youzuo Lin,

(参考訳) 本稿では、フルウェーブフォームインバージョン(FWI)問題の解決に向けて、ビッグデータがディープラーニングモデルに与える影響を調査する。ビッグデータが多くのタスクにおいてディープラーニングモデルの性能を向上させることはよく知られているが、その有効性はFWIでは検証されていない。このギャップに対処するために、最近公開された大規模で多構造の合成データセット群であるOpenFWIで訓練した場合に、FWIのディープラーニングモデルがどのように振る舞うかを調べる実証研究を行う。特に、合計47万組(470K)の地震データと速度マップを含むOpenFWIの10個の2次元サブセットを組み合わせて、FWIモデルを訓練・評価する。実験の結果、組み合わせたデータセットで訓練すると、個別のデータセットと比べてMAEで平均13.03%、MSEで7.19%、SSIMで1.87%の改善が得られ、leave-one-out一般化テストでは平均28.60%、21.55%、8.22%の改善が得られた。さらに、最適な改善のためにはモデル容量をデータサイズに応じて拡大する必要があることを示し、最大のモデルは最小のモデルに比べて平均20.06%、13.39%、0.72%の改善を達成する。

This paper investigates the impact of big data on deep learning models to help solve the full waveform inversion (FWI) problem. While it is well known that big data can boost the performance of deep learning models in many tasks, its effectiveness has not been validated for FWI. To address this gap, we present an empirical study that investigates how deep learning models in FWI behave when trained on OpenFWI, a collection of large-scale, multi-structural, synthetic datasets published recently. In particular, we train and evaluate the FWI models on a combination of 10 2D subsets in OpenFWI that contain 470K pairs of seismic data and velocity maps in total. Our experiments demonstrate that training on the combined dataset yields an average improvement of 13.03% in MAE, 7.19% in MSE and 1.87% in SSIM compared to each split dataset, and an average improvement of 28.60%, 21.55% and 8.22% in the leave-one-out generalization test. We further demonstrate that model capacity needs to scale in accordance with data size for optimal improvement, where our largest model yields an average improvement of 20.06%, 13.39% and 0.72% compared to the smallest one.

翻訳日:2024-04-27 00:07:23 公開日:2024-04-24

# 初期スクリーニング順序問題

The Initial Screening Order Problem ( http://arxiv.org/abs/2307.15398v3 )

ライセンス: Link先を確認

Jose M. Alvarez, Antonio Mastropietro, Salvatore Ruggieri,

(参考訳) 本研究は、従業員採用や大学入学などの候補者スクリーニングプロセスにおける初期スクリーニング順序(ISO)の役割について検討する。ISOは、スクリーニング担当者が候補者プールを評価する順序を指す。選択された集合の最適性と公平性に、特に人間のスクリーニング担当者の下で影響しうるにもかかわらず、文献ではほとんど見過ごされてきた。我々は2つの問題設定を定義する。スクリーニング担当者が上位$k$人の候補者を選ぶbest-$k$と、十分に良い候補者を先着順に$k$人選ぶgood-$k$である。ISOの影響を調べるため、人間的なスクリーニング担当者を導入し、アルゴリズムによるスクリーニング担当者と比較する。人間的なスクリーニング担当者は、疲労により時間とともに一貫性を失うものとして設計されている。分析の結果、ISOは、特に人間的なスクリーニング担当者の下では、グループレベルの公平性を満たしていても個人の公平性を妨げることがわかった。これは、候補者の評価がISO内での位置に影響される位置バイアスによるものである。アルゴリズムと人間的なスクリーニング担当者の両方について、best-$k$とgood-$k$の問題設定のパラメータを探索する広範なシミュレーション実験を報告する。この研究は、ヨーロッパの大企業と共同で研究した実世界の候補者スクリーニング問題に動機付けられている。

We investigate the role of the initial screening order (ISO) in candidate screening processes, such as employee hiring and academic admissions. The ISO refers to the order in which the screener evaluates the candidate pool. It has been largely overlooked in the literature, despite its potential impact on the optimality and fairness of the chosen set, especially under a human screener. We define two problem formulations: the best-$k$, where the screener selects the $k$ best candidates, and the good-$k$, where the screener selects the $k$ first good-enough candidates. To study the impact of the ISO, we introduce a human-like screener and compare it to its algorithmic counterpart. The human-like screener is conceived to be inconsistent over time due to fatigue. Our analysis shows that the ISO, in particular under a human-like screener, hinders individual fairness despite meeting group-level fairness. This is due to the position bias, where a candidate's evaluation is affected by its position within the ISO. We report extensive simulated experiments exploring the parameters of the best-$k$ and good-$k$ problem formulations both for the algorithmic and human-like screeners. This work is motivated by a real-world candidate screening problem studied in collaboration with a large European company.
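
The two formulations named in the abstract above can be sketched in a few lines. This is an illustrative reading only: the scores, the threshold for "good enough", and both function names are hypothetical, and the fatigue model of the human-like screener is not reproduced.

```python
def best_k(scores_in_order, k):
    """Positions of the k highest-scoring candidates (requires a full pass)."""
    ranked = sorted(
        range(len(scores_in_order)),
        key=lambda i: scores_in_order[i],
        reverse=True,
    )
    return sorted(ranked[:k])


def good_k(scores_in_order, k, threshold):
    """Positions of the first k candidates whose score meets the threshold."""
    selected = []
    for position, score in enumerate(scores_in_order):
        if score >= threshold:
            selected.append(position)
            if len(selected) == k:
                break  # the screener stops early once k candidates are found
    return selected


# Hypothetical candidate scores, listed in the initial screening order.
pool = [0.6, 0.9, 0.7, 0.95, 0.8]

best = best_k(pool, 2)                       # [1, 3]
good = good_k(pool, 2, threshold=0.65)       # [1, 2]
good_reversed = good_k(pool[::-1], 2, threshold=0.65)
print(best, good, [pool[::-1][i] for i in good_reversed])
```

With a consistent screener, best-$k$ returns the same two candidates (scores 0.9 and 0.95) however the pool is ordered, whereas good-$k$ selects scores 0.9 and 0.7 in the original order and 0.8 and 0.95 in the reversed one. That order dependence is what makes the ISO matter in the good-$k$ setting.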

翻訳日:2024-04-27 00:07:23 公開日:2024-04-24

# スワップ演算子の代数構造による量子マックスカットの緩和と厳密解

Relaxations and Exact Solutions to Quantum Max Cut via the Algebraic Structure of Swap Operators ( http://arxiv.org/abs/2307.15661v3 )

ライセンス: Link先を確認

Adam Bene Watts, Anirban Chowdhury, Aidan Epperly, J. William Helton, Igor Klep,

(参考訳) 量子マックスカット(QMC)問題は、局所ハミルトニアン問題に対する近似アルゴリズムを設計するためのテスト問題として登場した。本稿では、QMCの代数構造、特に量子マックスカットハミルトニアンと対称群の表現論との関係を用いてこの問題に取り組む。本論文の第1の主要な貢献は、非可換二乗和(ncSoS)最適化手法を拡張し、量子マックスカットに対する新たな緩和の階層を与えることである。提示する階層は、量子ビットのスワップ演算子の多項式に関する最適化に基づいている。これは、パウリ行列で表される多項式に基づく「標準的な」量子Lasserre階層とは対照的である。この階層の正しさを証明するために、量子ビットのスワップ演算子によって生成される代数の有限表示を利用する。この表示により、スワップ演算子で書かれた多項式を操作・簡約するための計算機代数の技法が利用可能となり、それ自体が独立した興味の対象となりうる。驚くべきことに、この新しい階層のレベル2は、高々8頂点のグラフ上の一様な辺重みを持つすべてのQMCインスタンスで、数値的に厳密である(許容誤差10^(-7)まで)。本論文の第2の主要な貢献は、クリークの符号付き結合として「分解」できるグラフを含む特定のグラフについて、QMCハミルトニアンの最大固有値を(厳密算術で)計算する多項式時間アルゴリズムである。後者の特別な場合が一様な辺重みを持つ完全二部グラフであり、これについてはLiebとMattisの研究から厳密解が知られている。対称群の表現論を用いる本手法は、Lieb-Mattisの結果の一般化と見なすことができる。

The Quantum Max Cut (QMC) problem has emerged as a test-problem for designing approximation algorithms for local Hamiltonian problems. In this paper we attack this problem using the algebraic structure of QMC, in particular the relationship between the quantum max cut Hamiltonian and the representation theory of the symmetric group. The first major contribution of this paper is an extension of non-commutative Sum of Squares (ncSoS) optimization techniques to give a new hierarchy of relaxations to Quantum Max Cut. The hierarchy we present is based on optimizations over polynomials in the qubit swap operators. This is in contrast to the "standard" quantum Lasserre Hierarchy, which is based on polynomials expressed in terms of the Pauli matrices. To prove correctness of this hierarchy, we exploit a finite presentation of the algebra generated by the qubit swap operators. This presentation allows for the use of computer algebraic techniques to manipulate and simplify polynomials written in terms of the swap operators, and may be of independent interest. Surprisingly, we find that level-2 of this new hierarchy is numerically exact (up to tolerance 10^(-7)) on all QMC instances with uniform edge weights on graphs with at most 8 vertices. The second major contribution of this paper is a polynomial-time algorithm that computes (in exact arithmetic) the maximum eigenvalue of the QMC Hamiltonian for certain graphs, including graphs that can be "decomposed" as a signed combination of cliques. A special case of the latter are complete bipartite graphs with uniform edge-weights, for which exact solutions are known from the work of Lieb and Mattis. Our methods, which use representation theory of the symmetric group, can be seen as a generalization of the Lieb-Mattis result.

翻訳日:2024-04-27 00:07:23 公開日:2024-04-24

# ネットワーク型マルチエージェントマルコフ決定過程に対する連続時間分散動的計画法

Continuous-Time Distributed Dynamic Programming for Networked Multi-Agent Markov Decision Processes ( http://arxiv.org/abs/2307.16706v6 )

ライセンス: Link先を確認

Donghwan Lee, Han-Dong Lim, Do Wan Kim,

(参考訳) 本稿の主な目的は、ネットワーク型マルチエージェントマルコフ決定問題(MAMDP)に対する連続時間分散動的計画法(DP)アルゴリズムを検討することである。本研究では、個々のエージェントが自身の報酬にのみアクセスでき、他のエージェントの報酬については知ることができない分散マルチエージェントの枠組みを採用する。さらに、各エージェントは、グラフで表される通信ネットワークを介して、自身のパラメータを隣接するエージェントと共有できる。まず、WangとEliaの分散最適化手法に着想を得た新しい分散DPを導入する。次に、デカップリング(分離)の過程を通じて、別の新しい分散DPを導入する。DPアルゴリズムの収束は、システムと制御の観点から証明される。本研究は、新しい分散時間差分(TD)学習アルゴリズムへの足がかりとなる。

The main goal of this paper is to investigate continuous-time distributed dynamic programming (DP) algorithms for networked multi-agent Markov decision problems (MAMDPs). In our study, we adopt a distributed multi-agent framework where individual agents have access only to their own rewards, lacking insights into the rewards of other agents. Moreover, each agent has the ability to share its parameters with neighboring agents through a communication network, represented by a graph. We first introduce a novel distributed DP, inspired by the distributed optimization method of Wang and Elia. Next, a new distributed DP is introduced through a decoupling process. The convergence of the DP algorithms is proved through systems and control perspectives. The study in this paper sets the stage for new distributed temporal difference learning algorithms.

翻訳日:2024-04-27 00:07:23 公開日:2024-04-24

# 感情的に麻痺しているか、共感的か? EmotionBench を用いた LLM の感情の評価

Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench ( http://arxiv.org/abs/2308.03656v4 )

ライセンス: Link先を確認

Jen-tse Huang, Man Ho Lam, Eric John Li, Shujie Ren, Wenxuan Wang, Wenxiang Jiao, Zhaopeng Tu, Michael R. Lyu,

(参考訳) 大規模言語モデル(LLM)の擬人的能力の評価は、現代の議論においてますます重要になっている。心理学の感情評価理論を活用し、LLMの共感能力、すなわち特定の状況を提示されたときに感情がどのように変化するかを評価することを提案する。注意深く包括的な調査を行い、本研究の中心となる8つの感情を引き起こすのに有効であることが示されている400以上の状況を含むデータセットを収集した。状況を36の因子に分類し、世界中の1,200名以上の被験者を対象に人間による評価を行った。人間による評価結果を基準として、GPT-4やLLaMA-2などの最新版を含み、モデルサイズの違いも考慮した、商用モデルとオープンソースモデルの両方にわたる5つのLLMを評価した。いくつかの不一致はあるものの、LLMは一般に特定の状況に適切に応答できることがわかった。しかしながら、人間の感情的な行動との整合は不十分で、類似した状況間のつながりを確立できない。収集した状況のデータセット、人間による評価結果、およびEmotionBenchと呼ぶテストフレームワークのコードは、https://github.com/CUHK-ARISE/EmotionBench で公開している。人間の感情的な行動との整合性を向上させ、知的アシスタントとしてのLLMの有用性と適用性を高めることに貢献したい。

Evaluating Large Language Models' (LLMs) anthropomorphic capabilities has become increasingly important in contemporary discourse. Utilizing the emotion appraisal theory from psychology, we propose to evaluate the empathy ability of LLMs, i.e., how their feelings change when presented with specific situations. After a careful and comprehensive survey, we collect a dataset containing over 400 situations that have proven effective in eliciting the eight emotions central to our study. Categorizing the situations into 36 factors, we conduct a human evaluation involving more than 1,200 subjects worldwide. With the human evaluation results as references, our evaluation includes five LLMs, covering both commercial and open-source models, including variations in model sizes, featuring the latest iterations, such as GPT-4 and LLaMA-2. We find that, despite several misalignments, LLMs can generally respond appropriately to certain situations. Nevertheless, they fall short in alignment with the emotional behaviors of human beings and cannot establish connections between similar situations. Our collected dataset of situations, the human evaluation results, and the code of our testing framework, dubbed EmotionBench, are made openly accessible via https://github.com/CUHK-ARISE/EmotionBench. We aspire to contribute to the advancement of LLMs regarding better alignment with the emotional behaviors of human beings, thereby enhancing their utility and applicability as intelligent assistants.

翻訳日:2024-04-26 23:57:24 公開日:2024-04-24

# 1+1D $\mathbb{Z}_2$格子ゲージ理論における有限温度での閉じ込め

Confinement in 1+1D $\mathbb{Z}_2$ Lattice Gauge Theories at Finite Temperature ( http://arxiv.org/abs/2308.08592v2 )

ライセンス: Link先を確認

Matjaž Kebrič, Jad C. Halimeh, Ulrich Schollwöck, Fabian Grusdt,

(参考訳) 閉じ込めはゲージ理論の典型的な現象であり、その理解は高エネルギー物理学の最前線にある。ここでは、現在の冷却原子および超伝導量子ビットのプラットフォームで実現可能な、有限温度・有限充填における単純な1次元$\mathbb{Z}_2$格子ゲージ理論の閉じ込めを調べる。行列積状態(MPS)計算を用いて有限温度グリーン関数の減衰を調べ、閉じ込め領域と非閉じ込め領域の間の滑らかなクロスオーバーを明らかにする。さらに、いずれも実験で容易に得られる、フリーデル振動と、MPSからサンプリングしたスナップショットから得た弦長分布を用いて、閉じ込められたメソンが任意の有限温度で明確に定義されたままであることを検証する。この現象論は、厳密対角化によってメソンのクエンチダイナミクスを調べることでさらに裏付けられる。本研究の結果は、実験に即した観点から、有限温度における閉じ込めに新たな光を当てる。

Confinement is a paradigmatic phenomenon of gauge theories, and its understanding lies at the forefront of high-energy physics. Here, we study confinement in a simple one-dimensional $\mathbb{Z}_2$ lattice gauge theory at finite temperature and filling, which is within the reach of current cold-atom and superconducting-qubit platforms. By employing matrix product states (MPS) calculations, we investigate the decay of the finite-temperature Green's function and uncover a smooth crossover between the confined and deconfined regimes. Furthermore, using the Friedel oscillations and string length distributions obtained from snapshots sampled from MPS, both of which are experimentally readily available, we verify that confined mesons remain well-defined at arbitrary finite temperature. This phenomenology is further supported by probing quench dynamics of mesons with exact diagonalization. Our results shed new light on confinement at finite temperature from an experimentally relevant standpoint.

翻訳日:2024-04-26 23:57:24 公開日:2024-04-24

# G3Reg:ガウス楕円体モデルを用いたピラミッドグラフによるグローバルレジストレーション

G3Reg: Pyramid Graph-based Global Registration using Gaussian Ellipsoid Model ( http://arxiv.org/abs/2308.11573v2 )

ライセンス: Link先を確認

Zhijian Qiao, Zehuan Yu, Binqian Jiang, Huan Yin, Shaojie Shen,

(参考訳) 本研究では、LiDAR点群の高速かつ頑健なグローバルレジストレーションのための新しいフレームワークであるG3Regを紹介する。従来の複雑なキーポイントや記述子とは対照的に、生の点群から平面、クラスタ、線(PCL)を含む基本的な幾何プリミティブを抽出し、低レベルのセマンティックセグメントを得る。各セグメントは統一的なガウス楕円体モデル(GEM)として表現され、確率楕円体を用いて真値の中心が一定の確率で包含されることを保証する。これらのGEMを用いて、グローバルレジストレーションのためのピラミッド適合性グラフ(PAGOR)に基づく「信頼せず検証する(distrust-and-verify)」方式を提案する。具体的には、適合性テストのための上界を設け、信頼水準に応じてこれを変化させることでピラミッドグラフを構築する。そして、ピラミッドグラフの各レベルについて複数の最大クリーク(MAC)を解き、対応する変換候補を生成する。検証段階では、幾何プリミティブに基づく点群アライメント品質の正確かつ効率的な指標を採用し、最適な候補を特定する。アルゴリズムの性能は、公開されている3つのデータセットと、独自に収集したマルチセッションデータセットで検証した。パラメータ設定は実験評価を通じて変更していない。その結果、G3Regフレームワークは最先端の手法と比べて優れた頑健性とリアルタイム性能を示した。さらに、個々のGEMおよびPAGORコンポーネントを他のレジストレーションフレームワークに統合して有効性を高められる可能性を示す。コード: https://github.com/HKUST-Aerial-Robotics/G3Reg

This study introduces a novel framework, G3Reg, for fast and robust global registration of LiDAR point clouds. In contrast to conventional complex keypoints and descriptors, we extract fundamental geometric primitives, including planes, clusters, and lines (PCL) from the raw point cloud to obtain low-level semantic segments. Each segment is represented as a unified Gaussian Ellipsoid Model (GEM), using a probability ellipsoid to ensure the ground truth centers are encompassed with a certain degree of probability. Utilizing these GEMs, we present a distrust-and-verify scheme based on a Pyramid Compatibility Graph for Global Registration (PAGOR). Specifically, we establish an upper bound, which can be traversed based on the confidence level for compatibility testing to construct the pyramid graph. Then, we solve multiple maximum cliques (MAC) for each level of the pyramid graph, thus generating the corresponding transformation candidates. In the verification phase, we adopt a precise and efficient metric for point cloud alignment quality, founded on geometric primitives, to identify the optimal candidate. The algorithm's performance is validated on three publicly available datasets and a self-collected multi-session dataset. Parameter settings remained unchanged during the experiment evaluations. The results exhibit superior robustness and real-time performance of the G3Reg framework compared to state-of-the-art methods. Furthermore, we demonstrate the potential for integrating individual GEM and PAGOR components into other registration frameworks to enhance their efficacy. Code: https://github.com/HKUST-Aerial-Robotics/G3Reg

翻訳日:2024-04-26 23:57:24 公開日:2024-04-24

# 弱レーザー励起下におけるダイヤモンド中の窒素空孔中心の光検出磁気共鳴

Optically Detected Magnetic Resonance of Nitrogen-Vacancy Centers in Diamond under Weak Laser Excitation ( http://arxiv.org/abs/2308.13351v2 )

ライセンス: Link先を確認

Yong-Hong Yu, Rui-Zhi Zhang, Yue Xu, Xiu-Qi Chen, Huijie Zheng, Quan Li, Ren-Bao Liu, Xin-Yu Pan, Dmitry Budker, Gang-Qin Liu,

(参考訳) 有望な量子センサーとして、ダイヤモンド中の窒素空孔(NV)中心は、凝縮物質物理学、物質科学、生命科学のフロンティア研究に広く用いられている。実用用途では、レーザー照射の副作用、例えば光毒性や加熱を減らすため、弱いレーザー励起が好ましい。弱い532nmレーザー励起下でのNV中心アンサンブルの光検出磁気共鳴(ODMR)の理論的および実験的研究を併用して報告する。この状態において、ODMRスペクトルの幅と分割はレーザーパワーの増加とともに減少する。この電力依存は、NV--N+対のレーザー誘起電荷中和を考慮したモデルで再現され、局所電界環境が変化する。これらの結果は、感光性アプリケーションにおけるNVベースの量子センシングの理解と設計に重要である。

As promising quantum sensors, nitrogen-vacancy (NV) centers in diamond have been widely used in frontier studies in condensed matter physics, material sciences, and life sciences. In practical applications, weak laser excitation is favorable as it reduces the side effects of laser irradiation, for example, phototoxicity and heating. Here we report a combined theoretical and experimental study of optically detected magnetic resonance (ODMR) of NV-center ensembles under weak 532-nm laser excitation. In this regime, both the width and splitting of ODMR spectra decrease with increasing laser power. This power dependence is reproduced with a model considering laser-induced charge neutralization of NV--N+ pairs, which alters the local electric field environment. These results are important for understanding and designing NV-based quantum sensing in light-sensitive applications.

翻訳日:2024-04-26 23:57:24 公開日:2024-04-24

# POCKET: 特徴選択の観点からの時系列分類向けランダム畳み込みカーネルの刈り込み

POCKET: Pruning Random Convolution Kernels for Time Series Classification from a Feature Selection Perspective ( http://arxiv.org/abs/2309.08499v3 )

ライセンス: Link先を確認

Shaowu Chen, Weize Sun, Lei Huang, Xiaopeng Li, Qingyuan Wang, Deepu John,

(参考訳) 近年、ROCKETとMINIROCKETという2つの競争力のある時系列分類モデルが、訓練コストの低さと精度の高さから大きな注目を集めている。しかし、特徴を網羅的に捉えるには多数のランダムな1次元畳み込みカーネルが必要であり、リソースに制約のあるデバイスには適さない。冗長なカーネルを識別して刈り込むためのヒューリスティックアルゴリズムが開発されているものの、進化的アルゴリズムは本質的に時間がかかるため、効率的な評価が妨げられている。モデルを効果的に刈り込むために、本論文では、後段の分類器における対応する接続を除去することで、特徴選択の観点から冗長なランダムカーネルを取り除く。2つの新しいアルゴリズムを提案する。第1のADMMベースのアルゴリズムは、刈り込みの課題をグループ弾性ネット分類問題として定式化し、第2の中核アルゴリズムであるPOCKETは、問題を連続する2つの段階に分割することで第1のアルゴリズムを大幅に高速化する。POCKETのステージ1では、動的に変化するペナルティを導入してグループレベルの正則化を効率的に実装し、冗長なカーネルを削除する。ステージ2では、残った特徴に要素レベルの正則化を適用して線形分類器を再学習し、性能を高める。多様な時系列データセットによる実験の結果、POCKETは精度を大きく低下させることなく最大60%のカーネルを刈り込み、比較手法よりも11倍高速に動作する。コードは https://github.com/ShaowuChen/POCKET で公開されている。

In recent years, two competitive time series classification models, namely, ROCKET and MINIROCKET, have garnered considerable attention due to their low training cost and high accuracy. However, they require a large number of random 1-D convolutional kernels to comprehensively capture features, which is incompatible with resource-constrained devices. Despite the development of heuristic algorithms designed to recognize and prune redundant kernels, the inherent time-consuming nature of evolutionary algorithms hinders efficient evaluation. To effectively prune models, this paper removes redundant random kernels from a feature selection perspective by eliminating associating connections in the sequential classifier. Two innovative algorithms are proposed, where the first ADMM-based algorithm formulates the pruning challenge as a group elastic net classification problem, and the second core algorithm named POCKET greatly accelerates the first one by bifurcating the problem into two sequential stages. Stage 1 of POCKET introduces dynamically varying penalties to efficiently implement group-level regularization to delete redundant kernels, and Stage 2 employs element-level regularization on the remaining features to refit a linear classifier for better performance. Experimental results on diverse time series datasets show that POCKET prunes up to 60% of kernels without a significant reduction in accuracy and performs 11 times faster than its counterparts. Our code is publicly available at https://github.com/ShaowuChen/POCKET.

翻訳日:2024-04-26 23:57:24 公開日:2024-04-24

# オンライングラフ学習のための不確実性駆動探索手法

Uncertainty-driven Exploration Strategies for Online Grasp Learning ( http://arxiv.org/abs/2309.12038v2 )

ライセンス: Link先を確認

Yitian Shi, Philipp Schillinger, Miroslav Gabriel, Alexander Qualmann, Zohar Feldman, Hanna Ziesche, Ngo Anh Vien,

(参考訳) 既存の把持予測アプローチは主にオフライン学習に基づいており、新しいピッキングシナリオ、すなわち未知またはドメイン外(OOD)の物体や、カメラ・ビンの設定などへのオンライン適応における探索的な把持学習を無視している。本稿では、ロボットによるビンピッキングにおける把持予測のオンライン学習のための、不確実性に基づくアプローチを提案する。具体的には、効果的な探索戦略を備えたオンライン学習アルゴリズムは、未知の環境設定への適応性能を大幅に向上できる。この目的のために、まずオンライン把持学習をRL問題として定式化し、把持報酬の予測と把持姿勢の両方を適応できるようにする。ベイズ的な不確実性定量化と分布アンサンブルに基づく様々な不確実性推定手法を提案する。難易度の異なる実世界のビンピッキングシーンで評価を行う。ビン内の物体は、半透明または完全な透明性、不規則または湾曲した表面といった、様々な困難な物理的・知覚的特性を持つ。実験の結果、素朴な探索戦略のみを取り入れた従来のオンライン学習手法と比較して、把持性能の顕著な向上が示された。ビデオ: https://youtu.be/fPKOrjC2QrU

Existing grasp prediction approaches are mostly based on offline learning while ignoring exploratory grasp learning during online adaptation to new picking scenarios, i.e., objects that are unseen or out-of-domain (OOD), camera and bin settings, etc. In this paper, we present an uncertainty-based approach for online learning of grasp predictions for robotic bin picking. Specifically, the online learning algorithm with an effective exploration strategy can significantly improve its adaptation performance to unseen environment settings. To this end, we first propose to formulate online grasp learning as an RL problem that will allow us to adapt both grasp reward prediction and grasp poses. We propose various uncertainty estimation schemes based on Bayesian uncertainty quantification and distributional ensembles. We carry out evaluations on real-world bin picking scenes of varying difficulty. The objects in the bin have various challenging physical and perceptual characteristics that can be characterized by semi- or total transparency, and irregular or curved surfaces. The results of our experiments demonstrate a notable improvement of grasp performance in comparison to conventional online learning methods which incorporate only naive exploration strategies. Video: https://youtu.be/fPKOrjC2QrU

翻訳日:2024-04-26 23:57:24 公開日:2024-04-24

# RoleLLM: 大規模言語モデルのロールプレイ能力のベンチマーク、引き出し、強化

RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models ( http://arxiv.org/abs/2310.00746v2 )

ライセンス: Link先を確認

Zekun Moore Wang, Zhongyuan Peng, Haoran Que, Jiaheng Liu, Wangchunshu Zhou, Yuhan Wu, Hongcheng Guo, Ruitong Gan, Zehao Ni, Jian Yang, Man Zhang, Zhaoxiang Zhang, Wanli Ouyang, Ke Xu, Stephen W. Huang, Jie Fu, Junran Peng,

(参考訳) 大規模言語モデル(LLM)の出現は、ロールプレイングのような複雑なタスクへの道を開いた。ロールプレイングは、モデルが様々なキャラクターを模倣できるようにすることでユーザとの対話を豊かにする。しかし、最先端のLLMがクローズドソースであることと、汎用的な訓練を受けていることが、ロールプレイングの最適化を制限している。本稿では、LLMのロールプレイング能力をベンチマークし、引き出し、強化するフレームワークであるRoleLLMを紹介する。RoleLLMは、(1) 100の役割に対するロールプロファイル構築、(2) 役割固有の知識抽出のためのコンテキストベース指示生成(Context-Instruct)、(3) 発話スタイル模倣のためのGPTを用いたロールプロンプティング(RoleGPT)、(4) オープンソースモデルの微調整と役割のカスタマイズのためのロール条件付き指示チューニング(RoCIT)の4段階から構成される。Context-InstructとRoleGPTにより、168,093サンプルからなる、ロールプレイングのための最初の体系的できめ細かいキャラクターレベルのベンチマークデータセットであるRoleBenchを作成する。さらに、RoleBench上のRoCITによりRoleLLaMA(英語)とRoleGLM(中国語)が得られ、ロールプレイング能力が大幅に向上し、RoleGPT(GPT-4を使用)に匹敵する結果さえ達成する。

The advent of Large Language Models (LLMs) has paved the way for complex tasks such as role-playing, which enhances user interactions by enabling models to imitate various characters. However, the closed-source nature of state-of-the-art LLMs and their general-purpose training limit role-playing optimization. In this paper, we introduce RoleLLM, a framework to benchmark, elicit, and enhance role-playing abilities in LLMs. RoleLLM comprises four stages: (1) Role Profile Construction for 100 roles; (2) Context-Based Instruction Generation (Context-Instruct) for role-specific knowledge extraction; (3) Role Prompting using GPT (RoleGPT) for speaking style imitation; and (4) Role-Conditioned Instruction Tuning (RoCIT) for fine-tuning open-source models along with role customization. By Context-Instruct and RoleGPT, we create RoleBench, the first systematic and fine-grained character-level benchmark dataset for role-playing with 168,093 samples. Moreover, RoCIT on RoleBench yields RoleLLaMA (English) and RoleGLM (Chinese), significantly enhancing role-playing abilities and even achieving comparable results with RoleGPT (using GPT-4).

翻訳日:2024-04-26 23:57:24 公開日:2024-04-24

# SEED: 大規模言語モデルによるドメイン特化データキュレーション

SEED: Domain-Specific Data Curation With Large Language Models ( http://arxiv.org/abs/2310.00749v3 )

ライセンス: Link先を確認

Zui Chen, Lei Cao, Sam Madden, Tim Kraska, Zeyuan Shang, Ju Fan, Nan Tang, Zihui Gu, Chunwei Liu, Michael Cafarella,

(参考訳) 分析のためにデータを準備するデータキュレーションタスクは、データを実用的な洞察に変える上で非常に重要である。しかし、異なるドメインのアプリケーションには多様な要件があるため、汎用の既製ツールでは一般に不十分である。その結果、データサイエンティストは、ドメイン固有のコードを書いたり、十分な数の注釈付きサンプルで機械学習モデルを訓練したりするなど、データセットとタスクの両方に合わせたドメイン固有のソリューションを開発しなければならないことが多い。このプロセスは非常に難しく、時間がかかることで知られている。本稿では、大規模言語モデル(LLM)を通じてドメイン固有のデータキュレーションソリューションを自動生成する、LLM-as-compilerアプローチのSEEDを提案する。ユーザがタスク、入力データ、期待される出力を記述すると、SEEDコンパイラは、LLMへの問い合わせと、ベクトルベースのキャッシュ、LLMが生成したコード、LLMが注釈を付けたデータで訓練した小さなモデルといった、よりコスト効率のよい代替手段とを組み合わせたハイブリッドパイプラインを生成する。SEEDは、4つのLLM支援モジュールから自動的に選択し、対象のタスクに最も適したハイブリッド実行パイプラインを構成するオプティマイザを備えている。この新しい革新的なアプローチを検証するために、$5$種類のデータキュレーションタスクにまたがる$9$個のデータセットで実験を行った。すべてのデータレコードにLLMを使用するソリューションと比較して、SEEDはLLM呼び出しの回数を大幅に削減しつつ、最先端または同等のfew-shot性能を達成する。

Data curation tasks that prepare data for analytics are critical for turning data into actionable insights. However, due to the diverse requirements of applications in different domains, generic off-the-shelf tools are typically insufficient. As a result, data scientists often have to develop domain-specific solutions tailored to both the dataset and the task, e.g. writing domain-specific code or training machine learning models on a sufficient number of annotated examples. This process is notoriously difficult and time-consuming. We present SEED, an LLM-as-compiler approach that automatically generates domain-specific data curation solutions via Large Language Models (LLMs). Once the user describes a task, input data, and expected output, the SEED compiler produces a hybrid pipeline that combines LLM querying with more cost-effective alternatives, such as vector-based caching, LLM-generated code, and small models trained on LLM-annotated data. SEED features an optimizer that automatically selects from the four LLM-assisted modules and forms a hybrid execution pipeline that best fits the task at hand. To validate this new, revolutionary approach, we conducted experiments on $9$ datasets spanning over $5$ data curation tasks. In comparison to solutions that use the LLM on every data record, SEED achieves state-of-the-art or comparable few-shot performance, while significantly reducing the number of LLM calls.

翻訳日:2024-04-26 23:47:37 公開日:2024-04-24

# リアルタイムでジェネリックなマルチタスクで一度だけ見る

You Only Look at Once for Real-time and Generic Multi-Task ( http://arxiv.org/abs/2310.01641v4 )

ライセンス: Link先を確認

Jiayuan Wang, Q. M. Jonathan Wu, Ning Zhang,

(参考訳) 高精度、軽量、リアルタイム応答性は、自動運転を実現する上で不可欠な3つの要件である。本研究では、物体検出、走行可能領域のセグメンテーション、車線のセグメンテーションの各タスクを同時に扱うように設計した、適応的でリアルタイムかつ軽量なマルチタスクモデルA-YOLOMを導入する。具体的には、統一的で簡素化されたセグメンテーション構造を持つエンドツーエンドのマルチタスクモデルを開発する。セグメンテーションタスクにおいてネックとバックボーンの間の特徴を適応的に連結する学習可能なパラメータを導入し、すべてのセグメンテーションタスクに同じ損失関数を用いる。これにより、カスタマイズが不要になり、モデルの汎化能力が高まる。また、一連の畳み込み層のみで構成されたセグメンテーションヘッドを導入し、パラメータ数と推論時間を削減する。BDD100kデータセットで、特に可視化結果において競争力のある結果を達成した。性能は、物体検出のmAP50が81.1%、走行可能領域セグメンテーションのmIoUが91.0%、車線セグメンテーションのIoUが28.8%であった。さらに、実環境でのモデルの性能を評価するために実世界のシナリオを導入し、そこで競合手法を大きく上回った。これは、本モデルが競争力のある性能を示すだけでなく、既存のマルチタスクモデルよりも柔軟で高速であることを示している。ソースコードと事前学習済みモデルは https://github.com/JiayuanWang-JW/YOLOv8-multi-task で公開されている。

High precision, lightweight, and real-time responsiveness are three essential requirements for implementing autonomous driving. In this study, we incorporate A-YOLOM, an adaptive, real-time, and lightweight multi-task model designed to concurrently address object detection, drivable area segmentation, and lane line segmentation tasks. Specifically, we develop an end-to-end multi-task model with a unified and streamlined segmentation structure. We introduce a learnable parameter that adaptively concatenates features between necks and backbone in segmentation tasks, using the same loss function for all segmentation tasks. This eliminates the need for customizations and enhances the model's generalization capabilities. We also introduce a segmentation head composed only of a series of convolutional layers, which reduces the number of parameters and inference time. We achieve competitive results on the BDD100k dataset, particularly in visualization outcomes. The performance results show a mAP50 of 81.1% for object detection, a mIoU of 91.0% for drivable area segmentation, and an IoU of 28.8% for lane line segmentation. Additionally, we introduce real-world scenarios to evaluate our model's performance in a real scene, which significantly outperforms competitors. This demonstrates that our model not only exhibits competitive performance but is also more flexible and faster than existing multi-task models. The source codes and pre-trained models are released at https://github.com/JiayuanWang-JW/YOLOv8-multi-task

翻訳日:2024-04-26 23:47:37 公開日:2024-04-24

# スクラッチから訓練してはならない: 長系列モデルの公正な比較にはデータ駆動の事前知識が必要

Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors ( http://arxiv.org/abs/2310.02980v3 )

ライセンス: Link先を確認

Ido Amos, Jonathan Berant, Ankit Gupta,

(参考訳) シーケンス間の長距離依存関係のモデリングは、機械学習における長年の目標であり、状態空間モデルのようなアーキテクチャに導かれ、長いシーケンス上でトランスフォーマーを劇的に上回っている。しかし、これらの印象的な経験的利益は、モデルがランダムに初期化され、入力シーケンスからターゲットラベルを予測するために訓練されたベンチマーク(例えばLong Range Arena)において、大きく証明されている。本稿では, ランダム初期化がアーキテクチャの違いの過大な過大評価につながることを示すとともに, $\textit{only the downstream task data}$を用いることで, トランスフォーマーと状態空間モデル(SSM)の極めて小さなギャップを生じることを示す。従来の作業とは対照的に,Long Range ArenaにおけるS4の性能に適合するバニラトランスフォーマーが発見され,PathX-256タスクにおけるSSMの最高の報告結果を20絶対点改善する。次に, 事前学習により得られたデータ駆動初期化の存在下で, 従来提案されていたSSMに対する構造化パラメータ化の有用性を解析し, ほとんど冗長となることを示す。我々の研究は、教師付きタスク上で異なるアーキテクチャを評価する際に、事前学習によるデータ駆動の事前学習が信頼性の高い性能推定に不可欠であることを示し、効率的に行うことができることを示した。

Modeling long-range dependencies across sequences is a longstanding goal in machine learning and has led to architectures, such as state space models, that dramatically outperform Transformers on long sequences. However, these impressive empirical gains have been by and large demonstrated on benchmarks (e.g. Long Range Arena), where models are randomly initialized and trained to predict a target label from an input sequence. In this work, we show that random initialization leads to gross overestimation of the differences between architectures and that pretraining with standard denoising objectives, using $\textit{only the downstream task data}$, leads to dramatic gains across multiple architectures and to very small gaps between Transformers and state space models (SSMs). In stark contrast to prior works, we find vanilla Transformers to match the performance of S4 on Long Range Arena when properly pretrained, and we improve the best reported results of SSMs on the PathX-256 task by 20 absolute points. Subsequently, we analyze the utility of previously-proposed structured parameterizations for SSMs and show they become mostly redundant in the presence of data-driven initialization obtained through pretraining. Our work shows that, when evaluating different architectures on supervised tasks, incorporation of data-driven priors via pretraining is essential for reliable performance estimation, and can be done efficiently.
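The idea of pretraining on the downstream inputs alone can be sketched as follows. This is an illustrative masked-denoising data builder, not the authors' pipeline: the `MASK` symbol, the masking probability, and the function names are all assumptions, and masking is only one of several standard denoising objectives.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol, not from the paper


def make_denoising_example(tokens, mask_prob=0.15, rng=None):
    """Corrupt one sequence; return (corrupted_input, reconstruction_targets)."""
    rng = rng or random.Random()
    corrupted, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            targets.append(tok)   # the model must reconstruct this token
        else:
            corrupted.append(tok)
            targets.append(None)  # position is ignored by the loss
    return corrupted, targets


def build_pretraining_set(downstream_inputs, mask_prob=0.15, seed=0):
    """Build denoising pairs from the downstream inputs only (labels unused)."""
    rng = random.Random(seed)
    return [make_denoising_example(seq, mask_prob, rng) for seq in downstream_inputs]
```

The point of the sketch is that no external corpus is involved: the same sequences later used for supervised training supply the self-supervised signal.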

翻訳日:2024-04-26 23:47:37 公開日:2024-04-24

# PST:プログラムスケッチベースのチューニングによる量的トレーディングの改善

PST: Improving Quantitative Trading via Program Sketch-based Tuning ( http://arxiv.org/abs/2310.05551v2 )

ライセンス: Link先を確認

Zhiming Li, Junzhe Jiang, Yushi Cao, Aixin Cui, Bozhi Wu, Bo Li, Yang Liu, Dongning Sun,

(参考訳) 深層強化学習(DRL)は、有能な人的知識を伴わずに十分なパフォーマンスを達成し、量的金融に革命をもたらした。その成果にもかかわらず、現在最先端のDRLモデルは依然として市場の動向を特定するのに効果がなく、良い取引機会を逃したり、市場崩壊に遭遇した場合に大きな損失を被ることになる。この制限に対処するためには、市場の動向に関する人間の専門知識を組み込むことが自然な考えである。しかし、そのような知識は抽象的で定量化が難しい。本稿では,プログラム・スケッチ・ベース・チューニング(PST)と呼ばれる,普遍的なニューロシンボリック・チューニング・フレームワークを提案する。特に、PSTは、新しい記号プログラムスケッチを使用して、市場動向に関する抽象的人間専門家の知識を埋め込むことを最初に提案する。そして、プログラムスケッチを利用して、現在の市場動向に応じて訓練されたDRLポリシーをチューニングする。最後に,このニューラルシンボリックフレームワークを最適化するために,新しいハイブリッド最適化手法を提案する。 2つの一般的な量的トレーディングタスクに対する広範囲な評価は、PSTが非常に軽量でありながら、従来の最先端DRL戦略の性能を大幅に向上させることができることを示している。

Deep reinforcement learning (DRL) has revolutionized quantitative finance by achieving decent performance without significant human expert knowledge. Despite its achievements, we observe that the current state-of-the-art DRL models are still ineffective in identifying the market trend, causing them to miss good trading opportunities or suffer from large drawdowns when encountering market crashes. To tackle this limitation, a natural idea is to embed human expert knowledge regarding the market trend. However, such knowledge is abstract and hard to quantify. In this paper, we propose a universal neuro-symbolic tuning framework, called program sketch-based tuning (PST). In particular, PST first proposes using a novel symbolic program sketch to embed the abstract human expert knowledge of market trends. Then we utilize the program sketch to tune a trained DRL policy according to the market trend of the moment. Finally, in order to optimize this neural-symbolic framework, we propose a novel hybrid optimization method. Extensive evaluations on two popular quantitative trading tasks demonstrate that PST can significantly enhance the performance of previous state-of-the-art DRL strategies while being extremely lightweight.

翻訳日:2024-04-26 23:47:37 公開日:2024-04-24

# 大規模言語モデルはルールを学習できる

Large Language Models can Learn Rules ( http://arxiv.org/abs/2310.07064v2 )

ライセンス: Link先を確認

Zhaocheng Zhu, Yuan Xue, Xinyun Chen, Denny Zhou, Jian Tang, Dale Schuurmans, Hanjun Dai,

(参考訳) いくつかの例と中間ステップで促されると、大きな言語モデル(LLM)は、様々な推論タスクにおいて印象的なパフォーマンスを示している。しかし、LLMにおける暗黙の知識に依存しているプロンプト手法は、暗黙の知識が誤りであったり、そのタスクと矛盾している場合、しばしば誤った答えを生じる。この問題に対処するために,LLMによる推論のためのルールライブラリを学習するフレームワークであるHtT(Hypotheses-to-Theories)を提案する。 HtTは、帰納段階と演繹段階の2つの段階を含む。帰納段階では、LLMはまず一連のトレーニング例に基づいてルールを生成し検証するように要求される。十分な頻度で出現し、かつ正答につながるルールが収集され、ルールライブラリを形成する。演繹段階では、LLMは学習したルールライブラリを使用して、テスト問題に答えるための推論を行うように促される。リレーショナル推論、数値推論、概念学習に関する実験は、HtTが既存のプロンプト法を改良し、絶対精度が10～30%向上したことを示している。学習されたルールは、異なるモデルや同じ問題の異なる形式にも転送可能である。

When prompted with a few examples and intermediate steps, large language models (LLMs) have demonstrated impressive performance in various reasoning tasks. However, prompting methods that rely on implicit knowledge in an LLM often generate incorrect answers when the implicit knowledge is wrong or inconsistent with the task. To tackle this problem, we present Hypotheses-to-Theories (HtT), a framework that learns a rule library for reasoning with LLMs. HtT contains two stages, an induction stage and a deduction stage. In the induction stage, an LLM is first asked to generate and verify rules over a set of training examples. Rules that appear and lead to correct answers sufficiently often are collected to form a rule library. In the deduction stage, the LLM is then prompted to employ the learned rule library to perform reasoning to answer test questions. Experiments on relational reasoning, numerical reasoning and concept learning problems show that HtT improves existing prompting methods, with an absolute gain of 10-30% in accuracy. The learned rules are also transferable to different models and to different forms of the same problem.
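The filtering step of the induction stage (keep rules that appear, and lead to correct answers, sufficiently often) can be sketched as below. This is a hedged illustration rather than the authors' implementation: the record format, the thresholds `min_count` and `min_accuracy`, and the function name are assumptions, and the LLM calls that would produce the records are omitted.

```python
from collections import defaultdict


def build_rule_library(records, min_count=2, min_accuracy=0.7):
    """Collect rules that occur often enough and usually lead to correct answers.

    records: iterable of (rules_used, answer_correct) pairs, one per training
    example, where rules_used lists the rules the model stated while answering.
    """
    count = defaultdict(int)    # how often each rule appeared
    correct = defaultdict(int)  # how often it appeared in a correct answer
    for rules_used, is_correct in records:
        for rule in set(rules_used):  # count each rule once per example
            count[rule] += 1
            if is_correct:
                correct[rule] += 1
    return sorted(
        rule
        for rule in count
        if count[rule] >= min_count and correct[rule] / count[rule] >= min_accuracy
    )
```

In the deduction stage, the resulting library would be placed in the prompt so that the model looks rules up instead of relying on implicit knowledge.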

翻訳日:2024-04-26 23:47:37 公開日:2024-04-24

# 不定量子ダイナミクスによる光回転計測におけるナノラディアンスケール精度

Nanoradian-Scale Precision in Light Rotation Measurement via Indefinite Quantum Dynamics ( http://arxiv.org/abs/2310.07125v3 )

ライセンス: Link先を確認

Binke Xia, Jingzheng Huang, Hongjing Li, Zhongyuan Luo, Guihua Zeng,

(参考訳) 光ビームの操作と計測は光学科学や応用にとって重要な要素である。特に、光ビーム回転測定における超高精度の達成は、長年にわたる課題である。絡み合った光子のような量子プローブを利用する代わりに、量子パラメータ推定のパラメータ化プロセスに「不定時間方向」と呼ばれる量子戦略を組み込むことで、この問題に対処する。パラメータ化力学のこの量子特性を活用することで、ビームプロファイルの極小角回転を測定するためのOAM資源の利用を最大化することができる。特に、ナノラジアンスケールの光回転測定精度が実験で達成され、これは我々の知る限りこれまでで最高の精度である。さらに、このスキームは光子によって提供される様々な操作可能な資源のために、様々な光学応用において有望である。

The manipulation and metrology of light beams are pivotal for optical science and applications. In particular, achieving ultra-high precision in the measurement of light beam rotations has been a long-standing challenge. Instead of utilizing quantum probes like entangled photons, we address this challenge by incorporating a quantum strategy called "indefinite time direction" into the parameterizing process of quantum parameter estimation. Leveraging this quantum property of the parameterizing dynamics allows us to maximize the utilization of OAM resources for measuring ultra-small angular rotations of the beam profile. Notably, a nanoradian-scale precision of light rotation measurement is achieved in the experiment, which is, to the best of our knowledge, the highest precision reported to date. Furthermore, this scheme holds promise in various optical applications due to the diverse range of manipulable resources offered by photons.

翻訳日:2024-04-26 23:47:37 公開日:2024-04-24

# ルート付きベル試験による長距離量子相関の証明

Certifying long-range quantum correlations through routed Bell tests ( http://arxiv.org/abs/2310.07484v4 )

ライセンス: Link先を確認

Edwin Peter Lobo, Jef Pauwels, Stefano Pironio,

(参考訳) 透過チャネルの損失は距離とともに増加するが、量子非局所性のフォトニクスの実証とその応用に大きな障害となる。最近、Chaturvedi, Viola, and Pawlowski (CVP) [arXiv:2211.14231] は、量子非局所性を証明できる範囲を拡張することを目的として、標準ベルの実験のバリエーションを導入した。と呼ばれるこれらの実験では、Bobは量子粒子を2つの可能な経路に沿ってルートし、それを2つの異なる場所(近距離と遠距離)で測定することができる。ショートパスにおけるベルの違反は、ロングパスにおける非局所的相関を検出するために必要な条件を弱めるべきである。実際、CVPはルーティングされたベル実験において、検出効率が任意に低い場合でも、リモートデバイスの結果を古典的に規定できないような量子相関が存在することを示した。本稿では,CVPが考慮した相関関係を古典的に規定することはできないが,遠隔デバイスへの量子システムの伝送を必要としないことを示す。これにより、ルート付きベル実験において「短距離」および「長距離」量子相関の概念が定義される。これらの相関は、非可換多項式最適化のための標準半定値プログラム階層によって特徴づけられることを示す。次に、短距離量子相関を除外できる条件について検討する。我々は、遠方装置の臨界検出効率に基本的な低バウンドがあることを指摘し、経路付きベル実験では、任意に大きな距離で長距離量子非局所性を証明できないことを示唆している。しかし,経路付きベル実験により検出効率の閾値が低下することが判明した。しかし、改善はCVPの分析によって示唆されるものよりも大幅に小さい。

Losses in the transmission channel, which increase with distance, pose a major obstacle to photonics demonstrations of quantum nonlocality and its applications. Recently, Chaturvedi, Viola, and Pawlowski (CVP) [arXiv:2211.14231] introduced a variation of standard Bell experiments with the goal of extending the range over which quantum nonlocality can be demonstrated. In these experiments, which we call 'routed Bell experiments', Bob can route his quantum particle along two possible paths and measure it at two distinct locations - one near and another far from the source. The idea is that a Bell violation in the short-path should weaken the conditions required to detect nonlocal correlations in the long-path. Indeed, CVP showed that there are quantum correlations in routed Bell experiments such that the outcomes of the remote device cannot be classically predetermined, even when its detection efficiency is arbitrarily low. In this paper, we show that the correlations considered by CVP, though they cannot be classically predetermined, do not require the transmission of quantum systems to the remote device. This leads us to define the concept of 'short-range' and 'long-range' quantum correlations in routed Bell experiments. We show that these correlations can be characterized through standard semidefinite programming hierarchies for non-commutative polynomial optimization. We then explore the conditions under which short-range quantum correlations can be ruled out. We point out that there exist fundamental lower-bounds on the critical detection efficiency of the distant device, implying that routed Bell experiments cannot demonstrate long-range quantum nonlocality at arbitrarily large distances. However, we do find that routed Bell experiments allow for reducing the detection efficiency threshold. The improvements, though, are significantly smaller than those suggested by CVP's analysis.

翻訳日:2024-04-26 23:47:37 公開日:2024-04-24

# 視覚的注意プロンプトによる予測と学習

Visual Attention Prompted Prediction and Learning ( http://arxiv.org/abs/2310.08420v3 )

ライセンス: Link先を確認

Yifei Zhang, Siyi Gu, Bo Pan, Guangji Bai, Meikang Qiu, Xiaofeng Yang, Liang Zhao,

(参考訳) 視覚的説明(注意)誘導学習はラベルだけでなく、モデル推論プロセスのガイドにも用いられる。視覚的注意誘導学習は有望な結果を示しているが、準備に時間を要する多くの説明アノテーションが必要である。しかし、現実の多くの状況では、モデルの再訓練なしに視覚的注意を喚起することが望まれる。例えば、医療画像上でAI支援がん分類を行う場合、利用者(例えば臨床医)は、どの領域が必須で、どの領域が除外されているかという視覚的な注意喚起をAIモデルに提供することができる。その有望な目標にもかかわらず、視覚的な注意を喚起する予測を達成することは、いくつかの大きな課題を提示する。 1) モデル推論プロセスに視覚的プロンプトを効果的に組み込むには,どうすればよいのか? 2) 視覚的なプロンプトを欠いたサンプルをどう扱うべきか? 3)視覚的プロンプトが不完全である場合,モデルのパフォーマンスにどのような影響があるのか? 本稿では,視覚的プロンプトを利用してモデルの推論過程を制御し,注意喚起による予測と学習のための新しい枠組みを提案する。非プロンプト状況における性能向上と、それに伴うシナリオの調整を目的として、非プロンプトモデルとプロンプトモデルの両方に対する協調学習手法を提案し、同様のパラメータとアクティベーションの共有を保証した。さらに、視覚的プロンプトが入力画像全体を包含していない場合、革新的な注意喚起プロンプト改善法が開発されている。これらの手法は、モデルの説明と整合性を維持しながら不完全なプロンプトを補間する。 4つのデータセットに対する大規模な実験により,提案手法の有効性が実証された。

Visual explanation (attention)-guided learning uses not only labels but also explanations to guide the model's reasoning process. While visual attention-guided learning has shown promising results, it requires a large number of explanation annotations that are time-consuming to prepare. However, in many real-world situations, it is usually desired to prompt the model with visual attention without model retraining. For example, when doing AI-assisted cancer classification on a medical image, users (e.g., clinicians) can provide the AI model with a visual attention prompt indicating which areas are indispensable and which are precluded. Despite its promising objectives, achieving visual attention-prompted prediction presents several major challenges: 1) How can the visual prompt be effectively integrated into the model's reasoning process? 2) How should the model handle samples that lack visual prompts? 3) What is the impact on the model's performance when a visual prompt is imperfect? This paper introduces a novel framework for attention-prompted prediction and learning, utilizing visual prompts to steer the model's reasoning process. To improve performance in non-prompted situations and align it with prompted scenarios, we propose a co-training approach for both non-prompted and prompted models, ensuring they share similar parameters and activations. Additionally, for instances where the visual prompt does not encompass the entire input image, we have developed innovative attention prompt refinement methods. These methods interpolate the incomplete prompts while maintaining alignment with the model's explanations. Extensive experiments on four datasets demonstrate the effectiveness of our proposed framework in enhancing predictions for samples both with and without prompts.

翻訳日:2024-04-26 23:47:37 公開日:2024-04-24

# 脳年齢予測へのボクセルレベルのアプローチ:局所的脳老化評価法

A voxel-level approach to brain age prediction: A method to assess regional brain aging ( http://arxiv.org/abs/2310.11385v2 )

ライセンス: Link先を確認

Neha Gianchandani, Mahsa Dibaji, Johanna Ospel, Fernando Vega, Mariana Bento, M. Ethan MacDonald, Roberto Souza,

(参考訳) 脳の老化は局所的な現象であり、機械学習の手法を用いて脳年齢予測研究の領域内では比較的解明されていない。ボクセルレベルの予測は、局所的な脳年齢推定を提供し、局所的な老化過程に関する詳細な洞察を与えることができる。これは,健常者と疾患者における老化軌跡の相違を理解するために不可欠である。本研究では,T1強調磁気共鳴画像からのボクセルレベルの脳年齢予測のために,深層学習に基づくマルチタスクモデルを提案する。提案モデルは文献に存在するモデルより優れており、健康な人口と病気の人口の両方に適用した場合に貴重な臨床所見が得られる。脳の既知の解剖学的領域の老化軌跡を理解するために、ボクセルレベルの脳年齢予測を用いて局所分析を行い、認知症やより具体的にはアルツハイマー病のような基礎疾患の患者と比較して、健常者の地域老化軌跡に相違があることが示されている。私たちのコードはhttps://github.com/nehagianchandani/Voxel-level-brain-age-predictionで公開されています。

Brain aging is a regional phenomenon, a facet that remains relatively under-explored within the realm of brain age prediction research using machine learning methods. Voxel-level predictions can provide localized brain age estimates that can provide granular insights into the regional aging processes. This is essential to understand the differences in aging trajectories in healthy versus diseased subjects. In this work, a deep learning-based multitask model is proposed for voxel-level brain age prediction from T1-weighted magnetic resonance images. The proposed model outperforms the models existing in the literature and yields valuable clinical insights when applied to both healthy and diseased populations. Regional analysis is performed on the voxel-level brain age predictions to understand aging trajectories of known anatomical regions in the brain and show that there exist disparities in regional aging trajectories of healthy subjects compared to ones with underlying neurological disorders such as Dementia and more specifically, Alzheimer's disease. Our code is available at https://github.com/nehagianchandani/Voxel-level-brain-age-prediction.

翻訳日:2024-04-26 23:47:37 公開日:2024-04-24

# ニューラルパーセプション機構を持つ部分観測可能な確率ゲーム

Partially Observable Stochastic Games with Neural Perception Mechanisms ( http://arxiv.org/abs/2310.11566v2 )

ライセンス: Link先を確認

Rui Yan, Gabriel Santos, Gethin Norman, David Parker, Marta Kwiatkowska,

(参考訳) 確率ゲームは、不確実性の下でのマルチエージェントシーケンシャル決定のためのよく確立されたモデルである。しかし、現実的な応用では、エージェントは環境の部分的な観察性しか持たないことが多い。さらに、エージェントは、継続的データに基づいてトレーニングされたニューラルネットワークのようなデータ駆動アプローチを使用して、環境をますます知覚する。本稿では,ニューラルシンボリックな部分可観測確率ゲーム(NS-POSG)のモデルを提案する。我々は、離散的データ駆動観察と、完全インフォームドエージェントを用いた部分インフォームドエージェントによる一方的な設定に焦点を当てた。本稿では,片側NS-POSGを近似解として,片側NS-HSVIと呼ばれる新しい手法を提案する。ニューラルネットワークプレイメージ分析を用いて,有限多面体表現と粒子に基づく信念表現を構築し,歩行者車と追従回避シナリオの分析にその実践的適用性を示す。

Stochastic games are a well established model for multi-agent sequential decision making under uncertainty. In practical applications, though, agents often have only partial observability of their environment. Furthermore, agents increasingly perceive their environment using data-driven approaches such as neural networks trained on continuous data. We propose the model of neuro-symbolic partially-observable stochastic games (NS-POSGs), a variant of continuous-space concurrent stochastic games that explicitly incorporates neural perception mechanisms. We focus on a one-sided setting with a partially-informed agent using discrete, data-driven observations and another, fully-informed agent. We present a new method, called one-sided NS-HSVI, for approximate solution of one-sided NS-POSGs, which exploits the piecewise constant structure of the model. Using neural network pre-image analysis to construct finite polyhedral representations and particle-based representations for beliefs, we implement our approach and illustrate its practical applicability to the analysis of pedestrian-vehicle and pursuit-evasion scenarios.

翻訳日:2024-04-26 23:47:37 公開日:2024-04-24

# PopDescentでスケジュールを廃止する

Scrap Your Schedules with PopDescent ( http://arxiv.org/abs/2310.14671v2 )

ライセンス: Link先を確認

Abhinav Pomalapally, Bassel El Mabsout, Renato Mansuco,

(参考訳) 現代の機械学習のワークロードでは、多くのハイパーパラメータ探索アルゴリズムが頻繁に使われ、学習率や正則化率などのハイパフォーマンスなハイパーパラメータ値を効率的に発見する。その結果、トレーニング中にハイパーパラメータを調整する能力を活用し、損失性能を向上させるために、さまざまなパラメータスケジュールが設計されてきた。しかし、これらのスケジュールは、探索すべき新しいハイパーパラメータを導入し、トレーニング中のモデルの現在の損失値を考慮しない。これらの課題に対処するため,我々は,ミーム的な集団ベース探索を用いた進捗対応ハイパーパラメータチューニング技術であるPopulation Descent (PopDescent)を提案する。 PopDescentは進化的および局所的な探索プロセスを統合することで、そのパフォーマンスに基づいてトレーニング中のハイパーパラメータオプションを積極的に探索する。標準的な機械学習ビジョンタスクの試行では、PopDescentは既存の探索手法よりも高速に収束し、スケジュールの利用を考慮しても、テストロス値が最大18%低いモデルパラメータを見つける。さらに,PopDescentが初期訓練パラメータに対して頑健であることを示す。これはハイパーパラメータ探索手法にとって重要な特性である。

In contemporary machine learning workloads, numerous hyper-parameter search algorithms are frequently utilized to efficiently discover high-performing hyper-parameter values, such as learning and regularization rates. As a result, a range of parameter schedules have been designed to leverage the capability of adjusting hyper-parameters during training to enhance loss performance. These schedules, however, introduce new hyper-parameters to be searched and do not account for the current loss values of the models being trained. To address these issues, we propose Population Descent (PopDescent), a progress-aware hyper-parameter tuning technique that employs a memetic, population-based search. By merging evolutionary and local search processes, PopDescent proactively explores hyper-parameter options during training based on their performance. Our trials on standard machine learning vision tasks show that PopDescent converges faster than existing search methods, finding model parameters with test-loss values up to 18% lower, even when considering the use of schedules. Moreover, we highlight the robustness of PopDescent to its initial training parameters, a crucial characteristic for hyper-parameter search techniques.
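A memetic, population-based search combines local optimization with evolutionary selection and mutation. The toy sketch below illustrates that general pattern on a one-dimensional quadratic; it is not the PopDescent algorithm itself, and the objective, population size, mutation range, and learning-rate bounds are all assumptions chosen for illustration.

```python
import random


def loss(x):
    return x * x  # toy objective standing in for a validation loss


def grad(x):
    return 2.0 * x


def population_search(pop_size=6, generations=20, local_steps=5, seed=0):
    rng = random.Random(seed)
    # Each individual carries model parameters (x) and its own hyper-parameter (lr).
    population = [
        {"x": rng.uniform(-5.0, 5.0), "lr": rng.uniform(0.01, 0.4)}
        for _ in range(pop_size)
    ]
    history = [min(loss(ind["x"]) for ind in population)]
    for _generation in range(generations):
        # Local search: a few gradient steps with the individual's own learning rate.
        for ind in population:
            for _step in range(local_steps):
                ind["x"] -= ind["lr"] * grad(ind["x"])
        # Evolutionary step: replace the worse half with copies of the better
        # half whose learning rate has been randomly perturbed.
        population.sort(key=lambda ind: loss(ind["x"]))
        half = pop_size // 2
        for i in range(half, pop_size):
            parent = population[i - half]
            mutated_lr = parent["lr"] * rng.uniform(0.5, 1.5)
            population[i] = {
                "x": parent["x"],
                "lr": min(max(mutated_lr, 0.001), 0.45),  # keep updates stable
            }
        history.append(min(loss(ind["x"]) for ind in population))
    return population[0], history
```

Because selection acts on the current loss, the learning rate is adjusted during training without a hand-written schedule, which is the motivation stated in the abstract.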

翻訳日:2024-04-26 23:47:37 公開日:2024-04-24

# フリーフォームフロー:任意のアーキテクチャを正規化フローにする

Free-form Flows: Make Any Architecture a Normalizing Flow ( http://arxiv.org/abs/2310.16624v2 )

ライセンス: Link先を確認

Felix Draxler, Peter Sorrenson, Lea Zimmermann, Armand Rousselot, Ullrich Köthe,

(参考訳) 正規化フローは、可能性を直接最大化する生成モデルである。従来, 正規化フローの設計は解析的可逆性の必要性に大きく制約されていた。この制約を,変数式の変化の勾配を効率的に推定する訓練手法によって克服する。これにより、任意の次元保存ニューラルネットワークが、最大限のトレーニングを通じて生成モデルとして機能することが可能になる。当社のアプローチでは,手元にあるタスクに対して,帰納的バイアスを正確に調整することに重点を置くことが可能です。具体的には、$E(n)$-equivariantネットワークを用いた分子生成ベンチマークにおいて優れた結果を得る。さらに,本手法は,市販のResNetアーキテクチャを採用しながら,逆問題ベンチマークにおいて競合する。

Normalizing Flows are generative models that directly maximize the likelihood. Previously, the design of normalizing flows was largely constrained by the need for analytical invertibility. We overcome this constraint by a training procedure that uses an efficient estimator for the gradient of the change of variables formula. This enables any dimension-preserving neural network to serve as a generative model through maximum likelihood training. Our approach allows placing the emphasis on tailoring inductive biases precisely to the task at hand. Specifically, we achieve excellent results in molecule generation benchmarks utilizing $E(n)$-equivariant networks. Moreover, our method is competitive in an inverse problem benchmark, while employing off-the-shelf ResNet architectures.
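For reference, the change of variables formula mentioned in the abstract can be written in its standard form as follows. The notation ($f_\theta$ for the learned map from data to latent space, $p_Z$ for the latent density) is an assumption for illustration; the paper's gradient estimator for the Jacobian term is not reproduced here.

```latex
\log p_X(x) = \log p_Z\bigl(f_\theta(x)\bigr)
            + \log \left| \det J_{f_\theta}(x) \right|,
\qquad
J_{f_\theta}(x) = \frac{\partial f_\theta(x)}{\partial x}
```

Maximum likelihood training maximizes the left-hand side over the data. The Jacobian determinant is the term that is costly for unconstrained architectures, which is why earlier flow designs restricted $f_\theta$ to analytically invertible layers.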

翻訳日:2024-04-26 23:37:50 公開日:2024-04-24

# UWFormer:半教師付きマルチスケールトランスフォーマーによる水中画像強調

UWFormer: Underwater Image Enhancement via a Semi-Supervised Multi-Scale Transformer ( http://arxiv.org/abs/2310.20210v4 )

ライセンス: Link先を確認

Weiwen Chen, Yingtie Lei, Shenghong Luo, Ziyang Zhou, Mingxian Li, Chi-Man Pun,

(参考訳) 水中画像は、光、水、物体の複雑な複雑な相互作用のため、品質が悪く、色バランスが歪んだり、コントラストが低かったりすることが多い。従来の水中強化技術には大きな貢献があったが、さらなる改善を求める問題がいくつかある。 (i)現在のディープラーニング手法は、マルチスケールの強化を欠いた畳み込みニューラルネットワーク(CNN)に依存しており、グローバルな知覚場も制限されている。 (II)実世界の水中データセットの不足は大きな課題となり、合成画像ペアの利用が過度に適合する可能性がある。上記の問題に対処するため, 半教師付き学習による複数周波数画像の強調を行うUWFormerと呼ばれるマルチスケールトランスフォーマーネットワークを導入し, 低周波数強調のための非線形周波数認識アテンション機構とマルチスケールフュージョンフィードフォワードネットワークを提案する。さらに,水中における半教師付き訓練戦略を導入し,疑似ラベルを生成するためのサブアキュースパーセプティカルロス関数を提案する。完全参照型および非参照型水中ベンチマークを用いた実験により,本手法は,量および視覚的品質の両面で最先端の手法より優れていることが示された。

Underwater images often exhibit poor quality, distorted color balance and low contrast due to the complex and intricate interplay of light, water, and objects. Despite the significant contributions of previous underwater enhancement techniques, several problems demand further improvement: (i) Current deep learning methods rely on Convolutional Neural Networks (CNNs) that lack multi-scale enhancement and whose global perception field is limited. (ii) The scarcity of paired real-world underwater datasets poses a significant challenge, and the utilization of synthetic image pairs could lead to overfitting. To address the aforementioned problems, this paper introduces a Multi-scale Transformer-based Network called UWFormer for enhancing images at multiple frequencies via semi-supervised learning, in which we propose a Nonlinear Frequency-aware Attention mechanism and a Multi-Scale Fusion Feed-forward Network for low-frequency enhancement. In addition, we introduce a special underwater semi-supervised training strategy, where we propose a Subaqueous Perceptual Loss function to generate reliable pseudo labels. Experiments using full-reference and non-reference underwater benchmarks demonstrate that our method outperforms state-of-the-art methods both quantitatively and in visual quality.

翻訳日:2024-04-26 23:37:50 公開日:2024-04-24

# 協調フィルタリングのためのグラフ信号拡散モデル

Graph Signal Diffusion Model for Collaborative Filtering ( http://arxiv.org/abs/2311.08744v3 )

ライセンス: Link先を確認

Yunqin Zhu, Chao Wang, Qi Zhang, Hui Xiong,

(参考訳) 協調フィルタリングはレコメンデータシステムにおいて重要な手法である。近年は、ユーザフィードバックデータに対する条件付き生成タスクとして捉えられるようになっており、新たに開発された拡散モデルが大きな可能性を示している。しかし、既存の拡散モデルの研究では、暗黙のフィードバックをモデル化するための効果的な解決策が欠如している。特に、標準的な等方拡散過程はアイテム間の相関を見落としており、相互作用空間のグラフ構造と整合しない。一方、ガウスノイズはユーザのインタラクションベクトル内のパーソナライズされた情報を破壊し、その再構築が困難になる。本稿では,標準拡散モデルを改変し,協調フィルタリングのためのグラフ信号拡散モデル(GiffCF)を提案する。ユーザ・アイテム相互作用の相関分布をよりよく表現するために、アイテム・アイテム類似性グラフ上の熱方程式を用いた一般化拡散過程を定義する。我々のフォワードプロセスは、高度なグラフフィルタ群によって相互作用信号を平滑化し、グラフ隣接性を推薦のための有益な事前知識として導入する。我々のリバースプロセスは、ノイズのない方法で潜在信号を反復的に洗練・シャープ化する。更新はユーザの履歴に条件付けられ、慎重に設計された2段階のデノイザから計算され、高品質な再構築をもたらす。最後に、広範な実験を通じて、GiffCFが拡散モデルとグラフ信号処理の両方の利点を効果的に活用し、3つのベンチマークデータセットで最先端の性能を実現することを示す。

Collaborative filtering is a critical technique in recommender systems. It has been increasingly viewed as a conditional generative task for user feedback data, where the newly developed diffusion model shows great potential. However, existing studies on diffusion models lack effective solutions for modeling implicit feedback. In particular, the standard isotropic diffusion process overlooks correlation between items and is therefore misaligned with the graphical structure of the interaction space. Meanwhile, Gaussian noise destroys personalized information in a user's interaction vector, causing difficulty in its reconstruction. In this paper, we adapt the standard diffusion model and propose a novel Graph Signal Diffusion Model for Collaborative Filtering (named GiffCF). To better represent the correlated distribution of user-item interactions, we define a generalized diffusion process using the heat equation on the item-item similarity graph. Our forward process smooths interaction signals with an advanced family of graph filters, introducing the graph adjacency as beneficial prior knowledge for recommendation. Our reverse process iteratively refines and sharpens latent signals in a noise-free manner, where the updates are conditioned on the user's history and computed from a carefully designed two-stage denoiser, leading to high-quality reconstruction. Finally, through extensive experiments, we show that GiffCF effectively leverages the advantages of both diffusion models and graph signal processing, and achieves state-of-the-art performance on three benchmark datasets.
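To make the heat-equation idea concrete, the sketch below smooths one user's interaction vector over a small item-item graph by integrating dx/dt = -Lx with explicit Euler steps. It shows only the plain graph heat equation; the paper's forward process uses a more advanced filter family, and the adjacency matrix, step count, and function names here are assumptions.

```python
def laplacian(adj):
    """Combinatorial graph Laplacian L = D - A for a symmetric adjacency matrix."""
    n = len(adj)
    return [
        [(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
        for i in range(n)
    ]


def heat_smooth(signal, adj, t=1.0, steps=100):
    """Integrate dx/dt = -L x with explicit Euler steps and return x(t)."""
    lap = laplacian(adj)
    x = [float(v) for v in signal]
    dt = t / steps  # must stay small relative to the largest eigenvalue of L
    n = len(x)
    for _ in range(steps):
        lap_x = [sum(lap[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [xi - dt * li for xi, li in zip(x, lap_x)]
    return x
```

On a path graph of three items, a user who interacted only with the first item ends up with a signal that decays along the graph, so neighbouring items receive some mass while the total is preserved.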

翻訳日:2024-04-26 23:37:50 公開日:2024-04-24

# CARE:臨床文献から実験的発見を抽出する

CARE: Extracting Experimental Findings From Clinical Literature ( http://arxiv.org/abs/2311.09736v2 )

ライセンス: Link先を確認

Aakanksha Naik, Bailey Kuehl, Erin Bransom, Doug Downey, Tom Hope,

(参考訳) 文学からきめ細かい実験結果を抽出することは、科学的応用に劇的な有用性をもたらすことができる。それまでの作業では、この問題の限られた側面のためのアノテーションスキーマとデータセットが開発され、現実の複雑さとニュアンスをキャプチャできなかった。バイオメディシンに焦点を当てたこの研究は、臨床所見を抽出するタスクのための新しいIEデータセットであるCAREを提示する。本研究では,非連続的なエンティティスパン,ネスト関係,可変arity n-ary関係,数値結果など,現在のIEシステムにおいて困難な現象を統一する,エンティティと属性間のn-ary関係として微細な発見をキャプチャーする新しいアノテーションスキーマを開発した。臨床治験と症例報告の2つの資料から,700件の抄録を広範囲に収集した。また,コンピュータ科学・材料科学分野へのスキーマの一般化可能性を示す。私たちはCAREで最新のIEシステムをベンチマークし、GPT4のようなモデルでさえ苦労していることを示した。文献を抽出・集約する研究を進めるため、我々の資源を解放する。

Extracting fine-grained experimental findings from literature can provide dramatic utility for scientific applications. Prior work has developed annotation schemas and datasets for limited aspects of this problem, failing to capture the real-world complexity and nuance required. Focusing on biomedicine, this work presents CARE -- a new IE dataset for the task of extracting clinical findings. We develop a new annotation schema capturing fine-grained findings as n-ary relations between entities and attributes, which unifies phenomena challenging for current IE systems such as discontinuous entity spans, nested relations, variable arity n-ary relations and numeric results in a single schema. We collect extensive annotations for 700 abstracts from two sources: clinical trials and case reports. We also demonstrate the generalizability of our schema to the computer science and materials science domains. We benchmark state-of-the-art IE systems on CARE, showing that even models such as GPT4 struggle. We release our resources to advance research on extracting and aggregating literature findings.

翻訳日:2024-04-26 23:37:50 公開日:2024-04-24

# 局所平衡仮定を超えた非平衡温度

The non-equilibrium temperature beyond local equilibrium assumption ( http://arxiv.org/abs/2311.11028v2 )

ライセンス: Link先を確認

Zheng-Chuan Wang,

(参考訳) 本論文では, 環境リザーバ中を輸送される荷電粒子に対して,温度依存ブラソフ方程式に基づく非平衡温度を提案する。ブラソフ方程式に基づいて新しい減衰力と減衰緩和時間の逆数が導出され,これらは輸送粒子に働く外力と緩和時間に明らかな影響を及ぼす。輸送粒子の非平衡温度は, 平衡から外れた分布関数によって定義され,リザーバの平衡温度とは異なる。輸送粒子全体が非平衡状態にあるため、輸送粒子とリザーバの間には熱伝達が存在する。最後に、外部電界下での1次元荷電粒子輸送を例に、ここで定義した非平衡温度と減衰力を数値的に示す。

In this manuscript, we propose a non-equilibrium temperature based on a temperature-dependent Vlasov equation for charged particles transported through an environmental reservoir. A new damping force and an inverse damping relaxation time are derived from the Vlasov equation; both have an obvious influence on the external force and the relaxation time of the transported particles. The non-equilibrium temperature of the transported particles is defined by their out-of-equilibrium distribution function and differs from the equilibrium temperature of the reservoir. Heat transfer exists between the transported particles and the reservoir because the transported particles as a whole are in a non-equilibrium state. Finally, we illustrate these results with an example of one-dimensional charged-particle transport under an external electric field, for which the non-equilibrium temperature and damping force are shown numerically.

翻訳日:2024-04-26 23:37:50 公開日:2024-04-24

# 大規模言語モデルを用いた音声視覚ゼロショット学習の強化

Boosting Audio-visual Zero-shot Learning with Large Language Models ( http://arxiv.org/abs/2311.12268v2 )

ライセンス: Link先を確認

Haoxing Chen, Yaohui Li, Yan Hong, Zizheng Huang, Zhuoer Xu, Zhangxuan Gu, Jun Lan, Huijia Zhu, Weiqiang Wang,

(参考訳) 音声視覚ゼロショット学習は、ペア化された音声視覚シーケンスに基づいて、目に見えないクラスを認識することを目的としている。近年の手法は,クラス名に整合したマルチモーダルな特徴の学習に重点を置いており,未知のカテゴリに対する一般化能力の向上を図っている。しかし、これらのアプローチはクラス名の不明瞭なイベント概念を無視し、必然的に訓練目的の難しい複雑なネットワーク構造を導入する可能性がある。本稿では,外部知識ベースを活用することで,新たなイベントコンテンツをより効果的に学習する上で有効なKDA(KnowleDge-Augmented audio-visual learning)という,単純かつ効率的なフレームワークを提案する。具体的には、まず、大型言語モデル(LLM)に含まれる知識を利用して、イベントクラスの音声・視覚的特徴を識別する重要な記述文を生成することを提案する。さらに,類似した事象を識別し,未確認クラスへの一般化能力の向上を図るために,知識対応型適応マージン損失を提案する。広汎な実験結果から,提案したKDAは,一般的な3つの音声視覚ゼロショット学習データセットにおいて,最先端の手法より優れていることが示された。コードは https://github.com/chenhaoxing/KDA で公開予定である。

Audio-visual zero-shot learning aims to recognize unseen classes based on paired audio-visual sequences. Recent methods mainly focus on learning multi-modal features aligned with class names to enhance the generalization ability to unseen categories. However, these approaches ignore the obscure event concepts in class names and may inevitably introduce complex network structures with difficult training objectives. In this paper, we introduce a straightforward yet efficient framework called KnowleDge-Augmented audio-visual learning (KDA), which aids the model in more effectively learning novel event content by leveraging an external knowledge base. Specifically, we first propose to utilize the knowledge contained in large language models (LLMs) to generate numerous descriptive sentences that include important distinguishing audio-visual features of event classes, which helps to better understand unseen categories. Furthermore, we propose a knowledge-aware adaptive margin loss to help distinguish similar events, further improving the generalization ability towards unseen classes. Extensive experimental results demonstrate that our proposed KDA can outperform state-of-the-art methods on three popular audio-visual zero-shot learning datasets. Our code will be available at https://github.com/chenhaoxing/KDA.

翻訳日:2024-04-26 23:27:32 公開日:2024-04-24

# グラフと大規模言語モデルの融合に関するサーベイ: 進展と今後の方向性

A Survey of Graph Meets Large Language Model: Progress and Future Directions ( http://arxiv.org/abs/2311.12399v4 )

ライセンス: Link先を確認

Yuhan Li, Zhixun Li, Peisong Wang, Jia Li, Xiangguo Sun, Hong Cheng, Jeffrey Xu Yu,

(参考訳) グラフは、引用ネットワーク、ソーシャルネットワーク、生物学的データといった現実世界のアプリケーションにおける複雑な関係を表現し分析する上で重要な役割を果たしている。近年,様々な領域で大きな成功を収めたLarge Language Models (LLM) もグラフ関連タスクに活用され,従来のグラフニューラルネットワーク(GNN)ベースの手法を超越し,最先端のパフォーマンスを実現している。本稿ではまず,LLMとグラフを統合する既存手法の総合的なレビューと分析を行う。まず,グラフ関連タスクにおいてLLMが果たす役割(エンハンサー,予測,アライメント)に基づいて,既存の手法を3つのカテゴリに分類する手法を提案する。次に、分類学の3つのカテゴリに沿って、代表的手法を体系的に調査する。最後に,既存の研究の残余の限界について論じ,今後の研究に期待できる道のりを強調した。関連する論文は要約され、一貫して更新される。 https://github.com/yhLeeee/Awesome-LLMs-in-Graph-tasks。

Graphs play a significant role in representing and analyzing complex relationships in real-world applications such as citation networks, social networks, and biological data. Recently, Large Language Models (LLMs), which have achieved tremendous success in various domains, have also been leveraged in graph-related tasks to surpass traditional Graph Neural Network (GNN)-based methods and yield state-of-the-art performance. In this survey, we present a comprehensive review and analysis of existing methods that integrate LLMs with graphs. First, we propose a new taxonomy, which organizes existing methods into three categories based on the role (i.e., enhancer, predictor, and alignment component) played by LLMs in graph-related tasks. Then we systematically survey the representative methods along the three categories of the taxonomy. Finally, we discuss the remaining limitations of existing studies and highlight promising avenues for future research. The relevant papers are summarized and will be consistently updated at: https://github.com/yhLeeee/Awesome-LLMs-in-Graph-tasks.

翻訳日:2024-04-26 23:27:32 公開日:2024-04-24

# 一次元モット絶縁体における電荷とエネルギー輸送の動的分離

Dynamical separation of charge and energy transport in one-dimensional Mott insulators ( http://arxiv.org/abs/2311.16234v2 )

ライセンス: Link先を確認

Frederik Møller, Botond C. Nagy, Márton Kormos, Gábor Takács,

(参考訳) 一次元モット絶縁体はサイン・ゴルドンモデルを用いて記述できるが、これは可積分な場の量子論で、閉じ込められた超低温原子による最近の実現を含む、いくつかの1次元のギャップを持つ凝縮物質系の低エネルギー有効記述を提供する。一般化流体力学の理論を用いて、このモデルがトポロジカル電荷とエネルギーの輸送の分離を示すことを示した。準粒子力学の解析により、分離の背後にあるメカニズムは、トポロジカルに荷電したキンク/アンチキンクの間の反射散乱であることが明らかになった。これらの散乱現象の影響は、強い結合と低温において最も顕著であり、そこでは準粒子の分布が反射散乱振幅と比較して狭い。この効果により、トポロジカル電荷に対する特徴的な形状の「矢じり」型の光円錐が生じる。

One-dimensional Mott insulators can be described using the sine-Gordon model, an integrable quantum field theory that provides the low-energy effective description of several one-dimensional gapped condensed matter systems, including recent realizations with trapped ultra-cold atoms. Employing the theory of Generalized Hydrodynamics, we demonstrate that this model exhibits separation of the transport of topological charge vs. energy. Analysis of the quasiparticle dynamics reveals that the mechanism behind the separation is the reflective scattering between topologically charged kinks/antikinks. The effect of these scattering events is most pronounced at strong coupling and low temperatures, where the distribution of quasiparticles is narrow compared to the reflective scattering amplitude. This effect results in a distinctively shaped "arrowhead" light cone for the topological charge.

翻訳日:2024-04-26 23:27:32 公開日:2024-04-24

# ヴァルシュニ・ヘルマンポテンシャルのエネルギー固有値の決定

Determination of the Energy Eigenvalues of the Varshni-Hellmann Potential ( http://arxiv.org/abs/2401.11151v4 )

ライセンス: Link先を確認

N. Tazimi,

(参考訳) 本稿では,バルシュニ・ヘルマンポテンシャルの有界状態問題を有用手法を用いて解く。本手法では, アンザッツ法によるヴァルシュニ・ヘルマンポテンシャルに対するシュロディンガー方程式の有界解を求める。エネルギー固有値と対応する固有関数を得る。また、地中におけるエネルギースペクトルの挙動と、2つの身体系の励起状態について図式的に示す。この結果と正確な数値との類似性は,本手法の効率性を示すものである。

In this paper, we solve the bound-state problem for the Varshni-Hellmann potential via a useful technique. In our technique, we obtain the bound-state solution of the Schrödinger equation for the Varshni-Hellmann potential via the ansatz method. We obtain the energy eigenvalues and the corresponding eigenfunctions. Also, the behavior of the energy spectra for both the ground and the excited states of two-body systems is illustrated graphically. The similarity of our results to the accurate numerical values is indicative of the efficiency of our technique.

翻訳日:2024-04-26 23:27:32 公開日:2024-04-24

# 血圧データから学ぶ:7500万人の患者を対象にしたデモグラフィー

Learning from Two Decades of Blood Pressure Data: Demography-Specific Patterns Across 75 Million Patient Encounters ( http://arxiv.org/abs/2402.01598v3 )

ライセンス: Link先を確認

Seyedeh Somayyeh Mousavi, Yuting Guo, Abeed Sarker, Reza Sameni,

(参考訳) 高血圧は有病率が増加している世界的な健康上の問題であり、血圧(BP)動態の効果的なモニタリングと分析の必要性が強調されている。米国ジョージア州のエモリー・ヘルスケアで2000年から2022年の間に収集された、人口統計学的に多様な2,054,462人の患者の75,636,128件のBPデータを分析した。性別,年齢,人種・民族にわたる収縮期BP (SBP) と拡張期BP (DBP) の2変量変化について,集団全体の統計を比較検討した。分析の結果,男性は女性よりもBP値が高く,年齢に伴うBPプロファイルも異なっていた。特に、平均SBPは年齢とともに一貫して上昇し、平均DBPは40代でピークとなる。調査された民族集団の中で、黒人はBPがわずかに高く、標準偏差が大きい。また,SBPとDBPの間に集団レベルで有意な相関がみられた。これらの結果は, 臨床診断における人口統計特異的BP分析の重要性を強調し, パーソナライズされた人口統計特異的な医療介入の開発に有用な知見を提供する。

Hypertension is a global health concern with an increasing prevalence, underscoring the need for effective monitoring and analysis of blood pressure (BP) dynamics. We analyzed a substantial BP dataset comprising 75,636,128 records from 2,054,462 unique patients collected between 2000 and 2022 at Emory Healthcare in Georgia, USA, representing a demographically diverse population. We examined and compared population-wide statistics of bivariate changes in systolic BP (SBP) and diastolic BP (DBP) across sex, age, and race/ethnicity. The analysis revealed that males have higher BP levels than females and exhibit a distinct BP profile with age. Notably, average SBP consistently rises with age, whereas average DBP peaks in the forties age group. Among the ethnic groups studied, Blacks have marginally higher BPs and a greater standard deviation. We also discovered a significant correlation between SBP and DBP at the population level, a phenomenon not previously researched. These results emphasize the importance of demography-specific BP analysis for clinical diagnosis and provide valuable insights for developing personalized, demography-specific healthcare interventions.
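
The population-level SBP/DBP correlation mentioned above can be illustrated with a small sketch. The abstract does not say which correlation measure the authors used, so the Pearson coefficient here is an assumption, and the readings are made-up values for illustration only, not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-deviations and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up readings in mmHg, for illustration only (not from the Emory dataset).
sbp = [118.0, 125.0, 132.0, 140.0, 121.0]
dbp = [76.0, 80.0, 85.0, 88.0, 79.0]
r = pearson_r(sbp, dbp)
```

A value of `r` close to 1 would indicate that higher systolic readings tend to accompany higher diastolic readings in the sample.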

翻訳日:2024-04-26 23:27:32 公開日:2024-04-24

# 量子反転:コヒーレント量子吸収器の一般理論

Quantum reversal: a general theory of coherent quantum absorbers ( http://arxiv.org/abs/2402.02502v2 )

ライセンス: Link先を確認

Mankei Tsang,

(参考訳) コヒーレント量子吸収器(コヒーレント量子吸収器、英: coherent quantum absorber)は、他の系によって放出される光子を吸収し、その系との絡み合いを保ちながら、様々な意味を持つ。この研究は、いわゆる逆条件を2つの系に対して提案することで、この概念を一般化する。逆条件は、ペッツ回収マップとクラウス演算子を含む簡潔な公式に厳密に沸騰させ、既存のコヒーレント吸収体の処理を合理化すると共に一般化する。

The fascinating concept of a coherent quantum absorber, which can absorb any photon emitted by another system while maintaining entanglement with that system, has found diverse implications in open quantum system theory and quantum metrology. This work generalizes the concept by proposing the so-called reversal conditions for the two systems, in which a "reverser" coherently reverses any effect of the other system on a field. The reversal conditions are rigorously boiled down to concise formulas involving the Petz recovery map and Kraus operators, thereby generalizing as well as streamlining the existing treatments of coherent absorbers.

翻訳日:2024-04-26 23:27:32 公開日:2024-04-24

# 2次元ライドバーグ原子配列におけるアモルファス量子磁石

Amorphous quantum magnets in a two-dimensional Rydberg atom array ( http://arxiv.org/abs/2402.02852v2 )

ライセンス: Link先を確認

Sergi Julià-Farré, Joseph Vovrosh, Alexandre Dauphin,

(参考訳) アモルファス固体(アモルファス固体、すなわち、明確に定義された短距離特性を持つが、長距離秩序を持たない系)は、凝縮物質において重要な研究トピックである。結晶構造は結晶構造と異なることが知られているが、アモルファス材料における創発的な集団的挙動に関する多くのオープンな疑問がある。これは、数値シミュレーションが極めて困難である量子状態において特にそうである。本稿では,アナログ量子シミュレータを用いたアモルファス量子マグネットの探索を提案する。そこで我々はまず,IsingモデルのRydbergシミュレータに適したアモルファス量子磁石を生成するアルゴリズムを提案する。その後、半古典的手法を用いて、モデルの物理に関する予備的な知見を得る。特に強磁性相互作用では平均磁場位相図を計算し、線形スピン波理論を用いて励起の局在特性と動的構造因子を研究する。反強磁性相互作用では、アモルファス磁石は擬似アニールにより複雑な古典的エネルギー景観を示す。最後に,プログラム可能なツイーザアレイにおけるRydberg原子に基づく実験的な提案を概説し,古典的にシミュレートが難しい状態におけるアモルファス量子マグネットの研究への道を開く。

Amorphous solids, i.e., systems which feature well-defined short-range properties but lack long-range order, constitute an important research topic in condensed matter. While their microscopic structure is known to differ from that of their crystalline counterparts, there are still many open questions concerning the emergent collective behavior in amorphous materials. This is particularly the case in the quantum regime, where numerical simulations are extremely challenging. In this article, we instead propose to explore amorphous quantum magnets with an analog quantum simulator. To this end, we first present an algorithm to generate amorphous quantum magnets, suitable for Rydberg simulators of the Ising model. Subsequently, we use semiclassical approaches to gain preliminary insight into the physics of the model. In particular, for ferromagnetic interactions, we calculate mean-field phase diagrams, and use linear spin-wave theory to study localization properties and dynamical structure factors of the excitations. For antiferromagnetic interactions, we show by means of simulated annealing that amorphous magnets exhibit a complex classical energy landscape. Finally, we outline an experimental proposal based on Rydberg atoms in programmable tweezer arrays, thus opening the road towards the study of amorphous quantum magnets in regimes difficult to simulate classically.

翻訳日:2024-04-26 23:27:32 公開日:2024-04-24

# 量子コンピューティング:ビジョンと課題

Quantum Computing: Vision and Challenges ( http://arxiv.org/abs/2403.02240v2 )

ライセンス: Link先を確認

Sukhpal Singh Gill, Oktay Cetinkaya, Stefano Marrone, Daniel Claudino, David Haunschild, Leon Schlote, Huaming Wu, Carlo Ottaviani, Xiaoyuan Liu, Sree Pragna Machupalli, Kamalpreet Kaur, Priyansh Arora, Ji Liu, Salman Shamshad, Ahmed Farouk, Houbing Herbert Song, Steve Uhlig, Kotagiri Ramamohanarao,

(参考訳) 量子コンピューティングの最近の発展は、絡み合い、重ね合わせ、その他の量子基本概念を用いており、従来の計算よりも大幅に処理上の利点をもたらす。これらの量子的特徴は、従来の計算手法では解けない多くの複雑な問題を解くのに役立つ。これらの問題には、量子力学、ロジスティクス、化学ベースの進歩、薬物設計、統計科学、持続可能なエネルギー、銀行、信頼性のある通信、量子化学工学などが含まれる。ここ数年、量子ソフトウェアやアルゴリズムの作成、量子ハードウェアの研究が目覚ましい進歩を遂げており、量子コンピュータの実現に向けて大きく進歩している。この分野に関する総合的な文献研究を行うことで、現状を把握し、量子コンピューティング業界で働く研究コミュニティからかなりの注意を必要とする未解決の問題を発見できるだろう。本稿では,量子コンピューティングの理解を深めるために,この領域における現在の研究に基づく基礎とビジョンについて考察する。本稿では,量子コンピュータハードウェアの最先端開発と量子暗号,量子ソフトウェア,高スケール性量子コンピュータの今後の進歩について論じる。量子技術の研究と開発における多くの潜在的な課題とエキサイティングな新しいトレンドが、より広範な議論のためにこの論文で強調されている。

The recent development of quantum computing, which uses entanglement, superposition, and other quantum fundamental concepts, can provide substantial processing advantages over traditional computing. These quantum features help solve many complex problems that cannot be solved with conventional computing methods. These problems include modeling quantum mechanics, logistics, chemical-based advances, drug design, statistical science, sustainable energy, banking, reliable communication, and quantum chemical engineering. The last few years have witnessed remarkable advancements in quantum software and algorithm creation and quantum hardware research, which has significantly advanced the prospect of realizing quantum computers. It would be helpful to have comprehensive literature research on this area to grasp the current status and find outstanding problems that require considerable attention from the research community working in the quantum computing industry. To better understand quantum computing, this paper examines the foundations and vision based on current research in this area. We discuss cutting-edge developments in quantum computer hardware advancement and subsequent advances in quantum cryptography, quantum software, and high-scalability quantum computers. Many potential challenges and exciting new trends for quantum technology research and development are highlighted in this paper for a broader debate.

翻訳日:2024-04-26 23:27:32 公開日:2024-04-24

# ヘッドマウントセンサを用いた実時間シミュレーションアバター

Real-Time Simulated Avatar from Head-Mounted Sensors ( http://arxiv.org/abs/2403.06862v2 )

ライセンス: Link先を確認

Zhengyi Luo, Jinkun Cao, Rawal Khirodkar, Alexander Winkler, Jing Huang, Kris Kitani, Weipeng Xu,

(参考訳) 我々はAR/VRヘッドセットから得られた情報(ヘッドセットポーズとカメラ)からシミュレーションアバターを制御するSimXRを提案する。ヘッドマウントカメラの難易度のため、人間の体は視界から切り離され、従来の画像に基づく自我中心のポーズ推定が困難になる。一方、ヘッドセットのポーズは全身の動きに関する貴重な情報を提供するが、手や足の詳細は明らかになっていない。カメラでヘッドセットのポーズを合成するために、人型ロボットを制御してヘッドセットの動きをトラッキングし、入力画像を分析して身体の動きを決定する。体の一部が見えると、手足の動きは画像によって案内され、見えない場合は物理法則が制御器を誘導して可塑性運動を発生させる。我々は,中間表現に依存しないエンドツーエンドの手法を設計し,画像やヘッドセットのポーズから直接ヒューマノイド制御信号にマップする方法を学習する。また,市販のVRヘッドセット(Quest 2)と互換性のあるカメラ構成を用いて作成した大規模合成データセットを提案する。フレームワークの適用性を実証するため、前方カメラを備えたARヘッドセットでもテストしています。

We present SimXR, a method for controlling a simulated avatar from information (headset pose and cameras) obtained from AR / VR headsets. Due to the challenging viewpoint of head-mounted cameras, the human body is often clipped out of view, making traditional image-based egocentric pose estimation challenging. On the other hand, headset poses provide valuable information about overall body motion, but lack fine-grained details about the hands and feet. To synergize headset poses with cameras, we control a humanoid to track headset movement while analyzing input images to decide body movement. When body parts are seen, the movements of hands and feet will be guided by the images; when unseen, the laws of physics guide the controller to generate plausible motion. We design an end-to-end method that does not rely on any intermediate representations and learns to directly map from images and headset poses to humanoid control signals. To train our method, we also propose a large-scale synthetic dataset created using camera configurations compatible with a commercially available VR headset (Quest 2) and show promising results on real-world captures. To demonstrate the applicability of our framework, we also test it on an AR headset with a forward-facing camera.

翻訳日:2024-04-26 23:27:32 公開日:2024-04-24

# Across-Task Transferable Max-Value Entropy Search を用いた多要素ベイズ最適化

Multi-Fidelity Bayesian Optimization With Across-Task Transferable Max-Value Entropy Search ( http://arxiv.org/abs/2403.09570v2 )

ライセンス: Link先を確認

Yunchuan Zhang, Sangwoo Park, Osvaldo Simeone,

(参考訳) 多くのアプリケーションにおいて、ロジスティクスからエンジニアリングまで、設計者は、その目的が評価にコストがかかるブラックボックス関数の形で、一連の最適化タスクに直面している。例えば、デザイナは、時間とともに異なる学習タスクのために、ニューラルネットワークモデルのハイパーパラメータを調整する必要があるかもしれない。各候補解に対する目的関数を評価するのではなく、設計者は目的関数の近似にアクセスでき、高い忠実度評価はより大きなコストを伴う。既存のマルチフィデリティブラックボックス最適化戦略では、現在のタスクの最適値や解に関する情報を最大化することを目的として、候補解とフィデリティレベルを選択する。逐次最適化タスクが関連していると仮定すると,本論文では,現在のタスクに関する情報を取得する必要性と,将来のタスクに転送可能な情報収集の目標とのバランスをとる,新たな情報理論獲得機能を導入する。提案手法は,タスク間で伝達されるタスク間潜伏変数の共有を含む。実世界の実世界の実例にまたがる実験結果から,将来的な課題に適合する提案した提案手法が,十分な数のタスクを処理すれば,最適化効率を大幅に向上できることがわかった。

In many applications, ranging from logistics to engineering, a designer is faced with a sequence of optimization tasks for which the objectives are in the form of black-box functions that are costly to evaluate. For example, the designer may need to tune the hyperparameters of neural network models for different learning tasks over time. Rather than evaluating the objective function for each candidate solution, the designer may have access to approximations of the objective functions, for which higher-fidelity evaluations entail a larger cost. Existing multi-fidelity black-box optimization strategies select candidate solutions and fidelity levels with the goal of maximizing the information accrued about the optimal value or solution for the current task. Assuming that successive optimization tasks are related, this paper introduces a novel information-theoretic acquisition function that balances the need to acquire information about the current task with the goal of collecting information transferable to future tasks. The proposed method includes shared inter-task latent variables, which are transferred across tasks by implementing particle-based variational Bayesian updates. Experimental results across synthetic and real-world examples reveal that the proposed provident acquisition strategy that caters to future tasks can significantly improve the optimization efficiency as soon as a sufficient number of tasks is processed.

翻訳日:2024-04-26 23:27:32 公開日:2024-04-24

# NonGEMM Bench:非GEMMワークロードによる最新のMLワークロードのパフォーマンス水平性を理解する

NonGEMM Bench: Understanding the Performance Horizon of the Latest ML Workloads with NonGEMM Workloads ( http://arxiv.org/abs/2404.11788v2 )

ライセンス: Link先を確認

Rachid Karami, Hemanth Kota, Sheng-Chun Kao, Hyoukjun Kwon,

(参考訳) 機械学習(ML)オペレータは、さまざまなターゲットアプリケーションでMLモデルを設計するためのビルディングブロックである。 GEMM演算子は、MLモデルのバックボーンである。彼らは何十億もの乗算と累積を必要とする計算コストで有名だ。そのため,MLモデルの実行を高速化するため,GEMM演算子の研究と最適化に多大な努力が払われている。 GPUとアクセラレータは、GEMM演算子の実行を最適化することで、MLワークロードを高速化するために広くデプロイされている。それでも、非GEMM演算子の性能はGEMMほど徹底的に研究されていない。そこで本稿では,非GEMM演算子を研究するためのベンチマークである NonGEMM Bench について述べる。まず、さまざまなドメインから人気のMLワークロードを使用して NonGEMM Bench を構築し、次に様々なグレードのGPUプラットフォーム上でケーススタディを行い、GPUアクセラレーションシステムにおける非GEMM演算子の挙動を分析する。最後に,GEMM と NonGEMM オペレータ間のギャップを埋める上で重要なポイントをいくつか提示し,新たな最適化の方向性をコミュニティに提供する。

Machine Learning (ML) operators are the building blocks for designing ML models with various target applications. GEneral Matrix Multiplication (GEMM) operators are the backbone of ML models. They are notorious for being computationally expensive, requiring billions of multiply-and-accumulate operations. Therefore, significant effort has been put into studying and optimizing GEMM operators in order to speed up the execution of ML models. GPUs and accelerators are widely deployed to accelerate ML workloads by optimizing the execution of GEMM operators. Nonetheless, the performance of NonGEMM operators has not been studied as thoroughly as that of GEMM operators. Therefore, this paper describes NonGEMM Bench, a benchmark to study NonGEMM operators. We first construct NonGEMM Bench using popular ML workloads from different domains, then perform case studies on GPU platforms of various grades to analyze the behavior of NonGEMM operators in GPU-accelerated systems. Finally, we present some key takeaways to bridge the gap between GEMM and NonGEMM operators and to offer the community potential new optimization directions.

翻訳日:2024-04-26 23:27:32 公開日:2024-04-24

# カオスシステムのシミュレーションのためのハイブリッド量子古典型貯水池計算

Hybrid quantum-classical reservoir computing for simulating chaotic systems ( http://arxiv.org/abs/2311.14105v2 )

ライセンス: Link先を確認

Filip Wudarski, Daniel O'Connor, Shaun Geaney, Ata Akbari Asanjan, Max Wilson, Elena Strbac, P. Aaron Lott, Davide Venturelli,

(参考訳) カオスシステムの予測は特に複雑な作業であり、近年、システムの時空間情報を抽出するために用いられる固定ランダムウェイト(貯水池)を持つ再帰的ネットワークである貯水池コンピューティング(RC)を用いて、合理的に成功している。この研究は、RCの貯水池を量子回路に置き換える、ハイブリッド量子貯水池計算(HQRC)フレームワークを提案する。回路のモジュラ構造と測定フィードバックは、貯水池状態の複雑な系の力学を符号化するために使用され、そこから古典的な学習を行い、将来の力学を予測する。 HQRCのノイズレスシミュレーションは、ロレンツ63とダブルスクロールカオスのパラダイムシステムの両方の最先端の古典的RCモデルに匹敵する有効な予測時間を示し、予測が真実から逸脱してからずっと後のアトラクタダイナミクスに固執する。

Forecasting chaotic systems is a notably complex task, which in recent years has been approached with reasonable success using reservoir computing (RC), a recurrent network with fixed random weights (the reservoir) used to extract the spatio-temporal information of the system. This work presents a hybrid quantum reservoir-computing (HQRC) framework, which replaces the reservoir in RC with a quantum circuit. The modular structure and measurement feedback in the circuit are used to encode the complex system dynamics in the reservoir states, from which classical learning is performed to predict future dynamics. The noiseless simulations of HQRC demonstrate valid prediction times comparable to state-of-the-art classical RC models for both the Lorenz63 and double-scroll chaotic paradigmatic systems and adhere to the attractor dynamics long after the forecasts have deviated from the ground truth.

翻訳日:2024-04-26 23:17:45 公開日:2024-04-24

# DP-NMT:スケーラブルな微分代用機械翻訳

DP-NMT: Scalable Differentially-Private Machine Translation ( http://arxiv.org/abs/2311.14465v2 )

ライセンス: Link先を確認

Timour Igamberdiev, Doan Nam Long Vu, Felix Künnecke, Zhuo Yu, Jannik Holmer, Ivan Habernal,

(参考訳) ニューラルマシン翻訳(NMT)は、広く普及しているテキスト生成タスクであるが、NMTシステムに重大なデータプライバシー上の懸念があるにもかかわらず、プライバシを保存するNMTモデルの開発にはかなりの研究ギャップがある。 DP-SGDは、具体的なプライバシー保証のある機械学習モデルをトレーニングするための一般的な方法であるが、DP-SGDでモデルをトレーニングする実装仕様は、既存のモデルでは常に明確化されていない。これを解決するために,DP-SGDを用いてプライバシー保護NMTの研究を行うオープンソースフレームワークであるDP-NMTを導入し,多数のモデル,データセット,評価指標をひとつのソフトウェアパッケージにまとめる。我々のゴールは、DP-SGDアルゴリズムの具体的詳細を透過的かつ直感的に実装し、プライバシー保護型NMTシステムの開発を進めるためのプラットフォームを提供することです。一般的なドメインとプライバシ関連のドメインのデータセットに関する一連の実験を実施して、使用中のフレームワークを実演しています。フレームワークを公開し、コミュニティからのフィードバックを歓迎します。

Neural machine translation (NMT) is a widely popular text generation task, yet there is a considerable research gap in the development of privacy-preserving NMT models, despite significant data privacy concerns for NMT systems. Differentially private stochastic gradient descent (DP-SGD) is a popular method for training machine learning models with concrete privacy guarantees; however, the implementation specifics of training a model with DP-SGD are not always clarified in existing models, with differing software libraries used and code bases not always being public, leading to reproducibility issues. To tackle this, we introduce DP-NMT, an open-source framework for carrying out research on privacy-preserving NMT with DP-SGD, bringing together numerous models, datasets, and evaluation metrics in one systematic software package. Our goal is to provide a platform for researchers to advance the development of privacy-preserving NMT systems, keeping the specific details of the DP-SGD algorithm transparent and intuitive to implement. We run a set of experiments on datasets from both general and privacy-related domains to demonstrate our framework in use. We make our framework publicly available and welcome feedback from the community.
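
To make the DP-SGD idea referred to above more concrete, here is a minimal sketch of a single update step: clip each per-example gradient, sum, add Gaussian noise, and average. This is not the DP-NMT implementation. The function names, the plain-list parameter representation, and the values in the usage lines are assumptions chosen for illustration, and privacy accounting is left out entirely.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update: clip each gradient, sum, add Gaussian noise, average."""
    batch_size = len(per_example_grads)
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    summed = [sum(column) for column in zip(*clipped)]
    # Noise scale is tied to the clipping bound (hypothetical hyperparameters).
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]


# Usage with illustrative values only.
rng = random.Random(0)
new_params = dp_sgd_step(
    params=[0.0, 0.0],
    per_example_grads=[[3.0, 4.0], [0.3, 0.4]],
    lr=0.1,
    max_norm=1.0,
    noise_multiplier=1.1,
    rng=rng,
)
```

Clipping bounds how much any single training example can move the parameters, and the noise scale is tied to that bound. A real system would also track the cumulative privacy budget across steps.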

翻訳日:2024-04-26 23:17:45 公開日:2024-04-24

# ハミルトニアンシミュレーションによるオープン量子系のシミュレーション

Simulating Open Quantum Systems Using Hamiltonian Simulations ( http://arxiv.org/abs/2311.15533v3 )

ライセンス: Link先を確認

Zhiyan Ding, Xiantao Li, Lin Lin,

(参考訳) 我々はリンドブラッド方程式をシミュレートする新しい方法を提案し、リンドブラッド力学、確率微分方程式、ハミルトンシミュレーションの関係を描いている。拡大ヒルベルト空間におけるユニタリ力学の列を導出し、リンドブラッド力学を任意の高次に近似することができる。このユニタリ表現は、ハミルトニアンシミュレーションとアンシラ量子ビットの追跡のみを含む量子回路を用いてシミュレートすることができる。測定結果に追加のポストセレクションは不要であり、各段階での成功確率が保証される。我々の手法は時間に依存した設定に直接一般化することができる。時間に依存しないリンドブレディアン力学と時間に依存しないリンドブレディアン力学の両方を3階まで精度良くシミュレートする数値例を提供する。

We present a novel method to simulate the Lindblad equation, drawing on the relationship between Lindblad dynamics, stochastic differential equations, and Hamiltonian simulations. We derive a sequence of unitary dynamics in an enlarged Hilbert space that can approximate the Lindblad dynamics up to an arbitrarily high order. This unitary representation can then be simulated using a quantum circuit that involves only Hamiltonian simulation and tracing out the ancilla qubits. There is no need for additional postselection in measurement outcomes, ensuring a success probability of one at each stage. Our method can be directly generalized to the time-dependent setting. We provide numerical examples that simulate both time-independent and time-dependent Lindbladian dynamics with accuracy up to the third order.
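
For readers unfamiliar with the Lindblad equation named above, its standard form is reproduced here for reference. The symbols are the usual textbook ones and are assumptions with respect to this abstract, which does not fix a notation: $\rho$ is the density operator, $H$ the system Hamiltonian, $L_j$ the jump operators, and $\hbar = 1$.

```latex
\frac{d\rho}{dt}
  = -i\,[H,\rho]
  + \sum_j \left( L_j \rho L_j^{\dagger}
  - \frac{1}{2}\left\{ L_j^{\dagger} L_j,\ \rho \right\} \right)
```

The first term is the unitary (Hamiltonian) part. The sum is the dissipative part, which the method described above approximates using unitary dynamics on an enlarged Hilbert space.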

翻訳日:2024-04-26 23:17:45 公開日:2024-04-24

# ベアメタル埋込デバイスにおける汎用バイナリ機器のアブユージングプロセッサ例外

Abusing Processor Exception for General Binary Instrumentation on Bare-metal Embedded Devices ( http://arxiv.org/abs/2311.16532v2 )

ライセンス: Link先を確認

Shipei Qu, Xiaolin Zhang, Chi Zhang, Dawu Gu,

(参考訳) 組込みシステムにおけるクローズドソースドライバとライブラリのセキュリティの分析は、サプライチェーンにおけるその基本的な役割を考えると、非常に重要である。 x86とは異なり、組み込みプラットフォームには包括的なバイナリ操作ツールがないため、研究者や開発者がそのようなクローズドソースコンポーネントのセキュリティ問題を効果的に検出しパッチするのは難しい。既存の作業は、本格的なオペレーティングシステム機能に依存するか、面倒なコーナーケースに悩まされ、組み込み環境で普及しているベアメタルファームウェアにアプリケーションを制限している。本稿では,埋め込まれたベアメタルファームウェアに対して,汎用的できめ細かな静的バイナリ・インスツルメンテーションを可能にするPIFER(Practical Instrumenting Framework for Embedded fiRmware)を提案する。組み込みプロセッサのハードウェア例外処理機構を悪用することにより、PIFERは任意のターゲットアドレスに対してインスツルメンテーションを行うことができる。さらに,修正後のファームウェアの正しい実行を保証するための命令翻訳方式を提案する。我々は、Zephyr RTOS、CoreMarkベンチマーク、およびクローズソースの商用製品を含む、現実世界の複雑なファームウェアに対してPIFERを評価した。結果は、PIFERが98.9%の指示を正しく測定したことを示している。さらに,本研究の実用性と効率性を示す総合的な性能評価を行った。

Analyzing the security of closed-source drivers and libraries in embedded systems holds significant importance, given their fundamental role in the supply chain. Unlike x86, embedded platforms lack comprehensive binary manipulation tools, making it difficult for researchers and developers to effectively detect and patch security issues in such closed-source components. Existing works either depend on full-fledged operating system features or suffer from tedious corner cases, restricting their application to bare-metal firmware prevalent in embedded environments. In this paper, we present PIFER (Practical Instrumenting Framework for Embedded fiRmware), which enables general and fine-grained static binary instrumentation for embedded bare-metal firmware. By abusing the built-in hardware exception-handling mechanism of embedded processors, PIFER can perform instrumentation on arbitrary target addresses. Additionally, we propose an instruction translation-based scheme to guarantee the correct execution of the original firmware after patching. We evaluate PIFER against real-world, complex firmware, including Zephyr RTOS, the CoreMark benchmark, and a closed-source commercial product. The results indicate that PIFER correctly instrumented 98.9% of the instructions. Further, a comprehensive performance evaluation was conducted, demonstrating the practicality and efficiency of our work.

翻訳日:2024-04-26 23:17:45 公開日:2024-04-24

# EAGLES: 軽量エンコーディングによる効率的な3Dガウスの高速化

EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS ( http://arxiv.org/abs/2312.04564v2 )

ライセンス: Link先を確認

Sharath Girish, Kamal Gupta, Abhinav Shrivastava,

(参考訳) 近年,3次元ガウシアンスプラッティング(3D-GS)が新規シーン合成で人気を博している。これは、Neural Radiance Fields(NeRF)に関連する、長いトレーニング時間と遅いレンダリング速度の課題に対処する。 3Dガウスの高速かつ微分可能なラスタ化により、3D-GSはリアルタイムレンダリングと高速トレーニングを実現する。しかし、トレーニングとストレージの両方にかなりのメモリリソースを必要とするため、各シーンに何百万人ものガウシアンが必要なのだ。本稿では,ガウス点雲の高速で安定な最適化のために,量子埋め込みを利用してポイント単位のメモリ記憶要求を大幅に削減する手法を提案する。提案手法では,ガウスの少ないシーン表現が実現し,高速なトレーニング時間と高解像度シーンのリアルタイムレンダリングのためのレンダリング速度が向上する。復元品質を維持しながら、記憶容量を1桁以上削減する。 10～20倍少ないメモリと高速なトレーニング/推論速度を消費しながら、視覚的品質を保ったさまざまなデータセットやシーンに対するアプローチの有効性を検証する。プロジェクトページとコードはhttps://efficientgaussian.github.ioで入手できる。

Recently, 3D Gaussian splatting (3D-GS) has gained popularity in novel-view scene synthesis. It addresses the challenges of lengthy training times and slow rendering speeds associated with Neural Radiance Fields (NeRFs). Through rapid, differentiable rasterization of 3D Gaussians, 3D-GS achieves real-time rendering and accelerated training. These models, however, demand substantial memory resources for both training and storage, as they require millions of Gaussians in their point cloud representation for each scene. We present a technique utilizing quantized embeddings to significantly reduce per-point memory storage requirements and a coarse-to-fine training strategy for a faster and more stable optimization of the Gaussian point clouds. Our approach develops a pruning stage which results in scene representations with fewer Gaussians, leading to faster training times and rendering speeds for real-time rendering of high-resolution scenes. We reduce storage memory by more than an order of magnitude while preserving the reconstruction quality. We validate the effectiveness of our approach on a variety of datasets and scenes, preserving the visual quality while consuming 10-20x less memory and achieving faster training/inference speeds. The project page and code are available at https://efficientgaussian.github.io

翻訳日:2024-04-26 23:17:45 公開日:2024-04-24

# Genixer: 強力なデータジェネレータとしてのマルチモーダル大言語モデル

Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator ( http://arxiv.org/abs/2312.06731v4 )

ライセンス: Link先を確認

Henry Hengyuan Zhao, Pan Zhou, Mike Zheng Shou,

(参考訳) インストラクションチューニングデータは、MLLM(Multimodal Large Language Models)のトレーニングに不可欠である。しかし、高品質な命令チューニングデータの作成には大きな課題がある。命令チューニングデータのラベル付けを人間に依頼することは、ラベル集約的で時間を要する。データ生成のためにGPT-4に誘導されたいくつかの作業は、コストがかかるだけでなく、複雑なタスク(グラウンドベース推論タスク)で満足なパフォーマンスが欠如していた。データ作成の課題に対処するため,ユーザ命令に従うことで命令調整データを生成する能力を備えたMLLMの強化の可能性について,まず検討する。具体的には,9つの代表的なタスク,例えば,共通VQA,REC,REG,PointQを含む,高品質な命令チューニングデータを生成する革新的なデータ生成パイプラインGenixerを開発した。 Genixerは4つの重要なステップでデータ生成に統一されたソリューションを提供する。 (i) 命令データ収集、(ii) 命令テンプレートの設計、(iii) MLLMの強化、及び (iv) データ生成とフィルタリング。生成データの有効性を検証するため,人体評価とユーザ嗜好調査を行い,生成データの品質評価を行った。その後、LLaVA1.5とShikraという2つの代表MLLMのトレーニングのための2つの命令チューニングデータセットを生成し、様々なVQAタスクとマルチモーダルベンチマークで一貫した改善を行った。例えば、VizWizベンチマークのパフォーマンスは50.0%から53.8%に向上し、ScienceQAでは66.8%から69.7%に向上した。データ、コード、モデルがリリースされる。

Instruction tuning data is essential for training Multimodal Large Language Models (MLLMs). However, the creation of high-quality instruction tuning data presents significant challenges. Asking humans to label instruction tuning data is labor-intensive and time-consuming. Some works that prompted GPT-4 for data generation were not only costly but also lacked satisfactory performance in complex tasks (i.e., grounding-based reasoning tasks). To address the challenges of data creation, we are the first to explore the potential of empowering MLLMs with the ability to generate instruction-tuning data by following user instructions. Specifically, we developed an innovative data generation pipeline, Genixer, to generate various high-quality instruction tuning data, including nine representative tasks, e.g., Common VQA, REC, REG, and PointQ. Genixer provides a unified solution for data generation with four key steps: (i) instruction data collection, (ii) instruction template design, (iii) empowering MLLM, and (iv) data generation and filtering. To validate the effectiveness of the generated data, we conducted a human evaluation and a user preference study to assess its quality. Subsequently, we generated two instruction-tuning datasets for the training of two representative MLLMs, LLaVA1.5 and Shikra, and noted consistent improvements across various VQA tasks and multimodal benchmarks. For instance, performance on the VizWiz benchmark improved from 50.0% to 53.8%, and on ScienceQA, it increased from 66.8% to 69.7%, reconfirming the quality of the generated instruction tuning data. The data, code, and models will be released.

翻訳日:2024-04-26 23:17:45 公開日:2024-04-24

# データストリームの動的性質を考慮した条件付き教師なし回帰フレームワーク

A Conditioned Unsupervised Regression Framework Attuned to the Dynamic Nature of Data Streams ( http://arxiv.org/abs/2312.07682v2 )

ライセンス: Link先を確認

Rene Richard, Nabil Belacel,

(参考訳) リアルタイムラベルの取得が困難である場合、従来の手法では、サブ最適性能が得られる。本稿では,制限付きラベル付きデータを用いたストリーミング環境の最適戦略を提案し,教師なし回帰のための適応手法を提案する。提案手法は,初期ラベルのスパースセットを活用し,データの進化パターンに応答して動的モデル適応を可能にする,革新的なドリフト検出機構を導入する。適応性を高めるために,Adaptive WINdowingアルゴリズムとRoot Mean Square Error (RMSE)に基づく誤り一般化アルゴリズムを統合する。 ADWINはリアルタイムドリフト検出を容易にし、RMSEはモデル予測精度のロバストな測度を提供する。この組み合わせにより、高レベルの予測精度を維持しつつ、パターンの変化に継続的に適応しながら、ストリーミングデータの課題を効果的にナビゲートすることが可能になります。各種公開データセットを対象とした多変量法の性能評価を行い, 適応しないベースラインと比較した。包括的評価を通じて、リアルタイムにラベルを取得することが重要な課題となるタスクに対する適応回帰手法の優れた効果を実証する。その結果、従来のアプローチよりも優れ、ラベルの不足とデータパターンの進化を特徴とするシナリオにおいて、その可能性を強調した。

In scenarios where obtaining real-time labels proves challenging, conventional approaches may result in sub-optimal performance. This paper presents an optimal strategy for streaming contexts with limited labeled data, introducing an adaptive technique for unsupervised regression. The proposed method leverages a sparse set of initial labels and introduces an innovative drift detection mechanism to enable dynamic model adaptations in response to evolving patterns in the data. To enhance adaptability, we integrate the ADWIN (ADaptive WINdowing) algorithm with error generalization based on Root Mean Square Error (RMSE). ADWIN facilitates real-time drift detection, while RMSE provides a robust measure of model prediction accuracy. This combination enables our multivariate method to effectively navigate the challenges of streaming data, continuously adapting to changing patterns while maintaining a high level of predictive precision. We evaluate the performance of our multivariate method across various public datasets, comparing it to non-adapting baselines. Through comprehensive assessments, we demonstrate the superior efficacy of our adaptive regression technique for tasks where obtaining labels in real-time is a significant challenge. The results underscore the method's capacity to outperform traditional approaches and highlight its potential in scenarios characterized by label scarcity and evolving data patterns.
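
The drift-detection idea described above can be sketched as a monitor that compares the RMSE of recent prediction errors against older ones. This is a simplified stand-in and not the ADWIN algorithm or the authors' method: the fixed window size, the ratio threshold, and the class name are all assumptions, whereas ADWIN adapts its window using a statistical cut test.

```python
import math
from collections import deque


class RmseDriftMonitor:
    """Flags drift when recent RMSE exceeds older RMSE by a fixed ratio.

    Simplified stand-in for ADWIN: window size and threshold are fixed here,
    whereas ADWIN adapts its window with a statistical cut test.
    """

    def __init__(self, window=20, ratio=2.0):
        self.errors = deque(maxlen=window)
        self.window = window
        self.ratio = ratio

    def update(self, y_true, y_pred):
        """Record one squared error; return True if drift is flagged."""
        self.errors.append((y_true - y_pred) ** 2)
        if len(self.errors) < self.window:
            return False  # not enough history yet
        half = self.window // 2
        items = list(self.errors)
        old_rmse = math.sqrt(sum(items[:half]) / half)
        new_rmse = math.sqrt(sum(items[half:]) / (len(items) - half))
        if new_rmse > self.ratio * max(old_rmse, 1e-12):
            self.errors.clear()  # restart the window after a drift signal
            return True
        return False


# Usage: stable errors first, then a jump in error magnitude.
monitor = RmseDriftMonitor(window=10, ratio=2.0)
stable_flags = [monitor.update(1.0, 0.0) for _ in range(10)]
drift_flag = monitor.update(5.0, 0.0)
```

In a streaming setting, a `True` return value would be the point at which the regression model is refit or adapted to the new data pattern.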

翻訳日:2024-04-26 23:17:45 公開日:2024-04-24

# 連続時間動的グラフに対するマルチパースペクティブフィードバック・アテンション結合モデル

Multi-perspective Feedback-attention Coupling Model for Continuous-time Dynamic Graphs ( http://arxiv.org/abs/2312.07983v2 )

ライセンス: Link先を確認

Xiaobo Zhu, Yan Wu, Zhipeng Li, Hailong Su, Jin Che, Zhanheng Chen, Liying Wang,

(参考訳) 近年,グラフネットワーク上での表現学習が普及し,様々なモデルが有望な結果を示している。それにもかかわらず、いくつかの課題が続いている。 1) ほとんどのメソッドは静的あるいは離散時間動的グラフ用に設計されている。 2) 既存の連続時間動的グラフアルゴリズムは、単一の進化的な視点に焦点をあてる。 3) 多くの連続時間動的グラフアプローチは、長期依存を捉えるために多くの時間的隣人を必要とします。本稿では,MPFA(Multi-Perspective Feedback-Attention Coupling)モデルを提案する。 MPFAは進化と生の両方の観点から情報を取り入れ、観察されたプロセスのインターリーブされたダイナミクスを効率的に学習する。進化する視点は、情報集約のために継続的に進化する時間的隣人を区別するために、時間的自己意識を用いる。動的更新を通じて、この視点は少数の時間的隣人を使用して長期的な依存関係をキャプチャすることができる。一方、生の視点は生の近傍情報を集約するために、成長特性係数を持つフィードバックアテンションモジュールを利用する。自己組織型データセットと7つの公開データセットの実験結果から,提案モデルの有効性と競争性を検証した。

Recently, representation learning over graph networks has gained popularity, with various models showing promising results. Despite this, several challenges persist: 1) most methods are designed for static or discrete-time dynamic graphs; 2) existing continuous-time dynamic graph algorithms focus on a single evolving perspective; and 3) many continuous-time dynamic graph approaches necessitate numerous temporal neighbors to capture long-term dependencies. In response, this paper introduces the Multi-Perspective Feedback-Attention Coupling (MPFA) model. MPFA incorporates information from both evolving and raw perspectives, efficiently learning the interleaved dynamics of observed processes. The evolving perspective employs temporal self-attention to distinguish continuously evolving temporal neighbors for information aggregation. Through dynamic updates, this perspective can capture long-term dependencies using a small number of temporal neighbors. Meanwhile, the raw perspective utilizes a feedback attention module with growth characteristic coefficients to aggregate raw neighborhood information. Experimental results on a self-organizing dataset and seven public datasets validate the efficacy and competitiveness of our proposed model.

翻訳日:2024-04-26 23:17:45 公開日:2024-04-24

# PTT:高能率時間3次元物体検出のためのポイントトラジェクトリ変換器

PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection ( http://arxiv.org/abs/2312.08371v2 )

ライセンス: Link先を確認

Kuan-Chih Huang, Weijie Lyu, Ming-Hsuan Yang, Yi-Hsuan Tsai,

(参考訳) 近年の時間的LiDARベースの3Dオブジェクト検出器は,2段階の提案ベース手法により有望な性能を実現している。それらは第1段階の高密度検出器から3Dボックス候補を生成し、続いて様々な時間的集約法を適用する。しかしながら、これらのアプローチはフレーム単位のオブジェクトや点群全体を必要とし、メモリバンクの利用に関する課題を提起する。さらに、点群と軌道特徴は連結のみによって結合されており、それらの間の効果的な相互作用が無視される可能性がある。本稿では,時間的3次元物体検出を効率的に行うために,長短期記憶を備えたポイントトラジェクトリトランスフォーマーを提案する。この目的のために、メモリバンクのストレージ要件を最小限に抑えるため、現在のフレームのオブジェクトの点群とその履歴トラジェクトリのみを入力として利用する。さらに,長短期および将来を考慮した視点に着目してトラジェクトリ特徴をエンコードするモジュールを導入し,それらを点群特徴と効果的に集約する。我々は、大規模Waymoデータセットで広範な実験を行い、我々のアプローチが最先端の手法に対してうまく機能することを実証した。コードとモデルはhttps://github.com/kuanchihhuang/PTTで公開される。

Recent temporal LiDAR-based 3D object detectors achieve promising performance based on the two-stage proposal-based approach. They generate 3D box candidates from the first-stage dense detector, followed by different temporal aggregation methods. However, these approaches require per-frame objects or whole point clouds, posing challenges related to memory bank utilization. Moreover, point clouds and trajectory features are combined solely based on concatenation, which may neglect effective interactions between them. In this paper, we propose a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection. To this end, we only utilize point clouds of current-frame objects and their historical trajectories as input to minimize the memory bank storage requirement. Furthermore, we introduce modules to encode trajectory features, focusing on long short-term and future-aware perspectives, and then effectively aggregate them with point cloud features. We conduct extensive experiments on the large-scale Waymo dataset to demonstrate that our approach performs well against state-of-the-art methods. Code and models will be made publicly available at https://github.com/kuanchihhuang/PTT.

翻訳日:2024-04-26 23:17:45 公開日:2024-04-24

# 3次元生成モデルのためのモザイクSDF

Mosaic-SDF for 3D Generative Models ( http://arxiv.org/abs/2312.09222v2 )

ライセンス: Link先を確認

Lior Yariv, Omri Puny, Natalia Neverova, Oran Gafni, Yaron Lipman,

(参考訳) 現在の3次元形状の拡散モデルまたはフローベース生成モデルは、事前訓練された2次元画像拡散モデルを蒸留し、3次元形状を直接訓練する。拡散モデルや流れモデルを3次元形状で訓練する場合、重要な設計選択は形状表現である。効果的な形状表現は、3つの設計原則に従う必要がある: 大きな3Dデータセットを表現形式に効率的に変換すること; 近似パワーとパラメータの数との良好なトレードオフを提供すること; 既存の強力なニューラルネットワークアーキテクチャと互換性のある単純なテンソル形式を持つこと。体積格子や点雲のような標準的な3次元形状表現はこれらすべての原則を同時に従わないが、本稿では新しい表現を提唱する。モーザイクSDF(M-SDF: Mosaic-SDF)は、形状境界付近に広がる局所格子を用いて、与えられた形状の符号距離関数(SDF)を近似した単純な3次元形状表現である。 M-SDF表現は、個々の形状に対して、容易に並列化できるように高速に計算でき、形状の境界付近の空間のみをカバーするため、パラメータ効率が良く、トランスフォーマーベースのアーキテクチャと互換性のある単純な行列形式を持つ。我々は,M-SDF表現の有効性を実演し,M-SDF表現を用いて3Dウェアハウスデータセットを用いたクラス条件付き生成と約600k字幕形状のデータセットを用いたテキストから3D生成を含む3次元生成フローモデルを訓練した。

Current diffusion or flow-based generative models for 3D shapes divide into two approaches: distilling pre-trained 2D image diffusion models, and training directly on 3D shapes. When training a diffusion or flow model on 3D shapes, a crucial design choice is the shape representation. An effective shape representation needs to adhere to three design principles: it should allow an efficient conversion of large 3D datasets to the representation form; it should provide a good tradeoff of approximation power versus number of parameters; and it should have a simple tensorial form that is compatible with existing powerful neural architectures. While standard 3D shape representations such as volumetric grids and point clouds do not adhere to all these principles simultaneously, we advocate in this paper a new representation that does. We introduce Mosaic-SDF (M-SDF): a simple 3D shape representation that approximates the Signed Distance Function (SDF) of a given shape by using a set of local grids spread near the shape's boundary. The M-SDF representation is fast to compute for each shape individually, making it readily parallelizable; it is parameter efficient as it only covers the space around the shape's boundary; and it has a simple matrix form, compatible with Transformer-based architectures. We demonstrate the efficacy of the M-SDF representation by using it to train a 3D generative flow model including class-conditioned generation with the 3D Warehouse dataset, and text-to-3D generation using a dataset of about 600k caption-shape pairs.
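
As a rough illustration of the idea of "local grids spread near the shape's boundary" collected into a matrix, the following Python sketch samples the SDF of a unit sphere on small cubic grids centered on its surface. The row layout (center, scale, flattened grid values), the helper names, and the choice of a sphere are assumptions made for this example; they are not taken from the paper's implementation.

```python
import math


def sphere_sdf(p, radius=1.0):
    """Signed distance from point p to a sphere centered at the origin."""
    return math.sqrt(sum(c * c for c in p)) - radius


def sample_local_grid(sdf, center, scale, k):
    """Sample sdf on a k x k x k grid spanning [center - scale, center + scale]^3.

    Requires k >= 2. Values are returned flattened in x-major order.
    """
    offsets = [(-1.0 + 2.0 * i / (k - 1)) * scale for i in range(k)]
    return [
        sdf((center[0] + dx, center[1] + dy, center[2] + dz))
        for dx in offsets
        for dy in offsets
        for dz in offsets
    ]


def mosaic_rows(sdf, centers, scale, k):
    """Build one row per local grid: [cx, cy, cz, scale, k**3 SDF values].

    Hypothetical layout, chosen only to show that the representation is a
    plain matrix with one row per grid.
    """
    return [[*c, scale, *sample_local_grid(sdf, c, scale, k)] for c in centers]


# Three grid centers placed on the surface of the unit sphere.
centers = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, -1.0)]
rows = mosaic_rows(sphere_sdf, centers, scale=0.1, k=3)
print(len(rows), len(rows[0]))
```

Each row has a fixed length of 3 + 1 + k³ entries, so a shape becomes a matrix with one row per grid, which is the kind of input a Transformer-style model can consume as a sequence of tokens.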

翻訳日:2024-04-26 23:08:01 公開日:2024-04-24

# 2レベル量子システムと物理空間の接続の確立に向けて

Towards establishing a connection between two-level quantum systems and physical spaces ( http://arxiv.org/abs/2312.09270v2 )

ライセンス: Link先を確認

V. G. Valle, L. L. Brugger, B. F. Rizzuti, Cristhiano Duarte,

(参考訳) この研究は、2レベル量子系の準備と、それに対応するヒルベルト空間における(状態としての)記述との間の操作的な接続を明確にすることを目的としている。これは時代遅れに聞こえるかもしれないが、この接続には常識から想像される以上のものがあることを示す。これら2つの分離された領域(実際の実験室と状態空間)を橋渡しするために、我々はパラダイム的な数学的対象であるホップファイブレーション(Hopf fibration)に依拠する。この接続が実際にどのように機能するかを、簡単な光学的設定で説明する。注目すべきことに、この光学的設定は球面を覆うために2つのチャートを使う必要があることも反映している。別の言い方をすれば、我々の実験的な実現は、滑らかな多様体と見なした球面の2次元性を反映している。

This work seeks to make explicit the operational connection between the preparation of two-level quantum systems and their corresponding description (as states) in a Hilbert space. This may sound outdated, but we show there is more to this connection than common sense may lead us to believe. To bridge these two separated realms -- the actual laboratory and the space of states -- we rely on a paradigmatic mathematical object: the Hopf fibration. We illustrate how this connection works in practice with a simple optical setup. Remarkably, this optical setup also reflects the necessity of using two charts to cover a sphere. Put another way, our experimental realization reflects the bi-dimensionality of a sphere seen as a smooth manifold.
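
For readers unfamiliar with the Hopf fibration, the following Python sketch shows the standard map from a two-level state α|0⟩ + β|1⟩ to a point on the unit sphere, together with the two coordinate charts β/α and α/β. The function names are chosen for this example only, and the sketch is textbook material rather than the authors' optical construction.

```python
import cmath
import math


def hopf_map(alpha, beta):
    """Send the state alpha|0> + beta|1> to a point (x, y, z) on the unit sphere.

    The input is normalized first; states that differ only by a global
    phase land on the same point.
    """
    norm = math.sqrt(abs(alpha) ** 2 + abs(beta) ** 2)
    if norm == 0:
        raise ValueError("the zero vector is not a state")
    a, b = complex(alpha) / norm, complex(beta) / norm
    w = a.conjugate() * b
    return (2 * w.real, 2 * w.imag, abs(a) ** 2 - abs(b) ** 2)


def chart_alpha_nonzero(alpha, beta):
    """Complex coordinate beta/alpha; undefined only at the point (0, 0, -1)."""
    return complex(beta) / complex(alpha)


def chart_beta_nonzero(alpha, beta):
    """Complex coordinate alpha/beta; undefined only at the point (0, 0, 1)."""
    return complex(alpha) / complex(beta)


# A global phase does not change the image on the sphere.
phase = cmath.exp(1j * 0.7)
p1 = hopf_map(1, 1j)
p2 = hopf_map(phase * 1, phase * 1j)
print(p1, p2)
```

Neither chart alone covers the whole sphere: the first fails where α = 0 and the second where β = 0, and where both are defined they are reciprocals of each other. This is the sense in which two charts are needed to cover a sphere.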

翻訳日:2024-04-26 23:08:00 公開日:2024-04-24

# 意味-数値ギャップのブリッジ:材料特性予測のためのクロスモーダル知識グラフの数値推論法

Bridging the Semantic-Numerical Gap: A Numerical Reasoning Method of Cross-modal Knowledge Graph for Material Property Prediction ( http://arxiv.org/abs/2312.09744v2 )

ライセンス: Link先を確認

Guangxuan Song, Dongmei Fu, Zhongwei Qiu, Zijiang Yang, Jiaxin Dai, Lingwei Ma, Dawei Zhang,

(参考訳) 機械学習(ML)技術を用いて材料特性を予測することは重要な研究トピックである。これらの特性は数値データと意味的要因に依存する。小規模サンプルデータセットの制約のため、既存の手法は一般的にMLアルゴリズムを使用して数値特性を回帰するか、他の事前訓練済みの知識グラフ(KG)を材料に転移する。しかし,これらの手法は意味情報と数値情報を同時に扱うことはできない。本稿では,意味ノードと数値プロキシノードを用いてクロスモーダルKGを構成する、材料KGのための数値推論手法(NR-KG)を提案する。この手法は、KGを正準KGに射影することで両方のタイプの情報を捉え、グラフニューラルネットワークを使用して材料特性を予測する。このプロセスでは,数値情報から意味的特徴を抽出するために,新しい射影予測損失を提案する。 NR-KGは、クロスモーダルデータのエンドツーエンド処理を容易にし、小規模サンプルデータセットにおける関係とクロスモーダル情報をマイニングし、価値ある実験データを十分に活用して、材料予測を強化する。さらに、意味記述を伴う2つの新しい高エントロピー合金(HEA)特性データセットを提案する。 NR-KGは最先端(SOTA)の手法より優れており、2つの材料データセットに対して25.9%と16.1%の相対的な改善を達成している。さらに、NR-KGは2つの公開物理化学分子データセットでもSOTA手法を上回り、22.2%と54.3%の改善を示し、その応用可能性と汎化性を強調している。提案されたデータセット、アルゴリズム、および事前訓練されたモデルが、材料のためのKGとAIのコミュニティの助けとなることを願っている。

Using machine learning (ML) techniques to predict material properties is a crucial research topic. These properties depend on numerical data and semantic factors. Due to the limitations of small-sample datasets, existing methods typically adopt ML algorithms to regress numerical properties or transfer other pre-trained knowledge graphs (KGs) to the material. However, these methods cannot simultaneously handle semantic and numerical information. In this paper, we propose a numerical reasoning method for material KGs (NR-KG), which constructs a cross-modal KG using semantic nodes and numerical proxy nodes. It captures both types of information by projecting KG into a canonical KG and utilizes a graph neural network to predict material properties. In this process, a novel projection prediction loss is proposed to extract semantic features from numerical information. NR-KG facilitates end-to-end processing of cross-modal data, mining relationships and cross-modal information in small-sample datasets, and fully utilizes valuable experimental data to enhance material prediction. We further propose two new High-Entropy Alloys (HEA) property datasets with semantic descriptions. NR-KG outperforms state-of-the-art (SOTA) methods, achieving relative improvements of 25.9% and 16.1% on two material datasets. Besides, NR-KG surpasses SOTA methods on two public physical chemistry molecular datasets, showing improvements of 22.2% and 54.3%, highlighting its potential application and generalizability. We hope the proposed datasets, algorithms, and pre-trained models can facilitate the communities of KG and AI for materials.

翻訳日:2024-04-26 23:08:00 公開日:2024-04-24

# 高次元構成空間の可視化:包括的解析的アプローチ

Visualizing High-Dimensional Configuration Spaces: A Comprehensive Analytical Approach ( http://arxiv.org/abs/2312.10918v2 )

ライセンス: Link先を確認

Jorge Ocampo Jimenez, Wael Suleiman,

(参考訳) 構成空間Cの表現は、状態の衝突チェックに計算時間の大半が費やされるサンプリングベースモーションプランナーのための衝突のない経路の発見を加速する上で重要な役割を担っている。伝統的に、プランナーは衝突チェッカーを用いて衝突のない経路を限定的に評価したり、可視化のためにCの次元を小さくすることでCの表現を評価する。しかし、衝突チェッカーは、元のCのサブセットだけが表現されている場合でも高い精度を示すことができ、また、移動プランナーが元のCのパスに匹敵するパスを見つける能力を制限することができる。本稿では,マニピュレータロボットの高次元Cs表現を2次元形式で可視化するための新しい手法を提案する。元の寸法を小さくすることなく高次元Cs近似の定性的評価を行うための新しいツールを提供する。これにより、2つの異なる高次元Cの精度とカバレッジを比較する能力が向上する。マニピュレータロボットのキネマティックチェーンと人間の色知覚を利用して,マニピュレータロボットの7自由度CSを用いて,本手法の有効性を示す。この可視化は、ロボットの関節の境界と衝突状態の組み合わせのカバレッジに関する質的な洞察を、元のデータの次元性を低下させることなく提供する。本主張を支持するために,提案した可視化の数値的な評価を行う。

The representation of a Configuration Space C plays a vital role in accelerating the finding of a collision-free path for sampling-based motion planners where the majority of computation time is spent in collision checking of states. Traditionally, planners evaluate C's representations through limited evaluations of collision-free paths using the collision checker or by reducing the dimensionality of C for visualization. However, a collision checker may indicate high accuracy even when only a subset of the original C is represented, limiting the motion planner's ability to find paths comparable to those in the original C. Additionally, dealing with high-dimensional Cs is challenging, as qualitative evaluations become increasingly difficult in dimensions higher than three, where reduced-dimensional C evaluation may decrease accuracy in cluttered environments. In this paper, we present a novel approach for visualizing representations of high-dimensional Cs of manipulator robots in a 2D format. We provide a new tool for qualitative evaluation of approximations of high-dimensional Cs without reducing the original dimension. This enhances our ability to compare the accuracy and coverage of two different high-dimensional Cs. Leveraging the kinematic chain of manipulator robots and human color perception, we show the efficacy of our method using a 7-degree-of-freedom CS of a manipulator robot. This visualization offers qualitative insights into the joint boundaries of the robot and the coverage of collision state combinations without reducing the dimensionality of the original data. To support our claim, we conduct a numerical evaluation of the proposed visualization.

翻訳日:2024-04-26 23:08:00 公開日:2024-04-24

# 言語可能な空間オントロジーによる屋内・屋外3次元シーングラフ生成

Indoor and Outdoor 3D Scene Graph Generation via Language-Enabled Spatial Ontologies ( http://arxiv.org/abs/2312.11713v2 )

ライセンス: Link先を確認

Jared Strader, Nathan Hughes, William Chen, Alberto Speranzon, Luca Carlone,

(参考訳) 本稿では,任意の屋内環境と屋外環境に3次元シーングラフを構築する手法を提案する。このような拡張は困難であり、屋外環境を記述する概念の階層は屋内よりも複雑であり、手動でそのような階層を定義するのは時間を要するためスケールしない。さらに、トレーニングデータの欠如は、屋内環境で使用される学習ツールの直接的な適用を妨げている。これらの課題に対処するため、我々は2つの新しい拡張を提案する。まず,室内と屋外のロボット操作に関連する概念と関係を定義する空間オントロジーを構築する手法を開発する。特に、そのようなオントロジーを構築するためにLLM(Large Language Model)を使用します。第2に、論理テンソルネットワーク(LTN)を用いた3次元シーングラフ構築のための空間オントロジーを活用し、論理ルールや公理(例えば「砂を含むビーチ」)を付加することで、トレーニング時に追加の監視信号を提供し、ラベル付きデータの必要性を低減し、より良い予測を提供し、トレーニング時に見つからない概念の予測を可能にする。室内環境,農村環境,沿岸環境など,さまざまなデータセットを用いて本手法を検証した結果,微少な注釈付きデータによる3Dシーングラフ生成の品質向上が確認できた。

This paper proposes an approach to build 3D scene graphs in arbitrary indoor and outdoor environments. Such extension is challenging; the hierarchy of concepts that describe an outdoor environment is more complex than for indoors, and manually defining such hierarchy is time-consuming and does not scale. Furthermore, the lack of training data prevents the straightforward application of learning-based tools used in indoor settings. To address these challenges, we propose two novel extensions. First, we develop methods to build a spatial ontology defining concepts and relations relevant for indoor and outdoor robot operation. In particular, we use a Large Language Model (LLM) to build such an ontology, thus largely reducing the amount of manual effort required. Second, we leverage the spatial ontology for 3D scene graph construction using Logic Tensor Networks (LTN) to add logical rules, or axioms (e.g., "a beach contains sand"), which provide additional supervisory signals at training time thus reducing the need for labelled data, providing better predictions, and even allowing predicting concepts unseen at training time. We test our approach in a variety of datasets, including indoor, rural, and coastal environments, and show that it leads to a significant increase in the quality of the 3D scene graph generation with sparsely annotated data.

翻訳日:2024-04-26 23:08:00 公開日:2024-04-24

# 行列積状態によるフェルミオン回路の高速エミュレーション

Fast emulation of fermionic circuits with matrix product states ( http://arxiv.org/abs/2312.17657v4 )

ライセンス: Link先を確認

Justin Provazza, Klaas Gunst, Huanchen Zhai, Garnet K. -L. Chan, Toru Shiozaki, Nicholas C. Rubin, Alec F. White,

(参考訳) 本稿では,Fermionic Quantum Emulator (FQE)ソフトウェアライブラリの行列積状態(MPS)拡張について述べる。スピン1/2フェルミオンの多体波動関数を近似するための対称性適応行列積状態の理論について論じ、FQEインタフェースのオープンソースのMPS対応実装(MPS-FQE)を提示する。このソフトウェアは、ほとんどの基本的なテンソル操作にオープンソースのpyblock3およびblock2ライブラリを使用し、概ねFQEのドロップイン代替として利用でき、より大きなフェルミオン回路の、より効率的だが近似的なエミュレーションを可能にする。最後に、大規模システムの近似的なエミュレーションが有用と期待される、近未来型および誤り耐性型の量子アルゴリズムの両方に関連するいくつかの応用を示す。すなわち、量子位相推定のための状態準備戦略の評価、異なる変分量子固有値ソルバーのAnsätzeのテスト、トロッター誤差の数値評価、一般的な量子力学問題のシミュレーションである。これらすべての例において、MPS-FQEによる近似エミュレーションにより、フルステートベクターエミュレータでアクセス可能なシステムよりもはるかに大きいシステムを扱うことができる。

We describe a matrix product state (MPS) extension for the Fermionic Quantum Emulator (FQE) software library. We discuss the theory behind symmetry adapted matrix product states for approximating many-body wavefunctions of spin-1/2 fermions, and we present an open-source, MPS-enabled implementation of the FQE interface (MPS-FQE). The software uses the open-source pyblock3 and block2 libraries for most elementary tensor operations, and it can largely be used as a drop-in replacement for FQE that allows for more efficient, but approximate, emulation of larger fermionic circuits. Finally, we show several applications relevant to both near-term and fault-tolerant quantum algorithms where approximate emulation of larger systems is expected to be useful: characterization of state preparation strategies for quantum phase estimation, the testing of different variational quantum eigensolver Ansätze, the numerical evaluation of Trotter errors, and the simulation of general quantum dynamics problems. In all these examples, approximate emulation with MPS-FQE allows us to treat systems that are significantly larger than those accessible with a full statevector emulator.

翻訳日:2024-04-26 23:08:00 公開日:2024-04-24

# NU-Class Net:ビデオ品質向上のための新しいディープラーニングベースのアプローチ

NU-Class Net: A Novel Deep Learning-based Approach for Video Quality Enhancement ( http://arxiv.org/abs/2401.01163v2 )

ライセンス: Link先を確認

Parham Zilouchian Moghaddam, Mehdi Modarressi, Mohammad Amin Sadeghi,

(参考訳) ビデオコンテンツの人気は急増しており、インターネットトラフィックとIoT(Internet of Things)ネットワークに対する優位性を主張している。ビデオ圧縮は、ビデオキャプチャー装置が生成する実質的なマルチメディアトラフィックを効率的に管理する主要な手段であると考えられてきた。それでも、ビデオ圧縮アルゴリズムは、かなりの圧縮比を達成するために、かなりの計算要求を必要とする。この複雑さは、IoTエッジノードカメラなどのリソース制限された組み込みシステムにおいて、効率的なビデオコーディング標準を実装する上で、非常に難しい課題となる。そこで本研究では,圧縮コーデックの損失による圧縮アーチファクトの軽減を目的とした,革新的な深層学習モデルであるNU-Class Netを提案する。この拡張により、低ビットレートビデオの品質が著しく向上する。 NU-Class Netを利用することで、ビデオキャプチャノード内のビデオエンコーダは出力品質を低減し、低ビットレートのビデオを生成し、エッジでの計算と帯域幅の要求を効果的に調整することができる。デコーダ側では、典型的にはリソース制限の影響を受けないが、NU-Class Netはビデオデコーダの後に適用され、アーティファクトを補償し、元のビデオの品質を近似する。実験により,低ビットレートでストリーミングされたビデオの知覚品質を高めるためのモデルの有効性が確認された。

Video content has experienced a surge in popularity, asserting its dominance over internet traffic and Internet of Things (IoT) networks. Video compression has long been regarded as the primary means of efficiently managing the substantial multimedia traffic generated by video-capturing devices. Nevertheless, video compression algorithms entail significant computational demands in order to achieve substantial compression ratios. This complexity presents a formidable challenge when implementing efficient video coding standards in resource-constrained embedded systems, such as IoT edge node cameras. To tackle this challenge, this paper introduces NU-Class Net, an innovative deep-learning model designed to mitigate compression artifacts stemming from lossy compression codecs. This enhancement significantly elevates the perceptible quality of low-bit-rate videos. By employing the NU-Class Net, the video encoder within the video-capturing node can reduce output quality, thereby generating low-bit-rate videos and effectively curtailing both computation and bandwidth requirements at the edge. On the decoder side, which is typically less encumbered by resource limitations, NU-Class Net is applied after the video decoder to compensate for artifacts and approximate the quality of the original video. Experimental results affirm the efficacy of the proposed model in enhancing the perceptible quality of videos, especially those streamed at low bit rates.

翻訳日:2024-04-26 23:08:00 公開日:2024-04-24

# 閉述語をもつ存在規則に対する無矛盾問合せ応答

Consistent Query Answering for Existential Rules with Closed Predicates ( http://arxiv.org/abs/2401.05743v2 )

ライセンス: Link先を確認

Lorenzo Marconi, Riccardo Rosati,

(参考訳) Consistent Query Answering (CQA)は、知識ベースとデータベースのデータアクセスに対する、不整合を許容するアプローチである。 CQAの目標は、不整合な情報(例えば、データがメタデータ、典型的にはデータベースの整合性制約と矛盾するデータベース)が存在する場合でも、クエリに意味のある(無矛盾な)回答を提供することである。 CQAのセマンティクスは、修復の概念、すなわち最小限の変更によって得られる、初期の不整合なデータベースの無矛盾なバージョンに基づいている。本研究では、存在規則で表されるデータ依存性をもつデータベースにおけるCQAについて検討する。より具体的には、タプル生成依存性と等価性生成依存性の両方を拡張する、不等式を伴う選言的埋め込み依存性(DED)という広範なクラスに焦点を当てる。まず、データベース述語が閉じている場合、すなわち、データベースがそのような述語に関する完全な知識を持っていると仮定され、データベースを修復するためのタプルの追加が不可能である場合に焦点を当てる。このようなシナリオにおいて、CQAおよび関連するタスク(修復検査)のデータ複雑性を、異なる意味論(ARとIAR)と異なる存在規則のクラスについて詳細に分析する。特に,非巡回型,線形型,完全型,粘着型およびガード型DEDのクラスとその組み合わせについて考察する。

Consistent Query Answering (CQA) is an inconsistency-tolerant approach to data access in knowledge bases and databases. The goal of CQA is to provide meaningful (consistent) answers to queries even in the presence of inconsistent information, e.g. a database whose data conflict with meta-data (typically the database integrity constraints). The semantics of CQA is based on the notion of repair, that is, a consistent version of the initial, inconsistent database that is obtained through minimal modifications. We study CQA in databases with data dependencies expressed by existential rules. More specifically, we focus on the broad class of disjunctive embedded dependencies with inequalities (DEDs), which extend both tuple-generating dependencies and equality-generating dependencies. We first focus on the case when the database predicates are closed, i.e. the database is assumed to have complete knowledge about such predicates, thus no tuple addition is possible to repair the database. In such a scenario, we provide a detailed analysis of the data complexity of CQA and associated tasks (repair checking) under different semantics (AR and IAR) and for different classes of existential rules. In particular, we consider the classes of acyclic, linear, full, sticky and guarded DEDs, and their combinations.

翻訳日:2024-04-26 21:08:18 公開日:2024-04-24

# LLMCheckup:解釈可能性ツールと自己説明による大規模言語モデルの会話的検証

LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools and Self-Explanations ( http://arxiv.org/abs/2401.12576v2 )

ライセンス: Link先を確認

Qianli Wang, Tatiana Anikina, Nils Feldhus, Josef van Genabith, Leonhard Hennig, Sebastian Möller,

(参考訳) 対話形式で説明を提供する解釈可能性ツールは、一度きりの説明ではユーザへの十分な情報提供に不足する可能性があるため、ユーザの理解を高める上で有効であることが示されている(Slack et al., 2023; Shen et al., 2023)。しかしながら、対話ベースの説明のための現在のソリューションは、しばしば外部ツールやモジュールを必要とし、設計対象外のタスクに簡単には転用できない。 LLMCheckupでは、ユーザが任意の最新の大規模言語モデル(LLM)とその振る舞いについてチャットできる、容易にアクセスできるツールを提供する。特徴属性などのホワイトボックス説明可能性ツールや自己説明(根拠生成など)を含む、幅広い説明可能なAI(XAI)メソッドに接続することにより、LLMが微調整なしで説明を生成し、ユーザ意図の認識を行えるようにする。 LLMベースの(自己)説明は、フォローアップ質問をサポートし、提案を生成する対話形式で提示される。 LLMCheckupは、システムで利用可能な操作のチュートリアルを提供し、XAIに関する様々なレベルの専門知識を持つ個人に対応し、複数の入力モダリティをサポートする。 LLMのユーザ意図認識精度を大幅に向上させる新しい解析手法を提案する。最後に,ファクトチェックとコモンセンス質問応答のタスクに対するLLMCheckupを紹介する。

Interpretability tools that offer explanations in the form of a dialogue have demonstrated their efficacy in enhancing users' understanding (Slack et al., 2023; Shen et al., 2023), as one-off explanations may fall short in providing sufficient information to the user. Current solutions for dialogue-based explanations, however, often require external tools and modules and are not easily transferable to tasks they were not designed for. With LLMCheckup, we present an easily accessible tool that allows users to chat with any state-of-the-art large language model (LLM) about its behavior. We enable LLMs to generate explanations and perform user intent recognition without fine-tuning, by connecting them with a broad spectrum of Explainable AI (XAI) methods, including white-box explainability tools such as feature attributions, and self-explanations (e.g., for rationale generation). LLM-based (self-)explanations are presented as an interactive dialogue that supports follow-up questions and generates suggestions. LLMCheckup provides tutorials for operations available in the system, catering to individuals with varying levels of expertise in XAI and supporting multiple input modalities. We introduce a new parsing strategy that substantially enhances the user intent recognition accuracy of the LLM. Finally, we showcase LLMCheckup for the tasks of fact checking and commonsense question answering.

翻訳日:2024-04-26 21:08:18 公開日:2024-04-24

# LLMとIDE静的解析による抽出メソッドリファクタリング

Together We Go Further: LLMs and IDE Static Analysis for Extract Method Refactoring ( http://arxiv.org/abs/2401.15298v2 )

ライセンス: Link先を確認

Dorin Pomian, Abhiram Bellur, Malinda Dilhara, Zarina Kurbatova, Egor Bogomolov, Timofey Bryksin, Danny Dig,

(参考訳) 単一のメソッドに複数の責任をカプセル化する長いメソッドはメンテナンスが難しい。どの文を新しいメソッドに抽出するかを選択することは、多くの研究ツールの対象となってきた。着実に改善されているにもかかわらず、これらのツールは、開発者の好みや受け入れ基準に沿ったリファクタリングを生成するのに失敗することが多い。大規模言語モデル(LLM)が大規模なコードコーパスでトレーニングされていることを考えると、開発者が関数を作る方法に対するLLMの精通度を活用すれば、開発者が受け入れそうなリファクタリングを提案できる可能性がある。本稿では,LLMの知見とIDEのパワーを相乗的に組み合わせてExtract Method(EM)を実行することにより,リファクタリングの科学と実践を推し進める。 1752件のEMシナリオに関する我々の形成的研究により、LLMは専門家レベルの提案を行うのに非常に効果的であるが、信頼性に欠けることが判明した。提案の最大76.3%が幻覚である。我々は、LLMが提案する候補から幻覚を取り除き、プログラムスライシングに由来する静的解析技術に基づいて提案をさらに強化・ランク付けし、最終的にIDEを利用してリファクタリングを正しく実行する新しいアプローチを設計した。このアプローチを、EM-Assistと呼ばれるIntelliJ IDEAプラグインとして実装した。我々は,オープンソースプロジェクトの1752件の実際のリファクタリングを再現する多様なコーパス上でEM-Assistを実証的に評価した。 EM-Assistは従来の最先端ツールを上回り、53.4%のケースで開発者が実際に行ったリファクタリングを提案し、従来の最良ツールの39.4%の再現率から改善した。さらに,16人の産業開発者を対象にファイアハウス調査を行い,彼らの最近のコミットに対してリファクタリングを提案した。そのうち81.3%がEM-Assistの提案に同意した。

Long methods that encapsulate multiple responsibilities within a single method are challenging to maintain. Choosing which statements to extract into new methods has been the target of many research tools. Despite steady improvements, these tools often fail to generate refactorings that align with developers' preferences and acceptance criteria. Given that Large Language Models (LLMs) have been trained on large code corpora, if we harness their familiarity with the way developers form functions, we could suggest refactorings that developers are likely to accept. In this paper, we advance the science and practice of refactoring by synergistically combining the insights of LLMs with the power of IDEs to perform Extract Method (EM). Our formative study on 1752 EM scenarios revealed that LLMs are very effective for giving expert suggestions, yet they are unreliable: up to 76.3% of the suggestions are hallucinations. We designed a novel approach that removes hallucinations from the candidates suggested by LLMs, then further enhances and ranks suggestions based on static analysis techniques from program slicing, and finally leverages the IDE to execute refactorings correctly. We implemented this approach in an IntelliJ IDEA plugin called EM-Assist. We empirically evaluated EM-Assist on a diverse corpus that replicates 1752 actual refactorings from open-source projects. We found that EM-Assist outperforms previous state-of-the-art tools: EM-Assist suggests the developer-performed refactoring in 53.4% of cases, improving over the recall rate of 39.4% for previous best-in-class tools. Furthermore, we conducted firehouse surveys with 16 industrial developers and suggested refactorings on their recent commits. 81.3% of them agreed with the recommendations provided by EM-Assist.

翻訳日:2024-04-26 21:08:18 公開日:2024-04-24

# ヒト脳波の表現的アライメントによるより人間の脳に似た視力の獲得

Achieving More Human Brain-Like Vision via Human EEG Representational Alignment ( http://arxiv.org/abs/2401.17231v2 )

ライセンス: Link先を確認

Zitong Lu, Yile Wang, Julie D. Golomb,

(参考訳) 人工知能の進歩にもかかわらず、物体認識モデルは人間の脳における視覚情報処理のエミュレートに遅れを取っている。近年の研究では、脳の処理を模倣するために神経データを使用することの可能性を強調している。非侵襲的脳波に基づく視覚モデル「Re(presentational)Al(ignment)net」を初めて提示した。我々の革新的な画像から脳への多層符号化フレームワークは、複数のモデルレイヤーを最適化し、モデルがオブジェクトカテゴリと異なるモダリティをまたいだ人間の脳の視覚的表現パターンを効率的に学習し模倣できるようにすることにより、人間の神経アライメントを向上させる。我々の発見は、ReAlnetが人工と人間の視覚のギャップを埋め、より脳に似た人工知能システムへの道を歩むブレークスルーを表していることを示唆している。

Despite advancements in artificial intelligence, object recognition models still lag behind in emulating visual information processing in human brains. Recent studies have highlighted the potential of using neural data to mimic brain processing; however, these often rely on invasive neural recordings from non-human subjects, leaving a critical gap in understanding human visual perception. Addressing this gap, we present, for the first time, 'Re(presentational)Al(ignment)net', a vision model aligned with human brain activity based on non-invasive EEG, demonstrating a significantly higher similarity to human brain representations. Our innovative image-to-brain multi-layer encoding framework advances human neural alignment by optimizing multiple model layers and enabling the model to efficiently learn and mimic the human brain's visual representational patterns across object categories and different modalities. Our findings suggest that ReAlnet represents a breakthrough in bridging the gap between artificial and human vision, paving the way for more brain-like artificial intelligence systems.

翻訳日:2024-04-26 21:08:18 公開日:2024-04-24

# FuseFormer: 画像と熱画像の融合のためのトランスフォーマー

FuseFormer: A Transformer for Visual and Thermal Image Fusion ( http://arxiv.org/abs/2402.00971v2 )

ライセンス: Link先を確認

Aytekin Erdogan, Erdem Akagündüz,

(参考訳) 画像融合問題に対する決定的な基礎的真理が欠如しているため、損失関数は構造類似度指数測定(SSIM)などの評価指標に基づいて構造化される。しかし、これを行うと、SSIMに対してバイアスが発生し、その結果、入力されたビジュアルバンド画像が生成される。本研究の目的は,古典的評価指標を損失関数として用いた場合の限界を緩和する画像融合問題に対する新しい手法を提案することである。提案手法は,局所的およびグローバルなコンテキスト情報に順応的に対処するトランスフォーマーベースのマルチスケール融合戦略を統合する。この統合により、画像融合プロセスの個々のコンポーネントが洗練されるだけでなく、全体の有効性も大幅に向上する。提案手法は,第1段階において,複数スケールの深部特徴を抽出するオートエンコーダを訓練する2段階の訓練手法に従っている。第2段階では、核融合ブロックを統合し、前述の損失関数を変更する。マルチスケール機能は、畳み込みニューラルネットワーク(CNN)とトランスフォーマーを組み合わせることで融合される。 CNNはローカル機能をキャプチャするために使用され、Transformerは一般的なコンテキスト機能の統合を処理する。種々のベンチマークデータセットに対する広範な実験を通じて,提案手法は新たな損失関数の定義とともに,他の競合融合アルゴリズムと比較して優れた性能を示す。

Due to the lack of a definitive ground truth for the image fusion problem, the loss functions are structured based on evaluation metrics, such as the structural similarity index measure (SSIM). However, in doing so, a bias is introduced toward the SSIM and, consequently, the input visual band image. The objective of this study is to propose a novel methodology for the image fusion problem that mitigates the limitations associated with using classical evaluation metrics as loss functions. Our approach integrates a transformer-based multi-scale fusion strategy that adeptly addresses local and global context information. This integration not only refines the individual components of the image fusion process but also significantly enhances the overall efficacy of the method. Our proposed method follows a two-stage training approach, where an auto-encoder is initially trained to extract deep features at multiple scales in the first stage. For the second stage, we integrate our fusion block and change the loss function as mentioned. The multi-scale features are fused using a combination of Convolutional Neural Networks (CNNs) and Transformers. The CNNs are utilized to capture local features, while the Transformer handles the integration of general context features. Through extensive experiments on various benchmark datasets, our proposed method, along with the novel loss function definition, demonstrates superior performance compared to other competitive fusion algorithms.

翻訳日:2024-04-26 21:08:18 公開日:2024-04-24

# より高速かつ軽量なLLM:現状の課題と今後の展望

Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward ( http://arxiv.org/abs/2402.01799v2 )

ライセンス: Link先を確認

Arnav Chavan, Raghav Magazine, Shubham Kushwaha, Mérouane Debbah, Deepak Gupta,

(参考訳) LLMの優れた性能にもかかわらず、その普及は推論中にかなりの計算とメモリの要求のために困難に直面している。モデル圧縮およびシステムレベルの最適化手法の最近の進歩は、LLM推論を強化することを目的としている。この調査はこれらの手法の概要を提供し、最近の発展を強調している。 LLaMA(/2)-7Bの実験を通じて, 各種圧縮技術の評価を行い, 統一された環境下でのLLMの効率的な展開に関する実用的な知見を提供する。 LLaMA(/2)-7Bの実証分析は,これらの手法の有効性を強調した。調査結果から,現在の限界を特定し,LLM推論効率を改善するための今後の方向性について議論する。我々は、この論文で提示された結果を再現するコードベースをhttps://github.com/nyunAI/Faster-LLM-Surveyでリリースします。

Despite the impressive performance of LLMs, their widespread adoption faces challenges due to substantial computational and memory requirements during inference. Recent advancements in model compression and system-level optimization methods aim to enhance LLM inference. This survey offers an overview of these methods, emphasizing recent developments. Through experiments on LLaMA(/2)-7B, we evaluate various compression techniques, providing practical insights for efficient LLM deployment in a unified setting. The empirical analysis on LLaMA(/2)-7B highlights the effectiveness of these methods. Drawing from survey insights, we identify current limitations and discuss potential future directions to improve LLM inference efficiency. We release the codebase to reproduce the results presented in this paper at https://github.com/nyunAI/Faster-LLM-Survey

翻訳日:2024-04-26 21:08:18 公開日:2024-04-24

# Mambaは文脈内学習が可能なのか?

Is Mamba Capable of In-Context Learning? ( http://arxiv.org/abs/2402.03170v2 )

ライセンス: Link先を確認

Riccardo Grazzi, Julien Siems, Simon Schrodi, Thomas Brox, Frank Hutter,

(参考訳) GPT-4のような最先端の基盤モデルは、インコンテキスト学習(ICL)において驚くほどうまく機能する。ICLはメタ学習の一種であり、モデルへの入力として与えられるコンテキスト情報を活用して、ニューラルネットワークのフォワードパス中にタスクを解く、学習された能力に関するものである。この有用な能力は、基盤モデルの大規模な事前訓練の副産物として現れる。現在、トランスフォーマーモデルがICLの最先端技術であるが、この研究は、入力シーケンス長に関してトランスフォーマーよりも優れたスケーリングを示す、新たに提案された状態空間モデルであるMambaが、同様のICL能力を持つという実証的な証拠を提供する。我々は,単純な関数近似を含むタスクと、より複雑な自然言語処理問題の両方でMambaを評価した。その結果、両カテゴリのタスクにわたって、MambaはICLにおけるトランスフォーマーモデルの性能とほぼ一致することが示された。さらなる分析により、Mambaはトランスフォーマーと同様に、内部表現を漸進的に最適化することでICL問題を解いているように見えることが明らかになった。全体として,長い入力シーケンスを含むICLタスクにおいて、Mambaがトランスフォーマーの効率的な代替となりうることが示唆される。これはメタ学習におけるエキサイティングな発見であり、コンテキスト内で学習したAutoMLアルゴリズム(TabPFNやOptformerなど)の長い入力シーケンスへの一般化を可能にする可能性がある。

State-of-the-art foundation models such as GPT-4 perform surprisingly well at in-context learning (ICL), a variant of meta-learning concerning the learned ability to solve tasks during a neural network forward pass, exploiting contextual information provided as input to the model. This useful ability emerges as a side product of the foundation model's massive pretraining. While transformer models are currently the state of the art in ICL, this work provides empirical evidence that Mamba, a newly proposed state space model which scales better than transformers w.r.t. the input sequence length, has similar ICL capabilities. We evaluated Mamba on tasks involving simple function approximation as well as more complex natural language processing problems. Our results demonstrate that, across both categories of tasks, Mamba closely matches the performance of transformer models for ICL. Further analysis reveals that, like transformers, Mamba appears to solve ICL problems by incrementally optimizing its internal representations. Overall, our work suggests that Mamba can be an efficient alternative to transformers for ICL tasks involving long input sequences. This is an exciting finding in meta-learning and may enable generalizations of in-context learned AutoML algorithms (like TabPFN or Optformer) to long input sequences.

Translation date: 2024-04-26 20:58:26, publication date: 2024-04-24

# YOLOv8-AM: YOLOv8 with Attention Mechanisms for Pediatric Wrist Fracture Detection ( http://arxiv.org/abs/2402.09329v4 )

License: see the linked page

Chun-Tse Chien, Rui-Yang Ju, Kuang-Yi Chou, Enkaer Xieerke, Jen-Shiun Chiang,

Wrist trauma and even fractures occur frequently in daily life, particularly among children, who account for a significant proportion of fracture cases. Before performing surgery, surgeons often ask patients to undergo X-ray imaging first and prepare for the operation based on the radiologist's analysis. With the development of neural networks, You Only Look Once (YOLO) series models have been widely used in fracture detection for computer-assisted diagnosis (CAD). In 2023, Ultralytics presented the latest version of the YOLO models, which has been employed for detecting fractures across various parts of the body. The attention mechanism is one of the most popular methods for improving model performance. This work proposes YOLOv8-AM, which incorporates the attention mechanism into the original YOLOv8 architecture. Specifically, we employ four attention modules, Convolutional Block Attention Module (CBAM), Global Attention Mechanism (GAM), Efficient Channel Attention (ECA), and Shuffle Attention (SA), to design the improved models and train them on the GRAZPEDWRI-DX dataset. Experimental results demonstrate that the mean Average Precision at an IoU threshold of 50% (mAP 50) of the YOLOv8-AM model based on ResBlock + CBAM (ResCBAM) increased from 63.6% to 65.8%, which achieves state-of-the-art (SOTA) performance. Conversely, the YOLOv8-AM model incorporating GAM obtains an mAP 50 value of 64.2%, which is not a satisfactory enhancement. Therefore, we combine ResBlock and GAM, introducing ResGAM to design another new YOLOv8-AM model, whose mAP 50 value is increased to 65.0%. The implementation code for this study is available on GitHub at https://github.com/RuiyangJu/Fracture_Detection_Improved_YOLOv8.
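The mAP 50 figures quoted in this abstract rest on an intersection-over-union (IoU) test at a threshold of 0.5. A minimal sketch of that test follows, assuming axis-aligned boxes given as `(x_min, y_min, x_max, y_max)`. The function names are chosen here for illustration and are not taken from the authors' repository, and a full mAP computation additionally needs class matching, confidence ranking, and one-to-one assignment of predictions to ground-truth boxes.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def matches_at_iou50(pred_box, gt_box, threshold=0.5):
    """True if a predicted box overlaps a ground-truth box enough to count."""
    return iou(pred_box, gt_box) >= threshold
```

Two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, giving an IoU of 1/7, so that prediction would not count as a match at the 0.5 threshold.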

Translation date: 2024-04-26 20:58:26, publication date: 2024-04-24

# MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music ( http://arxiv.org/abs/2402.09871v3 )

License: see the linked page

Zihao Wang, Shuyu Li, Tao Zhang, Qi Wang, Pengfei Yu, Jinyang Luo, Yan Liu, Ming Xi, Kejun Zhang,

The rapidly evolving multimodal Large Language Models (LLMs) urgently require new benchmarks to uniformly evaluate their performance on understanding and textually describing music. However, due to semantic gaps between Music Information Retrieval (MIR) algorithms and human understanding, discrepancies between professionals and the public, and low annotation precision, existing music description datasets cannot serve as benchmarks. To this end, we present MuChin, the first open-source music description benchmark in colloquial Chinese, designed to evaluate the performance of multimodal LLMs in understanding and describing music. We established the Caichong Music Annotation Platform (CaiMAP), which employs an innovative multi-person, multi-stage assurance method, and recruited both amateurs and professionals to ensure the precision of annotations and their alignment with popular semantics. Using this method, we built a dataset with multi-dimensional, high-precision music annotations, the Caichong Music Dataset (CaiMD), and carefully selected 1,000 high-quality entries to serve as the test set for MuChin. Based on MuChin, we analyzed the discrepancies between professionals and amateurs in terms of music description, and empirically demonstrated the effectiveness of annotated data for fine-tuning LLMs. Finally, we employed MuChin to evaluate existing music understanding models on their ability to provide colloquial descriptions of music. All data related to the benchmark, along with the scoring code and detailed appendices, have been open-sourced (https://github.com/CarlWangChina/MuChin/).

Translation date: 2024-04-26 20:58:26, publication date: 2024-04-24

# NToP: NeRF-Powered Large-scale Dataset Generation for 2D and 3D Human Pose Estimation in Top-View Fisheye Images ( http://arxiv.org/abs/2402.18196v2 )

License: see the linked page

Jingrui Yu, Dipankar Nandi, Roman Seidel, Gangolf Hirtz,

Human pose estimation (HPE) in the top view using fisheye cameras presents a promising and innovative application domain. However, datasets capturing this viewpoint are extremely scarce, especially those with high-quality 2D and 3D keypoint annotations. Addressing this gap, we leverage the Neural Radiance Fields (NeRF) technique to establish a comprehensive pipeline for generating human pose datasets from existing 2D and 3D datasets, specifically tailored for the top-view fisheye perspective. Through this pipeline, we create a novel dataset, NToP570K (NeRF-powered Top-view human Pose dataset for fisheye cameras with over 570 thousand images), and conduct an extensive evaluation of its efficacy in enhancing neural networks for 2D and 3D top-view human pose estimation. A pretrained ViTPose-B model achieves an improvement in AP of 33.3% on our validation set for 2D HPE after finetuning on our training set. A similarly finetuned HybrIK-Transformer model achieves a 53.7 mm reduction in PA-MPJPE for 3D HPE on the validation set.

Translation date: 2024-04-26 20:58:26, publication date: 2024-04-24

# NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function ( http://arxiv.org/abs/2403.02411v2 )

License: see the linked page

Abdullah Nazhat Abdullah, Tarkan Aydin,

The Attention mechanism is the main component of the Transformer architecture, and since its introduction, it has led to significant advancements in Deep Learning that span many domains and multiple tasks. The Attention mechanism was adopted in Computer Vision as the Vision Transformer (ViT), and its usage has expanded into many tasks in the vision domain, such as classification, segmentation, object detection, and image generation. While this mechanism is very expressive and capable, it comes with the drawback of being computationally expensive and requiring datasets of considerable size for effective optimization. To address these shortcomings, many designs have been proposed in the literature to reduce the computational burden and alleviate the data size requirements. Examples of such attempts in the vision domain are the MLP-Mixer, the Conv-Mixer, the Perceiver-IO, and many more. This paper introduces a new computational block as an alternative to the standard ViT block. It reduces the computational burden by replacing the normal Attention layers with a Network in Network structure that enhances the static approach of the MLP-Mixer with a dynamic system that learns an element-wise gating function through a token mixing process. Extensive experimentation shows that the proposed design provides better performance than the baseline architectures on multiple datasets applied in the image classification task of the vision domain.
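The idea of an element-wise gate produced by token mixing can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not specify the layer layout, so the single token-mixing matrix `w`, the sigmoid gate, and the function names below are hypothetical choices made here, not the NiNformer block itself.

```python
import math


def sigmoid(v):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def token_mix(x, w):
    """Mix information across tokens: returns w @ x.

    x is a tokens-by-channels list of lists; w is tokens-by-tokens.
    """
    n_tokens = len(x)
    n_channels = len(x[0])
    return [
        [sum(w[i][k] * x[k][j] for k in range(n_tokens)) for j in range(n_channels)]
        for i in range(n_tokens)
    ]


def gated_block(x, w):
    """Scale each input element by a gate computed from the token-mixed input."""
    mixed = token_mix(x, w)
    return [
        [x[i][j] * sigmoid(mixed[i][j]) for j in range(len(x[0]))]
        for i in range(len(x))
    ]
```

Unlike a fixed mixing matrix applied directly to the features, the gate here depends on the input itself, which is the sense in which the abstract contrasts a dynamic system with the static MLP-Mixer approach.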

Translation date: 2024-04-26 20:48:34, publication date: 2024-04-24

# The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning ( http://arxiv.org/abs/2403.03218v4 )

License: see the linked page

Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, Samuel Marks, Oam Patel, Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Liu, Adam A. Hunt, Justin Tienken-Harder, Kevin Y. Shih, Kemper Talley, John Guan, Russell Kaplan, Ian Steneker, David Campbell, Brad Jokubaitis, Alex Levinson, Jean Wang, William Qian, Kallol Krishna Karmakar, Steven Basart, Stephen Fitz, Mindy Levine, Ponnurangam Kumaraguru, Uday Tupakula, Vijay Varadharajan, Yan Shoshitaishvili, Jimmy Ba, Kevin M. Esvelt, Alexandr Wang, Dan Hendrycks,

The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing further research into mitigating risk. Furthermore, they focus on only a few, highly specific pathways for malicious use. To fill these gaps, we publicly release the Weapons of Mass Destruction Proxy (WMDP) benchmark, a dataset of 4,157 multiple-choice questions that serve as a proxy measurement of hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP was developed by a consortium of academics and technical consultants, and was stringently filtered to eliminate sensitive information prior to public release. WMDP serves two roles: first, as an evaluation for hazardous knowledge in LLMs, and second, as a benchmark for unlearning methods to remove such hazardous knowledge. To guide progress on unlearning, we develop CUT, a state-of-the-art unlearning method based on controlling model representations. CUT reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use from LLMs. We release our benchmark and code publicly at https://wmdp.ai.

Translation date: 2024-04-26 20:48:34, publication date: 2024-04-24

# A de Finetti theorem for quantum causal structures ( http://arxiv.org/abs/2403.10316v2 )

License: see the linked page

Fabio Costa, Jonathan Barrett, Sally Shrapnel,

What does it mean for a causal structure to be 'unknown'? Can we even talk about 'repetitions' of an experiment without prior knowledge of causal relations? And under what conditions can we say that a set of processes with arbitrary, possibly indefinite, causal structure are independent and identically distributed? Similar questions for classical probabilities, quantum states, and quantum channels are beautifully answered by so-called "de Finetti theorems", which connect a simple and easy-to-justify condition (symmetry under exchange) with a very particular multipartite structure: a mixture of identical states/channels. Here we extend the result to processes with arbitrary causal structure, including indefinite causal order and multi-time, non-Markovian processes applicable to noisy quantum devices. The result also implies a new class of de Finetti theorems for quantum states subject to a large class of linear constraints, which may be of independent interest.
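For orientation, the "mixture of identical states" structure mentioned in this abstract can be written out in its standard textbook form for the classical and quantum-state cases. These are the well-known statements for infinitely extendable exchangeable sequences, not the paper's generalization to processes; the symbols $\mu$, $p$, and $\sigma$ are notation chosen here.

```latex
% Classical case: every finite marginal of an infinite exchangeable sequence
% is a mixture of i.i.d. distributions p, weighted by a probability measure \mu.
P(x_1, \dots, x_n) = \int d\mu(p) \, \prod_{i=1}^{n} p(x_i)

% Quantum-state case: every n-party marginal of an infinitely exchangeable
% state is a mixture of identical product states \sigma^{\otimes n}.
\rho^{(n)} = \int d\mu(\sigma) \, \sigma^{\otimes n}
```

In both lines the left-hand side is only assumed to be symmetric under exchange of its arguments or subsystems, and the right-hand side is the i.i.d.-mixture form that licenses talking about repeated trials of an unknown distribution or state.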

Translation date: 2024-04-26 20:48:34, publication date: 2024-04-24

# SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series ( http://arxiv.org/abs/2403.15360v2 )

License: see the linked page

Badri N. Patro, Vijay S. Agneeswaran,

Transformers have widely adopted attention networks for sequence mixing and MLPs for channel mixing, playing a pivotal role in achieving breakthroughs across domains. However, recent literature highlights issues with attention networks, including low inductive bias and quadratic complexity with respect to input sequence length. State Space Models (SSMs) such as S4 and others (Hippo, Global Convolutions, liquid S4, LRU, Mega, and Mamba) have emerged to address these issues and help handle longer sequence lengths. Mamba, while being the state-of-the-art SSM, has a stability issue when scaled to large networks for computer vision datasets. We propose SiMBA, a new architecture that introduces Einstein FFT (EinFFT) for channel modeling by specific eigenvalue computations and uses the Mamba block for sequence modeling. Extensive performance studies across image and time-series benchmarks demonstrate that SiMBA outperforms existing SSMs, bridging the performance gap with state-of-the-art transformers. Notably, SiMBA establishes itself as the new state-of-the-art SSM on ImageNet, on transfer learning benchmarks such as Stanford Car and Flower, on task learning benchmarks, and on seven time-series benchmark datasets. The project page is available at https://github.com/badripatro/Simba.

Translation date: 2024-04-26 20:48:34, publication date: 2024-04-24

# Sequential Inference of Hospitalization Electronic Health Records Using Probabilistic Models ( http://arxiv.org/abs/2403.19011v2 )

License: see the linked page

Alan D. Kaplan, Priyadip Ray, John D. Greene, Vincent X. Liu,

In the dynamic hospital setting, decision support can be a valuable tool for improving patient outcomes. Data-driven inference of future outcomes is challenging in this setting, where long sequences such as laboratory tests and medications are updated frequently. This is due in part to the heterogeneity of data types and the mixed sequence types contained in variable-length sequences. In this work we design a probabilistic unsupervised model for multiple arbitrary-length sequences contained in hospitalization Electronic Health Record (EHR) data. The model uses a latent variable structure and captures complex relationships between medications, diagnoses, laboratory tests, and neurological assessments. It can be trained on original data, without requiring any lossy transformations or time binning. Inference algorithms are derived that use partial data to infer properties of the complete sequences, including their length and the presence of specific values. We train this model on data from subjects receiving medical care in the Kaiser Permanente Northern California integrated healthcare delivery system. The results are evaluated against held-out data for predicting the length of sequences and the presence of Intensive Care Unit (ICU) stays in hospitalization bed sequences. Our method outperforms a baseline approach, showing that in these experiments the trained model captures information in the sequences that is informative of their future values.

Translation date: 2024-04-26 20:48:34, publication date: 2024-04-24

# ILPO-NET: Network for the invariant recognition of arbitrary volumetric patterns in 3D ( http://arxiv.org/abs/2403.19612v3 )

License: see the linked page

Dmitrii Zhemchuzhnikov, Sergei Grudinin,

Effective recognition of spatial patterns and learning their hierarchy is crucial in modern spatial data analysis. Volumetric data applications seek techniques ensuring invariance not only to shifts but also to pattern rotations. While traditional methods can readily achieve translational invariance, rotational invariance poses multiple challenges and remains an active area of research. Here, we present ILPO-Net (Invariant to Local Patterns Orientation Network), a novel approach that handles arbitrarily shaped patterns with a convolutional operation that is inherently invariant to local spatial pattern orientations, using Wigner matrix expansions. Our architecture seamlessly integrates the new convolution operator and, when benchmarked on diverse volumetric datasets such as MedMNIST and CATH, demonstrates superior performance over the baselines with significantly reduced parameter counts, up to 1000 times fewer in the case of MedMNIST. Beyond these demonstrations, ILPO-Net's rotational invariance paves the way for other applications across multiple disciplines. Our code is publicly available at https://gricad-gitlab.univ-grenoble-alpes.fr/GruLab/ILPO/-/tree/main/ILPONet.

Translation date: 2024-04-26 20:38:42, publication date: 2024-04-24

# UltraLight VM-UNet: Parallel Vision Mamba Significantly Reduces Parameters for Skin Lesion Segmentation ( http://arxiv.org/abs/2403.20035v3 )

License: see the linked page

Renkai Wu, Yinghao Liu, Pengchen Liang, Qing Chang,

Traditionally, most approaches improve the segmentation performance of models by adding more complex modules. This is not suitable for the medical field, especially for mobile medical devices, where computationally heavy models are impractical in real clinical environments due to computational resource constraints. Recently, state-space models (SSMs), represented by Mamba, have become a strong competitor to traditional CNNs and Transformers. In this paper, we deeply explore the key elements of parameter influence in Mamba and propose an UltraLight Vision Mamba UNet (UltraLight VM-UNet) based on this. Specifically, we propose a method for processing features in parallel Vision Mamba, named PVM Layer, which achieves excellent performance with the lowest computational load while keeping the overall number of processing channels constant. We conducted comparisons and ablation experiments with several state-of-the-art lightweight models on three public skin lesion datasets and demonstrated that the UltraLight VM-UNet remains strongly competitive with only 0.049M parameters and 0.060 GFLOPs. In addition, this study deeply explores the key elements of parameter influence in Mamba, which will lay a theoretical foundation for Mamba to possibly become a new mainstream module for lightweighting in the future. The code is available from https://github.com/wurenkai/UltraLight-VM-UNet.
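A short arithmetic sketch shows why processing channel groups in parallel can cut parameters while the total channel count stays constant. It assumes a layer whose parameter count grows with the square of its channel width (for example a bias-free dense projection from C to C channels). The real parameter count of a Mamba block has other terms as well, and the group count used below is an illustrative assumption rather than a value taken from this abstract.

```python
def dense_params(channels):
    """Parameters of a bias-free dense projection from C to C channels."""
    return channels * channels


def parallel_params(channels, groups):
    """Parameters when C channels are split into equal groups, each with its own layer."""
    if channels % groups != 0:
        raise ValueError("channels must be divisible by groups")
    per_group = channels // groups
    # Each group owns a (C/g)-by-(C/g) projection; outputs are concatenated,
    # so the total channel count is still C.
    return groups * dense_params(per_group)
```

Under this quadratic assumption, splitting C channels into g groups divides the parameter count by g: 64 channels cost 4096 parameters in one layer but 1024 in four parallel layers of 16 channels each.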

Translation date: 2024-04-26 20:38:42, publication date: 2024-04-24

# Unsupervised Tumor-Aware Distillation for Multi-Modal Brain Image Translation ( http://arxiv.org/abs/2403.20168v2 )

License: see the linked page

Chuan Huang, Jia Wei, Rui Li,

Multi-modal brain images from MRI scans are widely used in clinical diagnosis to provide complementary information from different modalities. However, obtaining fully paired multi-modal images in practice is challenging due to various factors, such as time, cost, and artifacts, resulting in brain images with missing modalities. To address this problem, unsupervised multi-modal brain image translation has been extensively studied. Existing methods suffer from brain tumor deformation during translation, as they fail to focus on the tumor areas when translating whole images. In this paper, we propose an unsupervised tumor-aware distillation teacher-student network called UTAD-Net, which is capable of perceiving and translating tumor areas precisely. Specifically, our model consists of two parts: a teacher network and a student network. The teacher network first learns an end-to-end mapping from the source to the target modality using unpaired images and corresponding tumor masks. Then, the translation knowledge is distilled into the student network, enabling it to generate more realistic tumor areas and whole images without masks. Experiments show that our model achieves competitive performance on both quantitative and qualitative evaluations of image quality compared with state-of-the-art methods. Furthermore, we demonstrate the effectiveness of the generated images on downstream segmentation tasks. Our code is available at https://github.com/scut-HC/UTAD-Net.

Translation date: 2024-04-26 20:38:42, publication date: 2024-04-24

# From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models ( http://arxiv.org/abs/2404.00906v3 )

License: see the linked page

Rongjie Li, Songyang Zhang, Dahua Lin, Kai Chen, Xuming He,

Scene graph generation (SGG) aims to parse a visual scene into an intermediate graph representation for downstream reasoning tasks. Despite recent advancements, existing methods struggle to generate scene graphs with novel visual relation concepts. To address this challenge, we introduce a new open-vocabulary SGG framework based on sequence generation. Our framework leverages vision-language pre-trained models (VLMs) by incorporating an image-to-graph generation paradigm. Specifically, we generate scene graph sequences via image-to-text generation with a VLM and then construct scene graphs from these sequences. By doing so, we harness the strong capabilities of VLMs for open-vocabulary SGG and seamlessly integrate explicit relational modeling to enhance vision-language (VL) tasks. Experimental results demonstrate that our design not only achieves superior performance with an open vocabulary but also enhances downstream vision-language task performance through explicit relation modeling knowledge.
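The second step described in this abstract, building a scene graph from a generated text sequence, can be sketched as a small parser. The serialization used here (triplets separated by `;`, fields separated by `|`) is a hypothetical format invented for this example; the abstract does not specify the sequence format the authors use, and a real system would also ground each entity to an image region.

```python
def parse_scene_graph(sequence):
    """Turn 'subject | predicate | object ; ...' text into nodes and edges.

    The delimiter format is an assumption made for this sketch.
    Entities are merged by name, so two distinct objects with the same
    label would collapse into one node.
    """
    nodes = []
    edges = []
    for chunk in sequence.split(";"):
        parts = [part.strip() for part in chunk.split("|")]
        if len(parts) != 3 or not all(parts):
            continue  # skip malformed or incomplete triplets
        subject, predicate, obj = parts
        for entity in (subject, obj):
            if entity not in nodes:
                nodes.append(entity)
        edges.append((subject, predicate, obj))
    return nodes, edges
```

Because the predicate is kept as free text rather than mapped to a fixed label set, a parser of this kind places no restriction on the relation vocabulary, which is the property an open-vocabulary setting needs.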

Translation date: 2024-04-26 20:38:42, publication date: 2024-04-24

# Poro 34B and the Blessing of Multilinguality ( http://arxiv.org/abs/2404.01856v2 )

License: see the linked page

Risto Luukkonen, Jonathan Burdge, Elaine Zosa, Aarne Talman, Ville Komulainen, Väinö Hatanpää, Peter Sarlin, Sampo Pyysalo,

The pretraining of state-of-the-art large language models now requires trillions of words of text, which is orders of magnitude more than is available for the vast majority of languages. While including text in more than one language is an obvious way to acquire more pretraining data, multilinguality is often seen as a curse, and most model training efforts continue to focus near-exclusively on individual large languages. We believe that multilinguality can be a blessing and that it should be possible to substantially improve over the capabilities of monolingual models for small languages through multilingual training. In this study, we introduce Poro 34B, a 34-billion-parameter model trained for 1 trillion tokens of Finnish, English, and programming languages, and demonstrate that a multilingual training approach can produce a model that not only substantially advances over the capabilities of existing models for Finnish, but also excels in translation and is competitive in its class in generating English and programming languages. We release the model parameters, scripts, and data under open licenses at https://huggingface.co/LumiOpen/Poro-34B.

Translation date: 2024-04-26 20:38:42, publication date: 2024-04-24

# DeepFunction: Deep Metric Learning-based Imbalanced Classification for Diagnosing Threaded Pipe Connection Defects using Functional Data ( http://arxiv.org/abs/2404.03329v2 )

License: see the linked page

Yukun Xie, Juan Du, Chen Zhang,

In modern manufacturing, most products are conforming. The few nonconforming products have different defect types. Identifying the defect type can support root cause diagnosis of production lines. With advances in sensing, signals of process variables can be collected at high resolution and regarded as multichannel functional data. These signals carry abundant information that characterizes the process and helps identify the defect types. Motivated by a real example from the pipe tightening process, we focus on defect classification where each sample is multichannel functional data. However, the available samples for each defect type are limited and imbalanced. Moreover, the functions are incomplete, since the pre-tightening process before the pipe tightening process is unobserved. Classifying defect samples based on imbalanced, multichannel, and incomplete functional data is important but challenging. Thus, we propose an innovative classification framework based on deep metric learning using functional data (DeepFunction). The framework leverages the power of deep metric learning to train on imbalanced datasets. A neural network specially crafted for processing functional data is also proposed to handle multichannel and incomplete functional data. The results from a real-world case study demonstrate the superior accuracy of our framework when compared to existing benchmarks.
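Deep metric learning trains an embedding in which samples of the same class sit close together and samples of different classes sit far apart. The abstract does not say which metric loss DeepFunction uses, so the sketch below shows one common choice, the triplet loss, purely as an assumption. The embeddings are plain lists of floats standing in for the output of a network.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that is zero once the negative is at least `margin`
    farther from the anchor than the positive is."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative) + margin
    return max(0.0, gap)
```

Losses of this kind are computed on sampled triplets rather than on class frequencies, and triplets can be drawn so that rare defect types appear as anchors as often as common ones, which is one reason metric learning is often considered for imbalanced data.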

Translation date: 2024-04-26 20:38:42, publication date: 2024-04-24

# If It's Not Enough, Make It So: Reducing Authentic Data Demand in Face Recognition through Synthetic Faces ( http://arxiv.org/abs/2404.03537v3 )

License: see the linked page

Andrea Atzori, Fadi Boutros, Naser Damer, Gianni Fenu, Mirko Marras,

(参考訳) 近年の深層顔認識の進歩は、大規模で多様で手動で注釈付けされた顔データセットの需要を増大させてきた。顔認識のための真正で高品質なデータを取得することは、主にプライバシー上の懸念から、困難であることが証明されている。大規模な顔データセットは、主にWebベースの画像から作成されており、明示的なユーザの同意を欠いている。本稿では,合成顔データを用いて、実画像への依存を減らしつつ効果的な顔認識モデルを訓練できるか、またその方法を検討し,データ収集に関する懸念の緩和を目指す。まず,合成データのみ、あるいは(少量の)認証データのみで訓練した最新の顔認識モデル間の性能ギャップを調査した。次に,最先端のバックボーンを様々な合成データと認証データの組み合わせで訓練することにより分析を深め,検証精度のために後者の限られた利用を最適化するための洞察を得た。最後に、同じ目的を念頭において、データ拡張手法が合成データおよび認証データに与える効果を評価した。以上の結果から,統合データセットで訓練した顔認識(FR)の有効性、特に適切な拡張手法と組み合わせた場合の有効性が明らかとなった。

Recent advances in deep face recognition have spurred a growing demand for large, diverse, and manually annotated face datasets. Acquiring authentic, high-quality data for face recognition has proven to be a challenge, primarily due to privacy concerns. Large face datasets are primarily sourced from web-based images, lacking explicit user consent. In this paper, we examine whether and how synthetic face data can be used to train effective face recognition models with reduced reliance on authentic images, thereby mitigating data collection concerns. First, we explored the performance gap among recent state-of-the-art face recognition models, trained with synthetic data only and authentic (scarce) data only. Then, we deepened our analysis by training a state-of-the-art backbone with various combinations of synthetic and authentic data, gaining insights into optimizing the limited use of the latter for verification accuracy. Finally, we assessed the effectiveness of data augmentation approaches on synthetic and authentic data, with the same goal in mind. Our results highlighted the effectiveness of FR trained on combined datasets, particularly when combined with appropriate augmentation techniques.

翻訳日:2024-04-26 20:38:42 公開日:2024-04-24

# SAAS:大規模言語モデルにおける数学的推論強化のための問題解決能力向上戦略

SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models ( http://arxiv.org/abs/2404.03887v3 )

ライセンス: Link先を確認

Hyeonwoo Kim, Gyoungjin Gim, Yungi Kim, Jihoo Kim, Byungju Kim, Wonseok Lee, Chanjun Park,

(参考訳) 本研究では,Large Language Models (LLM) の数学的推論と問題解決能力の向上を目的とした,新しい学習手法を提案する。我々は,CoT(Chain-of-Thought)学習とPoT(Program-of-Thought)学習の統合に焦点を当て,数学的推論能力の学習を優先することが問題解決能力の増幅に役立つと仮定する。したがって、CoTによる初期学習は、難しい数学的問題の解決に不可欠である。そこで本研究では,CoT学習からPoT学習へ戦略的に移行する,SAAS(Solving Ability Amplification Strategy)という逐次学習手法を提案する。いくつかのベンチマークによる広範な性能比較を含む実証研究により,SAASがSOTA(State-of-the-art)の性能を達成することを示す。この結果は逐次学習手法の有効性を裏付けるものであり, LLMにおける数学的推論の分野における大きな前進を示している。

This study presents a novel learning approach designed to enhance both mathematical reasoning and problem-solving abilities of Large Language Models (LLMs). We focus on integrating the Chain-of-Thought (CoT) and the Program-of-Thought (PoT) learning, hypothesizing that prioritizing the learning of mathematical reasoning ability is helpful for the amplification of problem-solving ability. Thus, the initial learning with CoT is essential for solving challenging mathematical problems. To this end, we propose a sequential learning approach, named SAAS (Solving Ability Amplification Strategy), which strategically transitions from CoT learning to PoT learning. Our empirical study, involving an extensive performance comparison using several benchmarks, demonstrates that our SAAS achieves state-of-the-art (SOTA) performance. The results underscore the effectiveness of our sequential learning approach, marking a significant advancement in the field of mathematical reasoning in LLMs.

翻訳日:2024-04-26 20:38:42 公開日:2024-04-24

# コード表現の強化によるグラフニューラルネットによる障害位置推定の改善に向けて

Towards Better Graph Neural Network-based Fault Localization Through Enhanced Code Representation ( http://arxiv.org/abs/2404.04496v3 )

ライセンス: Link先を確認

Md Nakhla Rafi, Dong Jae Kim, An Ran Chen, Tse-Hsun Chen, Shaowei Wang,

(参考訳) 自動ソフトウェアフォールトローカライゼーションは、デバッグを容易にするために故障箇所を特定することで、ソフトウェア品質保証において重要な役割を果たす。広く使われている手法であるカバレッジベースのフォールトローカライゼーションでは、カバレッジスペクトルの統計を用いて、疑わしさスコアに基づきコードをランク付けする。しかし、統計的アプローチの硬直性のため、学習に基づく技術が求められている。中でもグラフニューラルネットワーク(GNN)に基づく手法であるGraceは,カバレッジスペクトル、すなわちテストとソースのカバレッジ関係を、抽象構文木で強化した精密なグラフ表現として保持できるため、最先端の性能を達成しており、特徴表現を圧縮してしまう他の学習手法の制約を緩和している。しかし、そのような表現は、ソフトウェアおよび関連するカバレッジスペクトルとASTグラフの複雑さの増大により、スケーラビリティに苦慮している。本研究では,コードのグラフ表現に手続き間コールグラフを統合することで、ノードとエッジにおけるグラフ表現の複雑さを70%削減する新しいグラフ表現DepGraphを提案する。さらに,コード変更情報などの付加的特徴を属性としてグラフに統合し,モデルが豊富なプロジェクト履歴データを活用できるようにする。 Defects4j 2.0.0を用いてDepGraphを評価した結果,GraceよりもTop-1で20%多くの障害を特定し,Mean First Rank(MFR)とMean Average Rank(MAR)を50%以上改善しつつ,GPUメモリ使用量を44%,トレーニング/推論時間を85%削減した。さらに、クロスプロジェクト設定では、DepGraphは最先端のベースラインを上回り、Top-1の精度が42%高く、MFRとMARがそれぞれ68%と65%改善している。我々の研究は、DepGraphの堅牢性を実証し、将来の拡張と採用に向けた最先端の精度とスケーラビリティを達成していることを示す。

Automatic software fault localization plays an important role in software quality assurance by pinpointing faulty locations for easier debugging. Coverage-based fault localization, a widely used technique, employs statistics on coverage spectra to rank code based on suspiciousness scores. However, the rigidity of statistical approaches calls for learning-based techniques. Among these, Grace, a graph neural network (GNN)-based technique, has achieved state-of-the-art performance thanks to its capacity to preserve coverage spectra, i.e., test-to-source coverage relationships, as a precise abstract-syntax-enhanced graph representation, mitigating the limitation of other learning-based techniques that compress the feature representation. However, such a representation struggles with scalability due to the increasing complexity of software and the associated coverage spectra and AST graphs. In this work, we propose a new graph representation, DepGraph, that reduces the complexity of the graph representation by 70% in nodes and edges by integrating the interprocedural call graph into the graph representation of the code. Moreover, we integrate additional features such as code change information in the graph as attributes so the model can leverage rich historical project data. We evaluate DepGraph using Defects4j 2.0.0, and it outperforms Grace by locating 20% more faults in Top-1 and improving the Mean First Rank (MFR) and the Mean Average Rank (MAR) by over 50% while decreasing GPU memory usage by 44% and training/inference time by 85%. Additionally, in cross-project settings, DepGraph surpasses the state-of-the-art baseline with a 42% higher Top-1 accuracy, and 68% and 65% improvement in MFR and MAR, respectively. Our study demonstrates DepGraph's robustness, achieving state-of-the-art accuracy and scalability for future extension and adoption.
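
As a hedged illustration of what "statistics on coverage spectra" can look like, the sketch below ranks statements with the Ochiai formula, one commonly used suspiciousness score. The abstract does not say which formula is used, and this is neither the Grace nor the DepGraph model; the function names and the toy coverage data are assumptions made for the example.

```python
import math


def ochiai(failed_cover, passed_cover, total_failed):
    """Ochiai suspiciousness: ef / sqrt(total_failed * (ef + ep))."""
    denom = math.sqrt(total_failed * (failed_cover + passed_cover))
    return failed_cover / denom if denom else 0.0


def rank_statements(coverage, outcomes):
    """Rank statements from most to least suspicious.

    coverage: {test name: set of covered statement ids}
    outcomes: {test name: True if the test passed, False if it failed}
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    statements = set().union(*coverage.values())
    scores = {}
    for stmt in statements:
        ef = sum(1 for t, cov in coverage.items() if stmt in cov and not outcomes[t])
        ep = sum(1 for t, cov in coverage.items() if stmt in cov and outcomes[t])
        scores[stmt] = ochiai(ef, ep, total_failed)
    # Highest score first; ties broken by statement id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy spectrum (hypothetical): statement 3 is covered mostly by the failing test.
coverage = {"t1": {1, 2}, "t2": {1, 3}, "t3": {1, 2, 3}}
outcomes = {"t1": True, "t2": False, "t3": True}
ranking = rank_statements(coverage, outcomes)
```

A score like this looks only at per-statement counts, which is the rigidity the abstract refers to; graph-based learners instead keep the full test-to-statement relationships.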

翻訳日:2024-04-26 20:38:42 公開日:2024-04-24

# MA-LMM:長期ビデオ理解のためのメモリ拡張大型マルチモーダルモデル

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding ( http://arxiv.org/abs/2404.05726v2 )

ライセンス: Link先を確認

Bo He, Hengduo Li, Young Kyun Jang, Menglin Jia, Xuefei Cao, Ashish Shah, Abhinav Shrivastava, Ser-Nam Lim,

(参考訳) 大規模言語モデル(LLM)の成功に伴い、ビジョンモデルをLLMに統合してビジョン言語基盤モデルを構築することが近年大きな関心を集めている。しかし、既存のLLMベースの大規模マルチモーダルモデル(例えば、Video-LLaMA、VideoChat)は、限られた数のフレームしか取り込めず、短い動画の理解にとどまる。本研究では,長期的映像理解のための効率的かつ効果的なモデルの設計に主眼を置いている。多くの既存研究のようにより多くのフレームを同時に処理しようとするのではなく、オンライン方式で動画を処理し、過去の映像情報をメモリバンクに保存することを提案する。これにより、LLMのコンテキスト長制約やGPUメモリ制限を超過することなく、長期解析のために過去の映像コンテンツを参照することが可能となる。私たちのメモリバンクは、既存のマルチモーダルLLMにそのままの形でシームレスに統合できます。我々は,長時間映像理解,ビデオ質問応答,ビデオキャプションなど,様々な映像理解タスクに関する広範な実験を行い,提案モデルが複数のデータセットにわたって最先端の性能を達成できることを示した。コードはhttps://boheumd.github.io/MA-LMM/で公開されている。

With the success of large language models (LLMs), integrating the vision model into LLMs to build vision-language foundation models has gained much more interest recently. However, existing LLM-based large multimodal models (e.g., Video-LLaMA, VideoChat) can only take in a limited number of frames for short video understanding. In this study, we mainly focus on designing an efficient and effective model for long-term video understanding. Instead of trying to process more frames simultaneously like most existing work, we propose to process videos in an online manner and store past video information in a memory bank. This allows our model to reference historical video content for long-term analysis without exceeding LLMs' context length constraints or GPU memory limits. Our memory bank can be seamlessly integrated into current multimodal LLMs in an off-the-shelf manner. We conduct extensive experiments on various video understanding tasks, such as long-video understanding, video question answering, and video captioning, and our model can achieve state-of-the-art performances across multiple datasets. Code available at https://boheumd.github.io/MA-LMM/.

翻訳日:2024-04-26 20:28:54 公開日:2024-04-24

# 定段非平滑型SAのプリリミット結合と定常収束

Prelimit Coupling and Steady-State Convergence of Constant-stepsize Nonsmooth Contractive SA ( http://arxiv.org/abs/2404.06023v2 )

ライセンス: Link先を確認

Yixuan Zhang, Dongyan Huo, Yudong Chen, Qiaomin Xie,

(参考訳) Q-learningに動機づけられ, 一定ステップサイズの非滑らかな縮小的確率近似 (SA) について検討する。ダイナミクスの2つの重要なクラスに焦点を当てる。 1)加法ノイズを伴う非滑らかな縮小的SA 2) 加法ノイズと乗法ノイズの両方を特徴とする同期および非同期Q-ラーニング。どちらのダイナミクスに対しても、反復列がワッサーシュタイン距離において定常極限分布へ弱収束することを確立する。さらに,定常収束を確立するためのプリリミット結合手法を提案し,ステップサイズがゼロに近づく際の定常分布の極限を特徴づける。この結果を用いて、非滑らかなSAの漸近バイアスがステップサイズの平方根に比例することを導出する。これは滑らかなSAとは著しく対照的である。このバイアスの特徴づけにより、非滑らかなSAのバイアス低減にリチャードソン・ロンバーグ外挿を用いることができる。

Motivated by Q-learning, we study nonsmooth contractive stochastic approximation (SA) with constant stepsize. We focus on two important classes of dynamics: 1) nonsmooth contractive SA with additive noise, and 2) synchronous and asynchronous Q-learning, which features both additive and multiplicative noise. For both dynamics, we establish weak convergence of the iterates to a stationary limit distribution in Wasserstein distance. Furthermore, we propose a prelimit coupling technique for establishing steady-state convergence and characterize the limit of the stationary distribution as the stepsize goes to zero. Using this result, we derive that the asymptotic bias of nonsmooth SA is proportional to the square root of the stepsize, which stands in sharp contrast to smooth SA. This bias characterization allows for the use of Richardson-Romberg extrapolation for bias reduction in nonsmooth SA.
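
To make the last sentence concrete, here is a minimal sketch of Richardson-Romberg extrapolation for a bias whose leading term scales with the square root of the stepsize. It assumes the steady-state estimate behaves like theta* + c·sqrt(alpha) plus smaller terms; the function name, the stepsize ratio `k`, and the synthetic numbers are illustrative assumptions, not the paper's exact scheme.

```python
import math


def rr_extrapolate_sqrt(est_alpha, est_k_alpha, k=2.0):
    """Combine estimates obtained with stepsizes alpha and k*alpha.

    If est(alpha) = theta_star + c * sqrt(alpha) + (smaller terms), then
    (sqrt(k) * est(alpha) - est(k * alpha)) / (sqrt(k) - 1)
    cancels the c * sqrt(alpha) term.
    """
    r = math.sqrt(k)
    return (r * est_alpha - est_k_alpha) / (r - 1.0)


# Synthetic check with an exactly square-root-shaped bias (hypothetical values).
theta_star, c, alpha = 1.0, 0.3, 0.01
est_small = theta_star + c * math.sqrt(alpha)        # stepsize alpha
est_large = theta_star + c * math.sqrt(2.0 * alpha)  # stepsize 2 * alpha
debiased = rr_extrapolate_sqrt(est_small, est_large, k=2.0)
```

For comparison, when the leading bias is linear in the stepsize, the classical combination with stepsizes alpha and 2·alpha is 2·est(alpha) − est(2·alpha); the square-root scaling changes the weights, which is why the bias characterization matters.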

翻訳日:2024-04-26 20:28:54 公開日:2024-04-24

# ノイズNCA:ニューラルセルオートマタの時空間連続性を改善するノイジー種子

NoiseNCA: Noisy Seed Improves Spatio-Temporal Continuity of Neural Cellular Automata ( http://arxiv.org/abs/2404.06279v2 )

ライセンス: Link先を確認

Ehsan Pajouheshgar, Yitao Xu, Sabine Süsstrunk,

(参考訳) ニューラルセルオートマタ(Neural Cellular Automata、NCA)はセルオートマタの一種で、更新ルールがニューラルネットワークによってパラメータ化され、勾配降下を用いて訓練できる。本稿では, 反応拡散系を記述する偏微分方程式 (PDE) に着想を得た更新ルールを持つ、テクスチャ合成用のNCAモデルに着目する。 NCAモデルを訓練するために、時空間領域を離散化し、オイラー積分を用いてPDEを数値シミュレーションする。しかし、訓練されたNCAが、対応するPDEによって記述される連続ダイナミクスを真に学習しているのか、あるいは単に訓練で使用した離散化に過剰適合しているだけなのかは、未解決の問題である。時空の離散化が連続に近づく極限において, NCAモデルを調べる。既存のNCAモデルは、特に「シード」とも呼ばれる初期条件の近傍で、訓練時の離散化に過剰適合する傾向があることがわかった。そこで本研究では,一様ノイズを初期条件として用いる解決策を提案する。本手法が、幅広い時空間的粒度にわたってNCAダイナミクスの一貫性を維持するのに有効であることを実証する。 改良したNCAモデルは、パターン形成の速度と合成パターンのスケールを連続的に制御できるようにすることで、2つの新しいテスト時インタラクションを可能にする。インタラクティブなオンラインデモで、この新しいNCA機能を実演している。我々の研究は、NCAモデルが連続ダイナミクスを学習できることを明らかにし、力学系の観点からNCA研究の新たな道を開く。

Neural Cellular Automata (NCA) is a class of Cellular Automata where the update rule is parameterized by a neural network that can be trained using gradient descent. In this paper, we focus on NCA models used for texture synthesis, where the update rule is inspired by partial differential equations (PDEs) describing reaction-diffusion systems. To train the NCA model, the spatio-temporal domain is discretized, and Euler integration is used to numerically simulate the PDE. However, whether a trained NCA truly learns the continuous dynamics described by the corresponding PDE or merely overfits the discretization used in training remains an open question. We study NCA models at the limit where space-time discretization approaches continuity. We find that existing NCA models tend to overfit the training discretization, especially in the proximity of the initial condition, also called the "seed". To address this, we propose a solution that utilizes uniform noise as the initial condition. We demonstrate the effectiveness of our approach in preserving the consistency of NCA dynamics across a wide range of spatio-temporal granularities. Our improved NCA model enables two new test-time interactions by allowing continuous control over the speed of pattern formation and the scale of the synthesized patterns. We demonstrate this new NCA feature in our interactive online demo. Our work reveals that NCA models can learn continuous dynamics and opens new avenues for NCA research from a dynamical systems perspective.
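
The discretization question can be sketched with a toy example. The code below integrates ds/dt = f(s) with forward Euler from a uniform-noise seed and compares two step sizes over the same total time. The update rule here is a hand-written 1-D diffusion stencil standing in for the learned network, and all names and sizes are assumptions; it only illustrates what consistency across temporal granularities means, not the NoiseNCA model itself.

```python
import random


def laplacian_rule(state):
    """Hypothetical stand-in for the learned update rule: diffusion on a ring."""
    n = len(state)
    return [state[i - 1] - 2.0 * state[i] + state[(i + 1) % n] for i in range(n)]


def euler_rollout(seed, rule, dt, total_time):
    """Integrate ds/dt = rule(s) with forward Euler steps of size dt."""
    state = list(seed)
    for _ in range(round(total_time / dt)):
        delta = rule(state)
        state = [s + dt * d for s, d in zip(state, delta)]
    return state


def noisy_seed(n, rng):
    """Uniform-noise initial condition instead of a fixed constant seed."""
    return [rng.random() for _ in range(n)]


rng = random.Random(0)
seed = noisy_seed(64, rng)
coarse = euler_rollout(seed, laplacian_rule, dt=0.1, total_time=1.0)
fine = euler_rollout(seed, laplacian_rule, dt=0.05, total_time=1.0)
# A rule that represents continuous dynamics should give similar states
# when the same time span is simulated with a finer step.
gap = max(abs(a - b) for a, b in zip(coarse, fine))
```

For a rule that has overfit one particular `dt`, the analogous gap would be expected to grow when the step size changes, which is the failure mode the abstract describes.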

翻訳日:2024-04-26 20:28:54 公開日:2024-04-24

# 指数重み付き移動モデル

Exponentially Weighted Moving Models ( http://arxiv.org/abs/2404.08136v2 )

ライセンス: Link先を確認

Eric Luxenberg, Stephen Boyd,

(参考訳) ベクトル時系列に対する指数重み付き移動モデル(EWMM)は、過去の観測データに対する指数的に減衰する損失関数に基づいて、時刻ごとに新しいデータモデルを適合させる。広く使われている指数重み付き移動平均(EWMA)は、二乗損失関数を用いて平均を推定する特殊なケースである。二次損失関数の場合、EWMMは二次関数のパラメータを更新する単純な再帰を用いて適合させることができる。他の損失関数の場合、過去の履歴全体を保存しなければならず、適合問題のサイズは時間とともに増大する。本稿では,EWMMの近似を計算する一般的な手法を提案する。この手法は固定数の過去サンプルからなるウィンドウのみを保存し、ウィンドウ以前のデータに関する損失を追加の二次項で近似する。この近似EWMMは凸最適化に依存し、時間とともに増大しない問題を解く。近似から得られた推定値と厳密なEWMM法による推定値を比較する。

An exponentially weighted moving model (EWMM) for a vector time series fits a new data model each time period, based on an exponentially fading loss function on past observed data. The well known and widely used exponentially weighted moving average (EWMA) is a special case that estimates the mean using a square loss function. For quadratic loss functions EWMMs can be fit using a simple recursion that updates the parameters of a quadratic function. For other loss functions, the entire past history must be stored, and the fitting problem grows in size as time increases. We propose a general method for computing an approximation of EWMM, which requires storing only a window of a fixed number of past samples, and uses an additional quadratic term to approximate the loss associated with the data before the window. This approximate EWMM relies on convex optimization, and solves problems that do not grow with time. We compare the estimates produced by our approximation with the estimates from the exact EWMM method.
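
The EWMA special case can be sketched as follows. With a forgetting factor `beta` (a name assumed here), the minimizer of the exponentially weighted square loss is the weighted average of past samples, and it can be updated recursively without storing the history. This is a minimal scalar illustration under those assumptions, not the paper's general EWMM method.

```python
def ewma_recursive(xs, beta):
    """Running EWMA via a two-number recursion (no history stored)."""
    num, den = 0.0, 0.0
    estimates = []
    for x in xs:
        num = beta * num + x      # sum of beta**(t - tau) * x_tau
        den = beta * den + 1.0    # sum of beta**(t - tau)
        estimates.append(num / den)
    return estimates


def ewma_direct(xs, beta):
    """Same estimate at the final time, computed from the full history.

    This is the minimizer over m of sum_tau beta**(t - tau) * (x_tau - m)**2.
    """
    t = len(xs) - 1
    weights = [beta ** (t - tau) for tau in range(len(xs))]
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)


samples = [1.0, 2.0, 4.0]
recursive_last = ewma_recursive(samples, beta=0.5)[-1]  # 3.0
direct_last = ewma_direct(samples, beta=0.5)            # 3.0
```

The recursion works because the square loss is quadratic, so two running sums summarize the whole past; for a non-quadratic loss no such finite summary exists in general, which motivates the windowed approximation described in the abstract.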

翻訳日:2024-04-26 20:28:54 公開日:2024-04-24

# 量子エミッタからの時間結合絡み合った光子の理論

Theory of time-bin entangled photons from quantum emitters ( http://arxiv.org/abs/2404.08348v2 )

ライセンス: Link先を確認

Thomas K. Bracht, Florian Kappe, Moritz Cygorek, Tim Seidelmann, Yusuf Karli, Vikas Remesh, Gregor Weihs, Vollrath Martin Axt, Doris E. Reiter,

(参考訳) もつれ光子対は、量子通信の領域における多くの応用の基礎となる。もつれ光子対の光ファイバー伝送では、時間ビン符号化は偏光符号化量子ビットに比べて安定性が向上する可能性がある。ここでは、時間ビンもつれ光子の測定を記述するための理論的基礎を定める。我々は、量子状態トモグラフィー測定に対応する、時間ビン符号化光子対の多時間相関関数を導出する。我々の理論は、量子エミッタからの時間ビンもつれの現実的なシミュレーションのために、特定の量子システムに当てはまるあらゆる種類の損失やデコヒーレンス効果を含むようにシミュレーションを拡張する出発点となりうる。

Entangled photon pairs form the foundation for many applications in the realm of quantum communication. For fiber-optic transfer of entangled photon pairs, time-bin encoding can potentially offer an improved stability compared to polarization encoded qubits. Here, we lay the theoretical foundations to describe the measurement of time-bin entangled photons. We derive multi-time correlation functions of the time-bin encoded photon pairs, corresponding to quantum state tomographic measurements. Our theory can be the starting point to extend the simulations to include all kinds of loss or decoherence effects that apply in a specific quantum system for realistic simulation for time-bin entanglement from quantum emitters.

翻訳日:2024-04-26 20:28:54 公開日:2024-04-24

# 大規模言語モデルを用いた次世代データインタラクションシステムDB-GPTの実証

Demonstration of DB-GPT: Next Generation Data Interaction System Empowered by Large Language Models ( http://arxiv.org/abs/2404.10209v3 )

ライセンス: Link先を確認

Siqiao Xue, Danrui Qi, Caigao Jiang, Wenhui Shi, Fangyin Cheng, Keting Chen, Hongjun Yang, Zhiping Zhang, Jianshan He, Hongyang Zhang, Ganglin Wei, Wang Zhao, Fan Zhou, Hong Yi, Shaodong Liu, Hongjun Yang, Faqiang Chen,

(参考訳) 大規模言語モデル(LLM)の最近のブレークスルーは、ソフトウェアの多くの領域を変えようとしている。効率的で直感的なデータインタラクションが極めて重要であることから、データと対話する技術は特にLLMと密接に関わっている。本稿では,従来のデータインタラクションタスクにLLMを統合し,ユーザエクスペリエンスとアクセシビリティを向上させる,革新的で製品対応のPythonライブラリDB-GPTを提案する。 DB-GPTは、自然言語で記述されたデータインタラクションタスクを理解し、LLMによるコンテキスト認識応答を提供するように設計されており、初心者から専門家まで、ユーザにとって不可欠なツールである。システム設計は、ローカル、分散、およびクラウド環境へのデプロイをサポートする。 LLMでText-to-SQLのような基本的なデータインタラクションタスクを扱うだけでなく、Multi-AgentsフレームワークやAgentic Workflow Expression Language(AWEL)を通じて生成的データ分析のような複雑なタスクを処理できる。サービス指向マルチモデル管理フレームワーク(SMMF)は、データのプライバシとセキュリティを保証し、ユーザがプライベートLLMでDB-GPTを利用できるようにする。さらに、DB-GPTは、ユーザがDB-GPTを製品環境に簡単に統合できるように設計された一連の製品対応機能を提供している。 DB-GPTのコードはGitHub(https://github.com/eosphoros-ai/DB-GPT)で公開されており、すでに10.7k以上のスターを獲得している。手順(https://github.com/eosphoros-ai/DB-GPT#install)に従ってDB-GPTをインストールし、YouTube(https://youtu.be/n_8RI1ENyl4)で5分間の紹介ビデオを見て、DB-GPTをさらに調べてほしい。

The recent breakthroughs in large language models (LLMs) are positioned to transform many areas of software. Technologies for interacting with data are particularly entangled with LLMs, as efficient and intuitive data interactions are paramount. In this paper, we present DB-GPT, a revolutionary and product-ready Python library that integrates LLMs into traditional data interaction tasks to enhance user experience and accessibility. DB-GPT is designed to understand data interaction tasks described in natural language and provide context-aware responses powered by LLMs, making it an indispensable tool for users ranging from novice to expert. Its system design supports deployment across local, distributed, and cloud environments. Beyond handling basic data interaction tasks like Text-to-SQL with LLMs, it can handle complex tasks like generative data analysis through a Multi-Agents framework and the Agentic Workflow Expression Language (AWEL). The Service-oriented Multi-model Management Framework (SMMF) ensures data privacy and security, enabling users to employ DB-GPT with private LLMs. Additionally, DB-GPT offers a series of product-ready features designed to enable users to integrate DB-GPT within their product environments easily. The code of DB-GPT is available on GitHub (https://github.com/eosphoros-ai/DB-GPT), where it already has over 10.7k stars. Please install DB-GPT for your own usage with the instructions (https://github.com/eosphoros-ai/DB-GPT#install) and watch a 5-minute introduction video on YouTube (https://youtu.be/n_8RI1ENyl4) to further investigate DB-GPT.

翻訳日:2024-04-26 20:28:54 公開日:2024-04-24

# ノイズのある測定の純化とエンタングルメントの忠実な蒸留

Purification of Noisy Measurements and Faithful Distillation of Entanglement ( http://arxiv.org/abs/2404.10538v2 )

ライセンス: Link先を確認

Jaemin Kim, Jiyoung Yun, Joonwoo Bae,

(参考訳) 一般の量子操作を構成する量子測定が特にノイズを伴うような,ノイズのある操作を用いる現実的なシナリオにおけるエンタングルメント蒸留について考察する。ノイズのある測定を純化するプロトコルを提示し, この純化の助けにより、不完全な局所操作を用いてエンタングルメントを蒸留できることを示す。純化プロトコルが実装時のノイズに対して堅牢であることを示すとともに, 実際的な実現における純化を解析する。最大10%までの測定誤差およびゲート誤差に対しては、2つの追加量子ビットによる純化がエンタングルメントの蒸留において費用対効果が高いことを示唆する。純化プロトコルは、現在利用可能な量子技術で実現可能であり、エンタングルメントの応用に容易に適用できる。

We consider entanglement distillation in a realistic scenario with noisy operations in which quantum measurements that constitute a general quantum operation are particularly noisy. We present a protocol for purifying noisy measurements and show that with the help of the purification, imperfect local operations can be used to distill entanglement. We show that the purification protocol is robust against noise in implementation and analyze the purification in a practical realization: for measurement and gate errors up to 10%, we suggest that the purification with two additional qubits is cost-effective for distilling entanglement. The purification protocol is feasible with currently available quantum technologies and readily applied to entanglement applications.

翻訳日:2024-04-26 20:28:54 公開日:2024-04-24

# 異常検出のための散乱変換を用いたグラフニューラルネットワークの統合

Integrating Graph Neural Networks with Scattering Transform for Anomaly Detection ( http://arxiv.org/abs/2404.10800v3 )

ライセンス: Link先を確認

Abdeljalil Zoubir, Badr Missaoui,

(参考訳) 本稿では,グラフニューラルネットワーク(GNN)を用いたネットワーク侵入検知システム(NIDS)における2つの新しい手法を提案する。最初のアプローチであるScattering Transform with E-GraphSAGE (STEG)は、散乱変換を用いてエッジ特徴ベクトルの多重分解能解析を行う。これは、ネットワークトラフィックの微妙な異常を特定するのに不可欠な詳細な表現を提供する。第2のアプローチでは、ノード表現をNode2Vecで開始することで改善し、統一値を使用する標準的な方法から逸脱し、より正確で全体的なネットワーク画像を取得する。提案手法は,ベンチマークNIDSデータセットにおける既存の最先端手法と比較して,性能が大幅に向上した。

In this paper, we present two novel methods in Network Intrusion Detection Systems (NIDS) using Graph Neural Networks (GNNs). The first approach, Scattering Transform with E-GraphSAGE (STEG), utilizes the scattering transform to conduct multi-resolution analysis of edge feature vectors. This provides a detailed representation that is essential for identifying subtle anomalies in network traffic. The second approach improves node representation by initiating with Node2Vec, diverging from standard methods of using uniform values, thereby capturing a more accurate and holistic network picture. Our methods have shown significant improvements in performance compared to existing state-of-the-art methods in benchmark NIDS datasets.

翻訳日:2024-04-26 20:28:54 公開日:2024-04-24

# 高速スパース入力動的ビュー合成のための分解運動場

Factorized Motion Fields for Fast Sparse Input Dynamic View Synthesis ( http://arxiv.org/abs/2404.11669v3 )

ライセンス: Link先を確認

Nagabhushan Somraj, Kapil Choudhary, Sai Harsha Mupparaju, Rajiv Soundararajan,

(参考訳) 高速な最適化とレンダリングのために動的シーンの3D表現を設計することは難しい課題である。最近の明示的な表現は動的放射場(radiance field)の高速な学習とレンダリングを可能にするが、密な入力視点の集合を必要とする。本研究では,スパースな入力視点を持つ動的放射場に対する高速な表現の学習に焦点をあてる。しかし、スパース入力による最適化は制約不足であり、学習を制約するためには動きの事前知識(motion prior)を用いる必要がある。既存の高速な動的シーンモデルは動きを明示的にモデル化しておらず、動きの事前知識で制約することが難しい。我々は、高速で、かつ運動場の時空間相関を活用できる因子化4次元表現として、明示的な動きモデルを設計する。次に、カメラ間のスパースなフロー事前知識とカメラ内の密なフロー事前知識の組み合わせを含む、信頼性の高いフロー事前知識を導入し、動きモデルを正則化する。我々のモデルは高速かつコンパクトであり、スパースな入力視点を持つ一般的なマルチビュー動的シーンデータセット上で非常に優れた性能を達成している。私たちのモデルのソースコードは、プロジェクトページにある。 https://nagabhushansn95.github.io/publications/2024/RF-DeRF.html。

Designing a 3D representation of a dynamic scene for fast optimization and rendering is a challenging task. While recent explicit representations enable fast learning and rendering of dynamic radiance fields, they require a dense set of input viewpoints. In this work, we focus on learning a fast representation for dynamic radiance fields with sparse input viewpoints. However, the optimization with sparse input is under-constrained and necessitates the use of motion priors to constrain the learning. Existing fast dynamic scene models do not explicitly model the motion, making them difficult to be constrained with motion priors. We design an explicit motion model as a factorized 4D representation that is fast and can exploit the spatio-temporal correlation of the motion field. We then introduce reliable flow priors including a combination of sparse flow priors across cameras and dense flow priors within cameras to regularize our motion model. Our model is fast, compact and achieves very good performance on popular multi-view dynamic scene datasets with sparse input viewpoints. The source code for our model can be found on our project page: https://nagabhushansn95.github.io/publications/2024/RF-DeRF.html.

翻訳日:2024-04-26 20:28:54 公開日:2024-04-24

# ダミーの格子手術

Lattice Surgery for Dummies ( http://arxiv.org/abs/2404.13202v2 )

ライセンス: Link先を確認

Avimita Chatterjee, Subrata Das, Swaroop Ghosh,

(参考訳) 量子誤り訂正(QEC)は、ノイズの修正とフォールトトレラント量子コンピューティングへの道を開く上で重要な役割を果たす。この分野は大幅に進歩し、新しい量子エラー訂正符号が頻繁に出現し、エラーに効果的に対処している。これらのうち、トポロジ的符号、特に表面符号は、誤差の低いしきい値と大規模量子コンピュータの実装の可能性で際立っている。しかし、これらの符号は1量子ビットの符号化に制限されている。格子手術は、複数の符号化量子ビット間の相互作用や、表面コードの格子間の相互作用を可能にするために重要であり、その高度な誤り訂正機能は、運用上のオーバーヘッドを大幅に増大させることなく維持される。格子手術は、より広範な量子系にまたがるQECCのスケーリングに重要である。その重要な重要性にもかかわらず、格子の手術を理解することは、その固有の複雑さのために困難であり、複雑な量子物理学と数学的概念の深い理解を必要としている。本論文は,格子手術のデミスティフィケーションを試み,量子物理学や数学の深い背景を持たない人にもアクセスできるようにする。この研究は、表面符号を探索し、格子手術の基礎を導入し、量子ゲートの構築とマルチキュービット回路のエミュレートにその応用を実証する。

Quantum error correction (QEC) plays a crucial role in correcting noise and paving the way for fault-tolerant quantum computing. This field has seen significant advancements, with new quantum error correction codes emerging regularly to address errors effectively. Among these, topological codes, particularly surface codes, stand out for their low error thresholds and feasibility for implementation in large-scale quantum computers. However, these codes are restricted to encoding a single qubit. Lattice surgery is crucial for enabling interactions among multiple encoded qubits or between the lattices of a surface code, ensuring that its sophisticated error-correcting features are maintained without significantly increasing the operational overhead. Lattice surgery is pivotal for scaling QECCs across more extensive quantum systems. Despite its critical importance, comprehending lattice surgery is challenging due to its inherent complexity, demanding a deep understanding of intricate quantum physics and mathematical concepts. This paper endeavors to demystify lattice surgery, making it accessible to those without a profound background in quantum physics or mathematics. This work explores surface codes, introduces the basics of lattice surgery, and demonstrates its application in building quantum gates and emulating multi-qubit circuits.

翻訳日:2024-04-26 20:19:09 公開日:2024-04-24

# RAW画像からの反射の除去

Removing Reflections from RAW Photos ( http://arxiv.org/abs/2404.14414v2 )

ライセンス: Link先を確認

Eric Kee, Adam Pikielny, Kevin Blackburn-Matzen, Marc Levoy,

(参考訳) 消費者向け写真の画像から現実世界の反射を除去するシステムについて述べる。本システムはリニア(RAW)写真を対象とし、(任意で)反対方向を向いたコンテキスト写真を追加できる。例えばモバイルデバイスの自撮りカメラで撮影したもので、何を反射とみなすべきかの曖昧さの解消に役立つ。このシステムは実世界のRAW画像の合成混合物を用いて訓練され、それらは測光的かつ幾何学的に正確な反射シミュレーションを用いて合成される。提案システムは,撮影した写真と任意のコンテキスト写真を入力として受け取り256pで動作するベースモデルと,出力された256p画像をフル解像度に変換するアップサンプリングモデルから構成される。このシステムは、MacBookまたはiPhone 14 Pro上で、レビュー用の1K画像を4.5～6.5秒で生成できる。現場で撮影され、典型的な消費者写真を体現するRAW写真でテストを行った。

We describe a system to remove real-world reflections from images for consumer photography. Our system operates on linear (RAW) photos, with the (optional) addition of a contextual photo looking in the opposite direction, e.g., using the selfie camera on a mobile device, which helps disambiguate what should be considered the reflection. The system is trained using synthetic mixtures of real-world RAW images, which are combined using a reflection simulation that is photometrically and geometrically accurate. Our system consists of a base model that accepts the captured photo and optional contextual photo as input, and runs at 256p, followed by an up-sampling model that transforms output 256p images to full resolution. The system can produce images for review at 1K in 4.5 to 6.5 seconds on a MacBook or iPhone 14 Pro. We test on RAW photos that were captured in the field and embody typical consumer photographs.

翻訳日:2024-04-26 20:19:09 公開日:2024-04-24

# 量子計量による非エルミート臨界点の同定

Identifying non-Hermitian critical points with quantum metric ( http://arxiv.org/abs/2404.15628v1 )

ライセンス: Link先を確認

Jun-Feng Ren, Jing Li, Hai-Tao Ding, Dan-Wei Zhang,

(参考訳) 量子状態の幾何学的性質は、量子幾何テンソルによって完全に符号化される。量子幾何テンソルの実部と虚部は、それぞれ量子計量とベリー曲率であり、ヒルベルト空間内の2つの近接する量子状態間の距離と位相差を特徴づける。従来のエルミート量子系では、量子計量は忠実度感受率に対応しており、幾何学的な観点から量子相転移を特定するために既に使われている。本研究では、この知見を非エルミート系に拡張し、非エルミート臨界点を明らかにする。具体的には、数値的厳密対角化法と解析的手法を用いて、2つの非エルミート一般化Aubry-Andréモデルと、非エルミートクラスターおよび混合場イジングモデルを含む様々な非エルミートモデルにおいて、量子計量と対応する秩序パラメータを計算する。これらの非エルミートモデルにおける固有状態の量子計量は、それぞれ局在転移、移動度端、および多体量子相転移を正確に同定することを示す。さらに、この戦略は有限サイズ効果と異なる境界条件に対して堅牢であることを示す。

The geometric properties of quantum states are fully encoded by the quantum geometric tensor. The real and imaginary parts of the quantum geometric tensor are the quantum metric and Berry curvature, which characterize the distance and phase difference between two nearby quantum states in Hilbert space, respectively. For conventional Hermitian quantum systems, the quantum metric corresponds to the fidelity susceptibility and has already been used to specify quantum phase transitions from the geometric perspective. In this work, we extend this approach to non-Hermitian systems for revealing non-Hermitian critical points. To be concrete, by employing numerical exact diagonalization and analytical methods, we calculate the quantum metric and corresponding order parameters in various non-Hermitian models, which include two non-Hermitian generalized Aubry-André models and non-Hermitian cluster and mixed-field Ising models. We demonstrate that the quantum metric of eigenstates in these non-Hermitian models exactly identifies the localization transitions, mobility edges, and many-body quantum phase transitions, respectively. We further show that this strategy is robust against the finite-size effect and different boundary conditions.
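
The link between the quantum metric and fidelity can be illustrated numerically in the simplest Hermitian setting. For a normalized state |ψ(λ)⟩, the metric along λ can be estimated as (1 − |⟨ψ(λ)|ψ(λ+δ)⟩|²)/δ² for small δ. The sketch below applies this to a single-qubit state; the function names are assumptions, and the non-Hermitian definitions studied in the paper (which may involve left and right eigenvectors) are not specified in the abstract and are not reproduced here.

```python
import math


def qubit_state(theta):
    """Real two-level state (cos(theta/2), sin(theta/2)); Hermitian toy example."""
    return (math.cos(theta / 2.0), math.sin(theta / 2.0))


def overlap(a, b):
    """Inner product <a|b> for states given as tuples of amplitudes."""
    return sum(complex(x).conjugate() * complex(y) for x, y in zip(a, b))


def quantum_metric(state_fn, lam, delta=1e-3):
    """Finite-difference estimate of the quantum metric along one parameter.

    Uses the fidelity expansion |<psi(lam)|psi(lam + delta)>|^2 ~ 1 - g * delta^2.
    """
    fidelity = abs(overlap(state_fn(lam), state_fn(lam + delta))) ** 2
    return (1.0 - fidelity) / delta ** 2


g_theta = quantum_metric(qubit_state, 0.7)  # analytic value for this state is 1/4
```

In a many-body setting the same quantity, evaluated for an eigenstate as a function of a model parameter, is what peaks or diverges near a transition; that is the sense in which the metric is used as a detector of critical points.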

翻訳日:2024-04-26 20:19:09 公開日:2024-04-24

# 非Fungibleプログラム:Web3用のプライベートフルスタックアプリケーション

Non-Fungible Programs: Private Full-Stack Applications for Web3 ( http://arxiv.org/abs/2404.15632v1 )

ライセンス: Link先を確認

Blake Regalia, Benjamin Adams,

(参考訳) Web3アプリケーションがWeb 2.0に対して提供する最大の利点は、データアクセス層の進化である。ユーザからの信頼を強いる不透明で集中的なサービスは、スマートコントラクトの信頼性のない分散システムに置き換えられる。しかしながら、スマートコントラクトがトランザクションされるブロックチェーンベースのデータベースのパブリックな性質は、データプライバシに依存するアプリケーションや、不完全な情報を持つ参加者に依存するアプリケーションに対して、一般的には課題を提起している。これは、アクティブコントラクトのメモリ状態を暗号化する秘密のスマートコントラクトネットワークの導入と、データベースのオンチェーン保存によって、変わっている。機密性によって、コントラクトは以前実現不可能だった新しいインタラクションメカニズムをより容易に実装できる。一方、Web 2.0とWeb3アプリケーションの両方では、ユーザインターフェイスは、ユーザ意図をアクション可能なリクエストに変換する上で重要な役割を担っています。多くの場合、開発者はインテリジェンスと自律性をクライアント側に移行し、計算、グラフィックス、ネットワーキングにWeb技術を活用しています。しかし、Web3のこのようなフロントエンドへの依存は、アプリケーションに永続的なホストがいなければ、分散化されたアプリケーションがエンドユーザにアクセスできないという問題点を浮き彫りにしている。ここでは、ブロックチェーンを介して分散し、Web技術を活用し、暗号化されたスマートコントラクトで永続化されたプライベートデータベースによってバックアップされる、自己完結型のフロントエンドアプリケーションを開発するための、NFP(Non-Fungible Program)モデルを紹介します。フロントエンドコードへのアクセスとバックエンドサービスへのアクセスは、NFTオーナシップモデルに従ってスマートコントラクトによって制御され、保証される。拡張によって、NFPアプリケーションはトークン所有者に対話性をもたらし、オーラクルの認証機構、補充Webサービス、オーバレイネットワークなどの新機能をセキュアに実現します。また...。

The greatest advantage that Web3 applications offer over Web 2.0 is the evolution of the data access layer. Opaque, centralized services that compelled trust from users are replaced by trustless, decentralized systems of smart contracts. However, the public nature of blockchain-based databases, on which smart contracts transact, has typically presented a challenge for applications that depend on data privacy or that rely on participants having incomplete information. This has changed with the introduction of confidential smart contract networks that encrypt the memory state of active contracts as well as their databases stored on-chain. With confidentiality, contracts can more readily implement novel interaction mechanisms that were previously infeasible. Meanwhile, in both Web 2.0 and Web3 applications the user interface continues to play a crucial role in translating user intent into actionable requests. In many cases, developers have shifted intelligence and autonomy into the client-side, leveraging Web technologies for compute, graphics, and networking. Web3's reliance on such frontends has revealed a pain point though, namely that decentralized applications are not accessible to end users without a persistent host serving the application. Here we introduce the Non-Fungible Program (NFP) model for developing self-contained frontend applications that are distributed via blockchain, powered by Web technology, and backed by private databases persisted in encrypted smart contracts. Access to frontend code, as well as backend services, is controlled and guaranteed by smart contracts according to the NFT ownership model, eliminating the need for a separate host. By extension, NFP applications bring interactivity to token owners and enable new functionalities, such as authorization mechanisms for oracles, supplementary Web services, and overlay networks in a secure manner. In addition...

翻訳日:2024-04-26 20:19:09 公開日:2024-04-24

# マルチユニットオークション設計のための人工知能

Artificial Intelligence for Multi-Unit Auction design ( http://arxiv.org/abs/2404.15633v1 )

ライセンス: Link先を確認

Peyman Khezr, Kendall Taylor,

(参考訳) マルチユニットオークションにおける入札行動を理解することは、研究者にとって現在進行中の課題である。広く使われているにもかかわらず、入札行動、収益ランキング、そして一般的な多ユニットオークションの効率に関する理論的洞察は限られている。本稿では,人工知能,特に強化学習をモデル自由学習手法として活用し,実際に使用されている3つの著名なマルチユニットオークションにおける入札をシミュレートする。マルチユニットオークションにおいて,学習と入札に適した6つのアルゴリズムを導入し,実例を用いて比較する。本稿では,人工知能を用いたオークションデザインの重要性,特にマルチユニットオークションの設計の強化について述べる。

Understanding bidding behavior in multi-unit auctions remains an ongoing challenge for researchers. Despite their widespread use, theoretical insights into the bidding behavior, revenue ranking, and efficiency of commonly used multi-unit auctions are limited. This paper utilizes artificial intelligence, specifically reinforcement learning, as a model-free learning approach to simulate bidding in three prominent multi-unit auctions employed in practice. We introduce six algorithms that are suitable for learning and bidding in multi-unit auctions and compare them using an illustrative example. This paper underscores the significance of using artificial intelligence in auction design, particularly in enhancing the design of multi-unit auctions.

翻訳日:2024-04-26 20:09:25 公開日:2024-04-24

# 予測Post-Encroachment Timeに基づく無信号交差点における歩行者の潜在的リスクのリアルタイム評価フレームワーク

A Real-time Evaluation Framework for Pedestrian's Potential Risk at Non-Signalized Intersections Based on Predicted Post-Encroachment Time ( http://arxiv.org/abs/2404.15635v1 )

ライセンス: Link先を確認

Tengfeng Lin, Zhixiong Jin, Seongjin Choi, Hwasoo Yeo,

(参考訳) 交差点での歩行者の安全対策は、交通関連の負傷者や死亡者を減らす緊急性によって引き起こされる交通研究の分野における最重要課題の1つである。コンピュータビジョン技術や予測モデルの発展に伴い、リアルタイムのプロアクティブな保護システムの開発が交差点における歩行者の安全向上に欠かせないものとなっている。これらの保護システムの中核は、歩行者の潜在的なリスクの予測に基づく評価であり、事故の発生を防ぐ重要な役割を担っている。現在の予測に基づく潜在的なリスク評価研究における大きな課題は、歩行者の潜在的なリスクを評価するためのリアルタイムフレームワークを作成するための不十分な進歩、潜在的なリスクを表現するための正確で説明可能な安全指標の欠如、歩行者の各カテゴリーに特有な調整済みの評価基準の欠如、の3つの側面にまとめることができる。これらの課題に対処するために,コンピュータビジョン技術と予測モデルを用いたフレームワークを開発し,歩行者の潜在的なリスクをリアルタイムで評価する。この枠組みと一体化しているのは、歩行者や車両の交差点到着時刻を予測できるディープラーニングモデルから派生した、新しいサロゲート安全対策であるPredicted Post-Encroachment Time (P-PET)である。歩行者のリスク評価の有効性と信頼性をさらに向上するため,歩行者を異なるカテゴリーに分類し,グループ毎に具体的な評価基準を適用した。本研究は,P-PETを用いて潜在的リスクを効果的に識別する能力を示し,リアルタイムアプリケーションの実現可能性と,歩行者の異なるカテゴリーにおけるリスク評価性能の向上を示すものである。

Addressing pedestrian safety at intersections is one of the paramount concerns in the field of transportation research, driven by the urgency of reducing traffic-related injuries and fatalities. With advances in computer vision technologies and predictive models, the pursuit of developing real-time proactive protection systems is increasingly recognized as vital to improving pedestrian safety at intersections. The core of these protection systems lies in the prediction-based evaluation of pedestrians' potential risks, which plays a significant role in preventing the occurrence of accidents. The major challenges in the current prediction-based potential risk evaluation research can be summarized into three aspects: the inadequate progress in creating a real-time framework for the evaluation of pedestrians' potential risks, the absence of accurate and explainable safety indicators that can represent the potential risk, and the lack of tailor-made evaluation criteria specifically for each category of pedestrians. To address these research challenges, in this study, a framework with computer vision technologies and predictive models is developed to evaluate the potential risk of pedestrians in real time. Integral to this framework is a novel surrogate safety measure, the Predicted Post-Encroachment Time (P-PET), derived from deep learning models capable of predicting the arrival time of pedestrians and vehicles at intersections. To further improve the effectiveness and reliability of pedestrian risk evaluation, we classify pedestrians into distinct categories and apply specific evaluation criteria for each group. The results demonstrate the framework's ability to effectively identify potential risks through the use of P-PET, indicating its feasibility for real-time applications and its improved performance in risk evaluation across different categories of pedestrians.
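
The category-specific risk check described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: P-PET is simplified here to the absolute gap between the predicted arrival times of a pedestrian and a vehicle at the conflict point, and the function names, category labels, and threshold values are all hypothetical.

```python
# Hypothetical per-category thresholds in seconds; the abstract does not
# state the paper's actual evaluation criteria.
RISK_THRESHOLD_S = {"adult": 2.0, "child": 3.0, "elderly": 3.5}


def predicted_pet(ped_arrival_s, veh_arrival_s):
    """Gap (seconds) between the two predicted arrival times at the conflict point."""
    return abs(ped_arrival_s - veh_arrival_s)


def is_potential_risk(ped_arrival_s, veh_arrival_s, category):
    """Flag a potential risk when the predicted gap is below the category threshold."""
    return predicted_pet(ped_arrival_s, veh_arrival_s) < RISK_THRESHOLD_S[category]
```

In a real system the two arrival times would come from the prediction models; here they are plain numbers so the thresholding logic can be read on its own.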

翻訳日:2024-04-26 20:09:25 公開日:2024-04-24

# PriorNet: 効率的な画像デハージングのための多次元インタラクティブアテンション付き軽量ネットワーク

PriorNet: A Novel Lightweight Network with Multidimensional Interactive Attention for Efficient Image Dehazing ( http://arxiv.org/abs/2404.15638v1 )

ライセンス: Link先を確認

Yutong Chen, Zhang Wen, Chao Wang, Lei Gong, Zhongchao Yi,

(参考訳) ヘイズ画像は視覚的品質を低下させ、デハジングはその後の処理タスクにとって重要な前提条件である。現在のデハジング法のほとんどはニューラルネットワークに依存しており、高い計算パラメータ圧力や弱い一般化能力といった課題に直面している。本稿では,過剰な詳細抽出問題を回避しつつ,ヘイズ画像の明瞭さと視覚的品質を大幅に向上させる,新しい,軽量で,適用性の高いデハージングネットワークであるPriorNetを紹介する。 PriorNetのコアは、多次元インタラクティブアテンション(MIA)機構であり、複雑なシステムに関連する計算負荷と一般化の難しさを著しく低減し、様々なヘイズ特性を効果的に捉えている。均一な畳み込みカーネルサイズを利用し、スキップ接続を組み込むことで、特徴抽出プロセスの合理化を実現した。レイヤ数とアーキテクチャの簡略化は、デハージング効率を高めるだけでなく、エッジデバイスへのデプロイを容易にする。複数のデータセットにわたる広範囲なテストは、シングルイメージのデハージングタスクにおいて、イメージの詳細と色の忠実さを維持しながら、デハージングと明快さの回復において、PriorNetの例外的なパフォーマンスを示している。特に、モデルのサイズがたった18KbのPriorNetは、他の方法に比べて優れたデハージング一般化機能を示している。我々の研究は、画像デハージング技術の発展に大きく貢献し、特に普遍性とデプロイ性の向上の重要性を強調しながら、フィールドおよび関連ドメインに対する新たな視点とツールを提供しています。

Hazy images degrade visual quality, and dehazing is a crucial prerequisite for subsequent processing tasks. Most current dehazing methods rely on neural networks and face challenges such as high computational parameter pressure and weak generalization capabilities. This paper introduces PriorNet--a novel, lightweight, and highly applicable dehazing network designed to significantly improve the clarity and visual quality of hazy images while avoiding excessive detail extraction issues. The core of PriorNet is the original Multi-Dimensional Interactive Attention (MIA) mechanism, which effectively captures a wide range of haze characteristics, substantially reducing the computational load and generalization difficulties associated with complex systems. By utilizing a uniform convolutional kernel size and incorporating skip connections, we have streamlined the feature extraction process. Simplifying the number of layers and architecture not only enhances dehazing efficiency but also facilitates easier deployment on edge devices. Extensive testing across multiple datasets has demonstrated PriorNet's exceptional performance in dehazing and clarity restoration, maintaining image detail and color fidelity in single-image dehazing tasks. Notably, with a model size of just 18Kb, PriorNet showcases superior dehazing generalization capabilities compared to other methods. Our research makes a significant contribution to advancing image dehazing technology, providing new perspectives and tools for the field and related domains, particularly emphasizing the importance of improving universality and deployability.

翻訳日:2024-04-26 20:09:25 公開日:2024-04-24

# CodeIP: 大規模言語のコードモデルのための文法ガイド付きマルチビット透かし

CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code ( http://arxiv.org/abs/2404.15639v1 )

ライセンス: Link先を確認

Batu Guan, Yao Wan, Zhangqian Bi, Zheng Wang, Hongyu Zhang, Yulei Sui, Pan Zhou, Lichao Sun,

(参考訳) 大規模言語モデル(LLM)は、コード生成の自動化にますます使用されているため、特に産業における知的財産権(IP)保護や教育における学術的不正行為の防止といった目的のために、コードがAI生成されているかどうか、またどのモデルによって生成されたかを知ることが望まれる。マシン生成コンテンツに透かしを組み込むことは、コードの来歴を示す方法のひとつだが、既存のソリューションは単一のビットに制限されているか、柔軟性が欠如している。我々は,LLMベースのコード生成のための新しい透かし技術であるCodeIPを提案する。 CodeIPは、生成されたコードのセマンティクスを保持しながら、マルチビット情報の挿入を可能にし、挿入された透かしの強度と多様性を向上させる。これは、次のトークンの文法型を予測する型予測器を訓練し、生成されたコードの構文的および意味的正しさを高めることで達成される。 5つのプログラミング言語にまたがる実世界のデータセットの実験では、CodeIPの有効性が示されている。

As Large Language Models (LLMs) are increasingly used to automate code generation, it is often desired to know if the code is AI-generated and by which model, especially for purposes like protecting intellectual property (IP) in industry and preventing academic misconduct in education. Incorporating watermarks into machine-generated content is one way to provide code provenance, but existing solutions are restricted to a single bit or lack flexibility. We present CodeIP, a new watermarking technique for LLM-based code generation. CodeIP enables the insertion of multi-bit information while preserving the semantics of the generated code, improving the strength and diversity of the inserted watermark. This is achieved by training a type predictor to predict the subsequent grammar type of the next token to enhance the syntactical and semantic correctness of the generated code. Experiments on a real-world dataset across five programming languages showcase the effectiveness of CodeIP.

翻訳日:2024-04-26 20:09:25 公開日:2024-04-24

# Building-PCC: ポイントクラウド補完ベンチマークの構築

Building-PCC: Building Point Cloud Completion Benchmarks ( http://arxiv.org/abs/2404.15644v1 )

ライセンス: Link先を確認

Weixiao Gao, Ravi Peters, Jantien Stoter,

(参考訳) 3次元センシング技術の急速な進歩により、物体の3次元形状情報を得るのがますます便利になっている。ライダー技術は、遠距離で物体の3D情報を正確にキャプチャする機能を備えており、都市部の3Dデータの収集に広く応用されている。しかし、収集された点雲データは、閉塞、信号吸収、鏡面反射などの要因により不完全性を示すことが多い。本稿では,これらの不完全データ処理におけるポイントクラウド補完技術の適用について検討し,都市のビルディングポイントクラウド補完作業における既存のディープラーニング手法の性能を評価するために,ビルディングPCCデータセットを新たに構築する。異なる手法の総合的な評価を通じて,3次元地理情報分野の革新を促進することを目的として,ビルディングポイントクラウドの完成において直面する重要な課題を分析した。ソースコードはhttps://github.com/tudelft3d/Building-PCC-Building-Point-Cloud-Completion-Benchmarks.gitで公開されています。

With the rapid advancement of 3D sensing technologies, obtaining 3D shape information of objects has become increasingly convenient. Lidar technology, with its capability to accurately capture the 3D information of objects at long distances, has been widely applied in the collection of 3D data in urban scenes. However, the collected point cloud data often exhibit incompleteness due to factors such as occlusion, signal absorption, and specular reflection. This paper explores the application of point cloud completion technologies in processing these incomplete data and establishes a new real-world benchmark dataset, Building-PCC, to evaluate the performance of existing deep learning methods in the task of urban building point cloud completion. Through a comprehensive evaluation of different methods, we analyze the key challenges faced in building point cloud completion, aiming to promote innovation in the field of 3D geoinformation applications. Our source code is available at https://github.com/tudelft3d/Building-PCC-Building-Point-Cloud-Completion-Benchmarks.git.

翻訳日:2024-04-26 20:09:25 公開日:2024-04-24

# 小川らによるRamp Quantum Secret Sharing Schemeにおける事前共有

Advance Sharing with Ogawa et al.'s Ramp Quantum Secret Sharing Scheme ( http://arxiv.org/abs/2404.15646v1 )

ライセンス: Link先を確認

Satoshi Masumori, Ryutaroh Matsumoto,

(参考訳) 小川らによって提案されたランプ型量子秘密分散は、しきい値型アクセス構造のもとで可能な最も高い符号化率を有する。一方、いくつかの量子秘密分散方式では、ディーラーに秘密が渡される前に一部のシェアを参加者に配布できることが知られている。しかし、小川らの方式で秘密が与えられる前に一部のシェアを配布できるかどうかは不明である。本稿では,小川らの方式で秘密が与えられる前に一部のシェアを配布する手法を提案し,その上で,秘密が与えられる前に配布できるシェアの集合について必要十分条件を決定する。

The ramp quantum secret sharing proposed by Ogawa et al. has the highest possible coding rate given a threshold type access structure. On the other hand, in some quantum secret sharing schemes, it is known that some shares can be distributed to participants before a secret is given to the dealer. However, it is unclear whether some shares can be distributed before a secret is given in Ogawa et al.'s scheme. In this paper, we propose a method to distribute some shares before a secret is given in Ogawa et al.'s scheme, then determine a necessary and sufficient condition on sets of shares that can be distributed before a given secret.

翻訳日:2024-04-26 20:09:25 公開日:2024-04-24

# Affordance Blending Networks

Affordance Blending Networks ( http://arxiv.org/abs/2404.15648v1 )

ライセンス: Link先を確認

Hakan Aktas, Yukie Nagai, Minoru Asada, Erhan Oztop, Emre Ugur,

(参考訳) アフォーダンス(Affordances)は生態心理学に根ざし、James J. Gibsonによって開拓された概念であり、個人と環境の間の動的関係を理解するための基本的な枠組みとして登場した。伝統的な知覚的および認知的パラダイムを超えて、アフォーダンスは、与えられたコンテキスト内のエージェントにオブジェクトが与える本質的な効果と行動の可能性を表す。理論レンズとして、アフォーダンスは効果と行動の間のギャップを埋め、エージェントの実体に対する行動とこれらの行動の効果の間の関係を精緻に理解させる。本研究では, 対象, 行動, 効果を、すべてのアフォーダンスで共有される共通潜在空間(アフォーダンス空間と呼ぶ)内の1つの潜在表現に統一するモデルを提案する。このアフォーダンス空間を利用することで,行動とオブジェクトが与えられたときの効果軌道を生成し,効果軌道とオブジェクトが与えられたときの行動軌道を生成することができる。実験では,本モデルは各対象の振る舞いを学習するのではなく,同値性と呼ぶ、対象が共有するアフォーダンス関係を学習することを示した。シミュレーション実験に加えて,実世界の事例において,本モデルが直接模倣に利用できることを示した。また,異なるロボットの動作を関連付けるために,クロス・エンボディメント・トランスファーの基盤として,アフォーダンスを提案する。最後に、非決定論的なモデル入力に対して有効な出力を生成するソリューションとして選択的損失を導入する。

Affordances, a concept rooted in ecological psychology and pioneered by James J. Gibson, have emerged as a fundamental framework for understanding the dynamic relationship between individuals and their environments. Expanding beyond traditional perceptual and cognitive paradigms, affordances represent the inherent effect and action possibilities that objects offer to the agents within a given context. As a theoretical lens, affordances bridge the gap between effect and action, providing a nuanced understanding of the connections between agents' actions on entities and the effect of these actions. In this study, we propose a model that unifies object, action and effect into a single latent representation in a common latent space that is shared between all affordances that we call the affordance space. Using this affordance space, our system is able to generate effect trajectories when action and object are given and is able to generate action trajectories when effect trajectories and objects are given. In the experiments, we showed that our model does not learn the behavior of each object but it learns the affordance relations shared by the objects that we call equivalences. In addition to simulated experiments, we showed that our model can be used for direct imitation in real world cases. We also propose affordances as a base for Cross Embodiment transfer to link the actions of different robots. Finally, we introduce selective loss as a solution that allows valid outputs to be generated for indeterministic model inputs.

翻訳日:2024-04-26 20:09:25 公開日:2024-04-24

# EMの帰還:QA評価のためのエンティティ駆動型回答セットの拡張

Return of EM: Entity-driven Answer Set Expansion for QA Evaluation ( http://arxiv.org/abs/2404.15650v1 )

ライセンス: Link先を確認

Dongryeol Lee, Minwoo Lee, Kyungmin Min, Joonsuk Park, Kyomin Jung,

(参考訳) 近年,大規模言語モデル(LLM)を直接使用することが,QAモデルを評価する上で最も信頼性の高い手法であることが示されている。しかし、限定的な解釈可能性、高いコスト、環境被害に悩まされている。そこで本研究では,エンティティ駆動型回答セット拡張を用いたソフトEMを提案する。本手法は, 表面形状が実体の種類によっては特定のパターンに従うことがしばしばあるという観察に基づいて, 多様な表面形状を含むように金の解集合を拡張する。実験結果から,本手法は従来の評価手法よりも高い性能を示した。さらに,評価手法の信頼性はLLM法と同等であり,高い解釈可能性と環境負荷の低減の利点も提供する。

Recently, directly using large language models (LLMs) has been shown to be the most reliable method to evaluate QA models. However, it suffers from limited interpretability, high cost, and environmental harm. To address these, we propose to use soft EM with entity-driven answer set expansion. Our approach expands the gold answer set to include diverse surface forms, based on the observation that the surface forms often follow particular patterns depending on the entity type. The experimental results show that our method outperforms traditional evaluation methods by a large margin. Moreover, the reliability of our evaluation method is comparable to that of LLM-based ones, while offering the benefits of high interpretability and reduced environmental harm.
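
A rough sketch of the idea follows, under stated assumptions: "soft EM" is read here as a containment check (the prediction counts as correct if it contains any accepted surface form), and the expansion rules per entity type are small hand-written stand-ins. The abstract does not say how the authors generate surface forms, so `expand_answers`, `NUMBER_WORDS`, and the entity-type labels are hypothetical.

```python
import re
import string

# Tiny illustrative lookup; a real system would need far broader coverage.
NUMBER_WORDS = {"1": "one", "2": "two", "3": "three", "4": "four", "5": "five"}


def normalize(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def expand_answers(gold, entity_type):
    """Return the gold answer plus extra surface forms chosen by entity type."""
    forms = {gold}
    if entity_type == "person":
        parts = gold.split()
        if len(parts) > 1:
            forms.add(parts[-1])  # surname alone is often used for people
    elif entity_type == "number" and gold in NUMBER_WORDS:
        forms.add(NUMBER_WORDS[gold])  # digit form and spelled-out form
    return forms


def soft_em(prediction, gold_forms):
    """True if any accepted surface form appears as whole words in the prediction."""
    pred = normalize(prediction)
    for form in gold_forms:
        target = normalize(form)
        if target and re.search(r"\b" + re.escape(target) + r"\b", pred):
            return True
    return False
```

With only the original gold string, a prediction such as "It was President Obama." is rejected for the answer "Barack Obama"; with the expanded set it is accepted, which is the kind of gap the expansion is meant to close.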

翻訳日:2024-04-26 20:09:25 公開日:2024-04-24

# CatLIP: Webスケール画像テキストデータによる2.7倍高速事前学習によるCLIPレベルの視覚認識精度

CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data ( http://arxiv.org/abs/2404.15653v1 )

ライセンス: Link先を確認

Sachin Mehta, Maxwell Horton, Fartash Faghri, Mohammad Hossein Sekhavat, Mahyar Najibi, Mehrdad Farajtabar, Oncel Tuzel, Mohammad Rastegari,

(参考訳) コントラスト学習は、画像とテキストの埋め込みのアライメントを通じて効果的な視覚表現を学習するための変換方法として登場した。しかし、画像とテキストのペア間の対照的な損失におけるペアワイズ類似性計算は、計算上の問題を引き起こす。本稿では,Webスケール画像テキストデータに基づく視覚モデルの弱教師付き事前学習を提案する。提案手法は,画像テキストデータに基づく事前学習を分類タスクとして再編成する。その結果、対の類似性計算を対照的な損失で不要にし、Webスケールのデータでの対照的な学習と比較して、トレーニング速度で2.7倍の高速化を達成した。検出やセグメンテーションを含む多様な視覚タスクにまたがる広範囲な実験を通じて,提案手法は高い表現品質を維持していることを示す。トレーニング済みのモデルウェイトとトレーニングレシピとともに、ソースコードは https://github.com/apple/corenet で公開されています。

Contrastive learning has emerged as a transformative method for learning effective visual representations through the alignment of image and text embeddings. However, pairwise similarity computation in contrastive loss between image and text pairs poses computational challenges. This paper presents a novel weakly supervised pre-training of vision models on web-scale image-text data. The proposed method reframes pre-training on image-text data as a classification task. Consequently, it eliminates the need for pairwise similarity computations in contrastive loss, achieving a remarkable 2.7x acceleration in training speed compared to contrastive learning on web-scale data. Through extensive experiments spanning diverse vision tasks, including detection and segmentation, we demonstrate that the proposed method maintains high representation quality. Our source code along with pre-trained model weights and training recipes is available at https://github.com/apple/corenet.
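
To make the reframing concrete, here is a minimal sketch of treating a caption as a multi-label classification target and scoring it with binary cross-entropy. It is an illustration only: the vocabulary, the whitespace tokenization, and both function names are assumptions, and the abstract does not describe how the actual label set or loss is built.

```python
import math


def caption_to_targets(caption, vocab):
    """Multi-hot target: 1.0 for each vocabulary word that occurs in the caption."""
    words = set(caption.lower().split())
    return [1.0 if word in words else 0.0 for word in vocab]


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, in a numerically stable form."""
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)
```

The point of the sketch is the cost profile: each image is scored against a fixed label vocabulary, so no image-by-text similarity matrix over the batch is needed.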

翻訳日:2024-04-26 20:09:25 公開日:2024-04-24

# パーソナライズされたビジュアル多重クラスタリングに向けたマルチモーダルプロキシ学習

Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering ( http://arxiv.org/abs/2404.15655v1 )

ライセンス: Link先を確認

Jiawei Yao, Qi Qian, Juhua Hu,

(参考訳) 近年、異なる視点から複数の隠れたデータ構造を明らかにする可能性から、複数のクラスタリングが注目されている。深層クラスタリング技術の出現は、大規模データセット内の複雑なパターンと関係を明らかにすることによって、パフォーマンスを著しく向上させた。しかし、アルゴリズムが生成するすべてのクラスタリングをユーザが必要とせず、必要なクラスタリングを判断するためには、クラスタリング結果の相当な理解が必要であるため、大きな課題が生じる。伝統的に、ユーザの短いキーワードと対応する視覚コンポーネントを一致させることは困難であったが、マルチモーダルおよび大規模言語モデル(LLM)の出現はこのギャップを埋め始めている。そこで本研究では,マルチモーダル・プロキシ・ラーニング・プロセスを用いた新しい手法であるMulti-MaPを提案する。これはCLIPエンコーダを利用してコヒーレントテキストと画像埋め込みを抽出し、GPT-4はユーザの興味を統合して効果的なテキストコンテキストを定式化する。さらに、ユーザの関心に応じて最適なテキストプロキシを学習するために、参照語制約と概念レベルの制約を設計する。 Multi-MaPは、キーワードを通じてユーザの興味を適切にキャプチャするだけでなく、関連するクラスタリングの特定を容易にする。広範にわたる実験により,Multi-MaPは,全てのベンチマークマルチクラスタ・ビジョンタスクにおいて,最先端の手法を一貫して上回っていることがわかった。私たちのコードはhttps://github.com/Alexander-Yao/Multi-MaPで公開されています。

Multiple clustering has gained significant attention in recent years due to its potential to reveal multiple hidden structures of data from different perspectives. The advent of deep multiple clustering techniques has notably advanced the performance by uncovering complex patterns and relationships within large datasets. However, a major challenge arises as users often do not need all the clusterings that algorithms generate, and figuring out the one needed requires a substantial understanding of each clustering result. Traditionally, aligning a user's brief keyword of interest with the corresponding vision components was challenging, but the emergence of multi-modal and large language models (LLMs) has begun to bridge this gap. In response, given unlabeled target visual data, we propose Multi-MaP, a novel method employing a multi-modal proxy learning process. It leverages CLIP encoders to extract coherent text and image embeddings, with GPT-4 integrating users' interests to formulate effective textual contexts. Moreover, reference word constraint and concept-level constraint are designed to learn the optimal text proxy according to the user's interest. Multi-MaP not only adeptly captures a user's interest via a keyword but also facilitates identifying relevant clusterings. Our extensive experiments show that Multi-MaP consistently outperforms state-of-the-art methods in all benchmark multi-clustering vision tasks. Our code is available at https://github.com/Alexander-Yao/Multi-MaP.

翻訳日:2024-04-26 20:09:25 公開日:2024-04-24

# MISLEAD: 回避攻撃におけるエプシロン学習のための選択特徴の重要度操作

MISLEAD: Manipulating Importance of Selected features for Learning Epsilon in Evasion Attack Deception ( http://arxiv.org/abs/2404.15656v1 )

ライセンス: Link先を確認

Vidit Khazanchi, Pavan Kulkarni, Yuvaraj Govindarajulu, Manojkumar Parmar,

(参考訳) 敵攻撃による機械学習(ML)モデルの新たな脆弱性は、その信頼性に対する懸念を引き起こす。特に、回避攻撃は入力データに正確な摂動を導入してモデルを操作し、誤った予測を引き起こす。そこで本稿では,SHAP(SHapley Additive exPlanations)を特徴量分析に用いた手法と,回避攻撃を行うためのイノベーティブな最適エプシロン手法を提案する。私たちのアプローチは、モデル脆弱性を理解するためのSHAPベースの分析から始まり、ターゲットの回避戦略の考案に不可欠です。バイナリ探索アルゴリズムを用いた最適エプシロン法は,回避に要する最小エプシロンを効率的に決定する。多様な機械学習アーキテクチャによる評価は、敵のサンプルを生成する際のテクニックの精度を示し、モデル結果を操作する上での有効性を裏付けている。本研究は,機械学習システムにおける潜在的なセキュリティリスクを特定し,軽減するための,継続的評価とモニタリングの重要性を強調する。

Emerging vulnerabilities in machine learning (ML) models due to adversarial attacks raise concerns about their reliability. Specifically, evasion attacks manipulate models by introducing precise perturbations to input data, causing erroneous predictions. To address this, we propose a methodology combining SHapley Additive exPlanations (SHAP) for feature importance analysis with an innovative Optimal Epsilon technique for conducting evasion attacks. Our approach begins with SHAP-based analysis to understand model vulnerabilities, crucial for devising targeted evasion strategies. The Optimal Epsilon technique, employing a Binary Search algorithm, efficiently determines the minimum epsilon needed for successful evasion. Evaluation across diverse machine learning architectures demonstrates the technique's precision in generating adversarial samples, underscoring its efficacy in manipulating model outcomes. This study emphasizes the critical importance of continuous assessment and monitoring to identify and mitigate potential security risks in machine learning systems.
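
The binary search for the smallest successful epsilon can be sketched as below. This is a toy under explicit assumptions, not the paper's method: the model is a hand-written linear classifier, every feature is perturbed along the sign of its weight (the SHAP-based feature selection is omitted), and evasion is assumed to be monotone in epsilon so that bisection is valid. All function names are hypothetical.

```python
def predict(weights, bias, x):
    """Toy linear classifier: label 1 if w.x + b > 0, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def perturb(weights, x, eps, label):
    """Move every feature by eps in the direction that works against `label`."""
    push = -1.0 if label == 1 else 1.0
    signs = [(w > 0) - (w < 0) for w in weights]
    return [xi + push * eps * s for xi, s in zip(x, signs)]


def min_epsilon(weights, bias, x, eps_max=10.0, tol=1e-6):
    """Bisect for the smallest eps in (0, eps_max] that flips the prediction.

    Returns None if even eps_max does not change the predicted label.
    """
    label = predict(weights, bias, x)

    def evades(eps):
        return predict(weights, bias, perturb(weights, x, eps, label)) != label

    if not evades(eps_max):
        return None
    lo, hi = 0.0, eps_max
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if evades(mid):
            hi = mid  # mid already succeeds, so the minimum is at or below it
        else:
            lo = mid  # mid fails, so the minimum is above it
    return hi
```

For a non-linear model the monotonicity assumption may not hold, in which case the value found is only a successful epsilon near one decision boundary, not necessarily the global minimum.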

翻訳日:2024-04-26 20:09:25 公開日:2024-04-24

# FedSI: 効率的な不確実性定量化のためのフェデレーションサブネットワーク推論

FedSI: Federated Subnetwork Inference for Efficient Uncertainty Quantification ( http://arxiv.org/abs/2404.15657v1 )

ライセンス: Link先を確認

Hui Chen, Hengyu Liu, Zhangkai Wu, Xuhui Fan, Longbing Cao,

(参考訳) 深層ニューラルネットワーク(DNN)に基づくパーソナライズド・フェデレーション・ラーニング(PFL)は、データの不均一性への対処に求められ、有望な性能を示す一方で、既存のフェデレーションド・ラーニング(FL)の方法は、効率的な体系的不確実性定量化を欠いている。ベイズ DNN ベースの PFL は通常、過剰に単純化されたモデル構造か、高い計算とメモリコストのどちらかが問題視される。本稿では,ベイズDNNベースのサブネットワーク推論PFLフレームワークであるFedSIを紹介する。 FedSIは、ベイズ的手法を利用して体系的な不確実性を効果的に組み込むことにより、シンプルでスケーラブルである。クライアント固有のサブネットワーク推論機構を実装し、大きな分散を持つネットワークパラメータを選択して後続分布を通して推論し、残りを決定論的パラメータとして固定する。 FedSIは、体系的な不確実性を最大限に保ちながら、高速でスケーラブルな推論を達成する。 3つの異なるベンチマークデータセットに対する大規模な実験により、FedSIは異種FLシナリオにおいて既存のベイズ系および非ベイズ系FLベースラインより優れていることが示された。

While personalized federated learning (PFL) based on deep neural networks (DNNs) is in demand for addressing data heterogeneity and shows promising performance, existing methods for federated learning (FL) lack efficient systematic uncertainty quantification. Bayesian DNN-based PFL is usually criticized for either over-simplified model structures or high computational and memory costs. In this paper, we introduce FedSI, a novel Bayesian DNN-based subnetwork inference PFL framework. FedSI is simple and scalable by leveraging Bayesian methods to incorporate systematic uncertainties effectively. It implements a client-specific subnetwork inference mechanism, selects network parameters with large variance to be inferred through posterior distributions, and fixes the rest as deterministic ones. FedSI achieves fast and scalable inference while preserving the systematic uncertainties to the fullest extent. Extensive experiments on three different benchmark datasets demonstrate that FedSI outperforms existing Bayesian and non-Bayesian FL baselines in heterogeneous FL scenarios.
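
The selection step can be pictured with a small sketch: rank parameters by variance, treat the top few as random, and hold the rest at their point estimates. The independent Gaussian posterior, the fixed budget `k`, and the function names are assumptions made for illustration; the abstract does not specify the posterior family or how the subnetwork size is chosen.

```python
import random


def select_subnetwork(variances, k):
    """Indices of the k parameters with the largest variance, in index order."""
    ranked = sorted(range(len(variances)), key=lambda i: variances[i], reverse=True)
    return sorted(ranked[:k])


def sample_weights(means, variances, selected, rng):
    """Draw selected parameters from N(mean, variance); keep the others fixed."""
    chosen = set(selected)
    return [
        rng.gauss(m, v ** 0.5) if i in chosen else m
        for i, (m, v) in enumerate(zip(means, variances))
    ]
```

Only the selected entries vary between draws, so the cost of sampling scales with `k` rather than with the full parameter count.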

翻訳日:2024-04-26 20:09:25 公開日:2024-04-24

# KS-LLM:質問応答のためのエビデンス文書を用いた大規模言語モデルの知識選択

KS-LLM: Knowledge Selection of Large Language Models with Evidence Document for Question Answering ( http://arxiv.org/abs/2404.15660v1 )

ライセンス: Link先を確認

Xinxin Zheng, Feihu Che, Jinyang Wu, Shuai Zhang, Shuai Nie, Kang Liu, Jianhua Tao,

(参考訳) 大きな言語モデル(LLM)は幻覚の問題に悩まされ、知識集約的なタスクに適用した場合、重大な課題に直面します。有望なアプローチは、証拠文書を検索や生成を通じて得られる追加の支援知識として活用することである。しかし,既存の手法では証拠文書の全内容を直接活用し,ノイズ情報を導入し,大規模言語モデルの性能を損なう可能性がある。この問題に対処するため,我々は,証拠文書から貴重な情報を特定することを目的とした,KS-LLM(Knowledge Selection of Large Language Models)手法を提案する。 KS-LLMアプローチは三つ組を利用して、質問に答えるのに有用な証拠文書から知識スニペットを効果的に選択する。具体的には、まず、入力された質問に基づいて三重項を生成し、次に、証拠文書から三重項に最もよく似た証拠文を選択し、最後に、証拠文と三重項を組み合わせて、大きな言語モデルによる回答の生成を支援する。 TriviaQA, WebQ, NQ などの質問応答データセットの実験的比較により,提案手法がベースラインを超え,最良の結果が得られることを示した。

Large language models (LLMs) suffer from the hallucination problem and face significant challenges when applied to knowledge-intensive tasks. A promising approach is to leverage evidence documents as extra supporting knowledge, which can be obtained through retrieval or generation. However, existing methods directly leverage the entire contents of the evidence document, which may introduce noisy information and impair the performance of large language models. To tackle this problem, we propose a novel Knowledge Selection of Large Language Models (KS-LLM) method, aiming to identify valuable information from evidence documents. The KS-LLM approach utilizes triples to effectively select knowledge snippets from evidence documents that are beneficial to answering questions. Specifically, we first generate triples based on the input question, then select the evidence sentences most similar to triples from the evidence document, and finally combine the evidence sentences and triples to assist large language models in generating answers. Experimental comparisons on several question answering datasets, such as TriviaQA, WebQ, and NQ, demonstrate that the proposed method surpasses the baselines and achieves the best results.
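
The middle step, picking the evidence sentences most similar to the triples, might look like the sketch below. Token-overlap (Jaccard) similarity is used purely as a stand-in, since the abstract does not name the similarity measure; the triples are supplied by hand rather than generated by a model, and all function names are hypothetical.

```python
def tokens(text):
    """Lowercased word set with commas and periods removed."""
    return set(text.lower().replace(",", " ").replace(".", " ").split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_evidence(triples, sentences, top_k=1):
    """Return the top_k sentences ranked by best similarity to any triple."""
    triple_tokens = [tokens(" ".join(triple)) for triple in triples]

    def score(sentence):
        sent = tokens(sentence)
        return max((jaccard(sent, t) for t in triple_tokens), default=0.0)

    return sorted(sentences, key=score, reverse=True)[:top_k]
```

The selected sentences and the triples would then be concatenated into the prompt for the answering model; that last step is omitted here because it depends on the model interface.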

翻訳日:2024-04-26 20:09:25 公開日:2024-04-24

# CWF: 高品質メッシュ単純化における弱機能の統合

CWF: Consolidating Weak Features in High-quality Mesh Simplification ( http://arxiv.org/abs/2404.15661v1 )

ライセンス: Link先を確認

Rui Xu, Longdu Liu, Ningna Wang, Shuangmin Chen, Shiqing Xin, Xiaohu Guo, Zichun Zhong, Taku Komura, Wenping Wang, Changhe Tu,

(参考訳) メッシュの単純化では、精度、三角形の品質、機能アライメントといった一般的な要件がトレードオフと見なされることが多い。既存のアルゴリズムは、これらの要求の1つまたはいくつかの特定の側面にのみ集中する。例えば、よく知られたQuadric Error Metrics (QEM) アプローチは精度を優先し、強い特徴線や点も維持できるが、高い三角形の品質を保証するには不足し、強い特徴ほど独特でない弱い特徴を劣化させる可能性がある。本稿では,これらの要件をすべて同時に考慮する滑らかな汎関数を提案する。この汎関数は法線異方性項と重心ボロノイ分割(Centroidal Voronoi Tessellation, CVT)エネルギー項を含み、変数は表面に配置された可動点の集合である。前者はQEMの精神を継承するが、連続的な設定で動作し、後者は偶数点分布を奨励し、様々な表面測度を許容する。さらに、この2つの項を自動的にバランスをとるために、減衰する重みを導入します。 ABCデータセットから100のCADモデルと21の有機モデルを選択し、既存のメッシュ単純化アルゴリズムを我々のものと比較した。減衰する重みの導入は、2項間の衝突を効果的に減らし、弱い特徴のアライメントを可能にする。この特徴は、我々のアプローチを既存のメッシュの単純化方法と区別し、形状理解において有意義な可能性を証明している。

In mesh simplification, common requirements like accuracy, triangle quality, and feature alignment are often considered as a trade-off. Existing algorithms concentrate on just one or a few specific aspects of these requirements. For example, the well-known Quadric Error Metrics (QEM) approach prioritizes accuracy and can preserve strong feature lines/points as well but falls short in ensuring high triangle quality and may degrade weak features that are not as distinctive as strong ones. In this paper, we propose a smooth functional that simultaneously considers all of these requirements. The functional comprises a normal anisotropy term and a Centroidal Voronoi Tessellation (CVT) energy term, with the variables being a set of movable points lying on the surface. The former inherits the spirit of QEM but operates in a continuous setting, while the latter encourages even point distribution, allowing various surface metrics. We further introduce a decaying weight to automatically balance the two terms. We selected 100 CAD models from the ABC dataset, along with 21 organic models, to compare the existing mesh simplification algorithms with ours. Experimental results reveal an important observation: the introduction of a decaying weight effectively reduces the conflict between the two terms and enables the alignment of weak features. This distinctive feature sets our approach apart from most existing mesh simplification methods and demonstrates significant potential in shape understanding.

翻訳日:2024-04-26 20:09:25 公開日:2024-04-24

# 自己スーパービジョンによる解剖学からの局所性、構成性、分解性学習による基礎モデルにおける部分ホール階層の表現

Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability, Composability, and Decomposability from Anatomy via Self-Supervision ( http://arxiv.org/abs/2404.15672v1 )

ライセンス: Link先を確認

Mohammad Reza Hosseinzadeh Taher, Michael B. Gotway, Jianming Liang,

(参考訳) 人間は画像を部分-全体の階層に解析することで容易に解釈する。深層学習は多層特徴空間の学習において優れているが、医療画像の顕著な特性である部分-全体関係の明示的なコーディングを欠いていることが多い。この制限を克服するために、Adam [79]を拡張した新しい自己教師型学習フレームワークであるAdam-v2を紹介する。Adam-v2は、(1)局所化可能性: 異なる解剖パターンを識別するための識別的表現を獲得する、(2)構成可能性: 各解剖学的構造を部分から全体へと学習する、(3)分解可能性: 各解剖学的構造を全体から部分へと理解する、という3つのキーブランチを通じて、学習目標に部分-全体階層を明示的に組み込む。 10タスクにわたる実験結果は、ゼロショット、少数ショット転送、フル微調整設定の11ベースラインと比較して、Adam-v2がさまざまな下流タスクにおいて大規模医療モデルや既存のSSLメソッドよりも優れた性能を持つことを示している。Adam-v2の表現の一般性やロバスト性の高さは、ラベルのない医療画像から異なる解剖学的構造のための階層構造を明示的に構築することに由来する。 Adam-v2は、その埋め込みにおいて解剖学的多様性と調和のセマンティックバランスを保ち、既存のSSLメソッドでは見落とされてきた、汎用的かつ意味的に有意義な表現をもたらす。すべてのコードと事前訓練されたモデルはhttps://github.com/JLiangLab/Edenで入手できる。

Humans effortlessly interpret images by parsing them into part-whole hierarchies; deep learning excels in learning multi-level feature spaces, but they often lack explicit coding of part-whole relations, a prominent property of medical imaging. To overcome this limitation, we introduce Adam-v2, a new self-supervised learning framework extending Adam [79] by explicitly incorporating part-whole hierarchies into its learning objectives through three key branches: (1) Localizability, acquiring discriminative representations to distinguish different anatomical patterns; (2) Composability, learning each anatomical structure in a parts-to-whole manner; and (3) Decomposability, comprehending each anatomical structure in a whole-to-parts manner. Experimental results across 10 tasks, compared to 11 baselines in zero-shot, few-shot transfer, and full fine-tuning settings, showcase Adam-v2's superior performance over large-scale medical models and existing SSL methods across diverse downstream tasks. The higher generality and robustness of Adam-v2's representations originate from its explicit construction of hierarchies for distinct anatomical structures from unlabeled medical images. Adam-v2 preserves a semantic balance of anatomical diversity and harmony in its embedding, yielding representations that are both generic and semantically meaningful, yet overlooked in existing SSL methods. All code and pretrained models are available at https://github.com/JLiangLab/Eden.

翻訳日:2024-04-26 19:59:41 公開日:2024-04-24

# Augmented CARDS: Twitter上の気候変動の誤報のトリガーを特定する機械学習アプローチ

Augmented CARDS: A machine learning approach to identifying triggers of climate change misinformation on Twitter ( http://arxiv.org/abs/2404.15673v1 )

ライセンス: Link先を確認

Cristian Rojas, Frank Algra-Maschio, Mark Andrejevic, Travis Coan, John Cook, Yuan-Fang Li,

(参考訳) 気候変動に関する誤報は、社会的幸福への重大な脅威となり、効果的な緩和戦略が緊急に必要となる。しかし、ソーシャルメディアプラットフォーム上でのオンライン誤報の急増は、ファクトチェッカーが虚偽の主張を軽視する能力を上回っている。気候変動の誤報の自動検出は、有望な解決策を提供する。本研究では,2段階の階層モデルであるAugmented CARDSモデルを開発することにより,このギャップに対処する。さらに、2022年の6ヶ月間に500万件の気候をテーマとしたツイートに対して、Augmented CARDSモデルを適用した。 Twitter上での温暖化に関する主張の半分以上は、気候のアクターや陰謀説に対する攻撃が関与していることがわかりました。気候コントラリアニズムのスパイクは、政治イベント、自然イベント、コントラリアンインフルエンサー、あるいは説得力のあるインフルエンサーの4つの刺激の1つと一致する。気候の誤報に対する自動応答の意義について論じる。

Misinformation about climate change poses a significant threat to societal well-being, prompting the urgent need for effective mitigation strategies. However, the rapid proliferation of online misinformation on social media platforms outpaces the ability of fact-checkers to debunk false claims. Automated detection of climate change misinformation offers a promising solution. In this study, we address this gap by developing a two-step hierarchical model, the Augmented CARDS model, specifically designed for detecting contrarian climate claims on Twitter. Furthermore, we apply the Augmented CARDS model to five million climate-themed tweets over a six-month period in 2022. We find that over half of contrarian climate claims on Twitter involve attacks on climate actors or conspiracy theories. Spikes in climate contrarianism coincide with one of four stimuli: political events, natural events, contrarian influencers, or convinced influencers. Implications for automated responses to climate misinformation are discussed.

翻訳日:2024-04-26 19:59:40 公開日:2024-04-24

# Chain-of-Thoughtを超えて: LLMにおけるChain-of-Xパラダイムのサーベイ

Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs ( http://arxiv.org/abs/2404.15676v1 )

ライセンス: Link先を確認

Yu Xia, Rui Wang, Xu Liu, Mingyan Li, Tong Yu, Xiang Chen, Julian McAuley, Shuai Li,

(参考訳) CoT(Chain-of-Thought)は、大規模言語モデル(LLM)の印象的な推論能力を引き出す、広く採用されているプロンプト手法である。 CoTのシーケンシャルな思考構造に触発されて、様々な領域やLLMを含むタスクにまたがる様々な課題に対処するために、多くのChain-of-X(CoX)手法が開発されている。本稿では,異なる文脈におけるLLMの連鎖-X法に関する包括的調査を行う。具体的には、ノードの分類、すなわち、CoXのXとアプリケーションタスクで分類する。また,既存のCoX手法の発見と意義,今後の方向性についても論じる。我々の調査は、より広いシナリオにCoTのアイデアを適用したい研究者のための、詳細かつ最新のリソースとして機能することを目的としています。

Chain-of-Thought (CoT) has been a widely adopted prompting method, eliciting impressive reasoning abilities of Large Language Models (LLMs). Inspired by the sequential thought structure of CoT, a number of Chain-of-X (CoX) methods have been developed to address various challenges across diverse domains and tasks involving LLMs. In this paper, we provide a comprehensive survey of Chain-of-X methods for LLMs in different contexts. Specifically, we categorize them by taxonomies of nodes, i.e., the X in CoX, and application tasks. We also discuss the findings and implications of existing CoX methods, as well as potential future directions. Our survey aims to serve as a detailed and up-to-date resource for researchers seeking to apply the idea of CoT to broader scenarios.

翻訳日:2024-04-26 19:59:40 公開日:2024-04-24

# CharacterFactory: 拡散モデルのためのGANを用いた一貫性キャラクタのサンプリング

CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models ( http://arxiv.org/abs/2404.15677v1 )

ライセンス: Link先を確認

Qinghe Wang, Baolu Li, Xiaomin Li, Bing Cao, Liqian Ma, Huchuan Lu, Xu Jia,

(参考訳) 近年のテキスト・ツー・イメージ・モデルの発展は、人中心世代における新たなフロンティアを開拓している。しかし、これらのモデルは、一貫した新しいIDを持つ画像を生成するために直接利用することはできない。本研究では,拡散モデルのためのGANの潜時空間における一貫した同一性を持つ新しい文字をサンプリングするフレームワークである CharacterFactory を提案する。より具体的には、セレブ名の埋め込みという言葉をアイデンティティ一貫性のある生成タスクの基礎的真実とみなし、GANモデルを訓練して、潜在空間からセレブ埋め込み空間へのマッピングを学習する。さらに、生成したアイデンティティ埋め込みが、様々なコンテキストにおいて、アイデンティティ一貫性のある画像を生成することができるように、コンテキスト一貫性損失を設計する。注目すべきは、モデル全体がトレーニングに10分しかかからず、推論中に無限の文字をエンドツーエンドにサンプリングできることだ。広範囲な実験により, 文字生成におけるキャラクタファクトリーの性能は, アイデンティティの整合性と編集性に優れていた。さらに、生成された文字は、オフザシェルフ画像/ビデオ/3D拡散モデルとシームレスに結合することができる。我々は、提案した CharacterFactory が、アイデンティティ一貫性のある文字生成の重要なステップであると信じている。プロジェクトページは、https://qinghew.github.io/CharacterFactory/ で公開されている。

Recent advances in text-to-image models have opened new frontiers in human-centric generation. However, these models cannot be directly employed to generate images with consistent newly coined identities. In this work, we propose CharacterFactory, a framework that allows sampling new characters with consistent identities in the latent space of GANs for diffusion models. More specifically, we consider the word embeddings of celeb names as ground truths for the identity-consistent generation task and train a GAN model to learn the mapping from a latent space to the celeb embedding space. In addition, we design a context-consistent loss to ensure that the generated identity embeddings can produce identity-consistent images in various contexts. Remarkably, the whole model only takes 10 minutes for training, and can sample infinite characters end-to-end during inference. Extensive experiments demonstrate excellent performance of the proposed CharacterFactory on character creation in terms of identity consistency and editability. Furthermore, the generated characters can be seamlessly combined with the off-the-shelf image/video/3D diffusion models. We believe that the proposed CharacterFactory is an important step for identity-consistent character generation. Project page is available at: https://qinghew.github.io/CharacterFactory/.

翻訳日:2024-04-26 19:59:40 公開日:2024-04-24

# レジデントパワー, 不正自動化: 自動意思決定システムにおける正当性を無視する問題

Legitimate Power, Illegitimate Automation: The problem of ignoring legitimacy in automated decision systems ( http://arxiv.org/abs/2404.15680v1 )

ライセンス: Link先を確認

Jake Stone, Brent Mittelstadt,

(参考訳) 機械学習と人工知能の進歩は、自動意思決定システム(ADS)の普及を加速させた。広範な文献は、これらのシステムの決定が公正であるためには、どのような条件を満たさなければならないかを探求している。しかし、ADSの支配者がなぜそのような決定を下す権利があるのかという正当性に関する疑問は、比較的ほとんど注目されていない。この論文は、そのような疑問が提起された場合、しばしば、公的な受容または公正性、正確性、専門性、効率といった他の実体的価値と正当性を誤って説明することを示しています。より良い理論を求めて、我々は国家の正当性について哲学文学を批判的に分析し、同意、公理、民主的な権威に焦点をあてる。この分析は、分析政治哲学における正当性に対する一般的な理解もまた、ADSが正当であるか否かの確定に不適であることを示している。そこで本論文は,ADSの正当性理論への期待を明らかにするとともに,今後の研究プログラムへの道筋を示す。

Progress in machine learning and artificial intelligence has spurred the widespread adoption of automated decision systems (ADS). An extensive literature explores what conditions must be met for these systems' decisions to be fair. However, questions of legitimacy -- why those in control of ADS are entitled to make such decisions -- have received comparatively little attention. This paper shows that when such questions are raised, theorists often incorrectly conflate legitimacy with either public acceptance or other substantive values such as fairness, accuracy, expertise or efficiency. In search of better theories, we conduct a critical analysis of the philosophical literature on the legitimacy of the state, focusing on consent, public reason, and democratic authorisation. This analysis reveals that the prevailing understanding of legitimacy in analytical political philosophy is also ill-suited to the task of establishing whether and when ADS are legitimate. The paper thus clarifies expectations for theories of ADS legitimacy and charts a path for a future research programme on the topic.

翻訳日:2024-04-26 19:59:40 公開日:2024-04-24

# 生成事前学習変圧器モデルを用いた暗号ハッシュ関数実装のソースコード変数の自動生成

Automated Creation of Source Code Variants of a Cryptographic Hash Function Implementation Using Generative Pre-Trained Transformer Models ( http://arxiv.org/abs/2404.15681v1 )

ライセンス: Link先を確認

Elijah Pelofske, Vincent Urias, Lorie M. Liebrock,

(参考訳) ジェネレーティブ・プレトレーニング・トランスフォーマー(Generative Pre-trained Transformer、GPT)は、新鮮で一貫性のある自然言語を生成できる大規模言語機械学習モデルの一種である。本研究では, 暗号ハッシュ関数SHA-1の実装において, GPTモデルが新規かつ適切なバージョン, 特に非常に安全でないバージョンを生成する能力について検討した。 GPTモデルLlama-2-70b-chat-h、Mistral-7B-Instruct-v0.1、zephyr-7b-alphaが使用される。 GPTモデルは、ローカルGPTフレームワークとlangchainの修正版を使用して各関数を再書き込みするよう促され、完全なソースコードとヘッダファイルのワード埋め込みコンテキストをモデルに提供し、130,000以上の関数がGPT出力のテキストブロックを書き換え、そのうち約4万がCコードとして解析され、コンパイルされた。生成されたコードは、コンパイル可能であり、アルゴリズムの正しさ、メモリリーク、コンパイラ最適化の安定性、参照実装までの文字距離を解析する。注目すべきは、いくつかの生成された関数変種は、いくつかのテストベクターに対して正しいが、他のテストベクターでは正しくないという高い実装上のセキュリティリスクがあることである。さらに、多くの関数の実装は、SHA-1の参照アルゴリズムに正確ではなく、ハッシュ関数の基本的な特徴を持つハッシュを生成した。関数の再書き込みの多くは、メモリリーク、整数オーバーフロー、バウンダリアクセス、初期化されていない値の使用、コンパイラの最適化不安定といった深刻な欠陥を含んでいた。コンパイラの最適化設定とコンパイル済みバイナリのSHA-256ハッシュチェックサムは、同等だが同一の構文を持たない実装に使用される。

Generative pre-trained transformers (GPTs) are a type of large language machine learning model that is unusually adept at producing novel and coherent natural language. In this study, the ability of GPT models to generate novel and correct versions, and notably very insecure versions, of implementations of the cryptographic hash function SHA-1 is examined. The GPT models Llama-2-70b-chat-h, Mistral-7B-Instruct-v0.1, and zephyr-7b-alpha are used. The GPT models are prompted to re-write each function using a modified version of the localGPT framework and langchain to provide word embedding context of the full source code and header files to the model, resulting in over 130,000 function re-write GPT output text blocks, approximately 40,000 of which could be parsed as C code and subsequently compiled. The generated code is analyzed for compilability, algorithmic correctness, memory leaks, compiler optimization stability, and character distance to the reference implementation. Remarkably, several generated function variants carry a high implementation security risk: they are correct for some test vectors but incorrect for others. Additionally, many function implementations were not correct with respect to the reference algorithm of SHA-1, but produced hashes that have some of the basic characteristics of hash functions. Many of the function re-writes contained serious flaws such as memory leaks, integer overflows, out-of-bounds accesses, use of uninitialised values, and compiler optimization instability. Compiler optimization settings and SHA-256 hash checksums of the compiled binaries are used to cluster implementations that are equivalent but may not have identical syntax; using this clustering, over 100,000 novel and correct versions of the SHA-1 codebase were generated where each component C function of the reference implementation is different from the original code.
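
The risk described above (a variant that passes some test vectors but fails others) can be illustrated with a minimal Python sketch. This is not the paper's C test harness: `flawed_sha1` is a hypothetical buggy variant invented here for illustration, and Python's `hashlib` is assumed as the trusted reference.

```python
import hashlib

# Two well-known SHA-1 test vectors (message -> expected hex digest).
TEST_VECTORS = {
    b"": "da39a3ee5e6b4b0d3255bfef95601890afd80709",
    b"abc": "a9993e364706816aba3e25717850c26c9cd0d89d",
}

# A longer message whose expected digest is taken from the trusted reference.
LONG_MESSAGE = b"a" * 1000
TEST_VECTORS[LONG_MESSAGE] = hashlib.sha1(LONG_MESSAGE).hexdigest()


def reference_sha1(message: bytes) -> str:
    """Trusted reference implementation."""
    return hashlib.sha1(message).hexdigest()


def flawed_sha1(message: bytes) -> str:
    """Hypothetical buggy variant: only the first 3 bytes are hashed."""
    return hashlib.sha1(message[:3]).hexdigest()


def failing_vectors(implementation):
    """Return the messages for which the implementation's digest is wrong."""
    return [
        message
        for message, expected in TEST_VECTORS.items()
        if implementation(message) != expected
    ]


if __name__ == "__main__":
    print("reference failures:", len(failing_vectors(reference_sha1)))
    print("flawed failures:", len(failing_vectors(flawed_sha1)))
```

The flawed variant agrees with the reference on the two short vectors, so a test suite containing only short messages would accept it; only the longer message exposes the defect.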

翻訳日:2024-04-26 19:59:40 公開日:2024-04-24

# AnoFPDM:脳MRIにおける拡散モデルの前方プロセスによる異常セグメンテーション

AnoFPDM: Anomaly Segmentation with Forward Process of Diffusion Models for Brain MRI ( http://arxiv.org/abs/2404.15683v1 )

ライセンス: Link先を確認

Yiming Che, Fazle Rafsani, Jay Shah, Md Mahfuzur Rahman Siddiquee, Teresa Wu,

(参考訳) 画像レベルのラベルを活かした異常セグメンテーションにおける弱教師付き拡散モデル(DM)は、教師なし手法に比べて優れた性能で注目されている。トレーニングにおけるピクセルレベルのラベルの必要性を排除し、教師付きメソッドよりもコスト効率の良い代替手段を提供する。しかし、既存の手法は、推論におけるハイパーパラメータチューニングのためのコストのかかるピクセルレベルのラベルに大きく依存するため、完全には教師されない。この課題に対処するために、ピクセルレベルのラベルを必要とせずに動作する、完全に弱い教師付きフレームワークであるAnoFPDM(Anomaly Segmentation with Forward Process of Diffusion Models)を導入する。入力画像毎のノイズスケールとしきい値として,未案内前処理を基準として,適切なハイパーパラメータを同定する。前方プロセスの各ステップから異常マップを集約し,異常領域の信号強度を高める。また,提案手法は,画素レベルのラベルを使わずに,最新の最先端の弱教師付きアプローチよりも優れていた。

Weakly-supervised diffusion models (DMs) in anomaly segmentation, which leverage image-level labels, have attracted significant attention for their superior performance compared to unsupervised methods. They eliminate the need for pixel-level labels in training, offering a more cost-effective alternative to supervised methods. However, existing methods are not fully weakly-supervised because they heavily rely on costly pixel-level labels for hyperparameter tuning in inference. To tackle this challenge, we introduce Anomaly Segmentation with Forward Process of Diffusion Models (AnoFPDM), a fully weakly-supervised framework that operates without the need for pixel-level labels. Leveraging the unguided forward process as a reference, we identify suitable hyperparameters, i.e., noise scale and threshold, for each input image. We aggregate anomaly maps from each step in the forward process, enhancing the signal strength of anomalous regions. Remarkably, our proposed method outperforms recent state-of-the-art weakly-supervised approaches, even without utilizing pixel-level labels.

翻訳日:2024-04-26 19:59:40 公開日:2024-04-24

# 能動光強度干渉法による超高分解能イメージング

Super-resolution imaging based on active optical intensity interferometry ( http://arxiv.org/abs/2404.15685v1 )

ライセンス: Link先を確認

Lu-Chuan Liu, Cheng Wu, Wei Li, Yu-Ao Chen, Frank Wilczek, Xiao-Peng Shao, Feihu Xu, Qiang Zhang, Jian-Wei Pan,

(参考訳) インターフェロメトリーによる長基線回折制限光開口合成技術は、科学的研究や実用化において重要な役割を担っている。振幅(位相)干渉法とは対照的に、熱光の光子束効果を測定するために光の量子的性質を利用する強度干渉法は、大気の乱流や光学的欠陥に対して堅牢である。しかし、熱光源は典型的には大きなばらつき角を持ち、モード毎の平均光子数は低く、長い範囲で適用可能である。そこで本研究では,超高分解能イメージングのための能動強度干渉法を提案し,実演する。本手法では、位相非依存の複数のレーザーエミッタを用いて熱照射を発生させ、精巧な計算アルゴリズムを用いて画像の再構成を行う。屋外環境では、1つの望遠鏡の14倍の回折限界の解像度で、1.36km以上の2次元ミリレベルのターゲットを撮像する。高分解能な光学イメージングとセンシングは、物理学と気象学の一般的な分野に長基線能動強度干渉法を適用することで期待できる。

Long baseline diffraction-limited optical aperture synthesis technology by interferometry plays an important role in scientific study and practical application. In contrast to amplitude (phase) interferometry, intensity interferometry -- which exploits the quantum nature of light to measure the photon bunching effect in thermal light -- is robust against atmospheric turbulence and optical defects. However, a thermal light source typically has a significant divergence angle and a low average photon number per mode, forestalling the applicability over long ranges. Here, we propose and demonstrate active intensity interferometry for super-resolution imaging over the kilometer range. Our scheme exploits phase-independent multiple laser emitters to produce the thermal illumination and uses an elaborate computational algorithm to reconstruct the image. In outdoor environments, we image two-dimensional millimeter-level targets over 1.36 kilometers at a resolution of 14 times the diffraction limit of a single telescope. High-resolution optical imaging and sensing are anticipated by applying long-baseline active intensity interferometry in general branches of physics and metrology.

翻訳日:2024-04-26 19:59:40 公開日:2024-04-24

# 差分プライバシーにおける雑音分散最適化 : インスタンスごとの差分プライバシーによるゲーム理論的アプローチ

Noise Variance Optimization in Differential Privacy: A Game-Theoretic Approach Through Per-Instance Differential Privacy ( http://arxiv.org/abs/2404.15686v1 )

ライセンス: Link先を確認

Sehyun Ryu, Jonggyu Jang, Hyun Jong Yang,

(参考訳) 差分プライバシー(DP)の概念は、個人をターゲットデータセットに含めることによる分布の変化を観察することにより、プライバシー損失を定量的に測定することができる。一般的に制約として使用されるDPは、AppleやGoogleのような業界巨人の機械学習におけるデータセットの保護において際立っている。 DPを保証する一般的な手法は、クエリ出力に適切なノイズを組み込むことで、会員推測やリンク攻撃といったプライバシー攻撃に対する統計的防御システムを確立することである。しかし、特に小さなデータセットの場合、既存のDPメカニズムは時にクエリ出力に過剰なノイズを加え、データユーティリティを破棄する。これは、従来のDPが最悪のシナリオ、すなわち統計的外れ値に基づいてプライバシー損失を計算するためである。本研究では、この課題に対処するために、インスタンスごとのDP(pDP)を制約として使用し、各データインスタンスのプライバシ損失を測定し、個々のインスタンスに合わせたノイズを最適化する。簡単に言えば、NVO(Per-instance noise variance Optimization)ゲームは共通の興味のある逐次ゲームとしてフレーム化されており、Nash equilibrium(NE)ポイントが本質的にすべてのデータインスタンスに対してpDPを保証していることを示す。提案したpDPアルゴリズムは, 従来のDPアルゴリズムと比較すると, KLのばらつきから平均99.53%の性能向上を示した。

The concept of differential privacy (DP) can quantitatively measure privacy loss by observing the changes in the distribution caused by the inclusion of individuals in the target dataset. DP, which is generally used as a constraint, has been prominent in safeguarding datasets in machine learning at industry giants like Apple and Google. A common methodology for guaranteeing DP is incorporating appropriate noise into query outputs, thereby establishing statistical defense systems against privacy attacks such as membership inference and linkage attacks. However, especially for small datasets, existing DP mechanisms occasionally add an excessive amount of noise to the query output, thereby discarding data utility. This is because traditional DP computes privacy loss based on the worst-case scenario, i.e., statistical outliers. In this work, to tackle this challenge, we utilize per-instance DP (pDP) as a constraint, measuring privacy loss for each data instance and optimizing noise tailored to individual instances. In a nutshell, we propose a per-instance noise variance optimization (NVO) game, framed as a common interest sequential game, and show that its Nash equilibrium (NE) points inherently guarantee pDP for all data instances. Through extensive experiments, our proposed pDP algorithm demonstrated an average performance improvement of up to 99.53% compared to the conventional DP algorithm in terms of KL divergence.
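
As background for the noise-adding step mentioned above, here is a minimal sketch of the classical Laplace mechanism for a counting query. It illustrates conventional worst-case DP only, not the paper's per-instance NVO game; the function names and the sample data are assumptions made for this example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Answer a counting query with epsilon-DP via the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one record changes a count by at most 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [23, 35, 41, 29, 52, 67, 38]  # hypothetical small dataset
    for epsilon in (0.1, 1.0, 10.0):
        noisy = private_count(ages, lambda age: age >= 40, epsilon, rng)
        print(f"epsilon={epsilon}: noisy count = {noisy:.2f}")
```

With a dataset this small, the noise at low epsilon can easily exceed the true count itself, which is the utility problem the abstract attributes to worst-case calibration.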

翻訳日:2024-04-26 19:59:40 公開日:2024-04-24

# 脆弱性検出のためのグラフニューラルネットワークの提案

Graph Neural Networks for Vulnerability Detection: A Counterfactual Explanation ( http://arxiv.org/abs/2404.15687v1 )

ライセンス: Link先を確認

Zhaoyang Chu, Yao Wan, Qian Li, Yang Wu, Hongyu Zhang, Yulei Sui, Guandong Xu, Hai Jin,

(参考訳) 脆弱性検出は、ソフトウェアシステムのセキュリティと信頼性を確保するために不可欠である。最近、Graph Neural Networks(GNN)は、ソースコードの基盤となるセマンティック構造をキャプチャする能力のため、脆弱性検出のための顕著なコード埋め込みアプローチとして登場した。しかし、GNNは本質的にブラックボックスの性質のため、説明可能性において重大な課題に直面している。この目的のために、いくつかの事実推論に基づく説明器が提案されている。これらの説明者は、結果に寄与する主要な特徴を分析することによって、GNNによる予測について説明する。コードグラフを代替構造に変更するならば、GNNの決定はどうなるのか? 人工知能における反ファクト推論の進歩に触発されて、GNNベースの脆弱性検出のための新しい反ファクト説明器CFExplainerを提案する。事実推論ベースの説明器とは異なり、CFExplainerは入力コードグラフに対する最小限の摂動を求め、予測が変更される。検出された脆弱性の根本原因を特定し、開発者が脆弱性を修正するための適切なアクションを実行するための貴重な洞察を与えることができる。 4つのGNNベースの脆弱性検出モデルに対する大規模な実験は、既存の最先端の事実推論に基づく説明器に対するCFExplainerの有効性を示している。

Vulnerability detection is crucial for ensuring the security and reliability of software systems. Recently, Graph Neural Networks (GNNs) have emerged as a prominent code embedding approach for vulnerability detection, owing to their ability to capture the underlying semantic structure of source code. However, GNNs face significant challenges in explainability due to their inherently black-box nature. To this end, several factual reasoning-based explainers have been proposed. These explainers provide explanations for the predictions made by GNNs by analyzing the key features that contribute to the outcomes. We argue that these factual reasoning-based explanations cannot answer critical what-if questions: What would happen to the GNN's decision if we were to alter the code graph into alternative structures? Inspired by advancements of counterfactual reasoning in artificial intelligence, we propose CFExplainer, a novel counterfactual explainer for GNN-based vulnerability detection. Unlike factual reasoning-based explainers, CFExplainer seeks the minimal perturbation to the input code graph that leads to a change in the prediction, thereby addressing the what-if questions for vulnerability detection. We term this perturbation a counterfactual explanation, which can pinpoint the root causes of the detected vulnerability and furnish valuable insights for developers to undertake appropriate actions for fixing the vulnerability. Extensive experiments on four GNN-based vulnerability detection models demonstrate the effectiveness of CFExplainer over existing state-of-the-art factual reasoning-based explainers.
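
The idea of a counterfactual explanation as a minimal perturbation can be sketched on a toy graph. The sketch below is an assumption-laden illustration, not CFExplainer itself: the detector is a hand-written reachability rule rather than a GNN, the node names are invented, and the exhaustive search over edge removals merely stands in for whatever search procedure the paper uses.

```python
from itertools import combinations


def reachable(edges, source, target):
    """Depth-first search: is `target` reachable from `source`?"""
    adjacency = {}
    for start, end in edges:
        adjacency.setdefault(start, []).append(end)
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return False


def toy_detector(edges):
    """Hypothetical detector: 'vulnerable' if user input flows to memcpy."""
    return reachable(edges, "user_input", "memcpy")


def minimal_counterfactual(edges, predict):
    """Smallest set of edges whose removal changes the prediction."""
    edges = list(edges)
    original = predict(edges)
    for size in range(1, len(edges) + 1):
        for removed in combinations(edges, size):
            remaining = [edge for edge in edges if edge not in removed]
            if predict(remaining) != original:
                return set(removed)
    return None  # no edge removal flips the prediction


if __name__ == "__main__":
    code_graph = [
        ("user_input", "length"),
        ("length", "memcpy"),
        ("user_input", "buffer"),
        ("buffer", "log"),
    ]
    print(minimal_counterfactual(code_graph, toy_detector))
```

The returned edge set plays the role of the explanation: it names the part of the graph without which the detector would no longer flag the code.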

翻訳日:2024-04-26 19:59:40 公開日:2024-04-24

# ニューラルプロトランゲージ再建術

Neural Proto-Language Reconstruction ( http://arxiv.org/abs/2404.15690v1 )

ライセンス: Link先を確認

Chenxuan Cui, Ying Chen, Qinxin Wang, David R. Mortensen,

(参考訳) プロトフォームの再構築は言語学者にとって面倒なプロセスだった。近年,このプロセスを自動化するためにRNNやTransformerなどの計算モデルが提案されている。本稿では,データ拡張による失明反射の回復,トランスフォーマーモデルへのVAE構造の追加,再構成作業のためのニューラルマシン翻訳モデルなど,従来の手法を改善するために3つのアプローチを採っている。付加的なVAE構造により、TransformerモデルはWikiHanデータセットのパフォーマンスが向上し、データ拡張ステップがトレーニングを安定化することがわかった。

Proto-form reconstruction has been a painstaking process for linguists. Recently, computational models such as RNN and Transformers have been proposed to automate this process. We take three different approaches to improve upon previous methods, including data augmentation to recover missing reflexes, adding a VAE structure to the Transformer model for proto-to-language prediction, and using a neural machine translation model for the reconstruction task. We find that with the additional VAE structure, the Transformer model has a better performance on the WikiHan dataset, and the data augmentation step stabilizes the training.

翻訳日:2024-04-26 19:59:40 公開日:2024-04-24

# 長期オフポリティ評価と学習

Long-term Off-Policy Evaluation and Learning ( http://arxiv.org/abs/2404.15691v1 )

ライセンス: Link先を確認

Yuta Saito, Himan Abdollahpouri, Jesse Anderton, Ben Carterette, Mounia Lalmas,

(参考訳) アルゴリズムの短期的および長期的な結果はしばしば異なり、下流効果を損なう。クリックベイトアルゴリズムは、短期的なクリックを増加させるが、長期的なユーザーエンゲージメントを損なう可能性がある。長期的な結果を推定する可能な解決策は、潜在的なアルゴリズムに対するオンライン実験またはA/Bテストを実行することであるが、関心の長期的な結果を見るのに数ヶ月またはそれ以上の時間がかかるため、アルゴリズムの選択プロセスは受け入れがたいほど遅くなる。そこで本研究では, 歴史的および短期的な実験データのみを用いて, アルゴリズムの長期的結果の推定を可能かつ正確に行う問題について検討した。既存のアプローチでは、サロガシーと呼ばれる短期的な結果に関する制限的な仮定が必要か、あるいは非効率な短期的な結果を有効に利用することができない。そこで本稿では,報酬関数の分解に基づく長期オフライン評価(LOPE)という新しいフレームワークを提案する。 LOPEは、代理よりもリラックスした仮定の下で機能し、短時間の報酬を効果的に活用して、分散を大幅に減少させる。合成実験により、LOPEは、特にサロゲーシーが厳しく違反し、長期報酬がうるさい場合に、既存のアプローチよりも優れていることが示された。さらに,音楽ストリーミングプラットフォーム上で収集された大規模A/Bテストデータに対する実世界の実験により,LOPEは既存の実現可能な手法よりも,実際のアルゴリズムの長期的な結果をより正確に推定できることを示した。

Short- and long-term outcomes of an algorithm often differ, with damaging downstream effects. A known example is a click-bait algorithm, which may increase short-term clicks but damage long-term user engagement. A possible solution to estimate the long-term outcome is to run an online experiment or A/B test for the potential algorithms, but it takes months or even longer to observe the long-term outcomes of interest, making the algorithm selection process unacceptably slow. This work thus studies the problem of feasibly yet accurately estimating the long-term outcome of an algorithm using only historical and short-term experiment data. Existing approaches to this problem either need a restrictive assumption about the short-term outcomes called surrogacy or cannot effectively use short-term outcomes, which is inefficient. Therefore, we propose a new framework called Long-term Off-Policy Evaluation (LOPE), which is based on reward function decomposition. LOPE works under a more relaxed assumption than surrogacy and effectively leverages short-term rewards to substantially reduce the variance. Synthetic experiments show that LOPE outperforms existing approaches particularly when surrogacy is severely violated and the long-term reward is noisy. In addition, real-world experiments on large-scale A/B test data collected on a music streaming platform show that LOPE can estimate the long-term outcome of actual algorithms more accurately than existing feasible methods.

翻訳日:2024-04-26 19:59:40 公開日:2024-04-24

# MRIの高速化とロバスト化のための深層学習

Deep Learning for Accelerated and Robust MRI Reconstruction: a Review ( http://arxiv.org/abs/2404.15692v1 )

ライセンス: Link先を確認

Reinhard Heckel, Mathews Jacob, Akshay Chaudhari, Or Perlman, Efrat Shimron,

(参考訳) 深層学習(DL)は、放射線診断において重要なツールであるMRI(MRI)の強化のための重要な技術として最近登場した。本稿では,MRI再建のためのDLの最近の進歩について概説する。画質を改善し、スキャンを加速し、データ関連の課題に対処するために設計されたDLアプローチとアーキテクチャに焦点を当てている。その中には、エンドツーエンドのニューラルネットワーク、事前訓練されたネットワーク、生成モデル、自己管理手法などが含まれる。また,DLが獲得プロトコルの最適化,分散シフトに対する堅牢性の向上,微妙なバイアスに対処する上で果たす役割についても論じる。広範にわたる文献と実践的洞察に基づいて、MRI再建におけるDLの活用における現在の成功、限界、今後の方向性を概説し、臨床画像の実践に大きな影響を与えるDLの可能性を強調した。

Deep learning (DL) has recently emerged as a pivotal technology for enhancing magnetic resonance imaging (MRI), a critical tool in diagnostic radiology. This review paper provides a comprehensive overview of recent advances in DL for MRI reconstruction. It focuses on DL approaches and architectures designed to improve image quality, accelerate scans, and address data-related challenges. These include end-to-end neural networks, pre-trained networks, generative models, and self-supervised methods. The paper also discusses the role of DL in optimizing acquisition protocols, enhancing robustness against distribution shifts, and tackling subtle bias. Drawing on the extensive literature and practical insights, it outlines current successes, limitations, and future directions for leveraging DL in MRI reconstruction, while emphasizing the potential of DL to significantly impact clinical imaging practices.

翻訳日:2024-04-26 19:59:40 公開日:2024-04-24

# DeepFeatureX Net: Deep Features eXtractors based Network for discriminating synthetic from real images

DeepFeatureX Net: Deep Features eXtractors based Network for discriminating synthetic from real images ( http://arxiv.org/abs/2404.15697v1 )

ライセンス: Link先を確認

Orazio Pontorno, Luca Guarnera, Sebastiano Battiato,

(参考訳) ディープラーニングアルゴリズムによって生成された合成画像であるDeepfakesは、Digital Forensicsの分野における最大の課題の1つだ。科学コミュニティは、デジタル画像(リアルまたはAI生成)の起源を識別できるアプローチの開発に取り組んでいる。しかし、これらの手法は、訓練中に見えないアーキテクチャによって生成されたとしても、画像の性質を識別する能力という一般化の課題に直面している。これは通常パフォーマンスの低下につながる。この文脈では,ベースモデルと呼ばれる3つのブロックをベースとした新しいアプローチを提案し,各ブロックは,意図的不均衡なデータセットを活用することによって,特定の画像クラスの識別的特徴(拡散モデル生成,GAN生成,あるいは実)を抽出する。そして、各ブロックから抽出された特徴を連結処理し、入力画像の原点を識別する。実験結果から,この手法はJPEG圧縮に優れたロバスト性を示すだけでなく,いくつかの一般化テストにおいて最先端の手法よりも優れていることが示された。コード、モデル、データセットはhttps://github.com/opontorno/block-based_deepfake-detectionで確認できる。

Deepfakes, synthetic images generated by deep learning algorithms, represent one of the biggest challenges in the field of Digital Forensics. The scientific community is working to develop approaches that can discriminate the origin of digital images (real or AI-generated). However, these methodologies face the challenge of generalization, that is, the ability to discern the nature of an image even if it is generated by an architecture not seen during training. This usually leads to a drop in performance. In this context, we propose a novel approach based on three blocks called Base Models, each of which is responsible for extracting the discriminative features of a specific image class (Diffusion Model-generated, GAN-generated, or real) as it is trained by exploiting deliberately unbalanced datasets. The features extracted from each block are then concatenated and processed to discriminate the origin of the input image. Experimental results showed that this approach not only demonstrates good robustness to JPEG compression but also outperforms state-of-the-art methods in several generalization tests. Code, models and dataset are available at https://github.com/opontorno/block-based_deepfake-detection.

翻訳日:2024-04-26 19:59:40 公開日:2024-04-24

# MAS-SAM: 群集した特徴を持つ海洋動物を隔離する

MAS-SAM: Segment Any Marine Animal with Aggregated Features ( http://arxiv.org/abs/2404.15700v1 )

ライセンス: Link先を確認

Tianyu Yan, Zifu Wan, Xinhao Deng, Pingping Zhang, Yang Liu, Huchuan Lu,

(参考訳) 近年、SAM(Segment Anything Model)は、高品質なオブジェクトマスクを生成し、ゼロショット画像のセグメンテーションを実現する際、例外的な性能を示す。しかし、多用途視覚モデルとして、SAMは主に大規模な自然光画像で訓練されている。水中のシーンでは、光散乱と吸収により性能が著しく低下する。一方、SAMのデコーダの単純さは、きめ細かいオブジェクトの詳細を損なう可能性がある。以上の課題に対処するため,海洋動物セグメンテーションのためのMAS-SAMという新しい特徴学習フレームワークを提案する。より具体的には、水中シーン用の効果的なアダプタを備えたSAMエンコーダを最初に構築する。次に,ハイパーマップ抽出モジュール (HEM) を導入し,包括的ガイダンスのためのマルチスケール機能を生成する。最後に,マルチスケール特徴を集約し,最終的なセグメンテーション結果を予測するプログレッシブ予測デコーダ(PPD)を提案する。本研究では,Fusion Attention Module (FAM) を移植することにより,グローバルな文脈的手がかりからよりリッチな海洋情報をよりきめ細かな局所的詳細まで抽出することができる。 4つのパブリックMASデータセットに対する大規模な実験により、我々のMAS-SAMは、他の典型的なセグメンテーション手法よりも優れた結果が得られることを示した。ソースコードはhttps://github.com/Drchip61/MAS-SAMで入手できる。

Recently, the Segment Anything Model (SAM) has shown exceptional performance in generating high-quality object masks and achieving zero-shot image segmentation. However, as a versatile vision model, SAM is primarily trained on large-scale natural light images. In underwater scenes, it exhibits substantial performance degradation due to light scattering and absorption. Meanwhile, the simplicity of SAM's decoder might lead to the loss of fine-grained object details. To address the above issues, we propose a novel feature learning framework named MAS-SAM for marine animal segmentation, which involves integrating effective adapters into SAM's encoder and constructing a pyramidal decoder. More specifically, we first build a new SAM encoder with effective adapters for underwater scenes. Then, we introduce a Hypermap Extraction Module (HEM) to generate multi-scale features for comprehensive guidance. Finally, we propose a Progressive Prediction Decoder (PPD) to aggregate the multi-scale features and predict the final segmentation results. When grafted with the Fusion Attention Module (FAM), our method can extract richer marine information, from global contextual cues to fine-grained local details. Extensive experiments on four public MAS datasets demonstrate that our MAS-SAM can obtain better results than other typical segmentation methods. The source code is available at https://github.com/Drchip61/MAS-SAM.

翻訳日:2024-04-26 19:49:57 公開日:2024-04-24

# Nyonic Technical Report

Nyonic Technical Report ( http://arxiv.org/abs/2404.15702v1 )

ライセンス: Link先を確認

Junfeng Tian, Rui Wang, Cong Li, Yudong Zhou, Jun Liu, Jun Wang,

(参考訳) 本報告では,カスタムな大規模言語モデル用に設計された最新の言語モデルの開発と重要な成果について詳述する。導入された進歩には、フレキシブルなトレーニングデータ調整とカリキュラム学習をサポートする、新しいオンラインデータスケジューリングが含まれている。モデルのアーキテクチャには、ロータリー位置埋め込み(Rotary Positional Embeddings)、QK-LayerNorm(QK-LayerNorm)などの最先端技術と、安定性と性能を高めるために特別に製作された多言語トークンライザが組み込まれている。さらに、我々の堅牢なトレーニングフレームワークは、最適な効率を確保するために、高度なモニタリングと迅速なリカバリ機能を備えている。我々のWonton 7Bモデルは、多言語および英語のベンチマークで競合性能を示した。今後の開発は、より広範囲にトレーニングされたモデルによるパフォーマンスギャップの縮小を優先し、実際の有効性と適応性を高めるだろう。

This report details the development and key achievements of our latest language model designed for custom large language models. The advancements introduced include a novel Online Data Scheduler that supports flexible training data adjustments and curriculum learning. The model's architecture is fortified with state-of-the-art techniques such as Rotary Positional Embeddings, QK-LayerNorm, and a specially crafted multilingual tokenizer to enhance stability and performance. Moreover, our robust training framework incorporates advanced monitoring and rapid recovery features to ensure optimal efficiency. Our Wonton 7B model has demonstrated competitive performance on a range of multilingual and English benchmarks. Future developments will prioritize narrowing the performance gap with more extensively trained models, thereby enhancing the model's real-world efficacy and adaptability. GitHub: https://github.com/nyonicai/nyonic-public

翻訳日:2024-04-26 19:49:56 公開日:2024-04-24

# 逆相補表現学習を用いた効率的な多モデル融合

Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning ( http://arxiv.org/abs/2404.15704v1 )

ライセンス: Link先を確認

Zuheng Kang, Yayun He, Jianzong Wang, Junqing Peng, Jing Xiao,

(参考訳) 単一モデルシステムは、話者検証(SV)や画像分類といったタスクの欠如に悩まされ、意思決定時に部分的な事前知識に大きく依存する。マルチモデル融合(MMF)はこれらの問題のいくつかを軽減することができるが、学習された表現の冗長性は改善を制限する可能性がある。そこで本稿では,新たにトレーニングされたモデルに対して,事前取得した知識を回避し,各コンポーネントモデルに対して,最大で相補的表現の学習を可能にする,対向的補完的表現学習(ACoRL)フレームワークを提案する。提案手法は従来のMMFよりも効率よく性能を向上することを示す。さらに、属性分析により、ACoRLの下で訓練されたモデルがより補完的な知識を獲得し、タスク間の効率性と堅牢性を高めるためのアプローチの有効性を強調した。

Single-model systems often suffer from deficiencies in tasks such as speaker verification (SV) and image classification, relying heavily on partial prior knowledge during decision-making, resulting in suboptimal performance. Although multi-model fusion (MMF) can mitigate some of these issues, redundancy in learned representations may limit improvements. To this end, we propose an adversarial complementary representation learning (ACoRL) framework that enables newly trained models to avoid previously acquired knowledge, allowing each individual component model to learn maximally distinct, complementary representations. We give three detailed explanations of why this works, and experimental results demonstrate that our method improves performance more efficiently than traditional MMF. Furthermore, attribution analysis validates that the model trained under ACoRL acquires more complementary knowledge, highlighting the efficacy of our approach in enhancing efficiency and robustness across tasks.

翻訳日:2024-04-26 19:49:56 公開日:2024-04-24

# ESR-NeRF:LDR多視点画像を用いた音源再構成

ESR-NeRF: Emissive Source Reconstruction Using LDR Multi-view Images ( http://arxiv.org/abs/2404.15707v1 )

ライセンス: Link先を確認

Jinseo Jeong, Junseo Koo, Qimeng Zhang, Gunhee Kim,

(参考訳) 既存のNeRFベースの逆レンダリング手法では、シーンは遠方の光源によってのみ照らされ、シーン内の放射源の影響を無視する。本研究では,LDRマルチビュー画像に送信源をオン/オフにすることで,この制限に直面している。 2つの重要な問題に対処する必要がある。 1)未知の光の詳細とともに、限られたダイナミックレンジから生じるあいまいさ 2) 最終的な物体色に繋がる経路を後付けするために, ボリュームレンダリングの高価な計算コストがかかる。本稿では,ニューラルネットワークを学習可能な関数として活用し,レイトレーシング場を表現する新しいアプローチであるESR-NeRFを提案する。光輸送セグメントを満たすためにネットワークを訓練することにより、放射源を徐々に特定し、反射領域を認識しながら、発信する放射光を規制する。その結果,ESR-NeRFの質的・定量的な優位性が示された。提案手法は,DTUデータセット上の低CD測定値を達成するため,送信源のないシーンに適用性も拡張する。

Existing NeRF-based inverse rendering methods suppose that scenes are exclusively illuminated by distant light sources, neglecting the potential influence of emissive sources within a scene. In this work, we confront this limitation using LDR multi-view images captured with emissive sources turned on and off. Two key issues must be addressed: 1) ambiguity arising from the limited dynamic range along with unknown lighting details, and 2) the expensive computational cost in volume rendering to backtrace the paths leading to final object colors. We present a novel approach, ESR-NeRF, leveraging neural networks as learnable functions to represent ray-traced fields. By training networks to satisfy light transport segments, we regulate outgoing radiances, progressively identifying emissive sources while being aware of reflection areas. The results on scenes encompassing emissive sources with various properties demonstrate the superiority of ESR-NeRF in qualitative and quantitative ways. Our approach also extends its applicability to the scenes devoid of emissive sources, achieving lower CD metrics on the DTU dataset.

翻訳日:2024-04-26 19:49:56 公開日:2024-04-24

# ViViDex:人間のビデオから視覚に基づく有害な操作を学習する

ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos ( http://arxiv.org/abs/2404.15709v1 )

ライセンス: Link先を確認

Zerui Chen, Shizhe Chen, Cordelia Schmid, Ivan Laptev,

(参考訳) 本研究では,多指ロボットによる多様なポーズでさまざまな物体を操作するための統一的な視覚ベースのポリシーを学習することを目的とする。これまでの研究は、人間のビデオが政策学習に有効であることを示したが、ビデオから抽出された物理的に不可解な軌跡によって性能改善は制限されてきた。さらに、接地木オブジェクトのような特権オブジェクト情報への依存は、現実的なシナリオにおける適用性をさらに制限する。これらの制約に対処するため、人間のビデオから視覚に基づくポリシー学習を改善するための新しいフレームワークViViDexを提案する。最初は、強化学習と軌道誘導報酬を使って、各ビデオのステートベースのポリシーを訓練し、ビデオから視覚的に自然と身体的にもっともらしい軌跡の両方を得る。次に、州ベースのポリシーから成功したエピソードをロールアウトし、特権情報を使用しずに統一された視覚ポリシーをトレーニングします。性能を著しく向上させるために座標変換法を提案する。提案手法を3つのデクスタラスな操作タスクで評価し,最先端のアルゴリズムよりも大幅に改善したことを示す。

In this work, we aim to learn a unified vision-based policy for a multi-fingered robot hand to manipulate different objects in diverse poses. Though prior work has demonstrated that human videos can benefit policy learning, performance improvement has been limited by physically implausible trajectories extracted from videos. Moreover, reliance on privileged object information such as ground-truth object states further limits the applicability in realistic scenarios. To address these limitations, we propose a new framework ViViDex to improve vision-based policy learning from human videos. It first uses reinforcement learning with trajectory-guided rewards to train state-based policies for each video, obtaining both visually natural and physically plausible trajectories from the video. We then roll out successful episodes from state-based policies and train a unified visual policy without using any privileged information. A coordinate transformation method is proposed to significantly boost the performance. We evaluate our method on three dexterous manipulation tasks and demonstrate a large improvement over state-of-the-art algorithms.

翻訳日:2024-04-26 19:49:56 公開日:2024-04-24

# Ada-DF: 顔表情認識のための適応ラベル分布統合ネットワーク

Ada-DF: An Adaptive Label Distribution Fusion Network For Facial Expression Recognition ( http://arxiv.org/abs/2404.15714v1 )

ライセンス: Link先を確認

Shu Liu, Yan Xu, Tongming Wan, Xiaoyan Kui,

(参考訳) 表情認識(FER)は日常生活において重要な役割を担っている。しかし、データセットのアノテーションの曖昧さは、パフォーマンスを著しく損なう可能性がある。本稿では,FERタスクをラベル分布学習パラダイムで扱い,デュアルブランチ適応分布融合(Ada-DF)フレームワークを開発する。サンプルのラベル分布を得るために1つの補助ブランチを構築する。感情のクラス分布は、各感情のラベル分布を通して計算される。最後に、これらの2つの分布を注意重みに応じて適応的に融合し、ターゲットブランチを訓練する。 RAF-DB、AffectNet、SFEWという3つの実世界のデータセットで大規模な実験を行い、Ada-DFが最先端手法に対して優位であることを示す。

Facial expression recognition (FER) plays a significant role in our daily life. However, annotation ambiguity in the datasets could greatly hinder the performance. In this paper, we address the FER task via the label distribution learning paradigm and develop a dual-branch Adaptive Distribution Fusion (Ada-DF) framework. One auxiliary branch is constructed to obtain the label distributions of samples. The class distributions of emotions are then computed through the label distributions of each emotion. Finally, those two distributions are adaptively fused according to the attention weights to train the target branch. Extensive experiments are conducted on three real-world datasets, RAF-DB, AffectNet and SFEW, where our Ada-DF shows advantages over the state-of-the-art works.

翻訳日:2024-04-26 19:49:56 公開日:2024-04-24

# 中心スピンモデルにおける精製相転移:双対空間アプローチにおける第二レニイエントロピー

Purification phase transition in the central spin model: second Rényi entropy in dual space approach ( http://arxiv.org/abs/2404.15717v1 )

ライセンス: Link先を確認

V. V. Belov, W. V. Pogosov,

(参考訳) 我々は, 測定過程が存在する場合の中心スピンモデルの力学を数値的に研究する。このモデルは、そのトポロジーにより中心粒子と量子浴を異なるサブシステムとして自然に区別でき、エンタングルメント相転移を調べられるため、実験的な探索に有望である。この系における測定誘起相転移を特徴づけるために、双対空間における第二Rényiエントロピーに基づく最近開発された手法を用いる。シミュレーションでは、デコヒーレンス、エネルギー緩和、ゲートエラーを考慮している。臨界測定レートを決定し, 相互エントロピーに基づく簡単なアプローチで予測した値とは大きく異なることを示す。

We conduct a numerical investigation of the dynamics of the central spin model in the presence of measurement processes. This model holds promise for experimental exploration due to its topology, which facilitates the natural distinction of a central particle and the quantum bath as different subsystems, allowing for the examination of entanglement phase transitions. To characterize the measurement-induced phase transition in this system, we employ a recently developed method based on second Rényi entropy in dual space. Our simulations account for decoherence, energy relaxation, and gate errors. We determine critical measurement rates and demonstrate that they significantly differ from those predicted by a simple approach based on mutual entropy.

翻訳日:2024-04-26 19:49:56 公開日:2024-04-24

# 不合理な身体領域における偽予測の緩和

Mitigating False Predictions In Unreasonable Body Regions ( http://arxiv.org/abs/2404.15718v1 )

ライセンス: Link先を確認

Constantin Ulrich, Catherine Knobloch, Julius C. Holzschuh, Tassilo Wald, Maximilian R. Rokuss, Maximilian Zenk, Maximilian Fischer, Michael Baumgartner, Fabian Isensee, Klaus H. Maier-Hein,

(参考訳) 3次元医用画像セグメンテーションのためのディープラーニングモデルの開発にかなりの努力を払っているにもかかわらず、多様な画像分布を効果的に一般化するという課題は続いている。ドメインの一般化は、臨床現場での堅牢な応用には不可欠であると認識されているが、限られた視野(FOV)でのトレーニングから生じる課題は未解決のままである。この制限は、トレーニングデータのFOVを超える身体領域に適用した場合、誤った予測につながる。そこで本研究では, 単一データセットと複数データセットの両方のトレーニングスキームに適用可能な, 不確定な身体領域の予測をペナルティ化する新しい損失関数を提案する。軸スライス位置スコアを生成するBody Part Regressionモデルで実現した。様々なFOVを特徴とするテストセットを用いた包括的評価により,本手法は一般化能力の顕著な改善を示す。偽陽性腫瘍予測を85%まで効果的に軽減し、全体のセグメンテーション性能を大幅に向上させる。

Despite considerable strides in developing deep learning models for 3D medical image segmentation, the challenge of effectively generalizing across diverse image distributions persists. While domain generalization is acknowledged as vital for robust application in clinical settings, the challenges stemming from training with a limited Field of View (FOV) remain unaddressed. This limitation leads to false predictions when applied to body regions beyond the FOV of the training data. In response to this problem, we propose a novel loss function that penalizes predictions in implausible body regions, applicable in both single-dataset and multi-dataset training schemes. It is realized with a Body Part Regression model that generates axial slice positional scores. Through comprehensive evaluation using a test set featuring varying FOVs, our approach demonstrates remarkable improvements in generalization capabilities. It effectively mitigates false positive tumor predictions up to 85% and significantly enhances overall segmentation performance.

翻訳日:2024-04-26 19:49:56 公開日:2024-04-24

# 主観的NLP課題に対するアノテータ中心能動学習

Annotator-Centric Active Learning for Subjective NLP Tasks ( http://arxiv.org/abs/2404.15720v1 )

ライセンス: Link先を確認

Michiel van der Meer, Neele Falk, Pradeep K. Murukannaiah, Enrico Liscio,

(参考訳) 主観的NLPタスクに対する人間の判断のばらつきを正確に把握するためには、アノテーションプロセスに幅広い視点を取り入れることが不可欠である。アクティブラーニング(AL)は、最も有益なサンプルを戦略的に注釈付けすることで、人間のアノテーションを収集するコストに対処する。本稿では,データサンプリングに続いてアノテータ選択戦略を取り入れたACAL(Annotator-Centric Active Learning)を提案する。我々の目的は2つある: (1) 人間の判断の多様性全体を効率よく近似すること、(2) 多数派よりも少数派の視点を重視するアノテータ中心の指標を用いてモデル性能を評価することである。従来の評価指標と新しい人間中心の評価指標の両方を用いて、7つの主観的NLPタスクにまたがる複数のアノテータ選択戦略を実験した。以上の結果から,ACALはデータ効率を向上し,アノテータ中心の性能評価に優れることが示唆された。しかし、その成功は、サンプリング元となる十分に大きく多様なアノテータのプールが利用できることに依存している。

To accurately capture the variability in human judgments for subjective NLP tasks, incorporating a wide range of perspectives in the annotation process is crucial. Active Learning (AL) addresses the high costs of collecting human annotations by strategically annotating the most informative samples. We introduce Annotator-Centric Active Learning (ACAL), which incorporates an annotator selection strategy following data sampling. Our objective is two-fold: (1) to efficiently approximate the full diversity of human judgments, and (2) to assess model performance using annotator-centric metrics, which emphasize minority perspectives over a majority. We experiment with multiple annotator selection strategies across seven subjective NLP tasks, employing both traditional and novel, human-centered evaluation metrics. Our findings indicate that ACAL improves data efficiency and excels in annotator-centric performance evaluations. However, its success depends on the availability of a sufficiently large and diverse pool of annotators to sample from.

翻訳日:2024-04-26 19:49:56 公開日:2024-04-24

# SPARO: 視覚のためのロバストおよびコンポジショントランスフォーマーエンコーディングのための選択的注意

SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision ( http://arxiv.org/abs/2404.15721v1 )

ライセンス: Link先を確認

Ankit Vani, Bac Nguyen, Samuel Lavoie, Ranjay Krishna, Aaron Courville,

(参考訳) 選択的な注意は、感覚入力の絶え間ない洪水におけるタスク関連側面に焦点を合わせるのに役立ちます。この知覚の制約は、注意を散らし、知覚可能な概念の新しい構成にしっかりと一般化することを可能にする。しかし、CLIPやDINOのようなトランスフォーマーバックボーンを持つ表現学習モデルは、堅牢性や構成性を示すのに失敗することが多い。人間の知覚とは異なり、トランスフォーマーエンコーディングは個々の概念を別々に扱うものではない。そこで本研究では,SPAROを提案する。SPAROは1つのアテンションヘッドによって生成され,エンコーディングを別個のアテンションスロットに分割する読み出し機構である。 CLIPによるSPAROの使用は、視覚とテキストのモダリティが同じ概念を持つ共有構成世界の異なる視点であることを示す帰納的バイアスを与える。 SPAROを用いて、CLIPによる下流認識、ロバスト性、検索、構成性ベンチマークの改善(ImageNetは+14%、SugarCrepeは+4%)、およびDINOによるImageNetの近接および線形プローブ(+3%)について示す。また,各SPARO概念に介入して選択し,下流タスク性能(SugarCrepeでは+4%から+9%まで)をさらに向上させ,SPAROの表現構造の堅牢性について検討する強力な能力についても紹介する。最後に、アブレーション実験と学習概念の可視化を通して洞察を提供する。

Selective attention helps us focus on task-relevant aspects in the constant flood of our sensory input. This constraint in our perception allows us to robustly generalize under distractions and to new compositions of perceivable concepts. Transformers employ a similar notion of attention in their architecture, but representation learning models with transformer backbones like CLIP and DINO often fail to demonstrate robustness and compositionality. We highlight a missing architectural prior: unlike human perception, transformer encodings do not separately attend over individual concepts. In response, we propose SPARO, a read-out mechanism that partitions encodings into separately-attended slots, each produced by a single attention head. Using SPARO with CLIP imparts an inductive bias that the vision and text modalities are different views of a shared compositional world with the same corresponding concepts. Using SPARO, we demonstrate improvements on downstream recognition, robustness, retrieval, and compositionality benchmarks with CLIP (up to +14% for ImageNet, +4% for SugarCrepe), and on nearest neighbors and linear probe for ImageNet with DINO (+3% each). We also showcase a powerful ability to intervene and select individual SPARO concepts to further improve downstream task performance (up from +4% to +9% for SugarCrepe) and use this ability to study the robustness of SPARO's representation structure. Finally, we provide insights through ablation experiments and visualization of learned concepts.

翻訳日:2024-04-26 19:49:56 公開日:2024-04-24

# Gradformer: 指数減衰を備えたグラフ変換器

Gradformer: Graph Transformer with Exponential Decay ( http://arxiv.org/abs/2404.15729v1 )

ライセンス: Link先を確認

Chuang Liu, Zelin Yao, Yibing Zhan, Xueqi Ma, Shirui Pan, Wenbin Hu,

(参考訳) グラフトランスフォーマー(GT)は、幅広いタスクでその利点を実証している。しかし、GTsの自己注意機構はグラフの帰納バイアス、特にグラフのタスクに不可欠な構造に関するバイアスを見落としている。位置符号化と注意バイアスを利用して誘導バイアスをモデル化する手法もあるが、その効果はいまだに準最適である。そこで本稿では,GTと本質的帰納バイアスを革新的に統合する手法であるGradformerについて述べる。具体的には、崩壊マスク行列の値は指数関数的に減少し、グラフ構造内のノードの近さの減少に関連している。この設計によりGradformerは、グラフのローカル詳細に集中しながら、遠くのノードから情報をキャプチャする能力を維持することができる。さらに、グラッドフォーマーは減衰マスクに学習可能な制約を導入し、異なる注意頭が異なる減衰マスクを学習できるようにする。このような設計は注目ヘッドを多様化させ、グラフ内の多様な構造情報のより効果的な同化を可能にする。様々なベンチマーク実験により、グラフニューラルネットワークとGTベースラインモデルにおいて、グラフ分類や回帰タスクにおいて、Gradformerは一貫してパフォーマンスが向上していることが示された。さらに、Gradformerは他のGTモデルで観測される顕著な精度低下とは対照的に、ネットワークの深層化に伴って浅部モデルと比較して精度を維持または向上し、深部GTモデルのトレーニングに有効な方法であることが証明されている。

Graph Transformers (GTs) have demonstrated their advantages across a wide range of tasks. However, the self-attention mechanism in GTs overlooks the graph's inductive biases, particularly biases related to structure, which are crucial for the graph tasks. Although some methods utilize positional encoding and attention bias to model inductive biases, their effectiveness is still suboptimal analytically. Therefore, this paper presents Gradformer, a method innovatively integrating GT with the intrinsic inductive bias by applying an exponential decay mask to the attention matrix. Specifically, the values in the decay mask matrix diminish exponentially, correlating with the decreasing node proximities within the graph structure. This design enables Gradformer to retain its ability to capture information from distant nodes while focusing on the graph's local details. Furthermore, Gradformer introduces a learnable constraint into the decay mask, allowing different attention heads to learn distinct decay masks. Such a design diversifies the attention heads, enabling a more effective assimilation of diverse structural information within the graph. Extensive experiments on various benchmarks demonstrate that Gradformer consistently outperforms the Graph Neural Network and GT baseline models in various graph classification and regression tasks. Additionally, Gradformer has proven to be an effective method for training deep GT models, maintaining or even enhancing accuracy compared to shallow models as the network deepens, in contrast to the significant accuracy drop observed in other GT models. Code is available at https://github.com/LiuChuang0059/Gradformer.
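
The exponential decay mask described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that node proximity is measured by shortest-path hop distance, that the mask entry is `gamma ** distance`, and that masked attention rows are renormalized. The names `hop_distances`, `decay_mask`, `apply_mask` and the parameter `gamma` are hypothetical.

```python
from collections import deque


def hop_distances(adjacency, source):
    """Breadth-first search: hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def decay_mask(adjacency, gamma):
    """Mask entry (i, j) is gamma ** hops(i, j); unreachable pairs get 0 (assumption)."""
    n = len(adjacency)
    mask = []
    for i in range(n):
        dist = hop_distances(adjacency, i)
        mask.append([gamma ** dist[j] if j in dist else 0.0 for j in range(n)])
    return mask


def apply_mask(attention, mask):
    """Multiply attention by the mask element-wise, then renormalize each row."""
    masked = []
    for att_row, mask_row in zip(attention, mask):
        row = [a * m for a, m in zip(att_row, mask_row)]
        total = sum(row)
        masked.append([r / total for r in row])
    return masked


# Path graph 0-1-2-3 with uniform attention as a stand-in for softmax(QK^T).
adjacency = [[1], [0, 2], [1, 3], [2]]
mask = decay_mask(adjacency, gamma=0.5)
uniform = [[0.25] * 4 for _ in range(4)]
masked_attention = apply_mask(uniform, mask)
```

With `gamma` below 1, distant nodes keep a nonzero but exponentially smaller share of attention, which matches the abstract's point about focusing on local detail without cutting off long-range information. The abstract's learnable, per-head decay is not modelled here.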

翻訳日:2024-04-26 19:49:56 公開日:2024-04-24

# MD-NOMAD:確率微分方程式と不確実性伝播をエミュレートするための混合密度非線形多様体デコーダ

MD-NOMAD: Mixture density nonlinear manifold decoder for emulating stochastic differential equations and uncertainty propagation ( http://arxiv.org/abs/2404.15731v1 )

ライセンス: Link先を確認

Akshay Thakur, Souvik Chakraborty,

(参考訳) 確率シミュレータのためのニューラル演算子フレームワークである混合密度非線形多様体デコーダ(MD-NOMAD)を提案する。提案手法は,ポイントワイズ演算子学習ニューラルアーキテクチャである非線形多様体デコーダ(NOMAD)と混合密度に基づく手法を組み合わせて,確率的出力関数の条件付き確率分布を推定する。 MD-NOMADは、複雑な確率分布を推定する確率的混合モデルの能力と、ポイントワイズニューラル演算子NOMADの高次元スケーラビリティを利用する。本研究では, 確率的常微分方程式と偏微分方程式の広範囲にまたがる実験的な評価を行い, 対応する結果を示し, 提案フレームワークの性能を明らかにする。

We propose a neural operator framework, termed mixture density nonlinear manifold decoder (MD-NOMAD), for stochastic simulators. Our approach leverages an amalgamation of the pointwise operator learning neural architecture nonlinear manifold decoder (NOMAD) with mixture density-based methods to estimate conditional probability distributions for stochastic output functions. MD-NOMAD harnesses the ability of probabilistic mixture models to estimate complex probability and the high-dimensional scalability of pointwise neural operator NOMAD. We conduct empirical assessments on a wide array of stochastic ordinary and partial differential equations and present the corresponding results, which highlight the performance of the proposed framework.
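
As a rough illustration of the mixture-density part only, the sketch below evaluates the negative log-likelihood of a scalar observation under a Gaussian mixture. That a network would output the weights, means and standard deviations, and that the components are Gaussian, are assumptions made here; the function name `gaussian_mixture_nll` is hypothetical and the NOMAD decoder is not modelled.

```python
import math


def gaussian_mixture_nll(y, weights, means, stds):
    """Negative log-likelihood of scalar y under a 1-D Gaussian mixture.

    weights must be non-negative and sum to 1; stds must be positive.
    """
    density = 0.0
    for w, mu, sigma in zip(weights, means, stds):
        z = (y - mu) / sigma
        density += w * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return -math.log(density)


# Two-component example: the observation sits on the first component's mean.
nll = gaussian_mixture_nll(0.0, weights=[0.6, 0.4], means=[0.0, 3.0], stds=[1.0, 0.5])
```

Minimizing this quantity over training pairs is the usual way a mixture density model is fitted; multi-modal output distributions are representable because the density is a sum over components.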

翻訳日:2024-04-26 19:49:56 公開日:2024-04-24

# メトロ起終点(OD)予測のための細粒度時空間MLPアーキテクチャ

Fine-grained Spatial-temporal MLP Architecture for Metro Origin-Destination Prediction ( http://arxiv.org/abs/2404.15734v1 )

ライセンス: Link先を確認

Yang Liu, Binglin Chen, Yongsen Zheng, Guanbin Li, Liang Lin,

(参考訳) 地下鉄の交通量の正確な予測は、地下鉄のスケジューリングを最適化し、全体の輸送効率を向上させるために重要である。駅間の細粒度かつ包括的な関係を効果的に分析することは、メトロの起終点(Origin-Destination, OD)予測に不可欠である。しかし、既存のメトロODモデルは、駅の視点から複数のODペアの情報を混合するか、ODペアの一部にのみ焦点を当てている。これらのアプローチはODペア間の細粒度の関係を見落とし、潜在的な異常状態の予測を困難にする可能性がある。これらの課題に対処するために、すべてのODペアの観点から交通量の変動を分析し、ODMixerというメトロOD予測のための細粒度時空間MLPアーキテクチャを提案する。具体的には、ODMixerは二重分岐構造を持ち、Channel Mixer、Multi-view Mixer、Bidirectional Trend Learnerを含む。 Channel MixerはODペア間の短期的な時間的関係を捉えることを目的としており、Multi-view Mixerは起点と終点の両方の観点から関係を捉えることに集中している。長期的な時間的関係をモデル化するために,Bidirectional Trend Learnerを導入する。2つの大規模メトロOD予測データセットHZMODとSHMOでの大規模な実験により,ODMixerの利点が示された。コードは公開予定である。

Accurate prediction of metro traffic is crucial for optimizing metro scheduling and enhancing overall transport efficiency. Analyzing fine-grained and comprehensive relations among stations effectively is imperative for metro Origin-Destination (OD) prediction. However, existing metro OD models either mix information from multiple OD pairs from the station's perspective or exclusively focus on a subset of OD pairs. These approaches may overlook fine-grained relations among OD pairs, leading to difficulties in predicting potential anomalous conditions. To address these challenges, we analyze traffic variations from the perspective of all OD pairs and propose a fine-grained spatial-temporal MLP architecture for metro OD prediction, namely ODMixer. Specifically, our ODMixer has a double-branch structure and involves the Channel Mixer, the Multi-view Mixer, and the Bidirectional Trend Learner. The Channel Mixer aims to capture short-term temporal relations among OD pairs, while the Multi-view Mixer concentrates on capturing relations from both origin and destination perspectives. To model long-term temporal relations, we introduce the Bidirectional Trend Learner. Extensive experiments on two large-scale metro OD prediction datasets, HZMOD and SHMO, demonstrate the advantages of our ODMixer. The code will be available.

翻訳日:2024-04-26 19:49:56 公開日:2024-04-24

# No Train but Gain: 訓練不要な言語アダプタ強化のための言語演算

No Train but Gain: Language Arithmetic for training-free Language Adapters enhancement ( http://arxiv.org/abs/2404.15737v1 )

ライセンス: Link先を確認

Mateusz Klimaszewski, Piotr Andruszkiewicz, Alexandra Birch,

(参考訳) モジュール型深層学習は、多言語性の呪いを解き、負の干渉の影響を防ぎ、多言語事前学習言語モデルにおける言語間性能を実現するための最先端のソリューションである。しかし、このアプローチのトレードオフは、密接な関係のある言語からの正の転移学習が減ることである。そこで本研究では,この制限に対処するためのトレーニング不要な後処理を実現する,言語演算と呼ばれる新しい手法を提案する。タスク演算フレームワークに着想を得て、加算による学習を言語アダプタに適用し、フレームワークをマルチタスクから多言語設定に移行する。提案手法の有効性は,後処理の手順として,MAD-Xに基づく言語間スキームの3つの下流タスクにおいて実証される。言語演算はベースラインを一貫して改善し、ゼロショットおよび低リソースという最も難しいケースで大きな向上を示す。私たちのコードとモデルは https://github.com/mklimasz/language-arithmetic で利用可能です。

Modular deep learning is the state-of-the-art solution for lifting the curse of multilinguality, preventing the impact of negative interference and enabling cross-lingual performance in Multilingual Pre-trained Language Models. However, a trade-off of this approach is the reduction in positive transfer learning from closely related languages. In response, we introduce a novel method called language arithmetic, which enables training-free post-processing to address this limitation. Inspired by the task arithmetic framework, we apply learning via addition to the language adapters, transitioning the framework from a multi-task to a multilingual setup. The effectiveness of the proposed solution is demonstrated on three downstream tasks in a MAD-X-based set of cross-lingual schemes, acting as a post-processing procedure. Language arithmetic consistently improves the baselines with significant gains in the most challenging cases of zero-shot and low-resource applications. Our code and models are available at https://github.com/mklimasz/language-arithmetic .
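
One plausible reading of "learning via addition" on language adapters is a weighted element-wise sum of two adapters' parameters, with no further training. The sketch below shows only that arithmetic. The convex weighting, the dictionary layout of the parameters, and the name `combine_adapters` are assumptions for illustration and are not taken from the paper's code.

```python
def combine_adapters(adapter_a, adapter_b, weight):
    """Element-wise weighted sum: weight * a + (1 - weight) * b for each parameter.

    Adapters are represented as {parameter_name: flat list of floats}.
    """
    if adapter_a.keys() != adapter_b.keys():
        raise ValueError("adapters must share the same parameter names")
    combined = {}
    for name in adapter_a:
        a_values, b_values = adapter_a[name], adapter_b[name]
        if len(a_values) != len(b_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        combined[name] = [weight * a + (1.0 - weight) * b
                          for a, b in zip(a_values, b_values)]
    return combined


# Hypothetical adapters for a low-resource language and a related language.
target_adapter = {"down_proj": [1.0, 2.0], "up_proj": [0.0, 4.0]}
related_adapter = {"down_proj": [3.0, -2.0], "up_proj": [2.0, 0.0]}
merged = combine_adapters(target_adapter, related_adapter, weight=0.75)
```

Because the combination is a post-processing step on stored weights, it can be applied to existing adapters without any gradient updates, which is the training-free property the abstract emphasizes.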

翻訳日:2024-04-26 19:40:12 公開日:2024-04-24

# ネストニューラルネットワークによるSINDyアプローチの一般化

Generalizing the SINDy approach with nested neural networks ( http://arxiv.org/abs/2404.15742v1 )

ライセンス: Link先を確認

Camilla Fiorini, Clément Flint, Louis Fostier, Emmanuel Franck, Reyhaneh Hashemi, Victor Michel-Dansac, Wassim Tenachi,

(参考訳) シンボリック回帰(Symbolic Regression, SR)は、データからシンボリック表現を推論することを目的とした、広く研究されている研究分野である。 SRの一般的なアプローチは、スパース回帰を用いてデータから支配方程式を同定する、非線形力学系のスパース同定(SINDy)フレームワークである。本研究では、ネスト構造によりSINDyアプローチの表現性を高めることを目的とした強化手法であるNested SINDyを紹介する。実際、伝統的な記号回帰法やシステム同定法は、分析的に容易に記述できない複雑なシステムでは失敗することが多い。 Nested SINDyはSINDyフレームワーク上に構築されており、コアSINDyレイヤの前後に追加レイヤを導入する。これにより、関数の合成や積を含む、より広い範囲のシステムに対する記号表現を特定できる。我々は、基本的な三角関数やより複雑なシステムに対するスパースな解析的表現など、単純なシステムの記号表現を正確に見つけるNested SINDyアプローチの能力を実証する。この結果から,Nested SINDyが表現性において従来のSINDyアプローチを超越した,シンボリック回帰のツールとしての可能性を強調した。しかし、Nested SINDyの最適化プロセスの課題にも言及し、最適化プロセスのためのより堅牢な方法論の設計を含む今後の研究方向性を提案する。この研究は、Nested SINDyがデータから動的システムの記号表現を効果的に発見できることを証明し、データ駆動手法によって複雑なシステムを理解する新たな機会を提供する。

Symbolic Regression (SR) is a widely studied field of research that aims to infer symbolic expressions from data. A popular approach for SR is the Sparse Identification of Nonlinear Dynamical Systems (SINDy) framework, which uses sparse regression to identify governing equations from data. This study introduces an enhanced method, Nested SINDy, that aims to increase the expressivity of the SINDy approach thanks to a nested structure. Indeed, traditional symbolic regression and system identification methods often fail with complex systems that cannot be easily described analytically. Nested SINDy builds on the SINDy framework by introducing additional layers before and after the core SINDy layer. This allows the method to identify symbolic representations for a wider range of systems, including those with compositions and products of functions. We demonstrate the ability of the Nested SINDy approach to accurately find symbolic expressions for simple systems, such as basic trigonometric functions, and sparse (false but accurate) analytical representations for more complex systems. Our results highlight Nested SINDy's potential as a tool for symbolic regression, surpassing the traditional SINDy approach in terms of expressivity. However, we also note the challenges in the optimization process for Nested SINDy and suggest future research directions, including the design of a more robust methodology for the optimization process. This study proves that Nested SINDy can effectively discover symbolic representations of dynamical systems from data, offering new opportunities for understanding complex systems through data-driven methods.
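
For readers unfamiliar with the baseline, the following is a minimal sketch of the sparse regression step in plain SINDy, using sequentially thresholded least squares over a small polynomial library. It does not implement the nested layers the abstract introduces. The toy system dx/dt = 1.5x - 0.8x^3, the threshold value, and all function names are illustrative assumptions.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def least_squares(columns, target):
    """Least-squares fit via the normal equations (adequate for tiny libraries)."""
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    rhs = [sum(p * t for p, t in zip(ci, target)) for ci in columns]
    return solve_linear(gram, rhs)


def stlsq(columns, target, threshold, iterations=10):
    """Sequentially thresholded least squares: refit, zero small terms, repeat."""
    active = list(range(len(columns)))
    coefficients = [0.0] * len(columns)
    for _ in range(iterations):
        fitted = least_squares([columns[i] for i in active], target)
        coefficients = [0.0] * len(columns)
        for i, value in zip(active, fitted):
            if abs(value) >= threshold:
                coefficients[i] = value
        kept = [i for i in active if coefficients[i] != 0.0]
        if kept == active or not kept:
            break
        active = kept
    return coefficients


# Noise-free samples of dx/dt = 1.5 x - 0.8 x^3 on [-2, 2].
xs = [-2.0 + 0.1 * k for k in range(41)]
derivatives = [1.5 * x - 0.8 * x ** 3 for x in xs]
library = [[1.0 for _ in xs], xs, [x ** 2 for x in xs], [x ** 3 for x in xs]]
coefficients = stlsq(library, derivatives, threshold=0.1)
```

The recovered coefficient vector is ordered as (1, x, x^2, x^3). A fixed library like this cannot express compositions such as a function applied to another learned function, which is the limitation the nested structure is meant to address.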

翻訳日:2024-04-26 19:40:12 公開日:2024-04-24

# SRAGAN: 中国水墨画生成のための顕著性正則化・注意型敵対的生成ネットワーク

SRAGAN: Saliency Regularized and Attended Generative Adversarial Network for Chinese Ink-wash Painting Generation ( http://arxiv.org/abs/2404.15743v1 )

ライセンス: Link先を確認

Xiang Gao, Yuqi Zhang,

(参考訳) 本論文は、実写画像を中国の伝統的な水墨画に変換する問題、すなわち中国水墨画のスタイル変換を扱う。この問題は幅広い画像から画像への翻訳モデルによって実現できるが、これらすべての手法に共通する問題は、水墨画スタイルの要素の転写によって元の画像の内容の詳細が容易に消去または破損することである。この問題を解消または改善するために,非ペア画像から画像への翻訳フレームワークに顕著性(saliency)検出を導入し,生成した絵画のコンテンツ情報を正則化することを提案する。顕著性マップは、明示的および暗黙的な2つの側面からコンテンツの正則化に利用される。(i) スタイル変換前後の顕著性の一貫性を明示的に正則化する saliency IOU(SIOU)損失を提案する。(ii) 生成器ネットワークに顕著性情報を注入して絵画生成を導くことで、生成した絵画のコンテンツの完全性を暗黙的に高める saliency adaptive normalization(SANorm)を提案する。さらに,顕著性マスクを用いて敵対的学習の注意を画像の顕著な領域に集中させる saliency attended discriminator ネットワークを提案し,画像中の顕著な物体に対してより精細な水墨画風のスタイル化効果を生み出すことに寄与する。定性的かつ定量的な実験は、中国水墨画のスタイル変換に関する関連する先進的手法よりも我々のモデルが優れていることを一貫して示している。

This paper handles the problem of converting real pictures into traditional Chinese ink-wash paintings, i.e., Chinese ink-wash painting style transfer. Though this problem could be realized by a wide range of image-to-image translation models, a notable issue with all these methods is that the original image content details could be easily erased or corrupted due to transfer of ink-wash style elements. To solve or ameliorate this issue, we propose to incorporate saliency detection into the unpaired image-to-image translation framework to regularize content information of the generated paintings. The saliency map is utilized for content regularization from two aspects, both explicitly and implicitly: (i) we propose saliency IOU (SIOU) loss to explicitly regularize saliency consistency before and after stylization; (ii) we propose saliency adaptive normalization (SANorm) which implicitly enhances content integrity of the generated paintings by injecting saliency information to the generator network to guide painting generation. Besides, we also propose a saliency attended discriminator network which harnesses the saliency mask to focus generative adversarial attention onto salient image regions; it contributes to producing a finer ink-wash stylization effect for salient objects of images. Qualitative and quantitative experiments consistently demonstrate superiority of our model over related advanced methods for Chinese ink-wash painting style transfer.
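
The abstract does not spell out the SIOU loss, so the sketch below uses a common soft intersection-over-union form as a stand-in: saliency maps with values in [0, 1] are compared before and after stylization, and the loss is one minus their soft IoU. The exact formula, the flat-list representation, and the name `soft_iou_loss` are assumptions.

```python
def soft_iou_loss(saliency_before, saliency_after, eps=1e-8):
    """1 - soft IoU between two saliency maps given as flat lists in [0, 1]."""
    intersection = sum(a * b for a, b in zip(saliency_before, saliency_after))
    union = sum(a + b - a * b for a, b in zip(saliency_before, saliency_after))
    return 1.0 - intersection / (union + eps)


# A 2x2 map flattened row by row: the stylized image keeps most of the salient region.
before = [1.0, 0.0, 1.0, 0.0]
after = [0.9, 0.1, 0.8, 0.0]
loss = soft_iou_loss(before, after)
```

A loss of this kind is small when the salient object occupies the same pixels in the photo and in the generated painting, which is the consistency the abstract asks the explicit regularizer to enforce.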

翻訳日:2024-04-26 19:40:12 公開日:2024-04-24

# グラフベースフェイクニュース検出器に対する一般ブラックボックス攻撃

A General Black-box Adversarial Attack on Graph-based Fake News Detectors ( http://arxiv.org/abs/2404.15744v1 )

ライセンス: Link先を確認

Peican Zhu, Zechen Pan, Yang Liu, Jiwei Tian, Keke Tang, Zhen Wang,

(参考訳) グラフニューラルネットワーク(GNN)をベースとした偽ニュース検出装置は,グラフ構築に様々な手法を適用し,識別のための特徴あるニュース埋め込みを学習することを目的とした。ブラックボックスのシナリオでは、建設の詳細は分かっていないため、特定の隣接行列を必要とする古典的な敵攻撃を実行することは現実的ではない。本稿では,異なるグラフ構造に基づく検出器に対する一般攻撃(GAFSI)を初めて提案する。特に、共有はグラフを構築するためにGNNベースのフェイクニュース検出器にとって重要な社会的相互作用であるので、我々は共有行動をシミュレートして検出器を騙す。まず,ローカルおよびグローバルな情報を活用するユーザを選別するための不正選択モジュールを提案する。さらに、ポストインジェクションモジュールは、選択したユーザに対して、投稿を送信して共有関係を作成するようにガイドする。共有記録はソーシャルコンテキストに追加され、さまざまな検出器に対する一般的な攻撃につながる。実験データを用いた実験の結果,GAFSIの有効性が示された。

Graph Neural Network (GNN)-based fake news detectors apply various methods to construct graphs, aiming to learn distinctive news embeddings for classification. Since the construction details are unknown for attackers in a black-box scenario, it is unrealistic to conduct the classical adversarial attacks that require a specific adjacency matrix. In this paper, we propose the first general black-box adversarial attack framework, i.e., General Attack via Fake Social Interaction (GAFSI), against detectors based on different graph structures. Specifically, as sharing is an important social interaction for GNN-based fake news detectors to construct the graph, we simulate sharing behaviors to fool the detectors. Firstly, we propose a fraudster selection module to select engaged users leveraging local and global information. In addition, a post injection module guides the selected users to create shared relations by sending posts. The sharing records will be added to the social context, leading to a general attack against different detectors. Experimental results on empirical datasets demonstrate the effectiveness of GAFSI.

翻訳日:2024-04-26 19:40:12 公開日:2024-04-24

# メタアナリシスを超えた協調的不均一因果推論

Collaborative Heterogeneous Causal Inference Beyond Meta-analysis ( http://arxiv.org/abs/2404.15746v1 )

ライセンス: Link先を確認

Tianyu Guo, Sai Praneeth Karimireddy, Michael I. Jordan,

(参考訳) 異なるデータセンター間のコラボレーションは、サイト間の異質性によってしばしば問題になる。異質性を考慮するために、最先端の手法は、各部位における共変量分布を再重み付けして、対象個体群の分布に適合させることである。それでも、ある場所が人口全体をカバーできなかったら、この方法は容易に失敗する可能性がある。さらに、分散シフトを調整した後も、従来のメタ分析の概念に依存している。本研究では,不均一データを用いた因果推論のための協調的逆確率スコア重み付け推定器を提案する。分布シフトを個別に調整する代わりに、重み付けされた確率スコアモデルを用いて分布シフトを協調的に調整する。異質性の増加に伴うメタアナリシスに基づく手法に対して,本手法は有意な改善を示した。脆弱な密度推定を考慮し,d<8を用いた非パラメトリック密度推定と,漸近的正規性を保証するフレキシブルな機械学習手法の可能性を示す。プライバシを保ちながら、成果モデルを協調的にトレーニングするフェデレーション学習アルゴリズムを提案する。合成および実データを用いて,本手法の利点を実証する。

Collaboration between different data centers is often challenged by heterogeneity across sites. To account for the heterogeneity, the state-of-the-art method is to re-weight the covariate distributions in each site to match the distribution of the target population. Nevertheless, this method could easily fail when a certain site couldn't cover the entire population. Moreover, it still relies on the concept of traditional meta-analysis after adjusting for the distribution shift. In this work, we propose a collaborative inverse propensity score weighting estimator for causal inference with heterogeneous data. Instead of adjusting the distribution shift separately, we use weighted propensity score models to collaboratively adjust for the distribution shift. Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases. To account for the vulnerable density estimation, we further discuss the double machine method and show the possibility of using nonparametric density estimation with d<8 and a flexible machine learning method to guarantee asymptotic normality. We propose a federated learning algorithm to collaboratively train the outcome model while preserving privacy. Using synthetic and real datasets, we demonstrate the advantages of our method.
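
As background for the estimator family the abstract builds on, here is a minimal single-site sketch of inverse propensity score weighting for an average treatment effect, in its normalized (Hajek) form. It assumes the propensity scores are already known and does not implement the collaborative, multi-site weighting the paper proposes. The function name `ipw_ate` is hypothetical.

```python
def ipw_ate(outcomes, treatments, propensities):
    """Normalized IPW estimate of the average treatment effect.

    outcomes: observed outcome per unit
    treatments: 1 if treated, 0 if control
    propensities: P(treated | covariates) per unit, strictly between 0 and 1
    """
    treated_num = treated_den = control_num = control_den = 0.0
    for y, t, e in zip(outcomes, treatments, propensities):
        if t == 1:
            treated_num += y / e
            treated_den += 1.0 / e
        else:
            control_num += y / (1.0 - e)
            control_den += 1.0 / (1.0 - e)
    return treated_num / treated_den - control_num / control_den


# With constant propensity 0.5 the estimate reduces to a difference in means.
ate = ipw_ate(outcomes=[3.0, 5.0, 1.0, 2.0],
              treatments=[1, 1, 0, 0],
              propensities=[0.5, 0.5, 0.5, 0.5])
```

Units with propensities close to 0 or 1 receive very large weights, which is one reason a site that covers only part of the population is problematic for per-site reweighting.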

翻訳日:2024-04-26 19:40:12 公開日:2024-04-24

# Guided-SPSA:パラメータシフト則による同時摂動確率近似

Guided-SPSA: Simultaneous Perturbation Stochastic Approximation assisted by the Parameter Shift Rule ( http://arxiv.org/abs/2404.15751v1 )

ライセンス: Link先を確認

Maniraman Periyasamy, Axel Plinge, Christopher Mutschler, Daniel D. Scherer, Wolfgang Mauerer,

(参考訳) 変分量子アルゴリズム(VQC)の研究は近年,量子コンピューティングコミュニティから大きな注目を集めている。これらのハイブリッドアルゴリズムは古典的成分と量子的成分の両方を利用しており、ノイズの多い中間スケールの量子デバイスに適している。パラメータシフト則を用いて正確な勾配を推定してVQCを最適化することは、NISQデバイスでは実現可能だが、より大きな問題サイズではうまくスケールできない。計算複雑性は、パラメータシフト則による勾配推定に必要な回路評価数の観点から、VQCのパラメータ数と線形にスケールする。一方、同時摂動確率近似(SPSA)のようなVQCsの勾配を近似する手法は、パラメータの数に応じてスケールしないが不安定と闘い、しばしば準最適解を得る。本研究では,パラメータシフト則とSPSAに基づく勾配近似を有意に組み合わせた,ガイド-SPSAと呼ばれる新しい勾配推定手法を提案する。 Guided-SPSAは、パラメータシフト則と同等またはより良い解を求めるトレーニング中に必要となる回路評価回数を15%から25%削減する。 Guided-SPSAはすべてのシナリオで標準SPSAより優れており、パラメータの最適下初期化のようなシナリオではパラメータシフトルールより優れている。本稿では、回帰、分類、強化学習などの量子機械学習の様々なパラダイムにおけるガイド-SPSAの性能を数値的に示す。

The study of variational quantum algorithms (VQCs) has received significant attention from the quantum computing community in recent years. These hybrid algorithms, utilizing both classical and quantum components, are well-suited for noisy intermediate-scale quantum devices. Though estimating exact gradients using the parameter-shift rule to optimize the VQCs is realizable in NISQ devices, they do not scale well for larger problem sizes. The computational complexity, in terms of the number of circuit evaluations required for gradient estimation by the parameter-shift rule, scales linearly with the number of parameters in VQCs. On the other hand, techniques that approximate the gradients of the VQCs, such as the simultaneous perturbation stochastic approximation (SPSA), do not scale with the number of parameters but struggle with instability and often attain suboptimal solutions. In this work, we introduce a novel gradient estimation approach called Guided-SPSA, which meaningfully combines the parameter-shift rule and SPSA-based gradient approximation. The Guided-SPSA results in a 15% to 25% reduction in the number of circuit evaluations required during training for a similar or better optimality of the solution found compared to the parameter-shift rule. The Guided-SPSA outperforms standard SPSA in all scenarios and outperforms the parameter-shift rule in scenarios such as suboptimal initialization of the parameters. We demonstrate numerically the performance of Guided-SPSA on different paradigms of quantum machine learning, such as regression, classification, and reinforcement learning.
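
The cost difference between the two gradient estimators named in this abstract can be seen on a toy function. The sketch below is not Guided-SPSA itself: it only contrasts the parameter-shift rule (two evaluations per parameter) with a single SPSA estimate (two evaluations in total). The product-of-cosines `cost` is a hypothetical stand-in for a circuit expectation value, chosen because the parameter-shift rule with shift pi/2 is exact for it.

```python
import math
import random


def cost(theta):
    """Toy stand-in for a circuit expectation value: product of cosines."""
    value = 1.0
    for angle in theta:
        value *= math.cos(angle)
    return value


def parameter_shift_gradient(f, theta):
    """Gradient via shifts of +/- pi/2; uses 2 * len(theta) evaluations of f."""
    grad = []
    for i in range(len(theta)):
        plus, minus = theta[:], theta[:]
        plus[i] += math.pi / 2.0
        minus[i] -= math.pi / 2.0
        grad.append(0.5 * (f(plus) - f(minus)))
    return grad


def spsa_gradient(f, theta, c=0.1, rng=random):
    """One SPSA estimate; uses 2 evaluations of f regardless of len(theta)."""
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    diff = (f(plus) - f(minus)) / (2.0 * c)
    return [diff / d for d in delta]


theta = [0.3, 1.1]
exact = parameter_shift_gradient(cost, theta)
noisy = spsa_gradient(cost, theta, rng=random.Random(0))
```

A single SPSA estimate is cheap but noisy, while the parameter-shift gradient is exact for this function but scales linearly with the number of parameters, which is the trade-off that motivates combining the two.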

翻訳日:2024-04-26 19:40:12 公開日:2024-04-24

# GC-IMSデータを用いた感染検出のための機械学習アルゴリズムの探索

Exploring Machine Learning Algorithms for Infection Detection Using GC-IMS Data: A Preliminary Study ( http://arxiv.org/abs/2404.15757v1 )

ライセンス: Link先を確認

Christos Sardianos, Chrysostomos Symvoulidis, Matthias Schlögl, Iraklis Varlamis, Georgios Th. Papadopoulos,

(参考訳) 感染症の診断における高度な診断技術の発達は、現代医療において重要な領域となっている。ガスクロマトグラフィー・イオンモビリティ・スペクトロメトリ(GC-IMS)データを活用し,機械学習アルゴリズムを1つのプラットフォームに組み込むことで,正確な感染識別の課題に対処することを目的とした。これらの困難に触発されて、当社の目標は、強力なデータ分析プロセスの作成、機械学習(ML)モデルの強化、臨床応用の徹底的な検証である。本研究は,ガスクロマトグラフィー・イオンモビリティ・スペクトロメトリ(GC-IMS)データと機械学習アルゴリズムを統合実験室情報管理システム(LIMS)プラットフォームに組み込むことにより,先進的な診断技術の分野に寄与する。プリミティブトライアルでは、さまざまなMLアルゴリズムを使用して感染したサンプルと非感染したサンプルを区別する際の精度の向上が示されている。現在、継続する取り組みは、モデルの有効性を高め、その機能を明らかにするための技術を調査し、病気の早期発見を支援するために様々な種類のデータを統合することに重点を置いている。

The developing field of enhanced diagnostic techniques for infectious diseases constitutes a crucial domain in modern healthcare. By utilizing Gas Chromatography-Ion Mobility Spectrometry (GC-IMS) data and incorporating machine learning algorithms into one platform, our research aims to tackle the ongoing issue of precise infection identification. Inspired by these difficulties, our goals consist of creating a strong data analytics process, enhancing machine learning (ML) models, and performing thorough validation for clinical applications. Our research contributes to the emerging field of advanced diagnostic technologies by integrating GC-IMS data and machine learning algorithms within a unified Laboratory Information Management System (LIMS) platform. Preliminary trials demonstrate encouraging levels of accuracy when employing various ML algorithms to differentiate between infected and non-infected samples. Continuing endeavors are currently concentrated on enhancing the effectiveness of the model, investigating techniques to clarify its functioning, and incorporating many types of data to further support the early detection of diseases.

翻訳日:2024-04-26 19:40:12 公開日:2024-04-24

# Let's Think Dot by Dot: Transformer言語モデルにおける隠れた計算

Let's Think Dot by Dot: Hidden Computation in Transformer Language Models ( http://arxiv.org/abs/2404.15758v1 )

ライセンス: Link先を確認

Jacob Pfau, William Merrill, Samuel R. Bowman,

(参考訳) 言語モデルの連鎖応答は、ほとんどのベンチマークのパフォーマンスを改善する。しかしながら、これらのパフォーマンス向上が、人間のようなタスクの分解や、追加トークンが許容するより大きい計算にどの程度貢献できるかは、まだ不明である。中間トークンを使わずに応答できない2つの難解なアルゴリズムタスクを解くという考え方の連鎖の代わりに,トランスフォーマーは無意味なフィラートークン(eg, '...')を使用できることを示す。しかし, フィラートークンの学習は困難であり, 集束するためには, 具体的, 密集的な監督が必要であることが実証的に判明した。また、フィラートークンが一階公式の量化器深さの点で有用であるような問題のクラスを理論的に特徴づける。この特徴を満たすために、連鎖トークンはマルチトークン計算に関わる中間計算ステップに関する情報を提供する必要はない。以上の結果から,トークン選択とは無関係に,追加のトークンが計算上のメリットをもたらすことが示唆された。中間トークンがフィラートークンとして機能するという事実は、観測されたチェーンオブソートトークンから次第に分離される、不明瞭で隠れた計算に関わる大きな言語モデルに対する懸念を提起する。

Chain-of-thought responses from language models improve performance across most benchmarks. However, it remains unclear to what extent these performance gains can be attributed to human-like task decomposition or simply the greater computation that additional tokens allow. We show that transformers can use meaningless filler tokens (e.g., '......') in place of a chain of thought to solve two hard algorithmic tasks they could not solve when responding without intermediate tokens. However, we find empirically that learning to use filler tokens is difficult and requires specific, dense supervision to converge. We also provide a theoretical characterization of the class of problems where filler tokens are useful in terms of the quantifier depth of a first-order formula. For problems satisfying this characterization, chain-of-thought tokens need not provide information about the intermediate computational steps involved in multi-token computations. In summary, our results show that additional tokens can provide computational benefits independent of token choice. The fact that intermediate tokens can act as filler tokens raises concerns about large language models engaging in unauditable, hidden computations that are increasingly detached from the observed chain-of-thought tokens.

翻訳日:2024-04-26 19:40:12 公開日:2024-04-24

# 逆実例によるデバイアスマシンの非学習

Debiasing Machine Unlearning with Counterfactual Examples ( http://arxiv.org/abs/2404.15760v1 )

ライセンス: Link先を確認

Ziheng Chen, Jia Wang, Jun Zhuang, Abbavaram Gowtham Reddy, Fabrizio Silvestri, Jin Huang, Kaushiki Nag, Kun Kuang, Xin Ning, Gabriele Tolomei,

(参考訳) 忘れられる権利(RTBF)は、機械学習技術を実装することによって、過去の行動の持続的な影響から個人を保護しようとするものである。これらの技術は、広範囲なモデルの再訓練を必要とせずに、以前取得した知識の削除を促進する。しかし、彼らはしばしば重要な問題を見落としている。このバイアスは,(1)不均一なデータ除去を特徴とするデータレベルのバイアス,(2)残りのデータセットを汚染し,モデル精度を低下させるアルゴリズムレベルのバイアスの2つから生じる。本研究では、未学習プロセスの背後にある因果要因を分析し、データレベルとアルゴリズムレベルでバイアスを軽減する。通常、我々は介入に基づくアプローチを導入し、脱バイアスデータセットで忘れるべき知識を消去する。さらに,他のデータセットのパフォーマンスを損なうことなくセマンティックデータの一貫性を維持するため,逆実例を活用することで,忘れる手順を導出する。実験の結果,提案手法は,評価指標に基づく既存の機械学習ベースラインよりも優れていた。

The right to be forgotten (RTBF) seeks to safeguard individuals from the enduring effects of their historical actions by implementing machine unlearning techniques. These techniques facilitate the deletion of previously acquired knowledge without requiring extensive model retraining. However, they often overlook a critical issue: bias in the unlearning process. This bias emerges from two main sources: (1) data-level bias, characterized by uneven data removal, and (2) algorithm-level bias, which leads to the contamination of the remaining dataset, thereby degrading model accuracy. In this work, we analyze the causal factors behind the unlearning process and mitigate biases at both the data and algorithmic levels. Specifically, we introduce an intervention-based approach, where knowledge to forget is erased with a debiased dataset. In addition, we guide the forgetting procedure by leveraging counterfactual examples, as they maintain semantic data consistency without hurting performance on the remaining dataset. Experimental results demonstrate that our method outperforms existing machine unlearning baselines on evaluation metrics.

翻訳日:2024-04-26 19:40:12 公開日:2024-04-24

# シリコン中のドナーバウンド電子スピンの高強度スピン軌道結合強度の測定

Measurement of enhanced spin-orbit coupling strength for donor-bound electron spins in silicon ( http://arxiv.org/abs/2404.15762v1 )

ライセンス: Link先を確認

Radha Krishnan, Beng Yee Gan, Yu-Ling Hsueh, A. M. Saffat-Ee Huq, Jonathan Kenny, Rajib Rahman, Teck Seng Koh, Michelle Y. Simmons, Bent Weber,

(参考訳) 伝統的に量子ドットスピン量子ビットにおける有害な効果と考えられてきたが、オンチップ交流電場による高速コヒーレント制御を可能にするため、スピン軌道相互作用は近年再検討されている。バルクシリコン中の電子の場合、SOCは本質的に弱いが、表面や界面、あるいは原子配置によって増強することができる。ここでは、スピン軌道結合の強さは、単一ドナーと比較して多重ドナー量子ドットの多体波動関数の2桁以上で局所的に増強できることを示す。電気双極子スピン共鳴(EDSR)を用いたシリコン中のドナー結合スピンの全電気的制御の経路を提供する可能性がある。

While traditionally considered a deleterious effect in quantum dot spin qubits, the spin-orbit interaction has recently been revisited because it allows for rapid coherent control by on-chip AC electric fields. For electrons in bulk silicon, spin-orbit coupling (SOC) is intrinsically weak; however, it can be enhanced at surfaces and interfaces, or through atomic placement. Here we show that the strength of the spin-orbit coupling can be locally enhanced by more than two orders of magnitude in the many-body wave functions of multi-donor quantum dots compared to a single donor, reaching strengths so far only reported for holes or two-donor systems with certain symmetry. Our findings may provide a pathway towards all-electrical control of donor-bound spins in silicon using electric dipole spin resonance (EDSR).

翻訳日:2024-04-26 19:40:12 公開日:2024-04-24

# 非デジタルレジストレーションによる3次元顔モフィング攻撃生成

3D Face Morphing Attack Generation using Non-Rigid Registration ( http://arxiv.org/abs/2404.15765v1 )

ライセンス: Link先を確認

Jag Mohan Singh, Raghavendra Ramachandra,

(参考訳) 顔認識システム(FRS)は、現実の環境での精度の高さから、電子商取引や電子バンキングなどの商業環境で広く使われている。しかし、これらのシステムは、異なる被験者の顔色画像が混ざり合った顔形態形成攻撃に弱い。そこで本研究では、2つのボナファイド点雲から3次元顔形態を生成する新しい方法を提案する。提案手法はまず中性表現を用いたボナファイド点雲を選択する。 2つの入力点雲を最適化せずにベイジアンコヒーレントポイントドリフト (BCPD) を用いて登録し, 登録点雲の形状と色を平均化し, 顔変形点雲を生成する。提案手法は,200人のボナファイド被験者から388個の顔変形点雲を生成する。この手法の有効性は、G-MAPが81.61%の既存のSOTAよりも優れている97.93%の一般モルフィング攻撃可能性(G-MAP)を達成し、広範囲にわたる脆弱性実験によって実証された。

Face Recognition Systems (FRS) are widely used in commercial environments, such as e-commerce and e-banking, owing to their high accuracy in real-world conditions. However, these systems are vulnerable to facial morphing attacks, which are generated by blending face color images of different subjects. This paper presents a new method for generating 3D face morphs from two bona fide point clouds. The proposed method first selects bona fide point clouds with neutral expressions. The two input point clouds are then registered using Bayesian Coherent Point Drift (BCPD) without optimization, and the geometry and color of the registered point clouds are averaged to generate a face morphing point cloud. The proposed method generates 388 face-morphing point clouds from 200 bona fide subjects. The effectiveness of the method is demonstrated through extensive vulnerability experiments, achieving a Generalized Morphing Attack Potential (G-MAP) of 97.93%, which is superior to the existing state-of-the-art (SOTA) with a G-MAP of 81.61%.
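
The averaging step described in the abstract can be illustrated with a minimal sketch. It assumes the registration has already produced a one-to-one correspondence between the two point clouds (the i-th point of one matches the i-th point of the other); the BCPD registration itself is not implemented here. The function name `average_registered_clouds` and the `weight` parameter are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # x, y, z
Color = Tuple[float, float, float]  # r, g, b in [0, 1]


def average_registered_clouds(
    points_a: List[Point],
    colors_a: List[Color],
    points_b: List[Point],
    colors_b: List[Color],
    weight: float = 0.5,  # hypothetical blend factor; 0.5 is a plain average
) -> Tuple[List[Point], List[Color]]:
    """Blend two point clouds whose i-th points are assumed to correspond."""
    if not (len(points_a) == len(points_b) == len(colors_a) == len(colors_b)):
        raise ValueError("registered clouds must have matching lengths")

    def blend(u, v):
        # Component-wise weighted mean of two 3-tuples.
        return tuple(weight * a + (1.0 - weight) * b for a, b in zip(u, v))

    morph_points = [blend(p, q) for p, q in zip(points_a, points_b)]
    morph_colors = [blend(c, d) for c, d in zip(colors_a, colors_b)]
    return morph_points, morph_colors
```

With `weight=0.5` both subjects contribute equally to geometry and color, which matches the plain averaging the abstract mentions; other weights are only an illustrative extension.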

翻訳日:2024-04-26 19:40:12 公開日:2024-04-24

# 確率微分方程式によるベイズ流の統一と拡散モデル

Unifying Bayesian Flow Networks and Diffusion Models through Stochastic Differential Equations ( http://arxiv.org/abs/2404.15766v1 )

ライセンス: Link先を確認

Kaiwen Xue, Yuhao Zhou, Shen Nie, Xu Min, Xiaolu Zhang, Jun Zhou, Chongxuan Li,

(参考訳) ベイズ流ネットワーク (BFN) は, 拡散モデル (DM) のサンプルではなく, ベイズ推定による様々なノイズレベルの分布のパラメータを反復的に改良する。微分可能な性質のため、BFNは連続データと離散データの両方をモデリングし、同時に高速サンプリング機能を維持することを約束している。本稿では,確率微分方程式(SDE)を用いて,BFNをDMに接続することで,BFNの理解と拡張を図る。我々は,BFNの雑音付加過程に対応する線形SDEを同定し,BFNの回帰損失がデノイジングスコアマッチングと一致していることを示し,各逆時間SDEの1次解法としてBFNのサンプラーを検証した。これらの知見と既存のDMにおける高速サンプリングのレシピに基づいて、画像とテキストの両方で機能評価(例、10)が限定されたサンプル品質の観点から、元のBFNサンプリングを著しく上回るBFNの特殊解法を提案する。特に,本研究では,5～20倍の速度を無償で達成している。私たちのコードは https://github.com/ML-GSAI/BFN-Solver から入手可能です。

Bayesian flow networks (BFNs) iteratively refine the parameters, instead of the samples in diffusion models (DMs), of distributions at various noise levels through Bayesian inference. Owing to their differentiable nature, BFNs are promising in modeling both continuous and discrete data, while simultaneously maintaining fast sampling capabilities. This paper aims to understand and enhance BFNs by connecting them with DMs through stochastic differential equations (SDEs). We identify the linear SDEs corresponding to the noise-addition processes in BFNs, demonstrate that BFN's regression losses are aligned with denoising score matching, and validate the sampler in BFN as a first-order solver for the respective reverse-time SDE. Based on these findings and existing recipes of fast sampling in DMs, we propose specialized solvers for BFNs that markedly surpass the original BFN sampler in terms of sample quality with a limited number of function evaluations (e.g., 10) on both image and text datasets. Notably, our best sampler achieves an increase in speed of 5~20 times for free. Our code is available at https://github.com/ML-GSAI/BFN-Solver.
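
For orientation, the generic form of a linear SDE and its reverse-time counterpart, as commonly written in the diffusion-model literature, is sketched below. The coefficients $f(t)$ and $g(t)$ are placeholders: the abstract does not state the specific coefficients derived for BFNs, so this is only the general template that such a derivation would instantiate.

```latex
\begin{aligned}
\mathrm{d}\mathbf{x} &= f(t)\,\mathbf{x}\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w}
  && \text{(forward, noise-adding process)} \\
\mathrm{d}\mathbf{x} &= \left[ f(t)\,\mathbf{x} - g(t)^{2}\,\nabla_{\mathbf{x}} \log p_t(\mathbf{x}) \right]\mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}}
  && \text{(reverse-time process)}
\end{aligned}
```

Here $\mathbf{w}$ and $\bar{\mathbf{w}}$ denote forward and reverse-time Wiener processes, and $\nabla_{\mathbf{x}} \log p_t(\mathbf{x})$ is the score that a denoising score matching objective estimates. A first-order solver discretizes the second equation with one score evaluation per step.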

翻訳日:2024-04-26 19:40:12 公開日:2024-04-24

# ChEX:胸部X線におけるインタラクティブな局在と領域記述

ChEX: Interactive Localization and Region Description in Chest X-rays ( http://arxiv.org/abs/2404.15770v1 )

ライセンス: Link先を確認

Philip Müller, Georgios Kaissis, Daniel Rueckert,

(参考訳) レポート生成モデルは、胸部X線のような医療画像のきめ細かいテキスト解釈を提供するが、対話性(すなわち、ユーザクエリを通じて生成プロセスを操る能力)と局所的解釈可能性(すなわち、その予測を視覚的に根拠づけること)が欠如していることが多い。これらの問題に対処する努力はあったが、テキストクエリをサポートしない、あるいはローカライズされた解釈性を提供しないなど、対話性に制限がある。そこで本研究では,解剖学的領域や病理などの多様な側面を対象としたテキストプロンプトとバウンディングボックスを統合した,新しいマルチタスクアーキテクチャとトレーニングパラダイムを提案する。このアプローチをChest X-Ray Explainer (ChEX)と呼ぶ。画像のローカライズされた解釈やレポート生成を含む9つの胸部X線タスクの不均一なセットに対する評価は、SOTAモデルとの競合性を示し、さらなる分析はChEXのインタラクティブ機能を示している。

Report generation models offer fine-grained textual interpretations of medical images like chest X-rays, yet they often lack interactivity (i.e. the ability to steer the generation process through user queries) and localized interpretability (i.e. visually grounding their predictions), which we deem essential for future adoption in clinical practice. While there have been efforts to tackle these issues, they are either limited in their interactivity by not supporting textual queries or fail to also offer localized interpretability. Therefore, we propose a novel multitask architecture and training paradigm integrating textual prompts and bounding boxes for diverse aspects like anatomical regions and pathologies. We call this approach the Chest X-Ray Explainer (ChEX). Evaluations across a heterogeneous set of 9 chest X-ray tasks, including localized image interpretation and report generation, showcase its competitiveness with SOTA models while additional analysis demonstrates ChEX's interactive capabilities.

翻訳日:2024-04-26 19:40:12 公開日:2024-04-24

# DVF:検索ガイドラインによるロバスト化と高精度画像検索

DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines ( http://arxiv.org/abs/2404.15771v1 )

ライセンス: Link先を確認

Xin Jiang, Hao Tang, Rui Yan, Jinhui Tang, Zechao Li,

(参考訳) 細粒度画像検索(FGIR)は、一般化を維持しながら視覚的に類似した物体を識別する視覚表現を学習することである。既存の方法は識別的特徴を生成することを提案するが、FGIRタスク自体の特異性を考えることは滅多にない。本稿では,FGIRモデルの設計において,サブカテゴリ固有の相違点を特定し,識別的特徴を生成するための実践的ガイドラインを提案する。これらのガイドラインには、オブジェクト(G1)の強調、サブカテゴリ固有の相違(G2)の強調、効果的なトレーニング戦略(G3)の活用が含まれる。 G1 と G2 に続いて,DVF と表記される平易な視覚変換器のための新しいデュアルビジュアルフィルタ機構を設計し,サブカテゴリ固有の相違を捉える。具体的には、二重視覚フィルタリング機構は、オブジェクト指向モジュールと意味指向モジュールとから構成される。これらのコンポーネントは、オブジェクトを拡大し、それぞれ識別可能な領域を特定するのに役立ちます。 G3の後、DVFの識別性と一般化能力を向上させるための識別モデルトレーニング戦略を実装した。総括分析およびアブレーション研究により,提案ガイドラインの有効性が確認された。ベルとホイッスルなしで、提案されたDVFは、クローズドセットとオープンセットの設定で、広く使われている3つのきめ細かいデータセットに対して最先端のパフォーマンスを達成する。

Fine-grained image retrieval (FGIR) aims to learn visual representations that distinguish visually similar objects while maintaining generalization. Existing methods propose to generate discriminative features, but rarely consider the particularity of the FGIR task itself. This paper presents a meticulous analysis leading to the proposal of practical guidelines to identify subcategory-specific discrepancies and generate discriminative features to design effective FGIR models. These guidelines include emphasizing the object (G1), highlighting subcategory-specific discrepancies (G2), and employing an effective training strategy (G3). Following G1 and G2, we design a novel Dual Visual Filtering mechanism for the plain visual transformer, denoted as DVF, to capture subcategory-specific discrepancies. Specifically, the dual visual filtering mechanism comprises an object-oriented module and a semantic-oriented module. These components serve to magnify objects and identify discriminative regions, respectively. Following G3, we implement a discriminative model training strategy to improve the discriminability and generalization ability of DVF. Extensive analysis and ablation studies confirm the efficacy of our proposed guidelines. Without bells and whistles, the proposed DVF achieves state-of-the-art performance on three widely-used fine-grained datasets in closed-set and open-set settings.

翻訳日:2024-04-26 19:40:12 公開日:2024-04-24

# Bi-Mamba4TS:時系列予測のための双方向マンバ

Bi-Mamba4TS: Bidirectional Mamba for Time Series Forecasting ( http://arxiv.org/abs/2404.15772v1 )

ライセンス: Link先を確認

Aobo Liang, Xingguo Jiang, Yan Sun, Chang Lu,

(参考訳) 長期時系列予測(LTSF)は、将来のトレンドとパターンに関するより長い洞察を提供する。近年、ディープラーニングモデル、特にトランスフォーマーはLTSFタスクで高度なパフォーマンスを実現している。しかし、トランスフォーマーの二次複雑性は、計算効率のバランスと性能の予測という課題を提起する。近年,Mamba という新しい状態空間モデル (SSM) が提案されている。入力データに対する選択的機能とハードウェア対応並列計算アルゴリズムにより、Mambaは線形計算複雑性を維持しながら、長期的依存をうまく捉えることができる。 Mamba は長いシーケンスモデリングに優れた能力を示しており、LTSF の Transformer ベースのモデルと競合する可能性がある。本稿では,時系列予測のための双方向マンバであるBi-Mamba4TSを提案する。時系列セマンティクスの空間性に対処するため、我々は、より微細な粒度で時系列の進化パターンを捉えながら、局所的な情報を強化するパッチ手法を採用した。データセットの特性に基づいてより適切なモデリング手法を選択するため,本モデルでは,チャネル独立・チャネル混合トークン化戦略を統一し,系列関係対応決定器を用いて戦略選択プロセスを制御する。 7つの実世界のデータセットに対する大規模な実験により、我々のモデルは最先端の手法と比較してより正確な予測を達成できることを示した。

Long-term time series forecasting (LTSF) provides longer insights into future trends and patterns. In recent years, deep learning models, especially Transformers, have achieved advanced performance in LTSF tasks. However, the quadratic complexity of Transformers raises the challenge of balancing computational efficiency and predicting performance. Recently, a new state space model (SSM) named Mamba has been proposed. With the selective capability on input data and the hardware-aware parallel computing algorithm, Mamba can well capture long-term dependencies while maintaining linear computational complexity. Mamba has shown great ability for long sequence modeling and is a potential competitor to Transformer-based models in LTSF. In this paper, we propose Bi-Mamba4TS, a bidirectional Mamba for time series forecasting. To address the sparsity of time series semantics, we adopt the patching technique to enrich the local information while capturing the evolutionary patterns of time series in a finer granularity. To select a more appropriate modeling method based on the characteristics of the dataset, our model unifies the channel-independent and channel-mixing tokenization strategies and uses a series-relation-aware decider to control the strategy choosing process. Extensive experiments on seven real-world datasets show that our model achieves more accurate predictions compared with state-of-the-art methods.
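
The patching technique mentioned in the abstract can be sketched as a sliding window over a single channel. This is a generic illustration, not the paper's implementation: the function name `patchify` and the parameters `patch_len` and `stride` are assumptions, and the abstract does not give the values the authors use.

```python
from typing import List


def patchify(series: List[float], patch_len: int, stride: int) -> List[List[float]]:
    """Split one channel of a time series into (possibly overlapping) patches.

    patch_len and stride are hypothetical hyperparameters chosen for illustration.
    Trailing values that do not fill a whole patch are dropped.
    """
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    last_start = len(series) - patch_len
    # range() is empty when the series is shorter than one patch.
    return [series[s:s + patch_len] for s in range(0, last_start + 1, stride)]
```

Each patch then becomes one token, so the model sees local windows rather than isolated time steps. A bidirectional model could additionally process the reversed patch sequence, though how Bi-Mamba4TS combines the two directions is not specified in the abstract.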

翻訳日:2024-04-26 19:30:27 公開日:2024-04-24

# LiDARインテンシティシミュレーションのための物理対応ディープラーニングアーキテクチャに向けて

Toward Physics-Aware Deep Learning Architectures for LiDAR Intensity Simulation ( http://arxiv.org/abs/2404.15774v1 )

ライセンス: Link先を確認

Vivek Anand, Bharat Lohani, Gaurav Pandey, Rakesh Mishra,

(参考訳) 自動運転車(AV)は環境の理解とナビゲーションにLiDARの認識に大きく依存している。 LiDAR強度は反射レーザー信号に関する貴重な情報を提供し、AVの知覚能力を高める上で重要な役割を果たす。しかし、LiDARの強度を正確にシミュレーションすることは、環境中の物体の材料特性が利用できないことや、レーザービームと環境の間の複雑な相互作用のため、依然として課題である。提案手法は,深層学習フレームワークに物理に基づくモーダルティを組み込むことで,強度シミュレーションの精度を向上させることを目的とする。レーザービームと物体の間の相互作用を捉える重要な要素の1つは、入射角である。本研究は,深部ニューラルネットワークへの個別入力としてLiDAR入射角を追加することにより,結果を著しく向上させることを示した。 U-NET a Convolutional Neural Network (CNN) と Pix2Pix a Generative Adversarial Network (GAN) の2つの著名なディープラーニングアーキテクチャの比較研究を行った。この2つのアーキテクチャを強度予測タスクに実装し,実験にSemanticKITTIとVoxelScapeデータセットを使用した。比較分析により、どちらのアーキテクチャも追加入力として入射角から恩恵を受けることが明らかとなった。さらにPix2Pixアーキテクチャは、特に入射角が組み込まれた場合、U-NETより優れている。

Autonomous vehicles (AVs) heavily rely on LiDAR perception for environment understanding and navigation. LiDAR intensity provides valuable information about the reflected laser signals and plays a crucial role in enhancing the perception capabilities of AVs. However, accurately simulating LiDAR intensity remains a challenge due to the unavailability of material properties of the objects in the environment, and complex interactions between the laser beam and the environment. The proposed method aims to improve the accuracy of intensity simulation by incorporating physics-based modalities within the deep learning framework. One of the key entities that captures the interaction between the laser beam and the objects is the angle of incidence. In this work, we demonstrate that the addition of the LiDAR incidence angle as a separate input to the deep neural networks significantly enhances the results. We present a comparative study between two prominent deep learning architectures: U-NET, a Convolutional Neural Network (CNN), and Pix2Pix, a Generative Adversarial Network (GAN). We implemented these two architectures for the intensity prediction task and used SemanticKITTI and VoxelScape datasets for experiments. The comparative analysis reveals that both architectures benefit from the incidence angle as an additional input. Moreover, the Pix2Pix architecture outperforms U-NET, especially when the incidence angle is incorporated.
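
The incidence angle used as an extra network input can be computed from basic geometry: it is the angle between the laser beam direction and the surface normal at the hit point. The sketch below assumes the sensor position and a per-point surface normal are available; how the authors estimate normals is not stated in the abstract, and the function name `incidence_angle_deg` is hypothetical.

```python
import math
from typing import Sequence


def incidence_angle_deg(
    point: Sequence[float],
    normal: Sequence[float],
    sensor: Sequence[float] = (0.0, 0.0, 0.0),  # assumed sensor origin
) -> float:
    """Angle in degrees (0 to 90) between the beam and the surface normal."""
    beam = [p - s for p, s in zip(point, sensor)]
    dot = sum(b * n for b, n in zip(beam, normal))
    norm_beam = math.sqrt(sum(b * b for b in beam))
    norm_normal = math.sqrt(sum(n * n for n in normal))
    if norm_beam == 0.0 or norm_normal == 0.0:
        raise ValueError("beam and normal must be non-zero vectors")
    # abs() makes the result independent of the normal's orientation;
    # min() guards against rounding slightly above 1.0.
    cos_theta = min(1.0, abs(dot) / (norm_beam * norm_normal))
    return math.degrees(math.acos(cos_theta))
```

A beam hitting a surface head-on gives an angle near 0 degrees, while a grazing hit approaches 90 degrees. Computing this per point yields an additional input channel alongside the other LiDAR-derived features.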

翻訳日:2024-04-26 19:30:27 公開日:2024-04-24

# 医療産業における大規模言語モデル応用の評価に関する総合的研究

A Comprehensive Survey on Evaluating Large Language Model Applications in the Medical Industry ( http://arxiv.org/abs/2404.15777v1 )

ライセンス: Link先を確認

Yining Huang, Keke Tang, Meilian Chen,

(参考訳) 2017年のTransformerアーキテクチャの開始以来、GPTやBERTのような大規模言語モデル(LLM)は大幅に進化し、言語理解と生成の高度な能力を持つ様々な産業に影響を与えた。これらのモデルは、医療分野を変革する可能性を示し、その効果的かつ倫理的な展開を保証するための特別な評価フレームワークの必要性を強調している。この包括的調査は、医療におけるLSMの広範な適用と必要な評価を概説し、医療の成果を高める上で、その能力を完全に活用するための実証的検証の重要性を強調した。本調査は,臨床環境,医療用テキストデータ処理,研究,教育,公衆衛生への意識といった分野におけるLCM応用の詳細な分析を行うために構成されている。まず,臨床応用,医用テキストデータ処理,情報検索,データ分析,医学論文作成,教育コンテンツ生成などの業務において,その業績に基づいて評価される役割について検討する。その後のセクションでは、これらの評価で使用される方法論を掘り下げ、モデルの有効性、正確性、倫理的整合性を評価するために使用されるベンチマークとメトリクスについて議論した。本調査は,医療従事者,研究者,政策立案者に対して,医療応用におけるLCMの潜在的な強みと限界を包括的に理解することを目的としている。この調査は、評価プロセスとLSMを医療に組み込む上で直面する課題に関する詳細な洞察を提供することによって、これらの強力なモデルの責任ある開発と展開をガイドし、厳格な倫理基準を維持しながら、その潜在能力を最大限に活用することを目指している。

Since the inception of the Transformer architecture in 2017, Large Language Models (LLMs) such as GPT and BERT have evolved significantly, impacting various industries with their advanced capabilities in language understanding and generation. These models have shown potential to transform the medical field, highlighting the necessity for specialized evaluation frameworks to ensure their effective and ethical deployment. This comprehensive survey delineates the extensive application and requisite evaluation of LLMs within healthcare, emphasizing the critical need for empirical validation to fully exploit their capabilities in enhancing healthcare outcomes. Our survey is structured to provide an in-depth analysis of LLM applications across clinical settings, medical text data processing, research, education, and public health awareness. We begin by exploring the roles of LLMs in different medical applications, detailing how they are evaluated based on their performance in tasks such as clinical application, medical text data processing, information retrieval, data analysis, medical scientific writing, and educational content generation. The subsequent sections delve into the methodologies employed in these evaluations, discussing the benchmarks and metrics used to assess the models' effectiveness, accuracy, and ethical alignment. Through this survey, we aim to equip healthcare professionals, researchers, and policymakers with a comprehensive understanding of the potential strengths and limitations of LLMs in medical applications. By providing detailed insights into the evaluation processes and the challenges faced in integrating LLMs into healthcare, this survey seeks to guide the responsible development and deployment of these powerful models, ensuring they are harnessed to their full potential while maintaining stringent ethical standards.

翻訳日:2024-04-26 19:30:27 公開日:2024-04-24

# BASS: 意図を最適化した投機サンプリング

BASS: Batched Attention-optimized Speculative Sampling ( http://arxiv.org/abs/2404.15778v1 )

ライセンス: Link先を確認

Haifeng Qian, Sujan Kumar Gonugondla, Sungsoo Ha, Mingyue Shang, Sanjay Krishna Gouda, Ramesh Nallapati, Sudipta Sengupta, Xiaofei Ma, Anoop Deoras,

(参考訳) 投機的復号化は、大規模言語モデルをホストする際のレイテンシとスループットを改善する強力な方法として登場した。しかし、既存の実装のほとんどは単一のシーケンスを生成することに重点を置いている。実世界の生成AIアプリケーションは、しばしば複数の応答と、バッチ環境で投機的復号化を実行する方法を必要とする。本稿では、バッチ化された投機的復号化システムについて述べる。これは、マルチシーケンス生成遅延において新しい最先端の状態を設定し、GPUの優れた利用と、時間予算内での世代品質を示す。例えば、1つのA100 GPU上の7.8Bサイズモデルとバッチサイズが8の場合、各シーケンスは平均速度5.8msで生成され、全体のスループットは毎秒1.1Kである。これらの結果は、最先端のレイテンシと、最適化された正規デコードよりも2.15倍のスピードアップを示している。通常のデコーディングが終わらない時間予算の中で、我々のシステムはHumanEval Pass@Firstの43%とPass@Allの61%のシーケンスを生成することができる。復号化のピークGPU利用率は15.8%、正規復号化の最高値の3倍、単列投機復号化の約10倍に達する。

Speculative decoding has emerged as a powerful method to improve latency and throughput in hosting large language models. However, most existing implementations focus on generating a single sequence. Real-world generative AI applications often require multiple responses, and how to perform speculative decoding in a batched setting while preserving its latency benefits poses non-trivial challenges. This paper describes a system of batched speculative decoding that sets a new state of the art in multi-sequence generation latency and that demonstrates superior GPU utilization as well as quality of generations within a time budget. For example, for a 7.8B-size model on a single A100 GPU and with a batch size of 8, each sequence is generated at an average speed of 5.8ms per token, the overall throughput being 1.1K tokens per second. These results represent state-of-the-art latency and a 2.15X speed-up over optimized regular decoding. Within a time budget that regular decoding does not finish, our system is able to generate sequences with HumanEval Pass@First of 43% and Pass@All of 61%, far exceeding what's feasible with single-sequence speculative decoding. Our peak GPU utilization during decoding reaches as high as 15.8%, more than 3X the highest of that of regular decoding and around 10X of single-sequence speculative decoding.

翻訳日:2024-04-26 19:30:27 公開日:2024-04-24

# 立方体サットの高スペクトル画像伝送と復元のためのリアルタイム圧縮センシング

Real-Time Compressed Sensing for Joint Hyperspectral Image Transmission and Restoration for CubeSat ( http://arxiv.org/abs/2404.15781v1 )

ライセンス: Link先を確認

Chih-Chung Hsu, Chih-Yu Jian, Eng-Shen Tu, Chia-Ming Lee, Guan-Lin Chen,

(参考訳) 本稿では、しばしばストライプ効果に悩まされ、計算資源に制限される小型衛星からのハイパースペクトル画像(HSI)再構成に関わる課題に対処する。本研究は,RTCS(Real-Time Compressed Sensing, リアルタイム圧縮センシング)ネットワークを提案する。 RTCSネットワークは、必要なトレーニングサンプルを削減し、整数8ベースのエンコーダの実装を容易にし、ストリップのような高速な圧縮センシングを可能にした。これは、高精度浮動小数点演算を必要とする最適化ベースのモデルとは対照的であり、エッジデバイスへのデプロイが困難である。我々のエンコーダは、ストリップライクなHSIデータ伝送に整数8互換の線形プロジェクションを使用し、リアルタイム圧縮センシングを確実にする。さらに、新しい2ストリームアーキテクチャに基づいて、レシーバ側で効率的なHSI復元デコーダを提案し、高度な中央サーバを必要とせずにエッジデバイス再構築を可能にする。これは、地上で重要な計算資源を必要とする小型衛星が増えているため、特に重要である。大規模な実験により、我々のアプローチの優れた性能が検証され、既存の小型衛星システムに新たな重要な能力を提供する。

This paper addresses the challenges associated with hyperspectral image (HSI) reconstruction from miniaturized satellites, which often suffer from stripe effects and are computationally resource-limited. We propose a Real-Time Compressed Sensing (RTCS) network designed to be lightweight and require only relatively few training samples for efficient and robust HSI reconstruction in the presence of the stripe effect and under noisy transmission conditions. The RTCS network features a simplified architecture that reduces the required training samples and allows for easy implementation on integer-8-based encoders, facilitating rapid compressed sensing for stripe-like HSI, which matches the modest hardware of miniaturized satellites that use a push-broom scanning mechanism. This contrasts with optimization-based models that demand high-precision floating-point operations, making them difficult to deploy on edge devices. Our encoder employs an integer-8-compatible linear projection for stripe-like HSI data transmission, ensuring real-time compressed sensing. Furthermore, based on the novel two-streamed architecture, an efficient HSI restoration decoder is proposed for the receiver side, allowing for edge-device reconstruction without needing a sophisticated central server. This is particularly crucial as an increasing number of miniaturized satellites necessitates significant computing resources on the ground station. Extensive experiments validate the superior performance of our approach, offering new and vital capabilities for existing miniaturized satellite systems.
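
To make the idea of an integer-8-compatible linear projection concrete, the sketch below compresses one pixel's spectral vector with a small matrix whose entries fit in int8, using only integer arithmetic. The matrix values, the function name `int8_project`, and the choice to accumulate in wider integers are illustrative assumptions; the abstract does not describe the actual learned encoder.

```python
from typing import List

INT8_MIN, INT8_MAX = -128, 127


def int8_project(matrix: List[List[int]], spectrum: List[int]) -> List[int]:
    """Compute y = matrix @ spectrum with int8-range weights and integer sums.

    matrix has shape (m, n) with m < n, so y is a compressed measurement.
    Sums are kept in Python ints (standing in for a wider accumulator).
    """
    for row in matrix:
        if len(row) != len(spectrum):
            raise ValueError("matrix width must equal spectrum length")
        if any(not (INT8_MIN <= w <= INT8_MAX) for w in row):
            raise ValueError("matrix entries must fit in int8")
    return [sum(w * x for w, x in zip(row, spectrum)) for row in matrix]
```

Because no floating-point operation is involved, a projection of this kind is cheap to run on constrained onboard hardware; the receiver-side decoder is then responsible for reconstructing the full spectrum from the shorter measurement vector.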

翻訳日:2024-04-26 19:30:27 公開日:2024-04-24

# Aegisの実証的研究

An Empirical Study of Aegis ( http://arxiv.org/abs/2404.15784v1 )

ライセンス: Link先を確認

Daniel Saragih, Paridhi Goel, Tejas Balaji, Alyssa Li,

(参考訳) ビット・フリップ攻撃(ビット・フリップ・アタック、Bit flipping attack)は、ニューラルネットワークに対する攻撃の一種であり、その有効性を緩和するために多くの防御機構が発明された。これらの防御機構の堅牢性を確保することの重要性から,我々はイージスフレームワークに関する実証的研究を行った。我々は、低エントロピーデータ(MNIST)に基づいて、Aegisのベースラインメカニズムを評価し、MNISTを微調整した事前学習モデルを評価する。また,データ強化とAegisのロバストネストレーニングの併用,およびAegisが他の敵攻撃(例えば,敵の事例の生成)でどのように機能するかを比較した。 Aegisのダイナミックエグジット戦略とロバストネストレーニングの両方に欠点があることが判明した。特に、摂動データや逆の例をベースラインと比較すると、精度の低下が見られる。さらに、より単純なデータセットでテストすると、ダイナミックエグゼクティブ・ストラテジーが一様性を失うことが判明した。プロジェクトのコードはGitHubで公開されている。

Bit flipping attacks are one class of attacks on neural networks, with numerous defense mechanisms invented to mitigate their potency. Due to the importance of ensuring the robustness of these defense mechanisms, we perform an empirical study on the Aegis framework. We evaluate the baseline mechanisms of Aegis on low-entropy data (MNIST), and we evaluate a pre-trained model with the mechanisms fine-tuned on MNIST. We also compare the use of data augmentation to the robustness training of Aegis, and how Aegis performs under other adversarial attacks, such as the generation of adversarial examples. We find that both the dynamic-exit strategy and robustness training of Aegis have some drawbacks. In particular, we see drops in accuracy when testing on perturbed data, and on adversarial examples, as compared to baselines. Moreover, we found that the dynamic-exit strategy loses its uniformity when tested on simpler datasets. The code for this project is available on GitHub.

翻訳日:2024-04-26 19:30:27 公開日:2024-04-24

# クラスを超えて見る:言語記述子によるゼロショット接地状況認識

Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer ( http://arxiv.org/abs/2404.15785v1 )

ライセンス: Link先を確認

Jiaming Lei, Lin Li, Chunping Wang, Jun Xiao, Long Chen,

(参考訳) 強力な一般化能力、事前訓練された視覚言語モデル(VLM)、例えばCLIPは、ゼロショットシーン理解において広く利用されている。単純な認識タスクとは異なり、接地状況認識(GSR)では、画像内の健全な活動(動詞)を分類するだけでなく、行動に参加するすべての意味的役割を検出する必要がある。この複雑なタスクは通常、動詞の認識、意味的役割の接地、名詞の認識という3つのステップを含む。クラスベースのプロンプトをVLMとグラウンドモデルで直接採用することは、曖昧な動詞概念の区別、固定された動詞中心のテンプレート1入力による役割の正確なローカライズ、文脈対応の名詞予測といった、いくつかの制限に悩まされる。本稿では,これらの制限は,動詞・名詞の理解が不十分なモードに起因していると論じる。この目的のために,Language Explainer (LEX) によるゼロショットGSRの新しいアプローチを導入する。 1) 異なる動詞群の識別性を高めるために、一般的な動詞中心の記述を生成する動詞説明装置 2) より明瞭な理解のために動詞中心のテンプレートを言い換えて意味的役割の正確なローカライゼーションを強化する接地説明詞。 3) シーン固有の名詞記述を生成する名詞説明器は,文脈対応の名詞認識を保証する。 GSRプロセスの各ステップに補助的な説明器を設けることで、LEXは現実世界のシナリオにおける複雑なシーン理解を容易にする。 SWiGデータセットに対する広範な検証では、ゼロショットGSRにおけるLEXの有効性と相互運用性が示されている。

Benefiting from strong generalization ability, pre-trained vision language models (VLMs), e.g., CLIP, have been widely utilized in zero-shot scene understanding. Unlike simple recognition tasks, grounded situation recognition (GSR) requires the model not only to classify salient activity (verb) in the image, but also to detect all semantic roles that participate in the action. This complex task usually involves three steps: verb recognition, semantic role grounding, and noun recognition. Directly employing class-based prompts with VLMs and grounding models for this task suffers from several limitations, e.g., it struggles to distinguish ambiguous verb concepts, accurately localize roles with fixed verb-centric template input, and achieve context-aware noun predictions. In this paper, we argue that these limitations stem from the model's poor understanding of verb/noun classes. To this end, we introduce a new approach for zero-shot GSR via Language EXplainer (LEX), which significantly boosts the model's comprehensive capabilities through three explainers: 1) verb explainer, which generates general verb-centric descriptions to enhance the discriminability of different verb classes; 2) grounding explainer, which rephrases verb-centric templates for clearer understanding, thereby enhancing precise semantic role localization; and 3) noun explainer, which creates scene-specific noun descriptions to ensure context-aware noun recognition. By equipping each step of the GSR process with an auxiliary explainer, LEX facilitates complex scene understanding in real-world scenarios. Our extensive validations on the SWiG dataset demonstrate LEX's effectiveness and interoperability in zero-shot GSR.

翻訳日:2024-04-26 19:30:27 公開日:2024-04-24

# MedMNIST+データセットコレクションによるモデルプロトタイピングの再考

Rethinking Model Prototyping through the MedMNIST+ Dataset Collection ( http://arxiv.org/abs/2404.15786v1 )

ライセンス: Link先を確認

Sebastian Doerrich, Francesco Di Salvo, Julius Brockmann, Christian Ledig,

(参考訳) 臨床実践におけるディープラーニングベースのシステムの統合は、制限された異種医学データセットに根ざした課題によってしばしば妨げられる。さらに、臨床応用性よりも狭い範囲のベンチマークでの限界性能改善の優先順位付けは、有意義なアルゴリズムの進歩を遅らせている。この傾向は、臨床に関係のある革新を育むのではなく、選択したデータセット上で最先端のパフォーマンスを達成するために既存の手法を過度に微調整する結果をもたらすことが多い。本研究は、MedMNIST+データベースの総合的なベンチマークを提示し、評価環境の多様化と、医用画像分類のための共通畳み込みニューラルネットワーク(CNN)とトランスフォーマーベースのアーキテクチャの徹底的な分析を行う。本評価は, 様々な医療データセット, トレーニング手法, 入力解像度を包含し, 広く使用されているモデル変異の強度と限界を再評価することを目的としている。この結果から,計算効率のよいトレーニングスキームと最新の基礎モデルは,高額なエンドツーエンドトレーニングとリソース強化アプローチのギャップを埋める上で有望であることが示唆された。さらに、一般的な仮定とは対照的に、高分解能は一定のしきい値を超えるパフォーマンスを一貫して改善することはなく、特にプロトタイピング段階における低分解能の使用を優先して処理を高速化する。特に,本研究では,異なるモデルアーキテクチャの本質的な能力を理解することの重要性を強調したViTベースのアーキテクチャと比較して,畳み込みモデルの競争性を再確認する。さらに、我々の標準化された評価フレームワークは、MedMNIST+データセットコレクションの透明性、再現性、コンパラビリティの向上と、この分野における今後の研究に役立つことを期待しています。コードはまもなくリリースされる。

The integration of deep learning-based systems in clinical practice is often impeded by challenges rooted in limited and heterogeneous medical datasets. In addition, prioritization of marginal performance improvements on a few, narrowly scoped benchmarks over clinical applicability has slowed down meaningful algorithmic progress. This trend often results in excessive fine-tuning of existing methods to achieve state-of-the-art performance on selected datasets rather than fostering clinically relevant innovations. In response, this work presents a comprehensive benchmark for the MedMNIST+ database to diversify the evaluation landscape and conduct a thorough analysis of common convolutional neural networks (CNNs) and Transformer-based architectures for medical image classification. Our evaluation encompasses various medical datasets, training methodologies, and input resolutions, aiming to reassess the strengths and limitations of widely used model variants. Our findings suggest that computationally efficient training schemes and modern foundation models hold promise in bridging the gap between expensive end-to-end training and more resource-refined approaches. Additionally, contrary to prevailing assumptions, we observe that higher resolutions may not consistently improve performance beyond a certain threshold, advocating for the use of lower resolutions, particularly in prototyping stages, to expedite processing. Notably, our analysis reaffirms the competitiveness of convolutional models compared to ViT-based architectures, emphasizing the importance of comprehending the intrinsic capabilities of different model architectures. Moreover, we hope that our standardized evaluation framework will help enhance transparency, reproducibility, and comparability on the MedMNIST+ dataset collection as well as future research within the field. Code will be released soon.

翻訳日:2024-04-26 19:30:27 公開日:2024-04-24

# MotionMaster:ビデオ生成のためのトレーニング不要カメラモーション転送

MotionMaster: Training-free Camera Motion Transfer For Video Generation ( http://arxiv.org/abs/2404.15789v1 )

ライセンス: Link先を確認

Teng Hu, Jiangning Zhang, Ran Yi, Yating Wang, Hongrui Huang, Jieyu Weng, Yabiao Wang, Lizhuang Ma,

(参考訳) 拡散モデルの出現は、画像およびビデオ生成の進歩を大いに促進した。近年,テキスト・トゥ・ビデオ・ジェネレーションやビデオ・モーション・コントロールなど,カメラ・モーション・コントロールが重要な話題となっているコントロール可能なビデオ・ジェネレーションへの取り組みが進められている。しかし、既存のカメラモーションコントロール手法は、時間カメラモジュールのトレーニングに頼っており、ビデオ生成モデルにおける大量のパラメータのため、かなりの計算資源を必要とする。さらに、トレーニング中にカメラのモーションタイプを事前に定義する既存の手法では、カメラ制御の柔軟性が制限されている。そこで,トレーニングコストを低減し,フレキシブルなカメラ制御を実現するために,ソースビデオ中のカメラの動きとオブジェクトの動きをアンハングリングし,抽出したカメラの動きを新しいビデオに転送する,新しいトレーニングフリー動画移動モデルであるCOMDを提案する。まず,背景から移動物体を分離し,ポアソン方程式を解くことにより,背景の動きに基づいて移動物体領域におけるカメラの動きを推定する。さらに,複数のビデオの時間的注目マップに共通する特徴を抽出するために,ウィンドウベースのクラスタリング手法を用いて,類似のカメラモーションを用いた複数のビデオから共通カメラモーションを抽出する,数発のカメラモーション・アンタングル法を提案する。最後に、異なる種類のカメラの動きを組み合わせ、より制御しやすくフレキシブルなカメラ制御を可能にするモーション組み合わせ法を提案する。広汎な実験により、我々のトレーニング不要なアプローチは、カメラオブジェクトの動きを効果的に分離し、分離されたカメラモーションを幅広い制御可能なビデオ生成タスクに適用し、フレキシブルで多様なカメラモーション制御を実現することができることを示した。

The emergence of diffusion models has greatly propelled the progress in image and video generation. Recently, some efforts have been made in controllable video generation, including text-to-video generation and video motion control, among which camera motion control is an important topic. However, existing camera motion control methods rely on training a temporal camera module, and necessitate substantial computation resources due to the large number of parameters in video generation models. Moreover, existing methods pre-define camera motion types during training, which limits their flexibility in camera control. Therefore, to reduce training costs and achieve flexible camera control, we propose COMD, a novel training-free video motion transfer model, which disentangles camera motions and object motions in source videos and transfers the extracted camera motions to new videos. We first propose a one-shot camera motion disentanglement method to extract camera motion from a single source video, which separates the moving objects from the background and estimates the camera motion in the moving objects region based on the motion in the background by solving a Poisson equation. Furthermore, we propose a few-shot camera motion disentanglement method to extract the common camera motion from multiple videos with similar camera motions, which employs a window-based clustering technique to extract the common features in temporal attention maps of multiple videos. Finally, we propose a motion combination method to combine different types of camera motions together, giving our model more controllable and flexible camera control. Extensive experiments demonstrate that our training-free approach can effectively decouple camera-object motion and apply the decoupled camera motion to a wide range of controllable video generation tasks, achieving flexible and diverse camera motion control.

翻訳日:2024-04-26 19:30:27 公開日:2024-04-24

# マルチモーダル検索のための大規模言語モデルの活用

Leveraging Large Language Models for Multimodal Search ( http://arxiv.org/abs/2404.15790v1 )

ライセンス: Link先を確認

Oriol Barbany, Michael Huang, Xinliang Zhu, Arnab Dhua,

(参考訳) マルチモーダル検索は、ユーザが検索意図を自然かつ効果的に表現する手段を提供する上で、ますます重要になっている。画像は所望の製品の細かな詳細を提供し、テキストは検索条件の修正を簡単に組み込むことができる。しかし、既存のマルチモーダル検索システムの一部は信頼性が低く、単純なクエリにも対処できない。この問題は、曖昧な情報、暗黙的な情報、無関係な情報を含みうる自然言語テキストクエリの大きなばらつきによって、さらに難しくなる。これらの問題に対処するには、強化されたマッチング能力、推論能力、およびコンテキストを考慮したクエリ解析と書き換えを備えたシステムが必要になる可能性がある。本稿では、Fashion200Kデータセット上で新たなパフォーマンスマイルストーンを達成する、新しいマルチモーダル検索モデルを提案する。さらに、自然言語による対話を容易にするために、Large Language Models (LLM) を統合した新しい検索インタフェースを提案する。このインタフェースは、ユーザと対話し、過去の検索を考慮しながら、クエリを検索システムにルーティングする。我々のマルチモーダル検索モデルと組み合わせることで、人間のようなインタラクションを提供し、全体的な検索体験を向上できるショッピングアシスタントの新時代を切り開く。

Multimodal search has become increasingly important in providing users with a natural and effective way to express their search intentions. Images offer fine-grained details of the desired products, while text allows for easily incorporating search modifications. However, some existing multimodal search systems are unreliable and fail to address simple queries. The problem becomes harder with the large variability of natural language text queries, which may contain ambiguous, implicit, and irrelevant information. Addressing these issues may require systems with enhanced matching capabilities, reasoning abilities, and context-aware query parsing and rewriting. This paper introduces a novel multimodal search model that achieves a new performance milestone on the Fashion200K dataset. Additionally, we propose a novel search interface integrating Large Language Models (LLMs) to facilitate natural language interaction. This interface routes queries to search systems while conversationally engaging with users and considering previous searches. When coupled with our multimodal search model, it heralds a new era of shopping assistants capable of offering human-like interaction and enhancing the overall search experience.

翻訳日:2024-04-26 19:30:27 公開日:2024-04-24

# 非局所性から文脈性への変換

Converting nonlocality into contextuality ( http://arxiv.org/abs/2404.15793v1 )

ライセンス: Link先を確認

Karl Svozil,

(参考訳) 行列束(matrix pencil)の対角化は、多体相関を含む、ブールの「可能な経験の条件」の演算子に基づく破れを文脈性へと書き換える統一的な手法を提供する。また、関与する文脈の構造解析を与え、それによって量子化された系の古典的予測からの偏差のコンパクトな形式を示唆する。

Diagonalization of matrix pencils provides a uniform technique to transcribe operator-based violations of Boole's 'conditions of possible experience' involving multipartite correlations into contextuality. It also provides a structural analysis of the contexts involved, and thereby suggests compact forms of deviations of quantized systems from classical predictions.

翻訳日:2024-04-26 19:30:27 公開日:2024-04-24

# 品質多様性のためのインコンテキストAIジェネレータとしての大規模言語モデル

Large Language Models as In-context AI Generators for Quality-Diversity ( http://arxiv.org/abs/2404.15794v1 )

ライセンス: Link先を確認

Bryan Lim, Manon Flageat, Antoine Cully,

(参考訳) QD(Quality-Diversity)アプローチは、多様なニッチにまたがる高品質なソリューションのアーカイブを発見できるため、オープンエンドなプロセスを開発する上で有望な方向である。既に多くのアプリケーションで成功しているが、QDアプローチは通常、新しい候補ソリューションを生成するために1つまたは2つのソリューションの組み合わせのみに頼っている。技術進化のようなオープンエンドなプロセスで観察されるように、多様なソリューションを賢く組み合わせることで、より革新的なソリューションが生まれ、QD探索の生産性が向上する可能性がある。本研究では、生成モデルのパターンマッチング能力を利用して、そのような効率的なソリューションの組み合わせを実現することを提案する。我々は、事前学習されたLarge Language Models (LLMs) のコンテキスト内能力を引き出し、QDアーカイブをコンテキストとして用いて興味深いソリューションを生成することを目指す手法のフレームワークであるIn-context QDを導入する。一連の一般的なQDドメインに適用すると、In-context QDは、QDベースラインと、単目的最適化のために開発された類似の戦略の両方と比較して、有望な結果を示す。さらに、この結果は、パラメータサイズとアーカイブ人口サイズの複数の値にわたって、またBBO関数からポリシー探索まで異なる特徴を持つドメインにわたって成り立つ。最後に、QDのための有望なソリューションの生成を促す重要なプロンプト設計上の考慮事項を明らかにする、広範なアブレーションを行う。

Quality-Diversity (QD) approaches are a promising direction to develop open-ended processes as they can discover archives of high-quality solutions across diverse niches. While already successful in many applications, QD approaches usually rely on combining only one or two solutions to generate new candidate solutions. As observed in open-ended processes such as technological evolution, wisely combining large diversity of these solutions could lead to more innovative solutions and potentially boost the productivity of QD search. In this work, we propose to exploit the pattern-matching capabilities of generative models to enable such efficient solution combinations. We introduce In-context QD, a framework of techniques that aim to elicit the in-context capabilities of pre-trained Large Language Models (LLMs) to generate interesting solutions using the QD archive as context. Applied to a series of common QD domains, In-context QD displays promising results compared to both QD baselines and similar strategies developed for single-objective optimization. Additionally, this result holds across multiple values of parameter sizes and archive population sizes, as well as across domains with distinct characteristics from BBO functions to policy search. Finally, we perform an extensive ablation that highlights the key prompt design considerations that encourage the generation of promising solutions for QD.

翻訳日:2024-04-26 19:30:27 公開日:2024-04-24

# Raformer:ビデオワイヤーインペインティングのための冗長性対応トランスフォーマー

Raformer: Redundancy-Aware Transformer for Video Wire Inpainting ( http://arxiv.org/abs/2404.15802v1 )

ライセンス: Link先を確認

Zhong Ji, Yimu Su, Yan Zhang, Jiacheng Hou, Yanwei Pang, Jungong Han,

(参考訳) Video Wire Inpainting (VWI) は、映画やテレビシリーズに映り込んだワイヤーを違和感なく除去することを目的とした、ビデオインペインティングの代表的な応用であり、フレームごとの手作業による除去と比べて時間と労力を大幅に節約できる。しかし、ワイヤーは一般的なビデオインペインティングタスクで対象となる物体よりも長くて細く、人物や背景の物体と不規則に交差することが多いため、インペインティング処理が複雑になり、ワイヤー除去はより大きな課題となる。既存のビデオワイヤーデータセットには、規模が小さい、品質が低い、シーンの種類が限られているといった制約があることを踏まえ、新しいマスク生成戦略を伴う新しいVWIデータセット、すなわちWire Removal Video Dataset 2 (WRV2) と Pseudo Wire-Shaped (PWS) Masks を導入する。WRV2データセットは、平均80フレームの4,000本以上のビデオで構成され、インペインティングモデルの開発と有効性の向上を促進するように設計されている。これに基づき、ビデオインペインティングにおけるワイヤー除去特有の課題に対処するRedundancy-Aware Transformer (Raformer) 法を提案する。すべてのフレームパッチを無差別に処理する従来のアプローチとは異なり、Raformerは、インペインティングに有用な情報を持たない静的な背景セグメントなどの冗長な部分を選択的にバイパスする新しい戦略を採用している。Raformerの中核となるのは、粗い粒度のウィンドウベースのアテンション機構を通じて重要なコンテンツを分離し強調する、Redundancy-Aware Attention (RAA) モジュールである。これはSoft Feature Alignment (SFA) モジュールによって補完され、SFAモジュールはこれらの特徴を洗練し、エンドツーエンドの特徴アライメントを実現する。従来のビデオインペインティングデータセットと提案したWRV2データセットの両方に対する広範な実験により、Raformerが他の最先端手法よりも優れていることが示された。

Video Wire Inpainting (VWI) is a prominent application in video inpainting, aimed at flawlessly removing wires in films or TV series, offering significant time and labor savings compared to manual frame-by-frame removal. However, wire removal poses greater challenges due to the wires being longer and slimmer than objects typically targeted in general video inpainting tasks, and often intersecting with people and background objects irregularly, which adds complexity to the inpainting process. Recognizing the limitations posed by existing video wire datasets, which are characterized by their small size, poor quality, and limited variety of scenes, we introduce a new VWI dataset with a novel mask generation strategy, namely Wire Removal Video Dataset 2 (WRV2) and Pseudo Wire-Shaped (PWS) Masks. WRV2 dataset comprises over 4,000 videos with an average length of 80 frames, designed to facilitate the development and efficacy of inpainting models. Building upon this, our research proposes the Redundancy-Aware Transformer (Raformer) method that addresses the unique challenges of wire removal in video inpainting. Unlike conventional approaches that indiscriminately process all frame patches, Raformer employs a novel strategy to selectively bypass redundant parts, such as static background segments devoid of valuable information for inpainting. At the core of Raformer is the Redundancy-Aware Attention (RAA) module, which isolates and accentuates essential content through a coarse-grained, window-based attention mechanism. This is complemented by a Soft Feature Alignment (SFA) module, which refines these features and achieves end-to-end feature alignment. Extensive experiments on both the traditional video inpainting datasets and our proposed WRV2 dataset demonstrate that Raformer outperforms other state-of-the-art methods.

翻訳日:2024-04-26 19:30:27 公開日:2024-04-24

# GeckOpt: インテントベースのツール選択によるLLMシステムの効率性

GeckOpt: LLM System Efficiency via Intent-Based Tool Selection ( http://arxiv.org/abs/2404.15804v1 )

ライセンス: Link先を確認

Michael Fore, Simranjit Singh, Dimitrios Stamoulis,

(参考訳) この予備的研究では、システム効率の向上を目的として、大規模言語モデル (LLM) のツール選択を合理化する、GPTを用いた意図に基づく推論手法について検討する。実行時にユーザプロンプトの背後にある意図を特定することで、タスク実行に必要なAPIツールセットを絞り込み、トークン消費量を最大24.6%削減する。100以上のGPT-4-Turboノードを持つ現実世界の大規模並列Copilotプラットフォームにおける初期結果は、コスト削減と、LLMベースのシステム効率を改善できる可能性を示している。

In this preliminary study, we investigate a GPT-driven intent-based reasoning approach to streamline tool selection for large language models (LLMs) aimed at system efficiency. By identifying the intent behind user prompts at runtime, we narrow down the API toolset required for task execution, reducing token consumption by up to 24.6%. Early results on a real-world, massively parallel Copilot platform with over 100 GPT-4-Turbo nodes show cost reductions and potential towards improving LLM-based system efficiency.
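
The intent-gating idea described in this abstract can be illustrated with a minimal Python sketch. It is not the GeckOpt implementation: the tool names, the intent labels, and the keyword-based `classify_intent` stand-in (the paper uses a GPT model for this step) are all hypothetical and chosen only for illustration.

```python
# Hypothetical tool registry: tool name -> intent group it belongs to.
ALL_TOOLS = {
    "load_image": "load",
    "filter_by_date": "load",
    "detect_objects": "analyze",
    "compute_statistics": "analyze",
    "plot_map": "visualize",
    "export_report": "visualize",
}

# Hypothetical keyword lists standing in for an LLM-based intent classifier.
INTENT_KEYWORDS = {
    "load": ("load", "open", "fetch"),
    "analyze": ("detect", "count", "compute"),
    "visualize": ("plot", "show", "draw"),
}


def classify_intent(prompt):
    """Return the first intent whose keyword appears in the prompt, else None."""
    words = [word.strip(".,?!") for word in prompt.lower().split()]
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(word in keywords for word in words):
            return intent
    return None


def select_tools(prompt):
    """Expose only the tools matching the detected intent.

    Falls back to the full toolset when no intent is recognised, so the
    gating step never removes a tool the task might need.
    """
    intent = classify_intent(prompt)
    if intent is None:
        return sorted(ALL_TOOLS)
    return sorted(name for name, group in ALL_TOOLS.items() if group == intent)


print(select_tools("Plot the detections on a map"))
```

Passing fewer tool descriptions to the model is what would reduce prompt tokens in such a setup; the size of the saving depends on the real toolset and is not something this sketch measures.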

翻訳日:2024-04-26 19:30:27 公開日:2024-04-24

# ESM2を超える: 効率的なクラスタリングによるグラフ強化タンパク質配列モデリング

Beyond ESM2: Graph-Enhanced Protein Sequence Modeling with Efficient Clustering ( http://arxiv.org/abs/2404.15805v1 )

ライセンス: Link先を確認

Shujian Jiao, Bingxuan Li, Lei Wang, Xiaojin Zhang, Wei Chen, Jiajie Peng, Zhongyu Wei,

(参考訳) タンパク質は生命の過程に不可欠であり、進化と多様性を支えている。シークエンシング技術の進歩により数百万のタンパク質が明らかになり、生物学的解析とAI開発のための高度な事前学習済みタンパク質モデルの必要性が浮き彫りになっている。FacebookのESM2は、これまでで最も先進的なタンパク質言語モデルであり、教師なし学習にマスク付き予測タスクを利用して、生化学的に高い精度を持つアミノ酸表現を作り出す。しかし、タンパク質の機能に関する知見を十分に提供できておらず、表現品質を向上させる余地がある。我々の研究は、タンパク質ファミリー分類をESM2のトレーニングに組み込むことで、このギャップに対処する。このアプローチは、Community Propagation-Based Clustering Algorithmで強化され、グローバルなタンパク質表現を改善する一方、文脈予測タスクが局所的なアミノ酸の精度を微調整する。特に、我々のモデルはいくつかの下流実験で最先端の結果を達成し、グローバルな手法とローカルな手法を組み合わせることでタンパク質の表現品質を大幅に向上できることを示した。

Proteins are essential to life's processes, underpinning evolution and diversity. Advances in sequencing technology have revealed millions of proteins, underscoring the need for sophisticated pre-trained protein models for biological analysis and AI development. Facebook's ESM2, the most advanced protein language model to date, leverages a masked prediction task for unsupervised learning, crafting amino acid representations with notable biochemical accuracy. Yet, it falls short in delivering functional protein insights, signaling an opportunity for enhancing representation quality. Our study addresses this gap by incorporating protein family classification into ESM2's training. This approach, augmented with a Community Propagation-Based Clustering Algorithm, improves global protein representations, while a contextual prediction task fine-tunes local amino acid accuracy. Significantly, our model achieved state-of-the-art results in several downstream experiments, demonstrating the power of combining global and local methodologies to substantially boost protein representation quality.

翻訳日:2024-04-26 19:30:27 公開日:2024-04-24

# マスクの場所:グラフマスクオートエンコーダのための構造誘導型マスキング

Where to Mask: Structure-Guided Masking for Graph Masked Autoencoders ( http://arxiv.org/abs/2404.15806v1 )

ライセンス: Link先を確認

Chuang Liu, Yuyao Wang, Yibing Zhan, Xueqi Ma, Dapeng Tao, Jia Wu, Wenbin Hu,

(参考訳) グラフマスク付きオートエンコーダ(GMAE)は、グラフ構造化データに対する自己教師付き事前学習における大きな進歩として登場した。これまでのGMAEモデルは、トレーニング中にノードやエッジに対して単純なランダムマスキング戦略を主に利用していた。しかし、この戦略はグラフ構造内のノードごとに重要度が異なることを考慮できない。本稿では、マスク付き事前学習プロセスにおいて、グラフの構造的な構成を基本的かつ固有の事前知識として活用する可能性について検討する。そこで本研究では、既存のGMAEモデルの改良を目的とした、新しい構造誘導型マスキング戦略(StructMAE)を提案する。StructMAEには2つのステップがある。 1) 構造に基づくスコア付け: 各ノードが評価され、その構造的重要性を反映したスコアが割り当てられる。事前定義と学習可能なスコアリングの2種類のスコアリング方法を提案する。 2) 構造誘導型マスキング: 得られた評価スコアを用いて、自己教師付き再構成タスクの構造認識を徐々に高める、易しいものから難しいものへ進む(easy-to-hard)マスキング戦略を開発する。具体的には、この戦略はランダムマスキングから始まり、評価スコアに基づいて構造的に情報量の多いノードのマスキングへと進む。この設計は、グラフ構造情報の学習においてモデルを徐々に効果的に導く。さらに、広範な実験により、StructMAE法は教師なし学習と転移学習の両方のタスクにおいて、既存の最先端のGMAEモデルよりも一貫して優れていることが示された。コードはhttps://github.com/LiuChuang0059/StructMAEで入手できる。

Graph masked autoencoders (GMAE) have emerged as a significant advancement in self-supervised pre-training for graph-structured data. Previous GMAE models primarily utilize a straightforward random masking strategy for nodes or edges during training. However, this strategy fails to consider the varying significance of different nodes within the graph structure. In this paper, we investigate the potential of leveraging the graph's structural composition as a fundamental and unique prior in the masked pre-training process. To this end, we introduce a novel structure-guided masking strategy (i.e., StructMAE), designed to refine the existing GMAE models. StructMAE involves two steps: 1) Structure-based Scoring: Each node is evaluated and assigned a score reflecting its structural significance. Two distinct types of scoring manners are proposed: predefined and learnable scoring. 2) Structure-guided Masking: With the obtained assessment scores, we develop an easy-to-hard masking strategy that gradually increases the structural awareness of the self-supervised reconstruction task. Specifically, the strategy begins with random masking and progresses to masking structure-informative nodes based on the assessment scores. This design gradually and effectively guides the model in learning graph structural information. Furthermore, extensive experiments consistently demonstrate that our StructMAE method outperforms existing state-of-the-art GMAE models in both unsupervised and transfer learning tasks. Codes are available at https://github.com/LiuChuang0059/StructMAE.
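
The easy-to-hard schedule described above (random masking first, score-driven masking later) can be sketched in a few lines of Python. This is an illustrative assumption, not the StructMAE code from the linked repository: the linear blend between random noise and the structural score, the `progress` parameter, and the example scores are all hypothetical.

```python
import random


def easy_to_hard_mask(scores, mask_ratio, progress, rng):
    """Pick the set of nodes to mask at a given point in training.

    scores     -- dict mapping node -> structural score in [0, 1] (assumed given)
    mask_ratio -- fraction of nodes to mask
    progress   -- 0.0 at the start of training (pure random masking),
                  1.0 at the end (mask the highest-scoring nodes)
    rng        -- random.Random instance, passed in for reproducibility
    """
    num_masked = round(mask_ratio * len(scores))
    # Blend random noise with the structural score; the blend weight is the
    # hypothetical part of this sketch.
    priority = {
        node: (1.0 - progress) * rng.random() + progress * score
        for node, score in scores.items()
    }
    ranked = sorted(priority, key=priority.get, reverse=True)
    return set(ranked[:num_masked])


scores = {"a": 0.9, "b": 0.1, "c": 0.8, "d": 0.2}
rng = random.Random(0)
print(easy_to_hard_mask(scores, mask_ratio=0.5, progress=0.0, rng=rng))
print(easy_to_hard_mask(scores, mask_ratio=0.5, progress=1.0, rng=rng))
```

With `progress=1.0` the random term vanishes and the two highest-scoring nodes are always selected; intermediate values make informative nodes progressively more likely to be masked.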

翻訳日:2024-04-26 19:20:39 公開日:2024-04-24

# すべてのための1つの部分グラフ:帰納的知識グラフ補完のためのオープンな部分グラフの効率的な推論

One Subgraph for All: Efficient Reasoning on Opening Subgraphs for Inductive Knowledge Graph Completion ( http://arxiv.org/abs/2404.15807v1 )

ライセンス: Link先を確認

Zhiwen Xie, Yi Zhang, Guangyou Zhou, Jin Liu, Xinhui Tu, Jimmy Xiangji Huang,

(参考訳) 知識グラフ補完(KGC)は近年、大きな研究関心を集めており、既存のほとんどの手法は、トレーニング中にすべてのエンティティが観測されるトランスダクティブな設定に従って設計されている。トランスダクティブKGCは大きく進歩したものの、これらの手法は未知のエンティティを含む新たなKG上での推論に苦慮している。そのため、未知のエンティティ間の欠落リンクを推論することを目的としたインダクティブKGCが、新たなトレンドとなっている。既存の多くの研究は、各候補トリプルを囲む包囲部分グラフ(enclosing subgraph)を抽出することにより、インダクティブKGCをグラフ分類問題に変換する。残念ながら、これらの研究は、包囲部分グラフの繰り返し抽出に起因する大きな時間コストや、エンティティに依存しない特徴学習の不足など、いくつかの課題に依然として直面している。これらの問題に対処するために、インダクティブKGCのためのグローバル・ローカルアンカー表現(GLAR)学習法を提案する。包囲部分グラフを利用する従来の方法とは異なり、全ての候補に対して共有の開放部分グラフ(opening subgraph)を抽出し、その上で推論を行うことで、モデルがより効率的に推論できるようにする。さらに、新たに出現するエンティティに対してエンティティに依存しない豊富な特徴を学習するために、転送可能なグローバルアンカーとローカルアンカーを設計する。最後に、全ての候補をランク付けするために、開放部分グラフにグローバル・ローカルグラフ推論モデルを適用する。広範な実験により、我々のGLARは既存の最先端手法のほとんどよりも優れていることが示された。

Knowledge Graph Completion (KGC) has garnered massive research interest recently, and most existing methods are designed following a transductive setting where all entities are observed during training. Despite the great progress on the transductive KGC, these methods struggle to conduct reasoning on emerging KGs involving unseen entities. Thus, inductive KGC, which aims to deduce missing links among unseen entities, has become a new trend. Many existing studies transform inductive KGC as a graph classification problem by extracting enclosing subgraphs surrounding each candidate triple. Unfortunately, they still face certain challenges, such as the expensive time consumption caused by the repeat extraction of enclosing subgraphs, and the deficiency of entity-independent feature learning. To address these issues, we propose a global-local anchor representation (GLAR) learning method for inductive KGC. Unlike previous methods that utilize enclosing subgraphs, we extract a shared opening subgraph for all candidates and perform reasoning on it, enabling the model to perform reasoning more efficiently. Moreover, we design some transferable global and local anchors to learn rich entity-independent features for emerging entities. Finally, a global-local graph reasoning model is applied on the opening subgraph to rank all candidates. Extensive experiments show that our GLAR outperforms most existing state-of-the-art methods.

翻訳日:2024-04-26 19:20:39 公開日:2024-04-24

# Nadir BRDF調整反射率の簡易計算による高度なセンチネル2解析

Facilitating Advanced Sentinel-2 Analysis Through a Simplified Computation of Nadir BRDF Adjusted Reflectance ( http://arxiv.org/abs/2404.15812v1 )

ライセンス: Link先を確認

David Montero, Miguel D. Mahecha, César Aybar, Clemens Mosig, Sebastian Wieneke,

(参考訳) 欧州宇宙機関のコペルニクス計画によるSentinel-2(S2)ミッションは、地球表面の分析に不可欠なデータを提供する。そのLevel-2Aプロダクトは、MultiSpectral Instrument (MSI) を通じて、高〜中解像度(10-60 m)の表面反射率(SR)データを提供する。SRデータの精度と比較可能性を向上させるには、直下視(nadir)をシミュレートする調整が不可欠である。これらの補正は、SRの異方性と太陽角・観測角の変動に対処し、異なる時期や条件の間で一貫した画像比較を可能にする。単純だが効果的なアルゴリズムである$c$-factor法は、MODIS BRDFモデルを用いて観測されたS2 SRを調整し、Nadir BRDF Adjusted Reflectance(NBAR)を得る。個々の画像への$c$-factorの適用は容易であるにもかかわらず、複数のS2画像やクラウド上のデータに由来するEarth System Data Cubes(ESDC)に適用するための統一的なPythonフレームワークは存在しなかった。本稿では、S2 SRデータをNBARに変換するPythonパッケージであるsen2nbarを紹介する。sen2nbarは、個々の画像と、クラウド上のデータに由来するESDCの両方をサポートする。本パッケージは、S2 SRデータのNBARへの変換を単一の関数で簡略化し、効率的なプロセス管理のためにモジュールに編成されている。SAFEファイルと、SpatioTemporal Asset Catalogs (STAC) に由来するESDCの両方のNBAR変換を容易にすることで、sen2nbarは多様なデータフォーマット要求に対応できる柔軟なツールとして開発されている。我々は、sen2nbarがS2データの標準化と調和に大きく貢献し、様々なアプリケーションにまたがる多様なユーザに堅牢なソリューションを提供することを期待している。sen2nbarはhttps://github.com/ESDS-Leipzig/sen2nbarで入手できるオープンソースツールである。

The Sentinel-2 (S2) mission from the European Space Agency's Copernicus program provides essential data for Earth surface analysis. Its Level-2A products deliver high-to-medium resolution (10-60 m) surface reflectance (SR) data through the MultiSpectral Instrument (MSI). To enhance the accuracy and comparability of SR data, adjustments simulating a nadir viewing perspective are essential. These corrections address the anisotropic nature of SR and the variability in sun and observation angles, ensuring consistent image comparisons over time and under different conditions. The $c$-factor method, a simple yet effective algorithm, adjusts observed S2 SR by using the MODIS BRDF model to achieve Nadir BRDF Adjusted Reflectance (NBAR). Despite the straightforward application of the $c$-factor to individual images, a cohesive Python framework for its application across multiple S2 images and Earth System Data Cubes (ESDCs) from cloud-stored data has been lacking. Here we introduce sen2nbar, a Python package crafted to convert S2 SR data to NBAR, supporting both individual images and ESDCs derived from cloud-stored data. This package simplifies the conversion of S2 SR data to NBAR via a single function, organized into modules for efficient process management. By facilitating NBAR conversion for both SAFE files and ESDCs from SpatioTemporal Asset Catalogs (STAC), sen2nbar is developed as a flexible tool that can handle diverse data format requirements. We anticipate that sen2nbar will considerably contribute to the standardization and harmonization of S2 data, offering a robust solution for a diverse range of users across various applications. sen2nbar is an open-source tool available at https://github.com/ESDS-Leipzig/sen2nbar.
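
The arithmetic behind the $c$-factor adjustment is short enough to sketch. In the sketch below, the $c$-factor is taken as the ratio of the BRDF-modeled reflectance at nadir view to the modeled reflectance at the observed geometry, and NBAR is the observed SR scaled by that ratio. The function names are hypothetical and are not the sen2nbar API; evaluating the MODIS BRDF model itself is assumed to happen elsewhere.

```python
def c_factor(modeled_nadir, modeled_observed):
    """Ratio of modeled reflectance at nadir view to that at the observed geometry."""
    if modeled_observed <= 0.0:
        raise ValueError("modeled reflectance at the observed geometry must be positive")
    return modeled_nadir / modeled_observed


def to_nbar(observed_sr, modeled_nadir, modeled_observed):
    """Scale an observed surface reflectance value to its nadir-adjusted equivalent."""
    return c_factor(modeled_nadir, modeled_observed) * observed_sr


# Illustrative numbers only: a pixel observed at 0.20 whose modeled reflectance
# is 0.30 at nadir and 0.25 at the actual sun/view geometry.
print(to_nbar(observed_sr=0.20, modeled_nadir=0.30, modeled_observed=0.25))
```

In practice the two modeled reflectances are computed per band and per pixel, so the same scaling would be applied element-wise to whole arrays.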

翻訳日:2024-04-26 19:20:39 公開日:2024-04-24

# 拡散シュレーディンガー橋による高速組立

Fast Ensembling with Diffusion Schrödinger Bridge ( http://arxiv.org/abs/2404.15814v1 )

ライセンス: Link先を確認

Hyunsu Kim, Jongmin Yoon, Juho Lee,

(参考訳) ディープ・アンサンブル(Deep Ensemble、DE)アプローチは、異なる初期点からニューラルネットワークを訓練し、様々な局所最適点に収束させることにより、ディープ・ニューラルネットワークの性能を高める簡単な手法である。しかし、この手法には、多数の学習済みパラメータを保存し、推論段階で各パラメータセットに対して個別のフォワードパスを実行する必要があるため、推論の計算オーバーヘッドが高いという限界がある。本稿では、この課題に対処するために、Diffusion Bridge Network (DBN) と呼ばれる新しい手法を提案する。Schrödinger bridgeの理論に基づき、この手法は、単一のアンサンブルメンバーの出力分布とアンサンブルモデルの出力分布を接続する確率微分方程式(SDE)のシミュレーションを直接学習し、すべてのアンサンブルモデルのフォワードパスを実行することなくアンサンブル予測を得られるようにする。重いアンサンブルをDBNを構成する軽量なニューラルネットワークに置き換えることで、CIFAR-10、CIFAR-100、TinyImageNetなどのベンチマークデータセットにおいて、精度と不確実性スコアを維持しつつ、計算コストを削減した推論を実現した。実装はhttps://github.com/kim-hyunsu/dbnで公開している。

The Deep Ensemble (DE) approach is a straightforward technique used to enhance the performance of deep neural networks by training them from different initial points, converging towards various local optima. However, a limitation of this methodology lies in its high computational overhead for inference, arising from the necessity to store numerous learned parameters and execute individual forward passes for each parameter set during the inference stage. We propose a novel approach called Diffusion Bridge Network (DBN) to address this challenge. Based on the theory of the Schrödinger bridge, this method directly learns to simulate a Stochastic Differential Equation (SDE) that connects the output distribution of a single ensemble member to the output distribution of the ensembled model, allowing us to obtain ensemble predictions without having to invoke a forward pass through all the ensemble models. By substituting the heavy ensembles with this lightweight neural network constructing DBN, we achieved inference with reduced computational cost while maintaining accuracy and uncertainty scores on benchmark datasets such as CIFAR-10, CIFAR-100, and TinyImageNet. Our implementation is available at https://github.com/kim-hyunsu/dbn.

翻訳日:2024-04-26 19:20:39 公開日:2024-04-24

# 単一視点シーン点群からの人間の把持生成

Single-View Scene Point Cloud Human Grasp Generation ( http://arxiv.org/abs/2404.15815v1 )

ライセンス: Link先を確認

Yan-Kang Wang, Chengyi Xing, Yi-Lin Wei, Xiao-Ming Wu, Wei-Shi Zheng,

(参考訳) 本研究では、単一視点のシーン点群に基づいて人間の把持を生成するという新しい課題を探求する。これは、物体を一つの視点から観察するという典型的な現実の状況をより正確に反映している。物体点群が不完全であり、多数のシーン点が存在するため、生成された手は物体の見えない部分に侵入しやすく、モデルはシーン点の影響を受けやすい。そこで我々は、2つの主要なモジュールからなるフレームワークS2HGraspを導入する。Global Perceptionモジュールは部分的な物体点群を大域的に知覚し、DiffuGraspモジュールはシーン点を含む複雑な入力に基づいて高品質な人間の把持を生成するように設計されている。さらに、1,668個のユニークな物体からなる約99,000個の単一物体・単一視点のシーン点群で構成され、それぞれに1つの人間の把持がアノテーションされたS2HGDデータセットを導入する。広範な実験により、S2HGraspはシーン点に左右されずに自然な人間の把持を生成できるだけでなく、手と物体の見えない部分との間の侵入を効果的に防止できることが示された。さらに、本モデルは未知の物体に適用した場合にも強い汎化能力を示す。コードとデータセットはhttps://github.com/iSEE-Laboratory/S2HGraspで公開されている。

In this work, we explore a novel task of generating human grasps based on single-view scene point clouds, which more accurately mirrors the typical real-world situation of observing objects from a single viewpoint. Due to the incompleteness of object point clouds and the presence of numerous scene points, the generated hand is prone to penetrating into the invisible parts of the object and the model is easily affected by scene points. Thus, we introduce S2HGrasp, a framework composed of two key modules: the Global Perception module that globally perceives partial object point clouds, and the DiffuGrasp module designed to generate high-quality human grasps based on complex inputs that include scene points. Additionally, we introduce S2HGD dataset, which comprises approximately 99,000 single-object single-view scene point clouds of 1,668 unique objects, each annotated with one human grasp. Our extensive experiments demonstrate that S2HGrasp can not only generate natural human grasps regardless of scene points, but also effectively prevent penetration between the hand and invisible parts of the object. Moreover, our model showcases strong generalization capability when applied to unseen objects. Our code and dataset are available at https://github.com/iSEE-Laboratory/S2HGrasp.

翻訳日:2024-04-26 19:20:39 公開日:2024-04-24

# 視覚変換器(ViT)を用いた敵対的ドメイン適応

Vision Transformer-based Adversarial Domain Adaptation ( http://arxiv.org/abs/2404.15817v1 )

ライセンス: Link先を確認

Yahan Li, Yuan Wu,

(参考訳) Unsupervised domain adaptation (UDA) は、ラベル付きソースドメインからラベルなしターゲットドメインに知識を転送することを目的としている。最新のUDA法は、最先端の結果を得るために常に敵対的訓練に頼っており、既存のUDA法の大多数は、ドメイン不変の特徴を学習するための特徴抽出器として畳み込みニューラルネットワーク(CNN)を用いている。視覚変換器(ViT)は、その登場以来大きな注目を集め、画像分類、オブジェクト検出、セマンティックセグメンテーションなど様々なコンピュータビジョンタスクで広く利用されているが、敵対的ドメイン適応におけるその可能性はこれまで研究されていない。本稿では、敵対的ドメイン適応における特徴抽出器としてViTを用いることで、このギャップを埋める。さらに、ViTが敵対的ドメイン適応におけるプラグアンドプレイのコンポーネントになりうること、すなわち既存のUDA手法のCNNベースの特徴抽出器をViTベースの特徴抽出器に直接置き換えるだけで容易に性能向上が得られることを実証的に示す。コードはhttps://github.com/LluckyYH/VT-ADAで公開されている。

Unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. The most recent UDA methods always resort to adversarial training to yield state-of-the-art results and a dominant number of existing UDA methods employ convolutional neural networks (CNNs) as feature extractors to learn domain invariant features. Vision transformer (ViT) has attracted tremendous attention since its emergence and has been widely used in various computer vision tasks, such as image classification, object detection, and semantic segmentation, yet its potential in adversarial domain adaptation has never been investigated. In this paper, we fill this gap by employing the ViT as the feature extractor in adversarial domain adaptation. Moreover, we empirically demonstrate that ViT can be a plug-and-play component in adversarial domain adaptation, which means directly replacing the CNN-based feature extractor in existing UDA methods with the ViT-based feature extractor can easily obtain performance improvement. The code is available at https://github.com/LluckyYH/VT-ADA.

翻訳日:2024-04-26 19:20:39 公開日:2024-04-24

# SynthEval: 表形式合成データの詳細なユーティリティとプライバシ評価のためのフレームワーク

SynthEval: A Framework for Detailed Utility and Privacy Evaluation of Tabular Synthetic Data ( http://arxiv.org/abs/2404.15821v1 )

ライセンス: Link先を確認

Anton Danholt Lautrup, Tobias Hyrup, Arthur Zimek, Peter Schneider-Kamp,

(参考訳) データ不足、データの公正性、データプライバシといった、機械学習の現代的問題に対処する合成データの需要が高まっているため、データの有用性と潜在的なプライバシリスクを評価するための堅牢なツールが不可欠である。オープンソースの新しい評価フレームワークであるSynthEvalは、特別な種類の事前処理ステップを仮定することなく、カテゴリ属性と数値属性を同等のケアで扱うことで、既存のツールと差別化している。これは、事実上あらゆる表レコードの合成データセットに適用できる。我々のツールは統計的および機械学習技術を利用して、合成データの忠実度とプライバシー保護の整合性を包括的に評価する。 SynthEvalは、独立して、あるいは高度にカスタマイズ可能なベンチマーク設定で使用でき、追加のメトリクスで容易に拡張できる、幅広い種類のメトリクスを統合する。本稿では,SynthEvalについて述べるとともに,その汎用性を例に示す。このフレームワークは、より良いベンチマークとより一貫性のあるモデル機能の比較を促進する。

With the growing demand for synthetic data to address contemporary issues in machine learning, such as data scarcity, data fairness, and data privacy, having robust tools for assessing the utility and potential privacy risks of such data becomes crucial. SynthEval, a novel open-source evaluation framework, distinguishes itself from existing tools by treating categorical and numerical attributes with equal care, without assuming any special kind of preprocessing steps. This makes it applicable to virtually any synthetic dataset of tabular records. Our tool leverages statistical and machine learning techniques to comprehensively evaluate synthetic data fidelity and privacy-preserving integrity. SynthEval integrates a wide selection of metrics that can be used independently or in highly customisable benchmark configurations, and can easily be extended with additional metrics. In this paper, we describe SynthEval and illustrate its versatility with examples. The framework facilitates better benchmarking and more consistent comparisons of model capabilities.

翻訳日:2024-04-26 19:20:39 公開日:2024-04-24

# 決定論的環境における再帰的後方Q-Learning

Recursive Backwards Q-Learning in Deterministic Environments ( http://arxiv.org/abs/2404.15822v1 )

ライセンス: Link先を確認

Jan Diekhoff, Jörn Fischer,

(参考訳) 強化学習は複雑な問題に対する最適解を見つける一般的な方法である。Q-learningのようなアルゴリズムは、環境のモデルを使わずに確率的な問題の解き方を学習することに長けている。しかし、決定論的問題の解決には必要以上に時間がかかる。そのようなモデルベースのアプローチを導入することで、決定論的問題をよりうまく解けるようにQ-learningを改善できる。本稿では、環境を探索してそのモデルを構築する再帰的後方Q-learning(RBQL)エージェントを紹介する。このエージェントは、終端状態に達した後、このモデルを通してその値を後方に再帰的に伝播する。これにより、長い学習プロセスなしに、各状態が最適な値に評価される。迷路を通る最短経路を見つける例では、このエージェントは通常のQ-learningエージェントを大きく上回る。

Reinforcement learning is a popular method of finding optimal solutions to complex problems. Algorithms like Q-learning excel at learning to solve stochastic problems without a model of their environment. However, they take longer to solve deterministic problems than is necessary. Q-learning can be improved to better solve deterministic problems by introducing such a model-based approach. This paper introduces the recursive backwards Q-learning (RBQL) agent, which explores and builds a model of the environment. After reaching a terminal state, it recursively propagates its value backwards through this model. This lets each state be evaluated to its optimal value without a lengthy learning process. In the example of finding the shortest path through a maze, this agent greatly outperforms a regular Q-learning agent.
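
The backward propagation step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' RBQL implementation: the learned model is assumed to be a dictionary mapping `(state, action)` to the next state, reward is given only on entering the goal, and a breadth-first sweep from the goal stands in for the recursion. The function name and parameters are hypothetical.

```python
from collections import defaultdict, deque


def propagate_backwards(model, goal, goal_reward=1.0, gamma=0.9):
    """Compute Q-values by sweeping backwards from the terminal state.

    model -- dict mapping (state, action) -> next_state, i.e. the deterministic
             transitions the agent has observed so far
    goal  -- terminal state; entering it yields goal_reward, all else yields 0
    """
    # Invert the model so we can walk from a state to its predecessors.
    predecessors = defaultdict(set)
    for (state, _action), next_state in model.items():
        predecessors[next_state].add(state)

    # Breadth-first sweep: a state is first reached along its shortest path
    # to the goal, which is optimal for this reward structure.
    values = {goal: 0.0}  # nothing further to collect after the terminal state
    queue = deque([goal])
    while queue:
        current = queue.popleft()
        for previous in predecessors[current]:
            if previous not in values:
                reward = goal_reward if current == goal else 0.0
                values[previous] = reward + gamma * values[current]
                queue.append(previous)

    # Q(s, a) = r + gamma * V(s') for every known transition.
    q_values = {}
    for (state, action), next_state in model.items():
        if next_state in values:
            reward = goal_reward if next_state == goal else 0.0
            q_values[(state, action)] = reward + gamma * values[next_state]
    return q_values


# A four-cell corridor 0 - 1 - 2 - 3 with the goal at cell 3.
corridor = {
    (0, "right"): 1, (1, "right"): 2, (2, "right"): 3,
    (1, "left"): 0, (2, "left"): 1,
}
for key, value in sorted(propagate_backwards(corridor, goal=3).items()):
    print(key, round(value, 3))
```

A single sweep assigns every reachable state-action pair its final value, which is the contrast with ordinary Q-learning that the abstract draws: no repeated episodes are needed once the model and one terminal state are known.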

翻訳日:2024-04-26 19:20:39 公開日:2024-04-24

# ニューラルネットワークハードウェアアクセラレータのための構成可能かつ効率的なメモリ階層

A Configurable and Efficient Memory Hierarchy for Neural Network Hardware Accelerator ( http://arxiv.org/abs/2404.15823v1 )

ライセンス: Link先を確認

Oliver Bause, Paul Palomero Bernardo, Oliver Bringmann,

(参考訳) 機械学習アプリケーションが進化を続けるにつれ、ディープニューラルネットワーク(DNN)に特化した効率的なハードウェアアクセラレータの需要はますます重要になっている。本稿では、DNNの層ごとに適応的なメモリアクセスパターンに合わせて構成可能なメモリ階層フレームワークを提案する。この階層は、アクセラレータの計算ユニットにデータを供給するために、オフチップメモリからオンデマンドでデータを要求する。目的は、必要なメモリ容量の最小化と、高いアクセラレータ性能の維持との間で最適なバランスをとることである。このフレームワークは構成可能性を特徴とし、最大5レベルの、用途に合わせたメモリ階層を作成できる。さらに、メモリ管理プロセスの柔軟性を高めるために、オプションのシフトレジスタを最終レベルとして組み込んでいる。DNN層の包括的なループネスト解析により、このフレームワークがほとんどのループアンロールのアクセスパターンを効率的に実行できることが示されている。合成結果とDNNアクセラレータUltraTrailのケーススタディは、より小さなメモリモジュールを使用できるため、チップ面積を最大62.2%削減できる可能性を示している。同時に、性能低下は2.4%に抑えられる。

As machine learning applications continue to evolve, the demand for efficient hardware accelerators, specifically tailored for deep neural networks (DNNs), becomes increasingly vital. In this paper, we propose a configurable memory hierarchy framework tailored for per layer adaptive memory access patterns of DNNs. The hierarchy requests data on-demand from the off-chip memory to provide it to the accelerator's compute units. The objective is to strike an optimized balance between minimizing the required memory capacity and maintaining high accelerator performance. The framework is characterized by its configurability, allowing the creation of a tailored memory hierarchy with up to five levels. Furthermore, the framework incorporates an optional shift register as final level to increase the flexibility of the memory management process. A comprehensive loop-nest analysis of DNN layers shows that the framework can efficiently execute the access patterns of most loop unrolls. Synthesis results and a case study of the DNN accelerator UltraTrail indicate a possible reduction in chip area of up to 62.2% as smaller memory modules can be used. At the same time, the performance loss can be minimized to 2.4%.

翻訳日:2024-04-26 19:20:39 公開日:2024-04-24

# 量子ゲートのロバストな複雑さ:基礎

Robust Quantum Gate Complexity: Foundations ( http://arxiv.org/abs/2404.15828v1 )

ライセンス: Link先を確認

Johannes Aspman, Vyacheslav Kungurtsev, Jakub Marecek,

(参考訳) 閉じた量子系の最適制御は、よく研究された幾何学的にエレガントな計算理論と技法の体系であり、量子コンピュータの実装と理解において重要な役割を果たしてきた。回路の設計そのものが、容易に準備できる初期状態から、ユーザにとって何らかの意味で情報を与える状態(例えば、その評価が回路の一部であるオラクル)へとキュービットを導くために、適切なゲートの集合(制御オペランドとして現れる)を選択する最適制御問題に対応する。しかし、現代のデバイスはノイズが多いことが知られており、回路が意図した通りに動作するかどうかは定かではない。より広範な最適制御理論には計算ツールが存在するものの、不確実性や誤差に対して量子制御系が適切に動作することのロバスト性は、文献ではまだ広く研究されていない。本稿では、閉じた量子系の最適制御と、その幾何学的解釈との関連から着想を得た新しいアプローチを提案する。この目的のために、量子制御の文脈におけるロバスト性の適切な問題定義を示し、ゲート複雑性に対するより広範な影響に焦点を当てる。

Optimal control of closed quantum systems is a well-studied, geometrically elegant set of computational theory and techniques that have proven pivotal in the implementation and understanding of quantum computers. The design of a circuit itself corresponds to an optimal control problem of choosing the appropriate set of gates (which appear as control operands) in order to steer a qubit from an initial, easily prepared state, to one that is informative to the user in some sense, e.g., an oracle whose evaluation is part of the circuit. However, contemporary devices are known to be noisy, and it is not certain that a circuit will behave as intended. Yet, although the computational tools exist in broader optimal control theory, robustness of adequate operation of a quantum control system with respect to uncertainty and errors has not yet been broadly studied in the literature. In this paper, we propose a new approach inspired by closed quantum optimal control and its connection to geometric interpretations. To this end, we present the appropriate problem definitions of robustness in the context of quantum control, focusing on its broader implications for gate complexity.

翻訳日:2024-04-26 19:20:39 公開日:2024-04-24

# ジェファーソン研究室におけるディープラーニングを用いたキャビティ故障予測の高速化

Accelerating Cavity Fault Prediction Using Deep Learning at Jefferson Laboratory ( http://arxiv.org/abs/2404.15829v1 )

ライセンス: Link先を確認

Monibor Rahman, Adam Carpenter, Khan Iftekharuddin, Chris Tennant,

(参考訳) 加速空洞(キャビティ)は、ジェファーソン研究所の連続電子ビーム加速器施設(CEBAF)の不可欠な部分である。CEBAFの400を超えるキャビティのいずれかに故障が発生すると、実験用ユーザーホールへのビーム供給が中断される。本研究では、緩やかに進行するキャビティ故障を予測するための深層学習モデルを提案する。故障前の信号を利用してLSTM-CNNバイナリ分類器を訓練し、通常動作中の高周波(RF)信号と差し迫った故障を示すRF信号とを識別する。我々は、故障の信頼度しきい値を調整し、複数の連続するウィンドウを要求する基準を導入して故障事象を識別することで、偽陽性率が低くなるようにモデルを最適化する。運用時のシナリオをシミュレートして、加速キャビティから収集された実データセットを分析した結果、モデルは正常な信号を99.99%の精度で識別し、緩やかに進行する故障の80%を正しく予測できることが示された。特に、これらの結果は高度に不均衡なデータセットのもとで得られたものであり、故障の予測は故障発生の数百ミリ秒前に行われた。故障を予測することで、その発生を防止または緩和する事前対策が可能になり、運用効率を向上させることができる。

Accelerating cavities are an integral part of the Continuous Electron Beam Accelerator Facility (CEBAF) at Jefferson Laboratory. When any of the over 400 cavities in CEBAF experiences a fault, it disrupts beam delivery to experimental user halls. In this study, we propose the use of a deep learning model to predict slowly developing cavity faults. By utilizing pre-fault signals, we train an LSTM-CNN binary classifier to distinguish between radio-frequency (RF) signals during normal operation and RF signals indicative of impending faults. We optimize the model by adjusting the fault confidence threshold and implementing a multiple consecutive window criterion to identify fault events, ensuring a low false positive rate. Results obtained from analysis of a real dataset collected from the accelerating cavities simulating a deployed scenario demonstrate the model's ability to identify normal signals with 99.99% accuracy and correctly predict 80% of slowly developing faults. Notably, these results were achieved in the context of a highly imbalanced dataset, and fault predictions were made several hundred milliseconds before the onset of the fault. Anticipating faults enables preemptive measures to improve operational efficiency by preventing or mitigating their occurrence.
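The "multiple consecutive window criterion" mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `first_fault_alarm`, the threshold of 0.9, and the requirement of three consecutive windows are all assumptions chosen for illustration. The idea is only that an alarm is raised when the classifier's fault confidence stays above a threshold for several windows in a row, which suppresses isolated false positives.

```python
from typing import Optional, Sequence


def first_fault_alarm(
    probs: Sequence[float],
    threshold: float = 0.9,   # hypothetical fault-confidence threshold
    consecutive: int = 3,     # hypothetical number of windows required in a row
) -> Optional[int]:
    """Return the index of the window at which an alarm is raised, or None.

    `probs` holds one fault-confidence score per sliding window, in time order.
    """
    run = 0  # length of the current streak of above-threshold windows
    for i, p in enumerate(probs):
        run = run + 1 if p >= threshold else 0
        if run >= consecutive:
            return i
    return None


# A single spike (index 1) is ignored; the sustained rise (indices 3-5) triggers.
scores = [0.2, 0.95, 0.3, 0.92, 0.97, 0.99, 0.4]
print(first_fault_alarm(scores))
```

Raising `threshold` or `consecutive` trades later (or missed) detections for fewer false alarms, which is the kind of tuning the abstract describes.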

翻訳日:2024-04-26 19:20:39 公開日:2024-04-24

# OpTC -- AURIX TC3xxマイクロコントローラ上にニューラルネットワークをデプロイするためのツールチェーン

OpTC -- A Toolchain for Deployment of Neural Networks on AURIX TC3xx Microcontrollers ( http://arxiv.org/abs/2404.15833v1 )

ライセンス: Link先を確認

Christian Heidorn, Frank Hannig, Dominik Riedelbauch, Christoph Strohmeyer, Jürgen Teich,

(参考訳) AURIX 2xxおよび3xxシリーズのTriCoreマイクロコントローラは、自動車業界や、最近では機械学習タスクを含むアプリケーションでも広く使われている。しかし、これらのアプリケーションは主に手動で設計されており、TriCoreマイクロコントローラにニューラルネットワークをもたらすためのツールサポートはほとんどない。そこで我々は,TC3xxマイクロコントローラ上でのニューラルネットワークの自動圧縮,変換,コード生成,デプロイのためのエンドツーエンドツールチェーンであるOpTCを提案する。 OpTCは、さまざまなタイプのニューラルネットワークをサポートし、与えられたニューラルネットワークの感度分析に基づいて、レイヤワイズプルーニングを使用して圧縮を提供する。マルチ層パーセプトロン(MLP)、畳み込みニューラルネットワーク(CNN)、リカレントニューラルネットワーク(RNN)など、さまざまなタイプのニューラルネットワークをサポートする柔軟性が、TC387マイクロコントローラのケーススタディで示されている。これにより、電気モーターの温度予測と異常検出のための自動車応用を用いて、OpTCの有効性と、OpTCがサポートする幅広い応用範囲を実証する。

The AURIX 2xx and 3xx families of TriCore microcontrollers are widely used in the automotive industry and, recently, also in applications that involve machine learning tasks. Yet, these applications are mainly engineered manually, and only little tool support exists for bringing neural networks to TriCore microcontrollers. Thus, we propose OpTC, an end-to-end toolchain for automatic compression, conversion, code generation, and deployment of neural networks on TC3xx microcontrollers. OpTC supports various types of neural networks and provides compression using layer-wise pruning based on sensitivity analysis for a given neural network. The flexibility in supporting different types of neural networks, such as multi-layer perceptrons (MLP), convolutional neural networks (CNN), and recurrent neural networks (RNN), is shown in case studies for a TC387 microcontroller. Automotive applications for predicting the temperature in electric motors and detecting anomalies are thereby used to demonstrate the effectiveness and the wide range of applications supported by OpTC.

翻訳日:2024-04-26 19:20:39 公開日:2024-04-24

# 2原子エンタングルメントの加工媒体を有する量子エンジンを用いたエネルギー変換装置

Energy-conversion device using a quantum engine with the work medium of two-atom entanglement ( http://arxiv.org/abs/2404.15835v1 )

ライセンス: Link先を確認

J. -W. Zhang, B. Wang, W. -F. Yuan, J. -C. Li, J. -T. Bu, G. -Y. Ding, W. -Q. Ding, L. Chen, F. Zhou, M. Feng,

(参考訳) 絡み合いは量子情報処理の必須資源と考えられているが、絡み合いがエネルギー変換や量子状態の出力に役立つかどうかはまだ実験的な証拠がない。本稿では,高調波ポテンシャルに閉じ込められた2つの絡み合ったイオンによって作用する中間体を有する量子エンジンとして動作するエネルギー変換装置について報告する。 2つのイオンは、2つのイオンが共有する振動モードの1つに仮想的に結合することで絡み合っており、量子エンジンは別の共振モードである量子負荷に結合する。本研究では, 量子エンジンのエネルギー変換効率について検討し, 量子負荷に蓄積される有用エネルギー(すなわち最大抽出可能作業)を, 2つのイオンを異なるエンタングルメントの度合いで調整し, 負荷中のフォノンの変化を検出することにより検討する。我々の観測は、エンタングルメントが量子エンジンによって生成される有用なエネルギーを燃料にするが、エネルギー変換効率には役に立たないという、初めて定量的な証拠を提供する。この結果は,最大抽出可能エネルギーを最も高指標とする量子電池の研究に有用であると考えられる。

Although entanglement is considered an essential resource for quantum information processing, whether entanglement helps energy conversion or output in the quantum regime still lacks experimental evidence. Here we report on an energy-conversion device operating as a quantum engine whose working medium consists of two entangled ions confined in a harmonic potential. The two ions are entangled by virtually coupling to one of the vibrational modes shared by the two ions, and the quantum engine couples to a quantum load, which is another shared vibrational mode. We explore the energy conversion efficiency of the quantum engine and investigate the useful energy (i.e., the maximum extractable work) stored in the quantum load by tuning the two ions to different degrees of entanglement as well as detecting the change of the phonons in the load. Our observation provides, for the first time, quantitative evidence that entanglement fuels the useful energy produced by the quantum engine, but is not helpful for the energy conversion efficiency. We consider that our results may be useful to the study of quantum batteries, for which one of the key indexes is the maximum extractable energy.

翻訳日:2024-04-26 19:20:39 公開日:2024-04-24

# 難易度タブラルデータストリーム分類における2次元単語埋め込みの利用

Employing Two-Dimensional Word Embedding for Difficult Tabular Data Stream Classification ( http://arxiv.org/abs/2404.15836v1 )

ライセンス: Link先を確認

Paweł Zyblewski,

(参考訳) 急速な技術進歩はデータ量の増加と本質的に結びついており、その大部分はデータストリームとして解釈でき、概念のドリフト現象を示し、高い不均衡比を持つことができる。したがって、難しいデータストリームを分類するための新しいアプローチを開発することは、急速に成長する研究分野である。同時に、ディープラーニングとトランスファーラーニングの普及と、コンピュータビジョンタスクにおける畳み込みニューラルネットワークの成功は、表層データを離散デジタル信号の同質な形式に変換することに焦点を当てた、新しい研究トレンドであるMDE(Multi-dimensional Encoding)の出現に寄与している。本稿では,SSTML(Streaming Super Tabular Machine Learning)を提案する。 SSTMLは、連続したチャンクをSTMLアルゴリズムを用いて画像表現にエンコードし、単一のResNet-18トレーニングエポックを実行する。合成データストリームと実データストリームで実施された実験は、SSTMLが、同等の処理時間を維持しながら、最先端のアルゴリズムよりも統計的に優れた分類品質を達成できることを実証した。

Rapid technological advances are inherently linked to the increased amount of data, a substantial portion of which can be interpreted as data stream, capable of exhibiting the phenomenon of concept drift and having a high imbalance ratio. Consequently, developing new approaches to classifying difficult data streams is a rapidly growing research area. At the same time, the proliferation of deep learning and transfer learning, as well as the success of convolutional neural networks in computer vision tasks, have contributed to the emergence of a new research trend, namely Multi-Dimensional Encoding (MDE), focusing on transforming tabular data into a homogeneous form of a discrete digital signal. This paper proposes Streaming Super Tabular Machine Learning (SSTML), thereby exploring for the first time the potential of MDE in the difficult data stream classification task. SSTML encodes consecutive data chunks into an image representation using the STML algorithm and then performs a single ResNet-18 training epoch. Experiments conducted on synthetic and real data streams have demonstrated the ability of SSTML to achieve classification quality statistically significantly superior to state-of-the-art algorithms while maintaining comparable processing time.

翻訳日:2024-04-26 19:20:39 公開日:2024-04-24

# IOHprofilerを用いた動的二項値問題の実証解析

Empirical Analysis of the Dynamic Binary Value Problem with IOHprofiler ( http://arxiv.org/abs/2404.15837v1 )

ライセンス: Link先を確認

Diederick Vermetten, Johannes Lengler, Dimitri Rusin, Thomas Bäck, Carola Doerr,

(参考訳) 動的環境における最適化問題は、近年、いくつかの理論的研究の源泉となっている。これらの問題の1つは単調な動的二項値問題であり、これは理論的には異なる遺伝的アルゴリズムの間で高い判別能力を持つ。この理論的な基礎から、この問題のいくつかのバージョンをIOHprofilerベンチマークフレームワークに統合する。この統合を用いて、中程度の次元の問題に関する理論結果を再現するとともに、まだ理論的に研究されていないGAの性能の側面を調べる大規模ベンチマーク実験を行う。本結果は, 理論とベンチマークの間にある多くの相乗効果の一部を浮き彫りにし、動的最適化問題のさらなる研究を行うためのプラットフォームを提供するものである。

Optimization problems in dynamic environments have recently been the source of several theoretical studies. One of these problems is the monotonic Dynamic Binary Value problem, which theoretically has high discriminatory power between different Genetic Algorithms. Given this theoretical foundation, we integrate several versions of this problem into the IOHprofiler benchmarking framework. Using this integration, we perform several large-scale benchmarking experiments to both recreate theoretical results on moderate dimensional problems and investigate aspects of GA's performance which have not yet been studied theoretically. Our results highlight some of the many synergies between theory and benchmarking and offer a platform through which further research into dynamic optimization problems can be performed.
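To make the benchmark problem concrete, here is a minimal sketch of a Dynamic Binary Value objective together with a (1+1) evolutionary algorithm. It assumes the definition commonly used in the theory literature, in which the BinVal weights 2^0, ..., 2^(n-1) are assigned to the bit positions through a fresh uniformly random permutation in every generation, and parent and offspring are compared under the same permutation. The function names and parameters are illustrative and do not reflect the IOHprofiler API.

```python
import random
from typing import List


def binval(x: List[int], order: List[int]) -> int:
    """Binary Value of bit string x, where bit order[k] carries weight 2**k."""
    return sum((1 << k) * x[i] for k, i in enumerate(order))


def one_plus_one_ea(n: int, max_generations: int, rng: random.Random) -> int:
    """Run a (1+1)-EA on Dynamic BinVal; return the number of generations used."""
    x = [rng.randint(0, 1) for _ in range(n)]
    for generation in range(max_generations):
        if all(x):                       # all-ones is optimal for every permutation
            return generation
        order = list(range(n))
        rng.shuffle(order)               # new weight assignment each generation
        # Standard bit mutation: flip each bit independently with probability 1/n.
        y = [bit ^ (rng.random() < 1.0 / n) for bit in x]
        # Parent and offspring are compared under the same permutation.
        if binval(y, order) >= binval(x, order):
            x = y
    return max_generations


print(one_plus_one_ea(n=20, max_generations=100_000, rng=random.Random(1)))
```

Because the all-ones string maximizes the objective under every permutation, the optimum is static even though the fitness landscape changes each generation.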

翻訳日:2024-04-26 19:20:39 公開日:2024-04-24

# シーケンスによる記述論理に対する構成的補間と概念に基づくベス定義可能性

Constructive Interpolation and Concept-Based Beth Definability for Description Logics via Sequents ( http://arxiv.org/abs/2404.15840v1 )

ライセンス: Link先を確認

Tim S. Lyon, Jonas Karge,

(参考訳) 本稿では,多数の記述論理(DL)に適用可能なコンストラクティブな手法を導入し,一連のシステムに基づく概念に基づくBeth Definability Properties(CBP)を確立する。高い表現力を持つDL RIQをケーススタディとして、RIQオントロジーのための新しいシークエント計算を導入し、暗黙的に定義可能な概念の明示的な定義の抽出を可能にするシークエント計算から、ある種の補間体をどのように計算できるかを示す。我々の知る限りでは、これは補間子と定義をDLの文脈内で計算する最初のシーケントベースのアプローチであり、RIQがCBPを楽しむ最初の証明である。さらに, 逐次システムのモジュラリティのため, RIQ の制限は認められず, 適切な修正により他の DL にも適用可能である。

We introduce a constructive method applicable to a large number of description logics (DLs) for establishing the concept-based Beth definability property (CBP) based on sequent systems. Using the highly expressive DL RIQ as a case study, we introduce novel sequent calculi for RIQ-ontologies and show how certain interpolants can be computed from sequent calculus proofs, which permit the extraction of explicit definitions of implicitly definable concepts. To the best of our knowledge, this is the first sequent-based approach to computing interpolants and definitions within the context of DLs, as well as the first proof that RIQ enjoys the CBP. Moreover, due to the modularity of our sequent systems, our results hold for any restriction of RIQ, and are applicable to other DLs by suitable modifications.

翻訳日:2024-04-26 19:10:55 公開日:2024-04-24

# エッセイの採点とフィードバック生成を同時に行うためのLLMプロンプティング戦略の探索

Exploring LLM Prompting Strategies for Joint Essay Scoring and Feedback Generation ( http://arxiv.org/abs/2404.15845v1 )

ライセンス: Link先を確認

Maja Stahl, Leon Biermann, Andreas Nehring, Henning Wachsmuth,

(参考訳) 個々のフィードバックは、学生がエッセイを書くスキルを改善するのに役立つ。しかし、そのようなフィードバックを提供するために必要な手作業は、実際は個人化を制限する。自動生成エッセイフィードバックは、生徒を自身のペース、利便性、望ましい頻度で指導する代替手段として機能する。大規模言語モデル(LLM)は、一貫性と文脈に関連のあるテキストを生成する上で、強力な性能を示している。しかし、役に立つエッセイフィードバックを提供する能力は不明確である。本研究は,LLMをベースとしたゼロショットと数発のエッセイフィードバックの促進戦略について検討する。 Chain-of-Thoughtのプロンプトにインスパイアされた私たちは、自動エッセイスコア(AES)が生成したフィードバックの品質にどのような影響を及ぼすか、その程度について調査する。 LLMが達成できるAES性能と、生成したエッセイフィードバックの有用性の両方を評価した。その結果,AESとフィードバック生成を併用することで,AESの性能が向上することが示唆された。しかし,我々の手作業による評価では,生成したエッセイフィードバックの品質が重視される一方で,生成したフィードバックに対するエッセイスコアリングの影響は依然として低いままである。

Individual feedback can help students improve their essay writing skills. However, the manual effort required to provide such feedback limits individualization in practice. Automatically-generated essay feedback may serve as an alternative to guide students at their own pace, convenience, and desired frequency. Large language models (LLMs) have demonstrated strong performance in generating coherent and contextually relevant text. Yet, their ability to provide helpful essay feedback is unclear. This work explores several prompting strategies for LLM-based zero-shot and few-shot generation of essay feedback. Inspired by Chain-of-Thought prompting, we study how and to what extent automated essay scoring (AES) can benefit the quality of generated feedback. We evaluate both the AES performance that LLMs can achieve with prompting only and the helpfulness of the generated essay feedback. Our results suggest that tackling AES and feedback generation jointly improves AES performance. However, while our manual evaluation emphasizes the quality of the generated essay feedback, the impact of essay scoring on the generated feedback remains low ultimately.

翻訳日:2024-04-26 19:10:55 公開日:2024-04-24

# 複雑から単純へ:大規模言語モデルの能力を考慮した多制約複雑命令の強化

From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large Language Models ( http://arxiv.org/abs/2404.15846v1 )

ライセンス: Link先を確認

Qianyu He, Jie Zeng, Qianxi He, Jiaqing Liang, Yanghua Xiao,

(参考訳) 大規模言語モデル(LLM)では、複雑な命令(複雑な命令に従う)で命令に従うことが必須である。しかし、LLMが複数の制約を持つ複雑な命令に従う能力をいかに拡張するかは、まだ解明されていない。このギャップを埋めるために、私たちはまず、能力に追従する複雑な制約を強化するのに有効なトレーニングデータについて研究する。複数の制約を含む命令でLLMを訓練することで、複雑な命令、特に複雑性レベルが低い命令の理解が促進されることが判明した。この改善はドメイン外制約の合成にも応用できる。さらに,有効なトレーニングデータを取得する方法と活用方法についても提案する。最後に,本手法の有効性を,総合的な性能,訓練効率,一般化能力の4つの条件で検証するために,広範囲な実験を行った。

It is imperative for large language models (LLMs) to follow instructions with elaborate requirements (i.e., complex instruction following). Yet, it remains under-explored how to enhance the ability of LLMs to follow complex instructions with multiple constraints. To bridge the gap, we first study what training data is effective in enhancing complex constraint-following abilities. We find that training LLMs with instructions containing multiple constraints enhances their understanding of complex instructions, especially those with lower complexity levels. The improvement can even generalize to compositions of out-of-domain constraints. Additionally, we propose methods addressing how to obtain and utilize the effective training data. Finally, we conduct extensive experiments to demonstrate the effectiveness of our methods in terms of overall performance, training efficiency, and generalization abilities under four settings.

翻訳日:2024-04-26 19:10:55 公開日:2024-04-24

# 膝蓋骨トラッキング計測のための視覚慣性オドメトリと深層慣性オドメトリを用いた3Dフリーハンド超音波

3D Freehand Ultrasound using Visual Inertial and Deep Inertial Odometry for Measuring Patellar Tracking ( http://arxiv.org/abs/2404.15847v1 )

ライセンス: Link先を確認

Russell Buchanan, S. Jack Tu, Marco Camurri, Stephen J. Mellon, Maurice Fallon,

(参考訳) 膝蓋大腿関節(PFJ)の障害は4人に1人に影響を及ぼし、治療にもかかわらず20%が慢性的な膝の痛みを経験する。膝置換術後の不良な結果と痛みは、しばしば膝蓋骨のトラッキング異常と結びついている。CTやMRIのような従来の画像技術は、コストや金属アーチファクトといった課題に直面しており、軟部組織アーチファクトや放射線被曝といった問題なしに関節の動きを観察する理想的な方法は現在のところ存在しない。関節の動きをモニターする新しいシステムは、PFJのダイナミクスの理解を大幅に改善し、より良い患者のケアと結果を支援する。 2次元超音波とモーショントラッキングを組み合わせ、セマンティックセグメンテーションと位置登録を用いて関節を3次元再構成することが解決策となりうる。しかし,スキャナの軌跡を推定するための高価な外部インフラの必要性は,ハンドヘルド超音波による3次元骨再構成を臨床的に実施する上での最大の制約である。ハンドヘルド超音波スキャナーの追跡のためのモーションキャプチャーの代替として,視覚慣性オドメトリ (VIO) と深層学習に基づく慣性のみのオドメトリ法を提案する。これらの方法で生成された3次元再構成は、PFJの評価と、フリーハンド超音波スキャンによるさらなる測定の可能性を実証している。その結果, VIO法はモーションキャプチャ法と同等の性能を示し、平均再構成誤差はそれぞれ1.25mmと1.21mmであった。 VIO法は、外部インフラを必要とする方法に匹敵する精度で、ワイヤレスハンドヘルド超音波スキャンから骨を3次元再構成するための最初のインフラストラクチャフリーな方法である。

Patellofemoral joint (PFJ) issues affect one in four people, with 20% experiencing chronic knee pain despite treatment. Poor outcomes and pain after knee replacement surgery are often linked to patellar mal-tracking. Traditional imaging methods like CT and MRI face challenges, including cost and metal artefacts, and there is currently no ideal way to observe joint motion without issues such as soft tissue artefacts or radiation exposure. A new system to monitor joint motion could significantly improve understanding of PFJ dynamics, aiding in better patient care and outcomes. Combining 2D ultrasound with motion tracking for 3D reconstruction of the joint using semantic segmentation and position registration can be a solution. However, the need for expensive external infrastructure to estimate the trajectories of the scanner remains the main limitation to implementing 3D bone reconstruction from handheld ultrasound scanning clinically. We propose Visual-Inertial Odometry (VIO) and deep learning-based inertial-only odometry as alternatives to motion capture for tracking a handheld ultrasound scanner. The 3D reconstructions generated by these methods demonstrate potential for assessing the PFJ and for further measurements from free-hand ultrasound scans. The results show that the VIO method performs as well as the motion capture method, with average reconstruction errors of 1.25 mm and 1.21 mm, respectively. The VIO method is the first infrastructure-free method for 3D reconstruction of bone from wireless handheld ultrasound scanning with an accuracy comparable to methods that require external infrastructure.

翻訳日:2024-04-26 19:10:55 公開日:2024-04-24

# LLMにおける概念抽象化の検出

Detecting Conceptual Abstraction in LLMs ( http://arxiv.org/abs/2404.15848v1 )

ライセンス: Link先を確認

Michaela Regneri, Alhassan Abdelhalim, Sören Laue,

(参考訳) 本稿では,大言語モデル (LLM) 内で名詞の抽象化を検出する新しい手法を提案する。分類学関係における名詞対の心理的動機付けから始めると、ハイパーネミーを示す表面パターンをインスタンス化し、BERTが生成する注意行列を解析する。結果を2つの反事実集合と比較し、名詞対の分布的類似性にのみ関連付けられない抽象機構においてハイパーネミーを検出できることを示す。我々の発見は、LLMにおける概念的抽象性の説明可能性への第一歩である。

We present a novel approach to detecting noun abstraction within a large language model (LLM). Starting from a psychologically motivated set of noun pairs in taxonomic relationships, we instantiate surface patterns indicating hypernymy and analyze the attention matrices produced by BERT. We compare the results to two sets of counterfactuals and show that we can detect hypernymy in the abstraction mechanism, which cannot solely be related to the distributional similarity of noun pairs. Our findings are a first step towards the explainability of conceptual abstraction in LLMs.

翻訳日:2024-04-26 19:10:55 公開日:2024-04-24

# 質問応答のためのモバイルデバイスへの大規模言語モデル移植

Porting Large Language Models to Mobile Devices for Question Answering ( http://arxiv.org/abs/2404.15851v1 )

ライセンス: Link先を確認

Hannes Fassold,

(参考訳) モバイルデバイスにLLM(Large Language Models)をデプロイすることで、デバイス上で自然言語処理のすべての機能が利用できるようになる。 LLMの重要なユースケースは質問応答であり、幅広いユーザクエリに対して正確でコンテキスト的に関連する回答を提供することができる。我々は、どのようにして最先端のLCMをモバイルデバイスに移植し、デバイス上でネイティブに動作させたかを説明した。 LLM推論には、柔軟で自己完結したC++フレームワークであるllama.cppフレームワークを使用します。我々は、30億のパラメータを持つOrca-Mini-3Bモデルの6ビット量子化バージョンを選択し、このモデルの正しいプロンプトフォーマットを提示した。実験結果から,LLM推論はGalaxy S21スマートフォン上で対話的な速度で動作し,政治や地理,歴史など,さまざまな分野の質問に対する高品質な回答が得られた。

Deploying Large Language Models (LLMs) on mobile devices makes all the capabilities of natural language processing available on the device. An important use case of LLMs is question answering, which can provide accurate and contextually relevant answers to a wide array of user queries. We describe how we managed to port state-of-the-art LLMs to mobile devices, enabling them to operate natively on the device. We employ the llama.cpp framework, a flexible and self-contained C++ framework for LLM inference. We selected a 6-bit quantized version of the Orca-Mini-3B model with 3 billion parameters and present the correct prompt format for this model. Experimental results show that LLM inference runs at interactive speed on a Galaxy S21 smartphone and that the model delivers high-quality answers to user queries related to questions from different subjects like politics, geography or history.

翻訳日:2024-04-26 19:10:55 公開日:2024-04-24

# QOPTLib: 組合せ最適化問題のための量子コンピューティング指向ベンチマーク

QOPTLib: a Quantum Computing Oriented Benchmark for Combinatorial Optimization Problems ( http://arxiv.org/abs/2404.15852v1 )

ライセンス: Link先を確認

Eneko Osaba, Esther Villar-Rodriguez,

(参考訳) 本稿では,組合せ最適化のための量子コンピューティング指向ベンチマークを提案する。 QOPTLibと呼ばれるこのベンチマークは、トラベルセールスマン問題、車両ルーティング問題、一次元ビンパッケージ問題、最大カット問題という4つのよく知られた問題に均等に分散した40のインスタンスで構成されている。 QOPTLibのインスタンスのサイズは、計算可能なサイズだけでなく、良い結果を得る可能性のない最大長にも対応している。この点において、ハイブリッドアプローチも考慮されている点を強調しておくことが重要である。したがって、このベンチマークは、ユーザに汎用データセットを提供する最初の取り組みである。本稿では,量子アニールに基づく2つの解法を用いたQOPTLibの完全解法について紹介する。私たちの主な目的は、新しい量子ベースのアルゴリズムによって、他の研究者がこれらの結果に勝とうとする、予備的なベースラインを確立することです。

In this paper, we propose a quantum computing oriented benchmark for combinatorial optimization. This benchmark, coined as QOPTLib, is composed of 40 instances equally distributed over four well-known problems: Traveling Salesman Problem, Vehicle Routing Problem, one-dimensional Bin Packing Problem and the Maximum Cut Problem. The sizes of the instances in QOPTLib not only correspond to computationally addressable sizes, but also to the maximum length approachable with non-zero likelihood of getting a good result. In this regard, it is important to highlight that hybrid approaches are also taken into consideration. Thus, this benchmark constitutes the first effort to provide users a general-purpose dataset. Also in this paper, we introduce a first full solving of QOPTLib using two solvers based on quantum annealing. Our main intention with this is to establish a preliminary baseline, hoping to inspire other researchers to beat these outcomes with newly proposed quantum-based algorithms.
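As an illustration of how one of the four problem classes is typically prepared for a quantum annealer, the following sketch writes Maximum Cut as a quadratic objective over binary variables and solves a tiny instance by brute force. It is not taken from QOPTLib: the toy graph and the function names are assumptions, and the exhaustive search merely stands in for an annealer so that the example stays self-contained.

```python
from itertools import product
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def cut_value(edges: Sequence[Edge], x: Sequence[int]) -> int:
    """Quadratic binary form of the cut size.

    For x_u, x_v in {0, 1}, the term x_u + x_v - 2*x_u*x_v equals 1 exactly
    when the edge (u, v) crosses the partition, and 0 otherwise.
    """
    return sum(x[u] + x[v] - 2 * x[u] * x[v] for u, v in edges)


def brute_force_max_cut(n: int, edges: Sequence[Edge]) -> Tuple[int, List[int]]:
    """Enumerate all 2**n assignments; only feasible for very small n."""
    best_value, best_x = -1, []
    for x in product((0, 1), repeat=n):
        value = cut_value(edges, x)
        if value > best_value:
            best_value, best_x = value, list(x)
    return best_value, best_x


# Toy instance (hypothetical): a 4-cycle, whose maximum cut contains all 4 edges.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(brute_force_max_cut(4, square))
```

An annealer would minimize the negated objective; the number of binary variables (one per node here) is what bounds the instance sizes that current hardware can address.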

翻訳日:2024-04-26 19:10:55 公開日:2024-04-24

# CLAD:コントラスト学習による操作攻撃に対するロバストオーディオディープフェイク検出

CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning ( http://arxiv.org/abs/2404.15854v1 )

ライセンス: Link先を確認

Haolin Wu, Jing Chen, Ruiying Du, Cong Wu, Kun He, Xingcan Shang, Hao Ren, Guowen Xu,

(参考訳) オーディオディープフェイクの普及は、重大なセキュリティ上の脅威を引き起こし、堅牢な検出方法を必要とする。既存の検出システムは将来性を示すが、悪意のあるオーディオ操作に対する堅牢性はまだ未調査である。このギャップを埋めるために、我々は最も広く採用されているオーディオディープフェイク検出器の攻撃に対する感受性について、初めて包括的な研究を行った。驚くべきことに、ボリュームコントロールのような操作でさえ、人間の知覚に影響を与えることなく、検出を著しくバイパスすることができる。そこで我々はCLAD(Contrastive Learning-based Audio Deepfake Detector)を提案する。鍵となる考え方は、操作によってもたらされる変動を最小限に抑えるために、対照的な学習を取り入れることである。さらに,特徴空間内でより密集した実音声をクラスタリングすることで,検出精度の向上を目的とした長さ損失を組み込んだ。我々は,最も広く採用されているオーディオディープフェイク検出モデルと,様々な操作攻撃に対して提案したCLADを総合的に評価した。検出モデルは脆弱性を示し、FARはそれぞれ36.69%、31.23%、そして51.28%まで上昇した。 CLADはロバスト性を高め、ノイズ注入下でFARを0.81%まで減少させ、全てのテストでFARを1.63%以下に維持した。ソースコードとドキュメントはアーティファクトリポジトリ(https://github.com/CLAD23/CLAD)で公開しています。

The increasing prevalence of audio deepfakes poses significant security threats, necessitating robust detection methods. While existing detection systems exhibit promise, their robustness against malicious audio manipulations remains underexplored. To bridge the gap, we undertake the first comprehensive study of the susceptibility of the most widely adopted audio deepfake detectors to manipulation attacks. Surprisingly, even manipulations like volume control can significantly bypass detection without affecting human perception. To address this, we propose CLAD (Contrastive Learning-based Audio deepfake Detector) to enhance the robustness against manipulation attacks. The key idea is to incorporate contrastive learning to minimize the variations introduced by manipulations, therefore enhancing detection robustness. Additionally, we incorporate a length loss, aiming to improve the detection accuracy by clustering real audios more closely in the feature space. We comprehensively evaluated the most widely adopted audio deepfake detection models and our proposed CLAD against various manipulation attacks. The detection models exhibited vulnerabilities, with FAR rising to 36.69%, 31.23%, and 51.28% under volume control, fading, and noise injection, respectively. CLAD enhanced robustness, reducing the FAR to 0.81% under noise injection and consistently maintaining an FAR below 1.63% across all tests. Our source code and documentation are available in the artifact repository (https://github.com/CLAD23/CLAD).
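The role of contrastive learning described above can be sketched with a generic InfoNCE-style loss. This is not CLAD's actual objective, which the abstract does not specify; it is a standard formulation, written in plain Python, in which an embedding of a clip (the anchor) is pulled toward the embedding of a manipulated copy of the same clip (the positive) and pushed away from other clips (the negatives). The toy vectors and the temperature of 0.1 are assumptions, and all vectors are assumed to be non-zero.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
    temperature: float = 0.1,  # hypothetical value
) -> float:
    """Return -log softmax probability of the positive among all candidates."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    top = max(logits)  # subtract the maximum for numerical stability
    log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_denominator - logits[0]


clip = [1.0, 0.0]
volume_changed = [0.9, 0.1]             # embedding barely moved by the manipulation
other_clips = [[0.0, 1.0], [-1.0, 0.0]]
print(info_nce(clip, volume_changed, other_clips))
```

The loss is small when a manipulation leaves the embedding nearly unchanged, which is the invariance the abstract aims for.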

翻訳日:2024-04-26 19:10:55 公開日:2024-04-24

# CONNECTION: ビットレート変調による隠れチャネルネットワーク攻撃

CONNECTION: COvert chaNnel NEtwork attaCk Through bIt-rate mOdulatioN ( http://arxiv.org/abs/2404.15858v1 )

ライセンス: Link先を確認

Simone Soderi, Rocco De Nicola,

(参考訳) 隠れチャネルネットワークは、組織が敵の攻撃からネットワークを保護するために設置したセキュリティ対策を回避するための、よく知られた方法である。本稿では,広域ネットワーク上で接続されたデバイス間の隠れチャネルを実装するための、ビットレート変調に基づく新しい手法を提案する。この攻撃は、マシン(隠れ送信機)から機密情報を流出させ、ネットワークのセキュリティ対策や検知システムを回避しながら、秘密裏に隠れ受信機に転送するために利用することができる。本報告では,隠蔽チャネルネットワークと、それがネットワーク情報伝送にもたらす潜在的なセキュリティリスクに着目して,この脅威を実現する方法について説明する。提案手法はビットレート変調を利用しており、高いビットレートは「1」、低いビットレートは「0」を表し、秘密通信を可能にする。我々は、正当なトラフィックやその他の干渉の存在下での堅牢性、ビットレート容量、ビットエラー率など、隠蔽チャネルに関連する重要な指標を分析する。実験では、この攻撃の良好な性能を実証し、異なるノイズ源の下で、優れた堅牢性で5bps、最大0.9239bps/Hzのチャネル容量を達成した。したがって,ビットレート変調はネットワークのセキュリティを効果的に侵害し,機密データを危険にさらすことを示す。

Covert channel networks are a well-known method for circumventing the security measures organizations put in place to protect their networks from adversarial attacks. This paper introduces a novel method based on bit-rate modulation for implementing covert channels between devices connected over a wide area network. This attack can be exploited to exfiltrate sensitive information from a machine (i.e., covert sender) and stealthily transfer it to a covert receiver while evading network security measures and detection systems. We explain how to implement this threat, focusing specifically on covert channel networks and their potential security risks to network information transmission. The proposed method leverages bit-rate modulation, where a high bit rate represents a '1' and a low bit rate represents a '0', enabling covert communication. We analyze the key metrics associated with covert channels, including robustness in the presence of legitimate traffic and other interference, bit-rate capacity, and bit error rate. Experiments demonstrate the good performance of this attack, which achieved 5 bps with excellent robustness and a channel capacity of up to 0.9239 bps/Hz under different noise sources. Therefore, we show that bit-rate modulation effectively violates network security and compromises sensitive data.
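The encoding rule stated in the abstract (a high bit rate represents '1', a low bit rate represents '0') can be sketched as a small encoder and decoder. This is only an illustration of the signalling idea, not the authors' implementation: the two rate values, the midpoint decision threshold, and the Gaussian noise standing in for legitimate traffic are all assumptions, and no real network traffic is generated.

```python
import random
from typing import List

HIGH_RATE = 2000.0  # hypothetical throughput (bits/s) signalling a '1'
LOW_RATE = 500.0    # hypothetical throughput (bits/s) signalling a '0'


def text_to_bits(text: str) -> str:
    """Serialize text as a string of '0'/'1' characters (UTF-8, 8 bits per byte)."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_text(bits: str) -> str:
    """Inverse of text_to_bits; len(bits) must be a multiple of 8."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)).decode("utf-8")


def encode(bits: str) -> List[float]:
    """Map each covert bit to the bit rate the sender would use for one time slot."""
    return [HIGH_RATE if b == "1" else LOW_RATE for b in bits]


def decode(rates: List[float], threshold: float = (HIGH_RATE + LOW_RATE) / 2) -> str:
    """Recover bits from the rate the receiver measured in each time slot."""
    return "".join("1" if r > threshold else "0" for r in rates)


rng = random.Random(0)
sent = encode(text_to_bits("hi"))
measured = [r + rng.gauss(0.0, 100.0) for r in sent]  # noisy rate measurements
print(bits_to_text(decode(measured)))
```

The gap between the two rates relative to the measurement noise determines the bit error rate, while the slot duration determines the covert throughput; these are the metrics the abstract analyzes.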

翻訳日:2024-04-26 19:10:55 公開日:2024-04-24

# データ主体権執行のための安全・プライバシー保護認証

Secure and Privacy-Preserving Authentication for Data Subject Rights Enforcement ( http://arxiv.org/abs/2404.15859v1 )

ライセンス: Link先を確認

Malte Hansen, Andre Büttner,

(参考訳) GDPRを考慮して、データコントローラ(DC)は、データ主体(DS)が特定のデータ対象の権利を行使できるようにする必要がある。ここでの重要な要件は、DCがDSを確実に認証できることである。明確な技術的仕様がないため、ID文書のコピー要求やメールアドレスの検証など、様々な方法で実現されている。しかし、以前の研究では、これは様々なセキュリティやプライバシのリスクと関連付けられており、DSの特定は非自明な作業である可能性があることが示されている。本稿では、異なる認証方式をレビューし、属性ベースの認証情報とeIDを利用して、独立したIDプロバイダの助けを借りてDSの認証を可能にするアーキテクチャを提案する。私たちの仕事は、DCとDSの両方に利益をもたらす、DSの認証方法の標準化とプライバシー保護に寄与します。

In light of the GDPR, data controllers (DC) need to allow data subjects (DS) to exercise certain data subject rights. A key requirement here is that DCs can reliably authenticate a DS. Due to a lack of clear technical specifications, this has been realized in different ways, such as by requesting copies of ID documents or by email address verification. However, previous research has shown that this is associated with various security and privacy risks and that identifying DSs can be a non-trivial task. In this paper, we review different authentication schemes and propose an architecture that enables DCs to authenticate DSs with the help of independent Identity Providers in a secure and privacy-preserving manner by utilizing attribute-based credentials and eIDs. Our work contributes to a more standardized and privacy-preserving way of authenticating DSs, which will benefit both DCs and DSs.

翻訳日:2024-04-26 19:10:55 公開日:2024-04-24

# グラフ状態の真の多部非局所性はモデル依存的である

The genuinely multipartite nonlocality of graph states is model-dependent ( http://arxiv.org/abs/2404.15861v1 )

ライセンス: Link先を確認

Xavier Coiteux-Roy, Owidiusz Makuta, Fionnuala Curran, Remigiusz Augusiak, Marc-Olivier Renou,

(参考訳) ベルの定理は、いくつかの量子状態の相関が二部の非古典的資源によってのみ説明できることを証明している。真に多部的な非局所性(GMNL)の概念は、一部の量子相関を説明するには、2つより多くの当事者が非自明な形で関与する非古典的資源が必要になりうるという事実を概念化するために、後に導入された。本稿ではまず,GMNLの歴史的定義に固有の矛盾を思い出す。第二に、我々はその再定義の一つ、Local-Operations-and-Shared-Randomness GMNL (LOSR-GMNL) に目を向け、全てのキャタピラグラフ状態(クラスター状態を含む)がこの2番目の性質を持つことを示す。最後に,Local-Operations-and-Neighbour-Communication GMNL (LONC-GMNL) と呼ぶ、一部の当事者間の短距離通信が生じる可能性のある状況に適応した第3の代替定義を概念化する。クラスター状態は第3の性質を持たないが、GHZ状態は持つことを示す。その技術的内容以外にも、実験的に生成した量子システムの非古典性を評価するために、真の多部非局所性、真の多部エンタングルメント、あるいはエンタングルメント深さといった概念を適用する前に、厳密な概念的作業が必要であることを、我々の論文は示している。実験的な研究の多くは、これらの概念の歴史的定義に基づく証拠(witness)をいまだに用いているが、それらは二部資源に基づくモデルを棄却できないことに留意する。

Bell's theorem proves that some quantum state correlations can only be explained by bipartite non-classical resources. The notion of genuinely multipartite nonlocality (GMNL) was later introduced to conceptualize the fact that nonclassical resources involving more than two parties in a nontrivial way may be needed to account for some quantum correlations. In this letter, we first recall the contradictions inherent to the historical definition of GMNL. Second, we turn to one of its redefinitions, called Local-Operations-and-Shared-Randomness GMNL (LOSR-GMNL), proving that all caterpillar graph states (including cluster states) have this second property. Finally, we conceptualize a third, alternative definition, which we call Local-Operations-and-Neighbour-Communication GMNL (LONC-GMNL), that is adapted to situations in which short-range communication between some parties might occur. We show that cluster states do not have this third property, while GHZ states do. Beyond its technical content, our letter illustrates that rigorous conceptual work is needed before applying the concepts of genuinely multipartite nonlocality, genuine multipartite entanglement or entanglement depth to benchmark the nonclassicality of some experimentally-produced quantum system. We note that most experimental works still use witnesses based on the historical definitions of these notions, which fail to reject models based on bipartite resources.

翻訳日:2024-04-26 19:10:55 公開日:2024-04-24

# 切り詰められた量子可観測量とその半古典的極限

Truncated quantum observables and their semiclassical limit ( http://arxiv.org/abs/2404.15863v1 )

ライセンス: Link先を確認

Fabio Deelan Cunden, Marilena Ligabò, Maria Caterina Susca,

(参考訳) 階数 $N$ の直交射影 $\Pi_N$ の値域に切り詰められた量子可観測量 $H$ について、プランク定数が消える極限 $\hbar\to0$ かつ量子数が大きい極限 $N\to\infty$ で $\hbar N$ を固定した半古典極限における、位相空間上の対応するワイル記号を研究する。ある仮定の下で、ワイル記号が、位相空間の古典的に許容される領域に切り詰められた(したがって一般には不連続な)記号へ $L^2$ 収束することを証明する。一般定理の例示として、調和振動子と1次元箱内の自由粒子について、切り詰められた可観測量を解析する。後者の場合、古典的に許容される領域の境界付近における記号の微視的な各点極限も計算する。

For quantum observables $H$ truncated on the range of orthogonal projections $\Pi_N$ of rank $N$, we study the corresponding Weyl symbol in the phase space in the semiclassical limit of vanishing Planck constant $\hbar\to0$ and large quantum number $N\to\infty$, with $\hbar N$ fixed. Under certain assumptions, we prove the $L^2$- convergence of the Weyl symbols to a symbol truncated (hence, in general discontinuous) on the classically permitted region in phase space. As an illustration of the general theorems we analyse truncated observables for the harmonic oscillator and for a free particle in a one-dimensional box. In the latter case, we also compute the microscopic pointwise limit of the symbols near the boundary of the classically permitted region.

翻訳日:2024-04-26 19:10:55 公開日:2024-04-24

# LLM支援インテントベース5Gコアネットワーク管理とオーケストレーションのためのセマンティックルーティング

Semantic Routing for Enhanced Performance of LLM-Assisted Intent-Based 5G Core Network Management and Orchestration ( http://arxiv.org/abs/2404.15869v1 )

ライセンス: Link先を確認

Dimitrios Michael Manias, Ali Chouman, Abdallah Shami,

(参考訳) 大規模言語モデル(LLM)は、人工知能(AI)アプリケーション、特に自然言語処理と生成AIの分野で急速に普及している。テキスト生成アプリケーションに限らず、これらのモデルにはプロンプトエンジニアリングを利用する機会があり、そのようなモデルの入力を適切に構造化して、モデルの目的を明確に表現することができる。この顕著な例は、ネットワーク操作と管理の自動化とメンテナンスのための新しいアプローチであるインテントベースのネットワーキングである。本稿では,LLMによる5Gコアネットワークのインテントベース管理とオーケストレーションにおけるセマンティックルーティングの実現について述べる。本研究は,エンド・ツー・エンドの意図抽出フレームワークを構築し,エンコーダの効果を徹底的に分析し,システム全体の性能を定量化するとともに,サンプルユーザ意図の多様なデータセットを提示する。その結果, セマンティックルータを用いることで, アーキテクチャを推し進めるスタンドアロンのLCMに比べて, LLM配置の精度と効率が向上することがわかった。

Large language models (LLMs) are rapidly emerging in Artificial Intelligence (AI) applications, especially in the fields of natural language processing and generative AI. Not limited to text generation applications, these models inherently possess the opportunity to leverage prompt engineering, where the inputs of such models can be appropriately structured to articulate a model's purpose explicitly. A prominent example of this is intent-based networking, an emerging approach for automating and maintaining network operations and management. This paper presents semantic routing to achieve enhanced performance in LLM-assisted intent-based management and orchestration of 5G core networks. This work establishes an end-to-end intent extraction framework and presents a diverse dataset of sample user intents accompanied by a thorough analysis of the effects of encoders and quantization on overall system performance. The results show that using a semantic router improves the accuracy and efficiency of the LLM deployment compared to stand-alone LLMs with prompting architectures.

翻訳日:2024-04-26 19:10:55 公開日:2024-04-24
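
As a rough illustration of the semantic-routing idea described in the abstract above, the following sketch sends a user intent to the closest predefined route by cosine similarity and signals a fallback when nothing matches well enough. The bag-of-words `encode` function, the route names, the example utterances, and the 0.3 threshold are all assumptions made for illustration; the paper analyses real encoders, which this toy does not reproduce.

```python
import math
from collections import Counter

def encode(text):
    """Toy encoder: bag-of-words term counts (stand-in for a real sentence encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical routes, each described by a few example utterances.
ROUTES = {
    "deploy": ["deploy a new network function", "start a new upf instance"],
    "scale": ["scale up the session management function", "add more replicas"],
    "monitor": ["show current network load", "report latency statistics"],
}

def route_intent(intent, threshold=0.3):
    """Return (route, score); route is None if no example is similar enough."""
    query = encode(intent)
    best_route, best_score = None, 0.0
    for name, examples in ROUTES.items():
        for example in examples:
            score = cosine(query, encode(example))
            if score > best_score:
                best_route, best_score = name, score
    if best_score < threshold:
        return None, best_score  # a caller could fall back to the LLM here
    return best_route, best_score
```

In such a design, only intents that the router cannot place would be forwarded to a full LLM prompt, which is one plausible way a router could save LLM calls.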

# 「混合型古典位相空間を持つキックトトップモデルにおける混合固有状態の割合のべき乗則減衰」への補遺

Addendum to "Power-law decay of the fraction of the mixed eigenstates in kicked top model with mixed-type classical phase space" ( http://arxiv.org/abs/2404.15874v1 )

ライセンス: Link先を確認

Hua Yan, Qian Wang, Marko Robnik,

(参考訳) クリャロフ部分空間法を用いて、スピンコヒーレント状態を生成することにより、量子カオスを研究するためのプロトタイプモデル、固有状態のフシミ関数を研究するためのアクセス可能なシステムサイズは、文献や我々の以前の研究であるPhysよりもはるかに大きい。 E 108, 054217 (2023) [arXiv:2308.04824] 完全にカオス化されたトップでは、平均Wehrlエントロピーの局所化測度が円ユニタリアンサンブルの予測に近づくことが分かる。混合型の場合、古典的コンパクト位相空間におけるフシミ関数と正則領域とカオス領域の重なりによる混合固有状態の同定を行う。数値的に、混合固有状態の分数は$j^{-\zeta}$としてスケールし、システムサイズが$j$になるにつれて、ほぼ2桁のスケールでパワー・ローの減衰が増加する。これは、フシミ函数の一様半古典的凝縮の原理と半古典的極限におけるベリー・ロブニク図形を裏付ける証拠を与える。

By using the Krylov subspace technique to generate the spin coherent states in the kicked top model, a prototype model for studying quantum chaos, the accessible system size for studying the Husimi functions of eigenstates can be much larger than that reported in the literature and in our previous study, Phys. Rev. E 108, 054217 (2023) [arXiv:2308.04824]. In the fully chaotic kicked top, we find that the mean Wehrl entropy localization measure approaches the prediction given by the Circular Unitary Ensemble. In the mixed-type case, we identify mixed eigenstates by the overlap of the Husimi function with regular and chaotic regions in the classical compact phase space. Numerically, we show that the fraction of mixed eigenstates scales as $j^{-\zeta}$, a power-law decay as the system size $j$ increases, across nearly two orders of magnitude. This provides supporting evidence for the principle of uniform semiclassical condensation of Husimi functions and the Berry-Robnik picture in the semiclassical limit.

翻訳日:2024-04-26 19:10:55 公開日:2024-04-24
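
To make the scaling claim in the abstract above concrete, the sketch below estimates an exponent $\zeta$ from pairs of system size $j$ and mixed-state fraction using a least-squares fit in log-log space. The data are synthetic, generated from an assumed law $0.8\,j^{-0.5}$, and are not values from the paper; the function name `fit_power_law` is also hypothetical.

```python
import math

def fit_power_law(sizes, fractions):
    """Least-squares fit of log(fraction) = log(c) - zeta * log(j); returns (zeta, c)."""
    xs = [math.log(j) for j in sizes]
    ys = [math.log(f) for f in fractions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)

# Synthetic data from an assumed law 0.8 * j**-0.5; NOT values from the paper.
sizes = [100, 200, 400, 800, 1600, 3200, 6400]
fractions = [0.8 * j ** -0.5 for j in sizes]
zeta, prefactor = fit_power_law(sizes, fractions)
```

With real data, a straight line in the log-log plot over the full range of $j$ is what would support a power law rather than, say, exponential decay.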

# 摂動マスキングに基づく効果的な教師なし制約テキスト生成

Effective Unsupervised Constrained Text Generation based on Perturbed Masking ( http://arxiv.org/abs/2404.15877v1 )

ライセンス: Link先を確認

Yingwen Fu, Wenjie Ou, Zhou Yu, Yue Lin,

(参考訳) 教師なし制約付きテキスト生成は、教師付きデータなしで与えられた制約セットの下でテキストを生成することを目的としている。現在の最先端の手法は、編集位置と動作を確率的にサンプリングし、不要な探索ステップを引き起こす可能性がある。本稿では,各ステップにおける最適な編集位置と動作を探索することで,効率を向上させるPMCTGを提案する。具体的には、PMCTGは摂動マスク技術を拡張して、編集する最も一貫性のないトークンを効果的に検索する。次に、4つのマルチアスペクトスコアリング機能を導入し、編集アクションを選択して検索の難しさをさらに軽減する。 PMCTGは教師付きデータを必要としないため、異なる生成タスクに適用することができる。 PMCTGは,教師なし環境下で,キーワード・文生成とパラフレーズ生成という2つの代表的なタスクにおいて,新たな最先端結果を実現する。

Unsupervised constrained text generation aims to generate text under a given set of constraints without any supervised data. Current state-of-the-art methods stochastically sample edit positions and actions, which may cause unnecessary search steps. In this paper, we propose PMCTG to improve effectiveness by searching for the best edit position and action in each step. Specifically, PMCTG extends the perturbed masking technique to effectively search for the most incongruent token to edit. Then it introduces four multi-aspect scoring functions to select the edit action and further reduce search difficulty. Since PMCTG does not require supervised data, it can be applied to different generation tasks. We show that under the unsupervised setting, PMCTG achieves new state-of-the-art results in two representative tasks, namely keywords-to-sentence generation and paraphrasing.

翻訳日:2024-04-26 19:10:55 公開日:2024-04-24

# 超伝導量子プロセッサ上での非定常流体流動のシミュレーション

Simulating unsteady fluid flows on a superconducting quantum processor ( http://arxiv.org/abs/2404.15878v1 )

ライセンス: Link先を確認

Zhaoyuan Meng, Jiarun Zhong, Shibo Xu, Ke Wang, Jiachen Chen, Feitong Jin, Xuhao Zhu, Yu Gao, Yaozu Wu, Chuanyu Zhang, Ning Wang, Yiren Zou, Aosai Zhang, Zhengyi Cui, Fanhao Shen, Zehang Bao, Zitian Zhu, Ziqi Tan, Tingting Li, Pengfei Zhang, Shiying Xiong, Hekang Li, Qiujiang Guo, Zhen Wang, Chao Song, H. Wang, Yue Yang,

(参考訳) 近年の中間スケール量子プロセッサの進歩は、実用的な量子優位性の探索に多大な関心を惹き付けている。流体力学のシミュレーションは古典物理学において非常に難しい問題であるが、実用化には不可欠である。本稿では、超伝導量子プロセッサを用いて、量子符号化、進化、流れ状態の検出からなる非定常流れのディジタルシミュレーション実験を報告する。量子アルゴリズムは、シュリンガー方程式の流体力学的定式化を用いたハミルトニアンシミュレーションに基づいている。平行1量子ゲートと2量子ビットゲートの中央値の99.97%と99.67%で、2次元(2次元)圧縮性分岐流と10量子ビットの2次元減衰渦のダイナミクスをシミュレートする。実験結果は, 平均密度と運動量分布の時間的変化をよく捉え, 適度な雑音を伴う空間流場を定性的に再現した。この研究は、現実的な応用のために乱流のようなより複雑な流れをシミュレートする量子コンピューティングの可能性を示す。

Recent advancements in intermediate-scale quantum processors have triggered tremendous interest in the exploration of practical quantum advantage. The simulation of fluid dynamics, a highly challenging problem in classical physics but vital for practical applications, emerges as a good candidate for showing quantum utility. Here, we report an experiment on the digital simulation of unsteady flows, which consists of quantum encoding, evolution, and detection of flow states, with a superconducting quantum processor. The quantum algorithm is based on Hamiltonian simulation using the hydrodynamic formulation of the Schrödinger equation. With median fidelities of 99.97% and 99.67% for parallel single- and two-qubit gates respectively, we simulate the dynamics of a two-dimensional (2D) compressible diverging flow and a 2D decaying vortex with ten qubits. The experimental results capture the temporal evolution of averaged density and momentum profiles well, and qualitatively reproduce spatial flow fields with moderate noise. This work demonstrates the potential of quantum computing in simulating more complex flows, such as turbulence, for practical applications.

翻訳日:2024-04-26 19:01:10 公開日:2024-04-24

# LiDARを用いた3次元物体検出における分布外検出の再検討

Revisiting Out-of-Distribution Detection in LiDAR-based 3D Object Detection ( http://arxiv.org/abs/2404.15879v1 )

ライセンス: Link先を確認

Michael Kösel, Marcel Schreiber, Michael Ulrich, Claudius Gläser, Klaus Dietmayer,

(参考訳) LiDARベースの3Dオブジェクト検出は、オブジェクトを正確に3Dでローカライズし分類する能力によって、自動走行の重要な部分となっている。しかし、オブジェクト検出器は、未知のフォアグラウンドオブジェクト、特に元のトレーニングデータに存在しないオブジェクトを扱う場合、重要な課題に直面している。これらのアウト・オブ・ディストリビューション(OOD)オブジェクトは誤分類を引き起こし、自動車両の安全性と信頼性に重大なリスクをもたらす。現在、LiDARを用いたOODオブジェクト検出は十分に研究されていない。我々は、OODオブジェクトの合成学習データを生成し、既知のオブジェクトカテゴリを摂動することで、この問題に対処する。我々の考えでは、これらの合成OODオブジェクトは、分布内(ID)オブジェクトと比較して、対象検出器の特徴マップで異なる応答を生成する。次に、事前訓練された固定オブジェクト検出器を用いて特徴を抽出し、単純な多層パーセプトロン(MLP)を訓練し、各検出をIDまたはOODとして分類する。さらに,ポイントクラウドを変更せずに既存のデータセットを使用できる新しい評価プロトコルを提案し,現実のシナリオをより確実に評価する。提案手法の有効性は,新たに提案したnuScenes OODベンチマークを用いて検証した。ソースコードは https://github.com/uulm-mrm/mmood3d で入手できる。

LiDAR-based 3D object detection has become an essential part of automated driving due to its ability to localize and classify objects precisely in 3D. However, object detectors face a critical challenge when dealing with unknown foreground objects, particularly those that were not present in their original training data. These out-of-distribution (OOD) objects can lead to misclassifications, posing a significant risk to the safety and reliability of automated vehicles. Currently, LiDAR-based OOD object detection has not been well studied. We address this problem by generating synthetic training data for OOD objects by perturbing known object categories. Our idea is that these synthetic OOD objects produce different responses in the feature map of an object detector compared to in-distribution (ID) objects. We then extract features using a pre-trained and fixed object detector and train a simple multilayer perceptron (MLP) to classify each detection as either ID or OOD. In addition, we propose a new evaluation protocol that allows the use of existing datasets without modifying the point cloud, ensuring a more authentic evaluation of real-world scenarios. The effectiveness of our method is validated through experiments on the newly proposed nuScenes OOD benchmark. The source code is available at https://github.com/uulm-mrm/mmood3d.

翻訳日:2024-04-26 19:01:10 公開日:2024-04-24

# 振動解析を用いた飛行前UAVロータ欠陥検出のための機械学習

Machine Learning for Pre/Post Flight UAV Rotor Defect Detection Using Vibration Analysis ( http://arxiv.org/abs/2404.15880v1 )

ライセンス: Link先を確認

Alexandre Gemayel, Dimitrios Michael Manias, Abdallah Shami,

(参考訳) 無人航空機(UAV)は将来のスマートシティにとって重要なインフラ要素となるだろう。効率的な運用のためには、UAVの信頼性は障害や故障の常時監視によって保証されなければならない。この目的のために,本論文では,信号処理と機械学習を用いて,包括的振動解析データの解析を行い,飛行前および飛行後におけるローターブレードの欠陥の有無を判定する。次元減少技術の助けを借りて、ランダムフォレストアルゴリズムは最高の性能を示し、欠陥のあるローターブレードを完璧に検出した。さらに、様々な特徴部分集合の影響を包括的に分析し、モデルの分類決定プロセスに影響を与える要因について考察する。

Unmanned Aerial Vehicles (UAVs) will be critical infrastructural components of future smart cities. In order to operate efficiently, UAV reliability must be ensured by constant monitoring for faults and failures. To this end, the work presented in this paper leverages signal processing and Machine Learning (ML) methods to analyze the data of a comprehensive vibrational analysis to determine the presence of rotor blade defects during pre- and post-flight operation. With the help of dimensionality reduction techniques, the Random Forest algorithm exhibited the best performance and detected defective rotor blades perfectly. Additionally, a comprehensive analysis of the impact of various feature subsets is presented to gain insight into the factors affecting the model's classification decision process.

翻訳日:2024-04-26 19:01:10 公開日:2024-04-24

# 現在とその後の攻撃:ブラックボックス攻撃に対する物体検出のロバスト性の評価

Steal Now and Attack Later: Evaluating Robustness of Object Detection against Black-box Adversarial Attacks ( http://arxiv.org/abs/2404.15881v1 )

ライセンス: Link先を確認

Erh-Chung Chen, Pin-Yu Chen, I-Hsin Chung, Che-Rung Lee,

(参考訳) オブジェクト検出に対する遅延攻撃は、ターゲット画像に追加のゴーストオブジェクトを生成して推論時間をインフレーションすることを目的とした、敵攻撃の一種である。しかしながら、ブラックボックスのシナリオでゴーストオブジェクトを生成することは、これらの資格のないオブジェクトに関する情報が不透明であるため、依然として課題である。本研究では, 敵の事例にゴーストオブジェクトを生成できる可能性について, 「現在, 復号化」という概念を拡張して示す。これらの敵対的な例は、一度生成されると、AIサービスの潜在的な脆弱性を悪用するために使用され、重大なセキュリティ上の懸念を引き起こす。実験結果から,提案した攻撃は,対象モデルに関する事前知識を必要とせずに,様々な一般的なモデルとGoogle Vision APIをまたいだ攻撃を成功させることが示された。さらに、攻撃の平均コストは1ドル以下で、AIセキュリティに重大な脅威をもたらす。

Latency attacks against object detection represent a variant of adversarial attacks that aim to inflate the inference time by generating additional ghost objects in a target image. However, generating ghost objects in the black-box scenario remains a challenge since information about these unqualified objects remains opaque. In this study, we demonstrate the feasibility of generating ghost objects in adversarial examples by extending the concept of "steal now, decrypt later" attacks. These adversarial examples, once produced, can be employed to exploit potential vulnerabilities in the AI service, giving rise to significant security concerns. The experimental results demonstrate that the proposed attack succeeds across various commonly used models and the Google Vision API without any prior knowledge about the target model. Additionally, the average cost of each attack is less than \$1, posing a significant threat to AI security.

翻訳日:2024-04-26 19:01:10 公開日:2024-04-24

# スプリット・インデックス行列積状態における非オンサイト対称性と量子テレポーテーション

Non-onsite symmetries and quantum teleportation in split-index matrix product states ( http://arxiv.org/abs/2404.15883v1 )

ライセンス: Link先を確認

David T. Stephen,

(参考訳) 我々は、新しい物理的および計算的性質を持つスピン鎖のクラスを記述する。物理的側面において、スピン鎖は非オンサイト対称性によって定義される対称性で保護された位相位相の例を与える。これらの位相は弦順パラメータによって検出できるが、特に絡み合いスペクトルの縮退は示さない。計算側では、スピン鎖は、必要な古典的側処理が測定結果の非線形関数であるという新しい性質により、長距離にわたって決定論的に情報をテレポートするために使用できる新しい種類の状態を表す。また、測定に基づく量子計算の普遍的な資源として機能しうる状態の例を示し、絡み合いスペクトルの縮退を伴わずにそのような資源の最初の例を提供する。我々の分析における重要なツールは、スプリットインデックス行列積状態(SIMPS)と呼ばれる新しいテンソルネットワーク表現である。我々はSIMPSの基本形式を開発し、それらを行列積状態と比較し、特定の非オンサイト対称性や量子テレポーテーションを記述するための装置がいかに優れているかを示し、それらが制約されたスピン鎖を記述するのにどのように適しているかを議論する。

We describe a class of spin chains with new physical and computational properties. On the physical side, the spin chains give examples of symmetry-protected topological phases that are defined by non-onsite symmetries, i.e. symmetries that are not a tensor product of single-site operators. These phases can be detected by string-order parameters, but notably do not exhibit entanglement spectrum degeneracy. On the computational side, the spin chains represent a new class of states that can be used to deterministically teleport information across long distances, with the novel property that the necessary classical side processing is a non-linear function of the measurement outcomes. We also give examples of states that can serve as universal resources for measurement-based quantum computation, providing the first examples of such resources without entanglement spectrum degeneracy. The key tool in our analysis is a new kind of tensor network representation which we call split-index matrix product states (SIMPS). We develop the basic formalism of SIMPS, compare them to matrix product states, show how they are better equipped to describe certain kinds of non-onsite symmetries and quantum teleportation, and discuss how they are also well-suited to describing constrained spin chains.

翻訳日:2024-04-26 19:01:10 公開日:2024-04-24

# 地域エネルギー市場におけるプライバシ保護請求(Long Version)

Privacy-Preserving Billing for Local Energy Markets (Long Version) ( http://arxiv.org/abs/2404.15886v1 )

ライセンス: Link先を確認

Eman Alqahtani, Mustafa A. Mustafa,

(参考訳) 本稿では,地域エネルギー市場(PBP-LEMs)に対するプライバシ保護請求プロトコルを提案する。 PBP-LEMは、市場団体が参加者の請求書を、正当性を犠牲にすることなく、分散的かつプライバシー保護的な方法で共同で計算することを可能にする。また、内部共謀の可能性から生じる個人のプライバシーに対するリスクを軽減している。まず,ビルディングブロックとして機能する情報理論のセキュリティを実現する,新しい,効率的で,プライバシ保護の個別請求方式を提案する。 PBP-LEMは、マルチパーティ計算、Pedersenのコミットメント、内部製品機能暗号化といった他の手法とともに、データの機密性と正確性を保証するためにこの方式を利用している。さらに、我々は3つのアプローチを提案し、結果としてプライバシーとパフォーマンスのレベルが異なる。このプロトコルがセキュリティとプライバシの要件を満たしていることを証明し、実際のLEMへのデプロイを可能にする。また、本分析では、全体的な性能の変動も示し、適用されたアプローチに基づいてオーバーヘッドが集中している領域を特定する。

We propose a privacy-preserving billing protocol for local energy markets (PBP-LEMs) that takes into account market participants' energy volume deviations from their bids. PBP-LEMs enables a group of market entities to jointly compute participants' bills in a decentralized and privacy-preserving manner without sacrificing correctness. It also mitigates risks on individuals' privacy arising from any potential internal collusion. We first propose a novel, efficient, and privacy-preserving individual billing scheme, achieving information-theoretic security, which serves as a building block. PBP-LEMs utilizes this scheme, along with other techniques such as multiparty computation, Pedersen commitments and inner product functional encryption, to ensure data confidentiality and accuracy. Additionally, we present three approaches, resulting in different levels of privacy and performance. We prove that the protocol meets its security and privacy requirements and is feasible for deployment in real LEMs. Our analysis also shows variations in overall performance and identifies areas where overhead is concentrated based on the applied approach.

翻訳日:2024-04-26 19:01:10 公開日:2024-04-24
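
The abstract above names Pedersen commitments as one of the building blocks. The sketch below only illustrates the additive homomorphism that makes such commitments useful for checking an aggregate without revealing its parts; it is not the PBP-LEMs protocol. The tiny group parameters, the generators, and the example bill values are all assumptions chosen for readability and offer no real security.

```python
import secrets

# Toy group parameters (far too small for real use): P = 2*Q + 1 with P, Q prime.
P = 2039
Q = 1019
G = 4  # 2**2 mod P, generates the order-Q subgroup
H = 9  # 3**2 mod P; in practice log_G(H) must be unknown to everyone

def commit(value, randomness):
    """Pedersen commitment C = G^value * H^randomness mod P."""
    return (pow(G, value % Q, P) * pow(H, randomness % Q, P)) % P

def verify(commitment, value, randomness):
    """Check that (value, randomness) opens the given commitment."""
    return commitment == commit(value, randomness)

def combine(commitments):
    """Multiply commitments; the product commits to the sum of the values."""
    product = 1
    for c in commitments:
        product = (product * c) % P
    return product

# Hypothetical per-participant bills (in cents) with fresh blinding factors.
bills = [120, 75, 310]
blinds = [secrets.randbelow(Q) for _ in bills]
commitments = [commit(b, r) for b, r in zip(bills, blinds)]

# Anyone holding the commitments can check the claimed total against them.
total_commitment = combine(commitments)
total_ok = verify(total_commitment, sum(bills), sum(blinds))
```

A verifier who sees only the commitments and the opened total learns that the total is consistent with the committed bills, while the individual bills stay hidden behind their blinding factors.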

# Sketch2Human: 絡み合った幾何学と外観制御を備えた深部人材育成

Sketch2Human: Deep Human Generation with Disentangled Geometry and Appearance Control ( http://arxiv.org/abs/2404.15889v1 )

ライセンス: Link先を確認

Linzi Qu, Jiaxiang Shang, Hui Ye, Xiaoguang Han, Hongbo Fu,

(参考訳) 幾何学的および外観制御されたフルボディ画像生成は興味深いが難しい課題である。既存のソリューションは、無条件または粗い条件(例えば、ポーズ、テキスト)に依存しているため、明示的な幾何学と体と衣服の外観制御が欠如している。スケッチはそのような編集機能を提供し、様々なスケッチベースの顔生成および編集ソリューションで採用されている。しかしながら、スケッチベースの顔生成をフルボディ生成に直接適応させると、ポーズ、体型、衣服の形、テクスチャの複雑さと多様性のために、高忠実で多様な結果が得られないことが多い。最近の幾何学的に制御可能な拡散法は主に外見を生成するプロンプトに依存しており、入力が粗い場合の現実主義と結果の忠実さのバランスをとることは困難である。本研究はSketch2Humanについて述べる。Sketch2Humanは、セマンティックスケッチ(幾何学制御用)と参照イメージ(外観制御用)でガイドされる、フルボディの人体画像生成を制御可能な最初のシステムである。我々の解は、逆幾何と出現潜時符号を入力とするStyleGAN-Humanの潜時空間に基づいている。具体的には,StyleGAN-Humanの潜伏空間からサンプル化した大規模な合成データセットを用いて訓練されたスケッチエンコーダについて述べる。そこで我々は,StyleGAN-Humanにおける部分幾何学とテクスチャの絡み合った情報と,非絡み合ったデータセットが存在しないことを考慮し,ジオメトリー保存および外観変換によるトレーニングデータを作成し,非絡み合った幾何学と外観制御を実現するための新しいトレーニングスキームを設計する。本手法は合成データを用いて訓練されているが,手描きスケッチも扱える。定性的および定量的評価は,本手法の最先端手法に対する優れた性能を示すものである。

Geometry- and appearance-controlled full-body human image generation is an interesting but challenging task. Existing solutions are either unconditional or dependent on coarse conditions (e.g., pose, text), thus lacking explicit geometry and appearance control of body and garment. Sketching offers such editing ability and has been adopted in various sketch-based face generation and editing solutions. However, directly adapting sketch-based face generation to full-body generation often fails to produce high-fidelity and diverse results due to the high complexity and diversity in the pose, body shape, and garment shape and texture. Recent geometrically controllable diffusion-based methods mainly rely on prompts to generate appearance and it is hard to balance the realism and the faithfulness of their results to the sketch when the input is coarse. This work presents Sketch2Human, the first system for controllable full-body human image generation guided by a semantic sketch (for geometry control) and a reference image (for appearance control). Our solution is based on the latent space of StyleGAN-Human with inverted geometry and appearance latent codes as input. Specifically, we present a sketch encoder trained with a large synthetic dataset sampled from StyleGAN-Human's latent space and directly supervised by sketches rather than real images. Considering the entangled information of partial geometry and texture in StyleGAN-Human and the absence of disentangled datasets, we design a novel training scheme that creates geometry-preserved and appearance-transferred training data to tune a generator to achieve disentangled geometry and appearance control. Although our method is trained with synthetic data, it can handle hand-drawn sketches as well. Qualitative and quantitative evaluations demonstrate the superior performance of our method to state-of-the-art methods.

翻訳日:2024-04-26 19:01:10 公開日:2024-04-24

# 臨床QAにおける中規模言語モデルの可能性の評価

Assessing The Potential Of Mid-Sized Language Models For Clinical QA ( http://arxiv.org/abs/2404.15894v1 )

ライセンス: Link先を確認

Elliot Bolton, Betty Xiong, Vijaytha Muralidharan, Joel Schamroth, Vivek Muralidharan, Christopher D. Manning, Roxana Daneshjou,

(参考訳) GPT-4 や Med-PaLM のような大規模言語モデルは、臨床上のタスクにおいて顕著なパフォーマンスを示しているが、それらは計算へのアクセスを必要とし、クローズソースであり、デバイスにデプロイすることができない。 BioGPT-large、BioMedLM、LLaMA 2、Mistral 7Bのような中型モデルはこれらの欠点を回避しているが、臨床業務の能力は検討されている。臨床利用の可能性を評価し,どのモデルを使うべきかを研究者が決定するのを助けるために,臨床質問応答(QA)の2つのタスク,MedQAとコンシューマクエリ応答を比較した。 Mistral 7Bは、すべてのベンチマークで優勝し、バイオメディカルドメイン向けに訓練されたモデルよりも優れています。 Mistral 7B の MedQA スコアは 63.0% で、オリジナルの Med-PaLM に近づき、コンシューマー向けヘルスクエリに対するもっともらしい応答を生成することができるが、改善の余地はまだ残っている。本研究は,臨床業務におけるオープンソース中規模モデルの初回評価を行う。

Large language models, such as GPT-4 and Med-PaLM, have shown impressive performance on clinical tasks; however, they require access to compute, are closed-source, and cannot be deployed on device. Mid-size models such as BioGPT-large, BioMedLM, LLaMA 2, and Mistral 7B avoid these drawbacks, but their capacity for clinical tasks has been understudied. To help assess their potential for clinical use and help researchers decide which model they should use, we compare their performance on two clinical question-answering (QA) tasks: MedQA and consumer query answering. We find that Mistral 7B is the best performing model, winning on all benchmarks and outperforming models trained specifically for the biomedical domain. While Mistral 7B's MedQA score of 63.0% approaches the original Med-PaLM, and it often can produce plausible responses to consumer health queries, room for improvement still exists. This study provides the first head-to-head assessment of open source mid-sized models on clinical tasks.

翻訳日:2024-04-26 19:01:10 公開日:2024-04-24

# 暗号通貨規制のグローバルトレンド:概観

Global Trends in Cryptocurrency Regulation: An Overview ( http://arxiv.org/abs/2404.15895v1 )

ライセンス: Link先を確認

Xihan Xiong, Junliang Luo,

(参考訳) 暗号通貨は重要な資産クラスへと発展し、様々な利点を提供している。しかし、市場ボラティリティや違法行為における誤用の可能性など、重大なリスクも生じている。これらのリスクは、消費者の保護、市場の整合性、金融安定を確保するための包括的な規制枠組みの緊急の必要性を浮き彫りにしている。しかし、暗号通貨規制の世界的な状況は依然として複雑であり、各国の規制枠組みが大幅に変化していることが特徴である。本研究の目的は,様々な管轄区域の規制環境を調査することで,これらの違いを解明することである。まず、規制の課題と考察を議論し、その後、国際規制のスタンス、アプローチ、措置の比較分析を行う。我々の研究は、暗号通貨規制におけるグローバルなトレンドの理解を高めるための実践的な洞察を提供してくれることを願っている。

Cryptocurrencies have evolved into an important asset class, providing a variety of benefits. However, they also present significant risks, such as market volatility and the potential for misuse in illegal activities. These risks underline the urgent need for a comprehensive regulatory framework to ensure consumer protection, market integrity, and financial stability. Yet, the global landscape of cryptocurrency regulation remains complex, marked by substantial variations in regulatory frameworks among different countries. This paper aims to study these differences by investigating the regulatory landscapes across various jurisdictions. We first discuss regulatory challenges and considerations, and then conduct a comparative analysis of international regulatory stances, approaches, and measures. We hope our study offers practical insights to enhance the understanding of global trends in cryptocurrency regulation.

翻訳日:2024-04-26 19:01:10 公開日:2024-04-24

# パラメトリック近似を超えた駆動散逸ダウンコンバージョンシステムにおける量子メロロジー

Quantum metrology in a driven-dissipation down-conversion system beyond the parametric approximation ( http://arxiv.org/abs/2404.15898v1 )

ライセンス: Link先を確認

Dong Xie, Chunling Xu,

(参考訳) ポンプモードと2つの縮退信号モードからなる縮退型ダウンコンバージョンシステムにおける量子メロロジーについて検討する。従来のパラメトリック近似では、ポンプモードは量子演算子ではなく定数であると仮定される。パラメトリック近似を超える2つの退化信号モードとポンプモードの結合強度の測定精度を得る。散逸がなければ、初期状態が古典状態と量子状態の直積であるときに超ハイゼンベルク極限が得られる。これは、準備が簡単でない絡み合いリソースの使用を必要としない。ポンプモードが単光子散逸に苦しむ場合、結合強度がコヒーレント駆動により0に近づくにつれて、結合強度の測定不確かさは0に近くなる。直接光子検出は最適な測定方法であることが証明された。この結果は、信号モードが2光子散逸に苦しむときに変わっていない。信号モードも単一モードの消散に苦しむ場合、結合強度に関する情報は定常状態で得られる。さらに、結合強度の測定の不確実性も0に近づき、通常の放射相と超放射相の臨界点としてノイズ温度に依存する。最後に、運転強度を測定するための正確な量子センサとして、駆動散逸ダウンコンバージョンシステムを用いることができることを示す。

We investigate quantum metrology in a degenerate down-conversion system composed of a pump mode and two degenerate signal modes. In the conventional parametric approximation, the pump mode is assumed to be a constant rather than a quantum operator. We obtain the measurement precision of the coupling strength between the pump mode and the two degenerate signal modes beyond the parametric approximation. Without dissipation, the super-Heisenberg limit can be obtained when the initial state is the direct product of a classical state and a quantum state. This does not require entanglement resources, which are not easy to prepare. When the pump mode suffers from single-photon dissipation, the measurement uncertainty of the coupling strength is close to 0 as the coupling strength approaches 0 with a coherent driving. Direct photon detection is proved to be the optimal measurement. This result does not change when the signal modes suffer from two-photon dissipation. When the signal modes also suffer from single-mode dissipation, the information of the coupling strength can still be obtained in the steady state. In addition, the measurement uncertainty of the coupling strength can also be close to 0 and become independent of noise temperature as the critical point between the normal and superradiant phases is approached. Finally, we show that a driven-dissipation down-conversion system can be used as a precise quantum sensor to measure the driving strength.

翻訳日:2024-04-26 19:01:10 公開日:2024-04-24

# ST-MambaSync: 降雨量予測のためのマンバ構造と時空間変圧器の対応

ST-MambaSync: The Confluence of Mamba Structure and Spatio-Temporal Transformers for Precipitous Traffic Prediction ( http://arxiv.org/abs/2404.15899v1 )

ライセンス: Link先を確認

Zhiqi Shao, Xusheng Yao, Ze Wang, Junbin Gao,

(参考訳) 計算効率と精度のバランスをとることは、特に時空間データセットのような高次元データを扱う場合、機械学習において最重要である。本研究はST-MambaSyncについて紹介する。ST-MambaSyncは、合理化された注意層と単純化された状態空間層を統合する革新的なフレームワークである。このモデルは時空間予測タスクにおける競合精度を実現する。我々は、注意機構とマンバ成分の関係を掘り下げ、マンバ関数が残留ネットワーク構造内の注意に類似していることを明らかにする。この比較分析により、状態空間モデルの効率が向上し、計算コストの削減による優れた性能を実現する能力が解明される。

Balancing accuracy with computational efficiency is paramount in machine learning, particularly when dealing with high-dimensional data, such as spatial-temporal datasets. This study introduces ST-MambaSync, an innovative framework that integrates a streamlined attention layer with a simplified state-space layer. The model achieves competitive accuracy in spatial-temporal prediction tasks. We delve into the relationship between attention mechanisms and the Mamba component, revealing that Mamba functions akin to attention within a residual network structure. This comparative analysis underpins the efficiency of state-space models, elucidating their capability to deliver superior performance at reduced computational costs.

翻訳日:2024-04-26 19:01:10 公開日:2024-04-24

# 線引き:古代エトルリアの鏡からアートを抽出する深層画

Drawing the Line: Deep Segmentation for Extracting Art from Ancient Etruscan Mirrors ( http://arxiv.org/abs/2404.15903v1 )

ライセンス: Link先を確認

Rafael Sterzinger, Simon Brenner, Robert Sablatnig,

(参考訳) エトルリアの鏡はエトルリアの芸術において重要なカテゴリーであり、それゆえ、古代についての洞察を得るために体系的な調査が行われている。彼らの分析の重要な側面は、裏面から手動で彫刻をトレースする労働集約的な作業である。さらに、このタスクは、これらのミラーが持続する損傷のために本質的に困難であり、プロセスに主観性を導入する。これらの課題に対処するためには,手元にある制限データの有効利用を必要とするディープセグメンテーションネットワークと連携して,測光ステレオスキャンによるプロセスの自動化を行う。我々は、パッチ単位の予測と様々なデータ拡張、および自己教師型学習を取り入れることで、これを実現する。ベースラインと比較して,擬似F-Measureの予測性能を約16%向上させる。ヒトのベースラインに対して完全なミラーの性能を評価する際に,人間のアノテータと定量的に類似した性能を示し,既存のバイナライゼーション法を著しく上回る性能を示した。提案手法では,アノテーションのプロセスの合理化,客観性の向上,作業負荷の削減を図り,これらの歴史的遺物や非伝統的文書の検証に貴重な貢献をする。

Etruscan mirrors constitute a significant category within Etruscan art and, therefore, undergo systematic examinations to obtain insights into ancient times. A crucial aspect of their analysis involves the labor-intensive task of manually tracing engravings from the backside. Additionally, this task is inherently challenging due to the damage these mirrors have sustained, introducing subjectivity into the process. We address these challenges by automating the process through photometric-stereo scanning in conjunction with deep segmentation networks, which, however, require effective usage of the limited data at hand. We accomplish this by incorporating predictions on a per-patch level and various data augmentations, as well as by exploring self-supervised learning. Compared to our baseline, we improve predictive performance w.r.t. the pseudo-F-Measure by around 16%. When assessing performance on complete mirrors against a human baseline, our approach yields quantitatively similar performance to a human annotator and significantly outperforms existing binarization methods. With our proposed methodology, we streamline the annotation process, enhance its objectivity, and reduce overall workload, offering a valuable contribution to the examination of these historical artifacts and other non-traditional documents.

翻訳日:2024-04-26 19:01:10 公開日:2024-04-24
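
The per-patch prediction mentioned in the abstract above can be pictured with the following sketch, which splits an image into patches, runs a predictor on each patch, and stitches the outputs back into a full mask. The thresholding `predict_patch` is a placeholder standing in for a trained segmentation network, and the patch size and function names are assumptions; none of this is the authors' implementation.

```python
def split_into_patches(image, patch_size):
    """Split a 2D list into non-overlapping square patches, keyed by top-left corner."""
    height, width = len(image), len(image[0])
    patches = {}
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches[(top, left)] = [
                row[left:left + patch_size] for row in image[top:top + patch_size]
            ]
    return patches

def predict_patch(patch, threshold=0.5):
    """Placeholder for a segmentation network: simple per-pixel thresholding."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in patch]

def segment_by_patches(image, patch_size):
    """Run the per-patch predictor and stitch the results into a full mask."""
    height, width = len(image), len(image[0])
    mask = [[0] * width for _ in range(height)]
    for (top, left), patch in split_into_patches(image, patch_size).items():
        prediction = predict_patch(patch)
        for dy, row in enumerate(prediction):
            for dx, value in enumerate(row):
                mask[top + dy][left + dx] = value
    return mask
```

Working on patches lets a small number of large scans yield many training samples, which is one common motivation when annotated data are limited.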

# 周波数可変フォック状態生成のための量子光のハイブリッド光源

A hybrid source of quantum light for generation of frequency tunable Fock states ( http://arxiv.org/abs/2404.15908v1 )

ライセンス: Link先を確認

Aleksa Krstić, Priyanshu Tiwari, Florian Höhe, Frank Setzpfandt, Ulf Peschel, Joachim Ankerhold, Sina Saravi,

(参考訳) 非線形共振器における量子光発生方式を2レベル系にハイブリダイドして提案する。理論的には、一連の制御されたポンプパルスに励起されると、ハイブリッド源は1-および2-光子状態のほぼオンデマンド生成や最大7光子を持つフォック状態の生成確率50%以上といった高い確率でフォック状態を生成することができる。さらに重要なことは、非線形キャビティとポンプの調整可能な性質により、固定された2レベルのシステムであっても任意の周波数でフォック状態を生成することができ、量子技術のあらゆる分野で根本的に新しい機会が生まれることである。

We propose a scheme for quantum-light generation in a nonlinear cavity hybridized with a 2-level system. We theoretically show that, when excited by a series of controlled pump pulses, the hybrid source can generate various Fock states with high probabilities, such as near-on-demand generation of 1- and 2-photon states, and above 50% probability for generation of Fock states with up to 7 photons. More importantly, the tailorable nature of the nonlinear cavity and its pumping allows for generating Fock states with arbitrary frequencies, even with a fixed 2-level system, creating fundamentally new opportunities in all areas of quantum technologies.

翻訳日:2024-04-26 19:01:10 公開日:2024-04-24

# 生成的事前学習による長手ビデオの事前学習

Learning Long-form Video Prior via Generative Pre-Training ( http://arxiv.org/abs/2404.15909v1 )

ライセンス: Link先を確認

Jinheng Xie, Jiajun Feng, Zhaoxu Tian, Kevin Qinghong Lin, Yawen Huang, Xi Xia, Nanxu Gong, Xu Zuo, Jiaqi Yang, Yefeng Zheng, Mike Zheng Shou,

(参考訳) 人、オブジェクト、そしてそれらの相互作用のような長いビデオにかかわる概念は、暗黙の事前に従うものとして見ることができる。それらは特に複雑で、包括的に学ぶための課題を提起し続けています。近年、生成事前学習(GPT)は、視覚的位置さえも、どんな種類のテキストコンテンツでもモデリングできる多彩な能力を示している。この方法は、長めのビデオの学習に役立ちますか? ピクセル空間を操作する代わりに、バウンディングボックスやキーポイントのような視覚的な場所をビデオのキー情報として使うのが効果的である。適切なデータが不足しているため、映画から Storyboard20K と呼ばれる新しいデータセットを作成し、代表として機能させる。シナプス、ショット・バイ・ショットのキーフレーム、一貫したID、バウンディングボックス、ボディキーポイントを含むフィルムセットと文字の細かいアノテーションが含まれる。このようにして、ロングフォームビデオはトークンのセットで表現することができ、生成前のトレーニングを通じて学習することができる。実験結果から,本手法は以前から長編ビデオの学習に有用であることが確認された。コードとデータは https://github.com/showlab/Long-form-Video-Prior でリリースされる。

Concepts involved in long-form videos, such as people, objects, and their interactions, can be viewed as following an implicit prior. They are notably complex and remain challenging to learn comprehensively. In recent years, generative pre-training (GPT) has exhibited versatile capacities in modeling any kind of text content, even visual locations. Can this approach work for learning a long-form video prior? Instead of operating in pixel space, it is efficient to employ visual locations like bounding boxes and keypoints to represent key information in videos, which can be simply discretized and then tokenized for consumption by GPT. Due to the scarcity of suitable data, we create a new dataset called Storyboard20K from movies to serve as a representative. It includes synopses, shot-by-shot keyframes, and fine-grained annotations of film sets and characters with consistent IDs, bounding boxes, and whole body keypoints. In this way, long-form videos can be represented by a set of tokens and be learned via generative pre-training. Experimental results validate that our approach has great potential for learning a long-form video prior. Code and data will be released at https://github.com/showlab/Long-form-Video-Prior.

翻訳日:2024-04-26 18:51:25 公開日:2024-04-24
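
The abstract above states that visual locations such as bounding boxes can be discretized and then tokenized. One simple way to do that is sketched below: each normalized coordinate is quantized into a fixed number of bins and written as a location token. The bin count of 100, the `<loc_k>` token format, and both function names are assumptions for illustration and may differ from what the authors use.

```python
def box_to_tokens(box, image_width, image_height, num_bins=100):
    """Quantize a pixel-space box (x_min, y_min, x_max, y_max) into location tokens."""
    x_min, y_min, x_max, y_max = box
    normalized = [
        x_min / image_width,
        y_min / image_height,
        x_max / image_width,
        y_max / image_height,
    ]
    tokens = []
    for value in normalized:
        clipped = min(max(value, 0.0), 1.0)
        # Map [0, 1] onto integer bins 0 .. num_bins - 1.
        bin_index = min(int(clipped * num_bins), num_bins - 1)
        tokens.append(f"<loc_{bin_index}>")
    return tokens

def tokens_to_box(tokens, image_width, image_height, num_bins=100):
    """Invert box_to_tokens up to quantization error, using bin centers."""
    indices = [int(t[len("<loc_"):-1]) for t in tokens]
    centers = [(i + 0.5) / num_bins for i in indices]
    return (
        centers[0] * image_width,
        centers[1] * image_height,
        centers[2] * image_width,
        centers[3] * image_height,
    )
```

Once boxes and keypoints are expressed as tokens like these, a sequence of shots becomes an ordinary token sequence that a language-model-style objective can be trained on.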

# スピン浴に囲まれた中心スピンの正確な力学を用いた強結合非マルコフ量子熱力学と正準ハミルトニアンの役割

Strong coupling non-Markovian quantum thermodynamics using exact dynamics of a central spin surrounded by a spin bath and the role of a canonical Hamiltonian ( http://arxiv.org/abs/2404.15915v1 )

ライセンス: Link先を確認

Devvrat Tiwari, Baibhab Bose, Subhashish Banerjee,

(参考訳) 焦点は強結合非マルコフ量子系の量子熱力学を理解することである。この目的のために、スピン浴に囲まれた中心スピンの非自明な非マルコフ模型を取り上げ、その正確な進化は任意の系-バス結合に導かれる。システムや浴槽の内部エネルギー、仕事、熱、エントロピー生成、エルゴトロピーといった基本的な量子力学量は、力学と元の系(バス)ハミルトン方程式を用いて計算される。作業の明示的な表現として、システムと浴槽内エネルギーのミスマッチが導出される。さらに、中心スピンを量子電池として想定するシナリオにおいて、チャージャーとして作用するスピン浴に関する興味深い観察を行う。上記の熱力学量の計算における標準ハミルトニアンの役割についても検討した。

The focus is on understanding the quantum thermodynamics of strongly coupled non-Markovian quantum systems. To this end, a non-trivial, non-Markovian model of a central spin surrounded by a spin bath is taken up, and its exact evolution is derived for arbitrary system-bath couplings. The fundamental quantum thermodynamic quantities, such as system and bath internal energies, work, heat, entropy production, and ergotropy, are calculated using the dynamics and original system (bath) Hamiltonian. An explicit expression for the work, a mismatch between the system and bath internal energies, is derived. Further, an interesting observation relevant to the spin bath acting as a charger is made in a scenario where the central spin is envisaged as a quantum battery. The role of a canonical Hamiltonian in calculating the above thermodynamic quantities, a recently developed technique, is also investigated.

翻訳日:2024-04-26 18:51:25 公開日:2024-04-24

# 畳み込みニューラルネットワーク, ResNet と Grad-CAM を用いた黄斑変性の知覚と局在

Perception and Localization of Macular Degeneration Applying Convolutional Neural Network, ResNet and Grad-CAM ( http://arxiv.org/abs/2404.15918v1 )

ライセンス: Link先を確認

Tahmim Hossain, Sagor Chandro Bakchy,

(参考訳) 罹患した患者に視界のぼやけを引き起こす網膜疾患として有名なのが黄斑変性症(Macular Degeneration)である。本研究は, 健康な眼底と黄斑変性の眼底を分類し, 眼底の患部を特定することに基づく。分類には、CNNアーキテクチャと、ResNetアーキテクチャ(ResNet50、ResNet50v2、ResNet101、ResNet101v2、ResNet152、ResNet152v2)をバックボーンとするCNNが使用される。データは3通りに分割される。 (a)トレーニングセットは90%、テストセットは10% (b)トレーニングセットは80%、テストセットは20% (c)トレーニングセットは50%、テストセットは50%である。トレーニングの後、評価指標から最良のモデルが選択された。モデルの中で、ResNet50をバックボーンとするCNNが最も性能が良く、トレーニング90%・テスト10%のデータ分割で98.7%のトレーニング精度を示した。このモデルを用いて,眼底の患部を把握するためにGrad-CAMビジュアライゼーションを行った。

Macular degeneration is a well-known retinal disease that causes blurred vision in affected patients. This research classifies healthy and macular-degeneration fundus images and localizes the affected region of the fundus. A plain CNN architecture and CNNs with a ResNet backbone (ResNet50, ResNet50v2, ResNet101, ResNet101v2, ResNet152, ResNet152v2) are used to classify the two types of fundus. The data are split in three ways: (a) 90% training and 10% testing, (b) 80% training and 20% testing, and (c) 50% training and 50% testing. After training, the best model is selected based on the evaluation metrics. Among the models, the CNN with a ResNet50 backbone performs best, reaching a training accuracy of 98.7% on the 90%/10% train/test split. With this model, we perform Grad-CAM visualization to locate the affected region of the fundus.

翻訳日:2024-04-26 18:51:25 公開日:2024-04-24

# フェデレーション学習のための要素重量集約法

An Element-Wise Weights Aggregation Method for Federated Learning ( http://arxiv.org/abs/2404.15919v1 )

ライセンス: Link先を確認

Yi Hu, Hanchi Ren, Chen Hu, Jingjing Deng, Xianghua Xie,

(参考訳) フェデレートラーニング(FL)は強力な機械学習(ML)パラダイムであり、分散クライアントが元のデバイスにデータを保存しながら共有グローバルモデルを共同で学習し、プライバシを保存することができる。 FLにおける中心的な課題は、異なる、潜在的にバランスの取れていないクライアントからの局所的なモデルウェイトを効果的に集約することである。既存のメソッドはしばしば各クライアントを無差別に扱い、ローカルモデル全体に対して単一の比率を適用する。しかし、それぞれの重量が特定の割合に割り当てられるのは経験的に有利である。本稿では,学習性能の最適化と収束速度の高速化を目的とした,新しい要素量集約法(EWWA-FL)を提案する。従来のFLアプローチとは異なり、EWWA-FLは各要素のレベルでグローバルモデルに局所的な重みを集約し、各クライアントが学習プロセスに要素的に貢献できるようにする。各クライアントのユニークなデータセット特性を考慮して、EWWA-FLはグローバルモデルのロバスト性を異なるデータセットに拡張するとともに、迅速な収束を実現している。この方法は様々な重み付け戦略を採用するのに十分な柔軟性がある。総合的な実験を通じて,EWWA-FLの高度な性能を実証し,様々なバックボーンとベンチマークの精度と収束速度の両面で有意な改善を示した。

Federated learning (FL) is a powerful Machine Learning (ML) paradigm that enables distributed clients to collaboratively learn a shared global model while keeping the data on the original device, thereby preserving privacy. A central challenge in FL is the effective aggregation of local model weights from disparate and potentially unbalanced participating clients. Existing methods often treat each client indiscriminately, applying a single proportion to the entire local model. However, it is empirically advantageous for each weight to be assigned a specific proportion. This paper introduces an innovative Element-Wise Weights Aggregation Method for Federated Learning (EWWA-FL) aimed at optimizing learning performance and accelerating convergence speed. Unlike traditional FL approaches, EWWA-FL aggregates local weights to the global model at the level of individual elements, thereby allowing each participating client to make element-wise contributions to the learning process. By taking into account the unique dataset characteristics of each client, EWWA-FL enhances the robustness of the global model to different datasets while also achieving rapid convergence. The method is flexible enough to employ various weighting strategies. Through comprehensive experiments, we demonstrate the advanced capabilities of EWWA-FL, showing significant improvements in both accuracy and convergence speed across a range of backbones and benchmarks.

翻訳日:2024-04-26 18:51:25 公開日:2024-04-24
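
The contrast drawn in the EWWA-FL abstract above, between one proportion per client and one proportion per weight element, can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say how the element-wise proportions are computed, so `client_scores` below is a hypothetical, non-negative per-element importance score supplied by the caller, and the function names are invented for this sketch.

```python
from typing import List

Vector = List[float]


def fedavg(client_weights: List[Vector], client_sizes: List[int]) -> Vector:
    """Baseline: one scalar proportion per client, shared by every element."""
    total = sum(client_sizes)
    proportions = [n / total for n in client_sizes]
    return [
        sum(p * w[i] for p, w in zip(proportions, client_weights))
        for i in range(len(client_weights[0]))
    ]


def elementwise_aggregate(client_weights: List[Vector],
                          client_scores: List[Vector]) -> Vector:
    """Element-wise variant: every element i gets its own per-client proportions.

    client_scores[k][i] is a hypothetical non-negative importance score for
    element i of client k; how such scores are obtained is left open here.
    """
    n_elems = len(client_weights[0])
    aggregated = []
    for i in range(n_elems):
        column = [scores[i] for scores in client_scores]
        norm = sum(column)
        if norm == 0:
            # No client has a positive score for this element: use a plain mean.
            props = [1.0 / len(column)] * len(column)
        else:
            props = [s / norm for s in column]
        aggregated.append(sum(p * w[i] for p, w in zip(props, client_weights)))
    return aggregated
```

With two clients holding weights `[1.0, 10.0]` and `[3.0, 20.0]`, `fedavg` mixes both elements with the same client proportions, whereas `elementwise_aggregate` can, for example, average the first element evenly while taking the second element entirely from the second client.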

# ショートカットにおける速度とコストの最適トレードオフの単一原子検証

Single-Atom Verification of the Optimal Trade-Off Between Speed and Cost in Shortcuts to Adiabaticity ( http://arxiv.org/abs/2404.15922v1 )

ライセンス: Link先を確認

J. -W. Zhang, J. -T. Bu, J. C. Li, Weiquan Meng, W. -Q. Ding, B. Wang, W. -F. Yuan, H. -J. Du, G. -Y. Ding, W. -J. Chen, L. Chen, F. Zhou, Zhenyu Xu, M. Feng,

(参考訳) 断熱へのショートカットのアプローチは、量子情報処理における断熱力学の効果的な実行を可能にする。動的速度と過渡駆動フィールドに関連するコストとの本質的にのトレードオフのため、任意に高速な演算を実行することは現実的ではない。このプロセスにおける速度とエネルギーコストの正確な相互作用を理解するため、理論と実験的に新しいトレードオフを提案し、これは、$s$-パラメータ化された位相空間内で厳密に最適化された境界によって特徴づけられる。我々の実験は、単一超低温の$^{40}$Ca$^{+}$イオンを調和ポテンシャルに閉じ込めて実施する。イオンの量子状態を正確に操作することにより、Landau-Zenerモデル(英語版)を例として実行し、量子速度制限とコストはスペクトルギャップによって制御される。私たちは、提案されたトレードオフが、当初純粋な状態と初期混合状態の両方を含むシナリオにおいて、確かに密接であるのを目撃します。我々の研究は、断熱性に対するショートカットの基本的な制約を理解するのに役立ち、伝統的に見落とされた未利用位相空間の可能性を照らし出す。

The approach of shortcuts to adiabaticity enables the effective execution of adiabatic dynamics in quantum information processing with enhanced speed. Owing to the inherent trade-off between dynamical speed and the cost associated with the transitionless driving field, executing arbitrarily fast operations becomes impractical. To understand the precise interplay between speed and energetic cost in this process, we propose theoretically and verify experimentally a new trade-off, which is characterized by a tightly optimized bound within $s$-parameterized phase spaces. Our experiment is carried out on a single ultracold $^{40}$Ca$^{+}$ ion trapped in a harmonic potential. By precisely manipulating the quantum states of the ion, we execute the Landau-Zener model as an example, where the quantum speed limit as well as the cost are governed by the spectral gap. We observe that our proposed trade-off is indeed tight in scenarios involving both initially pure and initially mixed states. Our work helps in understanding the fundamental constraints in shortcuts to adiabaticity and illuminates the potential of under-utilized phase spaces that have traditionally been overlooked.

翻訳日:2024-04-26 18:51:25 公開日:2024-04-24

# KGValidator:知識グラフ構築の自動検証フレームワーク

KGValidator: A Framework for Automatic Validation of Knowledge Graph Construction ( http://arxiv.org/abs/2404.15923v1 )

ライセンス: Link先を確認

Jack Boylan, Shashank Mangla, Dominic Thorn, Demian Gholipour Ghalandari, Parsa Ghaffari, Chris Hokamp,

(参考訳) 本研究では,知識グラフ(KG)補完モデルの自動評価にLarge Language Models (LLMs) を用いることを検討した。歴史的に、KGsで情報を検証することは難しい課題であり、大規模な人間のアノテーションを禁止コストで要求してきた。汎用的な生成AIとLLMの出現により、人間のループ検証が生成エージェントに置き換えられる可能性が高まった。生成モデルを用いて知識グラフを検証する場合に,一貫性と検証のためのフレームワークを導入する。我々のフレームワークは、最近のLLM出力の構造的・意味的検証のためのオープンソース開発と、あらゆる種類の外部知識ソースを参照する能力によって支援される事実確認と検証への柔軟なアプローチに基づいている。この設計は適応と拡張が容易であり、モデル固有の知識、ユーザが提供するコンテキスト、外部知識の検索が可能なエージェントを組み合わせることで、どんなグラフ構造化データでも検証することができる。

This study explores the use of Large Language Models (LLMs) for automatic evaluation of knowledge graph (KG) completion models. Historically, validating information in KGs has been a challenging task, requiring large-scale human annotation at prohibitive cost. With the emergence of general-purpose generative AI and LLMs, it is now plausible that human-in-the-loop validation could be replaced by a generative agent. We introduce a framework for consistency and validation when using generative models to validate knowledge graphs. Our framework is based upon recent open-source developments for structural and semantic validation of LLM outputs, and upon flexible approaches to fact checking and verification, supported by the capacity to reference external knowledge sources of any kind. The design is easy to adapt and extend, and can be used to verify any kind of graph-structured data through a combination of model-intrinsic knowledge, user-supplied context, and agents capable of external knowledge retrieval.

翻訳日:2024-04-26 18:51:25 公開日:2024-04-24

# エコー室内:Twitterの誤報の言語的根拠

Inside the echo chamber: Linguistic underpinnings of misinformation on Twitter ( http://arxiv.org/abs/2404.15925v1 )

ライセンス: Link先を確認

Xinyu Wang, Jiayi Li, Sarah Rajtmajer,

(参考訳) ソーシャルメディア利用者は、誤った情報を含む投稿を共有したり、真面目な議論を伴う議論のある話題についてコメントしたりすることで、誤報の拡散をオンラインで推進している。エコーチャンバーの研究は、情報の拡散におけるホモフィリとバイアスによって促進される類似のピアとの繰り返しの相互作用を通じて、ユーザの視点が強化されることを示唆している。社会行動の社会的基盤と言語基盤に対する長年の関心に基づいて、この研究は、誤情報に関する会話が言語利用を通してどのように介在しているかを探求する。会話やユーザコミュニティの話題の中で,言語的尺度,例えば,グループ内/グループ内キュー,可読性,談話接続性などを比較した。誤報の議論において,グループ識別信号の存在が増加し,エコー室内での処理流速が増大することが判明した。本稿では、これらのトピックにわたる傾向の具体的特徴について論じ、文脈的影響について考察する。

Social media users drive the spread of misinformation online by sharing posts that include erroneous information or commenting on controversial topics with unsubstantiated arguments often in earnest. Work on echo chambers has suggested that users' perspectives are reinforced through repeated interactions with like-minded peers, promoted by homophily and bias in information diffusion. Building on long-standing interest in the social bases of language and linguistic underpinnings of social behavior, this work explores how conversations around misinformation are mediated through language use. We compare a number of linguistic measures, e.g., in-/out-group cues, readability, and discourse connectives, within and across topics of conversation and user communities. Our findings reveal increased presence of group identity signals and processing fluency within echo chambers during discussions of misinformation. We discuss the specific character of these broader trends across topics and examine contextual influences.

翻訳日:2024-04-26 18:51:25 公開日:2024-04-24

# ゼロショットクロスリンガル転送の一般化対策

Generalization Measures for Zero-Shot Cross-Lingual Transfer ( http://arxiv.org/abs/2404.15928v1 )

ライセンス: Link先を確認

Saksham Bassi, Duygu Ataman, Kyunghyun Cho,

(参考訳) モデルが未知の入力を異なる特徴で解釈する知識を一般化する能力は、堅牢で信頼性の高い機械学習システムを構築する上で不可欠である。言語モデル評価タスクには、モデル一般化に関する情報メトリクスが欠如しており、新しい設定での適用性は、多くの言語やタスクでしばしば欠落しているタスクと言語固有の下流のパフォーマンスを用いて測定される。本稿では,言語間ゼロショット設定における言語モデルの一般化能力に関する,より効率的な情報計算を支援するための,効率的かつ信頼性の高い尺度のセットについて検討する。学習後のパラメータのばらつきや初期化からの距離といった従来の尺度に加えて、言語間移動の成功を捉えた損失景観のシャープネスの効果も測定し、一般化に相関するモデル最適化のシャープネスを確実に計算する新しい安定アルゴリズムを提案する。

A model's capacity to generalize its knowledge to interpret unseen inputs with different characteristics is crucial to build robust and reliable machine learning systems. Language model evaluation tasks lack information metrics about model generalization and their applicability in a new setting is measured using task and language-specific downstream performance, which is often lacking in many languages and tasks. In this paper, we explore a set of efficient and reliable measures that could aid in computing more information related to the generalization capability of language models in cross-lingual zero-shot settings. In addition to traditional measures such as variance in parameters after training and distance from initialization, we also measure the effectiveness of sharpness in loss landscape in capturing the success in cross-lingual transfer and propose a novel and stable algorithm to reliably compute the sharpness of a model optimum that correlates to generalization.

翻訳日:2024-04-26 18:51:25 公開日:2024-04-24

# エルゴディックから多体局在遷移の複雑度測定による診断

Complexity Measure Diagnostics of Ergodic to Many-Body Localization Transition ( http://arxiv.org/abs/2404.15940v1 )

ライセンス: Link先を確認

Khen Cohen, Yaron Oz, De-liang Zhong,

(参考訳) 三対角化ハミルトニアンのランツォス係数の確率分布関数によって定義される複雑性測定に基づいて,エルゴード相と多体局在相の遷移を新たに診断する。相関強度の関数としてこれらの複雑性尺度を用いて, エルゴードの多体遷移を診断するモーメントとエントロピーと, 初期条件の記憶に関する位相の特徴的な特徴を示す。

We introduce new diagnostics of the transition between the ergodic and many-body localization phases, which are based on complexity measures defined via the probability distribution function of the Lanczos coefficients of the tri-diagonalized Hamiltonian. We use these complexity measures to analyze the power-law random banded matrix model as a function of the correlation strength and show that the moments and the entropy of the distribution diagnose the ergodic to many-body transition, as well as the distinctive feature of the phases concerning the memory of the initial conditions.

翻訳日:2024-04-26 18:51:25 公開日:2024-04-24
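
The Lanczos coefficients mentioned in the abstract above are the diagonal and off-diagonal entries obtained when a Hamiltonian is brought to tridiagonal form. The sketch below shows the textbook Lanczos recursion for a small real symmetric matrix. It is an assumption-laden illustration, not the authors' code: it does no reorthogonalization, it handles only real symmetric input, and it does not build the probability distribution or the complexity measures that the paper derives from these coefficients.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def lanczos_coefficients(h: Matrix, v0: Vector,
                         steps: int) -> Tuple[List[float], List[float]]:
    """Return (a, b): diagonal and off-diagonal entries of the tridiagonal form.

    h must be real and symmetric; v0 is the (non-zero) starting vector.
    """
    n = len(h)

    def matvec(v: Vector) -> Vector:
        return [sum(h[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u: Vector, v: Vector) -> float:
        return sum(x * y for x, y in zip(u, v))

    norm0 = math.sqrt(dot(v0, v0))
    v = [x / norm0 for x in v0]
    v_prev = [0.0] * n
    beta = 0.0
    a: List[float] = []
    b: List[float] = []
    for k in range(steps):
        w = matvec(v)
        alpha = dot(w, v)
        a.append(alpha)
        if k == steps - 1:
            break
        # Remove the components along the current and previous basis vectors.
        w = [wi - alpha * vi - beta * pi for wi, vi, pi in zip(w, v, v_prev)]
        beta = math.sqrt(dot(w, w))
        if beta < 1e-12:
            # The Krylov space is exhausted; no further coefficients exist.
            break
        b.append(beta)
        v_prev, v = v, [wi / beta for wi in w]
    return a, b
```

Calling `lanczos_coefficients(h, v0, steps)` on a small symmetric matrix returns at most `steps` diagonal entries and one fewer off-diagonal entry; for large or ill-conditioned matrices a production implementation would need reorthogonalization.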

# PT対称性相転移の符号としてのSqueezed Displaced Schrödinger-cat状態

Squeezed Displaced Schrödinger-cat state as a signature of the PT-symmetry phase transition ( http://arxiv.org/abs/2404.15942v1 )

ライセンス: Link先を確認

Yuetao Chen, Shoukang Chang, Shaoyan Gao,

(参考訳) パリティ時間(PT)対称系は、非エルミート・ハミルトニアンによって制御され、例外点(EP)で縮退するゲインロス系であり、様々なフォトニック系、電気系、機械系等で研究されている。しかし、電子の輸送特性が劇的に影響される電子系において、PT対称性の相転移をどのように捉えるかは、まだ未解決の問題である。幸いなことに、光子と電子のハイブリッド化は、制御するだけでなく、材料特性を探索する新しい方法を提供する。本稿では,非エルミタンSu-Schrieffer-Heeger(SSH)鎖に結合した空洞を平均場アンザッツで検討する。 Squeezed Displaced Schrodinger cat (SDSc) は, PT対称性が破れ, PT対称性が回復すると, ほぼ1から0に急激な低下を経験する。さらに、半古典的極限において、半古典的光子ハミルトニアン $H_{\rm eff}(x, p)$ において、x=0$ の両側に局所極限が存在し、空洞基底状態における SDSc 状態の出現の明確な記号である。したがって、SDSc状態の出現は、キャビティモードでは修正できないPT対称性の相転移を捉えるのに利用できる。さらに、空洞基底状態を利用して光干渉計の位相を推定し、量子フィッシャー情報と非古典性はEPで急激に低下することを示す。これは、電子材料におけるPT対称性の破れは、位相推定における量子フィッシャー情報と非古典性によっても捉えることができることを示している。

Parity-time (PT) symmetric systems are gain-loss systems whose dynamics are governed by non-Hermitian Hamiltonians with degeneracies at exceptional points (EPs), and they have been studied in various photonic, electrical, and mechanical systems, among others. However, it is still an open question how to capture the PT-symmetry phase transition in electronic systems, where the transport properties of electrons are dramatically affected. Fortunately, the hybridization between photons and electrons offers a novel way not only to control but also to probe material properties. Here, we investigate a cavity coupled to a non-Hermitian Su-Schrieffer-Heeger (SSH) chain within a mean-field ansatz. We find that a Squeezed Displaced Schrödinger-cat (SDSc) state emerges with high fidelity in the cavity ground state when PT symmetry is broken, and that the fidelity drops sharply from almost 1 to 0 as PT symmetry recovers. Additionally, in the semiclassical limit, we find that there exist local extrema on both sides of $x=0$ in the semiclassical photon Hamiltonian $H_{\rm eff}(x, p)$, a clear signature of the emergence of the SDSc state in the cavity ground state. Thus, the appearance of the SDSc state can be used to capture the PT-symmetry phase transition, which cannot be modified by the cavity mode. Besides, we exploit the cavity ground state to estimate the phase in an optical interferometer, and show that the quantum Fisher information and nonclassicality decline sharply at EPs. This reveals that PT-symmetry breaking in electronic materials can also be captured by the quantum Fisher information and nonclassicality in phase estimation.

翻訳日:2024-04-26 18:51:25 公開日:2024-04-24

# Mammo-CLIP:マルチビューマンモグラフィーによる乳がん診断におけるコントラスト言語画像前訓練(CLIP)の活用

Mammo-CLIP: Leveraging Contrastive Language-Image Pre-training (CLIP) for Enhanced Breast Cancer Diagnosis with Multi-view Mammography ( http://arxiv.org/abs/2404.15946v1 )

ライセンス: Link先を確認

Xuxin Chen, Yuheng Li, Mingzhe Hu, Ella Salari, Xiaoqian Chen, Richard L. J. Qiu, Bin Zheng, Xiaofeng Yang,

(参考訳) 乳がん検診の精度を高めるために,マンモグラムの多視点からの情報の融合が重要な役割を担っているが,多視点マンモグラムを用いたコンピュータ支援診断(CAD)手法の開発は依然として課題に直面しており,臨床にはそのようなCAD方式は使われていない。この課題を克服するため, CLIP(Contrastive Language- Image Pre-Training)に基づく新たなアプローチについて検討した。マルチビュー機能融合のためのシングルビューCLIPを効果的に適用し、(2)限られたサンプルと計算資源でこのパラメータ密度モデルを効率的に微調整することで、マルチビューマンモグラムと対応する単純なテキストを処理する最初のマルチモーダルフレームワークであるMammo-CLIPを導入する。 Mammo-CLIPは、早期の機能融合戦略を用いて、左右乳房のCCおよびMLOビューから取得した4つのマンモグラムのマルチビュー関係を学習する。学習効率を向上させるため、CLIPイメージとテキストエンコーダにプラグアンドプレイアダプタを追加して、微調整パラメータを指定し、更新を約1%に制限する。フレームワークの評価には、2つのデータセットを振り返りに組み立てました。悪性470例と良性479例からなる最初のデータセットは、5倍のクロスバリデーションにより提案したMammo-CLIPの微調整および内部評価に使用された。悪性60例,良性294例を含む第2のデータセットを用いて,マンモCLIPの一般化性を検討した。その結果,Mammo-CLIPはAUC (0.841 vs. 0.817, 0.837 vs. 0.807) の最先端のクロスビュー・トランスフォーマーよりも優れていた。また、以前の2つのCLIPベースの手法を20.3%、14.3%上回る。本研究は、乳がんの次世代画像テキストベースのCADスキーム開発に、微調整された視覚言語モデルを適用する可能性を強調した。

Although fusing information from multiple views of mammograms plays an important role in increasing the accuracy of breast cancer detection, developing multi-view mammogram-based computer-aided diagnosis (CAD) schemes still faces challenges, and no such CAD schemes have been used in clinical practice. To overcome these challenges, we investigate a new approach based on Contrastive Language-Image Pre-training (CLIP), which has sparked interest across various medical imaging tasks. By solving the challenges of (1) effectively adapting the single-view CLIP for multi-view feature fusion and (2) efficiently fine-tuning this parameter-dense model with limited samples and computational resources, we introduce Mammo-CLIP, the first multi-modal framework to process multi-view mammograms and corresponding simple texts. Mammo-CLIP uses an early feature fusion strategy to learn multi-view relationships in four mammograms acquired from the CC and MLO views of the left and right breasts. To enhance learning efficiency, plug-and-play adapters are added to the CLIP image and text encoders for fine-tuning, limiting updates to about 1% of the parameters. For framework evaluation, we assembled two datasets retrospectively. The first dataset, comprising 470 malignant and 479 benign cases, was used for few-shot fine-tuning and internal evaluation of the proposed Mammo-CLIP via 5-fold cross-validation. The second dataset, including 60 malignant and 294 benign cases, was used to test the generalizability of Mammo-CLIP. Study results show that Mammo-CLIP outperforms the state-of-the-art cross-view transformer in AUC (0.841 vs. 0.817, 0.837 vs. 0.807) on both datasets. It also surpasses two previous CLIP-based methods by 20.3% and 14.3%. This study highlights the potential of applying fine-tuned vision-language models to develop next-generation, image-text-based CAD schemes for breast cancer.

翻訳日:2024-04-26 18:51:25 公開日:2024-04-24

# シークエンス(Sequence)は、何をディスクにするかを秘密に教えてくれる

Sequence can Secretly Tell You What to Discard ( http://arxiv.org/abs/2404.15949v1 )

ライセンス: Link先を確認

Jincheng Dai, Zhuowei Huang, Haiyun Jiang, Chen Chen, Deng Cai, Wei Bi, Shuming Shi,

(参考訳) 大規模言語モデル(LLM)は、幅広いタスクにおいて優れたパフォーマンスを保ちながら、大きなGPUメモリを必要とし、かなりの計算資源を消費する。モデル重みに加えて、KVキャッシュが占有するメモリはシーケンス長とともに線形に増加し、推論の主要なボトルネックとなる。本稿では,メモリフットプリントを大幅に削減するKVキャッシュの最適化手法を提案する。包括的調査により、LLaMA2級数モデルでそのことが分かる。 (i)隣接するトークンのクエリベクトルの類似性は非常に高く、 (II)現在のクエリの注意計算は、前回のクエリのわずかな部分の注意情報のみに依存することができる。これらの観測に基づいて,モデルを微調整することなく,重要なキーと値のペアを動的に保持するKVキャッシュ消去ポリシーであるCORMを提案する。 CORMは、LongBenchの6つのタスクで顕著なパフォーマンス劣化を伴わずに、KVキャッシュの推論メモリ使用量を最大70%削減する。

Large Language Models (LLMs), despite their impressive performance on a wide range of tasks, require significant GPU memory and consume substantial computational resources. In addition to model weights, the memory occupied by KV cache increases linearly with sequence length, becoming a main bottleneck for inference. In this paper, we introduce a novel approach for optimizing the KV cache which significantly reduces its memory footprint. Through a comprehensive investigation, we find that on LLaMA2 series models, (i) the similarity between adjacent tokens' query vectors is remarkably high, and (ii) current query's attention calculation can rely solely on the attention information of a small portion of the preceding queries. Based on these observations, we propose CORM, a KV cache eviction policy that dynamically retains important key-value pairs for inference without finetuning the model. We validate that CORM reduces the inference memory usage of KV cache by up to 70% without noticeable performance degradation across six tasks in LongBench.

翻訳日:2024-04-26 18:51:25 公開日:2024-04-24
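
The observation in the abstract above, that a query's attention can rely on what a small set of preceding queries attended to, suggests eviction rules driven by recent attention. The sketch below is a generic example of that family, not CORM itself: the abstract does not spell out CORM's retention rule, so the max-over-recent-queries score, the fixed `budget`, and both function names are assumptions made for illustration.

```python
from typing import List, Set


def select_keys_to_keep(recent_attention: List[List[float]],
                        budget: int) -> Set[int]:
    """Pick which cached key positions to retain.

    recent_attention[q][k] is the attention probability that recent query q
    assigned to cached key k. A key's score is the largest attention it got
    from any recent query; the `budget` highest-scoring keys are kept.
    """
    n_keys = len(recent_attention[0])
    scores = [max(row[k] for row in recent_attention) for k in range(n_keys)]
    ranked = sorted(range(n_keys), key=lambda k: scores[k], reverse=True)
    return set(ranked[:budget])


def evict(cache: List[str], keep: Set[int]) -> List[str]:
    """Drop every cached entry whose position is not in `keep`."""
    return [entry for pos, entry in enumerate(cache) if pos in keep]
```

Here `cache` stands in for the per-position key-value pairs; in a real model the same index set would be applied to the key and value tensors of every layer and head, and the budget would typically vary with sequence length.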

# 推薦のための混合教師付きグラフコントラスト学習

Mixed Supervised Graph Contrastive Learning for Recommendation ( http://arxiv.org/abs/2404.15954v1 )

ライセンス: Link先を確認

Weizhi Zhang, Liangwei Yang, Zihe Song, Henry Peng Zou, Ke Xu, Yuanjie Zhu, Philip S. Yu,

(参考訳) Recommender System(RecSys)は、オンラインプラットフォームにおいて重要な役割を担い、膨大な情報の中でパーソナライズされた提案を提供する。グラフコントラスト学習は、二部グラフの教師なし強化を伴う高次協調フィルタリング信号から学習することを目的としており、これはペアワイズレコメンデーション損失とコントラストロスの両方を含むマルチタスク学習フレームワークに大きく依存している。この分離された設計は、異なる損失から不整合最適化方向を引き起こす可能性があるため、収束時間が長くなり、サブ最適性能さえも生じる。さらに、RecSysは、拡張中に追加の教師付き協調フィルタリング信号を提供することなく、異なるビューからユーザやイテムを区別することを学ぶため、自己監督によるコントラスト損失はRecSysのデータスパシティ問題を緩和するに足らない。本稿では、これらの問題に対処するために、MixSGCL(Mixed Supervised Graph Contrastive Learning for Recommendation)を提案する。 MixSGCLはもともと、推奨と教師なしのコントラスト損失のトレーニングを教師付きコントラスト学習損失に統合し、2つのタスクを1つの最適化方向に整合させる。データの分散性問題に対処するため,既存のユーザ・イテム相互作用に基づいて,より直接的な教師付き協調フィルタリング信号のマイニングを行うノードワイド・エッジワイド・ミックスアップを提案する。 3つの実世界のデータセットに対する大規模な実験は、MixSGCLが最先端の手法を超越し、精度と効率の両方で最高のパフォーマンスを達成していることを示している。教師付きグラフコントラスト学習におけるMixSGCLの有効性を検証する。

Recommender systems (RecSys) play a vital role in online platforms, offering users personalized suggestions amidst vast information. Graph contrastive learning aims to learn from high-order collaborative filtering signals with unsupervised augmentation on the user-item bipartite graph, and it predominantly relies on a multi-task learning framework involving both the pair-wise recommendation loss and the contrastive loss. This decoupled design can cause inconsistent optimization directions from the different losses, which leads to longer convergence time and even sub-optimal performance. Besides, the self-supervised contrastive loss falls short in alleviating the data sparsity issue in RecSys, as it learns to differentiate users/items from different views without providing extra supervised collaborative filtering signals during augmentation. In this paper, we propose Mixed Supervised Graph Contrastive Learning for Recommendation (MixSGCL) to address these concerns. MixSGCL integrates the training of the recommendation and unsupervised contrastive losses into a single supervised contrastive learning loss to align the two tasks within one optimization direction. To cope with the data sparsity issue, instead of unsupervised augmentation, we further propose node-wise and edge-wise mixup to mine more direct supervised collaborative filtering signals based on existing user-item interactions. Extensive experiments on three real-world datasets demonstrate that MixSGCL surpasses state-of-the-art methods, achieving top performance in both accuracy and efficiency. This validates the effectiveness of MixSGCL's coupled design for supervised graph contrastive learning.

翻訳日:2024-04-26 18:51:25 公開日:2024-04-24
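
The mixup mentioned in the MixSGCL abstract above builds on a simple operation: a convex combination of two existing representations. The sketch below shows only that generic interpolation step for a pair of embedding vectors. Which nodes or edges are paired, how the mixing coefficient is chosen, and how the mixed samples enter the loss are not stated in the abstract, so the function name and the fixed `lam` argument are assumptions for illustration.

```python
from typing import List


def mixup(embedding_a: List[float], embedding_b: List[float],
          lam: float) -> List[float]:
    """Return lam * a + (1 - lam) * b, element by element.

    lam must lie in [0, 1]; lam = 1 returns a copy of a, lam = 0 a copy of b.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    if len(embedding_a) != len(embedding_b):
        raise ValueError("embeddings must have the same length")
    return [lam * a + (1.0 - lam) * b
            for a, b in zip(embedding_a, embedding_b)]
```

In common mixup practice the coefficient is sampled per pair, often from a Beta distribution, rather than fixed; that choice is left to the caller here.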

# ディープフェイク画像を超える:AI生成ビデオの検出

Beyond Deepfake Images: Detecting AI-Generated Videos ( http://arxiv.org/abs/2404.15955v1 )

ライセンス: Link先を確認

Danial Samadi Vahdati, Tai D. Nguyen, Aref Azizpour, Matthew C. Stamm,

(参考訳) 生成AIの最近の進歩は、視覚的にリアルな合成ビデオを生成する技術の開発につながっている。本稿では,AI合成画像を検出するために,多くの技術が開発されているが,合成画像検出装置では合成映像を検出できないことを示す。これは、合成ビデオジェネレータが、画像ジェネレータが残したものとはかなり異なるトレースを導入するためである。それにもかかわらず,H.264再圧縮後においても,合成ビデオトレースを学習し,信頼性の高い合成ビデオ検出や生成元属性の実行に利用できることを示す。さらに,ゼロショット転送性による新しいジェネレータからの映像の検出は困難である一方で,新しいジェネレータからの映像の正確な検出は,数ショットの学習によって達成できることを実証した。

Recent advances in generative AI have led to the development of techniques to generate visually realistic synthetic video. While a number of techniques have been developed to detect AI-generated synthetic images, in this paper we show that synthetic image detectors are unable to detect synthetic videos. We demonstrate that this is because synthetic video generators introduce substantially different traces than those left by image generators. Despite this, we show that synthetic video traces can be learned, and used to perform reliable synthetic video detection or generator source attribution even after H.264 re-compression. Furthermore, we demonstrate that while detecting videos from new generators through zero-shot transferability is challenging, accurate detection of videos from a new generator can be achieved through few-shot learning.

翻訳日:2024-04-26 18:51:25 公開日:2024-04-24

# 視覚マンバに関する調査

A Survey on Visual Mamba ( http://arxiv.org/abs/2404.15956v1 )

ライセンス: Link先を確認

Hanwei Zhang, Ying Zhu, Dan Wang, Lijun Zhang, Tianxiang Chen, Zi Ye,

(参考訳) 選択機構とハードウェア対応アーキテクチャを備えた状態空間モデル(SSM)、すなわちMambaは、最近、長いシーケンスモデリングにおいて大きな可能性を証明している。トランスにおける自己注意機構は、画像サイズと計算要求の増加と2次複雑さを持つため、研究者らは現在、コンピュータビジョンタスクにMambaを適用する方法を模索している。本稿では,コンピュータビジョン分野におけるMambaモデルの詳細分析を目的とした,初めての総合的な調査である。これは、状態空間モデルフレームワーク、選択メカニズム、ハードウェア対応設計など、Mambaの成功に寄与する基本的な概念を探求することから始まる。次に、これらの視覚マンバモデルについて、基礎的なモデルに分類し、その高度化を図るために、畳み込み、再発、注意などのテクニックで強化することでレビューする。さらに、様々な視覚処理におけるバックボーンとしての利用を含む、視覚タスクにおけるMambaの幅広い応用を掘り下げる。これには、一般的な視覚タスク、医療視覚タスク(例えば、2D/3Dセグメンテーション、分類、画像登録など)、リモートセンシング視覚タスクが含まれる。本稿では,高次視覚(オブジェクト検出,セグメンテーション,ビデオ分類など)と低次視覚(画像超解像,画像復元,視覚生成など)の2段階から一般的な視覚タスクを紹介する。この取り組みが、現在の課題に対処し、さらにマンバモデルをコンピュータビジョンに適用するために、コミュニティ内でさらなる関心を喚起することを期待しています。

State space models (SSMs) with selection mechanisms and hardware-aware architectures, namely Mamba, have recently demonstrated significant promise in long-sequence modeling. Since the self-attention mechanism in transformers has quadratic complexity in image size, with correspondingly increasing computational demands, researchers are now exploring how to adapt Mamba for computer vision tasks. This paper is the first comprehensive survey aiming to provide an in-depth analysis of Mamba models in the field of computer vision. It begins by exploring the foundational concepts contributing to Mamba's success, including the state space model framework, selection mechanisms, and hardware-aware design. Next, we review these visual Mamba models by categorizing them into foundational ones and those enhanced with techniques such as convolution, recurrence, and attention to improve their sophistication. We further delve into the widespread applications of Mamba in vision tasks, which include its use as a backbone at various levels of vision processing. This encompasses general visual tasks, medical visual tasks (e.g., 2D/3D segmentation, classification, and image registration), and remote sensing visual tasks. We introduce general visual tasks at two levels: high/mid-level vision (e.g., object detection, segmentation, and video classification) and low-level vision (e.g., image super-resolution, image restoration, and visual generation). We hope this endeavor will spark additional interest within the community to address current challenges and further apply Mamba models in computer vision.

翻訳日:2024-04-26 18:41:38 公開日:2024-04-24

# 超低温分子を用いたSU(N)磁性

SU(N) magnetism with ultracold molecules ( http://arxiv.org/abs/2404.15957v1 )

ライセンス: Link先を確認

Bijit Mukherjee, Jeremy M. Hutson, Kaden R. A. Hazzard,

(参考訳) SU($N$)対称性を持つ量子系は、量子多体物理学の典型的な舞台である。これらは、複雑な物質への洞察を与えることと、エキゾチックな基底状態を安定化できることから研究されてきた。超低温アルカリ土類原子は、核スピンを$I$として$N=2I+1=1,2,\ldots,10$に対してSU($N$)対称性を示すと予測された。その後の実験により、豊富な多体物理学が明らかになった。しかし、アルカリ土類原子は斥力相互作用を持つフェルミオンに対してのみこの対称性を実現する。本稿では、静電場やマイクロ波によって破壊的な衝突から遮蔽された超低温分子がSU($N$)対称性を示すことを予測する。この対称性が成り立つのは、スピンフリー値からのs波散乱長の偏差が、静電場遮蔽を用いたCaFで約3%に過ぎず、バイアルカリ分子ではさらに小さいと推定されるためである。これにより、ボソンでは最大$32$、フェルミオンでは最大$36$という大きな$N$への道が開かれる。また、ボソン系や引力相互作用など、原子では実現できない重要な特徴も提供する。

Quantum systems with SU($N$) symmetry are paradigmatic settings for quantum many-body physics. They have been studied for the insights they provide into complex materials and their ability to stabilize exotic ground states. Ultracold alkaline-earth atoms were predicted to exhibit SU($N$) symmetry for $N=2I+1=1,2,\ldots,10$, where $I$ is the nuclear spin. Subsequent experiments have revealed rich many-body physics. However, alkaline-earth atoms realize this symmetry only for fermions with repulsive interactions. In this paper, we predict that ultracold molecules shielded from destructive collisions with static electric fields or microwaves exhibit SU($N$) symmetry, which holds because deviations of the s-wave scattering length from the spin-free values are only about 3\% for CaF with static-field shielding and are estimated to be even smaller for bialkali molecules. They open the door to $N$ as large as $32$ for bosons and $36$ for fermions. They offer important features unachievable with atoms, including bosonic systems and attractive interactions.

翻訳日:2024-04-26 18:41:38 公開日:2024-04-24

# 液状化誘起横方向拡散予測のための説明可能なAIモデル

Explainable AI models for predicting liquefaction-induced lateral spreading ( http://arxiv.org/abs/2404.15959v1 )

ライセンス: Link先を確認

Cheng-Hsi Hsiao, Krishna Kumar, Ellen Rathje,

(参考訳) 地震によって引き起こされる液状化は、インフラへの脅威として、相当に横方向の拡散を引き起こす可能性がある。マシンラーニング(ML)は、複雑な土壌特性と現場条件をキャプチャすることで、横方向の拡散予測モデルを改善することができる。しかし、MLモデルの"ブラックボックス"の性質は、重要な意思決定における採用を妨げる可能性がある。本研究は,2011年クライストチャーチ地震のデータに基づいて訓練された横方向拡散予測のためのeXtreme Gradient Boosting(XGB)モデルの解釈にSHAP(SHapley Additive ExPlanations)を用いることにより,この制限に対処する。 SHAP分析は、モデルの予測を駆動し、透明性を高め、確立されたエンジニアリング知識との比較を可能にする要因を明らかにする。その結果, コーン浸透試験(CPT)データから得られた土壌特性の重要性をXGBモデルで同定し, 領域理解との整合性を検証した。この研究は、地球工学とハザードアセスメントにおける信頼性とインフォームドな意思決定のための説明可能な機械学習の価値を強調している。

Earthquake-induced liquefaction can cause substantial lateral spreading, posing threats to infrastructure. Machine learning (ML) can improve lateral spreading prediction models by capturing complex soil characteristics and site conditions. However, the "black box" nature of ML models can hinder their adoption in critical decision-making. This study addresses this limitation by using SHapley Additive exPlanations (SHAP) to interpret an eXtreme Gradient Boosting (XGB) model for lateral spreading prediction, trained on data from the 2011 Christchurch Earthquake. SHAP analysis reveals the factors driving the model's predictions, enhancing transparency and allowing for comparison with established engineering knowledge. The results demonstrate that the XGB model successfully identifies the importance of soil characteristics derived from Cone Penetration Test (CPT) data in predicting lateral spreading, validating its alignment with domain understanding. This work highlights the value of explainable machine learning for reliable and informed decision-making in geotechnical engineering and hazard assessment.

翻訳日:2024-04-26 18:41:38 公開日:2024-04-24

# ステップ周波数GPRフィールド計測の機械学習による土壌解析:予備的検討

Soil analysis with machine-learning-based processing of stepped-frequency GPR field measurements: Preliminary study ( http://arxiv.org/abs/2404.15961v1 )

ライセンス: Link先を確認

Chunlei Xu, Michael Pregesbauer, Naga Sravani Chilukuri, Daniel Windhager, Mahsa Yousefi, Pedro Julian, Lothar Ratschbacher,

(参考訳) 地中レーダ(GPR)は農業や園芸に関連する土壌パラメータを抽出する手段として広く研究されている。機械学習(ML)法と組み合わせると、SFCW(Stepped Frequency Continuous Wave)レーダの高分解能測定により、根の深さを含む深さ分解された土壌パラメータを費用対効果よく取得できることが期待される。この方向への第一歩として、トラクタ搭載SFCW GPR機器を用いた広範囲なフィールドサーベイを行う。 MLデータ処理を用いて、同時に記録する電磁誘導(EMI)機器で測定された見かけの電気伝導率(ECaR)を、GPR機器がどの程度予測できるかをテストする。大規模フィールド計測キャンペーンはゴルフコースで実施し、約6600平方メートルに分布する、位置合わせおよび位置情報付けされた3472のGPR・EMIデータサンプルを取得した。選択された地形は地表面の均一性が高いという利点があるが、測定された土壌パラメータの変化は小さく、識別が困難である。定量的な結果から,農業環境におけるエンド・ツー・エンドMLの性能評価のための性能指標としてnugget-to-sill比を用いることを提案するとともに,マルチセンサ回帰設定における制限要因について議論する。コードはオープンソースとしてリリースされ、 https://opensource.silicon-austria.com/xuc/soil-analysis-machine-learning-stepped-frequency-gpr で公開されている。

Ground Penetrating Radar (GPR) has been widely studied as a tool for extracting soil parameters relevant to agriculture and horticulture. When combined with Machine-Learning-based (ML) methods, high-resolution Stepped Frequency Continuous Wave (SFCW) radar measurements hold the promise of giving cost-effective access to depth-resolved soil parameters, including at root-level depth. As a first step in this direction, we perform an extensive field survey with a tractor-mounted SFCW GPR instrument. Using ML data processing, we test the GPR instrument's capability to predict the apparent electrical conductivity (ECaR) as measured by a simultaneously recording Electromagnetic Induction (EMI) instrument. The large-scale field measurement campaign, with 3472 co-registered and geo-located GPR and EMI data samples distributed over ~6600 square meters, was performed on a golf course. The selected terrain benefits from a high surface homogeneity, but also poses the challenge of only small, and hence hard to discern, variations in the measured soil parameter. Based on the quantitative results, we suggest the use of the nugget-to-sill ratio as a performance metric for the evaluation of end-to-end ML performance in the agricultural setting and discuss the limiting factors in the multi-sensor regression setting. The code is released as open source and available at https://opensource.silicon-austria.com/xuc/soil-analysis-machine-learning-stepped-frequency-gpr.

翻訳日:2024-04-26 18:41:38 公開日:2024-04-24

# 量子力学の複素確率最適制御基礎

Complex Stochastic Optimal Control Foundation of Quantum Mechanics ( http://arxiv.org/abs/2404.15964v1 )

ライセンス: Link先を確認

Vasil Yordanov,

(参考訳) 近年の研究では、確率的ハミルトン・ヤコビ・ベルマン方程式(HJB)を用いて量子力学方程式を導出する複雑な変数を含むように拡張されている。しかしながら、これらの研究は通常、HJB方程式を複素数に直接適用することは有効であると仮定する。本稿では,複素変数の文脈においてHJB方程式を適切に適用する方法について述べる。この結果は、コーシー・リーマンの定理に直接的な影響を受け、量子粒子の確率運動を著しく再評価する。これらの知見は量子力学の理解を深めるだけでなく、量子力学に確率論的最適制御を適用するためのフレームワークの数学的厳密性を高める。

Recent studies have expanded the use of the stochastic Hamilton-Jacobi-Bellman (HJB) equation to include complex variables for deriving quantum mechanical equations. However, these studies typically assume that it is valid to apply the HJB equation directly to complex numbers, an approach that overlooks the fundamental problem of comparing complex numbers to find optimal controls. This paper addresses how to properly apply the HJB equation in the context of complex variables. Our findings significantly reevaluate the stochastic movement of quantum particles, directly influenced by the Cauchy-Riemann theorem. These insights not only deepen our understanding of quantum dynamics but also enhance the mathematical rigor of the framework for applying stochastic optimal control in quantum mechanics.

翻訳日:2024-04-26 18:41:38 公開日:2024-04-24

# Nonlocal order parameter of pair superfluids ( http://arxiv.org/abs/2404.15972v1 )

License: see the linked page

Nitya Cuzzuol, Luca Barbiero, Arianna Montorsi,

Order parameters are a fundamental resource for characterizing quantum matter. We show that pair superfluids can be rigorously defined in terms of a nonlocal order parameter, named odd parity, whose derivation is experimentally accessible through local density measurements. As a case study, we first investigate a constrained Bose-Hubbard model at different densities, in both one and two spatial dimensions. Here, our analysis finds pair superfluidity for relatively strong attractive interactions. The odd parity operator acts as the unique order parameter for such a phase, irrespective of the density of the system and its dimensionality. To reinforce our finding, we also confirm the generality of our approach on a two-component Bose-Hubbard Hamiltonian, whose experimental realization is a timely topic in ultracold atomic systems. Our results shed new light on the role of correlated density fluctuations in pair superfluids. In addition, they provide a powerful tool for the experimental detection of such exotic phases and the characterization of their transition to the normal superfluid phase.

Translated: 2024-04-26 18:41:38 Published: 2024-04-24

# Detecting entanglement from macroscopic measurements of the electric field and its fluctuations ( http://arxiv.org/abs/2404.15973v1 )

License: see the linked page

Pedro Rosario, Alan C. Santos, Nicola Piovella, Robin Kaiser, André Cidrim, Romain Bachelard,

To address the outstanding task of detecting entanglement in large quantum systems, entanglement witnesses have emerged, addressing the separable nature of a state. Yet optimizing witnesses, or accessing them experimentally, often remains a challenge. Here we introduce a family of entanglement witnesses for open quantum systems, based on the electric field: its quadratures and the total fluorescence. More general than spin-squeezing inequalities, it can detect new classes of entangled states, because changing the direction of far-field observation opens up a continuous family of witnesses without the need for state tomography. Their efficiency is demonstrated by detecting, from almost any direction, the entanglement of collective single-photon states, such as long-lived states generated by cooperative spontaneous emission. Able to detect entanglement in large quantum systems, these electric-field-based witnesses can be used on any set of emitters described by the Pauli group, such as atomic systems (cold atoms and trapped ions), giant atoms, color centers, and superconducting qubits.

Translated: 2024-04-26 18:41:38 Published: 2024-04-24

# The State of the Art in Visual Analytics for 3D Urban Data ( http://arxiv.org/abs/2404.15976v1 )

License: see the linked page

Fabio Miranda, Thomas Ortner, Gustavo Moreira, Maryam Hosseini, Milena Vuckovic, Filip Biljecki, Claudio Silva, Marcos Lage, Nivan Ferreira,

Urbanization has amplified the importance of three-dimensional structures in urban environments for a wide range of phenomena that are of significant interest to diverse stakeholders. With the growing availability of 3D urban data, numerous studies have focused on developing visual analysis techniques tailored to the unique characteristics of urban environments. However, incorporating the third dimension into visual analytics introduces additional challenges in designing effective visual tools to tackle urban data's diverse complexities. In this paper, we present a survey on visual analytics of 3D urban data. Our work characterizes published works along three main dimensions (why, what, and how), considering use cases, analysis tasks, data, visualizations, and interactions. We provide a fine-grained categorization of published works from visualization journals and conferences, as well as from a myriad of urban domains, including urban planning, architecture, and engineering. By incorporating perspectives from both urban and visualization experts, we identify literature gaps, motivate visualization researchers to understand challenges and opportunities, and indicate future research directions.

Translated: 2024-04-26 18:41:38 Published: 2024-04-24

# On the Fourier analysis in the SO(3) space : EquiLoPO Network ( http://arxiv.org/abs/2404.15979v1 )

License: see the linked page

Dmitrii Zhemchuzhnikov, Sergei Grudinin,

Analyzing volumetric data with rotational invariance or equivariance is an active topic in current research. Existing deep-learning approaches use either group convolutional networks limited to discrete rotations or steerable convolutional networks with constrained filter structures. This work proposes the EquiLoPO Network, a novel equivariant neural network architecture that achieves analytical Equivariance to Local Pattern Orientation on the continuous SO(3) group while allowing unconstrained trainable filters. Our key innovations are a group convolutional operation that leverages irreducible representations as the Fourier basis, and a local activation function in the SO(3) space that provides a well-defined mapping from input to output functions, preserving equivariance. By integrating these operations into a ResNet-style architecture, we propose a model that overcomes the limitations of prior methods. A comprehensive evaluation on diverse 3D medical imaging datasets from MedMNIST3D demonstrates the effectiveness of our approach, which consistently outperforms the state of the art. This work suggests the benefits of true rotational equivariance on SO(3) and of flexible unconstrained filters enabled by the local activation function, providing a flexible framework for equivariant deep learning on volumetric data with potential applications across domains. Our code is publicly available at https://gricad-gitlab.univ-grenoble-alpes.fr/GruLab/ILPO/-/tree/main/EquiLoPO.

Translated: 2024-04-26 18:41:38 Published: 2024-04-24

# Minimizing the Number of Teleportations in Distributed Quantum Computing Using Alloy ( http://arxiv.org/abs/2404.15980v1 )

License: see the linked page

Ali Ebnenasir, Kieran Young,

This paper presents a novel approach for minimizing the number of teleportations in Distributed Quantum Computing (DQC) using formal methods. Quantum teleportation plays a major role in communicating quantum information. As such, it is desirable to perform as few teleportations as possible when distributing a quantum algorithm on a network of quantum machines. In contrast to most existing methods, which rely on graph-theoretic or heuristic search techniques, we propose a drastically different approach that minimizes the number of teleportations by using formal methods. Specifically, the contributions of this paper include: the formal specification of the teleportation minimization problem in Alloy, the generalizability of the proposed Alloy specifications to quantum circuits with $n$-ary gates, the reusability of the Alloy specifications for different quantum circuits and networks, the simplicity of specifying and solving other problems such as load balancing and heterogeneity, and the compositionality of the proposed approach. We also develop a software tool, called qcAlloy, that takes as input the textual description of a quantum circuit, generates the corresponding Alloy model, and finally solves the minimization problem using the Alloy analyzer. We have experimentally evaluated qcAlloy on some of the circuits in the RevLib benchmark with more than 100 qubits and 1200 layers, and have demonstrated that qcAlloy outperforms one of the most efficient existing methods for most benchmark circuits in terms of minimizing the number of teleportations.
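To make the optimization target concrete, here is a deliberately simplified sketch. It is not qcAlloy and does not use Alloy: it assumes a static assignment of qubits to two equally sized machines and counts two-qubit gates that span machines as a stand-in for teleportations. The function names and the toy circuit are hypothetical.

```python
from itertools import combinations


def nonlocal_gate_count(gates, machine_of):
    """Count two-qubit gates whose operands sit on different machines."""
    return sum(1 for a, b in gates if machine_of[a] != machine_of[b])


def best_bipartition(num_qubits, gates):
    """Brute-force the balanced two-machine assignment with the fewest non-local gates."""
    half = num_qubits // 2
    best_cost, best_assignment = None, None
    for group in combinations(range(num_qubits), half):
        on_first = set(group)
        machine_of = [0 if q in on_first else 1 for q in range(num_qubits)]
        cost = nonlocal_gate_count(gates, machine_of)
        if best_cost is None or cost < best_cost:
            best_cost, best_assignment = cost, machine_of
    return best_cost, best_assignment


# Toy circuit: each pair is a two-qubit gate acting on the listed qubits.
gates = [(0, 1), (2, 3), (0, 1), (1, 2)]
cost, assignment = best_bipartition(4, gates)
```

The brute-force search only works for very small circuits. The abstract's point is that a formal specification lets a solver handle the search, along with extra constraints such as load balancing.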

Translated: 2024-04-26 18:41:38 Published: 2024-04-24

# Creating entangled logical qubits in the heavy-hex lattice with topological codes ( http://arxiv.org/abs/2404.15989v1 )

License: see the linked page

Bence Hetényi, James R. Wootton,

Designs for quantum error correction depend strongly on the connectivity of the qubits. For solid-state qubits, the most straightforward approach is to have connectivity constrained to a planar graph. Practical considerations may further restrict the connectivity, resulting in a relatively sparse graph such as the heavy-hex architecture of current IBM Quantum devices. In such cases it is hard to use all qubits to their full potential. Instead, in order to emulate the denser connectivity required to implement well-known quantum error correcting codes, many qubits remain effectively unused. In this work we show how this bug can be turned into a feature. By using the unused qubits of one code to execute another, two codes can be implemented on top of each other, allowing easy application of fault-tolerant entangling gates and measurements. We demonstrate this by realizing a surface code and a Bacon-Shor code on a 133-qubit IBM Quantum device. Using transversal CX gates and lattice surgery, we demonstrate entanglement between these logical qubits with code distance up to $d = 4$ and five rounds of stabilizer measurement cycles. The nonplanar coupling between the qubits allows us to simultaneously measure the logical $XX$, $YY$, and $ZZ$ observables. With this we verify the violation of Bell's inequality for both the $d=2$ case, with postselection, featuring a fidelity of $94\%$, and the $d=3$ instance, using only quantum error correction.

Translated: 2024-04-26 18:41:38 Published: 2024-04-24

# HDDGAN: A Heterogeneous Dual-Discriminator Generative Adversarial Network for Infrared and Visible Image Fusion ( http://arxiv.org/abs/2404.15992v1 )

License: see the linked page

Guosheng Lu, Zile Fang, Chunming He, Zhigang Zhao,

Infrared and visible image fusion (IVIF) aims to preserve thermal radiation information from infrared images while integrating texture details from visible images, enabling the capture of important features and hidden details of subjects in complex scenes and disturbed environments. Consequently, IVIF offers distinct advantages in practical applications such as video surveillance, night navigation, and target recognition. However, prevailing methods often struggle to capture thermal region features and detailed information simultaneously, owing to the disparate characteristics of infrared and visible images. As a result, fusion outcomes frequently involve a compromise between thermal target area information and texture details. In this study, we introduce a novel heterogeneous dual-discriminator generative adversarial network (HDDGAN) to address this issue. Specifically, the generator is built as a multi-scale skip-connected structure, facilitating the extraction of essential features from different source images. To enhance the information representation ability of the fusion result, an attention mechanism is employed to construct the information fusion layer within the generator, leveraging the disparities between the source images. Moreover, recognizing the distinct learning requirements of information in infrared and visible images, we design two discriminators with differing structures. This approach aims to guide the model to learn salient information from infrared images while simultaneously capturing detailed information from visible images. Extensive experiments conducted on various public datasets demonstrate the superiority of our proposed HDDGAN over other state-of-the-art (SOTA) algorithms, highlighting its enhanced potential for practical applications.

Translated: 2024-04-26 18:41:38 Published: 2024-04-24

# Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach ( http://arxiv.org/abs/2404.15993v1 )

License: see the linked page

Linyu Liu, Yu Pan, Xiaocheng Li, Guanting Chen,

Large language models (LLMs) are highly capable at many tasks, but they can sometimes generate unreliable or inaccurate outputs. To tackle this issue, this paper studies the problem of uncertainty estimation and calibration for LLMs. We begin by formulating the uncertainty estimation problem for LLMs and then propose a supervised approach that takes advantage of labeled datasets and estimates the uncertainty of the LLMs' responses. Based on the formulation, we illustrate the difference between uncertainty estimation for LLMs and that for standard ML models, and explain why the hidden activations of the LLMs contain uncertainty information. Our designed approach effectively demonstrates the benefits of utilizing hidden activations for enhanced uncertainty estimation across various tasks and shows robust transferability in out-of-distribution settings. Moreover, we distinguish the uncertainty estimation task from the uncertainty calibration task and show that a better uncertainty estimation mode leads to a better calibration performance. In practice, our method is easy to implement and is adaptable to different levels of model transparency including black box, grey box, and white box, each demonstrating strong performance based on the accessibility of the LLM's internal mechanisms.
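As a rough illustration of the supervised idea, the sketch below fits a small logistic-regression probe that maps feature vectors to the probability that a response was correct. The abstract does not specify the authors' features or regressor, so everything here is an assumption: the two-dimensional vectors are made-up stand-ins for hidden activations, and `train_probe` and `predict_confidence` are hypothetical names.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_probe(features, labels, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    dim = len(features[0])
    weights, bias = [0.0] * dim, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_confidence(weights, bias, x):
    """Estimated probability that the response behind feature vector x is correct."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy stand-ins for hidden activations; label 1 = response was correct.
features = [[1.0, 0.2], [0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.2, 1.0], [0.3, 0.8]]
labels = [1, 1, 1, 0, 0, 0]
weights, bias = train_probe(features, labels)
```

In a white-box setting the features would come from the model's own activations. In grey-box or black-box settings they would have to be built from whatever signals are accessible, which the abstract mentions but does not detail.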

Translated: 2024-04-26 18:41:38 Published: 2024-04-24

# BeSound: Bluetooth-Based Position Estimation Enhancing with Cross-Modality Distillation ( http://arxiv.org/abs/2404.15999v1 )

License: see the linked page

Hymalai Bello, Sungho Suh, Bo Zhou, Paul Lukowicz,

Smart factories leverage advanced technologies to optimize manufacturing processes and enhance efficiency. Implementing worker tracking systems, primarily through camera-based methods, ensures accurate monitoring. However, concerns about worker privacy and technology protection make it necessary to explore alternative approaches. We propose a non-visual, scalable solution using Bluetooth Low Energy (BLE) and ultrasound coordinates. BLE position estimation offers a very low-power and cost-effective solution, as the technology is available on smartphones and is scalable due to the large number of smartphone users, facilitating worker localization and safety protocol transmission. Ultrasound signals provide faster response times and higher accuracy but require custom hardware, increasing costs. To combine the benefits of both modalities, we employ knowledge distillation (KD) from ultrasound signals to BLE RSSI data. Once the student model is trained, it takes only BLE RSSI data as input for inference, retaining the ubiquity and low cost of BLE RSSI. We tested our approach using data from an experiment with twelve participants in a smart factory test bed environment. We obtained an increase of 11.79% in the F1-score compared to the baseline (target model without KD and trained with BLE RSSI data only).
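The abstract does not give the exact distillation objective, so the sketch below assumes the widely used soft-label formulation: a weighted sum of cross-entropy on the true class and a temperature-scaled KL term that pulls the student (BLE RSSI input) towards the teacher (ultrasound input). The function names, `alpha`, and `temperature` are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_class,
                      alpha=0.5, temperature=2.0):
    """alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher || student)."""
    hard_loss = -math.log(softmax(student_logits)[true_class])
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft_loss = sum(t * math.log(t / s)
                    for t, s in zip(soft_teacher, soft_student) if t > 0)
    return alpha * hard_loss + (1 - alpha) * temperature ** 2 * soft_loss


# Teacher and student see different modalities but predict the same classes.
teacher_logits = [2.0, 0.5, -1.0]
student_logits = [1.0, 0.8, -0.5]
loss = distillation_loss(student_logits, teacher_logits, true_class=0)
```

Only the student is needed at inference time, which matches the abstract's point that the deployed model consumes BLE RSSI data alone.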

Translated: 2024-04-26 18:41:38 Published: 2024-04-24

# A comprehensive and easy-to-use multi-domain multi-task medical imaging meta-dataset (MedIMeta) ( http://arxiv.org/abs/2404.16000v1 )

License: see the linked page

Stefano Woerner, Arthur Jaques, Christian F. Baumgartner,

While the field of medical image analysis has undergone a transformative shift with the integration of machine learning techniques, the main challenge for these techniques is often the scarcity of large, diverse, and well-annotated datasets. Medical images vary in format, size, and other parameters and therefore require extensive preprocessing and standardization for use in machine learning. To address these challenges, we introduce the Medical Imaging Meta-Dataset (MedIMeta), a novel multi-domain, multi-task meta-dataset. MedIMeta contains 19 medical imaging datasets spanning 10 different domains and encompassing 54 distinct medical tasks, all of which are standardized to the same format and readily usable in PyTorch or other ML frameworks. We perform a technical validation of MedIMeta, demonstrating its utility through fully supervised and cross-domain few-shot learning baselines.

Translated: 2024-04-26 18:31:49 Published: 2024-04-24

# Benchmarking a heuristic Floquet adiabatic algorithm for the Max-Cut problem ( http://arxiv.org/abs/2404.16001v1 )

License: see the linked page

Etienne Granet, Henrik Dreyer,

According to the adiabatic theorem of quantum mechanics, a system initially in the ground state of a Hamiltonian remains in the ground state if one slowly changes the Hamiltonian. This can be used in principle to solve hard problems on quantum computers. Generically, however, implementing this Hamiltonian dynamics on digital quantum computers requires scaling the Trotter step size with system size and simulation time, which incurs a large gate count. In this work, we argue that for classical optimization problems, the adiabatic evolution can be performed with a fixed, finite Trotter step. This "Floquet adiabatic evolution" reduces the gate count by several orders of magnitude compared to the usual, continuous-time adiabatic evolution. We give numerical evidence using matrix-product-state simulations that it can optimally solve the Max-Cut problem on $3$-regular graphs in a large number of instances, with surprisingly low runtime, even with bond dimensions as low as $D=2$. Extrapolating our numerical results, we estimate the resources needed for a quantum computer to compete with classical exact or approximate solvers for this specific problem.

Translated: 2024-04-26 18:31:49 Published: 2024-04-24

# Channel-State duality with centers ( http://arxiv.org/abs/2404.16004v1 )

License: see the linked page

Simon Langenscheidt, Daniele Oriti, Eugenia Colafranceschi,

We study extensions of the mappings arising in the usual Channel-State duality to the case of Hilbert spaces with a direct sum structure. This setting arises in representations of algebras with centers, which are commonly associated with constraints, and it has many physical applications from quantum many-body theory to holography and quantum gravity. We establish that there is a general relationship between non-separability of the state and the isometric properties of the induced channel. We also provide a generalisation of our approach to algebras of trace-class operators on infinite dimensional Hilbert spaces.

Translated: 2024-04-26 18:31:49 Published: 2024-04-24

# Unimodal and Multimodal Sensor Fusion for Wearable Activity Recognition ( http://arxiv.org/abs/2404.16005v1 )

License: see the linked page

Hymalai Bello,

Combining different sensing modalities with multiple positions helps form a unified perception and understanding of complex situations such as human behavior. Hence, human activity recognition (HAR) benefits from combining redundant and complementary information (Unimodal/Multimodal). Even so, it is not an easy task. It requires a multidisciplinary approach, including expertise in sensor technologies, signal processing, data fusion algorithms, and domain-specific knowledge. This Ph.D. work employs sensing modalities such as inertial, pressure (audio and atmospheric pressure), and textile capacitive sensing for HAR. The scenarios explored are gesture and hand position tracking, facial and head pattern recognition, and body posture and gesture recognition. The selected wearable devices and sensing modalities are fully integrated with machine learning-based algorithms, some of which are implemented on the embedded device, on the edge, and tested in real time.

Translated: 2024-04-26 18:31:49 Published: 2024-04-24

# MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI ( http://arxiv.org/abs/2404.16006v1 )

License: see the linked page

Kaining Ying, Fanqing Meng, Jin Wang, Zhiqian Li, Han Lin, Yue Yang, Hao Zhang, Wenbo Zhang, Yuqi Lin, Shuo Liu, Jiayi Lei, Quanfeng Lu, Runjian Chen, Peng Xu, Renrui Zhang, Haozhe Zhang, Peng Gao, Yali Wang, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao,

Large Vision-Language Models (LVLMs) show significant strides in general-purpose multimodal applications such as visual dialogue and embodied navigation. However, existing multimodal evaluation benchmarks cover a limited number of multimodal tasks that test rudimentary capabilities, falling short in tracking LVLM development. In this study, we present MMT-Bench, a comprehensive benchmark designed to assess LVLMs across massive multimodal tasks requiring expert knowledge and deliberate visual recognition, localization, reasoning, and planning. MMT-Bench comprises $31,325$ meticulously curated multi-choice visual questions from various multimodal scenarios such as vehicle driving and embodied navigation, covering $32$ core meta-tasks and $162$ subtasks in multimodal understanding. Due to its extensive task coverage, MMT-Bench enables the evaluation of LVLMs using a task map, facilitating the discovery of in- and out-of-domain tasks. Evaluation results involving $30$ LVLMs, such as the proprietary GPT-4V and GeminiProVision and the open-sourced InternVL-Chat, underscore the significant challenges posed by MMT-Bench. We anticipate that MMT-Bench will inspire the community to develop next-generation multimodal foundation models aimed at achieving general-purpose multimodal intelligence.

Translated: 2024-04-26 18:31:49 Published: 2024-04-24

# Generation of bipartite mechanical cat state by performing projective Bell state measurement ( http://arxiv.org/abs/2404.16007v1 )

License: see the linked page

Roson Nongthombam, Urmimala Dewan, Amarendra K. Sarma,

Quantum state preparation and measurement of photonic and phononic Schrödinger cat states have attracted significant interest due to their implications for alternative encoding schemes in quantum computation. These schemes employ coherent state superpositions, leveraging the expanded Hilbert space provided by cavity or mechanical resonators in contrast to two-level systems. Moreover, such cat states also serve as a platform for testing fundamental quantum phenomena in macroscopic systems. In this study, we generate four bipartite phononic Bell cat states using an entanglement swapping scheme achieved through projective Bell state measurements on two superconducting qubits. Subsequently, we conduct a Bell inequality test on the bipartite cat state using the CHSH formulation. Given that the entangled cat states are generated through entanglement swapping, our approach could hold promising applications for the advancement of complex quantum network processors based on continuous variable systems.
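For readers unfamiliar with the CHSH formulation, the sketch below evaluates the CHSH combination for an ordinary two-qubit Bell state. This is a simplification: the paper works with cat states of mechanical resonators, and the measurement angles and function names here are illustrative assumptions.

```python
import math


def kron(a, b):
    """Kronecker product of two 2x2 matrices, returned as a 4x4 list of lists."""
    return [[a[i // 2][j // 2] * b[i % 2][j % 2] for j in range(4)] for i in range(4)]


def spin_observable(angle):
    """cos(angle) * Z + sin(angle) * X, a +/-1-valued observable in the X-Z plane."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, s], [s, -c]]


def correlation(state, angle_a, angle_b):
    """Expectation value of A(angle_a) tensor B(angle_b) for a real state vector."""
    op = kron(spin_observable(angle_a), spin_observable(angle_b))
    return sum(state[i] * op[i][j] * state[j] for i in range(4) for j in range(4))


def chsh_value(state, a, a_prime, b, b_prime):
    """CHSH combination E(a,b) + E(a,b') + E(a',b) - E(a',b')."""
    return (correlation(state, a, b) + correlation(state, a, b_prime)
            + correlation(state, a_prime, b) - correlation(state, a_prime, b_prime))


# (|00> + |11>) / sqrt(2) in the basis |00>, |01>, |10>, |11>.
bell_state = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]
s_value = chsh_value(bell_state, 0.0, math.pi / 2, math.pi / 4, -math.pi / 4)
```

Any local hidden-variable model obeys |S| <= 2, while this state and these angles give S = 2*sqrt(2), the maximal quantum value.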

Translated: 2024-04-26 18:31:49 Published: 2024-04-24

# Improving Dictionary Learning with Gated Sparse Autoencoders ( http://arxiv.org/abs/2404.16014v1 )

License: see the linked page

Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda,

Recent work has found that sparse autoencoders (SAEs) are an effective technique for unsupervised discovery of interpretable features in language models' (LMs) activations, by finding sparse, linear reconstructions of LM activations. We introduce the Gated Sparse Autoencoder (Gated SAE), which achieves a Pareto improvement over training with prevailing methods. In SAEs, the L1 penalty used to encourage sparsity introduces many undesirable biases, such as shrinkage (systematic underestimation of feature activations). The key insight of Gated SAEs is to separate the functionality of (a) determining which directions to use and (b) estimating the magnitudes of those directions: this enables us to apply the L1 penalty only to the former, limiting the scope of undesirable side effects. Through training SAEs on LMs of up to 7B parameters we find that, in typical hyper-parameter ranges, Gated SAEs solve shrinkage, are similarly interpretable, and require half as many firing features to achieve comparable reconstruction fidelity.
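The split between (a) and (b) can be sketched as a forward pass with two encoder paths: a gate path that decides which features fire and a magnitude path that sets their values. This is a minimal sketch inferred from the abstract, not the authors' implementation: the parameter names are hypothetical, and training details such as weight sharing between the paths or auxiliary losses are omitted.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def gated_sae_forward(x, w_gate, b_gate, w_mag, b_mag, w_dec, b_dec):
    """Return (features, reconstruction, sparsity_term) for one activation vector x."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    # (a) Gate path: which dictionary directions are active.
    gate_pre = [p + b for p, b in zip(matvec(w_gate, centered), b_gate)]
    # (b) Magnitude path: how strongly each active direction fires.
    mag_pre = [p + b for p, b in zip(matvec(w_mag, centered), b_mag)]
    features = [max(m, 0.0) if g > 0 else 0.0 for g, m in zip(gate_pre, mag_pre)]
    reconstruction = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    # The L1-style penalty touches only the gate path, not the magnitudes.
    sparsity_term = sum(max(g, 0.0) for g in gate_pre)
    return features, reconstruction, sparsity_term


identity = [[1.0, 0.0], [0.0, 1.0]]
features, reconstruction, sparsity_term = gated_sae_forward(
    x=[1.0, -2.0],
    w_gate=identity, b_gate=[0.0, 0.0],
    w_mag=[[2.0, 0.0], [0.0, 2.0]], b_mag=[0.5, 0.5],
    w_dec=identity, b_dec=[0.0, 0.0],
)
```

Because the penalty is computed from the gate pre-activations only, the magnitude path is free to report large values without being pushed towards zero, which is how the design targets shrinkage.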

Translated: 2024-04-26 18:31:49 Published: 2024-04-24

# 神経オペレーターは磁気流体力学の局所物理学を学ぶ

Neural Operators Learn the Local Physics of Magnetohydrodynamics ( http://arxiv.org/abs/2404.16015v1 )

License: see link

Taeyoung Kim, Youngsoo Ha, Myungjoo Kang,

Magnetohydrodynamics (MHD) plays a pivotal role in describing the dynamics of plasma and conductive fluids. It is essential for understanding phenomena such as the structure and evolution of stars and galaxies, and for modeling plasma motion in nuclear fusion through the ideal MHD equations. Solving these hyperbolic PDEs requires sophisticated numerical methods, presenting computational challenges due to complex structures and high costs. Recent advances introduce neural operators like the Fourier Neural Operator (FNO) as surrogate models for traditional numerical analyses. This study explores a modified Flux Fourier neural operator model to approximate the numerical flux of ideal MHD, offering a novel approach that outperforms existing neural operator models by enabling continuous inference, generalization outside sampled distributions, and faster computation compared to classical numerical schemes.

Translated: 2024-04-26 18:31:49 Published: 2024-04-24

# RetinaRegNet: A Versatile Approach for Retinal Image Registration

RetinaRegNet: A Versatile Approach for Retinal Image Registration ( http://arxiv.org/abs/2404.16017v1 )

License: see link

Vishal Balaji Sivaraman, Muhammad Imran, Qingyue Wei, Preethika Muralidharan, Michelle R. Tamplin, Isabella M. Grumbach, Randy H. Kardon, Jui-Kai Wang, Yuyin Zhou, Wei Shao,

We introduce the RetinaRegNet model, which can achieve state-of-the-art performance across various retinal image registration tasks. RetinaRegNet does not require training on any retinal images. It begins by establishing point correspondences between two retinal images using image features derived from diffusion models. This process involves the selection of feature points from the moving image using the SIFT algorithm alongside random point sampling. For each selected feature point, a 2D correlation map is computed by assessing the similarity between the feature vector at that point and the feature vectors of all pixels in the fixed image. The pixel with the highest similarity score in the correlation map corresponds to the feature point in the moving image. To remove outliers in the estimated point correspondences, we first applied an inverse consistency constraint, followed by a transformation-based outlier detector. This method proved to outperform the widely used random sample consensus (RANSAC) outlier detector by a significant margin. To handle large deformations, we utilized a two-stage image registration framework. A homography transformation was used in the first stage and a more accurate third-order polynomial transformation was used in the second stage. The model's effectiveness was demonstrated across three retinal image datasets: color fundus images, fluorescein angiography images, and laser speckle flowgraphy images. RetinaRegNet outperformed current state-of-the-art methods in all three datasets. It was especially effective for registering image pairs with large displacement and scaling deformations. This innovation holds promise for various applications in retinal image analysis. Our code is publicly available at https://github.com/mirthAI/RetinaRegNet.
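The correspondence step described above (highest-similarity pixel in a correlation map, followed by an inverse consistency check) can be sketched as follows. This is only an illustration under stated assumptions: features are stored as small dictionaries from pixel coordinates to vectors, cosine similarity is the similarity measure, and "inverse consistency" is read as a forward-backward match that must return close to the starting point. None of the names come from the RetinaRegNet code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def best_match(query, feature_map):
    """Pixel in feature_map whose feature vector is most similar to query."""
    return max(feature_map, key=lambda pixel: cosine(query, feature_map[pixel]))

def match_with_inverse_check(point, moving, fixed, tol=1.0):
    """Match point (moving image) to the fixed image, then match back.

    The correspondence is kept only if the backward match lands within
    tol pixels of the original point; otherwise None is returned.
    """
    forward = best_match(moving[point], fixed)
    backward = best_match(fixed[forward], moving)
    return forward if math.dist(point, backward) <= tol else None

# Toy feature maps: pixel coordinates -> 2-dimensional feature vectors.
moving = {(0, 0): [1.0, 0.0], (0, 1): [0.0, 1.0], (3, 3): [0.8, 0.2]}
fixed = {(5, 5): [0.9, 0.1], (5, 6): [0.1, 0.9]}

print(match_with_inverse_check((0, 0), moving, fixed))  # consistent match
print(match_with_inverse_check((3, 3), moving, fixed))  # rejected as inconsistent
```

The point (3, 3) maps forward to (5, 5), but (5, 5) maps back to (0, 0), so the pair is discarded before any transformation is fitted.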

Translated: 2024-04-26 18:31:49 Published: 2024-04-24

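# The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models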

The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models ( http://arxiv.org/abs/2404.16019v1 )

License: see link

Hannah Rose Kirk, Alexander Whitefield, Paul Röttger, Andrew Bean, Katerina Margatina, Juan Ciro, Rafael Mosquera, Max Bartolo, Adina Williams, He He, Bertie Vidgen, Scott A. Hale,

Human feedback plays a central role in the alignment of Large Language Models (LLMs). However, open questions remain about the methods (how), domains (where), people (who) and objectives (to what end) of human feedback collection. To navigate these questions, we introduce PRISM, a new dataset which maps the sociodemographics and stated preferences of 1,500 diverse participants from 75 countries, to their contextual preferences and fine-grained feedback in 8,011 live conversations with 21 LLMs. PRISM contributes (i) wide geographic and demographic participation in human feedback data; (ii) two census-representative samples for understanding collective welfare (UK and US); and (iii) individualised feedback where every rating is linked to a detailed participant profile, thus permitting exploration of personalisation and attribution of sample artefacts. We focus on collecting conversations that centre subjective and multicultural perspectives on value-laden and controversial topics, where we expect the most interpersonal and cross-cultural disagreement. We demonstrate the usefulness of PRISM via three case studies of dialogue diversity, preference diversity, and welfare outcomes, showing that it matters which humans set alignment norms. As well as offering a rich community resource, we advocate for broader participation in AI development and a more inclusive approach to technology design.

Translated: 2024-04-26 18:31:49 Published: 2024-04-24

# Universal Adversarial Triggers Are Not Universal

Universal Adversarial Triggers Are Not Universal ( http://arxiv.org/abs/2404.16020v1 )

License: see link

Nicholas Meade, Arkil Patel, Siva Reddy,

Recent work has developed optimization procedures to find token sequences, called adversarial triggers, which can elicit unsafe responses from aligned language models. These triggers are believed to be universally transferable, i.e., a trigger optimized on one model can jailbreak other models. In this paper, we concretely show that such adversarial triggers are not universal. We extensively investigate trigger transfer amongst 13 open models and observe inconsistent transfer. Our experiments further reveal a significant difference in robustness to adversarial triggers between models Aligned by Preference Optimization (APO) and models Aligned by Fine-Tuning (AFT). We find that APO models are extremely hard to jailbreak even when the trigger is optimized directly on the model. On the other hand, while AFT models may appear safe on the surface, exhibiting refusals to a range of unsafe instructions, we show that they are highly susceptible to adversarial triggers. Lastly, we observe that most triggers optimized on AFT models also generalize to new unsafe instructions from five diverse domains, further emphasizing their vulnerability. Overall, our work highlights the need for more comprehensive safety evaluations for aligned language models.

Translated: 2024-04-26 18:31:49 Published: 2024-04-24

# PuLID: Pure and Lightning ID Customization via Contrastive Alignment

PuLID: Pure and Lightning ID Customization via Contrastive Alignment ( http://arxiv.org/abs/2404.16022v1 )

License: see link

Zinan Guo, Yanze Wu, Zhuowei Chen, Lang Chen, Qian He,

We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method for text-to-image generation. By incorporating a Lightning T2I branch with a standard diffusion one, PuLID introduces both contrastive alignment loss and accurate ID loss, minimizing disruption to the original model and ensuring high ID fidelity. Experiments show that PuLID achieves superior performance in both ID fidelity and editability. Another attractive property of PuLID is that the image elements (e.g., background, lighting, composition, and style) before and after the ID insertion are kept as consistent as possible. Code and models will be available at https://github.com/ToTheBeginning/PuLID

Translated: 2024-04-26 18:31:49 Published: 2024-04-24

# Learning Car-Following Behaviors Using Bayesian Matrix Normal Mixture Regression

Learning Car-Following Behaviors Using Bayesian Matrix Normal Mixture Regression ( http://arxiv.org/abs/2404.16023v1 )

License: see link

Chengyuan Zhang, Kehua Chen, Meixin Zhu, Hai Yang, Lijun Sun,

Learning and understanding car-following (CF) behaviors are crucial for microscopic traffic simulation. Traditional CF models, though simple, often lack generalization capabilities, while many data-driven methods, despite their robustness, operate as "black boxes" with limited interpretability. To bridge this gap, this work introduces a Bayesian Matrix Normal Mixture Regression (MNMR) model that simultaneously captures feature correlations and temporal dynamics inherent in CF behaviors. This approach is distinguished by its separate learning of row and column covariance matrices within the model framework, offering an insightful perspective into the human driver decision-making processes. Through extensive experiments, we assess the model's performance across various historical steps of inputs, predictive steps of outputs, and model complexities. The results consistently demonstrate our model's adeptness in effectively capturing the intricate correlations and temporal dynamics present during CF. A focused case study further illustrates the model's superior interpretability in identifying distinct operational conditions through the learned mean and covariance matrices. This not only underlines our model's effectiveness in understanding complex human driving behaviors in CF scenarios but also highlights its potential as a tool for enhancing the interpretability of CF behaviors in traffic simulations and autonomous driving systems.

Translated: 2024-04-26 18:31:49 Published: 2024-04-24

# Birefringent spin-photon interface generates polarization entanglement

Birefringent spin-photon interface generates polarization entanglement ( http://arxiv.org/abs/2404.16025v1 )

License: see link

Nikita Leppenen, Dmitry S. Smirnov,

A spin-photon interface based on the luminescence of a singly charged quantum dot in a micropillar cavity allows for the creation of photonic entangled states. Current devices suffer from cavity birefringence, which limits the generation of spin-photon entanglement. In this paper, we theoretically study the light absorption and emission by the interface with an anisotropic cavity and derive the maximal excitation and spin-photon entanglement conditions. We show that a spin-photon concurrence equal to one and complete quantum dot population inversion can be reached for a micropillar cavity with any degree of birefringence by tuning the quantum dot resonance strictly between the cavity modes. This sweet spot is also valid for generating a multiphoton cluster state, as we demonstrate by calculating the three-tangle and fidelity with the maximally entangled state.

Translated: 2024-04-26 18:31:49 Published: 2024-04-24

# Editable Image Elements for Controllable Synthesis

Editable Image Elements for Controllable Synthesis ( http://arxiv.org/abs/2404.16029v1 )

License: see link

Jiteng Mu, Michaël Gharbi, Richard Zhang, Eli Shechtman, Nuno Vasconcelos, Xiaolong Wang, Taesung Park,

Diffusion models have made significant advances in text-guided synthesis tasks. However, editing user-provided images remains challenging, as the high-dimensional noise input space of diffusion models is not naturally suited for image inversion or spatial editing. In this work, we propose an image representation that promotes spatial editing of input images using a diffusion model. Concretely, we learn to encode an input into "image elements" that can faithfully reconstruct an input image. These elements can be intuitively edited by a user, and are decoded by a diffusion model into realistic images. We show the effectiveness of our representation on various image editing tasks, such as object resizing, rearrangement, dragging, de-occlusion, removal, variation, and image composition. Project page: https://jitengmu.github.io/Editable_Image_Elements/

Translated: 2024-04-26 18:22:04 Published: 2024-04-24

# MoDE: CLIP Data Experts via Clustering

MoDE: CLIP Data Experts via Clustering ( http://arxiv.org/abs/2404.16030v1 )

License: see link

Jiawei Ma, Po-Yao Huang, Saining Xie, Shang-Wen Li, Luke Zettlemoyer, Shih-Fu Chang, Wen-Tau Yih, Hu Xu,

The success of contrastive language-image pretraining (CLIP) relies on the supervision from the pairing between images and captions, which tends to be noisy in web-crawled data. We present Mixture of Data Experts (MoDE) and learn a system of CLIP data experts via clustering. Each data expert is trained on one data cluster, making it less sensitive to false-negative noise in other clusters. At inference time, we ensemble their outputs by applying weights determined through the correlation between task metadata and cluster conditions. To estimate the correlation precisely, the samples in one cluster should be semantically similar, but the number of data experts should still be reasonable for training and inference. As such, we consider the ontology in human language and propose to use fine-grained cluster centers to represent each data expert at a coarse-grained level. Experimental studies show that four CLIP data experts on ViT-B/16 outperform the ViT-L/14 by OpenAI CLIP and OpenCLIP on zero-shot image classification but with lower (<35%) training cost. Meanwhile, MoDE can train all data experts asynchronously and can flexibly include new data experts. The code is available at https://github.com/facebookresearch/MetaCLIP/tree/main/mode.
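The inference-time ensembling mentioned above can be sketched roughly as below. The abstract only says that the weights come from the correlation between task metadata and cluster conditions; using a softmax over dot-product similarity between a task embedding and cluster centers is an assumption of this sketch, as are all names in it.

```python
import math

def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ensemble_logits(task_embedding, cluster_centers, expert_logits, temperature=1.0):
    """Weight each expert's logits by how well its cluster matches the task."""
    weights = softmax([dot(task_embedding, c) for c in cluster_centers], temperature)
    n_classes = len(expert_logits[0])
    combined = [sum(w * logits[k] for w, logits in zip(weights, expert_logits))
                for k in range(n_classes)]
    return combined, weights

# Two hypothetical experts; the task embedding is closest to the first cluster.
combined, weights = ensemble_logits(
    task_embedding=[1.0, 0.0],
    cluster_centers=[[1.0, 0.0], [0.0, 1.0]],
    expert_logits=[[2.0, 0.0], [0.0, 2.0]],
)
print(weights, combined)
```

Because each expert is only consulted through its logits, experts can be trained separately and a new one can be added by appending a cluster center and a logit vector.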

Translated: 2024-04-26 18:22:04 Published: 2024-04-24

# Studying Large Language Model Behaviors Under Realistic Knowledge Conflicts

Studying Large Language Model Behaviors Under Realistic Knowledge Conflicts ( http://arxiv.org/abs/2404.16032v1 )

License: see link

Evgenii Kortukov, Alexander Rubinstein, Elisa Nguyen, Seong Joon Oh,

Retrieval-augmented generation (RAG) mitigates many problems of fully parametric language models, such as temporal degradation, hallucinations, and lack of grounding. In RAG, the model's knowledge can be updated from documents provided in context. This leads to cases of conflict between the model's parametric knowledge and the contextual information, where the model may not always update its knowledge. Previous work studied knowledge conflicts by creating synthetic documents that contradict the model's correct parametric answers. We present a framework for studying knowledge conflicts in a realistic setup. We update incorrect parametric knowledge using real conflicting documents. This reflects how knowledge conflicts arise in practice. In this realistic scenario, we find that knowledge updates fail less often than previously reported. In cases where the models still fail to update their answers, we find a parametric bias: the incorrect parametric answer appearing in context makes the knowledge update likelier to fail. These results suggest that the factual parametric knowledge of LLMs can negatively influence their reading abilities and behaviors. Our code is available at https://github.com/kortukov/realistic_knowledge_conflicts/.

Translated: 2024-04-26 18:22:04 Published: 2024-04-24

# Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM ( http://arxiv.org/abs/2404.16033v1 )

License: see link

Timin Gao, Peixian Chen, Mengdan Zhang, Chaoyou Fu, Yunhang Shen, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Xing Sun, Liujuan Cao, Rongrong Ji,

With the advent of large language models (LLMs) enhanced by the chain-of-thought (CoT) methodology, visual reasoning problems are usually decomposed into manageable sub-tasks and tackled sequentially with various external tools. However, such a paradigm faces the challenge of potential "determining hallucinations" in decision-making due to insufficient visual information and the limitation of low-level perception tools that fail to provide the abstract summaries necessary for comprehensive reasoning. We argue that converging visual context acquisition and logical reasoning is pivotal for tackling visual reasoning tasks. This paper delves into the realm of multimodal CoT to solve intricate visual reasoning tasks with multimodal large language models (MLLMs) and their cognitive capability. To this end, we propose an innovative multimodal CoT framework, termed Cantor, characterized by a perception-decision architecture. Cantor first acts as a decision generator and integrates visual inputs to analyze the image and problem, ensuring a closer alignment with the actual context. Furthermore, Cantor leverages the advanced cognitive functions of MLLMs to perform as multifaceted experts for deriving higher-level information, enhancing the CoT generation process. Our extensive experiments demonstrate the efficacy of the proposed framework, showing significant improvements in multimodal CoT performance across two complex visual reasoning datasets, without necessitating fine-tuning or ground-truth rationales. Project Page: https://ggg0919.github.io/cantor/

Translated: 2024-04-26 18:22:04 Published: 2024-04-24

# MaGGIe: Masked Guided Gradual Human Instance Matting

MaGGIe: Masked Guided Gradual Human Instance Matting ( http://arxiv.org/abs/2404.16035v1 )

License: see link

Chuong Huynh, Seoung Wug Oh, Abhinav Shrivastava, Joon-Young Lee,

Human matting is a foundational task in image and video processing, where human foreground pixels are extracted from the input. Prior works either improve the accuracy by additional guidance or improve the temporal consistency of a single instance across frames. We propose a new framework MaGGIe, Masked Guided Gradual Human Instance Matting, which predicts alpha mattes progressively for each human instance while maintaining the computational cost, precision, and consistency. Our method leverages modern architectures, including transformer attention and sparse convolution, to output all instance mattes simultaneously without exploding memory and latency. While keeping inference costs constant in the multiple-instance scenario, our framework achieves robust and versatile performance on our proposed synthesized benchmarks. Alongside higher-quality image and video matting benchmarks, we introduce a novel multi-instance synthesis approach based on publicly available sources to increase the generalization of models in real-world scenarios.

Translated: 2024-04-26 18:22:04 Published: 2024-04-24

# Validating Traces of Distributed Programs Against TLA+ Specifications

Validating Traces of Distributed Programs Against TLA+ Specifications ( http://arxiv.org/abs/2404.16075v1 )

License: see link

Horatiu Cirstea, Markus A. Kuppe, Benjamin Loillier, Stephan Merz,

TLA+ is a formal language for specifying systems, including distributed algorithms, that is supported by powerful verification tools. In this work we present a framework for relating traces of distributed programs to high-level specifications written in TLA+. The problem is reduced to a constrained model checking problem, realized using the TLC model checker. Our framework consists of an API for instrumenting Java programs in order to record traces of executions, of a collection of TLA+ operators that are used for relating those traces to specifications, and of scripts for running the model checker. Crucially, traces only contain updates to specification variables rather than full values, and it is not necessary to provide values for all variables. We have applied our approach to several distributed programs, detecting discrepancies between the specifications and the implementations in all cases. We discuss reasons for these discrepancies, how to interpret the verdict produced by TLC, and how to take into account the results of trace validation for implementation development.

Translated: 2024-04-26 18:22:04 Published: 2024-04-24

# Semantic Evolvement Enhanced Graph Autoencoder for Rumor Detection

Semantic Evolvement Enhanced Graph Autoencoder for Rumor Detection ( http://arxiv.org/abs/2404.16076v1 )

License: see link

Xiang Tao, Liang Wang, Qiang Liu, Shu Wu, Liang Wang,

Due to the rapid spread of rumors on social media, rumor detection has become an extremely important challenge. Recently, numerous rumor detection models which utilize textual information and the propagation structure of events have been proposed. However, these methods overlook the importance of the semantic evolvement information of events in the propagation process, which is often difficult to learn in supervised training paradigms and traditional rumor detection methods. To address this issue, we propose a novel semantic evolvement enhanced Graph Autoencoder for Rumor Detection (GARD) model in this paper. The model learns semantic evolvement information of events by capturing local semantic changes and global semantic evolvement information through specific graph autoencoder and reconstruction strategies. By combining semantic evolvement information and propagation structure information, the model achieves a comprehensive understanding of event propagation and performs accurate and robust detection, while also detecting rumors earlier by capturing semantic evolvement information in the early stages. Moreover, in order to enhance the model's ability to learn the distinct patterns of rumors and non-rumors, we introduce a uniformity regularizer to further improve the model's performance. Experimental results on three public benchmark datasets confirm the superiority of our GARD method over the state-of-the-art approaches in both overall performance and early rumor detection.

Translated: 2024-04-26 18:22:04 Published: 2024-04-24

# Supercompiler Code Optimization with Zero-Shot Reinforcement Learning

Supercompiler Code Optimization with Zero-Shot Reinforcement Learning ( http://arxiv.org/abs/2404.16077v1 )

License: see link

Jialong Wu, Chaoyi Deng, Jianmin Wang, Mingsheng Long,

Effective code optimization in compilers plays a central role in computer and software engineering. While compilers can be made to automatically search the optimization space without the need for user interventions, this is not a standard practice since the search is slow and cumbersome. Here we present CodeZero, an artificial intelligence agent trained extensively on large data to produce effective optimization strategies instantly for each program in a single trial of the agent. To overcome the huge range of possible test programs, we prepare a large dataset of training programs that emphasize quality, naturalness, and diversity. To tackle the vast space of possible optimizations, we adapt deep reinforcement learning to train the agent in a sample-efficient manner through interacting with a world model of the compiler environment. Evaluation on both benchmark suites and production-level code optimization problems demonstrates our agent's supercompiler performance and zero-shot generalization abilities, outperforming built-in optimization options designed by compiler experts. Our methodology kindles the great potential of artificial intelligence for engineering and paves the way for scaling machine learning techniques in the realm of code optimization.

Translated: 2024-04-26 18:22:04 Published: 2024-04-24

# Learning World Models With Hierarchical Temporal Abstractions: A Probabilistic Perspective

Learning World Models With Hierarchical Temporal Abstractions: A Probabilistic Perspective ( http://arxiv.org/abs/2404.16078v1 )

License: see link

Vaisakh Shaj,

Machines that can replicate human intelligence with type 2 reasoning capabilities should be able to reason at multiple levels of spatio-temporal abstractions and scales using internal world models. Devising formalisms to develop such internal world models, which accurately reflect the causal hierarchies inherent in the dynamics of the real world, is a critical research challenge in the domains of artificial intelligence and machine learning. This thesis identifies several limitations with the prevalent use of state space models (SSMs) as internal world models and proposes two new probabilistic formalisms, namely Hidden-Parameter SSMs and Multi-Time Scale SSMs, to address these drawbacks. The structure of graphical models in both formalisms facilitates scalable exact probabilistic inference using belief propagation, as well as end-to-end learning via backpropagation through time. This approach permits the development of scalable, adaptive hierarchical world models capable of representing nonstationary dynamics across multiple temporal abstractions and scales. Moreover, these probabilistic formalisms integrate the concept of uncertainty in world states, thus improving the system's capacity to emulate the stochastic nature of the real world and quantify the confidence in its predictions. The thesis also discusses how these formalisms are in line with related neuroscience literature on the Bayesian brain hypothesis and predictive processing. Our experiments on various real and simulated robots demonstrate that our formalisms can match and in many cases exceed the performance of contemporary transformer variants in making long-range future predictions. We conclude the thesis by reflecting on the limitations of our current models and suggesting directions for future research.

Translated: 2024-04-26 18:12:21 Published: 2024-04-24

# 反射共焦点顕微鏡のAI駆動解析による診断の強化

Enhancing Diagnosis through AI-driven Analysis of Reflectance Confocal Microscopy ( http://arxiv.org/abs/2404.16080v1 )

ライセンス: Link先を確認

Hong-Jun Yoon, Chris Keum, Alexander Witkowski, Joanna Ludzik, Tracy Petrie, Heidi A. Hanson, Sancy A. Leachman,

(参考訳) 反射共焦点顕微鏡(英: Reflectance Confocal Microscopy、RCM)は、生体医学研究や臨床皮膚学で用いられる非侵襲的イメージング技術である。皮膚と表皮組織の高解像度画像を仮想的に提供し、物理的生検の必要性を減らす。 RCMはレーザー光源を用いて組織を照明し、反射した光を捉え、様々な深さの顕微鏡構造の詳細画像を生成する。近年の研究では、RCM画像の解析のためのAIと機械学習、特にCNNについて研究されている。本研究は, 臨床上重要な領域を同定し, 皮膚科医に効果的な画像解釈と診断信頼性を高めるためのセグメンテーション戦略を提案する。このアプローチは皮膚科の診断と治療を進めることを約束する。

Reflectance Confocal Microscopy (RCM) is a non-invasive imaging technique used in biomedical research and clinical dermatology. It provides virtual high-resolution images of the skin and superficial tissues, reducing the need for physical biopsies. RCM employs a laser light source to illuminate the tissue, capturing the reflected light to generate detailed images of microscopic structures at various depths. Recent studies explored AI and machine learning, particularly CNNs, for analyzing RCM images. Our study proposes a segmentation strategy based on textural features to identify clinically significant regions, empowering dermatologists in effective image interpretation and boosting diagnostic confidence. This approach promises to advance dermatological diagnosis and treatment.

翻訳日:2024-04-26 18:12:20 公開日:2024-04-24

# 適応量子回路を用いた行列積状態の定数深さ準備

Constant-depth preparation of matrix product states with adaptive quantum circuits ( http://arxiv.org/abs/2404.16083v1 )

ライセンス: Link先を確認

Kevin C. Smith, Abid Khan, Bryan K. Clark, S. M. Girvin, Tzu-Chieh Wei,

(参考訳) 局所的なユニタリゲート、ミッドサーキット測定、フィードフォワード演算を組み合わせた適応量子回路は、特に浅い深さの回路に制限された短期量子デバイスにおいて、効率的な状態準備のための有望な経路として最近登場した。行列積状態 (MPS) は多体交絡状態の重要なクラスを構成し、一次元のギャップを持つ局所ハミルトニアンの基底状態を効率的に記述し、近年の多くの量子アルゴリズムにおける応用を見つける。近年、MPSのパラダイム的な例であるAKLT状態は、非ゼロ相関長(Smith et al , PRX Quantum 4, 020315 (2023))による局所的なユニタリゲートの適応量子回路で正確に準備できることが示されている。本研究は,本手法の範囲を広くし,一元回路のみに依存する最適準備プロトコルよりも高い精度で,多種多様なMPSを一定深度適応量子回路で正確に作成できることを実証する。このクラスは、短距離および長距離の絡み合ったMPS、対称性保護トポロジカル(SPT)および対称性破壊状態、有限アベリア、非アベリアおよび連続対称性を持つMPS、MBQCの資源状態、調整可能な相関長を持つ状態の族を含むことを示す。さらに、ランダムMPSや特定のSPTフェーズでMPSを生成するような、一定の深さのサンプリングプロトコルを設計するためのフレームワークの有用性について述べる。我々は、特定のMPSが一定時間で準備できる十分な条件を示し、グローバルなオンサイト対称性が中心的な役割を果たす。この研究は、多体絡み合った状態を効率的に準備するための適応量子回路の膨大な可能性を実証し、既知のプロトコルより優れた明示的なアルゴリズムを提供し、重要な種類の状態を作成する。

Adaptive quantum circuits, which combine local unitary gates, midcircuit measurements, and feedforward operations, have recently emerged as a promising avenue for efficient state preparation, particularly on near-term quantum devices limited to shallow-depth circuits. Matrix product states (MPS) comprise a significant class of many-body entangled states, efficiently describing the ground states of one-dimensional gapped local Hamiltonians and finding applications in a number of recent quantum algorithms. Recently, it was shown that the AKLT state -- a paradigmatic example of an MPS -- can be exactly prepared with an adaptive quantum circuit of constant-depth, an impossible feat with local unitary gates due to its nonzero correlation length [Smith et al., PRX Quantum 4, 020315 (2023)]. In this work, we broaden the scope of this approach and demonstrate that a diverse class of MPS can be exactly prepared using constant-depth adaptive quantum circuits, outperforming optimal preparation protocols that rely on unitary circuits alone. We show that this class includes short- and long-ranged entangled MPS, symmetry-protected topological (SPT) and symmetry-broken states, MPS with finite Abelian, non-Abelian, and continuous symmetries, resource states for MBQC, and families of states with tunable correlation length. Moreover, we illustrate the utility of our framework for designing constant-depth sampling protocols, such as for random MPS or for generating MPS in a particular SPT phase. We present sufficient conditions for particular MPS to be preparable in constant time, with global on-site symmetry playing a pivotal role. Altogether, this work demonstrates the immense promise of adaptive quantum circuits for efficiently preparing many-body entangled states and provides explicit algorithms that outperform known protocols to prepare an essential class of states.
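
To make the term "matrix product state" concrete, the sketch below evaluates amplitudes of the unnormalized GHZ state, a standard long-range entangled MPS with bond dimension 2. It only illustrates the MPS representation and does not implement the adaptive preparation protocol from the abstract; the names `GHZ_TENSORS` and `mps_amplitude` are hypothetical.

```python
def matmul(x, y):
    """Multiply two square matrices given as nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Bond-dimension-2 tensors of the unnormalized GHZ state, indexed by the
# physical value (0 or 1) of a site.
GHZ_TENSORS = {0: [[1, 0], [0, 0]], 1: [[0, 0], [0, 1]]}


def mps_amplitude(bits, tensors=GHZ_TENSORS):
    """Amplitude of a basis state for a periodic, translation-invariant MPS:
    the trace of the product of one matrix per site."""
    dim = len(next(iter(tensors.values())))
    product = [[int(i == j) for j in range(dim)] for i in range(dim)]
    for b in bits:
        product = matmul(product, tensors[b])
    return sum(product[i][i] for i in range(dim))
```

Only the all-zeros and all-ones bitstrings receive a nonzero amplitude, which is the defining feature of the GHZ state.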

翻訳日:2024-04-26 18:12:20 公開日:2024-04-24

# 確率量子制御の統計力学:$d$-adic Rényi回路

Statistical Mechanics of Stochastic Quantum Control: $d$-adic Rényi Circuits ( http://arxiv.org/abs/2404.16087v1 )

ライセンス: Link先を確認

Andrew A. Allocca, Conner LeMaire, Thomas Iadecola, Justin H. Wilson,

(参考訳) ヒルベルト空間次元が大きい多体系における量子情報の力学は、効果的な統計力学モデルの観点からの啓蒙的な記述を許容する。この事実に動機付けられて、古典的にカオス的な$d$-adic R\'{e}nyi写像と確率的制御、この写像の量子的類似物、ランダムグラフ上のポッツモデルという3つの異なるモデルの間の関係を明らかにする。古典的モデルとその量子アナログは、システムを順序付けしようとするランダムに応用された制御マップによって駆動されるカオスと制御された位相間の遷移を共有する。量子モデルでは、制御マップは、深夜定常状態の絡み合い内容の相転移を同時に駆動する測定を必要とする。制御と絡み合いの遷移の相互作用を探索するため、量子モデルから有効なポッツモデルを導出し、両方の遷移を目撃する情報理論量の探索に利用する。絡み合い遷移は、他の測定誘起相転移と一致し、ボンド-パーコレーション普遍性クラスに属するが、制御遷移は古典的なランダムウォークによって支配される。これら2つの相転移はモデルパラメータの関数として融合し、以前の量子モデルの小さな数値的研究で観測された挙動と一致する。

The dynamics of quantum information in many-body systems with large onsite Hilbert space dimension admits an enlightening description in terms of effective statistical mechanics models. Motivated by this fact, we reveal a connection between three separate models: the classically chaotic $d$-adic R\'{e}nyi map with stochastic control, a quantum analog of this map for qudits, and a Potts model on a random graph. The classical model and its quantum analog share a transition between chaotic and controlled phases, driven by a randomly applied control map that attempts to order the system. In the quantum model, the control map necessitates measurements that concurrently drive a phase transition in the entanglement content of the late-time steady state. To explore the interplay of the control and entanglement transitions, we derive an effective Potts model from the quantum model and use it to probe information-theoretic quantities that witness both transitions. The entanglement transition is found to be in the bond-percolation universality class, consistent with other measurement-induced phase transitions, while the control transition is governed by a classical random walk. These two phase transitions merge as a function of model parameters, consistent with behavior observed in previous small-size numerical studies of the quantum model.
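
The classical side of this setup can be sketched as follows: a chaotic step $x \mapsto d\,x \bmod 1$ competes with a randomly applied control step that contracts toward a target point. The contraction factor $1/d$, the target `x_target`, and the function name are assumptions made for this sketch and may differ from the paper's control map.

```python
import random


def run_controlled_map(d=3, p=0.5, steps=200, x_target=0.0, seed=0):
    """Iterate the d-adic map x -> d*x mod 1, replaced with probability p
    by a contraction toward x_target. Returns the final value of x."""
    rng = random.Random(seed)
    x = rng.random()
    for _ in range(steps):
        if rng.random() < p:
            # Control step: close a fraction (1 - 1/d) of the gap to the target.
            x = x_target + (x - x_target) / d
        else:
            # Chaotic step: nearby points separate by a factor of d.
            x = (d * x) % 1.0
    return x
```

Sweeping `p` between 0 and 1 and recording how often the trajectory ends near `x_target` gives a crude picture of the chaotic and controlled phases.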

翻訳日:2024-04-26 18:12:20 公開日:2024-04-24

# 行列積状態経路積分から$J_1$-$J_2$鎖の臨界理論への一般化されたハルデン写像

A Generalised Haldane Map from the Matrix Product State Path Integral to the Critical Theory of the $J_1$-$J_2$ Chain ( http://arxiv.org/abs/2404.16088v1 )

ライセンス: Link先を確認

F. Azad, Adam J. McRoberts, Chris Hooley, A. G. Green,

(参考訳) 行列積状態 (MPS) 上に構築された経路積分を用いて, J_1$-$J_2$ spin-$1/2$ 鎖について検討した。非自明な絡み合い構造により、MPSアンザッツは半古典的、サドル点レベルでもモデルの鍵位相を捉え、変分状態として、アーベルボゾン化によって得られる場の理論とよく一致する。半古典的なレベルを超えて、MPSアンザッツは臨界相の場理論の物理的動機付けによる導出を促進することを示し、連続極限(ハルデン写像の一般化)を慎重に取り込むことで、MPSパスから正しい位相項を持つ場理論と創発的な$SO(4)$対称性を積分し、顕微鏡状態と位相場理論構造を包含する。さらに、二量体遷移は、特にMPSの定式化において明らかであり、明示的な二量体ポテンシャルが関連し、磁気的ゆらぎを逸脱する。

We study the $J_1$-$J_2$ spin-$1/2$ chain using a path integral constructed over matrix product states (MPS). By virtue of its non-trivial entanglement structure, the MPS ansatz captures the key phases of the model even at a semi-classical, saddle-point level, and, as a variational state, is in good agreement with the field theory obtained by abelian bosonisation. Going beyond the semi-classical level, we show that the MPS ansatz facilitates a physically-motivated derivation of the field theory of the critical phase: by carefully taking the continuum limit -- a generalisation of the Haldane map -- we recover from the MPS path integral a field theory with the correct topological term and emergent $SO(4)$ symmetry, constructively linking the microscopic states and topological field-theoretic structures. Moreover, the dimerisation transition is particularly clear in the MPS formulation -- an explicit dimerisation potential becomes relevant, gapping out the magnetic fluctuations.
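
For reference, the $J_1$-$J_2$ chain is conventionally defined by the Hamiltonian below. This is the standard textbook form; the notation $\hat{\mathbf{S}}_i$ for the spin-$1/2$ operator on site $i$ is ours and is not taken from the abstract.

$$
H = J_1 \sum_i \hat{\mathbf{S}}_i \cdot \hat{\mathbf{S}}_{i+1} + J_2 \sum_i \hat{\mathbf{S}}_i \cdot \hat{\mathbf{S}}_{i+2}
$$

Here $J_1$ couples nearest neighbours and $J_2$ couples next-nearest neighbours. For antiferromagnetic couplings the chain is critical at small $J_2/J_1$ and dimerised at larger ratios, with numerical studies placing the transition near $J_2/J_1 \approx 0.24$.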

翻訳日:2024-04-26 18:12:20 公開日:2024-04-24

# 測定誘起遷移近傍の長距離多部絡み合い

Long-range multipartite entanglement near measurement-induced transitions ( http://arxiv.org/abs/2404.16095v1 )

ライセンス: Link先を確認

Sebastien J Avakian, T. Pereg-Barnea, William Witczak-Krempa,

(参考訳) 測定は量子システムに大きな影響を与え、平衡から新しい物質の状態を作るのに使用できる。ここでは、ユニタリと測定を含む量子回路に現れる多粒子絡み構造について検討する。測定値とユニタリ進化のバランスが,非監視系よりもはるかに広い距離に分散し,通常の絡み合いの運命を回避できることを示す。本研究では,分散グラフに基づくグラフィカル表現を導入し,一般的な部分領域に対する真のマルチパート・エンタングルメントの進化を推測する。 1次元計測によって誘起される動的相転移を実現する回路について,本研究で得られた知見を例証する。 2件と4件のケースも例によってカバーされている。最後に,我々のアプローチが量子回路とアーキテクチャの幅広いクラスに対して,絡み合いのダイナミクスに関する基本的な知見を提供する方法について論じる。

Measurements profoundly impact quantum systems, and can be used to create new states of matter out of equilibrium. Here, we investigate the multipartite entanglement structure that emerges in quantum circuits involving unitaries and measurements. We describe how a balance between measurements and unitary evolution can lead to multipartite entanglement spreading to distances far greater than what is found in non-monitored systems, thus evading the usual fate of entanglement. We introduce a graphical representation based on spanning graphs that allows one to infer the evolution of genuine multipartite entanglement for general subregions. We exemplify our findings on circuits that realize a 1d measurement-induced dynamical phase transition, where we find genuine 3-party entanglement at all separations. The 2- and 4-party cases are also covered with examples. Finally, we discuss how our approach can provide fundamental insights regarding entanglement dynamics for a wide class of quantum circuits and architectures.

翻訳日:2024-04-26 18:12:20 公開日:2024-04-24

# 半古典ブラックホールの相対的状態カウント

Relative state-counting for semiclassical black holes ( http://arxiv.org/abs/2404.16098v1 )

ライセンス: Link先を確認

Chris Akers, Jonathan Sorce,

(参考訳) 摂動量子重力の特定の状態間のエントロピー差は、紫外線の完了を指定せずに計算できることが示されている。これは、エントロピー差が定義されるが絶対エントロピーが定義されない古典的な統計力学の状況と類似している。しかし、古典的な統計力学とは異なり、摂動量子重力で計算されるエントロピー差は明確な物理的解釈を持っていない。ここでは、エントロピー差を状態の相対的数え上げと解釈できる摂動ブラックホール状態の族を構築する。概念的には、この論文は固定されたブラックホール背景の質量ゆらぎの代数から始まり、これはI型代数であるが、これは因子ではなく、従ってエントロピーの標準的定義を持たないことを指摘している。以前の研究と同様に、質量ゆらぎと量子物質を結合することは、エントロピー差(絶対エントロピーではない)がよく定義されるタイプIIに質量代数を埋め込む。すると、質量ゆらぎのマイクロカノニカル波動関数の場合、タイプIIエントロピー差はゲージ不変ユニタリを用いて1つのマイクロカノニカル窓をもう1つにマップするのに必要となる余剰ヒルベルト空間の次元の対数と等しいことが示される。この論文は、フォン・ノイマンのエントロピー差は物理的解釈を持たないが、「ワンショット」エントロピー差は成立する、より一般的な状態のクラスにおけるタイプIIエントロピー差に関するコメントで締めくくっている。

It has been shown that entropy differences between certain states of perturbative quantum gravity can be computed without specifying an ultraviolet completion. This is analogous to the situation in classical statistical mechanics, where entropy differences are defined but absolute entropy is not. Unlike in classical statistical mechanics, however, the entropy differences computed in perturbative quantum gravity do not have a clear physical interpretation. Here we construct a family of perturbative black hole states for which the entropy difference can be interpreted as a relative counting of states. Conceptually, this paper begins with the algebra of mass fluctuations around a fixed black hole background, and points out that while this is a type I algebra, it is not a factor and therefore has no canonical definition of entropy. As in previous work, coupling the mass fluctuations to quantum matter embeds the mass algebra within a type II factor, in which entropy differences (but not absolute entropies) are well defined. It is then shown that for microcanonical wavefunctions of mass fluctuation, the type II entropy difference equals the logarithm of the dimension of the extra Hilbert space that is needed to map one microcanonical window to another using gauge-invariant unitaries. The paper closes with comments on type II entropy difference in a more general class of states, where the von Neumann entropy difference does not have a physical interpretation, but "one-shot" entropy differences do.

翻訳日:2024-04-26 18:12:20 公開日:2024-04-24

# 多変量忠実度

Multivariate Fidelities ( http://arxiv.org/abs/2404.16101v1 )

ライセンス: Link先を確認

Theshani Nuradha, Hemant K. Mishra, Felix Leditzky, Mark M. Wilde,

(参考訳) 本稿の主な貢献は、いくつかの多変量量子忠実度を導入し、それらがウルマン忠実度およびホレボ忠実度の性質を自然に拡張した望ましい性質を満たすことを示すことである。可換な状態に対しては平均ペアワイズ忠実度に帰着する3つの変種、すなわち平均ペアワイズ$z$-忠実度、多変量半正定値計画(SDP)忠実度、および既存の秘匿性尺度に着想を得た多変量忠実度を提案する。 2つ目は、ウルマン忠実度のSDP定式化を3つ以上の状態に拡張することで得られる。これら3つの変種はすべて以下の性質を満たす。 (i)可換な状態に対して多変量古典忠実度に帰着すること、 (ii)データ処理不等式、 (iii)状態の置換に対する不変性、 (iv)値が区間$[0,1]$に収まること、忠実性(値が1に等しいのは、すべての状態が等しいとき、かつそのときに限る)、および直交性(値が0に等しいのは、状態が互いに直交するとき、かつそのときに限る)、 (v)直和性、 (vi)同時凹性、 (vii)一定の条件下での一様連続性の限界。さらに、これらの異なる変種を関係づける不等式を確立し、可換な状態に対してはこれらすべての定義が平均ペアワイズ忠実度と一致することを明確にする。最後に、松下の多変量忠実度の量子一般化である、多変量log-Euclidean忠実度と呼ぶ別の多変量忠実度を導入する。また、これが上述の望ましい性質のほとんどを満たすこと、多変量log-Euclideanダイバージェンスの関数であること、任意に変化する帰無仮説を伴う量子仮説検定による操作的解釈を持つことを示す。

The main contribution of our paper is to introduce a number of multivariate quantum fidelities and show that they satisfy several desirable properties that are natural extensions of those of the Uhlmann and Holevo fidelities. We propose three variants that reduce to the average pairwise fidelity for commuting states: average pairwise $z$-fidelities, the multivariate semi-definite programming (SDP) fidelity, and a multivariate fidelity inspired by an existing secrecy measure. The second one is obtained by extending the SDP formulation of the Uhlmann fidelity to more than two states. All three of these variants satisfy the following properties: (i) reduction to multivariate classical fidelities for commuting states, (ii) the data-processing inequality, (iii) invariance under permutations of the states, (iv) its values are in the interval $[0,1]$; they are faithful, that is, their values are equal to one if and only if all the states are equal, and they satisfy orthogonality, that is their values are equal to zero if and only if the states are mutually orthogonal to each other, (v) direct-sum property, (vi) joint concavity, and (vii) uniform continuity bounds under certain conditions. Furthermore, we establish inequalities relating these different variants, indeed clarifying that all these definitions coincide with the average pairwise fidelity for commuting states. Lastly, we introduce another multivariate fidelity called multivariate log-Euclidean fidelity, which is a quantum generalization of the Matusita multivariate fidelity. We also show that it satisfies most of the desirable properties listed above, it is a function of a multivariate log-Euclidean divergence, and has an operational interpretation in terms of quantum hypothesis testing with an arbitrarily varying null hypothesis.
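
For commuting states the quantities above reduce to a classical average pairwise fidelity, which can be sketched for probability vectors as below. The squared-overlap convention for the two-distribution fidelity is an assumption here (some authors omit the square), and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def classical_fidelity(p, q):
    """Squared Bhattacharyya overlap of two probability vectors."""
    return sum(sqrt(a * b) for a, b in zip(p, q)) ** 2


def average_pairwise_fidelity(distributions):
    """Mean of classical_fidelity over all unordered pairs of distributions."""
    pairs = list(combinations(distributions, 2))
    return sum(classical_fidelity(p, q) for p, q in pairs) / len(pairs)
```

With this convention the value is 1 when all distributions coincide and 0 when their supports are pairwise disjoint, mirroring the faithfulness and orthogonality properties listed in the abstract.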

翻訳日:2024-04-26 18:12:20 公開日:2024-04-24

# フランスの視聴覚メディアにおける性別・年齢別の声の変遷:通時的観点から

Evolution of Voices in French Audiovisual Media Across Genders and Age in a Diachronic Perspective ( http://arxiv.org/abs/2404.16104v1 )

ライセンス: Link先を確認

Albert Rilliard, David Doukhan, Rémi Uro, Simon Devauchelle,

(参考訳) 本稿では、フランスのメディアアーカイブから得た1023人の話者の声について通時的な音響分析を行った。話者は、4つの期間(1955/56年、1975/76年、1995/96年、2015/16年)、4つの年齢グループ(20-35歳、36-50歳、51-65歳、65歳超)、2つの性別に基づく32のカテゴリーに分かれている。基本周波数($F_0$)と第1〜第4フォルマント(F1-4)を推定した。不均質なデータに対してこれらの推定の質を保証するために用いた手順について述べる。各話者の$F_0$分布からbase-$F_0$値を計算し、声域(レジスター)を推定した。フォルマント周波数から平均声道長を推定した。 base-$F_0$と声道長に線形混合モデルを当てはめ、年齢の効果を補正したうえで、期間と性別による変化を評価した。その結果、性別によらず、声が低くなる傾向を示す期間の効果が認められた。加齢に伴うピッチの低下は女性話者では観察されるが、男性話者では観察されない。

We present a diachronic acoustic analysis of the voice of 1023 speakers from French media archives. The speakers are spread across 32 categories based on four periods (years 1955/56, 1975/76, 1995/96, 2015/16), four age groups (20-35; 36-50; 51-65, >65), and two genders. The fundamental frequency ($F_0$) and the first four formants (F1-4) were estimated. Procedures used to ensure the quality of these estimations on heterogeneous data are described. From each speaker's $F_0$ distribution, the base-$F_0$ value was calculated to estimate the register. Average vocal tract length was estimated from formant frequencies. Base-$F_0$ and vocal tract length were fit by linear mixed models to evaluate how they may have changed across time periods and genders, corrected for age effects. Results show an effect of the period with a tendency to lower voices, independently of gender. A lowering of pitch is observed with age for female but not male speakers.
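
One common way to estimate vocal tract length from formants, sketched below, treats the tract as a uniform tube closed at the glottis and open at the lips. The abstract does not say which estimator the authors used, so the formula choice, the speed-of-sound value, and the function name are all assumptions.

```python
def vocal_tract_length_cm(formants_hz, speed_of_sound_cm_s=35000.0):
    """Average the per-formant estimates L_k = (2k - 1) * c / (4 * F_k)
    of a uniform tube that is closed at one end and open at the other."""
    estimates = [
        (2 * k - 1) * speed_of_sound_cm_s / (4.0 * f)
        for k, f in enumerate(formants_hz, start=1)
    ]
    return sum(estimates) / len(estimates)


# Idealized neutral-vowel formants F1-F4 in Hz (illustrative values).
length = vocal_tract_length_cm([500.0, 1500.0, 2500.0, 3500.0])
```

For these idealized formants every per-formant estimate agrees, and the result is 17.5 cm.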

翻訳日:2024-04-26 18:12:20 公開日:2024-04-24

# 時間ビン符号化フォトニック量子情報プロトコルのロバストなアプローチ

A robust approach for time-bin encoded photonic quantum information protocols ( http://arxiv.org/abs/2404.16106v1 )

ライセンス: Link先を確認

Simon J. U. White, Emanuele Polino, Farzad Ghafari, Dominick J. Joch, Luis Villegas-Aguilar, Lynden K. Shalm, Varun B. Verma, Marcus Huber, Nora Tischler,

(参考訳) 光子の時間ビン自由度に符号化された量子状態は、量子情報プロトコルの基本的なリソースである。時間ビン符号化された量子状態を生成・測定する従来の方法は、光学的不安定性、複雑なセットアップ、時間分解能の要求により深刻な課題に直面している。ここでは、Hong-Ou-Mandel干渉に基づくロバストなアプローチを活用し、これらの問題を回避する。まず、時間間隔の短い時間ビン量子ビットに対して高忠実度の量子状態トモグラフィーを行う。次に、非古典性テストにより、単一光子の系内(intrasystem)偏光・時間エンタングルメントを認証する。最後に、単一の空間モードで高次元の時間ビン量子状態を生成・測定する、ロバストでスケーラブルなプロトコルを提案する。このプロトコルは、標準的な方式では事実上アクセスできない高次元の状態やタスクへのアクセスを可能にし、それによって基礎的な量子情報科学を前進させ、量子通信への応用を開くことが期待される。

Quantum states encoded in the time-bin degree of freedom of photons represent a fundamental resource for quantum information protocols. Traditional methods for generating and measuring time-bin encoded quantum states face severe challenges due to optical instabilities, complex setups, and timing resolution requirements. Here, we leverage a robust approach based on Hong-Ou-Mandel interference that allows us to circumvent these issues. First, we perform high-fidelity quantum state tomographies of time-bin qubits with a short temporal separation. Then, we certify intrasystem polarization-time entanglement of single photons through a nonclassicality test. Finally, we propose a robust and scalable protocol to generate and measure high-dimensional time-bin quantum states in a single spatial mode. The protocol promises to enable access to high-dimensional states and tasks that are practically inaccessible with standard schemes, thereby advancing fundamental quantum information science and opening applications in quantum communication.

翻訳日:2024-04-26 18:12:20 公開日:2024-04-24

# 量子プロセッサにおける非安定化性の証明

Certifying nonstabilizerness in quantum processors ( http://arxiv.org/abs/2404.16107v1 )

ライセンス: Link先を確認

Rafael Wagner, Filipa C. R. Peres, Emmanuel Zambrini Cruzeiro, Ernesto F. Galvão,

(参考訳) 非安定化器性(nonstabilizerness)はマジックとも呼ばれ、量子計算において重要な資源である。量子処理ユニット(QPU)の複雑さの増大に伴い、このリソースを特徴づけるためのロバストでスケーラブルな技術が求められている。我々は集合マジック(set magic)の概念を導入する。状態の集合は、その中の少なくとも1つの状態が非安定化状態である場合に、この性質を持つ。最近、基底非依存コヒーレンスの証拠(witness)として導入されたある種の2状態重なり不等式が、多量子ビットの集合マジックの証拠にもなることを示す。また、複数のQPUにまたがるマジックの存在を、QPU間のエンタングルメントを必要とせずに認証でき、個々のQPUに対する要求を減らせることも示す。

Nonstabilizerness, also known as magic, is a crucial resource for quantum computation. The growth in complexity of quantum processing units (QPUs) demands robust and scalable techniques for characterizing this resource. We introduce the notion of set magic: a set of states has this property if at least one state in the set is a non-stabilizer state. We show that certain two-state overlap inequalities, recently introduced as witnesses of basis-independent coherence, are also witnesses of multi-qubit set magic. We also show it is possible to certify the presence of magic across multiple QPUs without the need for entanglement between them and reducing the demands on each individual QPU.

翻訳日:2024-04-26 18:12:20 公開日:2024-04-24

# zkLLM: 大規模言語モデルのためのゼロ知識証明

zkLLM: Zero Knowledge Proofs for Large Language Models ( http://arxiv.org/abs/2404.16109v1 )

ライセンス: Link先を確認

Haochen Sun, Jason Li, Hongyang Zhang,

(参考訳) 近年の人工知能(AI)の急速な発展は、大規模言語モデル(LLM)の台頭に特徴づけられ、世界中に根本的な変化をもたらした。しかし、これらの進歩とともに、LLMの正当性をめぐる懸念が高まり、その広範な応用に法的な課題が生じている。さらに、LLMのパラメータはしばしば知的財産として扱われ、直接の調査が制限されることが、これらの懸念を一層深刻にしている。本研究では、AI法制の領域における根本的な課題、すなわちLLMが生成する出力の真正性を確立する必要性に取り組む。この問題に対処するため、我々の知る限りLLMに特化した初のゼロ知識証明であるzkLLMを提案する。深層学習における非算術演算という長年の課題に対して、非算術的なテンソル演算のための並列化されたルックアップ引数であるtlookupを導入し、漸近的なオーバーヘッドのない解決策を提供する。さらに、tlookupを基礎として、注意機構に特化したゼロ知識証明であるzkAttnを導入し、実行時間、メモリ使用量、精度のバランスを慎重にとる。完全に並列化されたCUDA実装により、zkLLMはLLM上で効率的なゼロ知識検証可能計算を実現するための重要な一歩となる。特筆すべきことに、130億パラメータのLLMに対して、我々の手法は推論プロセス全体の正当性証明を15分未満で生成できる。得られる証明は200kB未満とコンパクトであり、モデルパラメータのプライバシーを保ち、意図しない情報漏洩が生じないように設計されている。

The recent surge in artificial intelligence (AI), characterized by the prominence of large language models (LLMs), has ushered in fundamental transformations across the globe. However, alongside these advancements, concerns surrounding the legitimacy of LLMs have grown, posing legal challenges to their extensive applications. Compounding these concerns, the parameters of LLMs are often treated as intellectual property, restricting direct investigations. In this study, we address a fundamental challenge within the realm of AI legislation: the need to establish the authenticity of outputs generated by LLMs. To tackle this issue, we present zkLLM, which stands as the inaugural specialized zero-knowledge proof tailored for LLMs to the best of our knowledge. Addressing the persistent challenge of non-arithmetic operations in deep learning, we introduce tlookup, a parallelized lookup argument designed for non-arithmetic tensor operations in deep learning, offering a solution with no asymptotic overhead. Furthermore, leveraging the foundation of tlookup, we introduce zkAttn, a specialized zero-knowledge proof crafted for the attention mechanism, carefully balancing considerations of running time, memory usage, and accuracy. Empowered by our fully parallelized CUDA implementation, zkLLM emerges as a significant stride towards achieving efficient zero-knowledge verifiable computations over LLMs. Remarkably, for LLMs boasting 13 billion parameters, our approach enables the generation of a correctness proof for the entire inference process in under 15 minutes. The resulting proof, compactly sized at less than 200 kB, is designed to uphold the privacy of the model parameters, ensuring no inadvertent information leakage.
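
The relation that a lookup argument certifies can be stated in a few lines of plain code: every (input, output) pair of a non-arithmetic operation must appear as a row of a public table. The sketch below checks that relation directly for ReLU over a small integer range. It involves no commitments and no zero knowledge, and it is not tlookup itself; the table range and function names are assumptions.

```python
def build_relu_table(lo=-8, hi=7):
    """Public table of every valid (x, relu(x)) pair over a small integer range."""
    return {(x, max(x, 0)) for x in range(lo, hi + 1)}


def all_pairs_in_table(inputs, outputs, table):
    """The statement a lookup argument certifies: each pair is a table row.
    A real proof system establishes this without revealing the pairs."""
    return len(inputs) == len(outputs) and all(
        pair in table for pair in zip(inputs, outputs)
    )


table = build_relu_table()
```

Reducing an operation to table membership is what lets a proof system handle functions that are awkward to express with additions and multiplications alone.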

翻訳日:2024-04-26 18:12:20 公開日:2024-04-24

# Mamba-360:Long Sequence Modellingに代わる変圧器としての状態空間モデルの調査:方法、応用、課題

Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges ( http://arxiv.org/abs/2404.16112v1 )

ライセンス: Link先を確認

Badri Narayana Patro, Vijay Srinivas Agneeswaran,

(参考訳) シーケンスモデリングは、自然言語処理(NLP)、音声認識、時系列予測、音楽生成、バイオインフォマティクスなど、さまざまな分野において重要な領域である。 Recurrent Neural Networks(RNN)とLong Short Term Memory Networks(LSTM)は歴史的に、機械翻訳や固有表現認識(NER)といったシーケンスモデリングタスクで主流であった。しかし、Transformerの進歩は、その優れた性能により、このパラダイムの変化をもたらした。一方で、Transformerは$O(N^2)$の注意機構の計算量と、帰納バイアスの扱いに課題を抱えている。これらの問題に対処するために、スペクトルネットワークや畳み込みを用いるいくつかの変種が提案され、様々なタスクで良好な性能を示している。しかし、それらは依然として長いシーケンスの扱いに困難を抱えている。このような状況において、状態空間モデル(SSM)は、特にS4の登場や、S4nd、Hippo、Hyena、Diagonal State Spaces(DSS)、Gated State Spaces(GSS)、Linear Recurrent Unit(LRU)、Liquid-S4、Mambaなどの変種とともに、シーケンスモデリングの有望な代替手法として登場した。本調査では、ゲーティングアーキテクチャ、構造アーキテクチャ、リカレントアーキテクチャという3つのパラダイムに基づいて、基礎的なSSMを分類する。また、視覚、ビデオ、オーディオ、音声、言語(特に長いシーケンスのモデリング)、医療(ゲノミクスを含む)、化学(創薬など)、レコメンデーションシステム、表形式データを含む時系列分析など、さまざまな領域におけるSSMの応用も取り上げる。さらに、Long Range Arena(LRA)、WikiText、Glue、Pile、ImageNet、Kinetics-400、sstv2などのベンチマークデータセット、Breakfast、COIN、LVUなどのビデオデータセット、および各種時系列データセットにおけるSSMの性能をまとめた。 Mamba-360のプロジェクトページは https://github.com/badripatro/mamba360 で公開されている。

Sequence modeling is a crucial area across various domains, including Natural Language Processing (NLP), speech recognition, time series forecasting, music generation, and bioinformatics. Recurrent Neural Networks (RNNs) and Long Short Term Memory Networks (LSTMs) have historically dominated sequence modeling tasks such as Machine Translation and Named Entity Recognition (NER). However, the advancement of transformers has led to a shift in this paradigm, given their superior performance. Yet, transformers suffer from $O(N^2)$ attention complexity and challenges in handling inductive bias. Several variations that use spectral networks or convolutions have been proposed to address these issues and have performed well on a range of tasks. However, they still have difficulty in dealing with long sequences. State Space Models (SSMs) have emerged as promising alternatives for sequence modeling in this context, especially with the advent of S4 and its variants, such as S4nd, Hippo, Hyena, Diagonal State Spaces (DSS), Gated State Spaces (GSS), Linear Recurrent Unit (LRU), Liquid-S4, and Mamba. In this survey, we categorize the foundational SSMs based on three paradigms, namely Gating architectures, Structural architectures, and Recurrent architectures. This survey also highlights diverse applications of SSMs across domains such as vision, video, audio, speech, language (especially long sequence modeling), medical (including genomics), chemical (like drug design), recommendation systems, and time series analysis, including tabular data. Moreover, we consolidate the performance of SSMs on benchmark datasets like Long Range Arena (LRA), WikiText, Glue, Pile, ImageNet, Kinetics-400, sstv2, as well as video datasets such as Breakfast, COIN, LVU, and various time series datasets. The project page for the Mamba-360 work is available at https://github.com/badripatro/mamba360.
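
The core idea shared by S4-style models is that a discrete linear state space recurrence can also be evaluated as a convolution. The sketch below shows this equivalence for a scalar state; real SSM layers use structured matrices (and, in Mamba, input-dependent parameters), so the scalar setting and the function names are simplifying assumptions.

```python
def ssm_recurrent(u, a, b, c):
    """Run x_k = a * x_{k-1} + b * u_k and y_k = c * x_k step by step,
    starting from a zero state."""
    x, ys = 0.0, []
    for u_k in u:
        x = a * x + b * u_k
        ys.append(c * x)
    return ys


def ssm_convolutional(u, a, b, c):
    """Same outputs via the kernel K_j = c * a**j * b:
    y_k = sum over j of K_j * u_{k-j}."""
    kernel = [c * (a ** j) * b for j in range(len(u))]
    return [sum(kernel[j] * u[k - j] for j in range(k + 1))
            for k in range(len(u))]
```

The recurrent form costs constant work per step at inference time, while the convolutional form can be parallelized over the whole sequence during training.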

翻訳日:2024-04-26 18:12:20 公開日:2024-04-24

# ニューラルバンディットを用いたWhite-box LLM生成のオンラインパーソナライズ

Online Personalizing White-box LLMs Generation with Neural Bandits ( http://arxiv.org/abs/2404.16115v1 )

ライセンス: Link先を確認

Zekai Chen, Weeden Daniel, Po-yu Chen, Francois Buet-Golfouse,

(参考訳) LLMによるパーソナライズされたコンテンツ生成の出現は、ユーザ毎にユニークなモデルを作成するという持続不可能な要求を伴わずに、個々の嗜好を満たすためにテキストを効率的に適応する方法という、新しい課題を提示している。本研究では,ユーザフィードバックに基づくソフトインストラクション埋め込みを動的に最適化するために,ニューラルバンディットアルゴリズムを用いた革新的なオンライン手法を導入し,ホワイトボックスLLMによるオープンエンドテキスト生成のパーソナライズを強化した。各種タスクの厳密な実験を通じて,ベースライン戦略よりも優れた性能を示す。特にNeuralTSは、パーソナライズされたニュースの見出し生成を大幅に改善し、最高のROUGEスコアの62.9%の改善と、ベースラインに対するLLMエージェント評価の2.76%向上を実現している。

The advent of personalized content generation by LLMs presents a novel challenge: how to efficiently adapt text to meet individual preferences without the unsustainable demand of creating a unique model for each user. This study introduces an innovative online method that employs neural bandit algorithms to dynamically optimize soft instruction embeddings based on user feedback, enhancing the personalization of open-ended text generation by white-box LLMs. Through rigorous experimentation on various tasks, we demonstrate significant performance improvements over baseline strategies. NeuralTS, in particular, leads to substantial enhancements in personalized news headline generation, achieving up to a 62.9% improvement in terms of best ROUGE scores and up to 2.76% increase in LLM-agent evaluation against the baseline.
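
NeuralTS builds on Thompson sampling. The sketch below shows the basic Thompson sampling loop with a Beta-Bernoulli posterior standing in for the neural reward model, so it is a simplified substitute and not the paper's method. The arms, the simulated feedback rates `click_probs`, and the function name are hypothetical.

```python
import random


def thompson_sampling(click_probs, rounds=2000, seed=0):
    """Beta-Bernoulli Thompson sampling over a fixed set of arms.
    click_probs holds the hidden success rates used to simulate feedback.
    Returns how many times each arm was chosen."""
    rng = random.Random(seed)
    wins = [1] * len(click_probs)    # Beta prior, alpha = 1
    losses = [1] * len(click_probs)  # Beta prior, beta = 1
    pulls = [0] * len(click_probs)
    for _ in range(rounds):
        # Sample a plausible success rate per arm and act greedily on it.
        samples = [rng.betavariate(w, l) for w, l in zip(wins, losses)]
        arm = samples.index(max(samples))
        reward = rng.random() < click_probs[arm]
        wins[arm] += int(reward)
        losses[arm] += int(not reward)
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.2, 0.5, 0.8])
```

In a personalization setting, each arm would correspond to a candidate instruction and the reward to user feedback on the generated text.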

翻訳日:2024-04-26 18:12:20 公開日:2024-04-24

# ソーシャルメディアにおける人間生成とAI生成の選挙関連主張の分類

Classifying Human-Generated and AI-Generated Election Claims in Social Media ( http://arxiv.org/abs/2404.16116v1 )

ライセンス: Link先を確認

Alphaeus Dmonte, Marcos Zampieri, Kevin Lybarger, Massimiliano Albanese,

(参考訳) 政治は、ソーシャルメディアプラットフォーム上で議論される最も一般的なトピックの1つであり、特に主要な選挙サイクルでは、ユーザーが候補者や選挙プロセスについて会話する。悪意ある俳優はこの機会を利用して誤報を広め、選挙プロセスへの信頼を損なうかもしれない。 LLM(Large Language Models)の出現は、悪質なアクターが前例のない規模で誤情報を生成できるようにすることによって、この問題を悪化させる。人工知能(AI)が生成するコンテンツは、真正なユーザーコンテンツとは区別できないことが多く、ソーシャルネットワーク上の情報の完全性に関する懸念を提起する。本稿では,選挙関連主張を特徴付ける新しい分類法を提案する。この分類法は、司法、機器、プロセス、およびクレームの性質に関する粒度のカテゴリを含む選挙関連のクレームを分析するための手段を提供する。 ElectAIは9,900のツイートからなる新しいベンチマークデータセットで、それぞれが人間またはAI生成とラベル付けされている。 AI生成ツイートでは、生成した特定のLLM変種が指定される。我々は提案した分類法を用いて1,550のツイートのサブセットに注釈を付け、選挙関連クレームの特徴を捉えた。分類属性を抽出するLLMの能力について検討し、ElectAIを用いて機械学習モデルを訓練し、人間とAIが生成するポストを識別し、特定のLLM変種を特定する。

Politics is one of the most prevalent topics discussed on social media platforms, particularly during major election cycles, where users engage in conversations about candidates and electoral processes. Malicious actors may use this opportunity to disseminate misinformation to undermine trust in the electoral process. The emergence of Large Language Models (LLMs) exacerbates this issue by enabling malicious actors to generate misinformation at an unprecedented scale. Artificial intelligence (AI)-generated content is often indistinguishable from authentic user content, raising concerns about the integrity of information on social networks. In this paper, we present a novel taxonomy for characterizing election-related claims. This taxonomy provides an instrument for analyzing election-related claims, with granular categories related to jurisdiction, equipment, processes, and the nature of claims. We introduce ElectAI, a novel benchmark dataset that consists of 9,900 tweets, each labeled as human- or AI-generated. For AI-generated tweets, the specific LLM variant that produced them is specified. We annotated a subset of 1,550 tweets using the proposed taxonomy to capture the characteristics of election-related claims. We explored the capabilities of LLMs in extracting the taxonomy attributes and trained various machine learning models using ElectAI to distinguish between human- and AI-generated posts and identify the specific LLM variant.

翻訳日:2024-04-26 18:12:20 公開日:2024-04-24

# Polar製Bluetooth Low Energy心拍センサのサイバーセキュリティ評価

Cybersecurity Assessment of the Polar Bluetooth Low Energy Heart-rate Sensor ( http://arxiv.org/abs/2404.16117v1 )

ライセンス: Link先を確認

Simone Soderi,

(参考訳) ウェアラブルデバイスと埋め込み型デバイス間の無線通信は、人体周辺での情報交換を実現する。無線ボディエリアネットワーク(WBAN)技術は、日常生活における非侵襲的な応用を可能にする。無線接続されたデバイスは多くのサービスの品質を向上させ、手続きを容易にする。その一方で、大きな攻撃面を生み、潜在的なセキュリティ脆弱性をもたらす。 Bluetooth Low Energy (BLE) は、無線パーソナルエリアネットワーク(WPAN)で広く使われている低消費電力プロトコルである。本稿ではBLE心拍センサのセキュリティ脆弱性を解析する。受信信号強度指標(RSSI)の変動を観測することで、BLE接続の異常を検出できる。ケーススタディは、攻撃者がモバイルアプリとBLEデバイスの間で送信されるデータを容易に傍受し、改ざんできることを示している。本研究を通じて、著者は、ワイヤレスボディセンサーから受信する心拍情報のセキュリティに対する意識を高めたいと考えている。

Wireless communications among wearable and implantable devices implement the information exchange around the human body. Wireless body area network (WBAN) technology enables non-invasive applications in our daily lives. Wireless connected devices improve the quality of many services, and they make procedures easier. On the other hand, they open up large attack surfaces and introduce potential security vulnerabilities. Bluetooth low energy (BLE) is a low-power protocol widely used in wireless personal area networks (WPANs). This paper analyzes the security vulnerabilities of a BLE heart-rate sensor. By observing the received signal strength indicator (RSSI) variations, it is possible to detect anomalies in the BLE connection. The case study shows that an attacker can easily intercept and manipulate the data transmitted between the mobile app and the BLE device. With this research, the author aims to raise awareness about the security of the heart-rate information that we can receive from our wireless body sensors.
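
One simple way to flag RSSI anomalies of the kind mentioned above is to compare each sample with a sliding window of recent samples. The abstract does not describe the paper's detection procedure, so the window size, the threshold, the 1 dB floor on the spread, and the function name below are all assumptions.

```python
from statistics import mean, stdev


def rssi_anomalies(rssi_dbm, window=5, threshold=3.0):
    """Return indices whose RSSI deviates from the mean of the preceding
    `window` samples by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(rssi_dbm)):
        history = rssi_dbm[i - window:i]
        spread = max(stdev(history), 1.0)  # 1 dB floor avoids a zero spread
        if abs(rssi_dbm[i] - mean(history)) > threshold * spread:
            flagged.append(i)
    return flagged


# Illustrative trace in dBm with one abrupt jump at index 7.
trace = [-60, -61, -59, -60, -62, -61, -60, -40, -61, -60]
flagged = rssi_anomalies(trace)
```

A sudden jump in signal strength, such as a second transmitter appearing much closer to the receiver, is the kind of event this check is meant to surface.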

翻訳日:2024-04-26 18:02:26 公開日:2024-04-24

# ハニートークン生成器として振る舞え! 大規模言語モデルを用いたハニートークン生成の検討

Act as a Honeytoken Generator! An Investigation into Honeytoken Generation with Large Language Models ( http://arxiv.org/abs/2404.16118v1 )

ライセンス: Link先を確認

Daniel Reti, Norman Becker, Tillmann Angeli, Anasuya Chattopadhyay, Daniel Schneider, Sebastian Vollmer, Hans D. Schotten,

(参考訳) セキュリティインシデントの増加に伴い、欺瞞(deception)ベースの防御戦略の採用はサイバーセキュリティにおいて重要になっている。本研究は、そのような防御機構の重要な構成要素であるハニートークンの設計におけるスケーラビリティの課題に取り組む。ハニートークンを手作業で作成するのは面倒な作業である。自動生成器も存在するが、特定の種類のハニートークンに特化していて汎用性に欠けることが多く、適切なトレーニングデータセットに大きく依存している。これらの制約を克服するために、本研究は大規模言語モデル(LLM)を用いて様々なハニートークンを作成するアプローチを体系的に調査する。設定ファイル、データベース、ログファイルなど、本研究で作成した7種類のハニートークンのうち、2種類を最適なプロンプトの評価に用いた。具体的には、robots.txtファイルとハニーワードの生成を用いて、16個のプロンプト構成要素に基づく210通りのプロンプト構造を体系的にテストした。さらに、モデルごとの性能の違いを評価するために、すべてのハニートークンを複数の最先端LLMでテストした。あるLLMで最適に機能するプロンプトが、他のLLMにうまく一般化するとは限らない。 GPT-3.5で生成されたハニーワードは、従来の自動ハニーワード生成法によるものと比べて、実際のパスワードと区別しにくいことが分かった。全体として、本研究の結果は、汎用LLMが、提示したプロンプト構造を用いて幅広いハニートークンを生成できることを示している。

With the increasing prevalence of security incidents, the adoption of deception-based defense strategies has become pivotal in cyber security. This work addresses the challenge of scalability in designing honeytokens, a key component of such defense mechanisms. The manual creation of honeytokens is a tedious task. Although automated generators exist, they often lack versatility, being specialized for specific types of honeytokens, and heavily rely on suitable training datasets. To overcome these limitations, this work systematically investigates the approach of utilizing Large Language Models (LLMs) to create a variety of honeytokens. Out of the seven different honeytoken types created in this work, such as configuration files, databases, and log files, two were used to evaluate the optimal prompt. The generation of robots.txt files and honeywords was used to systematically test 210 different prompt structures, based on 16 prompt building blocks. Furthermore, all honeytokens were tested across different state-of-the-art LLMs to assess the varying performance of different models. Prompts performing optimally on one LLM do not necessarily generalize well to another. Honeywords generated by GPT-3.5 were found to be less distinguishable from real passwords compared to previous methods of automated honeyword generation. Overall, the findings of this work demonstrate that generic LLMs are capable of creating a wide array of honeytokens using the presented prompt structures.

翻訳日:2024-04-26 18:02:25 公開日:2024-04-24

# ハイブリッド無線ボディエリアネットワーク(HyWBAN):セマンティックコミュニケーションとジャミング技術の進歩

Securing Hybrid Wireless Body Area Networks (HyWBAN): Advancements in Semantic Communications and Jamming Techniques ( http://arxiv.org/abs/2404.16120v1 )

ライセンス: Link先を確認

Simone Soderi, Mariella Särestöniemi, Syifaul Fuada, Matti Hämäläinen, Marcos Katz, Jari Iinatti,

(参考訳) 本稿では、スマートヘルスケアとIoT(Internet of Things)アプリケーションに不可欠なHybrid Wireless Body Area Networks(HyWBANs)のセキュリティを強化するための新しい戦略を検討する。高度なサイバー攻撃に対するHyWBANの脆弱性を認識し,セマンティックコミュニケーションとジャミングレシーバーの革新的な組み合わせを提案する。この二重層セキュリティ機構は、特にボディ内からオンボディ通信チャネルを含むシナリオにおいて、不正なアクセスやデータ漏洩を防止する。本研究では,生物組織を介したハイブリッド(無線,光)通信の伝搬を理解するために総合的な実験室計測を行い,これらの知見を利用してディープラーニング(DL)モデルを訓練するためのデータセットを洗練する。これらのモデルは次に、ジャミングレシーバーを使用してデータの機密性と整合性を高めるために、暗号鍵にリンクされたセマンティックな概念を生成する。提案モデルでは,特にジャミングを補完する楕円曲線Diffie-Hellman (ECDH) のような従来の暗号手法と比較して,エネルギー消費量の大幅な削減が示されている。われわれのアプローチは, 主要なセキュリティ問題に対処し, 将来安全なバイオメディカル通信システムの基盤となるものとなる。

This paper explores novel strategies to strengthen the security of Hybrid Wireless Body Area Networks (HyWBANs), essential in smart healthcare and Internet of Things (IoT) applications. Recognizing the vulnerability of HyWBAN to sophisticated cyber-attacks, we propose an innovative combination of semantic communications and jamming receivers. This dual-layered security mechanism protects against unauthorized access and data breaches, particularly in scenarios involving in-body to on-body communication channels. We conduct comprehensive laboratory measurements to understand hybrid (radio and optical) communication propagation through biological tissues and utilize these insights to refine a dataset for training a Deep Learning (DL) model. These models, in turn, generate semantic concepts linked to cryptographic keys for enhanced data confidentiality and integrity using a jamming receiver. The proposed model demonstrates a significant reduction in energy consumption compared to traditional cryptographic methods, like Elliptic Curve Diffie-Hellman (ECDH), especially when supplemented with jamming. Our approach addresses the primary security concerns and sets the baseline for future secure biomedical communication systems advancements.

翻訳日:2024-04-26 18:02:25 公開日:2024-04-24

# FairDeDup:セマンティックなデータセット重複排除における視覚言語の公平性格差の検出と緩和

FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication ( http://arxiv.org/abs/2404.16123v1 )

ライセンス: Link先を確認

Eric Slyman, Stefan Lee, Scott Cohen, Kushal Kafle,

(参考訳) 最近のデータセット重複排除技術は、コンテンツを考慮したデータセットのプルーニングによって、元のデータセットでの学習と比べて大きな性能低下なしに、視覚言語事前学習(VLP)モデルの学習コストを劇的に削減できることを実証している。これらの結果は、Webから収集された一般的な画像キャプションデータセットのプルーニングに基づいているが、こうしたデータセットは有害な社会的バイアスを含むことが知られており、そのバイアスが学習済みモデルに定着する可能性がある。本研究では、重複排除が、結果として得られる学習済みモデルにおけるこれらのバイアスの広がりにどのように影響するかを評価し、最新のSemDeDupアルゴリズムに対して、観測された負の影響を低減できる実装の容易な修正を導入する。 LAION-400Mの重複排除版で学習したCLIPスタイルのモデルを調べると、提案するFairDeDupアルゴリズムは、CLIPベンチマークでのゼロショット性能を維持しながら、FairFaceおよびFACETデータセットにおいてSemDeDupよりも公平性指標を一貫して改善することがわかった。

Recent dataset deduplication techniques have demonstrated that content-aware dataset pruning can dramatically reduce the cost of training Vision-Language Pretrained (VLP) models without significant performance losses compared to training on the original dataset. These results have been based on pruning commonly used image-caption datasets collected from the web -- datasets that are known to harbor harmful social biases that may then be codified in trained models. In this work, we evaluate how deduplication affects the prevalence of these biases in the resulting trained models and introduce an easy-to-implement modification to the recent SemDeDup algorithm that can reduce the negative effects that we observe. When examining CLIP-style models trained on deduplicated variants of LAION-400M, we find our proposed FairDeDup algorithm consistently leads to improved fairness metrics over SemDeDup on the FairFace and FACET datasets while maintaining zero-shot performance on CLIP benchmarks.

翻訳日:2024-04-26 18:02:25 公開日:2024-04-24

# インドにおけるWhatsApp for Businessの配置

A Situated-Infrastructuring of WhatsApp for Business in India ( http://arxiv.org/abs/2404.16124v1 )

ライセンス: Link先を確認

Ankolika De,

(参考訳) WhatsAppは、インドにおける重要なコミュニケーションツールとなり、文化的境界を超越し、国のデジタルランドスケープに深く統合されている。 MetaのWhatsApp for Businessの導入は、プラットフォームの人気とシームレスに一致し、ビジネスに重要なツールを提供する。しかし、収益化計画は、特に小規模企業にとって、収益目標とアクセシビリティのバランスをとる上で、課題となる。本研究は、談話分析を用いて、メタのインドにおけるWhatsAppのインフラについて検討し、技術的、社会的、文化的側面の動的相互作用を強調した。その結果、WhatsApp for Businessの展開に伴う潜在的なパワー差が強調され、漸進的ではあるが重要な修正が加えられ、研究者は、特に疎外化されたユーザーにとって、急激な技術的変化の影響と倫理について調査するよう促された。

WhatsApp has become a pivotal communication tool in India, transcending cultural boundaries and deeply integrating into the nation's digital landscape. Meta's introduction of WhatsApp for Business aligns seamlessly with the platform's popularity, offering businesses a crucial tool. However, the monetization plans pose challenges, particularly for smaller businesses, in balancing revenue goals with accessibility. This study, employing discourse analysis, examines Meta's infrastructuring of WhatsApp in India, emphasizing the dynamic interplay of technological, social, and cultural dimensions. Consequently, it highlights potential power differences caused by the deployment of WhatsApp for Business followed by its gradual but significant modifications, encouraging scholars to investigate the implications and ethics of rapid technological changes, particularly for marginalized users.

翻訳日:2024-04-26 18:02:25 公開日:2024-04-24

# 競合リスクの存在下でのEHRデータに対する静的および動的ランダム森林モデルの比較:中央線関連血流感染の予測

Comparison of static and dynamic random forests models for EHR data in the presence of competing risks: predicting central line-associated bloodstream infection ( http://arxiv.org/abs/2404.16127v1 )

ライセンス: Link先を確認

Elena Albu, Shan Gao, Pieter Stijnen, Frank Rademakers, Christel Janssens, Veerle Cossey, Yves Debaveye, Laure Wynants, Ben Van Calster,

(参考訳) 入院に関する予後アウトカムは、一般に打ち切りの影響を受けず、カテゴリとしても、イベントまでの時間としてもモデル化できる。競合イベントは一般的だが、しばしば無視される。本研究では、異なるアウトカムの定式化を用いて、中心静脈ライン関連血流感染症(CLABSI)のリスクを予測するランダムフォレスト(RF)モデルの性能を比較した。ルーヴェン大学病院の入院27478件、カテーテルエピソード30862件(CLABSI 970件、死亡1466件、退院28426件)のデータを用いて、二値(CLABSI対非CLABSI)、多項(CLABSI、退院、死亡、イベントなし)、生存(CLABSIまでの時間)、競合リスク(CLABSI、退院、死亡までの時間)の各アウトカムについて静的および動的RFモデルを構築し、7日間のCLABSIリスクを予測した。 100回の学習/テスト分割にわたってモデル性能を評価した。二値、多項、競合リスクの各モデルの性能は同程度であり、AUROCはベースライン予測で0.74、カテーテルエピソード5日目の予測で0.78まで上昇し、その後低下した。生存モデルはCLABSIのリスクを過大評価し(E:O比1.2から1.6)、AUROCは他のモデルよりも約0.01低かった。二値モデルと多項モデルは計算時間が最も短かった。複数のアウトカムイベントを含むモデル(多項モデルと競合リスクモデル)は、二値モデルや生存モデルとは異なる内部構造を示す。打ち切りがない場合、複雑なモデリングの選択は、本研究の設定において、CLABSI予測の二値モデルと比べて予測性能を大きくは改善しない。競合イベントをその発生時点で打ち切る生存モデルは避けるべきである。

Prognostic outcomes related to hospital admissions typically do not suffer from censoring, and can be modeled either categorically or as time-to-event. Competing events are common but often ignored. We compared the performance of random forest (RF) models to predict the risk of central line-associated bloodstream infections (CLABSI) using different outcome operationalizations. We included data from 27478 admissions to the University Hospitals Leuven, covering 30862 catheter episodes (970 CLABSI, 1466 deaths and 28426 discharges) to build static and dynamic RF models for binary (CLABSI vs no CLABSI), multinomial (CLABSI, discharge, death or no event), survival (time to CLABSI) and competing risks (time to CLABSI, discharge or death) outcomes to predict the 7-day CLABSI risk. We evaluated model performance across 100 train/test splits. Performance of binary, multinomial and competing risks models was similar: AUROC was 0.74 for baseline predictions, rose to 0.78 for predictions at day 5 in the catheter episode, and decreased thereafter. Survival models overestimated the risk of CLABSI (E:O ratios between 1.2 and 1.6), and had AUROCs about 0.01 lower than other models. Binary and multinomial models had lowest computation times. Models including multiple outcome events (multinomial and competing risks) display a different internal structure compared to binary and survival models. In the absence of censoring, complex modelling choices do not considerably improve the predictive performance compared to a binary model for CLABSI prediction in our studied settings. Survival models censoring the competing events at their time of occurrence should be avoided.
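
The two metrics quoted in the abstract above, the E:O ratio and the AUROC, can be illustrated with a minimal sketch. The function names and the toy data are assumptions made for illustration only; they are not taken from the paper, and the authors' evaluation code may differ.

```python
def expected_observed_ratio(predicted_risks, outcomes):
    """E:O ratio: sum of predicted risks divided by the number of observed events.

    Values above 1 indicate that the model overestimates risk on average.
    """
    observed = sum(outcomes)
    if observed == 0:
        raise ValueError("E:O ratio is undefined without observed events")
    return sum(predicted_risks) / observed


def auroc(predicted_risks, outcomes):
    """AUROC as the probability that a random event outranks a random non-event.

    Ties between an event and a non-event count as 0.5.
    """
    events = [p for p, y in zip(predicted_risks, outcomes) if y == 1]
    non_events = [p for p, y in zip(predicted_risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("AUROC needs at least one event and one non-event")
    wins = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(events) * len(non_events))


# Toy data (made up): 1 = CLABSI within 7 days, 0 = no CLABSI.
risks = [0.10, 0.40, 0.35, 0.80, 0.05]
labels = [0, 0, 1, 1, 0]
eo = expected_observed_ratio(risks, labels)
auc = auroc(risks, labels)
```

Read this way, the E:O ratios of 1.2 to 1.6 reported for the survival models mean the predicted risks summed to 20% to 60% more events than were observed.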

翻訳日:2024-04-26 18:02:25 公開日:2024-04-24

# 二進線形符号近傍のビット列上の量子一様重ね合わせを効率的に構築する

Efficiently constructing a quantum uniform superposition over bit strings near a binary linear code ( http://arxiv.org/abs/2404.16129v1 )

ライセンス: Link先を確認

Edward Farhi, Stephen P. Jordan,

(参考訳) $\mathbb{Z}_2^n$ 上の次元 $k$ の線形符号の符号語からハミング距離 $b$ 以内にある全てのビット列上の量子重ね合わせ $| \Psi_b \rangle$ に対する高忠実度近似を、我々が特徴づける大きな $n$, $b$, $k$ の値に対して、量子回路で効率的に構築できることを実証する。我々は、この主張を裏付ける数値実験を $n=1000$ で行う。達成可能な半径 $b$ は、既知の古典的アルゴリズムが最も近い符号語を効率的に見つけることができる距離よりもはるかに大きい。したがって、これらの状態は、文字列に最も近い符号語を見つけるための逆計算(uncomputing)を必要とする量子的構成では準備できない。 $\mathbb{R}^n$ の格子に対する類似の状態とは異なり、$|\Psi_b \rangle$ は有界距離復号の有用な資源ではない。関連する重なりが距離とともに急速に減衰し、既知の古典的アルゴリズムの方が優れているためである。さらに、重なりの計算は脱量子化できる。これらの状態は、符号に関する他の問題を解くのに使えるかもしれない。これらの状態を構築するために用いた手法はそれ自体興味深く、符号以外にも応用できることを期待している。

We demonstrate that a high fidelity approximation to $| \Psi_b \rangle$, the quantum superposition over all bit strings within Hamming distance $b$ of the codewords of a dimension-$k$ linear code over $\mathbb{Z}_2^n$, can be efficiently constructed by a quantum circuit for large values of $n$, $b$ and $k$ which we characterize. We do numerical experiments at $n=1000$ which back up our claims. The achievable radius $b$ is much larger than the distance out to which known classical algorithms can efficiently find the nearest codeword. Hence, these states cannot be prepared by quantum constructions that require uncomputing to find the codeword nearest a string. Unlike the analogous states for lattices in $\mathbb{R}^n$, $|\Psi_b \rangle$ is not a useful resource for bounded distance decoding because the relevant overlap falls off too quickly with distance and known classical algorithms do better. Furthermore, the overlap calculation can be dequantized. Perhaps these states could be used to solve other code problems. The technique used to construct these states is of interest and hopefully will have applications beyond codes.
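
To make the set underlying $|\Psi_b \rangle$ concrete, the following is a small classical sketch that enumerates, by brute force, the bit strings within Hamming distance $b$ of a binary linear code. It only illustrates which basis states appear in the superposition; it is not the quantum construction from the paper, and the [7,4] Hamming code and all function names are illustrative assumptions.

```python
from itertools import product
from math import comb


def codewords(generator_rows):
    """All codewords of the binary linear code spanned by the generator rows."""
    n = len(generator_rows[0])
    words = set()
    for coeffs in product([0, 1], repeat=len(generator_rows)):
        word = [0] * n
        for c, row in zip(coeffs, generator_rows):
            if c:
                # Add the row to the running word over Z_2 (bitwise XOR).
                word = [(w + r) % 2 for w, r in zip(word, row)]
        words.add(tuple(word))
    return words


def strings_within_distance(code, b):
    """Bit strings within Hamming distance b of at least one codeword (brute force)."""
    n = len(next(iter(code)))
    return {
        s for s in product([0, 1], repeat=n)
        if any(sum(x != y for x, y in zip(s, c)) <= b for c in code)
    }


def hamming_ball_size(n, b):
    """Number of n-bit strings within Hamming distance b of one fixed string."""
    return sum(comb(n, i) for i in range(b + 1))


# Generator matrix of a [7,4] Hamming code in systematic form (illustrative choice).
G = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]
code = codewords(G)
support = strings_within_distance(code, 1)
```

For this perfect code the 16 radius-1 balls of 8 strings each tile all 128 seven-bit strings without overlap. The enumeration is exponential in $n$, so it is hopeless at the $n=1000$ scale of the paper's experiments.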

翻訳日:2024-04-26 18:02:25 公開日:2024-04-24

# ローカルからグローバルへ:クエリに焦点をあてた要約へのグラフRAGアプローチ

From Local to Global: A Graph RAG Approach to Query-Focused Summarization ( http://arxiv.org/abs/2404.16130v1 )

ライセンス: Link先を確認

Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Jonathan Larson,

(参考訳) 検索強化生成(RAG)を用いて、外部知識ソースから関連情報を検索することで、大規模言語モデル(LLM)が、プライベートおよび/または未確認の文書コレクションに関する質問に答えることができる。しかしながら、RAGは「データセットの主なテーマは何か?」のようなテキストコーパス全体に向けられたグローバルな質問には対応できない。これは本質的に、明示的な検索タスクではなくクエリ中心の要約(QFS)タスクだからである。一方、以前のQFS法は、典型的なRAGシステムによってインデックスされたテキストの量にスケールできない。これらの対照的な手法の強みを組み合わせるために、ユーザ質問の一般性とインデックスするソーステキスト量の両方に応じてスケールする、プライベートテキストコーパスに対する質問応答のためのグラフRAG手法を提案する。提案手法は LLM を用いてグラフベースのテキストインデックスを2段階に構築する。まず,ソース文書からエンティティ知識グラフを導出し,次に近縁なエンティティのすべてのグループに対するコミュニティ要約を事前に生成する。質問があると、各コミュニティの要約は部分的な応答を生成するために使用され、その後、すべての部分的な応答はユーザーへの最終応答で再度要約される。 100万トークン規模のデータセットに対するグローバルなセンスメイキング質問のクラスについて、グラフRAGは、生成された回答の包括性と多様性の両方に対して、naïve RAGベースラインよりも大幅に改善されていることを示す。グローバルとローカルのGraph RAGアプローチのオープンソースでPythonベースの実装が https://aka.ms/graphrag で公開される予定だ。

The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables large language models (LLMs) to answer questions over private and/or previously unseen document collections. However, RAG fails on global questions directed at an entire text corpus, such as "What are the main themes in the dataset?", since this is inherently a query-focused summarization (QFS) task, rather than an explicit retrieval task. Prior QFS methods, meanwhile, fail to scale to the quantities of text indexed by typical RAG systems. To combine the strengths of these contrasting methods, we propose a Graph RAG approach to question answering over private text corpora that scales with both the generality of user questions and the quantity of source text to be indexed. Our approach uses an LLM to build a graph-based text index in two stages: first to derive an entity knowledge graph from the source documents, then to pregenerate community summaries for all groups of closely-related entities. Given a question, each community summary is used to generate a partial response, before all partial responses are again summarized in a final response to the user. For a class of global sensemaking questions over datasets in the 1 million token range, we show that Graph RAG leads to substantial improvements over a naïve RAG baseline for both the comprehensiveness and diversity of generated answers. An open-source, Python-based implementation of both global and local Graph RAG approaches is forthcoming at https://aka.ms/graphrag.
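
The query-time step described in the abstract above (one partial response per community summary, then a final summarization over all partial responses) can be sketched as a simple map-reduce. This is a hypothetical outline, not the authors' implementation: `generate` stands in for any LLM call, and the prompt wording and function name are assumptions.

```python
from typing import Callable, List


def answer_global_question(
    question: str,
    community_summaries: List[str],
    generate: Callable[[str], str],
) -> str:
    """Map each community summary to a partial answer, then reduce to one answer.

    `generate` is a placeholder for an LLM call (prompt in, text out).
    """
    # Map step: one partial response per community summary.
    partial_answers = [
        generate(f"Question: {question}\nContext: {summary}")
        for summary in community_summaries
    ]
    # Reduce step: summarize all partial responses into the final response.
    combined = "\n".join(partial_answers)
    return generate(f"Question: {question}\nPartial answers:\n{combined}")
```

With N community summaries this costs N + 1 calls to `generate`. The abstract does not say how partial responses are filtered, ranked, or fitted into a context window, so the sketch leaves that out.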

翻訳日:2024-04-26 18:02:25 公開日:2024-04-24

# クラスタ削除のための組合せ近似: よりシンプルで、より速く、より良く

Combinatorial Approximations for Cluster Deletion: Simpler, Faster, and Better ( http://arxiv.org/abs/2404.16131v1 )

ライセンス: Link先を確認

Vicente Balmaseda, Ying Xu, Yixin Cao, Nate Veldt,

(参考訳) クラスタ削除はNPハードグラフクラスタリングの目的であり、計算生物学やソーシャルネットワーク分析の応用において、グラフを斜めに分割するために最小限のエッジを削除することが目的である。まず,2つの近似アルゴリズムの厳密な解析を行い,その近似保証を4から3に改善する。さらに、補助グラフにおいて最大等級の頂点を優しく取り、その周囲にクラスタを形成することにより、両アルゴリズムを驚くほど単純な方法でデランドマイズすることができることを示す。これらのアルゴリズムの1つは線形プログラムの解法に依存する。私たちの最後の貢献は、理論と実践においてはるかにスケーラブルになるように、新しく純粋に組み合わせたアプローチを設計することです。

Cluster deletion is an NP-hard graph clustering objective with applications in computational biology and social network analysis, where the goal is to delete a minimum number of edges to partition a graph into cliques. We first provide a tighter analysis of two previous approximation algorithms, improving their approximation guarantees from 4 to 3. Moreover, we show that both algorithms can be derandomized in a surprisingly simple way, by greedily taking a vertex of maximum degree in an auxiliary graph and forming a cluster around it. One of these algorithms relies on solving a linear program. Our final contribution is to design a new and purely combinatorial approach for doing so that is far more scalable in theory and practice.
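
The objective described above can be stated in a few lines: a candidate solution is a partition of the vertices in which every cluster is a clique, and its cost is the number of edges that cross between clusters. The following sketch only evaluates that objective for a given partition; it is not one of the approximation algorithms from the paper, and the function name and example graph are illustrative assumptions.

```python
from itertools import combinations


def cluster_deletion_cost(edges, clusters):
    """Return the number of deleted edges, or raise if a cluster is not a clique.

    `edges` is an iterable of vertex pairs; `clusters` is a list of vertex sets
    that together cover every vertex appearing in `edges`.
    """
    edge_set = {frozenset(e) for e in edges}
    label = {}
    for idx, cluster in enumerate(clusters):
        for v in cluster:
            label[v] = idx
        # Every pair inside a cluster must already be joined by an edge.
        for u, v in combinations(cluster, 2):
            if frozenset((u, v)) not in edge_set:
                raise ValueError(f"cluster {sorted(cluster)} is not a clique")
    # An edge is deleted exactly when its endpoints lie in different clusters.
    return sum(1 for e in edge_set if len({label[v] for v in e}) == 2)


# Triangle a-b-c with a path c-d-e attached (made-up example).
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
good = cluster_deletion_cost(edges, [{"a", "b", "c"}, {"d", "e"}])
worse = cluster_deletion_cost(edges, [{"a", "b"}, {"c", "d"}, {"e"}])
```

Here the first partition deletes only the edge c-d, while the second deletes three edges, which shows how the choice of clusters drives the objective.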

翻訳日:2024-04-26 18:02:25 公開日:2024-04-24

# 翻訳OCTAにおける網膜の特徴の定量的解析

Quantitative Characterization of Retinal Features in Translated OCTA ( http://arxiv.org/abs/2404.16133v1 )

ライセンス: Link先を確認

Rashadul Hasan Badhon, Atalie Carina Thompson, Jennifer I. Lim, Theodore Leng, Minhaj Nur Alam,

(参考訳) 目的: 本研究は, 生成機械学習(ML)を用いて光コヒーレンス・トモグラフィー(OCT)画像から光コヒーレンス・トモグラフィー(OCTA)画像へ変換し, 特殊なOCTAハードウェアの必要性を回避できる可能性を検討した。方法: 2次元血管分割モデルと2次元OCTA画像翻訳モデルを含む生成対向ネットワークフレームワークの実装を含む方法。この研究は、TR-OCTA画像の品質を評価するために、解像度と疾患ステータスに基づいてサブセットに分割された500人の公開データセットを利用している。この検証は、翻訳された画像とグラウンド・真理OCTA(GT-OCTA)を比較するために、いくつかの品質と定量的な指標を用いる。そして,GT-OCTAを用いたTR-OCTAの血管特性を定量的に解析し,TR-OCTAを用いた客観的疾患診断の可能性について検討した。結果: TR-OCTAは3mmデータセットと6mmデータセットの両方で高画質(GT-OCTAと比較して高分解能,中等度構造類似度,コントラスト品質)を示した。特に疾患患者では, 血管計測値に若干の差があった。血管形態は局所的な血管歪みの影響を受けやすいが, 血管周囲の血管形態は, 局所的な血管歪みの影響を受けやすい傾向を示した。結論:本研究は, TR-OCTAの血管的特徴を疾患検出に利用することにより, 臨床実践におけるOCTA導入の限界に対する有望な解決策を示す。翻訳関連性:本研究は、詳細な血管像をより広く利用し、コストの高いOCTA機器への依存を減らすことにより、網膜疾患の診断過程を著しく向上させる可能性がある。

Purpose: This study explores the feasibility of using generative machine learning (ML) to translate Optical Coherence Tomography (OCT) images into Optical Coherence Tomography Angiography (OCTA) images, potentially bypassing the need for specialized OCTA hardware. Methods: The method involved implementing a generative adversarial network framework that includes a 2D vascular segmentation model and a 2D OCTA image translation model. The study utilizes a public dataset of 500 patients, divided into subsets based on resolution and disease status, to validate the quality of TR-OCTA images. The validation employs several quality and quantitative metrics to compare the translated images with ground truth OCTAs (GT-OCTA). We then quantitatively characterize vascular features generated in TR-OCTAs with GT-OCTAs to assess the feasibility of using TR-OCTA for objective disease diagnosis. Result: TR-OCTAs showed high image quality in both 3 and 6 mm datasets (high-resolution, moderate structural similarity and contrast quality compared to GT-OCTAs). There were slight discrepancies in vascular metrics, especially in diseased patients. Blood vessel features like tortuosity and vessel perimeter index showed a better trend compared to density features which are affected by local vascular distortions. Conclusion: This study presents a promising solution to the limitations of OCTA adoption in clinical practice by using vascular features from TR-OCTA for disease detection. Translation relevance: This study has the potential to significantly enhance the diagnostic process for retinal diseases by making detailed vascular imaging more widely available and reducing dependency on costly OCTA equipment.

翻訳日:2024-04-26 18:02:25 公開日:2024-04-24

# グラフニューラルネットワークを用いたパワー障害カスケード予測

Power Failure Cascade Prediction using Graph Neural Networks ( http://arxiv.org/abs/2404.16134v1 )

ライセンス: Link先を確認

Sathwik Chadaga, Xinyu Wu, Eytan Modiano,

(参考訳) 分岐故障による停電の予測の問題点を考察する。本稿では,初期コンシデントと電力注入値が与えられたカスケードプロセスの各世代における格子状態を予測するグラフニューラルネットワークに基づくフローフリーモデルを提案する。シミュレーションから生成したカスケードシーケンスデータプールを用いて,提案モデルを訓練する。そして、そのモデルを様々なレベルの粒度で評価する。モデルが障害サイズ、最終的なグリッド状態、およびカスケード内の各ブランチの障害時間ステップを予測するためのいくつかのエラーメトリクスを示す。我々は、影響モデルに対してグラフニューラルネットワークモデルをベンチマークする。ランダムにスケールした電力注入値の汎用性に加えて、グラフニューラルネットワークモデルは、対応する負荷プロファイルに特化して構築された複数の影響モデルよりも優れていることを示す。最後に,提案モデルにより,ほぼ2桁の計算時間を短縮できることを示す。

We consider the problem of predicting power failure cascades due to branch failures. We propose a flow-free model based on graph neural networks that predicts grid states at every generation of a cascade process given an initial contingency and power injection values. We train the proposed model using a cascade sequence data pool generated from simulations. We then evaluate our model at various levels of granularity. We present several error metrics that gauge the model's ability to predict the failure size, the final grid state, and the failure time steps of each branch within the cascade. We benchmark the graph neural network model against influence models. We show that, in addition to being generic over randomly scaled power injection values, the graph neural network model outperforms multiple influence models that are built specifically for their corresponding loading profiles. Finally, we show that the proposed model reduces the computational time by almost two orders of magnitude.

翻訳日:2024-04-26 18:02:25 公開日:2024-04-24

# Performant near-term quantum combinatorial optimization

Performant near-term quantum combinatorial optimization ( http://arxiv.org/abs/2404.16135v1 )

ライセンス: Link先を確認

Titus D. Morris, Phillip C. Lotshaw,

(参考訳) 本稿では,線形深度回路を用いた組合せ最適化問題を解くための変分量子アルゴリズムを提案する。我々のアルゴリズムは、ターゲット組合せ関数の各項を制御するために設計されたハミルトン生成器からなるアンサッツと、量子想像時間進化の修正版に続くパラメータ更新を使用する。我々は,MAXCUT問題に対する解を目標とする数値シミュレーションにおいて,このアンサッツを評価する。状態の進化は想像上の時間進化を忠実に模倣し、その最適解収束は古典的ハミルトンスペクトルの適応変換によってさらに改善され、資源はアイデンティティに近い最適化されたゲートを刈り取ることで最小化される。これらの革新により、アルゴリズムは常に最適解に収束し、その過程で興味深い高絡み合いのダイナミクスを持つ。このパフォーマンスとリソース最小のアプローチは、短期量子コンピューティングハードウェアにおける潜在的な量子計算上の利点の候補である。

We present a variational quantum algorithm for solving combinatorial optimization problems with linear-depth circuits. Our algorithm uses an ansatz composed of Hamiltonian generators designed to control each term in the target combinatorial function, along with parameter updates following a modified version of quantum imaginary time evolution. We evaluate this ansatz in numerical simulations that target solutions to the MAXCUT problem. The state evolution is shown to closely mimic imaginary time evolution, and its optimal-solution convergence is further improved using adaptive transformations of the classical Hamiltonian spectrum, while resources are minimized by pruning optimized gates that are close to the identity. With these innovations, the algorithm consistently converges to optimal solutions, with interesting highly-entangled dynamics along the way. This performant and resource-minimal approach is a promising candidate for potential quantum computational advantages on near-term quantum computing hardware.
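
For readers unfamiliar with the MAXCUT target used in the simulations, a classical sketch of the objective follows. It evaluates the cut value of an assignment and finds the optimum by exhaustive search on a tiny graph; it does not reproduce the variational quantum algorithm, and the names and the 5-cycle example are assumptions made for illustration.

```python
from itertools import product


def cut_value(edges, assignment):
    """Total weight of edges whose endpoints fall on different sides of the cut."""
    return sum(w for u, v, w in edges if assignment[u] != assignment[v])


def brute_force_maxcut(num_vertices, edges):
    """Exhaustively search all 2^n assignments; only practical for small graphs."""
    best_value, best_assignment = float("-inf"), None
    for assignment in product([0, 1], repeat=num_vertices):
        value = cut_value(edges, assignment)
        if value > best_value:
            best_value, best_assignment = value, assignment
    return best_value, best_assignment


# Unweighted 5-cycle: vertices 0..4, each edge has weight 1.
ring = [(i, (i + 1) % 5, 1) for i in range(5)]
best_value, best_assignment = brute_force_maxcut(5, ring)
```

The cut values over all assignments play the role of the classical cost spectrum that a quantum optimizer works against, up to sign and offset conventions that the abstract does not specify.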

翻訳日:2024-04-26 18:02:25 公開日:2024-04-24

# オクルージョンを伴う3D人間姿勢推定:BlendMimic3Dデータセットの導入とGCNによる改善

3D Human Pose Estimation with Occlusions: Introducing BlendMimic3D Dataset and GCN Refinement ( http://arxiv.org/abs/2404.16136v1 )

ライセンス: Link先を確認

Filipa Lino, Carlos Santiago, Manuel Marques,

(参考訳) 3D人間姿勢推定(HPE)の分野では、特にオクルージョン(遮蔽)のあるシナリオにおいて、人間のポーズを正確に推定することが大きな課題である。この研究は、データの不足とオクルージョンを扱うための戦略に関して、3D HPEにおける現在の最先端のギャップを特定し、対処する。オクルージョンが発生する現実世界の状況を模倣するように設計され、3D HPEアルゴリズムにシームレスに統合できる新しいBlendMimic3Dデータセットを導入する。さらに,グラフモデルによるポーズ表現を強化するために,GCN(Graph Convolutional Network)を用いた3次元ポーズ改善ブロックを提案する。このGCNブロックはプラグアンドプレイのソリューションとして機能し、再学習を必要とせずに様々な3D HPEフレームワークに適用できる。 BlendMimic3Dのオクルージョンを含むデータでGCNをトレーニングすることにより、遮蔽されたポーズの解決において大幅な改善が得られ、遮蔽されていないポーズについては同等の結果が得られることを示す。プロジェクトのWebページは https://blendmimic3d.github.io/BlendMimic3D/ で公開されている。

In the field of 3D Human Pose Estimation (HPE), accurately estimating human pose, especially in scenarios with occlusions, is a significant challenge. This work identifies and addresses a gap in the current state of the art in 3D HPE concerning the scarcity of data and strategies for handling occlusions. We introduce our novel BlendMimic3D dataset, designed to mimic real-world situations where occlusions occur for seamless integration in 3D HPE algorithms. Additionally, we propose a 3D pose refinement block, employing a Graph Convolutional Network (GCN) to enhance pose representation through a graph model. This GCN block acts as a plug-and-play solution, adaptable to various 3D HPE frameworks without requiring retraining them. By training the GCN with occluded data from BlendMimic3D, we demonstrate significant improvements in resolving occluded poses, with comparable results for non-occluded ones. Project web page is available at https://blendmimic3d.github.io/BlendMimic3D/.

翻訳日:2024-04-26 18:02:25 公開日:2024-04-24

# DFT-s-OFDMにおけるPAPR低減のためのパルス整形設計

Learned Pulse Shaping Design for PAPR Reduction in DFT-s-OFDM ( http://arxiv.org/abs/2404.16137v1 )

ライセンス: Link先を確認

Fabrizio Carpi, Soheil Rostami, Joonyoung Cho, Siddharth Garg, Elza Erkip, Charlie Jianzhong Zhang,

(参考訳) 高ピーク対平均電力比(PAPR)は細胞系、特にアップリンク方向における細胞被覆を制限する主要な要因の1つである。離散フーリエ変換拡散直交周波数領域多重化(DFT-s-OFDM)とスペクトル拡張周波数領域スペクトル整形(FDSS)は、アップリンク波形のPAPRを下げるための効率的な手法の1つである。本研究では,FDSSフィルタを決定する機械学習ベースのフレームワークを提案し,シンボル誤り率(SER),PAPR,スペクトル平坦性要件のトレードオフを最適化する。我々のエンドツーエンド最適化フレームワークは、Nyquist zero-ISI(シンボル間干渉)条件を含む、複数の重要な設計制約を考慮に入れている。その結果,学習したFDSSフィルタは従来のベースラインに比べてPAPRを低下させ,SER劣化を最小限に抑えた。最適化のパラメータをチューニングすることで、PAPR削減のためのFDSSフィルタの基本的制限と特性を理解するのにも役立ちます。

High peak-to-average power ratio (PAPR) is one of the main factors limiting cell coverage for cellular systems, especially in the uplink direction. Discrete Fourier transform spread orthogonal frequency-domain multiplexing (DFT-s-OFDM) with spectrally-extended frequency-domain spectrum shaping (FDSS) is one of the efficient techniques deployed to lower the PAPR of the uplink waveforms. In this work, we propose a machine learning-based framework to determine the FDSS filter, optimizing a tradeoff between the symbol error rate (SER), the PAPR, and the spectral flatness requirements. Our end-to-end optimization framework considers multiple important design constraints, including the Nyquist zero-ISI (inter-symbol interference) condition. The numerical results show that learned FDSS filters lower the PAPR compared to conventional baselines, with minimal SER degradation. Tuning the parameters of the optimization also helps us understand the fundamental limitations and characteristics of the FDSS filters for PAPR reduction.
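
The PAPR metric that the learned filters aim to reduce is the ratio of a waveform's peak instantaneous power to its mean power, usually quoted in dB. A minimal sketch of that definition follows; the function name is an assumption, and the DFT-s-OFDM modulator and FDSS filter themselves are not modelled here.

```python
import math


def papr_db(samples):
    """Peak-to-average power ratio of complex baseband samples, in dB."""
    powers = [abs(x) ** 2 for x in samples]
    mean_power = sum(powers) / len(powers)
    if mean_power == 0:
        raise ValueError("PAPR is undefined for an all-zero signal")
    return 10.0 * math.log10(max(powers) / mean_power)


# A constant-envelope sequence has 0 dB PAPR; a single impulse among
# four samples concentrates all power in one sample, giving 10*log10(4) dB.
constant_envelope = [1, 1j, -1, -1j]
impulse = [1, 0, 0, 0]
flat_papr = papr_db(constant_envelope)
impulse_papr = papr_db(impulse)
```

In practice the PAPR is measured on the oversampled time-domain waveform after shaping, so the value depends on the oversampling factor and on the filter being evaluated.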

翻訳日:2024-04-26 18:02:25 公開日:2024-04-24

# 実世界の課題から分類した協調知覚の中間融合法に関する調査研究

A Survey on Intermediate Fusion Methods for Collaborative Perception Categorized by Real World Challenges ( http://arxiv.org/abs/2404.16139v1 )

ライセンス: Link先を確認

Melih Yazgan, Thomas Graf, Min Liu, J. Marius Zoellner,

(参考訳) 本研究は、現実の課題によって分類された自律運転の協調認識における中間核融合手法を解析する。様々な手法について検討し,その特徴と採用した評価指標について詳述する。その焦点は、送信効率、ローカライゼーションエラー、通信障害、異質性といった課題に対処することにある。さらに、敵の攻撃や防衛に対抗するための戦略や、ドメインシフトに適応するためのアプローチについても検討する。本研究の目的は, 自律運転における協調的認識の分野を前進させる上で, 中間核融合法が果たす役割を明らかにすることである。

This survey analyzes intermediate fusion methods in collaborative perception for autonomous driving, categorized by real-world challenges. We examine various methods, detailing their features and the evaluation metrics they employ. The focus is on addressing challenges like transmission efficiency, localization errors, communication disruptions, and heterogeneity. Moreover, we explore strategies to counter adversarial attacks and defenses, as well as approaches to adapt to domain shifts. The objective is to present an overview of how intermediate fusion methods effectively meet these diverse challenges, highlighting their role in advancing the field of collaborative perception in autonomous driving.

翻訳日:2024-04-26 18:02:25 公開日:2024-04-24

# 安定成層乱流の機械学習によるURANSの閉鎖--物理時間スケールと深部時系列モデルのデータハイパーパラメータを接続する

Machine-Learned Closure of URANS for Stably Stratified Turbulence: Connecting Physical Timescales & Data Hyperparameters of Deep Time-Series Models ( http://arxiv.org/abs/2404.16141v1 )

ライセンス: Link先を確認

Muralikrishnan Gopalakrishnan Meena, Demetri Liousas, Andrew D. Simin, Aditya Kashi, Wesley H. Brewer, James J. Riley, Stephen M. de Bruyn Kops,

(参考訳) 安定成層乱流(SST)に適用した非定常レイノルズ平均ナビエストークス(URANS)方程式のクロージャモデリングのための時系列機械学習(ML)法を開発した。 SSTは力の微妙なバランスに強く影響され、崩壊する場合にはより異方性になる。さらに、URANS方程式の項のいくつかで説明される物理現象の限定的な理解がある。各項を個別にモデル化しようとするよりも、項群、すなわち力のバランスを直接モデル化する機械学習の能力を探求することが魅力的である。等質で安定に成層された崩壊SSTを一様密度勾配で検討し,次元の減少を可能とした。本稿では,Long Short-Term Memory (LSTM) とNeural Ordinary Differential Equation (NODE) の2つの時系列MLモデルを検討する。どちらのモデルも正確に動作し、後方試験では数値的に安定である。さらに、複雑なシステムの物理的に関連する時間スケールを抽出することにより、MLモデルのデータ要求について検討する。 MLモデルがSSTの力学を正確に捉えるために必要な最小情報の時間尺度の比率は,流れのレイノルズ数と一致することがわかった。現在のフレームワークは、高次元の複雑なSSTフローのダイナミクスを捉えるためのそのようなモデルの能力を探るためのバックボーンを提供する。

We develop time-series machine learning (ML) methods for closure modeling of the Unsteady Reynolds Averaged Navier Stokes (URANS) equations applied to stably stratified turbulence (SST). SST is strongly affected by fine balances between forces and becomes more anisotropic in time for decaying cases. Moreover, there is a limited understanding of the physical phenomena described by some of the terms in the URANS equations. Rather than attempting to model each term separately, it is attractive to explore the capability of machine learning to model groups of terms, i.e., to directly model the force balances. We consider decaying SST which are homogeneous and stably stratified by a uniform density gradient, enabling dimensionality reduction. We consider two time-series ML models: Long Short-Term Memory (LSTM) and Neural Ordinary Differential Equation (NODE). Both models perform accurately and are numerically stable in a posteriori tests. Furthermore, we explore the data requirements of the ML models by extracting physically relevant timescales of the complex system. We find that the ratio of the timescales of the minimum information required by the ML models to accurately capture the dynamics of the SST corresponds to the Reynolds number of the flow. The current framework provides the backbone to explore the capability of such models to capture the dynamics of higher-dimensional complex SST flows.

翻訳日:2024-04-26 16:02:40 公開日:2024-04-24

# 量子および古典的機械学習モデルにおける逆ロバスト性の比較分析

A Comparative Analysis of Adversarial Robustness for Quantum and Classical Machine Learning Models ( http://arxiv.org/abs/2404.16154v1 )

ライセンス: Link先を確認

Maximilian Wendlinger, Kilian Tscharke, Pascal Debus,

(参考訳) 量子機械学習(QML)は、研究や産業から大きな関心を集め続けている分野である。 QMLモデルは、古典的な機械学習モデルとほとんど同じ方法で敵攻撃に弱いことが示されているが、量子モデルと古典モデルの敵攻撃を比較する方法はほとんど分かっていない。本稿では,移動攻撃,摂動パターン,リプシッツ境界を用いた古典的,量子的モデルの相似性と相違について系統的に検討する。具体的には、特徴属性の定量的分析を可能にする手作りデータセットの分類タスクに焦点を当てる。これにより、理論的にも実験的にも、分類ネットワークの堅牢性に関する洞察を得ることができる。まず、振幅や再アップロード符号化回路などの典型的なQMLモデルアーキテクチャと変分パラメータを比較し、従来のConvNetアーキテクチャと比較する。次に、QML回路の古典的近似(元はランダムフーリエ特徴サンプリングで得られたが、トレーニング可能な符号化に適合する)を導入し、他のアーキテクチャと比較してフーリエネットワークと呼ばれるモデルを評価する。以上の結果から,このフーリエネットワークは量子古典境界上の「中間基底」と見なせることがわかった。両方向の境界を越える敵攻撃は成功したが、正規化は量子ネットワークをより堅牢にし、リプシッツ境界や転送攻撃に直接影響を与えることを示す。

Quantum machine learning (QML) continues to be an area of tremendous interest from research and industry. While QML models have been shown to be vulnerable to adversarial attacks much in the same manner as classical machine learning models, it is still largely unknown how to compare adversarial attacks on quantum versus classical models. In this paper, we show how to systematically investigate the similarities and differences in adversarial robustness of classical and quantum models using transfer attacks, perturbation patterns and Lipschitz bounds. More specifically, we focus on classification tasks on a handcrafted dataset that allows quantitative analysis for feature attribution. This enables us to get insight, both theoretically and experimentally, on the robustness of classification networks. We start by comparing typical QML model architectures such as amplitude and re-upload encoding circuits with variational parameters to a classical ConvNet architecture. Next, we introduce a classical approximation of QML circuits (originally obtained with Random Fourier Features sampling but adapted in this work to fit a trainable encoding) and evaluate this model, denoted Fourier network, in comparison to other architectures. Our findings show that this Fourier network can be seen as a "middle ground" on the quantum-classical boundary. While adversarial attacks successfully transfer across this boundary in both directions, we also show that regularization helps quantum networks to be more robust, which has direct impact on Lipschitz bounds and transfer attacks.

翻訳日:2024-04-26 16:02:40 公開日:2024-04-24

# SAM は EIG の夢か? 期待情報を用いた対話型セグメンタの性能評価

Does SAM dream of EIG? Characterizing Interactive Segmenter Performance using Expected Information Gain ( http://arxiv.org/abs/2404.16155v1 )

ライセンス: Link先を確認

Kuan-I Chung, Daniel Moyer,

(参考訳) 本稿では,対話型セグメンテーションモデルの評価手法を提案する。ベイズ実験設計の概念に基づいて、この手順はモデルの点プロンプトの理解と所望のセグメンテーションマスクとの対応を測定する。我々は、Oracle Diceインデックスの測定が、この特性の測定に無関心であるか、あるいは誤解を招くことさえ示している。本稿では,3つの対話的セグメンテーションモデルと2つの大きな画像セグメンテーションデータセットのサブセットに提案手法を適用した。

We introduce an assessment procedure for interactive segmentation models. Based on concepts from Bayesian Experimental Design, the procedure measures a model's understanding of point prompts and their correspondence with the desired segmentation mask. We show that Oracle Dice index measurements are insensitive or even misleading in measuring this property. We demonstrate the use of the proposed procedure on three interactive segmentation models and subsets of two large image segmentation datasets.

翻訳日:2024-04-26 16:02:40 公開日:2024-04-24

# 量子ガンのガーディアン

Guardians of the Quantum GAN ( http://arxiv.org/abs/2404.16156v1 )

ライセンス: Link先を確認

Archisman Ghosh, Debarshi Kundu, Avimita Chatterjee, Swaroop Ghosh,

(参考訳) Quantum Generative Adversarial Networks (qGANs)は、画像生成量子機械学習モデルの最前線にある。量子機械学習モデルをトレーニングし、推論するためのNISQ(Noisy Intermediate-Scale Quantum)デバイスへの需要の増加に対応するため、量子ハードウェアをサービスとして提供するサードパーティベンダの数は増加すると予想されている。この拡張は、信頼できないベンダーが量子機械学習モデルからプロプライエタリな情報を盗むリスクをもたらす。そこで本研究では,qGANsのトレーニングフェーズに埋め込まれたノイズシグネチャを非侵襲的な透かしとして活用する新しい透かし手法を提案する。透かしは、qGANが生成した画像の中で識別可能であり、トレーニング中に使用する特定の量子ハードウェアをトレースすることで、所有権の強い証明を提供する。セキュリティの堅牢性をさらに高めるため、複数の量子ハードウェアのシーケンス上でqGANのトレーニングを提案し、敵が複製し難い全てのトレーニングハードウェアのノイズシグネチャを含む複雑な透かしを埋め込む。また、この透かしを頑健に抽出する機械学習分類器を開発し、モデルの真正性を検証したqGANによって生成された画像からトレーニングハードウェア(またはハードウェアスイート)を識別する。ウォーターマークの署名は、トレーニングに使用されたハードウェアとは異なるハードウェアの推論に対して堅牢である点に注意が必要だ。個別の量子ハードウェア上でのQGANのトレーニングには,それぞれ100%と90%の透かし抽出精度が得られた(異なるハードウェア上での参照)。トレーニング中のパラメータの進化は量子ノイズによって強く変調されるため、提案された透かしは他の量子機械学習モデルにも拡張することができる。

Quantum Generative Adversarial Networks (qGANs) are at the forefront of image-generating quantum machine learning models. To accommodate the growing demand for Noisy Intermediate-Scale Quantum (NISQ) devices to train and infer quantum machine learning models, the number of third-party vendors offering quantum hardware as a service is expected to rise. This expansion introduces the risk of untrusted vendors potentially stealing proprietary information from the quantum machine learning models. To address this concern we propose a novel watermarking technique that exploits the noise signature embedded during the training phase of qGANs as a non-invasive watermark. The watermark is identifiable in the images generated by the qGAN allowing us to trace the specific quantum hardware used during training hence providing strong proof of ownership. To further enhance the security robustness, we propose the training of qGANs on a sequence of multiple quantum hardware, embedding a complex watermark comprising the noise signatures of all the training hardware that is difficult for adversaries to replicate. We also develop a machine learning classifier to extract this watermark robustly, thereby identifying the training hardware (or the suite of hardware) from the images generated by the qGAN validating the authenticity of the model. We note that the watermark signature is robust against inferencing on hardware different than the hardware that was used for training. We obtain watermark extraction accuracy of 100% and ~90% for training the qGAN on individual and multiple quantum hardware setups (and inferencing on different hardware), respectively. Since parameter evolution during training is strongly modulated by quantum noise, the proposed watermark can be extended to other quantum machine learning models as well.

Translated: 2024-04-26 16:02:40 Published: 2024-04-24

# The Feasibility of Implementing Large-Scale Transformers on Multi-FPGA Platforms ( http://arxiv.org/abs/2404.16158v1 )

License: see the linked page

Yu Gao, Juan Camilo Vega, Paul Chow

FPGAs are rarely mentioned when discussing the implementation of large machine learning applications, such as Large Language Models (LLMs), in the data center. There is considerable evidence that single FPGAs can be competitive with GPUs in performance for some computations, especially for low latency, and are often much more efficient when power is considered. This suggests that there is merit in exploring the use of multiple FPGAs for large machine learning applications. The challenge with using multiple FPGAs is that there is no commonly accepted flow for developing and deploying multi-FPGA applications, i.e., there are no tools to describe a large application, map it to multiple FPGAs, and then deploy the application on a multi-FPGA platform. In this paper, we explore the feasibility of implementing large transformers using multiple FPGAs by developing a scalable multi-FPGA platform and some tools to map large applications to the platform. We validate our approach by designing an efficient multi-FPGA version of the I-BERT transformer and implementing one encoder using six FPGAs as a working proof-of-concept to show that our platform and tools work. Based on our proof-of-concept prototype and performance estimates for the latest FPGAs compared to GPUs, we conclude that there can be a place for FPGAs in the world of large machine learning applications. We demonstrate a promising first step that shows that, with the right infrastructure and tools, it is reasonable to continue to explore the possible benefits of using FPGAs for applications such as LLMs.

Translated: 2024-04-26 16:02:40 Published: 2024-04-24

# AFU: Actor-Free critic Updates in off-policy RL for continuous control ( http://arxiv.org/abs/2404.16159v1 )

License: see the linked page

Nicolas Perrin-Gilbert

This paper presents AFU, an off-policy deep RL algorithm that addresses in a new way the challenging "max-Q problem" in Q-learning for continuous action spaces, with a solution based on regression and conditional gradient scaling. AFU has an actor, but its critic updates are entirely independent of it. As a consequence, the actor can be chosen freely. In the initial version, AFU-alpha, we employ the same stochastic actor as in Soft Actor-Critic (SAC). We then study a simple failure mode of SAC and show how AFU can be modified to make actor updates less likely to become trapped in local optima, resulting in a second version of the algorithm, AFU-beta. Experimental results demonstrate the sample efficiency of both versions of AFU, marking it as the first model-free off-policy algorithm competitive with state-of-the-art actor-critic methods while departing from the actor-critic perspective.

Translated: 2024-04-26 16:02:40 Published: 2024-04-24

# Domain-Specific Improvement on Psychotherapy Chatbot Using Assistant ( http://arxiv.org/abs/2404.16160v1 )

License: see the linked page

Cheng Kang, Daniel Novak, Katerina Urbanova, Yuqing Cheng, Yong Hu

Large language models (LLMs) have demonstrated impressive generalization capabilities on specific tasks with human-written instruction data. However, the limited quantity, diversity, and professional expertise of such instruction data raise concerns about the performance of LLMs in psychotherapy tasks when provided with domain-specific instructions. To address this, we first propose Domain-Specific Assistant Instructions based on AlexanderStreet therapy, and second, we use an adaptation fine-tuning method and a retrieval-augmented generation method to improve pre-trained LLMs. Through quantitative evaluation of linguistic quality using automatic and human evaluation, we observe that pre-trained LLMs on Psychotherapy Assistant Instructions outperform state-of-the-art LLM response baselines. Our Assistant-Instruction approach offers a half-annotation method to align pre-trained LLMs with instructions and provide pre-trained LLMs with more psychotherapy knowledge.

Translated: 2024-04-26 16:02:40 Published: 2024-04-24

# Scaling Lifelong Multi-Agent Path Finding to More Realistic Settings: Research Challenges and Opportunities ( http://arxiv.org/abs/2404.16162v1 )

License: see the linked page

He Jiang, Yulun Zhang, Rishi Veerapaneni, Jiaoyang Li

Multi-Agent Path Finding (MAPF) is the problem of moving multiple agents from their start locations to their goals without collisions. Lifelong MAPF (LMAPF) extends MAPF by continuously assigning new goals to agents. We present our winning approach to the 2023 League of Robot Runners LMAPF competition, which leads us to several interesting research challenges and future directions. In this paper, we outline three main research challenges. The first challenge is to search for high-quality LMAPF solutions within a limited planning time (e.g., 1s per step) for a large number of agents (e.g., 10,000) or extremely high agent density (e.g., 97.7%). We present future directions such as developing more competitive rule-based and anytime MAPF algorithms and parallelizing state-of-the-art MAPF algorithms. The second challenge is to alleviate congestion and the effect of myopic behaviors in LMAPF algorithms. We present future directions, such as developing moving guidance and traffic rules to reduce congestion, incorporating future prediction and real-time search, and determining the optimal agent number. The third challenge is to bridge the gaps between the LMAPF models used in the literature and real-world applications. We present future directions, such as dealing with more realistic kinodynamic models, execution uncertainty, and evolving systems.
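To make the "without collisions" requirement in this abstract concrete, the following minimal Python sketch checks a set of timed grid paths for the two conflict types commonly used in the MAPF literature: vertex conflicts (two agents in the same cell at the same step) and edge conflicts (two agents swapping cells). It is an illustration only, not the competition entry described above; the function name `find_conflicts`, the `(row, col)` path format, and the convention that an agent waits at its last cell are assumptions made for this example.

```python
from itertools import combinations


def find_conflicts(paths):
    """Return vertex and edge (swap) conflicts among timed grid paths.

    paths: dict mapping an agent name to a list of (row, col) cells,
    one cell per time step. Assumption for this sketch: an agent that
    has finished its path waits at its last cell.
    """
    horizon = max(len(p) for p in paths.values())

    def at(path, t):
        # Position at time t, clamped to the final cell.
        return path[min(t, len(path) - 1)]

    conflicts = []
    for a, b in combinations(sorted(paths), 2):
        for t in range(horizon):
            # Vertex conflict: both agents occupy the same cell at time t.
            if at(paths[a], t) == at(paths[b], t):
                conflicts.append(("vertex", a, b, t))
            # Edge conflict: the agents swap cells between t and t + 1.
            elif (t + 1 < horizon
                  and at(paths[a], t) == at(paths[b], t + 1)
                  and at(paths[a], t + 1) == at(paths[b], t)):
                conflicts.append(("edge", a, b, t))
    return conflicts


# Hypothetical example: a1 and a2 meet head-on, a3 and a4 keep swapping.
paths = {
    "a1": [(0, 0), (0, 1), (0, 2)],
    "a2": [(0, 2), (0, 1), (0, 0)],
    "a3": [(1, 0), (1, 1), (1, 0)],
    "a4": [(1, 1), (1, 0), (1, 1)],
}
print(find_conflicts(paths))
```

A lifelong planner would need to run a check of this kind (or maintain an equivalent reservation structure) at every planning step, which is one reason the per-step time budget mentioned in the abstract is demanding at 10,000 agents.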

Translated: 2024-04-26 16:02:40 Published: 2024-04-24

# Towards a Holistic Evaluation of LLMs on Factual Knowledge Recall ( http://arxiv.org/abs/2404.16164v1 )

License: see the linked page

Jiaqing Yuan, Lin Pan, Chung-Wei Hang, Jiang Guo, Jiarong Jiang, Bonan Min, Patrick Ng, Zhiguo Wang

Large language models (LLMs) have shown remarkable performance on a variety of NLP tasks, and are being rapidly adopted in a wide range of use cases. It is therefore of vital importance to holistically evaluate the factuality of their generated outputs, as hallucinations remain a challenging issue. In this work, we focus on assessing LLMs' ability to recall factual knowledge learned from pretraining, and the factors that affect this ability. To that end, we construct FACT-BENCH, a representative benchmark covering 20 domains, 134 property types, 3 answer types, and different knowledge popularity levels. We benchmark 31 models from 10 model families and provide a holistic assessment of their strengths and weaknesses. We observe that instruction-tuning hurts knowledge recall, as pretraining-only models consistently outperform their instruction-tuned counterparts, and we observe positive effects of model scaling, as larger models outperform smaller ones for all model families. However, the best performance from GPT-4 still leaves a large gap to the upper bound. We additionally study the role of in-context exemplars using counterfactual demonstrations, which lead to significant degradation of factual knowledge recall for large models. By further decoupling a model's known and unknown knowledge, we find that the degradation is attributable to exemplars that contradict a model's known knowledge, as well as to the number of such exemplars. Lastly, we fine-tune LLaMA-7B in different settings of known and unknown knowledge. In particular, fine-tuning on a model's known knowledge is beneficial, and consistently outperforms fine-tuning on unknown and mixed knowledge. We will make our benchmark publicly available.

Translated: 2024-04-26 16:02:40 Published: 2024-04-24

# Fast photon-mediated entanglement of continuously-cooled trapped ions for quantum networking ( http://arxiv.org/abs/2404.16167v1 )

License: see the linked page

Jameson O'Reilly, George Toh, Isabella Goetting, Sagnik Saha, Mikhail Shalaev, Allison Carter, Andrew Risinger, Ashish Kalakuntla, Tingguang Li, Ashrit Verma, Christopher Monroe

We entangle two co-trapped atomic barium ion qubits by collecting single visible photons from each ion through in-vacuo 0.8 NA objectives, interfering them through an integrated fiber-beamsplitter, and detecting them in coincidence. This projects the qubits into an entangled Bell state with an observed fidelity lower bound of F > 94%. We also introduce an ytterbium ion for sympathetic cooling to remove the need for recooling interruptions and achieve a continuous entanglement rate of 250 per second.

Translated: 2024-04-26 16:02:40 Published: 2024-04-24

# The Over-Certainty Phenomenon in Modern UDA Algorithms ( http://arxiv.org/abs/2404.16168v1 )

License: see the linked page

Fin Amin, Jung-Eun Kim

When neural networks are confronted with unfamiliar data that deviate from their training set, this signifies a domain shift. While these networks output predictions on their inputs, they typically fail to account for their level of familiarity with these novel observations. This challenge becomes even more pronounced in resource-constrained settings, such as embedded systems or edge devices. To address such challenges, we aim to recalibrate a neural network's decision boundaries in relation to its cognizance of the data it observes, introducing an approach we call certainty distillation. While prevailing works approach unsupervised domain adaptation (UDA) with the goal of reducing model entropy, they unintentionally produce models that suffer from calibration inaccuracies, a dilemma we term the over-certainty phenomenon. In this paper, we probe the drawbacks of this traditional learning model. As a solution to the issue, we propose a UDA algorithm that not only improves accuracy but also assures model calibration, all while maintaining suitability for environments with limited computational resources.
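The abstract does not say how calibration is measured, so the following is only a hedged illustration of what "over-certainty" looks like numerically. It computes the expected calibration error (ECE), a widely used binned calibration metric, in plain Python. The function name, the equal-width binning, and the toy inputs are assumptions for this sketch and are not taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned gap between average confidence and accuracy.

    confidences: predicted probability of the chosen class, in [0, 1].
    correct: 1 if the prediction was right, 0 otherwise.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(h for _, h in members) / len(members)
            # Weight each bin's gap by the share of samples it holds.
            ece += len(members) / total * abs(accuracy - avg_conf)
    return ece


# Toy "over-certain" model: 99% confident, right only half of the time.
over_certain = expected_calibration_error([0.99] * 4, [1, 0, 1, 0])
# Toy calibrated model: 75% confident, right three times out of four.
calibrated = expected_calibration_error([0.75] * 4, [1, 1, 1, 0])
print(over_certain, calibrated)
```

A model whose entropy has been driven down on shifted data can show exactly this pattern: very high confidence with much lower accuracy, and therefore a large ECE.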

Translated: 2024-04-26 16:02:40 Published: 2024-04-24

# MiMICRI: Towards Domain-centered Counterfactual Explanations of Cardiovascular Image Classification Models ( http://arxiv.org/abs/2404.16174v1 )

License: see the linked page

Grace Guo, Lifu Deng, Animesh Tandon, Alex Endert, Bum Chul Kwon

The recent prevalence of publicly accessible, large medical imaging datasets has led to a proliferation of artificial intelligence (AI) models for cardiovascular image classification and analysis. At the same time, the potentially significant impacts of these models have motivated the development of a range of explainable AI (XAI) methods that aim to explain model predictions given certain image inputs. However, many of these methods are not developed or evaluated with domain experts, and explanations are not contextualized in terms of medical expertise or domain knowledge. In this paper, we propose a novel framework and Python library, MiMICRI, that provides domain-centered counterfactual explanations of cardiovascular image classification models. MiMICRI helps users interactively select and replace segments of medical images that correspond to morphological structures. From the counterfactuals generated, users can then assess the influence of each segment on model predictions, and validate the model against known medical facts. We evaluate this library with two medical experts. Our evaluation demonstrates that a domain-centered XAI approach can enhance the interpretability of model explanations, and help experts reason about models in terms of relevant domain knowledge. However, concerns were also raised about the clinical plausibility of the counterfactuals generated. We conclude with a discussion on the generalizability and trustworthiness of the MiMICRI framework, as well as the implications of our findings for the development of domain-centered XAI methods for model interpretability in healthcare contexts.

Translated: 2024-04-26 16:02:40 Published: 2024-04-24

# Advancing Recommender Systems by mitigating Shilling attacks ( http://arxiv.org/abs/2404.16177v1 )

License: see the linked page

Aditya Chichani, Juzer Golwala, Tejas Gundecha, Kiran Gawande

Given that the number of products on offer grows exponentially while the amount of data a user can assimilate before making a decision is relatively small, recommender systems help by categorizing content according to user preferences. Collaborative filtering is a widely used method for computing recommendations because of its good performance. However, this method makes the system vulnerable to attacks that try to bias the recommendations. These attacks, known as "shilling attacks", are performed to push or nuke an item in the system. This paper proposes an algorithm to accurately detect such shilling profiles in the system and also studies the effects of such profiles on the recommendations.
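To illustrate why collaborative filtering is exposed to "push" attacks of the kind this abstract describes, here is a minimal Python sketch of a user-based predictor and a handful of injected profiles. It does not reproduce the detection algorithm proposed in the paper. The user names, the item names, the cosine-over-co-rated-items similarity, and the simplification that each fake profile copies the victim's ratings are all assumptions chosen to keep the example small.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, profile in ratings.items():
        if other == user or item not in profile:
            continue
        sim = cosine(ratings[user], profile)
        num += sim * profile[item]
        den += sim
    return num / den if den else None


# Hypothetical genuine ratings: item "X" is disliked by alice's neighbours.
ratings = {
    "alice": {"A": 4, "B": 3, "C": 5},
    "bob": {"A": 4, "B": 2, "C": 5, "X": 2},
    "carol": {"A": 3, "B": 3, "C": 4, "X": 1},
}
before = predict(ratings, "alice", "X")

# Push attack: fake profiles look similar to alice and rate "X" at the maximum.
attacked = dict(ratings)
for k in range(3):
    attacked[f"shill{k}"] = {"A": 4, "B": 3, "C": 5, "X": 5}
after = predict(attacked, "alice", "X")

print(round(before, 2), round(after, 2))
```

Because the fake profiles are highly similar to the target user, they dominate the weighted average and lift the predicted rating for "X". A nuke attack works the same way with the minimum rating.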

Translated: 2024-04-26 16:02:40 Published: 2024-04-24

# S2DEVFMAP: Self-Supervised Learning Framework with Dual Ensemble Voting Fusion for Maximizing Anomaly Prediction in Timeseries ( http://arxiv.org/abs/2404.16179v1 )

License: see the linked page

Sarala Naidu, Ning Xiong

Anomaly detection plays a crucial role in industrial settings, particularly in maintaining the reliability and optimal performance of cooling systems. Traditional anomaly detection methods often face challenges in handling diverse data characteristics and variations in noise levels, resulting in limited effectiveness. Moreover, traditional anomaly detection often relies on the application of a single model. This work proposes a novel, robust approach using five heterogeneous independent models combined with a dual ensemble fusion of voting techniques. Diverse models capture various system behaviors, while the fusion strategy maximizes detection effectiveness and minimizes false alarms. Each base autoencoder model learns a unique representation of the data, and their complementary strengths are leveraged to improve anomaly detection performance. To increase the effectiveness and reliability of the final anomaly prediction, a dual ensemble technique is applied. This approach excels at maximizing the coverage of identified anomalies. Experimental results on a real-world dataset of industrial cooling system data demonstrate the effectiveness of the proposed approach. This approach can be extended to other industrial applications where anomaly detection is critical for ensuring system reliability and preventing potential malfunctions.
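The abstract does not spell out how the two voting stages are combined, so the sketch below is only one plausible reading, written in Python. It takes per-model anomaly scores from five hypothetical detectors, applies a hard (majority) vote and a soft (mean-score) vote, and flags a sample if either vote fires. The function names, the 0.5 thresholds, and the OR-style fusion are assumptions for this example, not the authors' method.

```python
def hard_vote(flags):
    """Majority vote over per-model binary anomaly flags."""
    return sum(flags) > len(flags) / 2


def soft_vote(scores, threshold=0.5):
    """Average the per-model anomaly scores and compare to a threshold."""
    return sum(scores) / len(scores) > threshold


def fuse(scores, per_model_threshold=0.5, soft_threshold=0.5):
    """Flag an anomaly if either the hard or the soft vote fires (assumed rule)."""
    flags = [s > per_model_threshold for s in scores]
    return hard_vote(flags) or soft_vote(scores, soft_threshold)


# Hypothetical normalized anomaly scores from five base models.
strong_minority = [0.9, 0.95, 0.1, 0.2, 0.4]   # 2 of 5 flag it, mean above 0.5
weak_majority = [0.6, 0.6, 0.6, 0.0, 0.0]      # 3 of 5 flag it, mean below 0.5
quiet = [0.1, 0.2, 0.3, 0.1, 0.6]              # neither vote fires

print(fuse(strong_minority), fuse(weak_majority), fuse(quiet))
```

Combining the votes with OR widens coverage (fewer missed anomalies) at the cost of more alarms. An AND-style fusion would trade in the opposite direction.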

Translated: 2024-04-26 16:02:40 Published: 2024-04-24

# Blind Federated Learning without initial model ( http://arxiv.org/abs/2404.16180v1 )

License: see the linked page

Jose L. Salmeron, Irina Arévalo

Federated learning is an emerging machine learning approach that allows a model to be built jointly by several participants who each hold their own private data. This method is secure and privacy-preserving, and is suitable for training a machine learning model using sensitive data from different sources, such as hospitals. In this paper, the authors propose two innovative methodologies for Particle Swarm Optimisation-based federated learning of Fuzzy Cognitive Maps in a privacy-preserving way. In addition, one relevant contribution of this research is that the federated learning process requires no initial model, making it effectively blind. The proposal is tested on several open datasets, improving both accuracy and precision.

Translated: 2024-04-26 16:02:40 Published: 2024-04-24

# ABCD: Trust enhanced Attention based Convolutional Autoencoder for Risk Assessment ( http://arxiv.org/abs/2404.16183v1 )

License: see the linked page

Sarala Naidu, Ning Xiong

Anomaly detection in industrial systems is crucial for preventing equipment failures, ensuring risk identification, and maintaining overall system efficiency. Traditional monitoring methods often rely on fixed thresholds and empirical rules, which may not be sensitive enough to detect subtle changes in system health and predict impending failures. To address this limitation, this paper proposes a novel attention-based convolutional autoencoder (ABCD) for risk detection and maps the derived risk value to maintenance planning. ABCD learns the normal behavior of conductivity from historical data of a real-world industrial cooling system and reconstructs the input data, identifying anomalies that deviate from the expected patterns. The framework also employs calibration techniques to ensure the reliability of its predictions. Evaluation results demonstrate that the attention mechanism in ABCD yields a 57.4% increase in performance and a 9.37% reduction in false alarms compared with the same model without attention. The approach can effectively detect risks and map the risk priority rank to maintenance, providing valuable insights for cooling system designers and service personnel. A calibration error of 0.03% indicates that the model is well calibrated, which enhances the model's trustworthiness and enables informed decisions about maintenance strategies.

Translated: 2024-04-26 15:27:26 Published: 2024-04-24

# Pearls from Pebbles: Improved Confidence Functions for Auto-labeling ( http://arxiv.org/abs/2404.16188v1 )

License: see the linked page

Harit Vishwakarma, Reid Chen, Sui Jiet Tay, Satya Sai Srinath Namburi, Frederic Sala, Ramya Korlakai Vinayak

Auto-labeling is an important family of techniques that produce labeled training sets with minimum manual labeling. A prominent variant, threshold-based auto-labeling (TBAL), works by finding a threshold on a model's confidence scores above which it can accurately label unlabeled data points. However, many models are known to produce overconfident scores, leading to poor TBAL performance. While a natural idea is to apply off-the-shelf calibration methods to alleviate the overconfidence issue, such methods still fall short. Rather than experimenting with ad-hoc choices of confidence functions, we propose a framework for studying the optimal TBAL confidence function. We develop a tractable version of the framework to obtain Colander (Confidence functions for Efficient and Reliable Auto-labeling), a new post-hoc method specifically designed to maximize performance in TBAL systems. We perform an extensive empirical evaluation of Colander and compare it against methods designed for calibration. Colander achieves up to 60% improvement in coverage over the baselines while maintaining auto-labeling error below 5% and using the same amount of labeled data as the baselines.
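The thresholding step described in this abstract can be sketched in a few lines of Python. The example below picks the lowest confidence threshold whose error on a small labeled validation set stays within a target, then reports coverage, i.e. the fraction of unlabeled points that would be auto-labeled. It is a simplified illustration: the function names and toy data are assumptions, the error is a plain empirical estimate rather than a statistical bound, and Colander itself (which learns the confidence function) is not reproduced here.

```python
def find_threshold(val_conf, val_correct, max_error=0.05):
    """Lowest threshold t such that validation points with confidence >= t
    have an empirical error of at most max_error. Returns None if no
    threshold qualifies."""
    best = None
    for t in sorted(set(val_conf), reverse=True):
        kept = [ok for c, ok in zip(val_conf, val_correct) if c >= t]
        error = 1 - sum(kept) / len(kept)
        if error <= max_error:
            best = t  # keep lowering the threshold while the bound still holds
    return best


def coverage(unlabeled_conf, threshold):
    """Fraction of unlabeled points confident enough to be auto-labeled."""
    if threshold is None:
        return 0.0
    return sum(c >= threshold for c in unlabeled_conf) / len(unlabeled_conf)


# Hypothetical validation confidences and whether each prediction was correct.
val_conf = [0.99, 0.95, 0.9, 0.8, 0.7, 0.6]
val_correct = [1, 1, 1, 1, 0, 1]
threshold = find_threshold(val_conf, val_correct)

# Hypothetical confidences on the unlabeled pool.
pool_conf = [0.97, 0.85, 0.75, 0.5, 0.81]
print(threshold, coverage(pool_conf, threshold))  # prints: 0.8 0.6
```

If the model is overconfident, wrong predictions sit at high confidence, the threshold is pushed upward, and coverage shrinks. That is the failure mode the abstract attributes to standard confidence scores.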

Translated: 2024-04-26 15:27:26 Published: 2024-04-24

# Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering ( http://arxiv.org/abs/2404.16192v1 )

License: see the linked page

Cuong Nhat Ha, Shima Asaadi, Sanjeev Kumar Karn, Oladimeji Farri, Tobias Heimann, Thomas Runkler

Vision-language models, while effective in general domains and showing strong performance in diverse multi-modal applications such as visual question answering (VQA), struggle to maintain the same level of effectiveness in more specialized domains, e.g., medicine. We propose a medical vision-language model that integrates large vision and language models adapted for the medical domain. This model goes through three stages of parameter-efficient training using three separate biomedical and radiology multi-modal visual and text datasets. The proposed model achieves state-of-the-art performance on the SLAKE 1.0 medical VQA (MedVQA) dataset with an overall accuracy of 87.5% and demonstrates strong performance on another MedVQA dataset, VQA-RAD, achieving an overall accuracy of 73.2%.

Translated: 2024-04-26 15:27:26 Published: 2024-04-24

# Improving Multi-label Recognition using Class Co-Occurrence Probabilities ( http://arxiv.org/abs/2404.16193v1 )

License: see the linked page

Samyak Rawlekar, Shubhang Bhatnagar, Vishnuvardhan Pogunulu Srinivasulu, Narendra Ahuja

Multi-label Recognition (MLR) involves the identification of multiple objects within an image. To address the additional complexity of this problem, recent works have leveraged information from vision-language models (VLMs) trained on large text-image datasets for the task. These methods learn an independent classifier for each object (class), overlooking correlations in their occurrences. Such co-occurrences can be captured from the training data as conditional probabilities between a pair of classes. We propose a framework that extends the independent classifiers by incorporating co-occurrence information for object pairs to improve their performance. We use a Graph Convolutional Network (GCN) to enforce the conditional probabilities between classes, by refining the initial estimates derived from image and text sources obtained using VLMs. We validate our method on four MLR datasets, where our approach outperforms all state-of-the-art methods.
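The co-occurrence statistics this abstract refers to can be estimated directly from training labels. The following minimal Python sketch computes the conditional probability that class j is present given that class i is present. The function name, the toy class names, and the counting estimator are assumptions for illustration. How the paper feeds these probabilities into the GCN is not shown.

```python
def cooccurrence_probs(label_sets, classes):
    """Estimate P(j present | i present) for every ordered pair of classes.

    label_sets: one set of class names per training image.
    classes: all class names; labels outside this list are not supported.
    """
    count = {c: 0 for c in classes}
    pair = {(i, j): 0 for i in classes for j in classes if i != j}
    for labels in label_sets:
        for i in labels:
            count[i] += 1
            for j in labels:
                if i != j:
                    pair[(i, j)] += 1
    # Divide pair counts by the count of the conditioning class.
    return {
        (i, j): (pair[(i, j)] / count[i] if count[i] else 0.0)
        for (i, j) in pair
    }


# Hypothetical training labels for four images.
label_sets = [
    {"person", "bicycle"},
    {"person", "dog"},
    {"person"},
    {"bicycle", "person"},
]
probs = cooccurrence_probs(label_sets, ["person", "bicycle", "dog"])
print(probs[("person", "bicycle")], probs[("bicycle", "person")])  # prints: 0.5 1.0
```

The resulting matrix is asymmetric: every image with a bicycle also shows a person, but only half of the images with a person show a bicycle. That asymmetry is the kind of signal independent per-class classifiers cannot use.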

Translated: 2024-04-26 15:27:26 Published: 2024-04-24

# A Game-Theoretic Analysis of Auditing Differentially Private Algorithms with Epistemically Disparate Herd ( http://arxiv.org/abs/2404.16195v1 )

License: see the linked page

Ya-Ting Yang, Tao Zhang, Quanyan Zhu

Privacy-preserving AI algorithms are widely adopted in various domains, but the lack of transparency might pose accountability issues. While auditing algorithms can address this issue, machine-based audit approaches are often costly and time-consuming. Herd audit, on the other hand, offers an alternative solution by harnessing collective intelligence. Nevertheless, the presence of epistemic disparity among auditors, resulting in varying levels of expertise and access to knowledge, may impact audit performance. An effective herd audit will establish a credible accountability threat for algorithm developers, incentivizing them to uphold their claims. In this study, our objective is to develop a systematic framework that examines the impact of herd audits on algorithm developers using the Stackelberg game approach. The optimal strategy for auditors emphasizes the importance of easy access to relevant information, as it increases the auditors' confidence in the audit process. Similarly, the optimal choice for developers suggests that herd audit is viable when auditors face lower costs in acquiring knowledge. By enhancing transparency and accountability, herd audit contributes to the responsible development of privacy-preserving algorithms.

Translated: 2024-04-26 15:27:26 Published: 2024-04-24

# ApisTox: a new benchmark dataset for the classification of small molecules toxicity on honey bees ( http://arxiv.org/abs/2404.16196v1 )

License: see the linked page

Jakub Adamczyk, Jakub Poziemski, Paweł Siedlecki

The global decline in bee populations poses significant risks to agriculture, biodiversity, and environmental stability. To bridge the gap in existing data, we introduce ApisTox, a comprehensive dataset focusing on the toxicity of pesticides to honey bees (Apis mellifera). This dataset combines and leverages data from existing sources such as ECOTOX and PPDB, providing an extensive, consistent, and curated collection that surpasses previous datasets. ApisTox incorporates a wide array of data, including toxicity levels for chemicals, details such as the time of their publication in the literature, and identifiers linking them to external chemical databases. This dataset may serve as an important tool for environmental and agricultural research, and can also support the development of policies and practices aimed at minimizing harm to bee populations. Finally, ApisTox offers a unique resource for benchmarking molecular property prediction methods on agrochemical compounds, facilitating advancements in both environmental science and cheminformatics. This makes it a valuable tool for both academic research and practical applications in bee conservation.

翻訳日:2024-04-26 15:27:26 公開日:2024-04-24

# 臨床治験における効率的な患者補充に向けて : プロンプト学習モデルの適用

Towards Efficient Patient Recruitment for Clinical Trials: Application of a Prompt-Based Learning Model ( http://arxiv.org/abs/2404.16198v1 )

ライセンス: Link先を確認

Mojdeh Rahmanian, Seyed Mostafa Fakhrahmad, Seyedeh Zahra Mousavi,

(参考訳) 目的: 臨床試験は医薬品の介入を進めるのに不可欠であるが、適格な参加者を選ぶ際にボトルネックに直面している。電子健康記録(EHR)を採用に活用することは人気があるが、構造化されていない医療用テキストの複雑な性質は、参加者を効率的に特定する上での課題である。自然言語処理(NLP)技術は最近、トランスフォーマーモデルに焦点を絞ったソリューションとして登場した。本研究では,EHRで収集した非構造化医療ノートから,コホート選択タスクに対するプロンプトベース大規模言語モデルの性能を評価することを目的とした。方法: 医療記録の処理には, 試験に必要な適格基準に最も関連性の高い文章を選択した。それぞれの資格基準に関連するSNOMED CT概念を収集した。 SNOMED CTのオントロジーに基づいてMedCATと診断した。基準関連用語と一致する概念を含む注釈文を抽出した。次に,抽出した文をトレーニングセットとして,プロンプトベース大規模言語モデル(GPT)を用いた。 2018 n2c2 チャレンジのデータセットを用いて,NLP 技術を用いて,13 の資格基準に基づいて 311 人の医療記録を分類することを目的としたモデルの性能評価を行った。結果: 提案モデルでは, マイクロFとマクロFの合計が0.9061, 0.8060であり, 実験結果の最高値となった。結論: 本研究におけるプロンプトベース大規模言語モデルの適用は, 有望な評価基準に基づいて, 患者を分類するものである。また,他の医療用テキストにも適用可能なSNOMED CTオントロジーを用いた抽出要約法を提案する。

Objective: Clinical trials are essential for advancing pharmaceutical interventions, but they face a bottleneck in selecting eligible participants. Although leveraging electronic health records (EHR) for recruitment has gained popularity, the complex nature of unstructured medical texts presents challenges in efficiently identifying participants. Natural Language Processing (NLP) techniques have emerged as a solution, with a recent focus on transformer models. In this study, we aimed to evaluate the performance of a prompt-based large language model for the cohort selection task from unstructured medical notes collected in the EHR. Methods: To process the medical records, we selected the sentences of the records most relevant to the eligibility criteria needed for the trial. The SNOMED CT concepts related to each eligibility criterion were collected. Medical records were also annotated with MedCAT based on the SNOMED CT ontology. Annotated sentences including concepts matched with the criteria-relevant terms were extracted. A prompt-based large language model (Generative Pre-trained Transformer (GPT) in this study) was then used with the extracted sentences as the training set. To assess its effectiveness, we evaluated the model's performance using the dataset from the 2018 n2c2 challenge, which aimed to classify medical records of 311 patients based on 13 eligibility criteria through NLP techniques. Results: Our proposed model achieved overall micro and macro F-measures of 0.9061 and 0.8060, respectively, which were among the highest scores achieved by the experiments performed with this dataset. Conclusion: The application of a prompt-based large language model in this study to classify patients based on eligibility criteria yielded promising scores. In addition, we proposed a method of extractive summarization with the aid of the SNOMED CT ontology that can also be applied to other medical texts.
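
The abstract reports both a micro and a macro F-measure, and the two can diverge considerably when some eligibility criteria are rare. The following is a minimal sketch of the difference, not the official n2c2 scoring script (which may aggregate differently); the criterion names and counts are invented purely for illustration.

```python
def f1(tp, fp, fn):
    """F1 score from raw counts; returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(counts):
    """counts maps a criterion name to its (tp, fp, fn) tuple."""
    # Macro: average the per-criterion F1 scores, so every criterion weighs the same.
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    # Micro: pool the counts first, so frequent criteria dominate.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)
    return micro, macro


# Hypothetical counts: one frequent, well-predicted criterion and one rare, hard one.
toy_counts = {
    "frequent_criterion": (90, 5, 5),
    "rare_criterion": (2, 3, 3),
}
micro, macro = micro_macro_f1(toy_counts)
print(f"micro F1 = {micro:.4f}, macro F1 = {macro:.4f}")
```

In this toy case the micro score is well above the macro score because the rare criterion drags the unweighted average down, which is one plausible reading of a gap such as 0.9061 versus 0.8060.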

翻訳日:2024-04-26 15:27:26 公開日:2024-04-24

# 機械学習による高次光子状態の最適分類

Optimized higher-order photon state classification by machine learning ( http://arxiv.org/abs/2404.16203v1 )

ライセンス: Link先を確認

Guangpeng Xu, Jeffrey Carvalho, Chiran Wijesundara, Tim Thomay,

(参考訳) 高次光子放出の分類は、決定論的多光子生成のためにさらに多くの方法が開発され、重要となる。広く使われている2階相関g(2)は、より高い光子フォック状態の量子純度を決定するのに十分ではない。従来のキャラクタリゼーション手法では、測定時間と計算時間を増大させる大量の光子検出イベントが必要となる。本稿では,2次元畳み込みニューラルネットワーク(CNN)に基づく機械学習モデルを用いて,最大で94%の精度でマルチフォトンフォック状態の迅速な分類を行う。シミュレーションされた光子検出イベントとg(3)相関を合わせることで、このモデルは特にスパース相関データで効率よく、800の共検出イベントで90%の精度を達成することができる。提案した実験装置を用いて、このCNN分類器は、量子技術に広く応用されている高光子状態の準リアルタイム分類の可能性を開く。

The classification of higher-order photon emission becomes important as more methods are developed for deterministic multiphoton generation. The widely used second-order correlation $g^{(2)}$ is not sufficient to determine the quantum purity of higher photon Fock states. Traditional characterization methods require a large number of photon detection events, which leads to increased measurement and computation time. Here, we demonstrate a Machine Learning model based on a 2D Convolutional Neural Network (CNN) for rapid classification of multiphoton Fock states up to $|3\rangle$ with an overall accuracy of 94%. By fitting the $g^{(3)}$ correlation with simulated photon detection events, the model performs efficiently, particularly with sparse correlation data, achieving an accuracy of 90% with 800 co-detection events. Using the proposed experimental setup, this CNN classifier opens up the possibility of quasi real-time classification of higher photon states, which holds broad applications in quantum technologies.
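
To see why a third-order correlation adds information for states up to three photons, it helps to look at the textbook zero-delay values for ideal Fock states. The sketch below is not the authors' CNN pipeline; it only evaluates the standard formula $g^{(k)}(0) = n! / ((n-k)!\, n^k)$ for an ideal state $|n\rangle$, and the function name is our own.

```python
from math import factorial


def fock_gk(n, k):
    """Zero-delay k-th order correlation g^(k)(0) of an ideal Fock state |n>.

    Uses <(a^dagger)^k a^k> = n! / (n - k)! and <a^dagger a> = n.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if k > n:
        return 0.0  # cannot detect more photons than the state contains
    return factorial(n) / (factorial(n - k) * n ** k)


for n in (1, 2, 3):
    print(f"|{n}>: g2(0) = {fock_gk(n, 2):.4f}, g3(0) = {fock_gk(n, 3):.4f}")
```

For ideal states, the third-order value vanishes for $|1\rangle$ and $|2\rangle$ and is nonzero only from $|3\rangle$ onward, so it carries information that the second-order value alone does not.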

翻訳日:2024-04-26 15:27:26 公開日:2024-04-24

# 絡み合いに基づく人工トポロジー:周辺ネットワークノード

Entanglement-Based Artificial Topology: Neighboring Remote Network Nodes ( http://arxiv.org/abs/2404.16204v1 )

ライセンス: Link先を確認

Si-Yi Chen, Jessica Illiano, Angela Sara Cacciapuoti, Marcello Caleffi,

(参考訳) 絡み合いは、量子インターネットの鍵となる通信資源として全会一致で認識される。しかし, 両端の絡み合いに注意を集中させることによって, 両端の絡み合いを生かして, 新たなネットワーク機能を実現する可能性について, これまでに検討が進んでいない。本稿では,ネットワーク間リソースとしてマルチパーティ・エンタングルメントを活用することを目的としている。具体的には、異なる量子局所領域ネットワーク(QLAN)の相互接続を考察し、マルチパーティント・エンタングルメントにより、局所演算のみにより、物理QLANトポロジの限界を克服する、QLAN間人工トポロジを動的に生成できることを示す。そこで本研究ではまず,各QLANに分散するマルチパーティの絡み合った状態を設計する。そして、そのような状態がどのように設計されるかを示す。一異なるQLANに属する相互接続ノード及び二異なるQLAN間トラフィックパターンに動的に適応すること。我々の貢献は、ネットワークエンジニアリングコミュニティに、人工トポロジと人工地区の概念に関する手持ちのガイドラインを提供することである。

Entanglement is unanimously recognized as the key communication resource of the Quantum Internet. Yet, the possibility of implementing novel network functionalities by exploiting the marvels of entanglement has been poorly investigated so far, with attention mainly restricted to bipartite entanglement. Conversely, in this paper, we aim at exploiting multipartite entanglement as an inter-network resource. Specifically, we consider the interconnection of different Quantum Local Area Networks (QLANs), and we show that multipartite entanglement makes it possible to dynamically generate an inter-QLAN artificial topology, by means of local operations only, that overcomes the limitations of the physical QLAN topologies. To this aim, we first design the multipartite entangled state to be distributed within each QLAN. Then, we show how such a state can be engineered to: i) interconnect nodes belonging to different QLANs, and ii) dynamically adapt to different inter-QLAN traffic patterns. Our contribution aims at providing the network engineering community with a hands-on guideline towards the concept of artificial topology and artificial neighborhood.

翻訳日:2024-04-26 15:27:26 公開日:2024-04-24

# AIS 2024 ユーザ生成コンテンツの映像品質評価に関する課題:方法と結果

AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results ( http://arxiv.org/abs/2404.16205v1 )

ライセンス: Link先を確認

Marcos V. Conde, Saman Zadtootaghaj, Nabajeet Barman, Radu Timofte, Chenlong He, Qi Zheng, Ruoxi Zhu, Zhengzhong Tu, Haiqiang Wang, Xiangguang Chen, Wenhui Meng, Xiang Pan, Huiying Shi, Han Zhu, Xiaozhong Xu, Lei Sun, Zhenzhong Chen, Shan Liu, Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Xiaohong Liu, Weisi Lin, Guangtao Zhai, Wei Sun, Yuqin Cao, Yanwei Jiang, Jun Jia, Zhichao Zhang, Zijian Chen, Weixia Zhang, Xiongkuo Min, Steve Göring, Zihao Qi, Chen Feng,

(参考訳) 本稿では,ユーザ生成コンテンツ(UGC)に着目したAIS 2024ビデオ品質アセスメント(VQA)チャレンジをレビューする。この課題の目的は、UGCビデオの知覚品質を推定できるディープラーニングベースの手法を収集することである。 YouTube UGC Datasetのユーザー生成ビデオには、さまざまなコンテンツ(スポーツ、ゲーム、歌詞、アニメなど)、品質、解像度が含まれている。提案手法では,30FHDフレームを1秒以下で処理する必要がある。チャレンジでは、合計102人の参加者が登録され、15人がコードとモデルを提出した。ユーザ生成コンテンツの効率的な映像品質評価のための多種多様な深層モデルに関する調査として,トップ5投稿のパフォーマンスを概観し,紹介する。

This paper reviews the AIS 2024 Video Quality Assessment (VQA) Challenge, focused on User-Generated Content (UGC). The aim of this challenge is to gather deep learning-based methods capable of estimating the perceptual quality of UGC videos. The user-generated videos from the YouTube UGC Dataset include diverse content (sports, games, lyrics, anime, etc.), quality levels, and resolutions. The proposed methods must process 30 FHD frames in under 1 second. In the challenge, a total of 102 participants registered, and 15 submitted code and models. The performance of the top-5 submissions is reviewed and provided here as a survey of diverse deep models for efficient video quality assessment of user-generated content.

翻訳日:2024-04-26 15:27:26 公開日:2024-04-24

# 構造的およびテクスチャ的埋め込みを用いた知識グラフの補完

Knowledge Graph Completion using Structural and Textual Embeddings ( http://arxiv.org/abs/2404.16206v1 )

ライセンス: Link先を確認

Sakher Khalil Alqaaidi, Krzysztof Kochut,

(参考訳) 知識グラフ(KG)は、質問回答やレコメンデーションシステムなど、人工知能アプリケーションに広く使われている。しかしながら、KGは不完全であることがしばしば見出される。既存の文献の多くは、与えられた不完全なKG三重項に対する欠落ノードの予測に重点を置いているが、既存のノード間の関係を探索することでKGを完遂する機会は残っている。本研究では,KG内のテキスト情報と構造情報の両方を利用する関係予測モデルを提案する。本手法では,歩行に基づく埋め込みと言語モデル埋め込みを統合し,ノードを効果的に表現する。本研究では,広く利用されているデータセットで評価した場合,関係予測タスクにおける競合結果が得られたことを実証する。

Knowledge Graphs (KGs) are widely employed in artificial intelligence applications, such as question-answering and recommendation systems. However, KGs are frequently found to be incomplete. While much of the existing literature focuses on predicting missing nodes for given incomplete KG triples, there remains an opportunity to complete KGs by exploring relations between existing nodes, a task known as relation prediction. In this study, we propose a relation prediction model that harnesses both textual and structural information within KGs. Our approach integrates walk-based embeddings with language model embeddings to effectively represent nodes. We demonstrate that our model achieves competitive results in the relation prediction task when evaluated on a widely used dataset.

翻訳日:2024-04-26 15:27:26 公開日:2024-04-24

# GPU-RANC:ニューロモルフィックアーキテクチャのためのCUDA加速シミュレーションフレームワーク

GPU-RANC: A CUDA Accelerated Simulation Framework for Neuromorphic Architectures ( http://arxiv.org/abs/2404.16208v1 )

ライセンス: Link先を確認

Sahil Hassan, Michael Inouye, Miguel C. Gonzalez, Ilkin Aliyev, Joshua Mack, Maisha Hafiz, Ali Akoglu,

(参考訳) オープンソースシミュレーションツールは、ニューロモルフィックなアプリケーションエンジニアやハードウェアアーキテクトにとって、パフォーマンスボトルネックを調査し、シリコンにコミットする前に設計最適化を検討する上で重要な役割を果たす。 Reconfigurable Architecture for Neuromorphic Computing (RANC)は、ソフトウェアベースのシミュレーションとFPGAベースのエミュレーションの両方を通じて、統合されたエコシステム内で事前訓練されたスパイキングニューラルネットワーク(SNN)モデルを実行する機能を提供するツールである。 RANCは、実装ボトルネックを調査し、アーキテクチャパラメータをチューニングしたり、アプリケーションの洞察に基づいてニューロンの振る舞いを変更し、ハードウェアの性能とネットワークの正確性に関する貿易空間を研究するために、柔軟でパラメータ化された設計でコミュニティによって利用されてきた。ニューロモルフィックコンピューティングで使用するアーキテクチャの設計には、ニューロン当たりの重みの数と精度、コア当たりのニューロンと軸索の数、ネットワークトポロジ、ニューロンの振る舞いなど、信じられないほど多くの構成パラメータがある。このような研究を加速し、ユーザに生産的な空間探索の合理化を提供するため、本稿では、RANCのGPUベースの実装を紹介する。我々は並列化のアプローチを要約し、GPUベースの様々なユースケースにおけるTick-accurateシミュレーションで達成したスピードアップのゲインを定量化する。 512個のニューロモルフィックコアMNIST推論アプリケーションに基づくRANCシミュレータのシリアルバージョンと比較して,最大780倍の高速化を示した。 RANCエコシステムは、SNNを加速させ、最適化されたニューロモルフィックアーキテクチャに迅速に収束させることにより、よりリッチな研究を行うための様々な最適化を探索する研究において、より実現可能な手段を提供すると考えている。

Open-source simulation tools play a crucial role in helping neuromorphic application engineers and hardware architects investigate performance bottlenecks and explore design optimizations before committing to silicon. Reconfigurable Architecture for Neuromorphic Computing (RANC) is one such tool that offers the ability to execute pre-trained Spiking Neural Network (SNN) models within a unified ecosystem through both software-based simulation and FPGA-based emulation. With its flexible and highly parameterized design, RANC has been used by the community to study implementation bottlenecks, tune architectural parameters or modify neuron behavior based on application insights, and study the trade space between hardware performance and network accuracy. Designing architectures for neuromorphic computing involves a very large number of configuration parameters, such as the number and precision of weights per neuron, neuron and axon counts per core, network topology, and neuron behavior. To accelerate such studies and provide users with streamlined, productive design space exploration, in this paper we introduce the GPU-based implementation of RANC. We summarize our parallelization approach and quantify the speedup gains achieved with GPU-based tick-accurate simulations across various use cases. We demonstrate a speedup of up to 780 times compared to the serial version of the RANC simulator, based on a 512-neuromorphic-core MNIST inference application. We believe that the RANC ecosystem now provides a much more feasible avenue for research on optimizations for accelerating SNNs, and for richer studies, by enabling rapid convergence to optimized neuromorphic architectures.
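
As a rough illustration of what a "tick-accurate" simulation loop involves, here is a minimal serial sketch of one neuromorphic core with simple integrate, leak, and fire neurons. It is not RANC's actual neuron model or code: the function name, the parameters (`weights`, `leak`, `threshold`), and the reset-to-zero rule are all assumptions made for this example.

```python
def simulate_core(weights, leak, threshold, spikes_in):
    """Serial tick-by-tick simulation of one hypothetical core.

    weights[n][a] is the integer weight from axon a to neuron n.
    spikes_in is a list with one set of active axon indices per tick.
    Returns, for each tick, the list of neuron indices that fired.
    """
    potentials = [0] * len(weights)
    fired_per_tick = []
    for active_axons in spikes_in:
        fired = []
        # Within a tick, each neuron update is independent of the others.
        for n, row in enumerate(weights):
            potentials[n] += sum(row[a] for a in active_axons) - leak
            if potentials[n] >= threshold:
                fired.append(n)
                potentials[n] = 0  # assumed reset rule
        fired_per_tick.append(fired)
    return fired_per_tick


# Two neurons, two axons, three ticks of input spikes (toy values).
toy_weights = [[2, 1], [0, 3]]
toy_input = [{0}, {0, 1}, {1}]
result = simulate_core(toy_weights, leak=1, threshold=3, spikes_in=toy_input)
print(result)
```

The inner loop over neurons has no cross-neuron dependency within a tick, which is the kind of structure a GPU port could map to one thread per neuron; how GPU-RANC actually partitions the work is described in the paper, not here.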

翻訳日:2024-04-26 15:27:26 公開日:2024-04-24

# 観測可能な量子状態の集合におけるランダム性の検証

Verifying randomness in sets of quantum states via observables ( http://arxiv.org/abs/2404.16211v1 )

ライセンス: Link先を確認

Xavier Bonet-Monroig, Hao Wang, Adrián Pérez-Salinas,

(参考訳) 量子状態の集合とハールランダム分布との整合を、既知の量子可観測性を通して統計的モーメントのマッチングにより予測する計量平均ランダム性を示す。本研究では,Haar-randomnessがディリクレ分布と結びついていることを示し,統計モーメントの閉形式表現と単純な境界を与える。この計量を置換とユニタリ同値な可観測性に一般化し、拡張平均ランダム性がハールランダム分布と互換性があるならば、状態の集合は概してハールランダムである。

We present a metric, average randomness, that predicts the compatibility of a set of quantum states with the Haar-random distribution by matching statistical moments through a known quantum observable. We show that Haar-randomness is connected to the Dirichlet distribution, and provide a closed-form expression and simple bounds for the statistical moments. We generalize this metric to permutation- and unitary-equivalent observables, ensuring that if the extended average randomness is compatible with a Haar-random distribution, then the set of states is approximately Haar-random.
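
The Dirichlet connection can be checked numerically with a small sketch. For a Haar-random pure state in dimension $d$, the probabilities of the outcomes in any fixed orthonormal basis follow a flat Dirichlet distribution, so the expectation value of an observable with eigenvalues $\lambda_i$ is $\sum_i \lambda_i p_i$ with mean $\mathrm{Tr}(O)/d$ and variance $(d\,\mathrm{Tr}(O^2) - \mathrm{Tr}(O)^2) / (d^2 (d+1))$. The code below compares these standard formulas with sampled values; it does not implement the paper's average-randomness metric, and all names are our own.

```python
import random


def sample_haar_probabilities(d, rng):
    """Sample basis-outcome probabilities of a Haar-random state in dimension d.

    A flat Dirichlet(1, ..., 1) sample is obtained by normalizing i.i.d.
    exponential random variables.
    """
    draws = [rng.expovariate(1.0) for _ in range(d)]
    total = sum(draws)
    return [x / total for x in draws]


def haar_moments(eigenvalues):
    """Exact mean and variance of <O> over Haar-random states."""
    d = len(eigenvalues)
    tr = sum(eigenvalues)
    tr_sq = sum(e * e for e in eigenvalues)
    mean = tr / d
    variance = (d * tr_sq - tr * tr) / (d * d * (d + 1))
    return mean, variance


rng = random.Random(0)
eigenvalues = [1.0, 1.0, -1.0, -1.0]  # e.g. a Pauli-like observable on two qubits
samples = [
    sum(e * p for e, p in zip(eigenvalues, sample_haar_probabilities(4, rng)))
    for _ in range(20000)
]
emp_mean = sum(samples) / len(samples)
emp_var = sum((s - emp_mean) ** 2 for s in samples) / len(samples)
exact_mean, exact_var = haar_moments(eigenvalues)
print(f"mean: sampled {emp_mean:.4f} vs exact {exact_mean:.4f}")
print(f"variance: sampled {emp_var:.4f} vs exact {exact_var:.4f}")
```

A set of states whose measured moments for the same observable deviate clearly from these values cannot be Haar-random, which is the intuition behind moment matching.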

翻訳日:2024-04-26 15:27:26 公開日:2024-04-24

# 進化する脅威景観におけるディープフェイク画像検出の最近の進歩の分析

An Analysis of Recent Advances in Deepfake Image Detection in an Evolving Threat Landscape ( http://arxiv.org/abs/2404.16212v1 )

ライセンス: Link先を確認

Sifat Muhammad Abdullah, Aravind Cheruvu, Shravya Kanchi, Taejoong Chung, Peng Gao, Murtuza Jadliwala, Bimal Viswanath,

(参考訳) ディープフェイクまたは合成画像は、オンラインプラットフォームに深刻なリスクをもたらす。これにより、Deepfakeイメージを正確に検出し、公開可能なDeepfakeデータセット上での優れたパフォーマンスを実現するためのいくつかの研究活動が引き起こされた。本研究は,8つの最先端検出器について検討し,最近の2つの発展により,配備の準備が整っていないことを論じるものである。まず、大規模な生成モデルをカスタマイズするための軽量な方法の出現により、攻撃者は多数のカスタマイズされたジェネレータ(ディープフェイクを作成する)を作成でき、それによって脅威表面を大幅に増大させることができる。既存のディフェンスは、現在一般に公開されているような user-customized generative model への一般化に失敗していることを示す。本稿では、コンテンツに依存しない特徴に基づく新しい機械学習手法と、ユーザカスタマイズモデルに対する一般化性能を改善するためのアンサンブルモデリングについて論じる。第2に, vision foundation model（複数の下流タスクに容易に適応可能な広範なデータに基づいてトレーニングされたマシンラーニングモデル）の出現は,攻撃者が既存の防御を回避可能な敵対的ディープフェイクを作るために,誤用される可能性がある。本稿では, 既存の基盤モデルを利用して, 敵対的ノイズを一切加えずに, 画像内容のセマンティックな操作を通じて敵対的サンプルを作成できる単純な敵対的攻撃を提案する。我々は、攻撃に対するいくつかの防衛の脆弱性を強調し、この新たな脅威に対抗するために、先進的な基盤モデルと敵対的訓練を活用する方向を探る。

Deepfake or synthetic images produced using deep generative models pose serious risks to online platforms. This has triggered several research efforts to accurately detect deepfake images, achieving excellent performance on publicly available deepfake datasets. In this work, we study 8 state-of-the-art detectors and argue that they are far from being ready for deployment due to two recent developments. First, the emergence of lightweight methods to customize large generative models can enable an attacker to create many customized generators (to create deepfakes), thereby substantially increasing the threat surface. We show that existing defenses fail to generalize well to such user-customized generative models that are publicly available today. We discuss new machine learning approaches based on content-agnostic features, and ensemble modeling to improve generalization performance against user-customized models. Second, the emergence of vision foundation models (machine learning models trained on broad data that can be easily adapted to several downstream tasks) can be misused by attackers to craft adversarial deepfakes that can evade existing defenses. We propose a simple adversarial attack that leverages existing foundation models to craft adversarial samples without adding any adversarial noise, through careful semantic manipulation of the image content. We highlight the vulnerabilities of several defenses against our attack, and explore directions leveraging advanced foundation models and adversarial training to defend against this new threat.

翻訳日:2024-04-26 15:27:26 公開日:2024-04-24

# ActiveRIR:音響環境モデリングのためのアクティブオーディオ-ビジュアル探索

ActiveRIR: Active Audio-Visual Exploration for Acoustic Environment Modeling ( http://arxiv.org/abs/2404.16216v1 )

ライセンス: Link先を確認

Arjun Somayazulu, Sagnik Majumder, Changan Chen, Kristen Grauman,

(参考訳) 環境音響モデルは、任意の音源/受信者の位置について、室内環境の物理的特性によって音がどのように変換されるかを表す。従来の音響モデル構築の方法は、空間の密集した場所にある大量の音響データの高価で時間を要する収集や、音響データサンプリングの場所をインテリジェントに選択するためのシーン幾何学の特権的な知識に依存している。本研究では,視覚・音響センサを備えた移動体エージェントが,環境音響モデルと占有マップを同時に構築する,無人環境の環境音響モデルを構築するための新しいタスクである能動的音響サンプリングを提案する。音声・視覚センサストリームからの情報を活用してエージェントナビゲーションを誘導し、最適な音響データサンプリング位置を判定する強化学習(RL)ポリシーであるActiveRIRを導入し、最小限の音響サンプルから環境の高品質な音響モデルを生成する。環境音響モデルにおける情報ゲインに基づく新しいRL報酬で政策を訓練する。 ActiveRIRは、最先端の音響シミュレーションプラットフォームから、さまざまな目に見えない屋内環境の評価を行い、従来のナビゲーションエージェントと既存の最先端の手法の両方を性能評価する。

An environment acoustic model represents how sound is transformed by the physical characteristics of an indoor environment, for any given source/receiver location. Traditional methods for constructing acoustic models involve expensive and time-consuming collection of large quantities of acoustic data at dense spatial locations in the space, or rely on privileged knowledge of scene geometry to intelligently select acoustic data sampling locations. We propose active acoustic sampling, a new task for efficiently building an environment acoustic model of an unmapped environment in which a mobile agent equipped with visual and acoustic sensors jointly constructs the environment acoustic model and the occupancy map on-the-fly. We introduce ActiveRIR, a reinforcement learning (RL) policy that leverages information from audio-visual sensor streams to guide agent navigation and determine optimal acoustic data sampling positions, yielding a high-quality acoustic model of the environment from a minimal set of acoustic samples. We train our policy with a novel RL reward based on information gain in the environment acoustic model. Evaluating on diverse unseen indoor environments from a state-of-the-art acoustic simulation platform, ActiveRIR outperforms an array of methods, including both traditional navigation agents based on spatial novelty and visual exploration and existing state-of-the-art methods.

翻訳日:2024-04-26 15:27:26 公開日:2024-04-24

# 階層空間上でのFADEを用いた効率的なNAS

Efficient NAS with FaDE on Hierarchical Spaces ( http://arxiv.org/abs/2404.16218v1 )

ライセンス: Link先を確認

Simon Neumeyer, Julian Stier, Michael Granitzer,

(参考訳) ニューラルアーキテクチャサーチ(NAS)は難しい問題である。階層的な検索空間は、ニューラルネットワークサブモジュールの安価な評価を可能にし、アーキテクチャ評価の代理となる。しかし、階層構造が制限的すぎる場合や、サロゲートが一般化に失敗する場合もあります。階層型NAS空間の有限領域における相対的な性能予測を得るために、微分可能なアーキテクチャ探索を用いるFaDEを提案する。これらのランクの相対的な性質は、メモリレス、バッチワイドな外的探索アルゴリズム(英語版)であり、疑似階調降下の進化的アルゴリズム(英語版)を用いる。 FaDEは特に階層的な多セル探索空間に適しており、指数的なコストではなく線形で探索できるため、プロキシ検索空間は不要である。実験の結果、探索空間の有限領域におけるFaDEランクは、対応するアーキテクチャ性能と相関し、第2に、完全なニューラルネットワーク探索空間における疑似漸進的進化探索に有効であることが示された。

Neural architecture search (NAS) is a challenging problem. Hierarchical search spaces allow for cheap evaluations of neural network submodules to serve as a surrogate for architecture evaluations. Yet, sometimes the hierarchy is too restrictive or the surrogate fails to generalize. We present FaDE, which uses differentiable architecture search to obtain relative performance predictions on finite regions of a hierarchical NAS space. The relative nature of these ranks calls for a memory-less, batch-wise outer search algorithm, for which we use an evolutionary algorithm with pseudo-gradient descent. FaDE is especially suited to deep hierarchical (or multi-cell) search spaces, which it can explore at linear instead of exponential cost, and it therefore eliminates the need for a proxy search space. Our experiments show that, first, FaDE-ranks on finite regions of the search space correlate with the corresponding architecture performances and, second, the ranks can empower a pseudo-gradient evolutionary search on the complete neural architecture search space.

翻訳日:2024-04-26 15:17:42 公開日:2024-04-24

# 曲がった連結は完備マイオラナ・McFarlandクラスに属さないのか?

When does a bent concatenation not belong to the completed Maiorana-McFarland class? ( http://arxiv.org/abs/2404.16220v1 )

ライセンス: Link先を確認

Sadmir Kudin, Enes Pasalic, Alexandr Polujan, Fengrong Zhang,

(参考訳) すべてのブールベント関数 $f$ は、連結化 $f=f_1||f_2$ または連結化 $f=f_1||f_2||f_3||f_4$ と書けるが、これらはすべて同時にベント、半ベント、あるいは5値のスペクトル関数である。曲がった連結$f$ (not) は、完成したMaiorana-McFarland クラス $\mathcal{M}^\#$ に属するのか? 本稿では、この問題を完全に解決するために、$f=f_1||f_2$と$f=f_1||f_2||f_3||f_4$という形の結合に対する$\mathcal{M}$-部分空間の構造の完全な特徴づけを与える。これらの条件に基づき、$f=g||h||g||(h+1)$の場合、特別な場合において、$\mathcal{M}^\#$ の外の曲がり関数を指定するためのいくつかの明示的な設計法を提案する。

Every Boolean bent function $f$ can be written either as a concatenation $f=f_1||f_2$ of two complementary semi-bent functions $f_1,f_2$; or as a concatenation $f=f_1||f_2||f_3||f_4$ of four Boolean functions $f_1,f_2,f_3,f_4$, all of which are simultaneously bent, semi-bent, or 5-valued spectra-functions. In this context, it is essential to ask: When does a bent concatenation $f$ (not) belong to the completed Maiorana-McFarland class $\mathcal{M}^\#$? In this article, we answer this question completely by providing a full characterization of the structure of $\mathcal{M}$-subspaces for the concatenation of the form $f=f_1||f_2$ and $f=f_1||f_2||f_3||f_4$, which allows us to specify the necessary and sufficient conditions so that $f$ is outside $\mathcal{M}^\#$. Based on these conditions, we propose several explicit design methods of specifying bent functions outside $\mathcal{M}^\#$ in the special case when $f=g||h||g||(h+1)$, where $g$ and $h$ are bent functions.
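
The concatenation $f=f_1||f_2$ can be made concrete with a small brute-force check. The sketch below tests bentness and semi-bentness through the Walsh-Hadamard spectrum for the classic Maiorana-McFarland example $f(x_1,x_2,x_3,x_4) = x_1x_2 \oplus x_3x_4$ and its two halves. It illustrates the definitions only, not the paper's $\mathcal{M}$-subspace characterization; the helper names are our own, and we take "complementary" to mean that the two Walsh supports are disjoint.

```python
def walsh_spectrum(tt):
    """Walsh-Hadamard spectrum of a Boolean function given as a truth table.

    tt has length 2**n; tt[x] is the function value at the integer-encoded input x.
    """
    size = len(tt)
    return [
        sum(1 - 2 * (tt[x] ^ (bin(a & x).count("1") & 1)) for x in range(size))
        for a in range(size)
    ]


def is_bent(tt):
    """Bent (n even): every Walsh coefficient has absolute value 2**(n/2)."""
    n = len(tt).bit_length() - 1
    return n % 2 == 0 and all(abs(w) == 2 ** (n // 2) for w in walsh_spectrum(tt))


def is_semi_bent(tt):
    """Semi-bent (n odd): Walsh coefficients lie in {0, +-2**((n+1)/2)}."""
    n = len(tt).bit_length() - 1
    return n % 2 == 1 and all(
        abs(w) in (0, 2 ** ((n + 1) // 2)) for w in walsh_spectrum(tt)
    )


# f(x1, x2, x3, x4) = x1*x2 XOR x3*x4, with x1 as the most significant bit of x.
f = [((x >> 3) & (x >> 2) & 1) ^ ((x >> 1) & x & 1) for x in range(16)]

# Concatenation f = f1 || f2: f1 is the half with x1 = 0, f2 the half with x1 = 1.
f1, f2 = f[:8], f[8:]

# Disjoint Walsh supports: at every point exactly one of the two halves is nonzero.
disjoint_supports = all(
    (w1 == 0) != (w2 == 0) for w1, w2 in zip(walsh_spectrum(f1), walsh_spectrum(f2))
)
print(is_bent(f), is_semi_bent(f1), is_semi_bent(f2), disjoint_supports)
```

Brute force like this is only practical for small $n$, since the spectrum computation costs $O(4^n)$ operations; a fast Walsh-Hadamard transform would bring that down to $O(n 2^n)$.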

翻訳日:2024-04-26 15:17:42 公開日:2024-04-24

# NeRF-XL:複数のGPUによるNeRFのスケーリング

NeRF-XL: Scaling NeRFs with Multiple GPUs ( http://arxiv.org/abs/2404.16221v1 )

ライセンス: Link先を確認

Ruilong Li, Sanja Fidler, Angjoo Kanazawa, Francis Williams,

(参考訳) 我々は、複数のGPUにまたがってニューラルネットワーク場(NeRF)を分散する原理的な方法であるNeRF-XLを提案し、任意の容量でNeRFのトレーニングとレンダリングを可能にする。まず,大規模シーンを独立に訓練された複数のNeRFに分解する既存のマルチGPUアプローチを再検討し,これらの手法の基本的な問題点を特定し,トレーニングにGPU(Advanced Computer Resources)を用いることによって,再構成品質の改善を阻害する。 NeRF-XLはこれらの問題を修正し、単により多くのハードウェアを使用することで、任意の数のパラメータでNeRFのトレーニングとレンダリングを可能にする。提案手法のコアとなる分散トレーニングとレンダリングの定式化は,従来のシングルGPUの場合と数学的に等価であり,GPU間の通信を最小化する。任意のパラメータ数でNeRFをアンロックすることにより、NeRFのマルチGPUスケーリング法則を初めて明らかにし、パラメータ数を大きくした再構成品質の向上とGPUの高速化を実現した。我々は,25km^2の都市部をカバーする258K画像を含む,これまでで最大規模のオープンソースデータセットMatrixCityを含む,さまざまなデータセットに対するNeRF-XLの有効性を実証した。

We present NeRF-XL, a principled method for distributing Neural Radiance Fields (NeRFs) across multiple GPUs, thus enabling the training and rendering of NeRFs with an arbitrarily large capacity. We begin by revisiting existing multi-GPU approaches, which decompose large scenes into multiple independently trained NeRFs, and identify several fundamental issues with these methods that hinder improvements in reconstruction quality as additional computational resources (GPUs) are used in training. NeRF-XL remedies these issues and enables the training and rendering of NeRFs with an arbitrary number of parameters by simply using more hardware. At the core of our method lies a novel distributed training and rendering formulation, which is mathematically equivalent to the classic single-GPU case and minimizes communication between GPUs. By unlocking NeRFs with arbitrarily large parameter counts, our approach is the first to reveal multi-GPU scaling laws for NeRFs, showing improvements in reconstruction quality with larger parameter counts and speed improvements with more GPUs. We demonstrate the effectiveness of NeRF-XL on a wide variety of datasets, including the largest open-source dataset to date, MatrixCity, containing 258K images covering a 25 km$^2$ city area.
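
One way a distributed renderer can be "mathematically equivalent to the classic single-GPU case" is through the associativity of front-to-back volume compositing. The abstract does not spell out the formulation, so the following is only a sketch under our own assumptions: each device owns a contiguous, non-overlapping stretch of the ray, renders it locally, and returns just a (color, transmittance) pair to be merged. Colors are scalars here for brevity, and all names and sample values are illustrative.

```python
import math


def render_segment(sigmas, colors, deltas):
    """Volume-render one contiguous run of samples along a ray.

    Returns (accumulated color, transmittance through the whole segment).
    """
    color, trans = 0.0, 1.0
    for sigma, c, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        color += trans * alpha * c
        trans *= 1.0 - alpha
    return color, trans


def composite(partials):
    """Merge per-segment (color, transmittance) pairs in front-to-back order."""
    color, trans = 0.0, 1.0
    for seg_color, seg_trans in partials:
        color += trans * seg_color  # attenuate by everything in front
        trans *= seg_trans
    return color, trans


# Toy ray with four samples (densities, scalar colors, step sizes).
sigmas = [0.5, 1.0, 2.0, 0.1]
colors = [0.2, 0.9, 0.4, 1.0]
deltas = [0.1, 0.1, 0.1, 0.1]

# Single-device reference render.
full_color, full_trans = render_segment(sigmas, colors, deltas)

# Two hypothetical devices, each rendering its own half of the ray.
part_a = render_segment(sigmas[:2], colors[:2], deltas[:2])
part_b = render_segment(sigmas[2:], colors[2:], deltas[2:])
merged_color, merged_trans = composite([part_a, part_b])
print(full_color, merged_color)
```

Only two numbers per ray and per device need to be exchanged in this scheme, which is consistent with the stated goal of keeping communication between GPUs small.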

翻訳日:2024-04-26 15:17:42 公開日:2024-04-24

# インストラクショナルビデオにおけるステップ差

Step Differences in Instructional Video ( http://arxiv.org/abs/2404.16222v1 )

ライセンス: Link先を確認

Tushar Nagarajan, Lorenzo Torresani,

(参考訳) ユーザビデオと参照ハウツービデオを比較することは、ユーザの進捗に合わせてパーソナライズされたアシストを提供するAR/VR技術にとって重要な要件である。しかし、言語ベースの支援に対する現在のアプローチは、単一のビデオに関する質問に答えることしかできない。本論文では,まず,既存のステップアノテーションと付随するナレーションを活用することで,ハウト100Mからビデオのペアを含む大量の視覚的チューニングデータを自動生成し,さらにビデオ条件付き言語モデルを訓練して,複数の生動画を共同で解析する手法を提案する。本モデルでは,これらの違いの重大さに基づいて,ビデオペアとランキングビデオの差分を同定し,複数のビデオに対して一般的な推論を行うための有望な能力を示す。

Comparing a user video to a reference how-to video is a key requirement for AR/VR technology delivering personalized assistance tailored to the user's progress. However, current approaches for language-based assistance can only answer questions about a single video. We propose an approach that first automatically generates large amounts of visual instruction tuning data involving pairs of videos from HowTo100M by leveraging existing step annotations and accompanying narrations, and then trains a video-conditioned language model to jointly reason across multiple raw videos. Our model achieves state-of-the-art performance at identifying differences between video pairs and ranking videos based on the severity of these differences, and shows promising ability to perform general reasoning over multiple videos.

翻訳日:2024-04-26 15:17:42 公開日:2024-04-24

# NTIRE 2024 チャレンジサーベイ

Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey ( http://arxiv.org/abs/2404.16223v1 )

ライセンス: Link先を確認

Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhijing Sun, Jiaying Zhu, Liangyan Li, Ke Chen, Yunzhe Li, Yimo Ning, Guanhua Zhao, Jun Chen, Jinyang Yu, Kele Xu, Qisheng Xu, Yong Dou,

(参考訳) 本報告では,NTIRE 2024 RAW Image Super-Resolution Challengeについて概説し,提案手法と結果について述べる。 RAWスーパーリゾリューションのための新しい手法は、現代の画像信号処理(ISP)パイプラインでは必須であるが、RGBドメインのようには研究されていない。この課題の目標は、ノイズやぼやけなどの未知の劣化を考慮して、RAWベイア画像を2倍にスケールアップすることである。この挑戦では、合計230人の参加者が登録され、45人が挑戦期間に結果を提出した。 RAW Image Super-Resolutionの現在の最先端の指標として、トップ5のサブミッションのパフォーマンスをレビューし、ここで提供する。

This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines; however, this problem is not as explored as in the RGB domain. The goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as noise and blur. In the challenge, a total of 230 participants registered, and 45 submitted results during the challenge period. The performance of the top-5 submissions is reviewed and provided here as a gauge for the current state-of-the-art in RAW Image Super-Resolution.

翻訳日:2024-04-26 15:17:42 公開日:2024-04-24

# 痛みの言語に関する計算学的分析--系統的考察

Computational analysis of the language of pain: a systematic review ( http://arxiv.org/abs/2404.16226v1 )

ライセンス: Link先を確認

Diogo A. P. Nunes, Joana Ferreira-Gomes, Fani Neto, David Martins de Matos,

(参考訳) 目的: 本研究の目的は, 患者や医師が生み出す痛みの言葉の計算処理に関する文献を体系的にレビューし, 現状と課題を明らかにすることである。方法: PRISMAガイドラインに従って, 痛みの言語処理に関する関連研究を選択し, あらかじめ定義された研究課題に答えるために, 総合的な文献検索を行った。データ抽出と合成を行い, 主目的と結果, 患者と痛みの集団, テキストデータ, 計算手法, 結果目標に応じて, 選択された研究を分類した。結果: 医師が生成した痛みの言語, 特に臨床記録から得られたものは, 最もよく用いられるデータであった。課題は、患者の診断とトリアージ、痛みの言及の識別、治療反応の予測、バイオメディカルな実体抽出、言語的特徴と臨床状態の相関、痛みの物語の語彙的分析である。 1つの研究は、実験装置における痛みの発話に関する以前の言語知識を含んでいた。ほとんどの研究は、臨床ツールとして、または間接的な知識として、医師の成果を目標にしていた。最も標的にされていない治療段階は、患者が最も関与する自己管理である。最も研究されていない痛みの次元は、感情的で社会文化的であった。提案アルゴリズムを取り入れた2つの研究で、臨床における医師の成績が改善した。考察: 今後の研究は、患者が生み出す痛みの言語分析、患者中心の自己管理とエンパワーメントのためのリソース開発、痛みの感情的・社会的側面の探索、そして、提案ツールによる支援による医師のパフォーマンス改善の測定に焦点をあてるべきである。

Objectives: This study aims to systematically review the literature on the computational processing of the language of pain, whether generated by patients or physicians, identifying current trends and challenges. Methods: Following the PRISMA guidelines, a comprehensive literature search was conducted to select relevant studies on the computational processing of the language of pain and answer pre-defined research questions. Data extraction and synthesis were performed to categorize selected studies according to their primary purpose and outcome, patient and pain population, textual data, computational methodology, and outcome targets. Results: Physician-generated language of pain, specifically from clinical notes, was the most used data. Tasks included patient diagnosis and triaging, identification of pain mentions, treatment response prediction, biomedical entity extraction, correlation of linguistic features with clinical states, and lexico-semantic analysis of pain narratives. Only one study included previous linguistic knowledge on pain utterances in their experimental setup. Most studies targeted their outcomes for physicians, either directly as clinical tools or as indirect knowledge. The least targeted stage of clinical pain care was self-management, in which patients are most involved. The least studied dimensions of pain were affective and sociocultural. Only two studies measured how physician performance on clinical tasks improved with the inclusion of the proposed algorithm. Discussion: This study found that future research should focus on analyzing patient-generated language of pain, developing patient-centered resources for self-management and patient-empowerment, exploring affective and sociocultural aspects of pain, and measuring improvements in physician performance when aided by the proposed tools.

翻訳日:2024-04-26 15:17:42 公開日:2024-04-24

# 共分散行列ダイナミクスのクロトフ制御による光学系における最適絡み合い生成

Optimal entanglement generation in optomechanical systems via Krotov control of covariance matrix dynamics ( http://arxiv.org/abs/2404.16227v1 )

ライセンス: Link先を確認

Peng-Ju Chen, Da-Wei Luo, Ting Yu,

(参考訳) 本研究では,Fockベースカットオフを使わずに,オプティメカルシステムにおける絡み合い生成に着目し,連続変数系の最適制御について検討した。 Krotovアルゴリズムを用いて共分散行列のダイナミクスを最適化し、制御対象関数を設計し、システムのダイナミクスを操作して望ましい目標状態を生成する方法を示した。本研究では, 外部レーザ場の変形制御を施すことにより, マクロメカニカルミラーと量子光学キャビティとの絡み合いを確実に生成できることを実証した。この制御は、低周波成分に制限するために外部磁場にスペクトル制約を課す際にも達成される可能性がある。さらに、非マルコフ開系力学に対する量子制御の影響を体系的に研究する。我々は, 環境騒音の有害な影響を緩和する上で, 記憶効果が有効であることを示した。特に、この絡み合いは、これらの記憶効果の存在下での崩壊を減少させる。

We investigated the optimal control of a continuous variable system, focusing on entanglement generation in an optomechanical system without utilizing Fock basis cutoffs. Using the Krotov algorithm to optimize the dynamics of the covariance matrix, we illustrated how to design a control objective function to manipulate the dynamics of the system to generate a desirable target state. We showed that entanglement between the macroscopic mechanical mirror and the quantum optical cavity can be reliably generated by imposing the control on the detuning of the external laser field. We also showed that the control may still be achieved when imposing spectral constraints on the external field to restrict it to low-frequency components. In addition, we systematically studied the effects of quantum control on non-Markovian open system dynamics. We observed that memory effects can play a beneficial role in mitigating the detrimental impact of environmental noise. Specifically, the entanglement generated shows reduced decay in the presence of these memory effects.

翻訳日:2024-04-26 15:17:42 公開日:2024-04-24

# InAsP量子ドットナノワイヤにおける合成スピンワン鎖におけるマクロシングルトリップレット量子ビットを持つ2つの量子ビットゲート

Two qubit gate with macroscopic singlet-triplet qubits in synthetic spin-one chains in InAsP quantum dot nanowires ( http://arxiv.org/abs/2404.16229v1 )

ライセンス: Link先を確認

Hassan Allami, Daniel Miravet, Marek Korkusinski, Pawel Hawrylak,

(参考訳) InAsP量子ドットナノワイヤにおける合成スピン-ワン鎖における2量子ゲートの理論について述べる。マクロトポロジカルに保護された一重項四重項量子ビットは、2つのスピン半分のハルデネ準粒子で構築される。ハルダン準粒子は、InAsP量子ドットの鎖で実現された合成スピン1鎖によってホストされ、それぞれ4つの電子を持つInPナノワイヤに埋め込まれている。量子ドットナノワイヤは相互作用する原子モデルから派生したハバード・カナモリ・ハミルトン(HK)によって記述される。正確な対角化とマトリックス生成状態(MPS)ツールを用いて、HKハミルトニアンの低エネルギー挙動が反強磁性スピン1鎖ハミルトニアンによって効果的に捕捉されることを示した。次に、2つのマクロ量子ビットについて考察し、その2つの鎖の間に中間制御点を挿入することにより、2つのマクロ量子ビット間の可変結合を生成する方法を提案する。最後に、高い精度の2ST量子ビットゲートを生成するための2つの方法を提案する。(1)各量子ビットの長さを制御し、(2)異なる背景磁場を2つの量子ビットに利用することによって、。

We present a theory of a two qubit gate with macroscopic singlet-triplet (ST) qubits in synthetic spin-one chains in InAsP quantum dot nanowires. The macroscopic topologically protected singlet-triplet qubits are built with two spin-half Haldane quasiparticles. The Haldane quasiparticles are hosted by a synthetic spin-one chain realized in chains of InAsP quantum dots embedded in an InP nanowire, with four electrons each. The quantum dot nanowire is described by a Hubbard-Kanamori (HK) Hamiltonian derived from an interacting atomistic model. Using exact diagonalization and Matrix Product States (MPS) tools, we demonstrate that the low-energy behavior of the HK Hamiltonian is effectively captured by an antiferromagnetic spin-one chain Hamiltonian. Next we consider two macroscopic qubits and present a method for creating a tunable coupling between the two macroscopic qubits by inserting an intermediate control dot between the two chains. Finally, we propose and demonstrate two approaches for generating highly accurate two-ST qubit gates: (1) by controlling the length of each qubit, and (2) by employing different background magnetic fields for the two qubits.

翻訳日:2024-04-26 15:17:42 公開日:2024-04-24

# SECO: マルチサーバ階層間のモデル分割によるセキュア推論

SECO: Secure Inference With Model Splitting Across Multi-Server Hierarchy ( http://arxiv.org/abs/2404.16232v1 )

ライセンス: Link先を確認

Shuangyi Chen, Ashish Khisti,

(参考訳) 予測・アズ・ア・サービスという文脈では、データとモデルのプライバシに関する懸念が提起され、セキュアな推論プロトコルによって取り組まれている。これらのプロトコルは、さまざまなセキュリティ前提の下で設計された単一または複数の暗号化ツールを使用して構築される。本稿では,ユーザが入力データベクトルと複数のサーバノードを分割ニューラルネットワークモデルで配置して,データのプライバシを損なうことなく,予測を協調的に計算することのできるセキュアな推論プロトコルSECOを紹介する。我々は、ニューラルネットワークモデル全体を単一のサーバノード上に配置する必要のあるセキュアな推論に関する以前の作業を拡張し、マルチサーバ階層に拡張し、ユーザがゲートウェイサーバノードに通信し、リモートサーバノードに通信する。推論タスクはサーバノードに分割され、データベクトルの暗号化されたコピーで実行されなければならない。我々は,マルチパーティの同型暗号とマルチパーティのガーブロード回路方式を採用し,ユーザから部分モデル構造を保護するとともに,半正直なサーバの不正な大多数に対してシステムを保護する。我々は,複数のモデル上でSECOを評価し,ユーザの計算コストと通信コストの低減を実現し,限られたリソースを持つユーザのデバイスに適用可能なプロトコルを提案する。

In the context of prediction-as-a-service, concerns about the privacy of the data and the model have been raised and tackled via secure inference protocols. These protocols are built using single or multiple cryptographic tools designed under a variety of security assumptions. In this paper, we introduce SECO, a secure inference protocol that enables a user holding an input data vector and multiple server nodes deployed with a split neural network model to collaboratively compute the prediction, without compromising either party's data privacy. We extend prior work on secure inference, which requires the entire neural network model to be located on a single server node, to a multi-server hierarchy, where the user communicates with a gateway server node, which in turn communicates with remote server nodes. The inference task is split across the server nodes and must be performed over an encrypted copy of the data vector. We adopt multiparty homomorphic encryption and multiparty garbled circuit schemes, making the system secure against a dishonest majority of semi-honest servers as well as protecting the partial model structure from the user. We evaluate SECO on multiple models, achieving a reduction in computation and communication cost for the user and making the protocol applicable to user devices with limited resources.

翻訳日:2024-04-26 15:17:42 公開日:2024-04-24

# AutoGluon-Multimodal (AutoMM): ファンデーションモデルによるマルチモーダルオートMLのスーパーチャージ

AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models ( http://arxiv.org/abs/2404.16233v1 )

ライセンス: Link先を確認

Zhiqiang Tang, Haoyang Fang, Su Zhou, Taojiannan Yang, Zihan Zhong, Tony Hu, Katrin Kirchhoff, George Karypis,

(参考訳) AutoGluon-Multimodal(AutoMM)は、マルチモーダル学習に特化したオープンソースのAutoMLライブラリとして導入された。非常に使いやすく、AutoMMは3行のコードで基礎モデルの微調整を可能にする。画像、テキスト、および表データを含む様々なモダリティをサポートするため、ライブラリは、分類、回帰、オブジェクト検出、セマンティックマッチング、イメージセグメンテーションにまたがる、包括的な機能スイートを提供する。さまざまなデータセットやタスクにわたる実験では、既存のAutoMLツールと比較して、基本的な分類や回帰タスクにおけるAutoMMの優れたパフォーマンスを示すと同時に、高度なタスクにおける競合結果を示し、そのような目的のために設計された特殊なツールボックスと整合する。

AutoGluon-Multimodal (AutoMM) is introduced as an open-source AutoML library designed specifically for multimodal learning. Distinguished by its exceptional ease of use, AutoMM enables fine-tuning of foundational models with just three lines of code. Supporting various modalities including image, text, and tabular data, both independently and in combination, the library offers a comprehensive suite of functionalities spanning classification, regression, object detection, semantic matching, and image segmentation. Experiments across diverse datasets and tasks showcase AutoMM's superior performance in basic classification and regression tasks compared to existing AutoML tools, while also demonstrating competitive results in advanced tasks, aligning with specialized toolboxes designed for such purposes.

翻訳日:2024-04-26 15:17:42 公開日:2024-04-24

# 軌道関数の厳密な形式化:非相互作用的$v$-表現可能性問題に対処する

Rigorous Formalization of Orbital Functionals: Addressing the Noninteracting $v$-Representability Problem ( http://arxiv.org/abs/2404.16236v1 )

ライセンス: Link先を確認

Neil Qiang Su,

(参考訳) 占有軌道、占有軌道、または占有軌道に明示的に依存する函数はクリフォード代数を用いて厳密に定式化され、公式な実装法として軌道(および占有軌道)最適化を促進する変分原理が確立される。理論的には、これらの手法は、元のコーン・シャムと関連する方法、特に相互作用する系の電子密度が相互作用しない参照系と一致しない場合の限界を回避している。この研究は、軌道汎函数(および占有)を新しい視点から再定義し、従来の密度汎函数の拡張としてだけでなく、優越的で厳密な代替として位置づける。

Functionals that explicitly depend on occupied, unoccupied, or fractionally-occupied orbitals are rigorously formalized using Clifford algebras, and a variational principle is established that facilitates orbital (and occupation) optimization as a formal implementation method. Theoretically, these methodologies circumvent the limitations encountered in the original Kohn-Sham and related methods, particularly when the interacting system's electron density does not match that of any noninteracting reference system. This work redefines orbital (and occupation) functionals from a novel perspective, positioning them not merely as extensions of traditional density functionals, but as superior, rigorous alternatives.

翻訳日:2024-04-26 15:17:42 公開日:2024-04-24

# 高度な情報理論によるデータ分析におけるプライバシとユーティリティの相乗効果

Synergizing Privacy and Utility in Data Analytics Through Advanced Information Theorization ( http://arxiv.org/abs/2404.16241v1 )

ライセンス: Link先を確認

Zahir Alsulaimawi,

(参考訳) 本研究では、プライバシ保護データ分析のための新しいフレームワークを開発し、データユーティリティとプライバシに関するバランスをとるという重要な課題に対処する。本稿では,高次元画像データに適したノイズ注入技術,高感度属性をマスキングしながら特徴抽出を行う可変オートエンコーダ(VAE),構造化データプライバシに最適化された期待最大化(EM)アプローチの3つの高度なアルゴリズムを紹介する。修正MNISTやCelebrityAなどのデータセットに適用することにより、機密属性と変換データ間の相互情報を著しく低減し、プライバシーを向上する。実験の結果,これらの手法が優れたプライバシ保護を実現し,高いユーティリティを保ち,両面が不可欠である実用的なアプリケーションに有効であることが確認された。この研究は、さまざまなデータタイプにまたがってプライバシ保護アルゴリズムをデプロイするためのフレキシブルで効果的な戦略を提供し、データ分析における実用性と機密性のための新しいベンチマークを確立することで、この分野に貢献する。

This study develops a novel framework for privacy-preserving data analytics, addressing the critical challenge of balancing data utility with privacy concerns. We introduce three sophisticated algorithms: a Noise-Infusion Technique tailored for high-dimensional image data, a Variational Autoencoder (VAE) for robust feature extraction while masking sensitive attributes, and an Expectation Maximization (EM) approach optimized for structured data privacy. Applied to datasets such as Modified MNIST and CelebrityA, our methods significantly reduce mutual information between sensitive attributes and transformed data, thereby enhancing privacy. Our experimental results confirm that these approaches achieve superior privacy protection and retain high utility, making them viable for practical applications where both aspects are crucial. The research contributes to the field by providing a flexible and effective strategy for deploying privacy-preserving algorithms across various data types and establishing new benchmarks for utility and confidentiality in data analytics.
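
The privacy criterion named above, mutual information between a sensitive attribute and the transformed data, can be illustrated with a small empirical estimator for discrete variables. This is only a sketch under stated assumptions: the abstract does not describe how the authors estimate mutual information, and the function name and the toy sequences below are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information, in bits, between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with raw counts to limit rounding.
        ratio = count * n / (count_x[x] * count_y[y])
        mi += p_xy * math.log2(ratio)
    return mi


if __name__ == "__main__":
    sensitive = [0, 1, 0, 1, 0, 1, 0, 1]
    released_raw = list(sensitive)             # leaks the attribute entirely
    released_masked = [0, 0, 1, 1, 0, 0, 1, 1]  # statistically independent of it
    print(mutual_information(sensitive, released_raw))
    print(mutual_information(sensitive, released_masked))
```

A transformation that lowers this quantity while keeping task accuracy high is the kind of privacy-utility trade-off the abstract refers to; for continuous or high-dimensional data a different estimator would be needed.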

翻訳日:2024-04-26 15:17:42 公開日:2024-04-24

# muRelBench: Zonotopeドメイン用のマイクロベンチマーク

muRelBench: MicroBenchmarks for Zonotope Domains ( http://arxiv.org/abs/2404.16243v1 )

ライセンス: Link先を確認

Kenny Ballou, Elena Sherman,

(参考訳) 我々は、弱い関係の抽象ドメインとその操作のための合成ベンチマークスイートである \texttt{muRelBench} を提示する。例えば、ベンチマークはドメイン閉鎖のような提案されたアルゴリズムの実験的な評価をサポートすることができる。

We present \texttt{muRelBench}, a suite of synthetic benchmarks for weakly-relational abstract domains and their operations. For example, the benchmarks can support experimental evaluations of proposed algorithms such as domain closure.
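
To make "domain closure" concrete, here is a minimal sketch of the classic shortest-path closure for one weakly-relational domain, zones represented as difference-bound matrices (DBMs). It is not taken from muRelBench; the matrix encoding and the function name are assumptions for illustration, and the benchmark suite may target other domains or closure algorithms.

```python
import math

INF = math.inf


def close_dbm(dbm):
    """Shortest-path (Floyd-Warshall) closure of a difference-bound matrix.

    dbm[i][j] = c encodes the constraint x_i - x_j <= c (INF means no bound).
    Returns the tightened matrix, or None if the constraints are unsatisfiable.
    """
    n = len(dbm)
    m = [row[:] for row in dbm]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                through_k = m[i][k] + m[k][j]
                if through_k < m[i][j]:
                    m[i][j] = through_k
    # A negative diagonal entry means a negative cycle, i.e. an empty zone.
    if any(m[i][i] < 0 for i in range(n)):
        return None
    return m


if __name__ == "__main__":
    # x0 - x1 <= 3 and x1 - x2 <= 2 imply x0 - x2 <= 5.
    zone = [
        [0, 3, INF],
        [INF, 0, 2],
        [INF, INF, 0],
    ]
    print(close_dbm(zone))
```

The cubic cost of this closure is the kind of operation for which a micro-benchmark suite is useful when comparing alternative algorithms.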

翻訳日:2024-04-26 15:17:42 公開日:2024-04-24

# 高度なAIアシスタントの倫理

The Ethics of Advanced AI Assistants ( http://arxiv.org/abs/2404.16244v1 )

ライセンス: Link先を確認

Iason Gabriel, Arianna Manzini, Geoff Keeling, Lisa Anne Hendricks, Verena Rieser, Hasan Iqbal, Nenad Tomašev, Ira Ktena, Zachary Kenton, Mikel Rodriguez, Seliem El-Sayed, Sasha Brown, Canfer Akbulut, Andrew Trask, Edward Hughes, A. Stevie Bergman, Renee Shelby, Nahema Marchal, Conor Griffin, Juan Mateos-Garcia, Laura Weidinger, Winnie Street, Benjamin Lange, Alex Ingerman, Alison Lentz, Reed Enger, Andrew Barakat, Victoria Krakovna, John Oliver Siy, Zeb Kurth-Nelson, Amanda McCroskery, Vijay Bolina, Harry Law, Murray Shanahan, Lize Alberts, Borja Balle, Sarah de Haas, Yetunde Ibitoye, Allan Dafoe, Beth Goldberg, Sébastien Krier, Alexander Reese, Sims Witherspoon, Will Hawkins, Maribeth Rauh, Don Wallace, Matija Franklin, Josh A. Goldstein, Joel Lehman, Michael Klenk, Shannon Vallor, Courtney Biles, Meredith Ringel Morris, Helen King, Blaise Agüera y Arcas, William Isaac, James Manyika,

(参考訳) 本稿では,高度AIアシスタントがもたらす倫理的・社会的リスクについて論じる。我々は、先進的なAIアシスタントを自然言語インタフェースを備えた人工知能エージェントとして定義し、ユーザに代わって、1つ以上のドメインにわたって、ユーザの期待に応えてアクションのシーケンスを計画および実行することが機能する。この論文は、AIアシスタント、その技術基盤、潜在的な応用範囲の概要を提供する、技術自体を考えることから始まる。そして、AIの価値アライメント、幸福、安全、悪意のある使用に関する質問を探索する。次に、高度なAIアシスタントと個人ユーザとの関係をさらに詳細に検討し、操作や説得、人為性、適切な関係、信頼、プライバシといったトピックを探求する。この分析によって、高度なアシスタントの社会規模での展開を考慮し、協力、株式とアクセス、誤情報、経済的影響、環境、先進的なAIアシスタントの評価方法に焦点をあてる。最後に、研究者、開発者、政策立案者、および公共ステークホルダーに対して、さまざまなレコメンデーションを提供することで締めくくります。

This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, providing an overview of AI assistants, their technical foundations and potential range of applications. It then explores questions around AI value alignment, well-being, safety and malicious uses. Extending the circle of inquiry further, we next consider the relationship between advanced AI assistants and individual users in more detail, exploring topics such as manipulation and persuasion, anthropomorphism, appropriate relationships, trust and privacy. With this analysis in place, we consider the deployment of advanced assistants at a societal scale, focusing on cooperation, equity and access, misinformation, economic impact, the environment and how best to evaluate advanced AI assistants. Finally, we conclude by providing a range of recommendations for researchers, developers, policymakers and public stakeholders.

翻訳日:2024-04-26 15:17:42 公開日:2024-04-24

# 角運動量量子エンタングルを用いた固体高調波ガウス軌道の計算効率の良い分子積分

Computationally Efficient Molecular Integrals of Solid Harmonic Gaussian Orbitals Using Quantum Entanglement of Angular Momentum ( http://arxiv.org/abs/2404.16245v1 )

ライセンス: Link先を確認

Hang Hu, Gilles Peslherbe, Hsu Kiang Ooi, Anguang Hu,

(参考訳) 角運動量の量子論におけるベクトルカップリングとベクトルアンカップリングスキームは、量子角運動量状態で作用するユニタリなクレブシュ・ゴルダン変換に対応し、それによってそれらの絡み合いの度合いを制御する。この変換から量子角運動量を加えることは、量子角運動量の絡み合いの度合いを減らし、固体調和ガウス軌道(SHGO)の分子積分の単純かつ効果的な計算に繋がる。古典的コンピュータでさえ、SHGOと分子核クーロン積分の評価におけるスピードアップ比は、高い角運動量量子数を持つ原子軌道に対して最大4桁までである。したがって、量子系の絡み合いが小さくなればなるほど、シミュレーションが容易になり、SHGOの分子積分は量子コンピューティングに特に適していることが示される。角運動量状態のClebsch-Gordan変換のユニタリおよびカスケードのために以前に開発された高効率量子回路は、量子化学においてユビキタスな2電子クーロン積分を効率的に計算するために固体調和系の微分および積規則に適用することができる。このような量子回路と変分量子固有解法アルゴリズムを組み合わせることで、この論文で明らかになった固体調和基底における分子積分の高い計算効率は、完全な量子コンピューティング化学を加速するための道を開くかもしれない。

Vector-coupling and vector-uncoupling schemes in the quantum theory of angular momentum correspond to unitary Clebsch-Gordan transformations that operate on quantum angular momentum states and thereby control their degree of entanglement. The addition of quantum angular momentum from this transformation is suitable for reducing the degree of entanglement of quantum angular momentum, leading to simple and effective calculations of the molecular integrals of solid harmonic Gaussian orbitals (SHGO). Even with classical computers, the speed-up ratio in the evaluation of molecular nuclear Coulomb integrals with SHGOs can be up to four orders of magnitude for atomic orbitals with high angular momentum quantum number. Thus, the less entanglement there is for a quantum system the easier it is to simulate, and molecular integrals with SHGOs are shown to be particularly well-suited for quantum computing. High-efficiency quantum circuits previously developed for unitary and cascading Clebsch-Gordan transformations of angular momentum states can be applied to the differential and product rules of solid harmonics to efficiently compute two-electron Coulomb integrals ubiquitous in quantum chemistry. Combined with such quantum circuits and variational quantum eigensolver algorithms, the high computational efficiency of molecular integrals in solid harmonic bases unveiled in this paper may open an avenue for accelerating full quantum computing chemistry.

翻訳日:2024-04-26 15:17:42 公開日:2024-04-24

# URL:タスク指示表現圧縮によるユニバーサル参照知識リンク

URL: Universal Referential Knowledge Linking via Task-instructed Representation Compression ( http://arxiv.org/abs/2404.16248v1 )

ライセンス: Link先を確認

Zhuoqun Li, Hongyu Lin, Tianshu Wang, Boxi Cao, Yaojie Lu, Weixiang Zhou, Hao Wang, Zhenyu Zeng, Le Sun, Xianpei Han,

(参考訳) 根拠付き参照にクレームをリンクすることは、信頼できる情報に対する人間の要求を満たす重要な能力である。現在の研究は情報検索やセマンティックマッチングのような特定のタスクに限定されており、クレーム-参照関係はユニークで固定的であり、実世界の参照知識リンク(RKL)はより多様で複雑である。本稿では,1つの統一モデルにより多種多様な参照知識リンクタスクを解決することを目的とした,ユニバーサル参照知識リンク(URL)を提案する。そこで本研究では,LLMの命令従順と意味理解能力を参照知識リンクに効果的に適応させるため,LLMによるタスク命令型表現圧縮と多視点学習手法を提案する。さらに,様々なシナリオにまたがる参照知識リンクタスクにおけるモデルの有効性を評価するための新しいベンチマークを構築した。実験により、既存の手法では普遍的なRKLが困難であることが示され、提案したフレームワークは様々なシナリオでタスクを効果的に解決し、従って従来の手法よりも大きなマージンで優れていることが示された。

Linking a claim to grounded references is a critical ability to fulfill human demands for authentic and reliable information. Current studies are limited to specific tasks like information retrieval or semantic matching, where the claim-reference relationships are unique and fixed, while referential knowledge linking (RKL) in the real world can be much more diverse and complex. In this paper, we propose universal referential knowledge linking (URL), which aims to resolve diversified referential knowledge linking tasks with one unified model. To this end, we propose an LLM-driven task-instructed representation compression, as well as a multi-view learning approach, in order to effectively adapt the instruction-following and semantic understanding abilities of LLMs to referential knowledge linking. Furthermore, we also construct a new benchmark to evaluate the ability of models on referential knowledge linking tasks across different scenarios. Experiments demonstrate that universal RKL is challenging for existing approaches, while the proposed framework can effectively resolve the task across various scenarios, and therefore outperforms previous approaches by a large margin.

翻訳日:2024-04-26 15:07:57 公開日:2024-04-24

# SemgrexとSsurgeon, 依存グラフの検索と操作

Semgrex and Ssurgeon, Searching and Manipulating Dependency Graphs ( http://arxiv.org/abs/2404.16250v1 )

ライセンス: Link先を確認

John Bauer, Chloe Kiddon, Eric Yeh, Alex Shan, Christopher D. Manning,

(参考訳) 依存性グラフを検索してそれらを操作するのは、正しく行うのに時間がかかり、難しい作業になる可能性がある。本稿では,依存グラフを検索するシステムであるSemgrexを文書化し,Semgrexの出力を操作するシステムであるSsurgeonを紹介する。これらのシステムで使用されるコンパクトな言語は、依存性のコマンドラインやAPI処理を容易にする。さらに、JavaとPythonで公開されたツールキットとの統合により、テキストの関係や属性を自然のテキストで検索できる。

Searching dependency graphs and manipulating them can be a time-consuming and challenging task to get right. We document Semgrex, a system for searching dependency graphs, and introduce Ssurgeon, a system for manipulating the output of Semgrex. The compact language used by these systems allows for easy command line or API processing of dependencies. Additionally, integration with publicly released toolkits in Java and Python allows for searching text relations and attributes over natural text.
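
The idea of searching a dependency graph for a relation between nodes with given attributes can be sketched in a few lines of plain Python. This toy does not use Semgrex, its pattern language, or any toolkit API; the token and edge representation and the `find_matches` helper are hypothetical and exist only to show the kind of query such a system answers.

```python
import re


def find_matches(tokens, edges, governor_pos_regex, relation):
    """Return (governor word, dependent word) pairs for edges with the given
    relation whose governor part-of-speech tag fully matches the regex."""
    pattern = re.compile(governor_pos_regex)
    matches = []
    for governor, rel, dependent in edges:
        if rel == relation and pattern.fullmatch(tokens[governor]["pos"]):
            matches.append((tokens[governor]["word"], tokens[dependent]["word"]))
    return matches


if __name__ == "__main__":
    # "Dogs chase cats": each edge is (governor index, relation, dependent index).
    tokens = [
        {"word": "Dogs", "pos": "NNS"},
        {"word": "chase", "pos": "VBP"},
        {"word": "cats", "pos": "NNS"},
    ]
    edges = [(1, "nsubj", 0), (1, "obj", 2)]
    # Find verbs together with their nominal subjects.
    print(find_matches(tokens, edges, r"VB.*", "nsubj"))
```

A dedicated query language replaces hand-written loops like this one with a compact pattern, which is what makes command-line and API use practical.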

翻訳日:2024-04-26 15:07:57 公開日:2024-04-24

# マルチターンLDM相互作用における急速漏洩効果とブラックボックス防御の検討

Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions ( http://arxiv.org/abs/2404.16251v1 )

ライセンス: Link先を確認

Divyansh Agarwal, Alexander R. Fabbri, Philippe Laban, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu,

(参考訳) 大規模言語モデル(LLM)のプロンプトリークは、特に検索強化世代(RAG)システムにおいて、重大なセキュリティとプライバシの脅威を引き起こす。しかし, マルチターンLDM相互作用と緩和戦略のリークは, 標準化された方法では研究されていない。本稿では,4つの異なるドメインと10のクローズドおよびオープンソース LLM にまたがる急激なリークに対するLSM 脆弱性について検討する。我々のユニークなマルチターン脅威モデルでは, LLMのサイコファンシー効果を活用し, LLM応答におけるタスク命令と知識リークを識別する。マルチターン環境では,GPT-4およびclaude-1.3による99%のリークを含む平均攻撃成功率(ASR)が86.2%に上昇する。 GeminiのようなブラックボックスのLCMの中には、ドメイン間のリークに対する様々な感受性を示すものもあります - 医療ドメインと比較して、ニュースドメインのコンテキスト知識をリークする傾向があります。実験では,RAGシナリオにおけるクエリリライタを含む6つのブラックボックス防衛戦略の具体的な効果を測定した。提案する多層防御の組み合わせは, ブラックボックスLLMのASRは5.3%であり, LLMセキュリティ研究の強化と今後の方向性を示す余地がある。

Prompt leakage in large language models (LLMs) poses a significant security and privacy threat, particularly in retrieval-augmented generation (RAG) systems. However, leakage in multi-turn LLM interactions along with mitigation strategies has not been studied in a standardized manner. This paper investigates LLM vulnerabilities against prompt leakage across 4 diverse domains and 10 closed- and open-source LLMs. Our unique multi-turn threat model leverages the LLM's sycophancy effect and our analysis dissects task instruction and knowledge leakage in the LLM response. In a multi-turn setting, our threat model elevates the average attack success rate (ASR) to 86.2%, including a 99% leakage with GPT-4 and claude-1.3. We find that some black-box LLMs like Gemini show variable susceptibility to leakage across domains - they are more likely to leak contextual knowledge in the news domain compared to the medical domain. Our experiments measure specific effects of 6 black-box defense strategies, including a query-rewriter in the RAG scenario. Our proposed multi-tier combination of defenses still has an ASR of 5.3% for black-box LLMs, indicating room for enhancement and future direction for LLM security research.

翻訳日:2024-04-26 15:07:57 公開日:2024-04-24

# 完全同型暗号化による顔分析のプライバシー向上

Enhancing Privacy in Face Analytics Using Fully Homomorphic Encryption ( http://arxiv.org/abs/2404.16255v1 )

ライセンス: Link先を確認

Bharat Yalavarthi, Arjun Ramesh Kaushik, Arun Ross, Vishnu Boddeti, Nalini Ratha,

(参考訳) 現代の顔認識システムは、ディープニューラルネットワークを使用して顔から有能な特徴を抽出する。これらの特徴は潜在空間への埋め込みを意味し、しばしば顔認識システム内のテンプレートとして保存される。これらの埋め込みはデータ漏洩の影響を受けやすく、場合によっては元の顔画像の再構築にも利用できる。妥協のアイデンティティを防止するため、テンプレート保護スキームが一般的である。しかし、これらのスキームは、年齢、性別、人種などの柔らかい生体情報漏洩を未然に防ぐことはできない。この問題を軽減するために,FHE(Fully Homomorphic Encryption)と,PolyProtectと呼ばれる既存のテンプレート保護スキームを組み合わせた新しい手法を提案する。埋め込みはFHEを用いて圧縮・暗号化され、多項式変換を用いてセキュアなPolyProtectテンプレートに変換され、さらなる保護が可能であることを示す。提案手法の有効性を,複数のデータセットに対する広範な実験により実証する。提案手法は, 認識精度を損なうことなく, ソフトな生体認証属性の漏洩を効果的に防止する。

Modern face recognition systems utilize deep neural networks to extract salient features from a face. These features denote embeddings in latent space and are often stored as templates in a face recognition system. These embeddings are susceptible to data leakage and, in some cases, can even be used to reconstruct the original face image. To prevent compromising identities, template protection schemes are commonly employed. However, these schemes may still not prevent the leakage of soft biometric information such as age, gender and race. To alleviate this issue, we propose a novel technique that combines Fully Homomorphic Encryption (FHE) with an existing template protection scheme known as PolyProtect. We show that the embeddings can be compressed and encrypted using FHE and transformed into a secure PolyProtect template using polynomial transformation, for additional protection. We demonstrate the efficacy of the proposed approach through extensive experiments on multiple datasets. Our proposed approach ensures irreversibility and unlinkability, effectively preventing the leakage of soft biometric attributes from face embeddings without compromising recognition accuracy.

翻訳日:2024-04-26 15:07:57 公開日:2024-04-24

# 低コストかつスケーラブルなローハンマ除去のための確率的トラッカー管理法

Probabilistic Tracker Management Policies for Low-Cost and Scalable Rowhammer Mitigation ( http://arxiv.org/abs/2404.16256v1 )

ライセンス: Link先を確認

Aamer Jaleel, Stephen W. Keckler, Gururaj Saileshwar,

(参考訳) 本稿ではDRAM Rowhammer攻撃の軽減に焦点を当てる。近年、TRRのようなソリューションがDDR4 DRAMにデプロイされ、攻撃者行を追跡し、近隣の犠牲者行をリフレッシュすることで緩和作用が発行されている。残念ながら、そのようなDRAM内ソリューションはリソース制約(攻撃行を追跡するために数十のカウンタしかプロビジョニングできない)であり、それらを騙すのに使われた攻撃をスラッシングする傾向がある。 DRAMトラッカーの安全な代替品は数万のカウンタを必要とする。本研究は,資源制約トラッカーを用いた安全でスケーラブルなローハマー緩和を実証する。私たちのキーとなる考え方は、確率的管理ポリシー(PROTEAS)でこのようなトラッカーを管理することです。 PROTEASには、リクエストストリームサンプリングやランダムな消去のようなコンポーネントポリシーが含まれており、リソース制約されたトラッカーのスラッシュ耐性を可能にする。 Rowhammerのしきい値が500に下がったとしても、ProteASは小さなDRAMトラッカー(DRAMバンクあたり16カウンタ)を確保でき、3%のスローダウンを達成できる。さらに, PROTEAS はSamsung (DSAC) による最近の同様の確率的提案 (DSAC) よりも優れており,Rowhammer に対するレジリエンスは 11X - 19 倍であることを示す。

This paper focuses on mitigating DRAM Rowhammer attacks. In recent years, solutions like TRR have been deployed in DDR4 DRAM to track aggressor rows and then issue a mitigative action by refreshing neighboring victim rows. Unfortunately, such in-DRAM solutions are resource-constrained (only able to provision a few tens of counters to track aggressor rows) and are prone to thrashing-based attacks that have been used to fool them. Secure alternatives for in-DRAM trackers require tens of thousands of counters. In this work, we demonstrate secure and scalable Rowhammer mitigation using resource-constrained trackers. Our key idea is to manage such trackers with probabilistic management policies (PROTEAS). PROTEAS includes component policies like request-stream sampling and random evictions, which enable thrash-resistance for resource-constrained trackers. We show that PROTEAS can secure small in-DRAM trackers (with 16 counters per DRAM bank) even when Rowhammer thresholds drop to 500 while incurring less than 3% slowdown. Moreover, we show that PROTEAS significantly outperforms a recent similar probabilistic proposal from Samsung (called DSAC) while achieving 11X - 19X the resilience against Rowhammer.
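
The two component policies named in the abstract, request-stream sampling and random eviction, can be sketched as a small counter table. This is a toy model under stated assumptions, not the PROTEAS design: the class name, the parameters, the choice to count only sampled activations, and the reset-on-mitigation behavior are all hypothetical.

```python
import random


class ProbabilisticTracker:
    """Toy aggressor-row tracker with sampling and random eviction."""

    def __init__(self, capacity, sample_prob, threshold, seed=0):
        self.capacity = capacity        # number of counters available
        self.sample_prob = sample_prob  # probability an activation is observed
        self.threshold = threshold      # sampled count that triggers mitigation
        self.rng = random.Random(seed)
        self.counters = {}

    def on_activate(self, row):
        """Process one row activation; return the row to mitigate, or None."""
        # Request-stream sampling: ignore most activations.
        if self.rng.random() >= self.sample_prob:
            return None
        if row not in self.counters:
            # Random eviction: a full table drops a uniformly chosen entry,
            # so an attacker cannot predict which row loses its counter.
            if len(self.counters) >= self.capacity:
                victim = self.rng.choice(list(self.counters))
                del self.counters[victim]
            self.counters[row] = 0
        self.counters[row] += 1
        if self.counters[row] >= self.threshold:
            del self.counters[row]  # mitigation issued; stop tracking the row
            return row
        return None


if __name__ == "__main__":
    tracker = ProbabilisticTracker(capacity=16, sample_prob=1.0, threshold=3)
    print([tracker.on_activate(7) for _ in range(3)])
```

With `sample_prob` below 1, the threshold on sampled counts would have to be scaled down relative to the true Rowhammer threshold; how to choose these values safely is the subject of the paper and is not modeled here.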

翻訳日:2024-04-26 15:07:57 公開日:2024-04-24

# すべての支払バウンドタスクに対するスワップレグレットの最小化予測

Predict to Minimize Swap Regret for All Payoff-Bounded Tasks ( http://arxiv.org/abs/2404.13503v2 )

ライセンス: Link先を確認

Lunjia Hu, Yifan Wu,

(参考訳) 一連の予測がキャリブレーションされるのは、下流のすべての決定タスクに対してスワップ後悔を誘発しない場合に限られる。本稿では,バイナリイベントの予測の最大スワップレグレット(MSR)について検討する。これまで、MSRを最小化するための最良のオンライン予測アルゴリズムは、MSRの上限である$K_1$校正誤差を一定要素まで最小化することで得られる。しかし、最近の研究 (Qiao and Valiant, 2021) は、最悪のケースで予想される$K_1$キャリブレーション誤差に対して$\Omega(T^{0.528})$低いバウンドを与える。 MSRのいくつかの緩和はこの障壁を克服すると考えられており、外部の後悔(Kleinberg et al., 2023)と、下流のタスクの作用数(Noarov et al., 2023; Roth and Shi, 2024)に多項式的に依存する後悔の限界を通じてである。我々は、この障壁を緩和することなく克服できることを示し、$O(\sqrt{T}\log T)$ expected MSRを保証する効率的なランダム化予測アルゴリズムを提供する。また、MSRを決定論的キャリブレーション誤差指標とみなし、キャリブレーションの経済的有用性についても検討し、既存の指標との関係について検討する。

A sequence of predictions is calibrated if and only if it induces no swap regret to all downstream decision tasks. We study the Maximum Swap Regret (MSR) of predictions for binary events: the swap regret maximized over all downstream tasks with bounded payoffs. Previously, the best online prediction algorithm for minimizing MSR is obtained by minimizing the $K_1$ calibration error, which upper bounds MSR up to a constant factor. However, recent work (Qiao and Valiant, 2021) gives an $\Omega(T^{0.528})$ lower bound for the worst-case expected $K_1$ calibration error incurred by any randomized algorithm in $T$ rounds, presenting a barrier to achieving better rates for MSR. Several relaxations of MSR have been considered to overcome this barrier, via external regret (Kleinberg et al., 2023) and regret bounds depending polynomially on the number of actions in downstream tasks (Noarov et al., 2023; Roth and Shi, 2024). We show that the barrier can be surpassed without any relaxations: we give an efficient randomized prediction algorithm that guarantees $O(\sqrt{T}\log T)$ expected MSR. We also discuss the economic utility of calibration by viewing MSR as a decision-theoretic calibration error metric and study its relationship to existing metrics.
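
For readers unfamiliar with the $K_1$ calibration error mentioned above, the sketch below computes it in its common unnormalized form: group rounds by predicted value and sum the absolute accumulated bias of each group. The abstract does not restate the definition, so this form and the function name are assumptions; the paper's MSR metric itself is not implemented here.

```python
from collections import defaultdict


def k1_calibration_error(predictions, outcomes):
    """Sum over distinct predicted values p of |sum over rounds with p_t = p of (p - y_t)|."""
    bias = defaultdict(float)
    for p, y in zip(predictions, outcomes):
        bias[p] += p - y
    return sum(abs(b) for b in bias.values())


if __name__ == "__main__":
    # Predicting 0.5 on a sequence that is 1 exactly half the time is calibrated.
    print(k1_calibration_error([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))
    # Predicting 1.0 when the event never happens is maximally miscalibrated.
    print(k1_calibration_error([1.0, 1.0], [0, 0]))
```

Because the sum is unnormalized, a perfectly calibrated forecaster scores 0 and the error can grow with the number of rounds, which is why bounds are stated as functions of $T$.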

翻訳日:2024-04-26 12:31:48 公開日:2024-04-24

# MARVEL:視覚的評価と学習による多次元抽象化と推論

MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning ( http://arxiv.org/abs/2404.13591v2 )

ライセンス: Link先を確認

Yifan Jiang, Jiarui Zhang, Kexuan Sun, Zhivar Sourati, Kian Ahrabian, Kaixin Ma, Filip Ilievski, Jay Pujara,

(参考訳) マルチモーダルな大規模言語モデル(MLLM)は、多くの一般的な視覚推論ベンチマークにおいて大きな進歩を示しているが、それらが抽象的な視覚推論能力を持っているかどうかは未解決のままである。スドゥークパズルと同様に、抽象的視覚推論(AVR)問題は、特定のタスク構成(例えば、行列)において入力形状(例えば、桁)を制御する高レベルパターン(例えば、繰り返し制約)を見つける必要がある。しかし、既存のAVRベンチマークでは、パターンの限られたセット(付加、結合)、入力形状(矩形、正方形)、タスク構成(3×3行列)しか考慮されていない。 MLLMの推論能力を総合的に評価するため、MARVELは6つのコア知識パターン、幾何学的および抽象的形状、および5つの異なるタスク構成からなる770個のパズルからなる多次元AVRベンチマークである。モデル精度が知覚と推論の基盤となっているかどうかを調べるため、MARVELは階層的評価フレームワークにおいて、一般的なAVR質問と知覚質問を補完する。我々は9つの代表MLLMをゼロショットおよび少数ショット設定でMARVEL上で包括的実験を行う。実験の結果、AVR質問では、すべてのモデルがほぼランダムなパフォーマンスを示しており、すべてのパターンやタスク構成にまたがる人間と比較して、大きなパフォーマンスギャップ(40%)があることがわかった。知覚的疑問のさらなる分析により、MLLMは視覚的特徴(ほぼランダムなパフォーマンス)を理解するのに苦労し、パズルのパネル(45%)を数えることさえ困難であり、抽象的推論の能力を妨げていることが明らかになった。コードとデータセット全体をリリースします。

While multi-modal large language models (MLLMs) have shown significant progress on many popular visual reasoning benchmarks, whether they possess abstract visual reasoning abilities remains an open question. Similar to Sudoku puzzles, abstract visual reasoning (AVR) problems require finding high-level patterns (e.g., repetition constraints) that control the input shapes (e.g., digits) in a specific task configuration (e.g., matrix). However, existing AVR benchmarks consider only a limited set of patterns (addition, conjunction), input shapes (rectangle, square), and task configurations (3 by 3 matrices). To evaluate MLLMs' reasoning abilities comprehensively, we introduce MARVEL, a multidimensional AVR benchmark with 770 puzzles composed of six core knowledge patterns, geometric and abstract shapes, and five different task configurations. To inspect whether the model accuracy is grounded in perception and reasoning, MARVEL complements the general AVR question with perception questions in a hierarchical evaluation framework. We conduct comprehensive experiments on MARVEL with nine representative MLLMs in zero-shot and few-shot settings. Our experiments reveal that all models show near-random performance on the AVR question, with significant performance gaps (40%) compared to humans across all patterns and task configurations. Further analysis of perception questions reveals that MLLMs struggle to comprehend the visual features (near-random performance) and even to count the panels in the puzzle (<45%), hindering their ability for abstract reasoning. We release our entire code and dataset.

翻訳日:2024-04-26 12:31:48 公開日:2024-04-24

# Describe-then-Reason: Visual Comprehension Training によるマルチモーダル数学的推論の改善

Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training ( http://arxiv.org/abs/2404.14604v2 )

ライセンス: Link先を確認

Mengzhao Jia, Zhihan Zhang, Wenhao Yu, Fangkai Jiao, Meng Jiang,

(参考訳) オープンソースのマルチモーダル大言語モデル(MLLM)は、テキスト入力や視覚入力を含む様々なタスクに優れていますが、GPT-4V(ision)やGemini-Proといったプロプライエタリなモデルに遅れを取っている複雑なマルチモーダル数学的推論に苦戦しています。中間段階(すなわち理性)による微調整は、いくつかの数学的推論スキルを引き出すが、結果として得られるモデルは、まだ視覚中心の監督が不十分なため、視覚的理解に乏しく、数学の数字の正確な解釈に繋がる。この問題に対処するために,2段階のトレーニングパイプラインVCARを提案する。まず、視覚的記述生成タスクを通じてMLLMの視覚的理解能力を向上し、次に、説明の助けを借りて合理性を生成するための別の訓練ステップを行う。 2つの人気のあるベンチマーク実験の結果、VCARは、特に高い視覚的要求のある問題において、合理的な監督にのみ依存するベースライン手法を大幅に上回っていることが示された。

Open-source multimodal large language models (MLLMs) excel in various tasks involving textual and visual inputs but still struggle with complex multimodal mathematical reasoning, lagging behind proprietary models like GPT-4V(ision) and Gemini-Pro. Although fine-tuning with intermediate steps (i.e., rationales) elicits some mathematical reasoning skills, the resulting models still fall short in visual comprehension due to inadequate visual-centric supervision, which leads to inaccurate interpretation of math figures. To address this issue, we propose a two-step training pipeline VCAR, which emphasizes the Visual Comprehension training in Addition to mathematical Reasoning learning. It first improves the visual comprehension ability of MLLMs through the visual description generation task, followed by another training step on generating rationales with the assistance of descriptions. Experimental results on two popular benchmarks demonstrate that VCAR substantially outperforms baseline methods solely relying on rationale supervision, especially on problems with high visual demands.

翻訳日:2024-04-26 12:31:48 公開日:2024-04-24

# 小児脳腫瘍切除 : CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDsを中心に

The Brain Tumor Segmentation in Pediatrics (BraTS-PEDs) Challenge: Focus on Pediatrics (CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs) ( http://arxiv.org/abs/2404.15009v2 )

ライセンス: Link先を確認

Anahita Fathi Kazerooni, Nastaran Khalili, Deep Gandhi, Xinyang Liu, Zhifan Jiang, Syed Muhammed Anwar, Jake Albrecht, Maruf Adewole, Udunna Anazodo, Hannah Anderson, Sina Bagheri, Ujjwal Baid, Timothy Bergquist, Austin J. Borja, Evan Calabrese, Verena Chung, Gian-Marco Conte, Farouk Dako, James Eddy, Ivan Ezhov, Ariana Familiar, Keyvan Farahani, Anurag Gottipati, Debanjan Haldar, Shuvanjan Haldar, Juan Eugenio Iglesias, Anastasia Janas, Elaine Johansen, Blaise V Jones, Neda Khalili, Florian Kofler, Dominic LaBella, Hollie Anne Lai, Koen Van Leemput, Hongwei Bran Li, Nazanin Maleki, Aaron S McAllister, Zeke Meier, Bjoern Menze, Ahmed W Moawad, Khanak K Nandolia, Julija Pavaine, Marie Piraud, Tina Poussaint, Sanjay P Prabhu, Zachary Reitman, Andres Rodriguez, Jeffrey D Rudie, Mariana Sanchez-Montano, Ibraheem Salman Shaikh, Lubdha M. Shah, Nakul Sheth, Russel Taki Shinohara, Wenxin Tu, Karthik Viswanathan, Chunhao Wang, Jeffrey B Ware, Benedikt Wiestler, Walter Wiggins, Anna Zapaishchykova, Mariam Aboian, Miriam Bornhorst, Peter de Blank, Michelle Deutsch, Maryam Fouladi, Lindsey Hoffman, Benjamin Kann, Margot Lazow, Leonie Mikael, Ali Nabavizadeh, Roger Packer, Spyridon Bakas, Adam Resnick, Brian Rood, Arastoo Vossough, Marius George Linguraru,

(参考訳) 中枢神経系の小児腫瘍は、小児におけるがん関連死の最も一般的な原因である。小児の高次グリオーマの生存率は20%未満である。希少性のため、診断が遅れることが多く、治療は主に歴史的治療の概念に基づいており、臨床試験には複数施設の協力が必要である。 CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDsの課題は、小児脳腫瘍に焦点をあて、小児神経腫瘍学および臨床治験に特化した複数の国際コンソーシアムにまたがるデータを収集することである。 CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDsチャレンジは、臨床治験に役立つ自動セグメンテーション技術の開発と、最終的には脳腫瘍を持つ子供のケアを加速させる。

Pediatric tumors of the central nervous system are the most common cause of cancer-related death in children. The five-year survival rate for high-grade gliomas in children is less than 20%. Due to their rarity, the diagnosis of these entities is often delayed, their treatment is mainly based on historic treatment concepts, and clinical trials require multi-institutional collaborations. Here we present the CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs challenge, focused on pediatric brain tumors with data acquired across multiple international consortia dedicated to pediatric neuro-oncology and clinical trials. The CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs challenge brings together clinicians and AI/imaging scientists to lead to faster development of automated segmentation techniques that could benefit clinical trials, and ultimately the care of children with brain tumors.

Translated: 2024-04-26 12:31:48 Published: 2024-04-24

# Exploring Feedback Generation in Automated Skeletal Movement Assessment: A Comprehensive Overview

Exploring Feedback Generation in Automated Skeletal Movement Assessment: A Comprehensive Overview ( http://arxiv.org/abs/2404.09359v3 )

License: see link

Tal Hakim,

The application of machine-learning solutions to movement assessment from skeleton videos has attracted significant research attention in recent years. This advancement has made rehabilitation at home more accessible, utilizing movement assessment algorithms that can operate on affordable equipment for human pose detection and analysis from 2D or 3D videos. While the primary objective of automatic assessment tasks is to score movements, the automatic generation of feedback highlighting key movement issues has the potential to significantly enhance and accelerate the rehabilitation process. Although numerous research works exist in the field of automatic movement assessment, only a handful address feedback generation. In this study, we explain the types of feedback that can be generated, review existing solutions for automatic feedback generation, and discuss future research directions. To our knowledge, this is the first comprehensive review of feedback generation in skeletal movement assessment.

Translated: 2024-04-25 20:16:34 Published: 2024-04-24

# How far are AI-powered programming assistants from meeting developers' needs?

How far are AI-powered programming assistants from meeting developers' needs? ( http://arxiv.org/abs/2404.12000v2 )

License: see link

Xin Tan, Xiao Long, Xianjun Ni, Yinghao Zhu, Jing Jiang, Li Zhang,

Recent In-IDE AI coding assistant tools (ACATs) like GitHub Copilot have significantly impacted developers' coding habits. While some studies have examined their effectiveness, in-depth investigation of the actual assistance process is lacking. To bridge this gap, we simulate real development scenarios encompassing three typical types of software development tasks and recruit 27 computer science students to investigate their behavior with three popular ACATs. Our goal is to comprehensively assess ACATs' effectiveness, explore characteristics of recommended code, identify reasons for modifications, and understand users' challenges and expectations. To facilitate the study, we develop an experimental platform that includes a data collection plugin for VSCode IDE and provides functions for screen recording, code evaluation, and automatic generation of personalized interview and survey questions. Through analysis of the collected data, we find that ACATs generally enhance task completion rates, reduce time, improve code quality, and increase self-perceived productivity. However, the improvement is influenced by both the nature of coding tasks and users' experience level. Notably, for experienced participants, the use of ACATs may even increase completion time. We observe that "edited line completion" is the most frequently recommended form of completion, while "comments completion" and "string completion" have the lowest acceptance rates. The primary reasons for modifying recommended code are disparities between output formats and requirements, flawed logic, and inconsistent code styles. In terms of challenges and expectations, participants are concerned not only with functionality and performance but also with the optimization of service access and help documentation. Our study provides valuable insights into the effectiveness and usability of ACATs, informing further improvements in their design and implementation.

Translated: 2024-04-25 20:16:34 Published: 2024-04-24

# Leverage Variational Graph Representation For Model Poisoning on Federated Learning

Leverage Variational Graph Representation For Model Poisoning on Federated Learning ( http://arxiv.org/abs/2404.15042v2 )

License: see link

Kai Li, Xin Yuan, Jingjing Zheng, Wei Ni, Falko Dressler, Abbas Jamalipour,

This paper puts forth a new training data-untethered model poisoning (MP) attack on federated learning (FL). The new MP attack extends an adversarial variational graph autoencoder (VGAE) to create malicious local models based solely on the overheard benign local models, without any access to the training data of FL. This advancement leads to the VGAE-MP attack, which is not only efficacious but also elusive to detection. The VGAE-MP attack extracts graph structural correlations among the benign local models and the training data features, adversarially regenerates the graph structure, and generates malicious local models using the adversarial graph structure and the benign models' features. Moreover, a new attacking algorithm is presented to train the malicious local models using VGAE and sub-gradient descent, while enabling an optimal selection of the benign local models for training the VGAE. Experiments demonstrate a gradual drop in FL accuracy under the proposed VGAE-MP attack and the ineffectiveness of existing defense mechanisms in detecting the attack, posing a severe threat to FL.

Translated: 2024-04-25 20:16:34 Published: 2024-04-24

# TOP-Nav: Legged Navigation Integrating Terrain, Obstacle and Proprioception Estimation

TOP-Nav: Legged Navigation Integrating Terrain, Obstacle and Proprioception Estimation ( http://arxiv.org/abs/2404.15256v2 )

License: see link

Junli Ren, Yikai Liu, Yingru Dai, Guijin Wang,

Legged navigation is typically examined within open-world, off-road, and challenging environments. In these scenarios, estimating external disturbances requires a complex synthesis of multi-modal information. This underlines a major limitation in existing works that primarily focus on avoiding obstacles. In this work, we propose TOP-Nav, a novel legged navigation framework that integrates a comprehensive path planner with Terrain awareness, Obstacle avoidance and closed-loop Proprioception. TOP-Nav underscores the synergies between vision and proprioception in both path and motion planning. Within the path planner, we present and integrate a terrain estimator that enables the robot to select waypoints on terrains with higher traversability while effectively avoiding obstacles. At the motion planning level, we not only implement a locomotion controller to track the navigation commands, but also construct a proprioception advisor to provide motion evaluations for the path planner. Based on the closed-loop motion feedback, we make online corrections to the vision-based terrain and obstacle estimations. Consequently, TOP-Nav achieves open-world navigation in which the robot can handle terrains or disturbances beyond the distribution of prior knowledge and overcome constraints imposed by visual conditions. In extensive experiments conducted in both simulation and real-world environments, TOP-Nav demonstrates superior performance in open-world navigation compared to existing methods.

Translated: 2024-04-25 20:16:34 Published: 2024-04-24

# The Open-World Lottery Ticket Hypothesis for OOD Intent Classification

The Open-World Lottery Ticket Hypothesis for OOD Intent Classification ( http://arxiv.org/abs/2210.07071v3 )

License: see link

Yunhua Zhou, Pengyu Wang, Peiju Liu, Yuxin Wang, Xipeng Qiu,

Most existing methods of Out-of-Domain (OOD) intent classification rely on extensive auxiliary OOD corpora or specific training paradigms. However, they are underdeveloped with respect to the underlying principle that models should have differentiated confidence in In- and Out-of-domain intent. In this work, we shed light on the fundamental cause of model overconfidence on OOD and demonstrate that calibrated subnetworks can be uncovered by pruning the overparameterized model. The calibrated confidence provided by the subnetwork can better distinguish In- and Out-of-domain intent, which can benefit almost all post hoc methods. In addition to bringing fundamental insights, we also extend the Lottery Ticket Hypothesis to open-world scenarios. We conduct extensive experiments on four real-world datasets to demonstrate that our approach can establish consistent improvements compared with a suite of competitive baselines.

Translated: 2024-04-25 16:34:44 Published: 2024-04-24

# A Low Latency Adaptive Coding Spiking Framework for Deep Reinforcement Learning

A Low Latency Adaptive Coding Spiking Framework for Deep Reinforcement Learning ( http://arxiv.org/abs/2211.11760v3 )

License: see link

Lang Qin, Rui Yan, Huajin Tang,

In recent years, spiking neural networks (SNNs) have been used in reinforcement learning (RL) due to their low power consumption and event-driven features. However, spiking reinforcement learning (SRL), which suffers from fixed coding methods, still faces the problems of high latency and poor versatility. In this paper, we use learnable matrix multiplication to encode and decode spikes, improving the flexibility of the coders and thus reducing latency. Meanwhile, we train the SNNs using the direct training method and use two different structures for online and offline RL algorithms, which gives our model a wider range of applications. Extensive experiments reveal that our method achieves optimal performance with ultra-low latency (as low as 0.8% of other SRL methods) and excellent energy efficiency (up to 5X the DNNs) across different algorithms and environments.

Translated: 2024-04-25 16:34:44 Published: 2024-04-24

# Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators

Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators ( http://arxiv.org/abs/2212.06310v2 )

License: see link

Haitian Zheng, Zhe Lin, Jingwan Lu, Scott Cohen, Eli Shechtman, Connelly Barnes, Jianming Zhang, Qing Liu, Yuqian Zhou, Sohrab Amirghodsi, Jiebo Luo,

Structure-guided image completion aims to inpaint a local region of an image according to an input guidance map from users. While such a task enables many practical applications for interactive editing, existing methods often struggle to hallucinate realistic object instances in complex natural scenes. Such a limitation is partially due to the lack of semantic-level constraints inside the hole region as well as the lack of a mechanism to enforce realistic object generation. In this work, we propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects. Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts. Moreover, the object-level discriminators take aligned instances as inputs to enforce the realism of individual objects. Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks, including segmentation-guided completion, edge-guided manipulation and panoptically-guided manipulation on Places2 datasets. Furthermore, our trained model is flexible and can support multiple editing use cases, such as object insertion, replacement, removal and standard inpainting. In particular, our trained model combined with a novel automatic image completion pipeline achieves state-of-the-art results on the standard inpainting task.

Translated: 2024-04-25 16:34:44 Published: 2024-04-24

# Effective Decision Boundary Learning for Class Incremental Learning

Effective Decision Boundary Learning for Class Incremental Learning ( http://arxiv.org/abs/2301.05180v3 )

License: see link

Chaoyue Ding, Kunchi Li, Jun Wan, Shan Yu,

Rehearsal approaches in class incremental learning (CIL) suffer from decision boundary overfitting to new classes, which is mainly caused by two factors: insufficient old-class data for knowledge distillation (KD) and imbalanced data learning between the learned and new classes because of the limited storage memory. In this work, we present a simple but effective approach to tackle these two factors. First, we employ a re-sampling strategy and Mixup Knowledge Distillation (Re-MKD) to improve the performance of KD, which greatly alleviates the overfitting problem. Specifically, we combine mixup and re-sampling strategies to synthesize adequate data for KD training that are more consistent with the latent distribution between the learned and new classes. Second, we propose a novel incremental influence balance (IIB) method for CIL to tackle the classification of imbalanced data by extending the influence balance method into the CIL setting, which re-weights samples by their influences to create a proper decision boundary. With these two improvements, we present the effective decision boundary learning algorithm (EDBL), which improves the performance of KD and deals with imbalanced data learning simultaneously. Experiments show that the proposed EDBL achieves state-of-the-art performance on several CIL benchmarks.

Translated: 2024-04-25 16:34:44 Published: 2024-04-24
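
The combination of re-sampling and mixup mentioned in the abstract above can be illustrated generically. The following is a minimal sketch, not the authors' Re-MKD implementation: the function name `balanced_mixup_batch`, the class-uniform sampling rule, and the Beta(alpha, alpha) mixing coefficient (the usual mixup choice) are all assumptions made for illustration.

```python
import numpy as np

def balanced_mixup_batch(features, labels, batch_size, alpha=1.0, seed=0):
    """Draw class-balanced sample pairs and blend them with mixup (illustrative only)."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)

    def draw_indices():
        # Re-sampling step (assumed rule): choose a class uniformly, then a sample
        # inside it, so rare old classes appear as often as large new classes.
        chosen = rng.choice(classes, size=batch_size)
        return np.array([rng.choice(np.flatnonzero(labels == c)) for c in chosen])

    idx_a, idx_b = draw_indices(), draw_indices()
    # Mixup step: convex combination of two inputs with lambda ~ Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha, size=(batch_size, 1))
    mixed = lam * features[idx_a] + (1.0 - lam) * features[idx_b]
    return mixed, labels[idx_a], labels[idx_b], lam.ravel()

# Toy data: 8 samples of a "new" class (0) and 2 rehearsal samples of an "old" class (1).
features = np.arange(20, dtype=float).reshape(10, 2)
labels = np.array([0] * 8 + [1] * 2)
mixed, labels_a, labels_b, lam = balanced_mixup_batch(features, labels, batch_size=6)
```

In a distillation setting, the mixed inputs would be passed through both the old and the new model; how the paper weights the resulting loss is not stated in the abstract and is left out here.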

# Embed-Search-Align: DNA Sequence Alignment using Transformer Models

Embed-Search-Align: DNA Sequence Alignment using Transformer Models ( http://arxiv.org/abs/2309.11087v4 )

License: see link

Pavan Holur, K. C. Enevoldsen, Shreyas Rajesh, Lajoyce Mboning, Thalia Georgiou, Louis-S. Bouchard, Matteo Pellegrini, Vwani Roychowdhury,

DNA sequence alignment involves assigning short DNA reads to the most probable locations on an extensive reference genome. This process is crucial for various genomic analyses, including variant calling, transcriptomics, and epigenomics. Conventional methods, refined over decades, tackle this challenge in two steps: genome indexing followed by efficient search to locate likely positions for given reads. Building on the success of Large Language Models (LLMs) in encoding text into embeddings, where the distance metric captures semantic similarity, recent efforts have explored whether the same Transformer architecture can produce numerical representations for DNA sequences. Such models have shown early promise in tasks involving classification of short DNA sequences, such as the detection of coding vs. non-coding regions, as well as the identification of enhancer and promoter sequences. Performance at sequence classification tasks does not, however, translate to sequence alignment, where it is necessary to conduct a genome-wide search to successfully align every read. We address this open problem by framing it as an Embed-Search-Align task. In this framework, a novel encoder model, DNA-ESA, generates representations of reads and fragments of the reference, which are projected into a shared vector space where the read-fragment distance is used as a surrogate for alignment. In particular, DNA-ESA introduces: (1) a contrastive loss for self-supervised training of DNA sequence representations, facilitating rich sequence-level embeddings, and (2) a DNA vector store to enable search across fragments on a global scale. DNA-ESA is >97% accurate when aligning 250-length reads onto a human reference genome of 3 gigabases (single-haploid), far exceeds the performance of 6 recent DNA-Transformer model baselines, and shows task transfer across chromosomes and species.

Translated: 2024-04-25 16:34:44 Published: 2024-04-24
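
The embed-then-search idea in the abstract above can be made concrete with a toy example. This is a sketch under strong assumptions: a normalised 3-mer count vector stands in for the learned DNA-ESA encoder, a dense NumPy matrix stands in for the DNA vector store, and the helper names (`embed`, `build_store`, `search`) are hypothetical. It illustrates only the retrieval pattern, not the paper's model or accuracy.

```python
import itertools
import numpy as np

# Index of every 3-mer over the DNA alphabet (64 entries).
KMERS = {"".join(p): i for i, p in enumerate(itertools.product("ACGT", repeat=3))}

def embed(seq):
    """Toy embedding: L2-normalised 3-mer counts (stand-in for a learned encoder)."""
    vec = np.zeros(len(KMERS))
    for i in range(len(seq) - 2):
        vec[KMERS[seq[i:i + 3]]] += 1.0
    return vec / np.linalg.norm(vec)

def build_store(reference, frag_len=100, stride=50):
    """Embed overlapping reference fragments; return start offsets and their vectors."""
    starts = list(range(0, len(reference) - frag_len + 1, stride))
    vectors = np.stack([embed(reference[s:s + frag_len]) for s in starts])
    return starts, vectors

def search(read, starts, vectors):
    """Return the start offset of the fragment most similar (cosine) to the read."""
    scores = vectors @ embed(read)
    return starts[int(np.argmax(scores))]

rng = np.random.default_rng(0)
reference = "".join(rng.choice(list("ACGT"), size=2000))
starts, vectors = build_store(reference)
read = reference[500:600]  # a read copied from a known location
candidate = search(read, starts, vectors)
```

In a full pipeline the retrieved fragment would only be a candidate region; a fine-grained aligner would then place the read within it.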

# A model of randomly-coupled Pauli spins

A model of randomly-coupled Pauli spins ( http://arxiv.org/abs/2309.15349v3 )

License: see link

Masanori Hanada, Antal Jevicki, Xianlong Liu, Enrico Rinaldi, Masaki Tezuka,

We construct a model of Pauli spin operators with all-to-all 4-local interactions by replacing Majorana fermions in the SYK model with spin operators. Equivalently, we replace fermions with hard-core bosons. We study this model numerically and compare the properties with those of the SYK model. We observe a striking quantitative coincidence between the spin model and the SYK model, which suggests that this spin model is strongly chaotic and, perhaps, can play some role in holography. We also discuss the path-integral approach with multi-local fields and the possibility of quantum simulations. This model may be an interesting target for quantum simulations because Pauli spins are easier to implement than fermions on qubit-based quantum devices.

Translated: 2024-04-25 16:34:44 Published: 2024-04-24
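
To make "all-to-all 4-local interactions" with random couplings concrete, the sketch below builds a small random Hamiltonian from Pauli X and Y operators acting on four distinct spins per term. It is an assumed toy construction for illustration: the operator set, the restriction to distinct sites, the unit-variance Gaussian couplings, and the missing normalisation are choices made here and are not taken from the paper.

```python
import itertools
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def site_operator(pauli, site, n_spins):
    """Embed a single-site Pauli matrix at `site` in the n-spin Hilbert space."""
    op = np.array([[1.0 + 0j]])
    for k in range(n_spins):
        op = np.kron(op, pauli if k == site else I2)
    return op

def random_four_local_hamiltonian(n_spins, seed=0):
    """Sum of J * O_a O_b O_c O_d over all 4-site subsets and X/Y choices (toy model)."""
    rng = np.random.default_rng(seed)
    dim = 2 ** n_spins
    ops = {(s, name): site_operator(p, s, n_spins)
           for s in range(n_spins) for name, p in (("x", X), ("y", Y))}
    hamiltonian = np.zeros((dim, dim), dtype=complex)
    for sites in itertools.combinations(range(n_spins), 4):
        for axes in itertools.product("xy", repeat=4):
            coupling = rng.normal()  # real Gaussian coupling, unit variance (assumed)
            term = np.eye(dim, dtype=complex)
            for s, a in zip(sites, axes):
                term = term @ ops[(s, a)]
            # Paulis on distinct sites commute, so each term is Hermitian.
            hamiltonian += coupling * term
    return hamiltonian

H = random_four_local_hamiltonian(n_spins=5)
energies = np.linalg.eigvalsh(H)  # real spectrum of the Hermitian matrix
```

Level-spacing statistics of `energies` are one common diagnostic of chaos in such models; at this system size they would only be suggestive.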

# Learning with Unmasked Tokens Drives Stronger Vision Learners

Learning with Unmasked Tokens Drives Stronger Vision Learners ( http://arxiv.org/abs/2310.13593v2 )

License: see link

Taekyung Kim, Sanghyuk Chun, Byeongho Heo, Dongyoon Han,

Masked image modeling (MIM) has become a leading self-supervised learning strategy. MIMs such as Masked Autoencoder (MAE) learn strong representations by randomly masking input tokens for the encoder to process, with the decoder reconstructing the masked tokens to the input. However, MIM pre-trained encoders often exhibit a limited attention span, attributed to MIM's sole focus on regressing masked tokens, which may impede the encoder's broader context learning. To tackle this limitation, we improve MIM by explicitly incorporating unmasked tokens into the training process. Specifically, our method enables the encoder to learn from broader context supervision, allowing unmasked tokens to experience broader contexts while the decoder reconstructs masked tokens. Thus, the encoded unmasked tokens are equipped with extensive contextual information, empowering masked tokens to leverage the enhanced unmasked tokens for MIM. As a result, our simple remedy trains more discriminative representations, as shown by 84.2% top-1 accuracy with ViT-B on ImageNet-1K, a gain of 0.6 percentage points. We attribute the success to the enhanced pre-training method, as evidenced by the singular value spectrum and attention analyses. Finally, our models achieve significant performance gains on downstream semantic segmentation and fine-grained visual classification tasks, and on diverse robustness evaluation metrics. Code is available at https://github.com/naver-ai/lut

Translated: 2024-04-25 16:25:00 Published: 2024-04-24

# DEFT: Data Efficient Fine-Tuning for Pre-Trained Language Models via Unsupervised Core-Set Selection

DEFT: Data Efficient Fine-Tuning for Pre-Trained Language Models via Unsupervised Core-Set Selection ( http://arxiv.org/abs/2310.16776v4 )

License: see link

Devleena Das, Vivek Khetan,

Recent advances have led to the availability of many pre-trained language models (PLMs); however, a question that remains is how much data is truly needed to fine-tune PLMs for downstream tasks. In this work, we introduce DEFT-UCS, a data-efficient fine-tuning framework that leverages unsupervised core-set selection to identify a smaller, representative dataset that reduces the amount of data needed to fine-tune PLMs for downstream tasks. We examine the efficacy of DEFT-UCS in the context of text-editing LMs, and compare it to the state-of-the-art text-editing model, CoEDIT. Our results demonstrate that DEFT-UCS models are just as accurate as CoEDIT across eight different datasets consisting of six different editing tasks, while fine-tuned on 70% less data.

Translated: 2024-04-25 16:25:00 Published: 2024-04-24
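
One common way to pick a small, representative subset without labels is to cluster sample embeddings and keep the points closest to each cluster centre. The sketch below illustrates that generic idea only; the abstract does not state the selection rule used by DEFT-UCS, so the plain k-means loop, the nearest-to-centroid criterion, and the names `kmeans_centroids` and `select_core_set` are assumptions.

```python
import numpy as np

def kmeans_centroids(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns the final centroids."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k):
            members = points[assign == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return centroids

def select_core_set(embeddings, k, per_cluster):
    """Keep the `per_cluster` samples nearest to each centroid (assumed rule)."""
    centroids = kmeans_centroids(embeddings, k)
    dists = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=2)
    assign = dists.argmin(axis=1)
    selected = []
    for c in range(k):
        members = np.flatnonzero(assign == c)
        nearest_first = members[np.argsort(dists[members, c])]
        selected.extend(nearest_first[:per_cluster].tolist())
    return sorted(selected)

# Toy embeddings: two well-separated groups of 20 points each.
rng = np.random.default_rng(1)
embeddings = np.vstack([rng.normal(0.0, 1.0, size=(20, 2)),
                        rng.normal(10.0, 1.0, size=(20, 2))])
core_indices = select_core_set(embeddings, k=2, per_cluster=3)
```

The indices in `core_indices` would then define the reduced fine-tuning set; in practice the embeddings would come from a sentence encoder rather than random draws.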

# ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image

ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image ( http://arxiv.org/abs/2310.17994v2 )

License: see link

Kyle Sargent, Zizhang Li, Tanmay Shah, Charles Herrmann, Hong-Xing Yu, Yunzhi Zhang, Eric Ryan Chan, Dmitry Lagun, Li Fei-Fei, Deqing Sun, Jiajun Wu,

We introduce a 3D-aware diffusion model, ZeroNVS, for single-image novel view synthesis for in-the-wild scenes. While existing methods are designed for single objects with masked backgrounds, we propose new techniques to address challenges introduced by in-the-wild multi-object scenes with complex backgrounds. Specifically, we train a generative prior on a mixture of data sources that capture object-centric, indoor, and outdoor scenes. To address issues from data mixture such as depth-scale ambiguity, we propose a novel camera conditioning parameterization and normalization scheme. Further, we observe that Score Distillation Sampling (SDS) tends to truncate the distribution of complex backgrounds during distillation of 360-degree scenes, and propose "SDS anchoring" to improve the diversity of synthesized novel views. Our model sets a new state-of-the-art result in LPIPS on the DTU dataset in the zero-shot setting, even outperforming methods specifically trained on DTU. We further adapt the challenging Mip-NeRF 360 dataset as a new benchmark for single-image novel view synthesis, and demonstrate strong performance in this setting. Our code and data are at http://kylesargent.github.io/zeronvs/

Translated: 2024-04-25 16:25:00 Published: 2024-04-24

# Topological phases of many-body non-Hermitian systems

Topological phases of many-body non-Hermitian systems ( http://arxiv.org/abs/2311.03043v3 )

License: see link

Kui Cao, Su-Peng Kou,

We show that many-body fermionic non-Hermitian systems require two distinct sets of topological invariants to describe the topology of energy bands and quantum states respectively, with the latter yet to be explored. We identify 10 symmetry classes, determined by particle-hole, linearized time-reversal, and linearized chiral symmetries. Each class has a topological invariant associated with each dimension, dictating the topology of quantum states. These findings pave the way for a deeper understanding of the topological phases of many-body non-Hermitian systems.

Translated: 2024-04-25 16:25:00 Published: 2024-04-24

# TransformCode: A Contrastive Learning Framework for Code Embedding via Subtree Transformation

TransformCode: A Contrastive Learning Framework for Code Embedding via Subtree Transformation ( http://arxiv.org/abs/2311.08157v2 )

License: see link

Zixiang Xian, Rubing Huang, Dave Towey, Chunrong Fang, Zhenyu Chen,

Artificial intelligence (AI) has revolutionized software engineering (SE) by enhancing software development efficiency. The advent of pre-trained models (PTMs) leveraging transfer learning has significantly advanced AI for SE. However, existing PTMs that operate on individual code tokens suffer from several limitations: they are costly to train and fine-tune, and they rely heavily on labeled data for fine-tuning on task-specific datasets. In this paper, we present TransformCode, a novel framework that learns code embeddings in a contrastive learning manner. Our framework is encoder-agnostic and language-agnostic, which means that it can leverage any encoder model and handle any programming language. We also propose a novel data-augmentation technique called abstract syntax tree (AST) transformation, which applies syntactic and semantic transformations to the original code snippets, to generate more diverse and robust samples for contrastive learning. Our framework has several advantages over existing methods: (1) it is flexible and adaptable, because it can easily be extended to other downstream tasks that require code representation (such as code-clone detection and classification); (2) it is efficient and scalable, because it does not require a large model or a large amount of training data, and it can support any programming language; (3) it is not limited to unsupervised learning, but can also be applied to some supervised learning tasks by incorporating task-specific labels or objectives; and (4) it can also adjust the number of encoder parameters based on computing resources. We evaluate our framework on several code-related tasks, and demonstrate its effectiveness and superiority over state-of-the-art methods such as SourcererCC, Code2vec, and InferCode.

Translated: 2024-04-25 16:25:00 Published: 2024-04-24
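
An AST-level augmentation of the kind the abstract above describes can be illustrated with Python's standard `ast` module. The sketch renames function parameters and local variables to canonical names, which changes the surface form of a snippet while preserving its behaviour. It is an assumed example of one semantics-preserving transformation; the class `RenameVariables` and the helper `augment` are hypothetical and do not reproduce the specific transformations used in TransformCode. `ast.unparse` requires Python 3.9 or later.

```python
import ast

class RenameVariables(ast.NodeTransformer):
    """Rename parameters and assigned variables to v0, v1, ... (illustrative only)."""

    def __init__(self):
        self.mapping = {}

    def _rename(self, name):
        if name not in self.mapping:
            self.mapping[name] = f"v{len(self.mapping)}"
        return self.mapping[name]

    def visit_arg(self, node):
        # Function parameters are always renamed.
        node.arg = self._rename(node.arg)
        return node

    def visit_Name(self, node):
        # Rename a name when it is assigned, or when it was renamed earlier.
        # Names that are only read (builtins, globals) are left untouched.
        if isinstance(node.ctx, ast.Store) or node.id in self.mapping:
            node.id = self._rename(node.id)
        return node

def augment(source):
    """Return a behaviour-preserving variant of `source` with renamed variables."""
    tree = RenameVariables().visit(ast.parse(source))
    return ast.unparse(tree)

source = (
    "def total(items):\n"
    "    acc = 0\n"
    "    for item in items:\n"
    "        acc = acc + item\n"
    "    return acc\n"
)
positive_view = augment(source)
```

In a contrastive setup, `source` and `positive_view` would form a positive pair whose embeddings are pulled together, while other snippets in the batch act as negatives.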

# A Multilevel Guidance-Exploration Network and Behavior-Scene Matching Method for Human Behavior Anomaly Detection

A Multilevel Guidance-Exploration Network and Behavior-Scene Matching Method for Human Behavior Anomaly Detection ( http://arxiv.org/abs/2312.04119v2 )

License: see link

Guoqing Yang, Zhiming Luo, Jianzhe Gao, Yingxin Lai, Kun Yang, Yifan He, Shaozi Li,

Human behavior anomaly detection aims to identify unusual human actions, playing a crucial role in intelligent surveillance and other areas. The current mainstream methods still adopt reconstruction or future frame prediction techniques. However, reconstructing or predicting low-level pixel features easily enables the network to achieve overly strong generalization ability, allowing anomalies to be reconstructed or predicted as effectively as normal data. Unlike these methods, and inspired by the Student-Teacher Network, we propose a novel framework called the Multilevel Guidance-Exploration Network (MGENet), which detects anomalies through the difference in high-level representation between the Guidance and Exploration networks. Specifically, we first utilize a pre-trained Normalizing Flow that takes skeletal keypoints as input to guide an RGB encoder, which takes unmasked RGB frames as input, to explore latent motion features. Then, the RGB encoder guides the mask encoder, which takes masked RGB frames as input, to explore latent appearance features. Additionally, we design a Behavior-Scene Matching Module (BSMM) to detect scene-related behavioral anomalies. Extensive experiments demonstrate that our proposed method achieves state-of-the-art performance on the ShanghaiTech and UBnormal datasets, with AUCs of 86.9% and 73.5%, respectively. The code will be available at https://github.com/molu-ggg/GENet.

翻訳日:2024-04-25 16:25:00 公開日:2024-04-24

# AIアセスメント尺度(AIAS) : 教育評価におけるジェネレーティブAIの倫理的統合のためのフレームワーク

The AI Assessment Scale (AIAS): A Framework for Ethical Integration of Generative AI in Educational Assessment ( http://arxiv.org/abs/2312.07086v2 )

ライセンス: Link先を確認

Mike Perkins, Leon Furze, Jasper Roe, Jason MacVaugh,

(参考訳) 生成AI(GenAI)の最近の進歩は、社会の複数の領域でパラダイムシフトを生み出しており、これらの技術の利用は今後数十年で教育を特徴づけるものとなる可能性が高い。GenAIは変革的な教育の機会を提供すると同時に、倫理的・学術的な課題を提起する。このような背景から、我々はGenAIツールを教育評価に統合するための、実用的でシンプルかつ十分に包括的なツールであるAI Assessment Scale(AIAS)を概説する。AIASにより、教育者は、対象とする学習成果に基づいて、評価におけるGenAI利用の適切なレベルを選択できる。AIASは、学生と教育者により高い明確性と透明性をもたらし、機関が利用できる公正で公平なポリシーツールを提供し、GenAIの機会を受け入れつつも、そのようなツールが教育的に適切でない、あるいは必要でない場合があることを認識した、きめ細かなアプローチを提供する。迅速に実施できる実践的で柔軟なアプローチを採用することで、AIASは、教育におけるGenAIに関する現在の不確実性と不安に対処するための、切望されていた出発点となり得る。第二の目的として、我々は現在の文献を検討し、教育におけるGenAIツールに関する議論の焦点を見直すことを提唱する。すなわち、GenAIを学術的不正行為を助長するものとみなす現在の見方とは対照的に、テクノロジーが教育と学習をどのように支援・強化できるかを前面に出す議論である。

Recent developments in Generative Artificial Intelligence (GenAI) have created a paradigm shift in multiple areas of society, and the use of these technologies is likely to become a defining feature of education in coming decades. GenAI offers transformative pedagogical opportunities, while simultaneously posing ethical and academic challenges. Against this backdrop, we outline a practical, simple, and sufficiently comprehensive tool to allow for the integration of GenAI tools into educational assessment: the AI Assessment Scale (AIAS). The AIAS empowers educators to select the appropriate level of GenAI usage in assessments based on the learning outcomes they seek to address. The AIAS offers greater clarity and transparency for students and educators, provides a fair and equitable policy tool for institutions to work with, and offers a nuanced approach which embraces the opportunities of GenAI while recognising that there are instances where such tools may not be pedagogically appropriate or necessary. By adopting a practical, flexible approach that can be implemented quickly, the AIAS can form a much-needed starting point to address the current uncertainty and anxiety regarding GenAI in education. As a secondary objective, we engage with the current literature and advocate for a refocused discourse on GenAI tools in education, one which foregrounds how technologies can help support and enhance teaching and learning, which contrasts with the current focus on GenAI as a facilitator of academic misconduct.

翻訳日:2024-04-25 16:15:09 公開日:2024-04-24

# ランダムベクトル関数リンクを用いた軽量ランダム化非線形辞書学習法

A Lightweight Randomized Nonlinear Dictionary Learning Method using Random Vector Functional Link ( http://arxiv.org/abs/2402.03833v2 )

ライセンス: Link先を確認

G. Madhuri, Atul Negi,

(参考訳) カーネルベースの非線形辞書学習法は、暗黙的特徴写像によって得られる特徴空間で動作し、特異値分解(SVD)のような計算コストの高い演算とは独立していない。本稿では,ランダムベクトル関数リンク(RVFL)と呼ばれるランダム化関数リンクを用いて,非線形辞書を学習するためのSVDフリー軽量アプローチを提案する。提案したRVFLに基づく非線形辞書学習(RVFLDL)は,非線形スパース係数から高密度入力特徴へのスパース・トゥ・デンス特徴写像として辞書を学習する。初期乱数辞書のスパース係数 w.r.t は、ホースシュー先行を仮定して導出され、軽量ネットワークとなる入力として使用される。 RVFLは入力から出力層への重みを解析的に生成するので、RVFLベースの辞書のトレーニングはSVD計算から解放される。入力スパース係数と辞書原子との高次依存関係は、スパース係数を非線形に変換し、強化された特徴として付加することにより、トレーニングプロセスに組み込む。したがって、この方法は、辞書に非線形性を誘導しながら、より高次元空間にスパース係数を投影する。 RVFL-netを用いて分類するために、分類行列は非線形スパース係数をラベルにマッピングする変換として学習される。画像分類および再構成アプリケーションで示される手法の実証的証拠は、RVFLDLはスケーラブルであり、他の非線形辞書学習法よりも優れた解を提供することを示している。

Kernel-based nonlinear dictionary learning methods operate in a feature space obtained by an implicit feature map, and they are not independent of computationally expensive operations like Singular Value Decomposition (SVD). This paper presents an SVD-free lightweight approach to learning a nonlinear dictionary using a randomized functional link called a Random Vector Functional Link (RVFL). The proposed RVFL-based nonlinear Dictionary Learning (RVFLDL) learns a dictionary as a sparse-to-dense feature map from nonlinear sparse coefficients to the dense input features. Sparse coefficients w.r.t. an initial random dictionary, derived by assuming a Horseshoe prior, are used as inputs, making it a lightweight network. Training the RVFL-based dictionary is free from SVD computation, as RVFL generates weights from the input to the output layer analytically. Higher-order dependencies between the input sparse coefficients and the dictionary atoms are incorporated into the training process by nonlinearly transforming the sparse coefficients and adding them as enhanced features. Thus, the method projects sparse coefficients to a higher-dimensional space while inducing nonlinearities into the dictionary. For classification using RVFL-net, a classifier matrix is learned as a transform that maps nonlinear sparse coefficients to the labels. The empirical evidence from image classification and reconstruction applications shows that RVFLDL is scalable and provides better solutions than other nonlinear dictionary learning methods.

翻訳日:2024-04-25 16:15:09 公開日:2024-04-24

# 動作コード:スパース変分多確率過程学習によるロバスト時系列分類と予測

Motion Code: Robust Time series Classification and Forecasting via Sparse Variational Multi-Stochastic Processes Learning ( http://arxiv.org/abs/2402.14081v2 )

ライセンス: Link先を確認

Chandrajit Bajaj, Minh Nguyen,

(参考訳) 広範に研究されているにもかかわらず、ノイズの多いデータの時系列分類と予測は非常に困難である。主な課題は、時系列を記述するのに適した数学的概念を見つけ、真の信号から効果的にノイズを分離することである。時系列を静的ベクトルやデータシーケンスとして扱う代わりに、連続時間確率過程のサンプル化として、必ずしも固定長ではない各時系列を考察する新しいフレームワークを導入する。このような数学的モデルは、複数のタイムスタンプにまたがるデータ依存を明示的に捉え、ノイズから隠れた時間依存信号を検出する。しかし、基礎となるデータはいくつかの異なるダイナミクスで構成されていることが多いため、単一の確率過程を用いたモデリングは不十分である。このような設定に対処するため、まず各ダイナミクスにシグネチャベクトルを割り当てる。次に、割り当てられたベクトルに基づいて個々のダイナミクスのスパース近似を推測する最も情報性の高いタイムスタンプの抽象的概念を提案する。最終的なモデルであるMotion Codeには、さまざまな基盤となるダイナミクスを統合的に完全にキャプチャ可能なパラメータが含まれている。これにより、未混合の分類と特定のサブタイプの予測を同時に生成することができる。センサやデバイスに関する大規模な実験は、時系列の分類と予測ベンチマークに対するモーションコードの競争性を実証している。

Despite being extensively studied, time series classification and forecasting on noisy data remain highly difficult. The main challenges lie in finding suitable mathematical concepts to describe time series and effectively separating noise from the true signals. Instead of treating time series as a static vector or a data sequence as often seen in previous methods, we introduce a novel framework that considers each time series, not necessarily of fixed length, as a sample realization of a continuous-time stochastic process. Such mathematical model explicitly captures the data dependence across several timestamps and detects the hidden time-dependent signals from noise. However, since the underlying data is often composed of several distinct dynamics, modeling using a single stochastic process is not sufficient. To handle such settings, we first assign each dynamics a signature vector. We then propose the abstract concept of the most informative timestamps to infer a sparse approximation of the individual dynamics based on their assigned vectors. The final model, referred to as Motion Code, contains parameters that can fully capture different underlying dynamics in an integrated manner. This allows unmixing classification and generation of specific sub-type forecasting simultaneously. Extensive experiments on sensors and devices noisy time series data demonstrate Motion Code's competitiveness against time series classification and forecasting benchmarks.

翻訳日:2024-04-25 16:15:09 公開日:2024-04-24

# ヒト皮膚の分解手法の比較

Comparison of Methods in Human Skin Decomposition ( http://arxiv.org/abs/2404.00552v2 )

ライセンス: Link先を確認

Hao Gong, Michel Desvignes,

(参考訳) 皮膚色素の分解は医療分野において重要な役割を担っている。ヒトの皮膚はヘモグロビンとメラニンの2つの原始成分に分解することができる。皮膚癌の診断にこれらの結果を適用することが目的である。本稿では, 皮膚色素の分解法を比較検討し, 理論的および実験的に各方法の性能評価を行った。また, 等尺的特徴マッピング (Isomap) を導入し, 皮膚分解の文脈における寸法低減性能を向上させる。

Decomposition of skin pigment plays an important role in medical fields. Human skin can be decomposed into two primitive components, hemoglobin and melanin. It is our goal to apply these results for diagnosis of skin cancer. In this paper, various methods for skin pigment decomposition are reviewed comparatively and the performance of each method is evaluated both theoretically and experimentally. In addition, isometric feature mapping (Isomap) is introduced in order to improve the dimensionality reduction performance in context of skin decomposition.

翻訳日:2024-04-25 16:15:09 公開日:2024-04-24

# 学習したガウシアンスプラットのレンダリングとファインチューニングした拡散特徴による少数ショット点群再構成とノイズ除去

Few-shot point cloud reconstruction and denoising via learned Guassian splats renderings and fine-tuned diffusion features ( http://arxiv.org/abs/2404.01112v4 )

ライセンス: Link先を確認

Pietro Bonazzi, Marie-Julie Rakatosaona, Marco Cannici, Federico Tombari, Davide Scaramuzza,

(参考訳) 点群の再構成とノイズ除去のための既存のディープラーニング手法は、3次元形状の小規模なデータセットに依存している。我々は、何十億もの画像で訓練されたディープラーニング手法を活用することで、この問題を回避する。画像ベースの深層学習モデルから蒸留した事前知識を利用して、少数の画像から点群を再構成し、レンダリングから点群のノイズを除去する手法を提案する。制約のある設定での再構成を改善するために、意味的整合性の教師信号を導入し、表面と外観のハイブリッド表現を持つ微分可能レンダラーの学習を正則化する。さらに、ノイズの多い点群のレンダリングからノイズを除去するようにStable Diffusionをファインチューニングするパイプラインを提案し、学習されたこれらのフィルタを用いて、3Dの教師信号なしに点群ノイズを除去できることを示す。提案手法をDSSおよびPointRadianceと比較し、Sketchfab TestsetとSCUT Datasetにおいてより高品質な3D再構成を達成した。

Existing deep learning methods for the reconstruction and denoising of point clouds rely on small datasets of 3D shapes. We circumvent the problem by leveraging deep learning methods trained on billions of images. We propose a method to reconstruct point clouds from few images and to denoise point clouds from their rendering by exploiting prior knowledge distilled from image-based deep learning models. To improve reconstruction in constrained settings, we regularize the training of a differentiable renderer with hybrid surface and appearance by introducing semantic consistency supervision. In addition, we propose a pipeline to finetune Stable Diffusion to denoise renderings of noisy point clouds and we demonstrate how these learned filters can be used to remove point cloud noise without 3D supervision. We compare our method with DSS and PointRadiance and achieve higher quality 3D reconstruction on the Sketchfab Testset and SCUT Dataset.

翻訳日:2024-04-25 16:15:09 公開日:2024-04-24

# シーングラフからの3次元シーン生成と自己注意

3D scene generation from scene graphs and self-attention ( http://arxiv.org/abs/2404.01887v3 )

ライセンス: Link先を確認

Pietro Bonazzi, Mengqi Wang, Diego Martin Arroyo, Fabian Manhardt, Nico Messikomer, Federico Tombari, Davide Scaramuzza,

(参考訳) リアルで多様な屋内3Dシーンレイアウトを制御可能な方法で合成できれば、シミュレーション上のナビゲーションやバーチャルリアリティへの応用が開かれる。シーングラフは、シーンの簡潔で堅牢な表現として、生成されるレイアウトのセマンティックな制御手段に適していることが示されている。本稿では,シーングラフとフロアプランから3次元シーンを合成する条件付き変分オートエンコーダ(cVAE)モデルの変種を提案する。我々は、シーン内のオブジェクト間の高レベルな関係を捉えるために自己注意層の特性を利用し、これらをモデルの構成要素として使用する。本モデルは、グラフトランスフォーマーを利用して、与えられたシーングラフ内の関係を満たしながら、室内の物体のサイズ、寸法、向きを推定する。実験では、自己注意層によって、より疎な(Graphto3Dと比較して7.9倍)、より多様な(16%)シーンが得られることが示された。

Synthesizing realistic and diverse indoor 3D scene layouts in a controllable fashion opens up applications in simulated navigation and virtual reality. As concise and robust representations of a scene, scene graphs have proven to be well-suited as the semantic control on the generated layout. We present a variant of the conditional variational autoencoder (cVAE) model to synthesize 3D scenes from scene graphs and floor plans. We exploit the properties of self-attention layers to capture high-level relationships between objects in a scene, and use these as the building blocks of our model. Our model leverages graph transformers to estimate the size, dimension and orientation of the objects in a room while satisfying relationships in the given scene graph. Our experiments show that self-attention layers lead to sparser (7.9x compared to Graphto3D) and more diverse scenes (16%).

翻訳日:2024-04-25 16:15:08 公開日:2024-04-24

# 暗黒でテキストを見る:アルゴリズムとベンチマーク

Seeing Text in the Dark: Algorithm and Benchmark ( http://arxiv.org/abs/2404.08965v3 )

ライセンス: Link先を確認

Chengpei Xu, Hao Fu, Long Ma, Wenjing Jia, Chengqi Zhang, Feng Xia, Xiaoyu Ai, Binghao Li, Wenjie Zhang,

(参考訳) 低照度環境におけるテキストのローカライズは、視覚的劣化のため難しい。簡単な解法は低照度画像強調(LLE)を最初のステップとして検出する2段階のパイプラインを含むが、LLEは主に機械ではなく人間の視覚用に設計されており、エラーを蓄積することができる。そこで本研究では,LLEの必要性を回避するために,暗黒テキストのローカライズのための効率的かつ効果的な単一ステージアプローチを提案する。テキスト検出器の訓練段階において,制約付き学習モジュールを補助機構として導入する。このモジュールは、特徴マップリサイズ中のテキスト空間的特徴を保存するためのテキスト検出器のガイドとして設計されており、低照度の視覚的劣化下でのテキスト中の空間情報の損失を最小限に抑える。具体的には、本モジュール内に空間的再構成と空間的意味制約を組み込んで、テキスト検出器が本質的な位置的・文脈的範囲の知識を取得することを保証する。提案手法は,テキストの局所的トポロジ的特徴を動的ヘビ特徴ピラミッドネットワークを用いて同定し,新しい長方形累積法によるボトムアップ輪郭形成戦略を採用して,テキストの特徴を正確に記述する手法である。さらに,様々な場面や言語を含む任意の字形テキストを対象とした包括的低照度データセットを提案する。特に,本手法は,この低照度データセットの最先端結果を達成し,標準の標準照度データセットに匹敵する性能を示す。コードとデータセットがリリースされる。

Localizing text in low-light environments is challenging due to visual degradations. Although a straightforward solution involves a two-stage pipeline with low-light image enhancement (LLE) as the initial step followed by detector, LLE is primarily designed for human vision instead of machine and can accumulate errors. In this work, we propose an efficient and effective single-stage approach for localizing text in dark that circumvents the need for LLE. We introduce a constrained learning module as an auxiliary mechanism during the training stage of the text detector. This module is designed to guide the text detector in preserving textual spatial features amidst feature map resizing, thus minimizing the loss of spatial information in texts under low-light visual degradations. Specifically, we incorporate spatial reconstruction and spatial semantic constraints within this module to ensure the text detector acquires essential positional and contextual range knowledge. Our approach enhances the original text detector's ability to identify text's local topological features using a dynamic snake feature pyramid network and adopts a bottom-up contour shaping strategy with a novel rectangular accumulation technique for accurate delineation of streamlined text features. In addition, we present a comprehensive low-light dataset for arbitrary-shaped text, encompassing diverse scenes and languages. Notably, our method achieves state-of-the-art results on this low-light dataset and exhibits comparable performance on standard normal light datasets. The code and dataset will be released.

翻訳日:2024-04-25 16:05:24 公開日:2024-04-24

# ユーティリティ・フェアネスのトレードオフとその見つけ方

Utility-Fairness Trade-Offs and How to Find Them ( http://arxiv.org/abs/2404.09454v2 )

ライセンス: Link先を確認

Sepehr Dehdashtian, Bashir Sadeghi, Vishnu Naresh Boddeti,

(参考訳) 人口統計学的公平性を考慮した分類システムを構築する場合、満たすべき目的が2つある。1) 特定のタスクに対する効用の最大化、2) 既知の人口統計属性に関する公平性の確保である。これらの目的はしばしば競合するため、両方を最適化すると効用と公平性のトレードオフが生じうる。既存の研究はトレードオフを認め、その限界を研究しているが、2つの疑問が未解決のままである。1) 効用と公平性の最適なトレードオフは何か。2) 所望の予測タスクと関心のある人口統計属性について、これらのトレードオフをデータからどのように数値的に定量化できるか。本論文はこれらの疑問に取り組む。データ空間トレードオフとラベル空間トレードオフという2つの効用・公平性トレードオフを導入する。これらのトレードオフは、効用・公平性平面内の3つの領域を明らかにし、完全に可能なもの、部分的に可能なもの、不可能なものを区別する。本稿では、与えられた予測タスクとグループ公平性の定義に対するトレードオフを、データサンプルから数値的に定量化する手法U-FaTEを提案する。このトレードオフに基づいて、表現を評価するための新しいスキームを導入する。公平表現学習手法と1000以上の事前学習済みモデルの表現を広範に評価した結果、現在のアプローチのほとんどは、複数のデータセットと予測タスクにわたって、推定された達成可能な公平性・効用トレードオフからかけ離れていることが明らかになった。

When building classification systems with demographic fairness considerations, there are two objectives to satisfy: 1) maximizing utility for the specific task and 2) ensuring fairness w.r.t. a known demographic attribute. These objectives often compete, so optimizing both can lead to a trade-off between utility and fairness. While existing works acknowledge the trade-offs and study their limits, two questions remain unanswered: 1) What are the optimal trade-offs between utility and fairness? and 2) How can we numerically quantify these trade-offs from data for a desired prediction task and demographic attribute of interest? This paper addresses these questions. We introduce two utility-fairness trade-offs: the Data-Space and Label-Space Trade-off. The trade-offs reveal three regions within the utility-fairness plane, delineating what is fully and partially possible and impossible. We propose U-FaTE, a method to numerically quantify the trade-offs for a given prediction task and group fairness definition from data samples. Based on the trade-offs, we introduce a new scheme for evaluating representations. An extensive evaluation of fair representation learning methods and representations from over 1000 pre-trained models revealed that most current approaches are far from the estimated and achievable fairness-utility trade-offs across multiple datasets and prediction tasks.
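
The two competing objectives named in the abstract above can be made concrete with a small sketch. It assumes utility is measured as accuracy and fairness as the demographic parity gap for a binary attribute. These are common choices but are not stated in the abstract, and this is not the U-FaTE method itself. The function names and the toy data are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels (the utility axis)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def demographic_parity_gap(y_pred, group):
    """|P(pred = 1 | group = 0) - P(pred = 1 | group = 1)| (the fairness axis).

    Assumes binary predictions (0/1) and a binary group attribute (0/1).
    """
    rates = []
    for g in (0, 1):
        preds = [p for p, s in zip(y_pred, group) if s == g]
        if not preds:
            raise ValueError(f"group {g} has no samples")
        rates.append(sum(preds) / len(preds))
    return abs(rates[0] - rates[1])


# Hypothetical toy data: labels, predictions, and a demographic attribute.
y_true = [1, 0, 1, 0]
y_pred = [1, 0, 1, 1]
group = [0, 0, 1, 1]
print(accuracy(y_true, y_pred), demographic_parity_gap(y_pred, group))
```

Each classifier then becomes one point in the utility-fairness plane, and a trade-off curve is the frontier of the best such points.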

翻訳日:2024-04-25 16:05:24 公開日:2024-04-24

# プログレッシブ・マルチモーダル・コンディショナル・プロンプトチューニング

Progressive Multi-modal Conditional Prompt Tuning ( http://arxiv.org/abs/2404.11864v2 )

ライセンス: Link先を確認

Xiaoyu Qiu, Hao Feng, Yuechen Wang, Wengang Zhou, Houqiang Li,

(参考訳) 事前学習された視覚言語モデル(VLM)は、VLMを知識ベースとして活用し、下流タスクに有用な情報を抽出するプロンプトを通じて、顕著な一般化能力を示す。しかし、既存の手法は主にユニモーダルプロンプトを採用しており、これはユニモーダル分岐のみを介し、視覚言語(V-L)の機能を同時に調整することができない。さらに、VLMエンコーディングにおけるワンパスフォワードパイプラインは、大きなギャップを持つV-L機能を調整するのに苦労している。これらの課題を克服し,Progressive Multi-modal Conditional Prompt Tuning (ProMPT)を提案する。 ProMPTは、画像と電流の符号化情報を反復的に利用することにより、V-L機能の最適化と整合化を繰り返す構造を利用する。初期化と多モード反復進化(MIE)モジュールを含む。初期化は、VLMを使用して画像とテキストを符号化し、続いて、画像に似たテキスト特徴を選択する特徴フィルタが続く。 MIEは、クラス条件の視覚プロンプト、インスタンス条件のテキストプロンプト、機能フィルタリングによるマルチモーダルプロンプトを容易にする。各MIEイテレーションでは、視覚生成器を介してフィルタリングされたテキスト特徴から視覚プロンプトが得られ、視覚プロンプト中に対象物にもっと焦点を合わせるように画像特徴が促進される。エンコードされたイメージ機能はテキストジェネレータに入力され、クラスシフトに対してより堅牢なテキストプロンプトを生成する。これにより、V-Lの機能は徐々に整列され、粗い状態から正確な予測へと進むことができる。 ProMPTの有効性を評価するために, 広範囲な実験を3つの環境で行った。その結果, ProMPTはすべての設定において, 従来の手法よりも優れ, より優れた一般化とロバスト性を示すことがわかった。コードはhttps://github.com/qiuxiaoyu9954/ProMPTで入手できる。

Pre-trained vision-language models (VLMs) have shown remarkable generalization capabilities via prompting, which leverages VLMs as knowledge bases to extract information beneficial for downstream tasks. However, existing methods primarily employ uni-modal prompting, which only engages a uni-modal branch, failing to simultaneously adjust vision-language (V-L) features. Additionally, the one-pass forward pipeline in VLM encoding struggles to align V-L features that have a huge gap. Confronting these challenges, we propose a novel method, Progressive Multi-modal conditional Prompt Tuning (ProMPT). ProMPT exploits a recurrent structure, optimizing and aligning V-L features by iteratively utilizing image and current encoding information. It comprises an initialization and a multi-modal iterative evolution (MIE) module. Initialization is responsible for encoding images and text using a VLM, followed by a feature filter that selects text features similar to image. MIE then facilitates multi-modal prompting through class-conditional vision prompting, instance-conditional text prompting, and feature filtering. In each MIE iteration, vision prompts are obtained from filtered text features via a vision generator, promoting image features to focus more on target object during vision prompting. The encoded image features are fed into a text generator to produce text prompts that are more robust to class shifts. Thus, V-L features are progressively aligned, enabling advance from coarse to exact prediction. Extensive experiments are conducted in three settings to evaluate the efficacy of ProMPT. The results indicate that ProMPT outperforms existing methods on average across all settings, demonstrating its superior generalization and robustness. Code is available at https://github.com/qiuxiaoyu9954/ProMPT.
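
The feature filter in the initialization step selects the text features most similar to the image feature. The sketch below assumes cosine similarity and a fixed top-k cutoff. The abstract specifies neither, so both are illustrative assumptions, and `filter_text_features` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def filter_text_features(image_feat, text_feats, k):
    """Return the k class names whose text features best match the image."""
    ranked = sorted(
        text_feats,
        key=lambda name: cosine(image_feat, text_feats[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical 2-D features standing in for VLM encoder outputs.
image_feat = [1.0, 0.0]
text_feats = {"cat": [1.0, 0.0], "dog": [1.0, 1.0], "car": [0.0, 1.0]}
print(filter_text_features(image_feat, text_feats, k=2))
```

In an iterative scheme like the one described, the filter would be re-applied after each round as the image features are refined.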

翻訳日:2024-04-25 16:05:24 公開日:2024-04-24

# OPTiML: 自己監督型医用画像表現のための最適輸送を用いた高密度セマンティック不変性

OPTiML: Dense Semantic Invariance Using Optimal Transport for Self-Supervised Medical Image Representation ( http://arxiv.org/abs/2404.11868v2 )

ライセンス: Link先を確認

Azad Singh, Vandan Gorade, Deepak Mishra,

(参考訳) 自己教師付き学習(SSL)は、アノテーションなしで学習できることから、医用画像解析の有望な技術として登場した。しかし、有望な可能性にもかかわらず、従来のSSLメソッドでは、セマンティックアライメントの達成や微妙な詳細の取得など、制限に直面している。これは、解剖学的構造や病理的詳細を正確に把握できない、最適下界表現につながる。これらの制約に対応するため,医用画像表現学習におけるSSLの全体的な効果を高めるために,最適なトランスポート(OT)を用いた新しいSSLフレームワークOPTiMLを導入する。中心となる考え方は、OTとクロスビューポイントセマンティクス・インフュージョン・モジュール(CV-SIM)を統合することである。 CV-SIMモジュールに加えて、OPTiMLはOTフレームワーク内での分散と共分散の規則化を強制し、臨床的に関係のある情報に焦点を絞ると同時に、より少ない情報的特徴を破棄する。提案するフレームワークは,様々な医用画像タスクに適用可能な意味豊かな表現を学習する能力を示す。その有効性を検証するために,胸部X線モダリティから利用可能な3つのデータセットについて実験を行った。実験の結果,OPTiMLはすべての評価課題において,最先端の手法よりも優れていることがわかった。

Self-supervised learning (SSL) has emerged as a promising technique for medical image analysis due to its ability to learn without annotations. However, despite the promising potential, conventional SSL methods encounter limitations, including challenges in achieving semantic alignment and capturing subtle details. This leads to suboptimal representations, which fail to accurately capture the underlying anatomical structures and pathological details. In response to these constraints, we introduce a novel SSL framework OPTiML, employing optimal transport (OT), to capture the dense semantic invariance and fine-grained details, thereby enhancing the overall effectiveness of SSL in medical image representation learning. The core idea is to integrate OT with a cross-viewpoint semantics infusion module (CV-SIM), which effectively captures complex, fine-grained details inherent in medical images across different viewpoints. In addition to the CV-SIM module, OPTiML imposes the variance and covariance regularizations within OT framework to force the model focus on clinically relevant information while discarding less informative features. Through these, the proposed framework demonstrates its capacity to learn semantically rich representations that can be applied to various medical imaging tasks. To validate its effectiveness, we conduct experimental studies on three publicly available datasets from chest X-ray modality. Our empirical results reveal OPTiML's superiority over state-of-the-art methods across all evaluated tasks.

翻訳日:2024-04-25 16:05:24 公開日:2024-04-24

# FlagVNE: ネットワークリソース割り当てのためのフレキシブルで汎用的な強化学習フレームワーク

FlagVNE: A Flexible and Generalizable Reinforcement Learning Framework for Network Resource Allocation ( http://arxiv.org/abs/2404.12633v2 )

ライセンス: Link先を確認

Tianfu Wang, Qilin Fan, Chao Wang, Long Yang, Leilei Ding, Nicholas Jing Yuan, Hui Xiong,

(参考訳) VNE(Virtual Network Embedding)は、仮想ネットワーク要求(VNR)を物理インフラにマッピングすることを目的とした、ネットワーク仮想化における重要なリソース割り当てタスクである。強化学習(RL)は近年,この問題に対する有望な解決策として浮上している。しかし、既存のRLベースのVNE法は、一方向のアクション設計と画一的なトレーニング戦略によって制限されており、探索性や一般化性が制限される。本稿では,VNEのための柔軟で汎用的なRLフレームワーク(FLexible And Generalizable RL framework)であるFlagVNEを提案する。具体的には,仮想ノードと物理ノードの同時選択を可能にする双方向アクションに基づくマルコフ決定過程モデルを設計し,解空間の探索の柔軟性を向上させる。広大かつ動的な行動空間に対処するために,適応的な行動確率分布を生成し,高い訓練効率を確保する階層型デコーダを設計する。さらに, 様々なVNRサイズに対する一般化問題を克服するために, 各VNRサイズに特化した方策の訓練を容易にする, カリキュラムスケジューリング戦略を備えたメタRLベースの訓練手法を提案する。最後に、広範な実験結果から、FlagVNEが複数の主要な指標にわたって有効であることが示されている。私たちのコードはGitHubで入手可能です(https://github.com/GeminiLight/flag-vne)。

Virtual network embedding (VNE) is an essential resource allocation task in network virtualization, aiming to map virtual network requests (VNRs) onto physical infrastructure. Reinforcement learning (RL) has recently emerged as a promising solution to this problem. However, existing RL-based VNE methods are limited by the unidirectional action design and one-size-fits-all training strategy, resulting in restricted searchability and generalizability. In this paper, we propose a FLexible And Generalizable RL framework for VNE, named FlagVNE. Specifically, we design a bidirectional action-based Markov decision process model that enables the joint selection of virtual and physical nodes, thus improving the exploration flexibility of solution space. To tackle the expansive and dynamic action space, we design a hierarchical decoder to generate adaptive action probability distributions and ensure high training efficiency. Furthermore, to overcome the generalization issue for varying VNR sizes, we propose a meta-RL-based training method with a curriculum scheduling strategy, facilitating specialized policy training for each VNR size. Finally, extensive experimental results show the effectiveness of FlagVNE across multiple key metrics. Our code is available at GitHub (https://github.com/GeminiLight/flag-vne).
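
In the bidirectional action design described above, each step picks a virtual node and a physical node jointly, so the action space is the set of (virtual, physical) pairs. The sketch below enumerates the feasible pairs under one simplifying assumption: a pair is feasible when the physical node's remaining capacity covers the virtual node's demand. The actual constraints and the learned policy in FlagVNE are not given in the abstract, and `feasible_actions` and the sample values are hypothetical.

```python
def feasible_actions(virtual_demand, physical_capacity):
    """List every (virtual node, physical node) pair that fits.

    Assumption: feasibility is a single-resource check, capacity >= demand.
    """
    return [
        (v, p)
        for v, demand in virtual_demand.items()
        for p, capacity in physical_capacity.items()
        if capacity >= demand
    ]


# Hypothetical resource demands and remaining capacities.
virtual_demand = {"v1": 2, "v2": 5}
physical_capacity = {"p1": 3, "p2": 6}
print(feasible_actions(virtual_demand, physical_capacity))
```

A unidirectional design would fix the virtual node order in advance and only choose the physical node, which corresponds to a single row of this pair set at each step.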

翻訳日:2024-04-25 16:05:24 公開日:2024-04-24

# 目は欺く:マルチモーダル大規模言語モデルの反実仮想推論能力のベンチマーク

Eyes Can Deceive: Benchmarking Counterfactual Reasoning Abilities of Multi-modal Large Language Models ( http://arxiv.org/abs/2404.12966v2 )

ライセンス: Link先を確認

Yian Li, Wentao Tian, Yang Jiao, Jingjing Chen, Yu-Gang Jiang,

(参考訳) 反実仮想推論は、人間の知性の重要な現れであり、確立された事実に基づいて仮定を立て、起こりうる結果を推定することを指す。既存のマルチモーダル大規模言語モデル(MLLM)は、印象的な認知・推論能力を示しており、それは様々なVisual Question Answering(VQA)ベンチマークで検証されてきた。それでは、既存のMLLMは反実仮想的な質問に直面したとき、どのように振る舞うのだろうか。この疑問に答えるために,我々はまず,MLLMの反実仮想推論能力を体系的に評価するための新しいベンチマークCounterFactual MultiModal reasoning benchmark(略称CFMM)を構築する。CFMMは6つの難しいタスクから構成され、それぞれが人手で慎重にラベル付けされた数百の反実仮想的な質問を含み、MLLMの反実仮想推論能力を多様な側面から評価する。興味深いことに、実験を通して、既存のMLLMは自分が見たものを信じる傾向があり、質問で提示された反実仮想的な前提を無視するため、不正確な応答に至ることがわかった。さらに,提案するCFMMを用いて、広く使われている多様なMLLMを評価する。CFMMでの性能といくつかのVQAベンチマークでの性能との間の大きなギャップは、既存のMLLMが人間レベルの知能に近づくには、まだかなりの改善の余地があることを示している。一方で、今後CFMMにおけるMLLMの性能を高めていくことで、高度な知能を持つMLLMの開発に向けた道筋を探ることができる。

Counterfactual reasoning, as a crucial manifestation of human intelligence, refers to making presuppositions based on established facts and extrapolating potential outcomes. Existing multimodal large language models (MLLMs) have exhibited impressive cognitive and reasoning capabilities, which have been examined across a wide range of Visual Question Answering (VQA) benchmarks. Nevertheless, how will existing MLLMs perform when faced with counterfactual questions? To answer this question, we first curate a novel CounterFactual MultiModal reasoning benchmark, abbreviated as CFMM, to systematically assess the counterfactual reasoning capabilities of MLLMs. Our CFMM comprises six challenging tasks, each including hundreds of carefully human-labeled counterfactual questions, to evaluate MLLMs' counterfactual reasoning capabilities across diverse aspects. Through experiments, interestingly, we find that existing MLLMs prefer to believe what they see, but ignore the counterfactual presuppositions presented in the question, thereby leading to inaccurate responses. Furthermore, we evaluate a wide range of prevalent MLLMs on our proposed CFMM. The significant gap between their performance on our CFMM and that on several VQA benchmarks indicates that there is still considerable room for improvement in existing MLLMs toward approaching human-level intelligence. On the other hand, by boosting MLLMs' performance on our CFMM in the future, potential avenues toward developing MLLMs with advanced intelligence can be explored.

翻訳日:2024-04-25 16:05:24 公開日:2024-04-24

# スコア変更を超えて:2つの観点からの非参照画像品質評価に対する敵対的攻撃

Beyond Score Changes: Adversarial Attack on No-Reference Image Quality Assessment from Two Perspectives ( http://arxiv.org/abs/2404.13277v2 )

ライセンス: Link先を確認

Chenxi Yang, Yujia Liu, Dingquan Li, Yan Zhong, Tingting Jiang,

(参考訳) ディープニューラルネットワークは、NR-IQA(No-Reference Image Quality Assessment)において驚くべき成功を収めている。しかし、最近の研究は、NR-IQAモデルが微妙な敵の摂動に対して脆弱であることを強調し、モデル予測と主観的評価の不整合をもたらす。しかし、現在の敵対的攻撃は、個々の画像の予測スコアの摂動に焦点を合わせ、画像集合全体におけるスコア間の相関関係の重要な側面を無視している。一方、ランキング相関と同様、NR-IQAタスクでは相関が重要な役割を担っていることに留意する必要がある。 NR-IQAモデルのロバスト性を包括的に探求するために,画像集合内の相関関係を乱し,個々の画像に変化をスコアする相関エラーベースの新たなフレームワークを導入する。我々の研究は主に、Spearman's Rank-Order correlation Coefficient (SROCC)やMean Squared Error (MSE)のような予測エラー関連メトリクスのようなランキング関連相関指標に焦点を当てている。そこで本研究では,SROCC-MSE-Attack (SMA) と呼ばれる2段階のSROCC-MSE-Attack (SMA) を提案する。実験の結果,SMA法はSROCCを負の値に大きく破壊するだけでなく,個々の画像のスコアにかなりの変化をもたらすことが明らかとなった。一方、さまざまなカテゴリのメトリクスにまたがって最先端のパフォーマンスを示す。提案手法はNR-IQAモデルのロバスト性に関する新しい視点を提供する。

Deep neural networks have demonstrated impressive success in No-Reference Image Quality Assessment (NR-IQA). However, recent research highlights the vulnerability of NR-IQA models to subtle adversarial perturbations, leading to inconsistencies between model predictions and subjective ratings. Current adversarial attacks, however, focus on perturbing the predicted scores of individual images, neglecting the crucial aspect of inter-score correlations within an entire image set. Meanwhile, it is important to note that correlation, such as ranking correlation, plays a significant role in NR-IQA tasks. To comprehensively explore the robustness of NR-IQA models, we introduce a new framework of correlation-error-based attacks that perturb both the correlation within an image set and the scores of individual images. Our research primarily focuses on ranking-related correlation metrics like Spearman's Rank-Order Correlation Coefficient (SROCC) and prediction error-related metrics like Mean Squared Error (MSE). As an instantiation, we propose a practical two-stage SROCC-MSE-Attack (SMA) that initially optimizes target attack scores for the entire image set and then generates adversarial examples guided by these scores. Experimental results demonstrate that our SMA method not only significantly disrupts the SROCC to negative values but also maintains a considerable change in the scores of individual images. Meanwhile, it exhibits state-of-the-art performance across metrics with different categories. Our method provides a new perspective on the robustness of NR-IQA models.
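
The two metrics that the attack targets can be computed in a few lines. The sketch below uses the textbook SROCC formula for data without ties, 1 - 6 Σd² / (n(n² - 1)), together with plain MSE. The score lists are hypothetical and only show that reversing a ranking drives SROCC to its minimum while MSE measures how far individual scores moved.

```python
def ranks(values):
    """1-based ranks of the values; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def srocc(x, y):
    """Spearman's rank-order correlation coefficient (no-ties formula)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def mse(x, y):
    """Mean squared error between two score lists of equal length."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


# Hypothetical quality scores before and after an attack that reverses
# the ranking of the whole image set.
clean_scores = [1.0, 2.0, 3.0, 4.0]
attacked_scores = [4.0, 3.0, 2.0, 1.0]
print(srocc(clean_scores, attacked_scores), mse(clean_scores, attacked_scores))
```

An attack that only shifts every score by the same amount would leave SROCC at 1 while raising MSE, which is why the abstract treats the two metrics as separate perspectives.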

翻訳日:2024-04-25 16:05:24 公開日:2024-04-24

# 超低損失集積フォトニクスは明るい狭帯域光子対源を可能にする

Ultralow-loss integrated photonics enables bright, narrow-band, photon-pair sources ( http://arxiv.org/abs/2404.13387v2 )

ライセンス: Link先を確認

Ruiyang Chen, Yi-Han Luo, Jinbao Long, Baoqi Shi, Chen Shen, Junqiu Liu,

(参考訳) 光子対光源は、フォトニック量子系の重要な構成要素である。Kerr非線形性と共振器増強された自然四光波混合を利用することで、フォトニック集積回路上に構築したマイクロ共振器を用いてチップスケールの光子対光源を作ることができる。実用化のためには、光子対光源の輝度を高め、線幅を狭めるために、マイクロ共振器の高い品質係数 $Q$ が不可欠である。前者は $Q^4$ に比例し、後者は $Q$ に反比例する。本稿では、集積型でマイクロ共振器ベースの狭帯域光子対光源を実証する。この集積マイクロ共振器は窒化ケイ素製で、標準的なCMOSファウンドリプロセスで製造され、$3$ dB/m までの超低損失と $10^7$ を超える固有 $Q$ 値を持つ。光子対光源の輝度は $1.17\times10^9$ Hz/mW$^2$/GHz、線幅は $25.9$ MHz であり、いずれもシリコンフォトニクスベースの量子光源として記録的な値である。さらに、伝令付き2次相関 $g^{(2)}_\mathrm{h}(0)=0.0037(5)$ の伝令付き単一光子源と、生の可視度 $0.973(9)$ の時間ビンもつれ光源も実現できる。我々の研究は、超低損失集積フォトニクスが新しい量子光源と回路を創出し、量子通信・量子ネットワークへの効率的でコンパクトかつ堅牢なインターフェースを促進する幅広い可能性を示している。

Photon-pair sources are critical building blocks for photonic quantum systems. Leveraging Kerr nonlinearity and cavity-enhanced spontaneous four-wave mixing, chip-scale photon-pair sources can be created using microresonators built on photonic integrated circuit. For practical applications, a high microresonator quality factor $Q$ is mandatory to magnify photon-pair sources' brightness and reduce their linewidth. The former is proportional to $Q^4$, while the latter is inversely proportional to $Q$. Here, we demonstrate an integrated, microresonator-based, narrow-band photon-pair source. The integrated microresonator, made of silicon nitride and fabricated using a standard CMOS foundry process, features ultralow loss down to $3$ dB/m and intrinsic $Q$ factor exceeding $10^7$. The photon-pair source has brightness of $1.17\times10^9$ Hz/mW$^2$/GHz and linewidth of $25.9$ MHz, both of which are record values for silicon-photonics-based quantum light source. It further enables a heralded single-photon source with heralded second-order correlation $g^{(2)}_\mathrm{h}(0)=0.0037(5)$, as well as a time-bin entanglement source with a raw visibility of $0.973(9)$. Our work evidences the global potential of ultralow-loss integrated photonics to create novel quantum light sources and circuits, catalyzing efficient, compact and robust interfaces to quantum communication and networks.
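
The scaling stated in the abstract (brightness proportional to $Q^4$, linewidth inversely proportional to $Q$) explains why a modest gain in quality factor pays off so strongly. The sketch below applies only those two proportionalities and assumes every other device parameter stays fixed. `scale_with_q` and the baseline numbers are hypothetical and are not the measured values from the paper.

```python
def scale_with_q(brightness, linewidth, q_ratio):
    """Rescale a source when Q is multiplied by q_ratio.

    Brightness scales as Q^4 and linewidth as 1/Q, as stated in the
    abstract; all other parameters are assumed unchanged.
    """
    if q_ratio <= 0:
        raise ValueError("q_ratio must be positive")
    return brightness * q_ratio ** 4, linewidth / q_ratio


# Hypothetical baseline in arbitrary units: doubling Q.
new_brightness, new_linewidth = scale_with_q(1.0, 10.0, 2.0)
print(new_brightness, new_linewidth)
```

Doubling $Q$ therefore raises brightness sixteenfold while halving the linewidth.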

翻訳日:2024-04-25 16:05:24 公開日:2024-04-24

# 高低周波分解によるブラケティング画像の復元と強調

Bracketing Image Restoration and Enhancement with High-Low Frequency Decomposition ( http://arxiv.org/abs/2404.13537v2 )

ライセンス: Link先を確認

Genggeng Chen, Kexin Dai, Kangzhen Yang, Tao Hu, Xiangyu Chen, Yongqing Yang, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan,

(参考訳) 現実のシナリオでは、一連の画像劣化のため、高品質で鮮明な写真を得るのは難しい。高品質な画像の合成には大きな進歩があったが、従来の画像復元・強調手法は、異なる劣化の特性を見落とすことが多かった。それらは様々な種類の劣化に同じ構造を適用しており、その結果、復元結果は理想的とは言えなかった。高周波/低周波情報がそれぞれ異なる劣化に適用できるという考えから着想を得て,高低周波分解に基づくブラケティング画像復元・強調手法HLNetを導入する。具体的には,特徴抽出に、重み共有モジュールと重み非共有モジュールという2つのモジュールを用いる。重み共有モジュールでは、SCConvを用いて、異なる劣化から共通の特徴を抽出する。重み非共有モジュールでは、高低周波分解ブロック(HLFDB)を導入し、高周波情報と低周波情報を異なる方法で処理することで、モデルが異なる劣化により効果的に対処できるようにする。他のネットワークと比較して、本手法は異なる劣化の特性を考慮しており、より高品質な画像復元を実現する。

In real-world scenarios, due to a series of image degradations, obtaining high-quality, clear content photos is challenging. While significant progress has been made in synthesizing high-quality images, previous methods for image restoration and enhancement often overlooked the characteristics of different degradations. They applied the same structure to address various types of degradation, resulting in less-than-ideal restoration outcomes. Inspired by the notion that high/low frequency information is applicable to different degradations, we introduce HLNet, a Bracketing Image Restoration and Enhancement method based on high-low frequency decomposition. Specifically, we employ two modules for feature extraction: shared weight modules and non-shared weight modules. In the shared weight modules, we use SCConv to extract common features from different degradations. In the non-shared weight modules, we introduce the High-Low Frequency Decomposition Block (HLFDB), which employs different methods to handle high-low frequency information, enabling the model to address different degradations more effectively. Compared to other networks, our method takes into account the characteristics of different degradations, thus achieving higher-quality image restoration.
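
The idea of separating high- and low-frequency information can be sketched with a toy example. This is not the HLFDB from the paper, whose internals the abstract does not describe; it is a minimal illustration that assumes a moving average as the low-pass filter, and the function name `split_frequencies` and the `radius` parameter are hypothetical.

```python
def split_frequencies(signal, radius=1):
    """Split a 1-D signal into a low-frequency part (moving average)
    and a high-frequency residual, so that low + high == signal.

    Assumption: a box filter stands in for whatever decomposition
    the actual method uses.
    """
    n = len(signal)
    low = []
    for i in range(n):
        # Clamp the averaging window at the signal borders.
        start, stop = max(0, i - radius), min(n, i + radius + 1)
        window = signal[start:stop]
        low.append(sum(window) / len(window))
    # The residual carries edges and fine detail.
    high = [s - l for s, l in zip(signal, low)]
    return low, high


# A flat row of pixel intensities with a single sharp spike.
row = [10.0, 10.0, 10.0, 50.0, 10.0, 10.0, 10.0]
low, high = split_frequencies(row)
```

In this toy case the spike ends up mostly in `high`, while `low` holds the smoothed background, which is the kind of separation that lets the two bands be processed differently.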

Translated: 2024-04-25 16:05:24 Published: 2024-04-24

# Bt-GAN: Generating Fair Synthetic Healthdata via Bias-transforming Generative Adversarial Networks

Bt-GAN: Generating Fair Synthetic Healthdata via Bias-transforming Generative Adversarial Networks ( http://arxiv.org/abs/2404.13634v2 )

License: see the linked page

Resmi Ramachandranpillai, Md Fahim Sikder, David Bergström, Fredrik Heintz,

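(Reference translation) Synthetic data generation offers a promising way to increase the usefulness of Electronic Healthcare Records (EHR) by producing realistic, de-identified data. However, the existing literature concentrates on the quality of synthetic health data and neglects the crucial aspect of fairness in downstream predictions. As a result, models trained on synthetic EHR have been criticized for producing biased outcomes in target tasks. These biases can arise from spurious correlations between features or from a model's failure to represent sub-groups accurately. To address these problems, we propose Bt-GAN (Bias-transforming Generative Adversarial Networks), a GAN-based synthetic data generator designed for the healthcare domain. To tackle spurious correlations (i), we propose an information-constrained data generation process that lets the generator learn a fair deterministic transformation based on a notion of algorithmic fairness. To overcome the challenge of capturing exact sub-group representations (ii), we incentivize the generator to preserve sub-group densities through score-based weighted sampling. This approach compels the generator to learn from underrepresented regions of the data manifold. We conducted extensive experiments using the MIMIC-III database. The results show that Bt-GAN achieves SOTA accuracy while improving fairness and minimizing bias amplification. We also performed an in-depth explainability analysis as further evidence supporting the validity of the study. This research thus proposes a novel, specialized approach to the limitations of synthetic data generation in the healthcare domain. By taking fairness into account and leveraging advanced techniques such as GANs, it paves the way for reliable and unbiased predictions in healthcare applications.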

Synthetic data generation offers a promising solution to enhance the usefulness of Electronic Healthcare Records (EHR) by generating realistic de-identified data. However, the existing literature primarily focuses on the quality of synthetic health data, neglecting the crucial aspect of fairness in downstream predictions. Consequently, models trained on synthetic EHR have faced criticism for producing biased outcomes in target tasks. These biases can arise from either spurious correlations between features or the failure of models to accurately represent sub-groups. To address these concerns, we present Bias-transforming Generative Adversarial Networks (Bt-GAN), a GAN-based synthetic data generator specifically designed for the healthcare domain. In order to tackle spurious correlations (i), we propose an information-constrained Data Generation Process that enables the generator to learn a fair deterministic transformation based on a well-defined notion of algorithmic fairness. To overcome the challenge of capturing exact sub-group representations (ii), we incentivize the generator to preserve sub-group densities through score-based weighted sampling. This approach compels the generator to learn from underrepresented regions of the data manifold. We conduct extensive experiments using the MIMIC-III database. Our results demonstrate that Bt-GAN achieves SOTA accuracy while significantly improving fairness and minimizing bias amplification. We also perform an in-depth explainability analysis to provide additional evidence supporting the validity of our study. In conclusion, our research introduces a novel and professional approach to addressing the limitations of synthetic data generation in the healthcare domain. By incorporating fairness considerations and leveraging advanced techniques such as GANs, we pave the way for more reliable and unbiased predictions in healthcare applications.

Translated: 2024-04-25 16:05:24 Published: 2024-04-24

# A Comparative Study on Enhancing Prediction in Social Network Advertisement through Data Augmentation

A Comparative Study on Enhancing Prediction in Social Network Advertisement through Data Augmentation ( http://arxiv.org/abs/2404.13812v2 )

License: see the linked page

Qikai Yang, Panfeng Li, Xinhe Xu, Zhicheng Ding, Wenjing Zhou, Yi Nian,

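(Reference translation) In social network advertising, the volume and accuracy of data play a critical role in the performance of predictive models. However, the development of robust predictive algorithms is often hampered by the limited size and potential bias of real-world datasets. This study presents and examines a generative augmentation framework for social network advertising data. It considers three generative models for data augmentation: Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Gaussian Mixture Models (GMMs). By synthetically extending the feature space, we find that data augmentation quantitatively improves the performance of various classifiers. We also compare the relative performance gains of each augmentation technique, offering insight for selecting an appropriate technique to improve model performance. The paper contributes to the literature by showing that synthetic data augmentation alleviates the limitations imposed by small or imbalanced datasets in social network advertising. It also provides a comparative view of the practicality of different augmentation methods, helping practitioners choose appropriate techniques to improve model performance.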

In the ever-evolving landscape of social network advertising, the volume and accuracy of data play a critical role in the performance of predictive models. However, the development of robust predictive algorithms is often hampered by the limited size and potential bias present in real-world datasets. This study presents and explores a generative augmentation framework of social network advertising data. Our framework explores three generative models for data augmentation - Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Gaussian Mixture Models (GMMs) - to enrich data availability and diversity in the context of social network advertising analytics effectiveness. By performing synthetic extensions of the feature space, we find that through data augmentation, the performance of various classifiers has been quantitatively improved. Furthermore, we compare the relative performance gains brought by each data augmentation technique, providing insights for practitioners to select appropriate techniques to enhance model performance. This paper contributes to the literature by showing that synthetic data augmentation alleviates the limitations imposed by small or imbalanced datasets in the field of social network advertising. At the same time, this article also provides a comparative perspective on the practicality of different data augmentation methods, thereby guiding practitioners to choose appropriate techniques to enhance model performance.

Translated: 2024-04-25 16:05:24 Published: 2024-04-24

# Regional Style and Color Transfer

Regional Style and Color Transfer ( http://arxiv.org/abs/2404.13880v2 )

License: see the linked page

Zhicheng Ding, Panfeng Li, Qikai Yang, Siyang Li, Qingtian Gong,

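(Reference translation) This paper presents a new contribution to the field of regional style transfer. Existing methods often apply style uniformly across the whole image, which causes stylistic inconsistencies or distorted foreground objects when the image contains foreground elements such as human figures. To address this limitation, we propose a new approach that uses a segmentation network to isolate foreground objects in the input image accurately. Style transfer is then applied only to the background region, and the isolated foreground objects are carefully reintegrated into the style-transferred background. To improve visual coherence between foreground and background, a color transfer step is applied to the foreground elements before reincorporation. Finally, feathering is used to blend foreground and background seamlessly, giving a visually unified and aesthetically pleasing final composition. The results show that the proposed approach yields more natural style transformations than conventional methods.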

This paper presents a novel contribution to the field of regional style transfer. Existing methods often suffer from the drawback of applying style homogeneously across the entire image, leading to stylistic inconsistencies or distorted foreground objects when applied to images with foreground elements such as human figures. To address this limitation, we propose a new approach that leverages a segmentation network to precisely isolate foreground objects within the input image. Subsequently, style transfer is applied exclusively to the background region. The isolated foreground objects are then carefully reintegrated into the style-transferred background. To enhance the visual coherence between foreground and background, a color transfer step is employed on the foreground elements prior to their reincorporation. Finally, we utilize feathering techniques to achieve a seamless amalgamation of foreground and background, resulting in a visually unified and aesthetically pleasing final composition. Extensive evaluations demonstrate that our proposed approach yields significantly more natural stylistic transformations compared to conventional methods.
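
The final feathering step can be sketched on a single row of pixels. This is a minimal illustration under stated assumptions, not the authors' implementation: the hard mask is assumed to come from some segmentation network, a box blur is assumed as the feathering kernel, and the names `feather_alpha` and `composite` are hypothetical.

```python
def feather_alpha(mask, radius):
    """Soften a hard 0/1 foreground mask with a box blur.

    Assumption: a box blur of the given radius stands in for the
    feathering kernel, which the abstract does not specify.
    """
    n = len(mask)
    alpha = []
    for i in range(n):
        # Clamp the blur window at the row borders.
        start, stop = max(0, i - radius), min(n, i + radius + 1)
        alpha.append(sum(mask[start:stop]) / (stop - start))
    return alpha


def composite(foreground, background, alpha):
    """Alpha-blend foreground over background, pixel by pixel."""
    return [a * f + (1.0 - a) * b
            for f, b, a in zip(foreground, background, alpha)]


mask = [0, 0, 1, 1, 1, 0, 0]   # hard segmentation mask (1 = foreground)
fg = [200.0] * 7               # color-transferred foreground intensities
bg = [50.0] * 7                # style-transferred background intensities
alpha = feather_alpha(mask, radius=1)
blended = composite(fg, bg, alpha)
```

Near the mask boundary the blended values ramp gradually from background to foreground instead of jumping, which is what hides the seam between the two regions.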

Translated: 2024-04-25 16:05:24 Published: 2024-04-24

# MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets

MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets ( http://arxiv.org/abs/2404.13923v2 )

License: see the linked page

Zeyu Li, Ruitong Gan, Chuanchen Luo, Yuxi Wang, Jiaheng Liu, Ziwei Zhu, Man Zhang, Qing Li, Xucheng Yin, Zhaoxiang Zhang, Junran Peng,

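(Reference translation) Driven by powerful image diffusion models, recent work has succeeded in automatically generating 3D objects from textual or visual guidance. By iteratively performing score distillation sampling (SDS) from various viewpoints, these methods lift the 2D generative prior into 3D space. However, such a 2D generative image prior bakes illumination effects and shadows into the texture. As a result, material maps optimized by SDS inevitably contain spuriously correlated components. Without a precise material definition, the generated assets cannot be relit reasonably in new scenes, which limits their use in downstream scenarios. Humans, by contrast, resolve this ambiguity effortlessly by inferring an object's material from its appearance and semantics. We therefore propose MaterialSeg3D, a 3D asset material generation framework that infers the underlying material from a 2D semantic prior. Based on such a prior model, we devise a mechanism for parsing material in 3D space. We maintain a UV stack in which each map is unprojected from a specific viewpoint. After traversing all viewpoints, we fuse the stack with a weighted voting scheme and apply region unification to ensure coherence of the object parts. To support learning of the semantic prior, we collected a material dataset named Materialized Individual Objects (MIO), featuring abundant images, diverse categories, and accurate annotations. Quantitative and qualitative experiments demonstrate the effectiveness of the method.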

Driven by powerful image diffusion models, recent research has achieved the automatic creation of 3D objects from textual or visual guidance. By performing score distillation sampling (SDS) iteratively across different views, these methods succeed in lifting the 2D generative prior to 3D space. However, such a 2D generative image prior bakes the effect of illumination and shadow into the texture. As a result, material maps optimized by SDS inevitably involve spuriously correlated components. The absence of a precise material definition makes it infeasible to relight the generated assets reasonably in novel scenes, which limits their application in downstream scenarios. In contrast, humans can effortlessly circumvent this ambiguity by deducing the material of the object from its appearance and semantics. Motivated by this insight, we propose MaterialSeg3D, a 3D asset material generation framework to infer the underlying material from the 2D semantic prior. Based on such a prior model, we devise a mechanism to parse material in 3D space. We maintain a UV stack, each map of which is unprojected from a specific viewpoint. After traversing all viewpoints, we fuse the stack through a weighted voting scheme and then employ region unification to ensure the coherence of the object parts. To fuel the learning of the semantic prior, we collect a material dataset, named Materialized Individual Objects (MIO), which features abundant images, diverse categories, and accurate annotations. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our method.
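
The fusion of a UV stack by weighted voting can be sketched as follows. The abstract does not say how view weights are chosen or how labels are encoded, so the per-view weights, the string labels, the use of `None` for texels not visible from a view, and the function name `fuse_uv_stack` are all assumptions made for illustration.

```python
from collections import defaultdict


def fuse_uv_stack(uv_stack, view_weights):
    """Fuse per-view material label maps into one map by weighted voting.

    uv_stack: list of flat label maps, one per viewpoint; None marks a
              texel that the viewpoint does not cover (assumption).
    view_weights: one hypothetical confidence weight per viewpoint.
    """
    n_texels = len(uv_stack[0])
    fused = []
    for t in range(n_texels):
        votes = defaultdict(float)
        for labels, weight in zip(uv_stack, view_weights):
            if labels[t] is not None:
                votes[labels[t]] += weight
        # Pick the label with the largest total weight, if any view saw it.
        fused.append(max(votes, key=votes.get) if votes else None)
    return fused


stack = [
    ["metal", "wood", None],
    ["metal", "plastic", None],
    ["wood", "plastic", "glass"],
]
weights = [1.0, 0.5, 2.0]
fused = fuse_uv_stack(stack, weights)
```

The region unification step mentioned in the abstract would run after this fusion and is not covered by the sketch.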

Translated: 2024-04-25 16:05:24 Published: 2024-04-24

# An Economic Solution to Copyright Challenges of Generative AI

An Economic Solution to Copyright Challenges of Generative AI ( http://arxiv.org/abs/2404.13964v3 )

License: see the linked page

Jiachen T. Wang, Zhun Deng, Hiroaki Chiba-Okabe, Boaz Barak, Weijie J. Su,

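(Reference translation) Generative artificial intelligence (AI) systems are trained on large data corpora to generate text, images, video, and other media. There is growing concern that such systems may infringe the copyright interests of those who contributed the training data. To address the copyright challenges of generative AI, we propose a framework that compensates copyright owners in proportion to their contribution to the creation of AI-generated content. The contribution metric is determined quantitatively by exploiting the probabilistic nature of modern generative AI models and applying techniques from cooperative game theory in economics. The framework enables a platform on which AI developers improve model performance through access to high-quality training data, while copyright owners receive fair compensation, which encourages continued provision of relevant data for generative model training. Experiments show that the framework identifies the most relevant data sources in artwork generation, ensuring a fair and interpretable distribution of revenue among copyright owners.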

Generative artificial intelligence (AI) systems are trained on large data corpora to generate new pieces of text, images, videos, and other media. There is growing concern that such systems may infringe on the copyright interests of training data contributors. To address the copyright challenges of generative AI, we propose a framework that compensates copyright owners proportionally to their contributions to the creation of AI-generated content. The metric for contributions is quantitatively determined by leveraging the probabilistic nature of modern generative AI models and using techniques from cooperative game theory in economics. This framework enables a platform where AI developers benefit from access to high-quality training data, thus improving model performance. Meanwhile, copyright owners receive fair compensation, driving the continued provision of relevant data for generative model training. Experiments demonstrate that our framework successfully identifies the most relevant data sources used in artwork generation, ensuring a fair and interpretable distribution of revenues among copyright owners.
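
One way to make contribution-proportional compensation concrete is the Shapley value, a standard tool from cooperative game theory. The abstract names only the field, so the choice of the Shapley value here is an assumption, as are the toy utility table and the function name `shapley_values`. The sketch enumerates all orderings exactly, which is only practical for a handful of contributors.

```python
from itertools import permutations


def shapley_values(players, utility):
    """Exact Shapley values by averaging marginal contributions
    over every ordering of the players.

    utility: function mapping a frozenset of players to a number.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal gain from adding p to the current coalition.
            totals[p] += utility(with_p) - utility(coalition)
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}


# Hypothetical utilities: how well a model trained on each subset of
# data owners reproduces a target output (numbers are made up).
TOY_UTILITY = {
    frozenset(): 0.0,
    frozenset({"A"}): 4.0,
    frozenset({"B"}): 2.0,
    frozenset({"A", "B"}): 10.0,
}
shares = shapley_values(["A", "B"], TOY_UTILITY.__getitem__)
```

In this toy game the shares sum to the utility of the full coalition, so a revenue pool could be split in the same proportions.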