Fugu-MT: Translations of arXiv Papers

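This site provides Japanese translations of arXiv papers that are 30 pages or shorter and licensed under Creative Commons (CC 0, CC BY, CC BY-SA). For papers whose full text is not under a CC license, or that are too long, only the metadata is translated (arXiv metadata is CC 0). The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.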

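The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).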

These are papers with a publication date of 20240802.

Title	Authors	Abstract	Publication date / translation date
# SHARP-Net: A Refined Pyramid Network for Deficiency Segmentation in Culverts and Sewer Pipes ( http://arxiv.org/abs/2408.08879v1 ) License: see link	Rasha Alshawi, Md Meftahul Ferdaus, Md Tamjidul Hoque, Kendall Niles, Ken Pathak, Steve Sloan, Mahdi Abdelguerfi,	This paper introduces Semantic Haar-Adaptive Refined Pyramid Network (SHARP-Net), a novel architecture for semantic segmentation. SHARP-Net integrates a bottom-up pathway featuring Inception-like blocks with varying filter sizes ($3\times3$ and $5\times5$), parallel max-pooling, and additional spatial detection layers. This design captures multi-scale features and fine structural details. Throughout the network, depth-wise separable convolutions are used to reduce complexity. The top-down pathway of SHARP-Net focuses on generating high-resolution features through upsampling and information fusion using $1\times1$ and $3\times3$ depth-wise separable convolutions. We evaluated our model on our newly developed, challenging Culvert-Sewer Defects dataset and on the benchmark DeepGlobe Land Cover dataset. Our experimental evaluation demonstrated the effectiveness of the base model (excluding Haar-like features) in handling irregular defect shapes, occlusions, and class imbalances. It outperformed state-of-the-art methods, including U-Net, CBAM U-Net, ASCU-Net, FPN, and SegFormer, achieving average improvements of 14.4% and 12.1% on the Culvert-Sewer Defects and DeepGlobe Land Cover datasets, respectively, with IoU scores of 77.2% and 70.6%. It also reduced training time. Furthermore, integrating carefully selected and fine-tuned Haar-like features improved the performance of deep learning models by at least 20%. SHARP-Net with Haar-like features achieved an IoU of 94.75%, a 22.74% improvement over the base model. When applied to other deep learning models, these features yielded a 35.0% improvement, demonstrating their versatility and effectiveness. SHARP-Net thus provides a powerful and efficient solution for accurate semantic segmentation in challenging real-world scenarios.	Translated: 2024-08-25 14:30:57 Published: 2024-08-02
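The SHARP-Net abstract says depth-wise separable convolutions are used to reduce complexity. The sketch below compares the weight count of a standard $k\times k$ convolution with that of its depth-wise separable counterpart. It is only an illustration: the abstract does not give SHARP-Net's layer shapes, so the channel counts here are hypothetical, and biases are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    # One k x k kernel per (input channel, output channel) pair; biases ignored.
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    # Depth-wise step: one k x k kernel per input channel.
    depthwise = k * k * c_in
    # Point-wise step: a 1 x 1 convolution mixing c_in channels into c_out.
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 64, 64  # hypothetical layer shape
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(std, sep, round(sep / std, 4))  # 36864 4672 0.1267
```

The ratio of the two counts equals $1/c_{out} + 1/k^2$, so the savings grow as the number of output channels increases.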
# ECG Unveiled: Analysis of Client Re-identification Risks in Real-World ECG Datasets ( http://arxiv.org/abs/2408.10228v1 ) License: see link	Ziyu Wang, Anil Kanduri, Seyed Amir Hossein Aqajari, Salar Jafarlou, Sanaz R. Mousavi, Pasi Liljeberg, Shaista Malik, Amir M. Rahmani,	While ECG data is crucial for diagnosing and monitoring heart conditions, it also contains unique biometric information that poses significant privacy risks. Existing ECG re-identification studies rely on exhaustive analysis of numerous deep learning features, which limits them to ad-hoc explainability for clinicians' decision-making. In this work, we examine the explainability of ECG re-identification risks using transparent machine learning models. We use SHapley Additive exPlanations (SHAP) analysis to identify and explain the key features contributing to re-identification risks. We conduct an empirical analysis of re-identification risks using ECG data from five diverse real-world datasets encompassing 223 participants. Using transparent machine learning models, we reveal how different ECG features vary in their contribution to re-identifying individuals, with accuracies of 0.76 for gender, 0.67 for age group, and 0.82 for participant ID re-identification. Our approach provides valuable insights for clinical experts and guides the development of effective privacy-preserving mechanisms. Further, our findings emphasize the need for robust privacy measures in real-world health applications and offer detailed, actionable insights for enhancing data anonymization techniques.	Translated: 2024-08-25 14:21:10 Published: 2024-08-02
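SHAP attributes a model's prediction to its input features using Shapley values. Below is a minimal sketch that computes exact Shapley values by brute force for a toy model with only a few features. It assumes that "absent" features are replaced by baseline values. The linear model, feature values, and baseline are hypothetical and unrelated to the paper's ECG features. Practical SHAP implementations use faster algorithms or approximations instead of enumerating every subset.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(f: Callable[[List[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values by enumerating all feature subsets (O(2^n))."""
    n = len(x)

    def value(subset) -> float:
        # Features in `subset` take their real values; the rest use the baseline.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return f(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


if __name__ == "__main__":
    weights = [0.5, -1.0, 2.0]  # hypothetical linear model

    def model(z: List[float]) -> float:
        return sum(w * v for w, v in zip(weights, z)) + 0.1

    print(shapley_values(model, [1.0, 2.0, 0.5], [0.0, 0.0, 0.0]))
    # approximately [0.5, -2.0, 1.0]
```

For a linear model, each value reduces to $w_i (x_i - b_i)$. The values always sum to $f(x) - f(\text{baseline})$.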
# AI Transparency in Academic Search Systems: An Initial Exploration ( http://arxiv.org/abs/2408.10229v1 ) License: see link	Yifan Liu, Peter Sullivan, Luanne Sinnamon,	As AI-enhanced academic search systems become increasingly popular among researchers, investigating their AI transparency is crucial to ensure trust in the search outcomes, as well as the reliability and integrity of scholarly work. This study employs a qualitative content analysis approach to examine the websites of a sample of 10 AI-enhanced academic search systems identified through university library guides. The assessed level of transparency varies across these systems: five provide detailed information about their mechanisms, three offer partial information, and two provide little to no information. These findings indicate that the academic community is recommending and using tools with opaque functionalities, raising concerns about research integrity, including issues of reproducibility and researcher responsibility.	Translated: 2024-08-25 14:21:10 Published: 2024-08-02
# A General-Purpose Device for Interaction with LLMs ( http://arxiv.org/abs/2408.10230v1 ) License: see link	Jiajun Xu, Qun Wang, Yuhang Cao, Baitao Zeng, Sicheng Liu,	This paper investigates integrating large language models (LLMs) with advanced hardware, focusing on developing a general-purpose device designed for enhanced interaction with LLMs. We first analyze the current landscape, in which virtual assistants and LLMs are reshaping human-technology interactions, highlighting pivotal advancements and setting the stage for a new era of intelligent hardware. Despite substantial progress in LLM technology, a significant gap exists in hardware development, particularly concerning scalability, efficiency, affordability, and multimodal capabilities. This disparity presents both challenges and opportunities, underscoring the need for hardware that is not only powerful but also versatile and capable of managing the sophisticated demands of modern computation. Our proposed device addresses these needs by emphasizing scalability, multimodal data processing, enhanced user interaction, and privacy considerations, offering a comprehensive platform for LLM integration in various applications.	Translated: 2024-08-25 14:21:10 Published: 2024-08-02
# Responsible AI Question Bank: A Comprehensive Tool for AI Risk Assessment ( http://arxiv.org/abs/2408.11820v1 ) License: see link	Sung Une Lee, Harsha Perera, Yue Liu, Boming Xia, Qinghua Lu, Liming Zhu,	The rapid growth of Artificial Intelligence (AI) has underscored the urgent need for responsible AI practices. Despite increasing interest, a comprehensive AI risk assessment toolkit is still lacking. This study introduces our Responsible AI (RAI) Question Bank, a comprehensive framework and tool designed to support diverse AI initiatives. By integrating AI ethics principles such as fairness, transparency, and accountability into a structured question format, the RAI Question Bank aids in identifying potential risks, aligning with emerging regulations like the EU AI Act, and enhancing overall AI governance. A key benefit of the RAI Question Bank is its systematic approach to linking lower-level risk questions to higher-level ones and related themes, preventing siloed assessments and ensuring a cohesive evaluation process. Case studies illustrate the practical application of the RAI Question Bank in assessing AI projects, from evaluating risk factors to informing decision-making processes. The study also demonstrates how the RAI Question Bank can be used to ensure compliance with standards, mitigate risks, and promote the development of trustworthy AI systems. This work advances RAI by providing organizations with a valuable tool to navigate the complexities of ethical AI development and deployment while ensuring comprehensive risk management.	Translated: 2024-08-25 14:11:11 Published: 2024-08-02
# DECO: Liberating Web Data Using Decentralized Oracles for TLS ( http://arxiv.org/abs/1909.00938v6 ) License: see link	Fan Zhang, Sai Krishna Deepak Maram, Harjasleen Malvai, Steven Goldfeder, Ari Juels,	Thanks to the widespread deployment of TLS, users can access private data over channels with end-to-end confidentiality and integrity. What they cannot do, however, is prove to third parties the provenance of such data, i.e., that it genuinely came from a particular website. Existing approaches either introduce undesirable trust assumptions or require server-side modifications. As a result, the value of users' private data is locked up at its point of origin: users cannot export their data with preserved integrity to other applications without help and permission from the current data holder. We propose DECO (short for decentralized oracle) to address these problems. DECO allows users to prove that a piece of data accessed via TLS came from a particular website and, optionally, to prove statements about such data in zero knowledge, keeping the data itself secret. DECO is the first such system that works without trusted hardware or server-side modifications. DECO can liberate data from centralized web-service silos, making it accessible to a rich spectrum of applications. To demonstrate the power of DECO, we implement three applications that are hard to achieve without it: a private financial instrument using smart contracts, conversion of legacy credentials to anonymous credentials, and verifiable claims against price discrimination.	Translated: 2024-08-19 05:35:40 Published: 2024-08-02
# Evaluating the Impact of Advanced LLM Techniques on AI-Lecture Tutors for a Robotics Course ( http://arxiv.org/abs/2408.04645v1 ) License: see link	Sebastian Kahl, Felix Löffler, Martin Maciol, Fabian Ridder, Marius Schmitz, Jennifer Spanagel, Jens Wienkamp, Christopher Burgahn, Malte Schilling,	This study evaluates the performance of Large Language Models (LLMs) as an Artificial Intelligence-based tutor for a university course. In particular, several advanced techniques are used, such as prompt engineering, Retrieval-Augmented Generation (RAG), and fine-tuning. We assessed the different models and techniques using common similarity metrics such as BLEU-4, ROUGE, and BERTScore, complemented by a small human evaluation of helpfulness and trustworthiness. Our findings indicate that RAG combined with prompt engineering significantly enhances model responses and produces more accurate factual answers. In an educational context, RAG appears to be an ideal technique because it enriches the model's input with additional information and material that is usually already available for a university course. Fine-tuning, on the other hand, can produce quite small yet strong expert models, but poses the danger of overfitting. Our study further asks how the performance of LLMs should be measured and how well current metrics represent correctness or relevance. We find high correlation among the similarity metrics and a bias of most of these metrics toward shorter responses. Overall, our research points to both the potential and the challenges of integrating LLMs in educational settings, suggesting a need for balanced training approaches and advanced evaluation frameworks.	Translated: 2024-08-19 04:27:34 Published: 2024-08-02
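The tutor study scores answers with similarity metrics such as ROUGE. To show what these metrics actually count, here is a minimal ROUGE-1 sketch: clipped unigram overlap with lowercasing and whitespace tokenization. This is a simplification, not the reference implementation used in the paper, which may apply stemming or other preprocessing. The example sentences are hypothetical.

```python
from collections import Counter
from typing import Dict


def rouge1(candidate: str, reference: str) -> Dict[str, float]:
    """ROUGE-1 precision, recall and F1 from clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count, so each candidate token
    # is credited at most as often as it appears in the reference.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    reference = "the robot uses a pid controller"
    print(rouge1("pid controller", reference))
    # precision 1.0, recall about 0.333, f1 about 0.5
```

A short answer that contains only matching words gets perfect precision. Whether precision, recall, or F1 is reported therefore changes how response length is rewarded.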
# SumRecom: A Personalized Summarization Approach by Learning from Users' Feedback ( http://arxiv.org/abs/2408.07294v1 ) License: see link	Samira Ghodratnama, Mehrdad Zakershahrak,	Existing multi-document summarization approaches produce a uniform summary for all users without considering individuals' interests, which is highly impractical. Making a user-specific summary is a challenging task, as it requires: i) acquiring relevant information about a user; ii) aggregating and integrating that information into a user model; and iii) using the provided information to produce the personalized summary. In this paper, we therefore propose a solution to a substantial and challenging problem in summarization: recommending a summary for a specific user. The proposed approach, called SumRecom, brings the human into the loop and focuses on three aspects: personalization, interaction, and learning the user's interests without the need for reference summaries. SumRecom has two steps: i) a user preference extractor that captures users' inclination in choosing essential concepts, and ii) a summarizer that discovers the user's best-fitting summary based on the given feedback. Various automatic and human evaluations on the benchmark dataset demonstrate the superiority of SumRecom in generating user-specific summaries. Keywords: document summarization, interactive summarization, personalized summarization, reinforcement learning.	Translated: 2024-08-19 03:35:49 Published: 2024-08-02
# Regularized Contrastive Partial Multi-view Outlier Detection ( http://arxiv.org/abs/2408.07819v1 ) License: see link	Yijia Wang, Qianqian Xu, Yangbangyan Jiang, Siran Dai, Qingming Huang,	In recent years, multi-view outlier detection (MVOD) methods, which aim to identify outliers within multi-view datasets, have advanced significantly. A key point is to better detect class outliers and class-attribute outliers, which exist only in multi-view data. However, existing methods either cannot reduce the impact of outliers when learning view-consistent information or struggle in cases with varying neighborhood structures. Moreover, most of them do not apply to partial multi-view data in real-world scenarios. To overcome these drawbacks, we propose a novel method named Regularized Contrastive Partial Multi-view Outlier Detection (RCPMOD). In this framework, we use contrastive learning to learn view-consistent information and distinguish outliers by their degree of consistency. Specifically, we propose (1) an outlier-aware contrastive loss with a potential-outlier memory bank, motivated by a theoretical analysis, to eliminate the bias introduced by outliers; (2) a neighbor alignment contrastive loss to capture the view-shared local structural correlation; and (3) a spreading regularization loss to prevent the model from overfitting to outliers. With the Cross-view Relation Transfer technique, we can easily impute the missing-view samples based on the features of their neighbors. Experimental results on four benchmark datasets demonstrate that our proposed approach outperforms state-of-the-art competitors under different settings.	Translated: 2024-08-19 03:35:49 Published: 2024-08-02
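RCPMOD learns view-consistent information with contrastive learning and flags samples whose views disagree. The sketch below covers only two generic ingredients:

- a standard InfoNCE-style cross-view loss;
- a simple inconsistency score, defined as 1 minus the cosine similarity between a sample's two view embeddings.

It does not implement the paper's outlier-aware loss, memory bank, neighbor alignment, or spreading regularization. The embeddings and temperature are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    # Assumes both vectors are non-zero.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(view_a: Sequence[Vector], view_b: Sequence[Vector],
             temperature: float = 0.5) -> float:
    """Cross-view InfoNCE: sample i in view A should match sample i in view B."""
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]  # -log softmax of the positive pair
    return total / n


def inconsistency_scores(view_a: Sequence[Vector],
                         view_b: Sequence[Vector]) -> List[float]:
    # Higher score means the two views of a sample disagree more (outlier candidate).
    return [1.0 - cosine(a, b) for a, b in zip(view_a, view_b)]


if __name__ == "__main__":
    view_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    view_b = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # sample 2 is inconsistent
    print(inconsistency_scores(view_a, view_b))
    print(info_nce(view_a, view_b))
```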
# Telecom Foundation Models: Applications, Challenges, and Future Trends ( http://arxiv.org/abs/2408.03964v1 ) License: see link	Tahar Zanouda, Meysam Masoudi, Fitsum Gaim Gebre, Mischa Dohler,	Telecom networks are becoming increasingly complex, with diversified deployment scenarios, multiple standards, and multi-vendor support. The intricate nature of the telecom network ecosystem makes it challenging to manage, operate, and optimize networks effectively. To address these hurdles, Artificial Intelligence (AI) has been widely adopted to solve different tasks in telecom networks. However, these conventional AI models are often designed for specific tasks and rely on extensive, costly-to-collect labeled data that requires specialized telecom expertise for development and maintenance. Such models usually fail to generalize and to support diverse deployment scenarios and applications. In contrast, Foundation Models (FMs) show effective generalization across language, vision, and decision-making tasks. FMs can be trained on multiple data modalities generated from the telecom ecosystem and can leverage specialized domain knowledge. Moreover, FMs can be fine-tuned to solve numerous specialized tasks with minimal task-specific labeled data and, in some instances, can leverage context to solve previously unseen problems. At the dawn of 6G, this paper investigates the potential opportunities of using FMs to shape the future of telecom technologies and standards. In particular, the paper outlines a conceptual process for developing Telecom FMs (TFMs) and discusses emerging opportunities for orchestrating specialized TFMs for network configuration, operation, and maintenance. Finally, the paper discusses the limitations and challenges of developing and deploying TFMs.	Translated: 2024-08-09 17:39:48 Published: 2024-08-02
# Mapping the Provenance Ontology to Basic Formal Ontology ( http://arxiv.org/abs/2408.03866v1 ) License: see link	Tim Prudhomme, Giacomo De Colle, Austin Liebers, Alec Sculley, Peihong Xie, Sydney Cohen, John Beverley,	The Provenance Ontology (PROV-O) is a World Wide Web Consortium (W3C) recommended ontology used to structure data about provenance across a wide variety of domains. Basic Formal Ontology (BFO) is a top-level ontology and ISO/IEC standard used to structure a wide variety of ontologies, such as the OBO Foundry ontologies and the Common Core Ontologies (CCO). To enhance interoperability between these two ontologies, their extensions, and the data organized by them, an alignment is presented according to specific mapping criteria and a methodology that prioritizes structural and semantic considerations. The ontology alignment is evaluated by checking its logical consistency with canonical examples of PROV-O instances and by querying, with the mapping criteria formalized in SPARQL, for terms that do not satisfy them. A variety of semantic web technologies are used in support of the FAIR (Findable, Accessible, Interoperable, Reusable) principles.	Translated: 2024-08-08 12:25:11 Published: 2024-08-02
# Pre-trained Gaussian Processes for Bayesian Optimization ( http://arxiv.org/abs/2109.08215v6 ) License: see link	Zi Wang, George E. Dahl, Kevin Swersky, Chansoo Lee, Zachary Nado, Justin Gilmer, Jasper Snoek, Zoubin Ghahramani,	Bayesian optimization (BO) has become a popular strategy for global optimization of expensive real-world functions. Contrary to a common expectation that BO is suited to optimizing black-box functions, it actually requires domain knowledge about those functions to deploy BO successfully. Such domain knowledge often manifests in Gaussian process (GP) priors that specify initial beliefs about functions. However, even with expert knowledge, it is non-trivial to quantitatively define a prior. This is especially true for hyperparameter tuning problems on complex machine learning models, where the landscapes of tuning objectives are often difficult to comprehend. We seek an alternative practice for setting these functional priors. In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori. We detail what pre-training entails for GPs using a KL-divergence-based loss function, and propose a new pre-training-based BO framework named HyperBO. Theoretically, we show bounded posterior predictions and near-zero regrets for HyperBO without assuming that the "ground truth" GP prior is known. To verify our approach in realistic setups, we collect a large multi-task hyperparameter tuning dataset by training tens of thousands of configurations of near-state-of-the-art deep learning models on popular image and text datasets, as well as a protein sequence dataset. Our results show that, on average, HyperBO is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods on both our new tuning dataset and existing multi-task BO benchmarks.	Translated: 2024-08-07 20:01:27 Published: 2024-08-02
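HyperBO pre-trains the GP prior with a KL-divergence-based loss. The sketch below illustrates only the underlying building block, not the HyperBO objective itself. It computes the closed-form KL divergence between two Gaussians with diagonal covariance. The actual setting compares multivariate Gaussians with full GP covariance matrices, which this simplified version does not handle.

```python
import math
from typing import Sequence


def kl_diag_gaussians(mu0: Sequence[float], var0: Sequence[float],
                      mu1: Sequence[float], var1: Sequence[float]) -> float:
    """KL(N(mu0, diag(var0)) || N(mu1, diag(var1))); all variances must be > 0.

    Per dimension: 0.5 * (var0/var1 + (mu1 - mu0)^2 / var1 - 1 + ln(var1/var0)).
    """
    total = 0.0
    for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1):
        total += v0 / v1 + (m1 - m0) ** 2 / v1 - 1.0 + math.log(v1 / v0)
    return 0.5 * total


if __name__ == "__main__":
    print(kl_diag_gaussians([0.0], [1.0], [1.0], [1.0]))  # 0.5
```

KL divergence is asymmetric. Swapping the two distributions generally changes the value, so the direction used in a loss matters.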
# Dual-Task Vision Transformer for Rapid and Accurate Intracerebral Hemorrhage CT Image Classification ( http://arxiv.org/abs/2405.06814v3 ) License: see link	Jialiang Fan, Xinhui Fan, Chengyan Song, Xiaofan Wang, Bingdong Feng, Lucan Li, Guoyu Lu,	Intracerebral hemorrhage (ICH) is a severe and sudden medical condition caused by the rupture of blood vessels in the brain, leading to permanent damage to brain tissue and often resulting in functional disabilities or death. Diagnosis and analysis of ICH typically rely on brain CT imaging. Given the urgency of ICH, early treatment is crucial, necessitating rapid analysis of CT images to formulate tailored treatment plans. However, the complexity of ICH CT images and the frequent scarcity of specialist radiologists pose significant challenges. We therefore collect a real-world dataset for two tasks: ICH versus normal classification, and classification of three types of ICH by hemorrhage location (Deep, Subcortical, and Lobar). In addition, we propose a neural network structure, the dual-task vision transformer (DTViT), for the automated classification and diagnosis of ICH images. DTViT uses the encoder of the Vision Transformer (ViT), employing attention mechanisms to extract features from CT images. The proposed DTViT framework also incorporates two multilayer perceptron (MLP)-based decoders to simultaneously identify the presence of ICH and classify the three types of hemorrhage locations. Experimental results demonstrate that DTViT performs well on the real-world test dataset. The code and newly collected dataset for this work are available at: https://github.com/jfan1997/DTViT.	Translated: 2024-08-07 18:52:52 Published: 2024-08-02
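DTViT shares one ViT encoder between two MLP decoders, one for ICH presence and one for hemorrhage location. The sketch below shows one way such a dual-task objective could be combined: the sum of two softmax cross-entropy terms. Two details are assumptions, not stated in the abstract:

- the `location_weight` parameter;
- skipping the location term for normal scans (passing `None` as the label).

```python
import math
from typing import Optional, Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one example, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def dual_task_loss(presence_logits: Sequence[float], presence_label: int,
                   location_logits: Sequence[float],
                   location_label: Optional[int],
                   location_weight: float = 1.0) -> float:
    # Task 1: ICH vs. normal (2 classes).
    loss = cross_entropy(presence_logits, presence_label)
    # Task 2: Deep / Subcortical / Lobar (3 classes). Assumption: scored only
    # when a location label exists, i.e. for scans that contain ICH.
    if location_label is not None:
        loss += location_weight * cross_entropy(location_logits, location_label)
    return loss
```

With uniform logits, a normal scan contributes $\ln 2$, and an ICH scan contributes $\ln 2 + \ln 3$.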
# Artificial Neural Networks for Photonic Applications: From Algorithms to Implementation ( http://arxiv.org/abs/2408.02685v1 ) License: see link	Pedro Freire, Egor Manuylovich, Jaroslaw E. Prilepsky, Sergei K. Turitsyn,	This tutorial-review on applications of artificial neural networks in photonics targets a broad audience, ranging from the optical research and engineering communities to computer science and applied mathematics. We focus on the research areas at the interface between these disciplines, attempting to strike the right balance between technical details specific to each domain and overall clarity. First, we briefly recall key properties and peculiarities of some core neural network types that we believe are the most relevant to photonics, linking each layer's theoretical design to photonic hardware realizations. We then address how to fine-tune the selected model's design to perform the required task with optimized accuracy. Next, in the review part, we discuss recent developments and progress for several selected applications of neural networks in photonics, including multiple aspects relevant to optical communications, imaging, sensing, and the design of new materials and lasers. In the following section, we place special emphasis on how to accurately evaluate the complexity of neural networks in the context of the transition from algorithms to hardware implementation. The introduced complexity characteristics are used to analyze applications of neural networks in optical communications, as a specific but highly important example, and to compare them with benchmark signal processing methods. We combine a description of well-known model compression strategies used in machine learning with novel techniques recently introduced in optical applications of neural networks. Although this tutorial-review focuses on photonics, we believe that the methods and techniques presented here can be useful in a much wider range of scientific and engineering applications.	Translated: 2024-08-07 16:17:55 Published: 2024-08-02
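The photonics review emphasizes evaluating neural-network complexity before moving to hardware. One simple, common measure is the number of real multiplications per inference. The sketch below counts them for a fully connected network, ignoring biases and activation functions. It is an illustrative assumption, not the specific complexity metrics defined in the paper, which also cover other layer types. The layer sizes and the per-symbol amortization are hypothetical.

```python
from typing import Sequence


def dense_multiplications(layer_sizes: Sequence[int]) -> int:
    """Real multiplications for one forward pass of a fully connected network.

    layer_sizes = [inputs, hidden_1, ..., outputs]; each dense layer mapping
    n_in -> n_out needs n_in * n_out multiplications (biases/activations ignored).
    """
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def multiplications_per_symbol(layer_sizes: Sequence[int], symbols_per_pass: int) -> float:
    # If one forward pass produces several output symbols, amortize the cost.
    return dense_multiplications(layer_sizes) / symbols_per_pass


if __name__ == "__main__":
    print(dense_multiplications([10, 32, 32, 2]))  # 1408
```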
# A Systematic Review of Intermediate Fusion in Multimodal Deep Learning for Biomedical Applications ( http://arxiv.org/abs/2408.02686v1 ) License: see link	Valerio Guarrasi, Fatih Aksu, Camillo Maria Caruso, Francesco Di Feola, Aurora Rofena, Filippo Ruffini, Paolo Soda,	Deep learning has revolutionized biomedical research by providing sophisticated methods to handle complex, high-dimensional data. Multimodal deep learning (MDL) further enhances this capability by integrating diverse data types such as imaging, textual data, and genetic information, leading to more robust and accurate predictive models. In MDL, unlike early and late fusion methods, intermediate fusion stands out for its ability to effectively combine modality-specific features during the learning process. This systematic review aims to comprehensively analyze and formalize current intermediate fusion methods in biomedical applications. We investigate the techniques employed, the challenges faced, and potential future directions for advancing intermediate fusion methods. Additionally, we introduce a structured notation to enhance the understanding and application of these methods beyond the biomedical domain. Our findings are intended to support researchers, healthcare professionals, and the broader deep learning community in developing more sophisticated and insightful multimodal models. 
Through this review, we aim to provide a foundational framework for future research and practical applications in the dynamic field of MDL.	Translated: 2024-08-07 16:17:55 Published: 2024-08-02
# Compositional Physical Reasoning of Objects and Events from Videos ( http://arxiv.org/abs/2408.02687v1 ) License: see link	Zhenfang Chen, Shilong Dong, Kexin Yi, Yunzhu Li, Mingyu Ding, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan,	Understanding and reasoning about objects' physical properties in the natural world is a fundamental challenge in artificial intelligence. While some properties like colors and shapes can be directly observed, others, such as mass and electric charge, are hidden from the objects' visual appearance. This paper addresses the unique challenge of inferring these hidden physical properties from objects' motion and interactions and predicting corresponding dynamics based on the inferred physical properties. We first introduce the Compositional Physical Reasoning (ComPhy) dataset. For a given set of objects, ComPhy includes limited videos of them moving and interacting under different initial conditions. The model is evaluated based on its capability to unravel the compositional hidden properties, such as mass and charge, and use this knowledge to answer a set of questions. Besides the synthetic videos from simulators, we also collect a real-world dataset to further test the physical reasoning abilities of different models. 
We evaluate state-of-the-art video reasoning models on ComPhy and reveal their limited ability to capture these hidden properties, which leads to inferior performance. We also propose a novel neuro-symbolic framework, Physical Concept Reasoner (PCR), that learns and reasons about both visible and hidden physical properties from question answering. After training, PCR demonstrates remarkable capabilities. It can detect and associate objects across frames, ground visible and hidden physical properties, make future and counterfactual predictions, and utilize these extracted representations to answer challenging questions.	Translated: 2024-08-07 16:17:55 Published: 2024-08-02
# A probabilistic framework for learning non-intrusive corrections to long-time climate simulations from short-time training data ( http://arxiv.org/abs/2408.02688v1 ) License: see link	Benedikt Barthel Sorensen, Leonardo Zepeda-Núñez, Ignacio Lopez-Gomez, Zhong Yi Wan, Rob Carver, Fei Sha, Themistoklis Sapsis,	Chaotic systems, such as turbulent flows, are ubiquitous in science and engineering. However, their study remains a challenge due to the large range of scales and the strong interaction with other, often not fully understood, physics. As a consequence, the spatiotemporal resolution required for accurate simulation of these systems is typically computationally infeasible, particularly for applications of long-term risk assessment, such as the quantification of extreme weather risk due to climate change. While data-driven modeling offers some promise of alleviating these obstacles, the scarcity of high-quality simulations results in limited available data to train such models, which is often compounded by the lack of stability for long-horizon simulations. As such, the computational, algorithmic, and data restrictions generally imply that the probability of rare extreme events is not accurately captured. 
In this work we present a general strategy for training neural network models to non-intrusively correct under-resolved long-time simulations of chaotic systems. The approach is based on training a post-processing correction operator on under-resolved simulations nudged towards a high-fidelity reference. This enables us to learn the dynamics of the underlying system directly, which allows us to use very little training data, even when the statistics thereof are far from converged. Additionally, through the use of probabilistic network architectures we are able to leverage the uncertainty due to the limited training data to further improve extrapolation capabilities. We apply our framework to severely under-resolved simulations of quasi-geostrophic flow and demonstrate its ability to accurately predict the anisotropic statistics over time horizons more than 30 times longer than the data seen in training.	Translated: 2024-08-07 16:17:55 Published: 2024-08-02
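Nudging, as mentioned in this abstract, adds a relaxation term that pulls a coarse model toward a reference trajectory. The minimal scalar sketch below assumes a toy linear ODE, forward-Euler stepping and a hand-picked relaxation time `tau`. None of these come from the paper, which works with quasi-geostrophic flow.

```python
def simulate(f, u0, dt, steps, ref=None, tau=None):
    """Forward-Euler integration of du/dt = f(u).

    If a reference trajectory `ref` is given, the nudging term
    (ref[k] - u) / tau is added, relaxing the state toward the reference
    on the time scale tau.
    """
    u = u0
    traj = [u]
    for k in range(steps):
        du = f(u)
        if ref is not None:
            du += (ref[k] - u) / tau
        u = u + dt * du
        traj.append(u)
    return traj


def f_true(u):
    return -1.0 * u  # toy stand-in for the high-fidelity dynamics


def f_coarse(u):
    return -0.5 * u  # toy stand-in for a biased, under-resolved model


dt, steps = 0.01, 500
reference = simulate(f_true, 1.0, dt, steps)
free = simulate(f_coarse, 1.0, dt, steps)
nudged = simulate(f_coarse, 1.0, dt, steps, ref=reference, tau=0.05)

print(abs(free[-1] - reference[-1]))    # free-running coarse model drifts
print(abs(nudged[-1] - reference[-1]))  # nudged run stays close to the reference
```

In the approach described above, nudged coarse trajectories are the input on which the correction operator is trained. This toy only shows why nudged runs stay close to the reference while free-running ones drift.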
# Spatio-Temporal Partial Sensing Forecast for Long-term Traffic ( http://arxiv.org/abs/2408.02689v1 ) License: see link	Zibo Liu, Zhe Jiang, Zelin Xu, Tingsong Xiao, Zhengkun Xiao, Haibo Wang, Shigang Chen,	Traffic forecasting uses recent measurements by sensors installed at chosen locations to forecast the future road traffic. Existing work either assumes all locations are equipped with sensors or focuses on short-term forecast. This paper studies partial sensing traffic forecast of long-term traffic, assuming sensors only at some locations. The study is important in lowering the infrastructure investment cost in traffic management since deploying sensors at all locations could incur prohibitively high cost. However, the problem is challenging due to the unknown distribution at unsensed locations, the intricate spatio-temporal correlation in long-term forecasting, as well as noise in data and irregularities in traffic patterns (e.g., road closure). We propose a Spatio-Temporal Partial Sensing (STPS) forecast model for long-term traffic prediction, with several novel contributions, including a rank-based embedding technique to capture irregularities and overcome noise, a spatial transfer matrix to overcome the spatial distribution shift from permanently sensed locations to unsensed locations, and a multi-step training process that utilizes all available data to successively refine the model parameters for better accuracy. 
Extensive experiments on several real-world traffic datasets demonstrate that STPS outperforms the state-of-the-art and achieves superior accuracy in partial sensing long-term forecasting.	Translated: 2024-08-07 16:17:55 Published: 2024-08-02
# Revisiting dynamics of quantum causal structures -- when can causal order evolve? ( http://arxiv.org/abs/2008.12757v4 ) License: see link	John H. Selby, Ana Belén Sainz, Paweł Horodecki,	Recently, there has been substantial interest in studying the dynamics of quantum theory beyond that of states, in particular, the dynamics of channels, measurements, and higher-order transformations. Ref. [Phys. Rev. X 8(1), 011047 (2018)] pursues this using the process-matrix formalism, together with a definition of the possible dynamics of such process matrices, and focusing especially on the question of evolution of causal structures. One of its major conclusions is a strong theorem saying that within the formalism, under continuous and reversible transformations, the causal order between operations must be preserved. Our result here challenges that of Ref. [Phys. Rev. X 8(1), 011047 (2018)]: if one is to take into account a full picture of the physical evolution of operations within the standard quantum-mechanical formalism, then the conclusion of Ref. [Phys. Rev. X 8(1), 011047 (2018)] does not hold. That is, we show that under certain continuous and reversible dynamics, the causal order between operations is not necessarily preserved. 
We moreover identify and analyse the root of this apparent contradiction, specifically, that the commonly accepted and widely applied framework of higher-order processes, whilst mathematically sound, is not always appropriate for drawing conclusions on physical dynamics. Finally, we show how to reconcile the elements of the whole picture following the intuition based on entanglement processing by local operations and classical communication.	Translated: 2024-08-07 01:00:27 Published: 2024-08-02
# Amortized Probabilistic Detection of Communities in Graphs ( http://arxiv.org/abs/2010.15727v4 ) License: see link	Yueqi Wang, Yoonho Lee, Pallab Basu, Juho Lee, Yee Whye Teh, Liam Paninski, Ari Pakman,	Learning community structures in graphs has broad applications across scientific domains. While graph neural networks (GNNs) have been successful in encoding graph structures, existing GNN-based methods for community detection are limited by requiring knowledge of the number of communities in advance, in addition to lacking a proper probabilistic formulation to handle uncertainty. We propose a simple framework for amortized community detection, which addresses both of these issues by combining the expressive power of GNNs with recent methods for amortized clustering. Our models consist of a graph representation backbone that extracts structural information and an amortized clustering network that naturally handles variable numbers of clusters. Both components combine into well-defined models of the posterior distribution of graph communities and are jointly optimized given labeled graphs. At inference time, the models yield parallel samples from the posterior of community labels, quantifying uncertainty in a principled way. We evaluate several models from our framework on synthetic and real datasets, and demonstrate improved performance compared to previous methods. 
As a separate contribution, we extend recent amortized probabilistic clustering architectures by adding attention modules, which yield further improvements on community detection tasks.	Translated: 2024-08-07 01:00:27 Published: 2024-08-02
# Photon number resolving detectors as evidence for the corpuscular nature of light ( http://arxiv.org/abs/2110.04245v2 ) License: see link	Morgan C. Williamson, Gabriel D. Ko, Brian R. La Cour,	We consider the question of whether photon-number-resolving (PNR) detectors provide compelling evidence for the discrete nature of light; i.e., whether they indicate the prior presence of a certain number of discrete photons. To answer this question, we reveal the insufficient signal-to-noise ratio (SNR) of existing PNR detectors, and propose an alternative interpretation for the analysis of PNR detector output that is consistent with a wave picture of light and does not rely on the presumption of light particles. This interpretation is based on the aggregation of correlated or accidentally coincident detections within a given detector coincidence window. Our interpretation accounts for the arbitrary character of detector coincidence windows and includes connections to established treatments of intensity interferometers. To validate our interpretation, we performed an experiment on a multiplexed PNR detector and examined the dependence of photon number on the coincidence window via post-processing. These observations were then compared to a fully classical wave model based on amplitude threshold detection, and the results were found to be in excellent agreement. 
We find that results from low-SNR PNR detectors, such as those in the existing literature, can be described classically and therefore do not demonstrate evidence for the discrete nature of light.	Translated: 2024-08-07 01:00:27 Published: 2024-08-02
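The coincidence-window interpretation can be made concrete with a small post-processing sketch. It assumes that detection timestamps from all channels of a multiplexed detector have already been merged into one list. It also assumes that a window opens at the first unassigned click. The function name `click_numbers` and this grouping rule are illustrative, not the authors' code.

```python
def click_numbers(timestamps, window):
    """Group detection times into coincidence windows and count clicks.

    A window opens at the earliest unassigned click and includes every
    click no later than `window` time units after it. The number of clicks
    in each window is what a PNR-style readout would report as the
    "photon number" for that event.
    """
    ts = sorted(timestamps)
    counts = []
    i = 0
    while i < len(ts):
        start = ts[i]
        j = i
        while j < len(ts) and ts[j] - start <= window:
            j += 1
        counts.append(j - i)
        i = j
    return counts


clicks = [0.0, 0.4, 0.9, 5.0, 5.2, 10.0]  # hypothetical click times (ns)
print(click_numbers(clicks, 0.5))  # [2, 1, 2, 1]
print(click_numbers(clicks, 1.0))  # [3, 2, 1]
```

The same clicks yield different "photon-number" histograms for different window choices. This is the window dependence that the experiment examines through post-processing.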
# An Empirical Study on Bug Severity Estimation using Source Code Metrics and Static Analysis ( http://arxiv.org/abs/2206.12927v2 ) License: see link	Ehsan Mashhadi, Shaiful Chowdhury, Somayeh Modaberi, Hadi Hemmati, Gias Uddin,	In the past couple of decades, significant research efforts have been devoted to the prediction of software bugs (i.e., defects). In general, these works leverage a diverse set of metrics, tools, and techniques to predict which classes, methods, lines, or commits are buggy. However, most existing work in this domain treats all bugs the same, which does not reflect practice. The more severe a bug, the greater its consequences. Therefore, it is important for a defect prediction method to estimate the severity of the identified bugs, so that the higher severity ones get immediate attention. In this paper, we provide a quantitative and qualitative study on two popular datasets (Defects4J and Bugs.jar), using 10 common source code metrics, and two popular static analysis tools (SpotBugs and Infer) for analyzing their capability to predict defects and their severity. 
We studied 3,358 buggy methods with different severity labels from 19 Java open-source projects. Results show that although code metrics are useful in predicting buggy code (the Lines of Code, Maintainability Index, FanOut, and Effort metrics are the best), they cannot estimate the severity level of the bugs. In addition, we observed that static analysis tools have weak performance in both predicting bugs (F1 score range of 3.1%-7.1%) and their severity label (F1 score under 2%). We also manually studied the characteristics of the severe bugs to identify possible reasons behind the weak performance of code metrics and static analysis tools in estimating their severity. Also, our categorization shows that Security bugs have high severity in most cases while Edge/Boundary faults have low severity. Finally, we discuss the practical implications of the results and propose new directions for future research.	Translated: 2024-08-07 00:54:45 Published: 2024-08-02
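The F1 scores quoted in this abstract are the harmonic mean of precision and recall. The minimal sketch below assumes binary labels, encoded for illustration as 1 = buggy and 0 = clean. It shows why a tool that flags few of the truly buggy methods ends up with a low score.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 = 2PR / (P + R), with P = precision and R = recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


labels = [1, 1, 0, 0, 1]
warnings = [1, 0, 1, 0, 0]  # hypothetical tool output
print(round(f1_score(labels, warnings), 3))  # 0.4 (P = 0.5, R = 1/3)
```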
# Robust Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models ( http://arxiv.org/abs/2209.15224v4 ) License: see link	Ye Tian, Haolei Weng, Lucy Xia, Yang Feng,	Unsupervised learning has been widely used in many real-world applications. One of the simplest and most important unsupervised learning models is the Gaussian mixture model (GMM). In this work, we study the multi-task learning problem on GMMs, which aims to leverage potentially similar GMM parameter structures among tasks to obtain improved learning performance compared to single-task learning. We propose a multi-task GMM learning procedure based on the EM algorithm that effectively utilizes unknown similarities between related tasks and is robust against a fraction of outlier tasks from arbitrary distributions. The proposed procedure is shown to achieve the minimax optimal rate of convergence for both parameter estimation error and the excess mis-clustering error, in a wide range of regimes. Moreover, we generalize our approach to tackle the problem of transfer learning for GMMs, where similar theoretical results are derived. Additionally, iterative unsupervised multi-task and transfer learning methods may suffer from an initialization alignment problem, and two alignment algorithms are proposed to resolve the issue. Finally, we demonstrate the effectiveness of our methods through simulations and real data examples. 
To the best of our knowledge, this is the first work studying multi-task and transfer learning on GMMs with theoretical guarantees.	Translated: 2024-08-07 00:54:45 Published: 2024-08-02
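For readers unfamiliar with the base algorithm, here is a sketch of the single-task EM iteration for a two-component, one-dimensional GMM, the kind of procedure that multi-task methods build on. This is the textbook EM update only. The paper's multi-task coupling, robustness to outlier tasks and alignment steps are not reproduced. The initial values and synthetic data are arbitrary assumptions.

```python
import math
import random


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, mu, var, weight, iters=50):
    """Textbook EM for a two-component 1D Gaussian mixture.

    mu and var are length-2 lists; weight is the mixing proportion of
    component 0 (component 1 gets 1 - weight).
    """
    mu, var = list(mu), list(var)
    n = len(xs)
    for _ in range(iters):
        # E-step: posterior probability that each point came from component 0.
        r = []
        for x in xs:
            p0 = weight * normal_pdf(x, mu[0], var[0])
            p1 = (1 - weight) * normal_pdf(x, mu[1], var[1])
            r.append(p0 / (p0 + p1))
        # M-step: responsibility-weighted proportion, means and variances.
        n0 = sum(r)
        n1 = n - n0
        weight = n0 / n
        mu[0] = sum(ri * x for ri, x in zip(r, xs)) / n0
        mu[1] = sum((1 - ri) * x for ri, x in zip(r, xs)) / n1
        var[0] = sum(ri * (x - mu[0]) ** 2 for ri, x in zip(r, xs)) / n0
        var[1] = sum((1 - ri) * (x - mu[1]) ** 2 for ri, x in zip(r, xs)) / n1
    return mu, var, weight


rng = random.Random(0)
data = [rng.gauss(-2.0, 1.0) for _ in range(200)] + [rng.gauss(3.0, 1.0) for _ in range(200)]
mu, var, weight = em_gmm_1d(data, mu=[-1.0, 1.0], var=[1.0, 1.0], weight=0.5)
print(mu, weight)  # means near -2 and 3, weight near 0.5
```

Running this separately on several related datasets may return the two components in different orders. This illustrates why an alignment step like the one mentioned in the abstract is needed before information can be shared across tasks.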
# Emergent Representations of Program Semantics in Language Models Trained on Programs ( http://arxiv.org/abs/2305.11169v3 ) License: see link	Charles Jin, Martin Rinard,	We present evidence that language models (LMs) of code can learn to represent the formal semantics of programs, despite being trained only to perform next-token prediction. Specifically, we train a Transformer model on a synthetic corpus of programs written in a domain-specific language for navigating 2D grid world environments. Each program in the corpus is preceded by a (partial) specification in the form of several input-output grid world states. Despite providing no further inductive biases, we find that a probing classifier is able to extract increasingly accurate representations of the unobserved, intermediate grid world states from the LM hidden states over the course of training, suggesting the LM acquires an emergent ability to interpret programs in the formal sense. We also develop a novel interventional baseline that enables us to disambiguate what is represented by the LM as opposed to learned by the probe. We anticipate that this technique may be generally applicable to a broad range of semantic probing experiments. 
In summary, this paper does not propose any new techniques for training LMs of code, but develops an experimental framework for and provides insights into the acquisition and representation of formal semantics in statistical models of code. Our code is available at https://github.com/charlesjin/emergent-semantics.	Translated: 2024-08-07 00:45:00 Published: 2024-08-02
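A toy interpreter illustrates what "unobserved intermediate states" means here. The command set (`move`, `turn_left`, `turn_right`), grid size and wall rule below are hypothetical stand-ins, not the paper's domain-specific language. The input-output specification corresponds to the first and last states of the trace. The states in between are what a probe would try to recover from the model's hidden states.

```python
DIRS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # north, east, south, west


def run(program, start=(0, 0, 0), size=8):
    """Execute a toy grid-navigation program and return every state.

    A state is (x, y, direction index into DIRS). Moves that would leave
    the size x size grid are ignored, as if blocked by a wall.
    """
    x, y, d = start
    trace = [(x, y, d)]
    for op in program:
        if op == "move":
            dx, dy = DIRS[d]
            if 0 <= x + dx < size and 0 <= y + dy < size:
                x, y = x + dx, y + dy
        elif op == "turn_left":
            d = (d - 1) % 4
        elif op == "turn_right":
            d = (d + 1) % 4
        else:
            raise ValueError(f"unknown command: {op}")
        trace.append((x, y, d))
    return trace


trace = run(["move", "turn_right", "move", "move"])
spec = (trace[0], trace[-1])  # what an input-output example reveals
hidden = trace[1:-1]          # intermediate states a probe would target
print(spec)    # ((0, 0, 0), (2, 1, 1))
print(hidden)  # [(0, 1, 0), (0, 1, 1), (1, 1, 1)]
```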
# Should We Attend More or Less? Modulating Attention for Fairness ( http://arxiv.org/abs/2305.13088v2 ) License: see link	Abdelrahman Zayed, Goncalo Mordido, Samira Shabanian, Sarath Chandar,	The advances in natural language processing (NLP) pose both opportunities and challenges. While recent progress enables the development of high-performing models for a variety of tasks, it also poses the risk of models learning harmful biases from the data, such as gender stereotypes. In this work, we investigate the role of attention, a widely-used technique in current state-of-the-art NLP models, in the propagation of social biases. Specifically, we study the relationship between the entropy of the attention distribution and the model's performance and fairness. We then propose a novel method for modulating attention weights to improve model fairness after training. Since our method is only applied post-training and pre-inference, it is an intra-processing method and is, therefore, less computationally expensive than existing in-processing and pre-processing approaches. Our results show an increase in fairness and minimal performance loss on different text classification and generation tasks using language models of varying sizes. WARNING: This work uses language that is offensive.	Translated: 2024-08-07 00:45:00 Published: 2024-08-02
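The entropy of an attention distribution, studied in this abstract, is easy to compute. One simple way to modulate it after training is to divide the attention logits by a temperature before the softmax. The sketch below assumes temperature scaling as the modulation mechanism and uses made-up scores. It illustrates the relationship and does not reproduce the paper's procedure.

```python
import math


def softmax(scores, temperature=1.0):
    """Softmax of attention logits divided by a temperature."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def entropy(p):
    """Shannon entropy in nats; 0 * log 0 is treated as 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


scores = [2.0, 1.0, 0.1]  # hypothetical logits for three tokens
for t in (0.5, 1.0, 2.0):
    print(t, round(entropy(softmax(scores, t)), 3))
# Higher temperature -> flatter attention -> higher entropy (at most log 3).
```

Raising the temperature spreads attention over more tokens, and lowering it concentrates attention. Because this only rescales existing logits, it requires no retraining.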
# Single-ancilla ground state preparation via Lindbladians ( http://arxiv.org/abs/2308.15676v4 ) License: see link	Zhiyan Ding, Chi-Fang Chen, Lin Lin,	We design a quantum algorithm for ground state preparation in the early fault tolerant regime. As a Monte Carlo-style quantum algorithm, our method features a Lindbladian where the target state is stationary. The construction of this Lindbladian is algorithmic and should not be seen as a specific approximation to some weakly coupled system-bath dynamics in nature. Our algorithm can be implemented using just one ancilla qubit and efficiently simulated on a quantum computer. It can prepare the ground state even when the initial state has zero overlap with the ground state, bypassing the most significant limitation of methods like quantum phase estimation. As a variant, we also propose a discrete-time algorithm, demonstrating even better efficiency and providing a near-optimal simulation cost depending on the desired evolution time and precision. Numerical simulation using Ising and Hubbard models demonstrates the efficacy and applicability of our method.	Translated: 2024-08-07 00:25:32 Published: 2024-08-02
# Superadditive Communication with the Green Machine: A Practical Demonstration of Nonlocality without Entanglement ( http://arxiv.org/abs/2310.05889v3 ) License: see link	Chaohan Cui, Jack Postlewaite, Babak N. Saif, Linran Fan, Saikat Guha,	Achieving the ultimate Holevo limit of optical communication capacity requires a joint-detection receiver which makes a collective quantum measurement over multiple modulated symbols. Such superadditivity (a higher communication rate than that achievable by symbol-by-symbol optical detection) is a special case of the well-known nonlocality without entanglement and has yet to be demonstrated. In this article, we propose and demonstrate a design of joint-detection receivers, the Green Machine, that can achieve superadditivity. We build this receiver and show that its capacity surpasses that of any symbol-by-symbol receiver in the photon-starved regime with binary-phase-shift-keying (BPSK). Our Green Machine receiver can also significantly reduce the transmitter peak power requirement compared with the pulse-position modulation (the conventional modulation format used for deep space laser communication). We further show that the self-referenced phase makes it immune to phase noise, e.g., atmospheric turbulence or platform vibrations.	Translated: 2024-08-07 00:15:47 Published: 2024-08-02
# Where Do People Tell Stories Online? Story Detection Across Online Communities ( http://arxiv.org/abs/2311.09675v3 ) License: see link	Maria Antoniak, Joel Mire, Maarten Sap, Elliott Ash, Andrew Piper,	Story detection in online communities is a challenging task as stories are scattered across communities and interwoven with non-storytelling spans within a single text. We address this challenge by building and releasing the StorySeeker toolkit, including a richly annotated dataset of 502 Reddit posts and comments, a detailed codebook adapted to the social media context, and models to predict storytelling at the document and span levels. Our dataset is sampled from hundreds of popular English-language Reddit communities ranging across 33 topic categories, and it contains fine-grained expert annotations, including binary story labels, story spans, and event spans. We evaluate a range of detection methods using our data, and we identify the distinctive textual features of online storytelling, focusing on storytelling spans. 
We illuminate distributional characteristics of storytelling on a large community-centric social media platform, and we also conduct a case study on r/ChangeMyView, where storytelling is used as one of many persuasive strategies, illustrating that our data and models can be used for both inter- and intra-community research. Finally, we discuss implications of our tools and analyses for narratology and the study of online communities.	Translated: 2024-08-07 00:06:03 Published: 2024-08-02
# Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning ( http://arxiv.org/abs/2311.10709v2 ) License: see link	Rohit Girdhar, Mannat Singh, Andrew Brown, Quentin Duval, Samaneh Azadi, Sai Saketh Rambhatla, Akbar Shah, Xi Yin, Devi Parikh, Ishan Misra,	We present Emu Video, a text-to-video generation model that factorizes the generation into two steps: first generating an image conditioned on the text, and then generating a video conditioned on the text and the generated image. We identify critical design decisions (adjusted noise schedules for diffusion, and multi-stage training) that enable us to directly generate high quality and high resolution videos, without requiring a deep cascade of models as in prior work. In human evaluations, our generated videos are strongly preferred in quality compared to all prior work: 81% vs. Google's Imagen Video, 90% vs. Nvidia's PYOCO, and 96% vs. Meta's Make-A-Video. Our model outperforms commercial solutions such as RunwayML's Gen2 and Pika Labs. Finally, our factorizing approach naturally lends itself to animating images based on a user's text prompt, where our generations are preferred 96% over prior work.	Translated: 2024-08-07 00:06:03 Published: 2024-08-02
# Asynchronous Bioplausible Neuron for SNN for Event Vision ( http://arxiv.org/abs/2311.11853v2 ) License: see link	Sanket Kachole, Hussain Sajwani, Fariborz Baghaei Naeini, Dimitrios Makris, Yahya Zweiri,	Spiking Neural Networks (SNNs) offer a biologically inspired approach to computer vision that can lead to more efficient processing of visual data with reduced energy consumption. However, maintaining homeostasis within these networks is challenging, as it requires continuous adjustment of neural responses to preserve equilibrium and optimal processing efficiency amidst diverse and often unpredictable input signals. In response to these challenges, we propose the Asynchronous Bioplausible Neuron (ABN), a dynamic spike firing mechanism to auto-adjust the variations in the input signal. Comprehensive evaluation across various datasets demonstrates ABN's enhanced performance in image classification and segmentation, maintenance of neural equilibrium, and energy efficiency.	Translated: 2024-08-07 00:06:03 Published: 2024-08-02
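The abstract does not give ABN's update rule. The sketch below illustrates the general idea of homeostatic, input-adaptive firing with a generic leaky integrate-and-fire neuron. Its threshold rises after each spike and relaxes back toward baseline between spikes. The function name `adaptive_lif` and all parameters (`tau_m`, `theta0`, `tau_theta`, `theta_jump`) are hypothetical and do not come from the paper.

```python
import math

def adaptive_lif(inputs, tau_m=10.0, v_rest=0.0, theta0=1.0,
                 tau_theta=50.0, theta_jump=0.5, dt=1.0):
    """Leaky integrate-and-fire neuron with an adaptive (homeostatic) threshold.

    Generic textbook model, NOT the ABN mechanism from the paper.
    Returns the list of time steps at which the neuron spiked.
    """
    v, theta = v_rest, theta0
    decay_m = math.exp(-dt / tau_m)          # membrane leak per step
    decay_theta = math.exp(-dt / tau_theta)  # threshold relaxation per step
    spikes = []
    for t, current in enumerate(inputs):
        # Membrane potential leaks toward rest and integrates the input.
        v = v_rest + (v - v_rest) * decay_m + current
        if v >= theta:
            spikes.append(t)
            v = v_rest            # reset after a spike
            theta += theta_jump   # raise the threshold: firing gets harder
        # Between spikes the threshold decays back toward its baseline.
        theta = theta0 + (theta - theta0) * decay_theta
    return spikes
```

With a constant input of 1.5, a fixed threshold (`theta_jump=0.0`) fires on every step. The adaptive version spaces its spikes out. That self-regulation is the simplest textbook form of the homeostasis the abstract refers to.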
# VAE-IF: Deep feature extraction with averaging for fully unsupervised artifact detection in routinely acquired ICU time-series ( http://arxiv.org/abs/2312.05959v2 ) License: see link	Hollan Haule, Ian Piper, Patricia Jones, Chen Qin, Tsz-Yan Milly Lo, Javier Escudero,	Artifacts are a common problem in physiological time series collected from intensive care units (ICU) and other settings. They affect the quality and reliability of clinical research and patient care. Manual annotation of artifacts is costly and time-consuming, rendering it impractical. Automated methods are desired. Here, we propose a novel fully unsupervised approach to detect artifacts in clinical-standard, minute-by-minute resolution ICU data without any prior labeling or signal-specific knowledge. Our approach combines a variational autoencoder (VAE) and an isolation forest (IF) into a hybrid model to learn features and identify anomalies in different types of vital signs, such as blood pressure, heart rate, and intracranial pressure. We evaluate our approach on a real-world ICU dataset and compare it with supervised benchmark models based on long short-term memory (LSTM) and XGBoost and statistical methods such as ARIMA. We show that our unsupervised approach achieves comparable sensitivity to fully supervised methods and generalizes well to an external dataset. 
We also visualize the latent space learned by the VAE and demonstrate its ability to disentangle clean and noisy samples. Our approach offers a promising solution for cleaning ICU data in clinical research and practice without the need for any labels whatsoever.	Translated: 2024-08-06 23:55:54 Published: 2024-08-02
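The second stage of the hybrid model, an isolation forest that scores feature vectors, can be sketched in plain Python. This is a minimal version of the standard isolation-forest algorithm: random axis-aligned splits, with anomaly score 2^(-E[h]/c(m)). The 2-D points stand in for VAE latent vectors, which is an assumption. The function names, tree count and sample size are illustrative choices, not the paper's configuration.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def avg_path_length(n):
    """c(n): mean depth of an unsuccessful BST search, used to normalise path lengths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value within its range."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(x, node):
    """Depth at which x lands, plus c(size) for the unbuilt subtree below the leaf."""
    depth = 0
    while node[0] == "node":
        _, dim, split, left, right = node
        node = left if x[dim] < split else right
        depth += 1
    return depth + avg_path_length(node[1])

def isolation_scores(features, n_trees=100, sample_size=64, seed=0):
    """Anomaly score in (0, 1) per feature vector; higher means easier to isolate."""
    rng = random.Random(seed)
    m = min(sample_size, len(features))
    if m < 2:
        raise ValueError("need at least two feature vectors")
    max_depth = math.ceil(math.log2(m))
    trees = [build_tree(rng.sample(features, m), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = avg_path_length(m)
    return [2.0 ** (-(sum(path_length(x, t) for t in trees) / n_trees) / norm)
            for x in features]
```

An isolated point such as `(8.0, 8.0)` among standard-normal points needs fewer random splits to separate, so it receives a higher score than the typical inlier.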
# Beyond the Holographic Entropy Cone via Cycle Flows ( http://arxiv.org/abs/2312.10137v2 ) License: see link	Temple He, Sergio Hernández-Cuenca, Cynthia Keeler,	Motivated by bit threads, we introduce a new prescription for computing entropy vectors outside the holographic entropy cone. By utilizing cycle flows on directed graphs, we show that the maximum cycle flow associated to any subset of vertices, which corresponds to a subsystem, manifestly obeys purification symmetry. Furthermore, by restricting ourselves to a subclass of directed graphs, we prove that the maximum cycle flow obeys both subadditivity and strong subadditivity, thereby establishing it as a viable candidate for the entropy associated to the subsystem. Finally, we demonstrate how our model generalizes the entropy vectors obtainable via conventional flows in undirected graphs, as well as conjecture that our model similarly generalizes the entropy vectors arising from hypergraphs.	Translated: 2024-08-06 23:55:54 Published: 2024-08-02
# DEM: A Method for Certifying Deep Neural Network Classifier Outputs in Aerospace ( http://arxiv.org/abs/2401.02283v4 ) License: see link	Guy Katz, Natan Levy, Idan Refaeli, Raz Yerushalmi,	Software development in the aerospace domain requires adhering to strict, high-quality standards. While there exist regulatory guidelines for commercial software in this domain (e.g., ARP-4754 and DO-178), these do not apply to software with deep neural network (DNN) components. Consequently, it is unclear how to allow aerospace systems to benefit from the deep learning revolution. Our work here seeks to address this challenge with a novel, output-centric approach for DNN certification. Our method employs statistical verification techniques, and has the key advantage of being able to flag specific inputs for which the DNN's output may be unreliable - so that they may be later inspected by a human expert. To achieve this, our method conducts a statistical analysis of the DNN's predictions for other, nearby inputs, in order to detect inconsistencies. This is in contrast to existing techniques, which typically attempt to certify the entire DNN, as opposed to individual outputs. Our method uses the DNN as a black-box, and makes no assumptions about its topology. 
We hope that this work constitutes another step towards integrating DNNs in safety-critical applications - especially in the aerospace domain, where high standards of quality and reliability are crucial.	Translated: 2024-08-06 23:55:54 Published: 2024-08-02
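The core idea is to judge one output by how the black-box network behaves on nearby inputs. It can be sketched as a neighbourhood-consistency check. This is a simplified stand-in, not DEM's actual statistical procedure. It assumes uniform sampling in an L-infinity ball around the input, and `flag_unreliable`, `radius`, `n_samples` and `min_agreement` are hypothetical names.

```python
import random
from collections import Counter

def flag_unreliable(classify, x, radius=0.05, n_samples=200,
                    min_agreement=0.95, seed=0):
    """Query a black-box classifier around x and flag the output at x if too
    few neighbours agree with it. Returns (label, agreement, flagged)."""
    rng = random.Random(seed)
    label = classify(x)
    votes = Counter()
    for _ in range(n_samples):
        # Uniform perturbation of every coordinate within +/- radius.
        neighbour = [xi + rng.uniform(-radius, radius) for xi in x]
        votes[classify(neighbour)] += 1
    agreement = votes[label] / n_samples
    return label, agreement, agreement < min_agreement
```

Consider the toy classifier `lambda v: int(v[0] + v[1] > 1.0)`. A point deep inside a decision region gets agreement 1.0. A point on the decision boundary gets roughly 0.5 and is flagged for human review. Only the classifier's outputs are used, matching the black-box setting described above.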
# Mission: Impossible Language Models ( http://arxiv.org/abs/2401.06416v2 ) License: see link	Julie Kallini, Isabel Papadimitriou, Richard Futrell, Kyle Mahowald, Christopher Potts,	Chomsky and others have very directly claimed that large language models (LLMs) are equally capable of learning languages that are possible and impossible for humans to learn. However, there is very little published experimental evidence to support such a claim. Here, we develop a set of synthetic impossible languages of differing complexity, each designed by systematically altering English data with unnatural word orders and grammar rules. These languages lie on an impossibility continuum: at one end are languages that are inherently impossible, such as random and irreversible shuffles of English words, and on the other, languages that may not be intuitively impossible but are often considered so in linguistics, particularly those with rules based on counting word positions. We report on a wide range of evaluations to assess the capacity of GPT-2 small models to learn these uncontroversially impossible languages, and crucially, we perform these assessments at various stages throughout training to compare the learning process for each language. Our core finding is that GPT-2 struggles to learn impossible languages when compared to English as a control, challenging the core claim. 
More importantly, we hope our approach opens up a productive line of inquiry in which different LLM architectures are tested on a variety of impossible languages in an effort to learn more about how LLMs can be used as tools for these cognitive and typological investigations.	Translated: 2024-08-06 23:46:09 Published: 2024-08-02
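The kinds of perturbation described above can be illustrated on a tokenized sentence. The functions below are hypothetical simplifications inspired by the abstract:

- a fixed shuffle that depends only on sentence length;
- a fresh random shuffle on every call;
- full reversal;
- a counting-based rule that places a marker a fixed number of tokens after a target word.

They do not reproduce the paper's exact language definitions.

```python
import random

def deterministic_shuffle(tokens, seed=0):
    """Permute tokens with an RNG seeded by sentence length, so all sentences
    of the same length share one permutation."""
    order = list(range(len(tokens)))
    random.Random(seed + len(tokens)).shuffle(order)
    return [tokens[i] for i in order]

def nondeterministic_shuffle(tokens, rng):
    """A fresh random permutation on every call: no consistent rule to recover."""
    out = list(tokens)
    rng.shuffle(out)
    return out

def full_reverse(tokens):
    """Reverse the whole token sequence."""
    return tokens[::-1]

def count_hop(tokens, target, marker="MARK", hop=4):
    """Counting-based rule: insert `marker` exactly `hop` tokens after the first
    `target`, ignoring syntactic structure (clamped to the sentence end)."""
    if target not in tokens:
        return list(tokens)
    pos = min(tokens.index(target) + 1 + hop, len(tokens))
    return tokens[:pos] + [marker] + tokens[pos:]
```

For example, `count_hop("the cat is sitting on the warm mat today".split(), "is")` places `MARK` after "sitting on the warm", four tokens after "is". The rule depends only on linear position, not on constituent structure.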
# AONeuS: A Neural Rendering Framework for Acoustic-Optical Sensor Fusion ( http://arxiv.org/abs/2402.03309v3 ) License: see link	Mohamad Qadri, Kevin Zhang, Akshay Hinduja, Michael Kaess, Adithya Pediredla, Christopher A. Metzler,	Underwater perception and 3D surface reconstruction are challenging problems with broad applications in construction, security, marine archaeology, and environmental monitoring. Treacherous operating conditions, fragile surroundings, and limited navigation control often dictate that submersibles restrict their range of motion and, thus, the baseline over which they can capture measurements. In the context of 3D scene reconstruction, it is well-known that smaller baselines make reconstruction more challenging. Our work develops a physics-based multimodal acoustic-optical neural surface reconstruction framework (AONeuS) capable of effectively integrating high-resolution RGB measurements with low-resolution depth-resolved imaging sonar measurements. By fusing these complementary modalities, our framework can reconstruct accurate high-resolution 3D surfaces from measurements captured over heavily-restricted baselines. Through extensive simulations and in-lab experiments, we demonstrate that AONeuS dramatically outperforms recent RGB-only and sonar-only inverse-differentiable-rendering--based surface reconstruction methods. 
A website visualizing the results of our paper is located at this address: https://aoneus.github.io/	Translated: 2024-08-06 23:46:09 Published: 2024-08-02
# Beyond unital noise in variational quantum algorithms: noise-induced barren plateaus and limit sets ( http://arxiv.org/abs/2402.08721v5 ) License: see link	P. Singkanipa, D. A. Lidar,	Variational quantum algorithms (VQAs) hold much promise but face the challenge of exponentially small gradients. Unmitigated, this barren plateau (BP) phenomenon leads to an exponential training overhead for VQAs. Perhaps the most pernicious are noise-induced barren plateaus (NIBPs), a type of unavoidable BP arising from open system effects, which have so far been shown to exist for unital noise maps. Here, we generalize the study of NIBPs to more general completely positive, trace-preserving maps, establishing the existence of NIBPs in the unital case and a class of non-unital maps we call Hilbert-Schmidt (HS)-contractive. The latter includes amplitude damping. We identify the associated phenomenon of noise-induced limit sets (NILS) of the VQA cost function and prove its existence for both unital and HS-contractive non-unital noise maps. Along the way, we extend the parameter shift rule of VQAs to the noisy setting. We provide rigorous bounds in terms of the relevant variables that give rise to NIBPs and NILSs, along with numerical simulations of the depolarizing and amplitude-damping maps that illustrate our analytical results.	Translated: 2024-08-06 23:36:13 Published: 2024-08-02
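A single-qubit toy model makes the contrast between the two noise maps concrete. On the Bloch vector r, the depolarizing channel acts as r -> (1 - p) r, which is unital. Amplitude damping acts as (x, y, z) -> (sqrt(1-g) x, sqrt(1-g) y, (1-g) z + g), which is non-unital with fixed point |0>. The sketch interleaves a rotation RY(theta) with one noise channel per layer and reads out <Z>. The function names and the layer structure are illustrative assumptions, not the circuits or bounds from the paper.

```python
import math

def ry(r, theta):
    """Noiseless layer: rotate the Bloch vector r = (x, y, z) about the y axis."""
    x, y, z = r
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * z, y, -s * x + c * z)

def depolarize(r, p):
    """rho -> (1 - p) rho + p I/2 (unital): shrinks the Bloch vector uniformly."""
    return tuple((1 - p) * comp for comp in r)

def amplitude_damp(r, gamma):
    """Amplitude damping toward |0> (Bloch z = +1): non-unital."""
    x, y, z = r
    s = math.sqrt(1 - gamma)
    return (s * x, s * y, (1 - gamma) * z + gamma)

def cost(theta, layers, noise, strength, r0=(0.0, 0.0, 1.0)):
    """<Z> after `layers` repetitions of RY(theta) followed by `noise`."""
    r = r0
    for _ in range(layers):
        r = noise(ry(r, theta), strength)
    return r[2]
```

Depolarizing noise commutes with the rotation, so here the cost is exactly (1 - p)^L cos(L theta). Both its range and its gradient therefore shrink exponentially with depth L, a one-qubit caricature of a noise-induced barren plateau. With amplitude damping, theta = 0 and the qubit starting in |1>, the cost is <Z> = 1 - 2(1 - g)^L. The state is driven to the fixed point |0> rather than to the maximally mixed state.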
# ReViT: Enhancing Vision Transformers Feature Diversity with Attention Residual Connections ( http://arxiv.org/abs/2402.11301v2 ) License: see link	Anxhelo Diko, Danilo Avola, Marco Cascio, Luigi Cinque,	Vision Transformer (ViT) self-attention mechanism is characterized by feature collapse in deeper layers, resulting in the vanishing of low-level visual features. However, such features can be helpful to accurately represent and identify elements within an image and increase the accuracy and robustness of vision-based recognition systems. Following this rationale, we propose a novel residual attention learning method for improving ViT-based architectures, increasing their visual feature diversity and model robustness. In this way, the proposed network can capture and preserve significant low-level features, providing more details about the elements within the scene being analyzed. The effectiveness and robustness of the presented method are evaluated on five image classification benchmarks, including ImageNet1k, CIFAR10, CIFAR100, Oxford Flowers-102, and Oxford-IIIT Pet, achieving improved performances. Additionally, experiments on the COCO2017 dataset show that the devised approach discovers and incorporates semantic and spatial relationships for object detection and instance segmentation when implemented into spatial-aware transformer models.	Translated: 2024-08-06 23:36:13 Published: 2024-08-02
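One simple way to carry earlier attention into later layers is a residual on the pre-softmax attention scores: scores_l + alpha * scores_(l-1), in the style of RealFormer residual attention. The sketch assumes that form. The abstract does not give ReViT's exact formula, and `alpha` and the function names are hypothetical.

```python
import math

def softmax_rows(matrix):
    """Row-wise softmax, subtracting the row max for numerical stability."""
    out = []
    for row in matrix:
        peak = max(row)
        exps = [math.exp(val - peak) for val in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def scaled_scores(q, k):
    """Pre-softmax attention scores q_i . k_j / sqrt(d)."""
    d = len(q[0])
    return [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k] for qi in q]

def residual_attention(q, k, prev_scores=None, alpha=0.5):
    """Return (weights, scores); pass `scores` to the next layer as prev_scores."""
    s = scaled_scores(q, k)
    if prev_scores is not None:
        # Residual connection on the attention scores of the previous layer.
        s = [[cur + alpha * prev for cur, prev in zip(row, prow)]
             for row, prow in zip(s, prev_scores)]
    return softmax_rows(s), s
```

Setting `alpha=0.0` recovers ordinary attention. A positive `alpha` lets patterns found in earlier layers persist instead of being overwritten, which is the intuition behind preserving low-level features.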
# Modeling phonon-mediated quasiparticle poisoning in superconducting qubit arrays ( http://arxiv.org/abs/2402.15471v2 ) License: see link	Eric Yelton, Clayton P. Larson, Vito Iaia, Kenneth Dodge, Guglielmo La Magna, Paul G. Baity, Ivan V. Pechenezhskiy, Robert McDermott, Noah Kurinsky, Gianluigi Catelani, Britton L. T. Plourde,	Correlated errors caused by ionizing radiation impacting superconducting qubit chips are problematic for quantum error correction. Such impacts generate quasiparticle (QP) excitations in the qubit electrodes, which temporarily reduce qubit coherence significantly. The many energetic phonons produced by a particle impact travel efficiently throughout the device substrate and generate quasiparticles with high probability, thus causing errors on a large fraction of the qubits in an array simultaneously. We describe a comprehensive strategy for the numerical simulation of the phonon and quasiparticle dynamics in the aftermath of an impact. We compare the simulations with experimental measurements of phonon-mediated QP poisoning and demonstrate that our modeling captures the spatial and temporal footprint of the QP poisoning for various configurations of phonon downconversion structures. We thus present a path forward for the operation of superconducting quantum processors in the presence of ionizing radiation.	Translated: 2024-08-06 23:36:13 Published: 2024-08-02
# PCR-99: A Practical Method for Point Cloud Registration with 99 Percent Outliers ( http://arxiv.org/abs/2402.16598v5 ) License: see link	Seong Hun Lee, Javier Civera, Patrick Vandewalle,	We propose a robust method for point cloud registration that can handle both unknown scales and extreme outlier ratios. Our method, dubbed PCR-99, uses a deterministic 3-point sampling approach with two novel mechanisms that significantly boost the speed: (1) an improved ordering of the samples based on pairwise scale consistency, prioritizing the point correspondences that are more likely to be inliers, and (2) an efficient outlier rejection scheme based on triplet scale consistency, prescreening bad samples and reducing the number of hypotheses to be tested. Our evaluation shows that, up to 98% outlier ratio, the proposed method achieves comparable performance to the state of the art. At 99% outlier ratio, however, it outperforms the state of the art for both known-scale and unknown-scale problems. Especially for the latter, we observe a clear superiority in terms of robustness and speed.	Translated: 2024-08-06 23:26:29 Published: 2024-08-02
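Pairwise scale consistency rests on one fact. For two inlier correspondences (a_i, b_i) and (a_j, b_j) under a similarity transform with scale s, |b_i - b_j| = s |a_i - a_j|. The sketch votes for the dominant pairwise scale in log space, then ranks correspondences by how many of their pairs agree with it. It illustrates only the ordering idea. It omits PCR-99's 3-point sampling and triplet-based rejection, and `bin_width` and the function names are assumptions.

```python
import math
from collections import Counter
from itertools import combinations

def pairwise_log_scales(src, dst):
    """log(|b_i - b_j| / |a_i - a_j|) for every pair of correspondences."""
    out = {}
    for i, j in combinations(range(len(src)), 2):
        da, db = math.dist(src[i], src[j]), math.dist(dst[i], dst[j])
        if da > 1e-12 and db > 1e-12:
            out[(i, j)] = math.log(db / da)
    return out

def rank_by_scale_consistency(src, dst, bin_width=0.05):
    """Vote for the most common log-scale bin, then rank correspondences by the
    number of their pairs in that bin (likely inliers first).
    Returns (order, support, estimated_scale)."""
    bins = {pair: round(v / bin_width)
            for pair, v in pairwise_log_scales(src, dst).items()}
    best_bin, _ = Counter(bins.values()).most_common(1)[0]
    support = [0] * len(src)
    for (i, j), b in bins.items():
        if b == best_bin:
            support[i] += 1
            support[j] += 1
    order = sorted(range(len(src)), key=lambda k: -support[k])
    return order, support, math.exp(best_bin * bin_width)
```

Inlier pairs all share the true scale and pile into one bin, while pairs involving an outlier scatter across bins. Sampling hypotheses from the front of `order` therefore tries likely inliers first. The voting step needs no prior knowledge of the scale.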
# $ζ$-QVAE: A Quantum Variational Autoencoder utilizing Regularized Mixed-state Latent Representations ( http://arxiv.org/abs/2402.17749v2 ) License: see link	Gaoyuan Wang, Jonathan Warrell, Prashant S. Emani, Mark Gerstein,	A major challenge in near-term quantum computing is its application to large real-world datasets due to scarce quantum hardware resources. One approach to enabling tractable quantum models for such datasets involves compressing the original data to manageable dimensions while still representing essential information for downstream analysis. In classical machine learning, variational autoencoders (VAEs) facilitate efficient data compression, representation learning for subsequent tasks, and novel data generation. However, no model has been proposed that exactly captures all of these features for direct application to quantum data on quantum computers. Some existing quantum models for data compression lack regularization of latent representations, thus preventing direct use for generation and control of generalization. Others are hybrid models with only some internal quantum components, impeding direct training on quantum data. 
To bridge this gap, we present a fully quantum framework, $\zeta$-QVAE, which encompasses all the capabilities of classical VAEs and can be directly applied for both classical and quantum data compression. Our model utilizes regularized mixed states to attain optimal latent representations. It accommodates various divergences for reconstruction and regularization. Furthermore, by accommodating mixed states at every stage, it can utilize the full-data density matrix and allow for a "global" training objective. Doing so, in turn, makes efficient optimization possible and has potential implications for private and federated learning. In addition to exploring the theoretical properties of $\zeta$-QVAE, we demonstrate its performance on representative genomics and synthetic data. Our results consistently indicate that $\zeta$-QVAE exhibits similar or better performance compared to matched classical models.	Translated: 2024-08-06 23:26:29 Published: 2024-08-02
# Locating and Editing Factual Associations in Mamba ( http://arxiv.org/abs/2404.03646v2 ) License: see link	Arnab Sen Sharma, David Atkinson, David Bau,	We investigate the mechanisms of factual recall in the Mamba state space model. Our work is inspired by previous findings in autoregressive transformer language models suggesting that their knowledge recall is localized to particular modules at specific token locations; we therefore ask whether factual recall in Mamba can be similarly localized. To investigate this, we conduct four lines of experiments on Mamba. First, we apply causal tracing or interchange interventions to localize key components inside Mamba that are responsible for recalling facts, revealing that specific components within middle layers show strong causal effects at the last token of the subject, while the causal effect of intervening on later layers is most pronounced at the last token of the prompt, matching previous findings on autoregressive transformers. Second, we show that rank-one model editing methods can successfully insert facts at specific locations, again resembling findings on transformer LMs. Third, we examine the linearity of Mamba's representations of factual relations. Finally we adapt attention-knockout techniques to Mamba in order to dissect information flow during factual recall. 
We compare Mamba directly to a similar-sized autoregressive transformer LM and conclude that despite significant differences in architectural approach, when it comes to factual recall, the two architectures share many similarities.	Translated: 2024-08-06 23:16:45 Published: 2024-08-02
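Rank-one editing, in its simplest form, changes a weight matrix W so that a chosen key vector k maps to a new value v. Directions orthogonal to k are left untouched: W' = W + (v - W k) k^T / (k^T k). The sketch shows only this generic update. ROME-style methods additionally weight the update by a key covariance, and which Mamba projection the paper edits is not specified here. `rank_one_edit` and `matvec` are hypothetical helper names.

```python
def matvec(W, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rank_one_edit(W, k, v):
    """Return W' = W + (v - W k) k^T / (k^T k), so that W' k = v exactly while
    W' x = W x for every x orthogonal to k."""
    kk = sum(ki * ki for ki in k)
    residual = [vi - wi for vi, wi in zip(v, matvec(W, k))]
    # Outer product of the residual with k, scaled by 1 / (k . k).
    return [[w + r * kj / kk for w, kj in zip(row, k)]
            for row, r in zip(W, residual)]
```

The added term is the outer product of two vectors, so the edit has rank one. That locality is why such edits can insert a single association while leaving most other inputs to the layer unchanged.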
# Variational Quantum Algorithm Landscape Reconstruction by Low-Rank Tensor Completion ( http://arxiv.org/abs/2405.10941v2 ) License: see link	Tianyi Hao, Zichang He, Ruslan Shaydulin, Marco Pistoia, Swamit Tannu,	Variational quantum algorithms (VQAs) are a broad class of algorithms with many applications in science and industry. Applying a VQA to a problem involves optimizing a parameterized quantum circuit by maximizing or minimizing a cost function. A particular challenge associated with VQAs is understanding the properties of associated cost functions. Having the landscapes of VQA cost functions can greatly assist in developing and testing new variational quantum algorithms, but they are extremely expensive to compute. Reconstructing the landscape of a VQA using existing techniques requires a large number of cost function evaluations, especially when the dimension or the resolution of the landscape is high. To address this challenge, we propose a low-rank tensor-completion-based approach for local landscape reconstruction. By leveraging compact low-rank representations of tensors, our technique can overcome the curse of dimensionality and handle high-resolution landscapes. We demonstrate the power of landscapes in VQA development by showcasing practical applications of analyzing penalty terms for constrained optimization problems and examining the probability landscapes of certain basis states.	Translated: 2024-08-06 23:16:45 Published: 2024-08-02
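A low-rank model lets a landscape sampled on only part of a grid be filled in. The sketch uses the simplest case: a 2-parameter landscape that is exactly rank one. The separable toy function is an assumption for illustration. It is completed by alternating least squares on the observed grid points only. Real VQA landscapes are higher-dimensional tensors and only approximately low rank, and the paper's completion algorithm may differ.

```python
import math

def toy_landscape(n):
    """Hypothetical separable landscape f(a, b) = (2 + cos a)(2 + sin b) on an n x n grid."""
    grid = [2 * math.pi * k / n for k in range(n)]
    return [[(2 + math.cos(a)) * (2 + math.sin(b)) for b in grid] for a in grid]

def rank1_complete(samples, iters=500):
    """Fill in None entries assuming samples[i][j] ~ u[i] * v[j], fitting u and v
    by alternating least squares on the observed entries only."""
    n_rows, n_cols = len(samples), len(samples[0])
    u, v = [0.0] * n_rows, [1.0] * n_cols
    for _ in range(iters):
        # Least-squares update of each row factor from that row's observations.
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if samples[i][j] is not None]
            u[i] = sum(samples[i][j] * v[j] for j in obs) / sum(v[j] ** 2 for j in obs)
        # Same for each column factor, using the freshly updated row factors.
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if samples[i][j] is not None]
            v[j] = sum(samples[i][j] * u[i] for i in obs) / sum(u[i] ** 2 for i in obs)
    return [[u[i] * v[j] for j in range(n_cols)] for i in range(n_rows)]
```

In the usage below, about a third of the 8 x 8 grid points are never evaluated, yet the rank-one fit reproduces them. The low-rank structure is what lets completion save cost-function evaluations, and the savings grow with the number of parameters.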
# Monte Carlo studies of quantum cosmology by the generalized Lefschetz thimble method ( http://arxiv.org/abs/2407.17724v2 ) License: see link	Chien-Yu Chou, Jun Nishimura,	Quantum cosmology aims at elucidating the beginning of our Universe. Back in early 80's, Vilenkin and Hartle-Hawking put forward the "tunneling from nothing" and "no boundary" proposals. Recently there has been renewed interest in this subject from the viewpoint of defining the oscillating path integral for Lorentzian quantum gravity using the Picard-Lefschetz theory. Aiming at going beyond the mini-superspace and saddle-point approximations, we perform Monte Carlo calculations using the generalized Lefschetz thimble method to overcome the sign problem. In particular, we confirm that either Vilenkin or Hartle-Hawking saddle point becomes relevant if one uses the Robin boundary condition depending on its parameter. We also clarify some fundamental issues in quantum cosmology, such as an issue related to the integration domain of the lapse function and an issue related to reading off the real geometry from the complex geometry obtained at the saddle point.	Translated: 2024-08-06 23:07:02 Published: 2024-08-02
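The one-dimensional integral of exp(i x^2) over the real line shows how deforming the contour removes the sign problem. Along the real axis the integrand oscillates with unit modulus. On the thimble x = exp(i pi/4) t through the saddle point x = 0, i x^2 = -t^2, so the integrand becomes the non-oscillating exp(-t^2). The integral is exp(i pi/4) sqrt(pi). This textbook case is only an illustration. The paper's generalized Lefschetz thimble method samples flowed multi-dimensional contours in Monte Carlo simulations, and `thimble_integral` is a hypothetical name.

```python
import cmath
import math

def thimble_integral(n=4001, t_max=8.0):
    """Integrate exp(i x^2) along the thimble x = exp(i*pi/4) * t with the
    trapezoid rule on t in [-t_max, t_max]."""
    phase = cmath.exp(1j * math.pi / 4)
    h = 2 * t_max / (n - 1)
    total = 0j
    for k in range(n):
        x = phase * (-t_max + k * h)
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * cmath.exp(1j * x * x)   # equals exp(-t^2) on the thimble
    return phase * h * total                       # dx = phase * dt
```

The result agrees with sqrt(pi) * exp(i pi/4) to many digits, because every sample on the thimble has the same phase and no cancellations occur. Sampling the real-axis integrand instead would require delicate cancellation between oscillating terms, which is the sign problem in miniature.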
# BlenderAlchemy: Editing 3D Graphics with Vision-Language Models ( http://arxiv.org/abs/2404.17672v3 ) License: see link	Ian Huang, Guandao Yang, Leonidas Guibas,	Graphics design is important for various applications, including movie production and game design. To create a high-quality scene, designers usually need to spend hours in software like Blender, in which they might need to interleave and repeat operations, such as connecting material nodes, hundreds of times. Moreover, slightly different design goals may require completely different sequences, making automation difficult. In this paper, we propose a system that leverages Vision-Language Models (VLMs), like GPT-4V, to intelligently search the design action space to arrive at an answer that can satisfy a user's intent. Specifically, we design a vision-based edit generator and state evaluator to work together to find the correct sequence of actions to achieve the goal. Inspired by the role of visual imagination in the human design process, we supplement the visual reasoning capabilities of VLMs with "imagined" reference images from image-generation models, providing visual grounding of abstract language descriptions. 
In this paper, we provide empirical evidence suggesting our system can produce simple but tedious Blender editing sequences for tasks such as editing procedural materials and geometry from text and/or reference images, as well as adjusting lighting configurations for product renderings in complex scenes.	Translated: 2024-08-06 22:45:03 Published: 2024-08-02
# Expanding the Horizon: Enabling Hybrid Quantum Transfer Learning for Long-Tailed Chest X-Ray Classification ( http://arxiv.org/abs/2405.00156v2 ) License: see link	Skylar Chan, Pranav Kulkarni, Paul H. Yi, Vishwa S. Parekh,	Quantum machine learning (QML) has the potential for improving the multi-label classification of rare, albeit critical, diseases in large-scale chest x-ray (CXR) datasets due to theoretical quantum advantages over classical machine learning (CML) in sample efficiency and generalizability. While prior literature has explored QML with CXRs, it has focused on binary classification tasks with small datasets due to limited access to quantum hardware and computationally expensive simulations. To that end, we implemented a Jax-based framework that enables the simulation of medium-sized qubit architectures with significant improvements in wall-clock time over current software offerings. We evaluated the performance of our Jax-based framework in terms of efficiency and performance for hybrid quantum transfer learning for long-tailed classification across 8, 14, and 19 disease labels using large-scale CXR datasets. The Jax-based framework resulted in up to a 58% and 95% speed-up compared to PyTorch and TensorFlow implementations, respectively. 
However, compared to CML, QML demonstrated slower convergence and an average AUROC of 0.70, 0.73, and 0.74 for the classification of 8, 14, and 19 CXR disease labels. In comparison, the CML models had an average AUROC of 0.77, 0.78, and 0.80 respectively. In conclusion, our work presents an accessible implementation of hybrid quantum transfer learning for long-tailed CXR classification with a computationally efficient Jax-based framework.	Translated: 2024-08-06 22:45:03 Published: 2024-08-02
# When a Relation Tells More Than a Concept: Exploring and Evaluating Classifier Decisions with CoReX ( http://arxiv.org/abs/2405.01661v2 ) License: see link	Bettina Finzel, Patrick Hilme, Johannes Rabold, Ute Schmid,	Explanations for Convolutional Neural Networks (CNNs) based on relevance of input pixels might be too unspecific to evaluate which and how input features impact model decisions. Especially in complex real-world domains like biology, the presence of specific concepts and of relations between concepts might be discriminating between classes. Pixel relevance is not expressive enough to convey this type of information. In consequence, model evaluation is limited and relevant aspects present in the data and influencing the model decisions might be overlooked. This work presents a novel method to explain and evaluate CNN models, which uses a concept- and relation-based explainer (CoReX). It explains the predictive behavior of a model on a set of images by masking (ir-)relevant concepts from the decision-making process and by constraining relations in a learned interpretable surrogate model. We test our approach with several image data sets and CNN architectures. Results show that CoReX explanations are faithful to the CNN model in terms of predictive outcomes. 
We further demonstrate through a human evaluation that CoReX is a suitable tool for generating combined explanations that help in assessing the classification quality of CNNs. Finally, we show that CoReX supports the identification and re-classification of incorrect or ambiguous classifications.	Translated: 2024-08-06 22:45:03 Published: 2024-08-02
# Dynamic In-context Learning with Conversational Models for Data Extraction and Materials Property Prediction ( http://arxiv.org/abs/2405.10448v2 ) License: see link	Chinedu Ekuma	The advent of natural language processing and large language models (LLMs) has revolutionized the extraction of data from unstructured scholarly papers. However, ensuring data trustworthiness remains a significant challenge. In this paper, we introduce PropertyExtractor, an open-source tool that leverages advanced conversational LLMs like Google gemini-pro and OpenAI gpt-4, blends zero-shot with few-shot in-context learning, and employs engineered prompts for the dynamic refinement of structured information hierarchies, enabling autonomous, efficient, scalable, and accurate identification, extraction, and verification of material property data. Our tests on material data demonstrate precision and recall exceeding 95% with an error rate of approximately 9%, highlighting the effectiveness and versatility of the toolkit. Finally, we use PropertyExtractor to develop databases of 2D material thicknesses, a critical parameter for device integration, and of energy bandgap values. For the thickness database in particular, the rapid evolution of the field has outpaced both experimental measurements and computational methods, creating a significant data gap. 
Our work addresses this gap and showcases the potential of PropertyExtractor as a reliable and efficient tool for the autonomous generation of various material property databases, advancing the field.	Translated: 2024-08-06 22:45:03 Published: 2024-08-02
# Integrating Cognitive AI with Generative Models for Enhanced Question Answering in Skill-based Learning ( http://arxiv.org/abs/2407.19393v2 ) License: see link	Rochan H. Madhusudhana, Rahul K. Dass, Jeanette Luu, Ashok K. Goel	In online learning, the ability to provide quick and accurate feedback to learners is crucial. In skill-based learning, learners need to understand the underlying concepts and mechanisms of a skill to be able to apply it effectively. While videos are a common tool in online learning, they cannot comprehend or assess the skills being taught. Additionally, while Generative AI methods are effective at searching and retrieving answers from a text corpus, it remains unclear whether these methods exhibit any true understanding. This limits their ability to explain skills or help with problem-solving. This paper proposes a novel approach that merges Cognitive AI and Generative AI to address these challenges. We employ a structured knowledge representation, the TMK (Task-Method-Knowledge) model, to encode skills taught in an online Knowledge-based AI course. Leveraging techniques such as Large Language Models, Chain-of-Thought, and Iterative Refinement, we outline a framework for generating reasoned explanations in response to learners' questions about skills.	Translated: 2024-08-06 19:59:40 Published: 2024-08-02
# Alignment Scores: Robust Metrics for Multiview Pose Accuracy Evaluation ( http://arxiv.org/abs/2407.20391v2 ) License: see link	Seong Hun Lee, Javier Civera	We propose three novel metrics for evaluating the accuracy of a set of estimated camera poses given the ground truth: Translation Alignment Score (TAS), Rotation Alignment Score (RAS), and Pose Alignment Score (PAS). The TAS evaluates the translation accuracy independently of the rotations, and the RAS evaluates the rotation accuracy independently of the translations. The PAS is the average of the two scores, evaluating the combined accuracy of both translations and rotations. The TAS is computed in four steps: (1) Find the upper quartile, $d$, of the closest-pair distances. (2) Align the estimated trajectory to the ground truth using a robust registration method. (3) Collect all distance errors and obtain the cumulative frequencies for multiple thresholds ranging from $0.01d$ to $d$ with a resolution of $0.01d$. (4) Add up these cumulative frequencies and normalize them such that the theoretical maximum is 1. The TAS has practical advantages over the existing metrics in that (1) it is robust to outliers and collinear motion, and (2) there is no need to adjust parameters on different datasets. 
The RAS is computed in a similar manner to the TAS and is also shown to be more robust against outliers than the existing rotation metrics. We verify our claims through extensive simulations and provide an in-depth discussion of the strengths and weaknesses of the proposed metrics.	Translated: 2024-08-06 19:59:40 Published: 2024-08-02
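The four TAS steps from the Alignment Scores abstract can be sketched in plain Python as follows. This sketch makes two assumptions. First, the estimated positions are already registered to the ground truth, so step 2 is skipped. Second, "closest-pair distances" is read as each ground-truth camera's distance to its nearest neighbor. The function names are hypothetical, and this is not the authors' code.

```python
import math
from bisect import bisect_right

def upper_quartile(values):
    """75th percentile, linearly interpolated between sorted values."""
    s = sorted(values)
    pos = 0.75 * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def closest_pair_distances(points):
    """Distance from each point to its nearest other point (needs >= 2 points)."""
    return [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]

def translation_alignment_score(est, gt):
    """TAS in [0, 1]. Assumption: est is already registered to gt (step 2 not done here)."""
    d = upper_quartile(closest_pair_distances(gt))             # step 1
    errors = sorted(math.dist(e, g) for e, g in zip(est, gt))  # step 3: distance errors
    n = len(errors)
    total = 0.0
    for k in range(1, 101):                                    # thresholds 0.01d, ..., d
        total += bisect_right(errors, d * k / 100) / n         # cumulative frequency
    return total / 100                                         # step 4: maximum is 1
```

A perfect estimate scores 1. An estimate whose errors all exceed $d$ scores 0.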
# SoK: Payment Channel Networks ( http://arxiv.org/abs/2407.20968v2 ) License: see link	Kartick Kolachala, Mohammed Ababneh, Roopa Vishwanathan	Payment Channel Networks (PCNs) have been proposed as a solution to the scalability, throughput, and cost limitations associated with on-chain transactions. By facilitating off-chain execution of transactions, PCNs significantly reduce the burden on the blockchain, leading to faster transaction processing, reduced transaction fees, and enhanced privacy. Despite these advantages, PCNs still pose a variety of research challenges that require further exploration. In this paper, we survey recent work on several aspects of PCNs, such as pathfinding and routing, virtual channels, state channels, payment channel hubs, and rebalancing. This survey aims to provide the reader with a detailed understanding of the current state of the art in PCN research, highlighting a few important advancements. Additionally, we highlight various unresolved issues in the area of PCN research. Specifically, this paper seeks to answer the following crucial question: What are the interesting and non-trivial challenges in PCN research that require immediate attention from the academic and research community? 
By addressing this question, we aim to identify the most pressing problems and future research directions that interested readers can immediately work on. Through this analysis, we hope to inspire researchers and practitioners to tackle these challenges and make PCNs more secure and versatile.	Translated: 2024-08-06 19:59:40 Published: 2024-08-02
# Using a CNN Model to Assess Visual Artwork's Creativity ( http://arxiv.org/abs/2408.01481v1 ) License: see link	Zhehan Zhang, Meihua Qian, Li Luo, Ripon Saha, Qianyi Gao, Xinxin Song	Assessing artistic creativity has long challenged researchers, and traditional methods have proven time-consuming. Recent studies have applied machine learning to evaluate creativity in drawings, but not in paintings. Our research addresses this gap by developing a CNN model to automatically assess the creativity of students' paintings. Using a dataset of 600 paintings by professionals and children, our model achieved 90% accuracy and faster evaluation times than human raters. This approach demonstrates the potential of machine learning for advancing artistic creativity assessment, offering a more efficient alternative to traditional methods.	Translated: 2024-08-06 19:49:47 Published: 2024-08-02
# NeuralFactors: A Novel Factor Learning Approach to Generative Modeling of Equities ( http://arxiv.org/abs/2408.01499v1 ) License: see link	Achintya Gopal	The use of machine learning for statistical modeling (and thus, generative modeling) has grown in popularity with the proliferation of time series models, text-to-image models, and especially large language models. Fundamentally, the goal of classical factor modeling is the statistical modeling of stock returns, and in this work, we explore using deep generative modeling to enhance classical factor models. Prior work has explored the use of deep generative models to model hundreds of stocks, leading to accurate risk forecasting and alpha portfolio construction; however, that specific model does not allow for easy factor-model interpretation, in that the factor exposures cannot be deduced. In this work, we introduce NeuralFactors, a novel machine-learning-based approach to factor analysis in which a neural network outputs factor exposures and factor returns, trained using the same methodology as variational autoencoders. We show that this model outperforms prior approaches in terms of both log-likelihood performance and computational efficiency. 
Further, we show that this method is competitive with prior work in generating realistic synthetic data, covariance estimation, risk analysis (e.g., value at risk, or VaR, of portfolios), and portfolio optimization. Finally, owing to the connection to classical factor analysis, we analyze how the factors our model learns cluster together and show that the factor exposures could be used for embedding stocks.	Translated: 2024-08-06 19:49:47 Published: 2024-08-02
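The NeuralFactors abstract builds on classical factor models, which may be unfamiliar to some readers. A linear factor model writes returns as $r = Bf + \varepsilon$, so the implied covariance is $B \Sigma_f B^\top + D$, where $D$ is diagonal. The sketch below composes that covariance from given exposures and then computes a portfolio variance. It covers only the classical model and not the neural network. The function names are hypothetical.

```python
def factor_covariance(B, F, idio_var):
    """Classical linear factor model only: Sigma = B F B^T + diag(idio_var).

    B: n x k factor exposures, F: k x k factor covariance,
    idio_var: n idiosyncratic (stock-specific) variances.
    """
    n, k = len(B), len(F)
    BF = [[sum(B[i][a] * F[a][b] for a in range(k)) for b in range(k)] for i in range(n)]
    sigma = [[sum(BF[i][b] * B[j][b] for b in range(k)) for j in range(n)] for i in range(n)]
    for i in range(n):
        sigma[i][i] += idio_var[i]
    return sigma

def portfolio_variance(weights, sigma):
    """w^T Sigma w for portfolio weights w."""
    n = len(weights)
    return sum(weights[i] * sigma[i][j] * weights[j] for i in range(n) for j in range(n))
```

In NeuralFactors the exposures are produced by a neural network. In this sketch they are supplied by hand.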
# Efficient Graph Coloring with Neural Networks: A Physics-Inspired Approach for Large Graphs ( http://arxiv.org/abs/2408.01503v1 ) License: see link	Lorenzo Colantonio, Andrea Cacioppo, Federico Scarpati, Stefano Giagu	The graph coloring problem is an optimization problem that involves assigning one of $q$ colors to each vertex of a graph such that no two adjacent vertices share the same color. This problem is NP-hard and arises in various practical applications. In this work, we present a novel algorithm that leverages graph neural networks to tackle the problem efficiently, particularly for large graphs. We propose a physics-inspired approach that uses tools from statistical mechanics to improve the training and performance of the algorithm. The scaling of our method is evaluated for different connectivities and graph sizes. Finally, we demonstrate the effectiveness of our method on a dataset of Erdős–Rényi graphs, showing that it also applies in hard-to-solve connectivity regions where traditional methods struggle.	Translated: 2024-08-06 19:49:47 Published: 2024-08-02
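A common physics-inspired formulation maps $q$-coloring to the antiferromagnetic Potts model, whose energy counts monochromatic edges. A GNN that outputs a probability vector over the $q$ colors for each vertex can then be trained on a relaxed version of that energy. The abstract does not state the authors' loss. The sketch below therefore shows only this generic formulation, and its function names are hypothetical.

```python
def coloring_conflicts(edges, colors):
    """Number of edges whose endpoints share a color (0 for a proper coloring)."""
    return sum(1 for u, v in edges if colors[u] == colors[v])

def relaxed_potts_energy(edges, probs):
    """Generic relaxation (assumed, not necessarily the authors' loss): the expected
    number of conflicting edges when each vertex u independently draws its color
    from the probability vector probs[u] over the q colors."""
    return sum(sum(pu * pv for pu, pv in zip(probs[u], probs[v])) for u, v in edges)
```

To check a relaxed solution, round each vertex to its most probable color and test whether `coloring_conflicts` returns 0. If it does, the rounded assignment is a proper coloring.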
# MoDE: Effective Multi-task Parameter Efficient Fine-Tuning with a Mixture of Dyadic Experts ( http://arxiv.org/abs/2408.01505v1 ) License: see link	Lin Ning, Harsh Lara, Meiqi Guo, Abhinav Rastogi	Parameter-efficient fine-tuning techniques like Low-Rank Adaptation (LoRA) have revolutionized the adaptation of large language models (LLMs) to diverse tasks. Recent efforts have explored mixtures of LoRA modules for multi-task settings. However, our analysis reveals redundancy in the down-projection matrices of these architectures. This observation motivates our proposed method, Mixture of Dyadic Experts (MoDE), which introduces a novel design for efficient multi-task adaptation. This is done by sharing the down-projection matrix across tasks and employing atomic rank-one adapters, coupled with routers that allow more sophisticated task-level specialization. Our design allows for more fine-grained mixing, thereby increasing the model's ability to jointly handle multiple tasks. We evaluate MoDE on the Super-NaturalInstructions (SNI) benchmark, which consists of a diverse set of more than 700 tasks, and demonstrate that it outperforms state-of-the-art multi-task parameter-efficient fine-tuning (PEFT) methods without introducing additional parameters. 
Our findings contribute to a deeper understanding of parameter efficiency in multi-task LLM adaptation and provide a practical solution for deploying high-performing, lightweight models.	Translated: 2024-08-06 19:49:47 Published: 2024-08-02
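The MoDE abstract does not give the method's equations, so the following is only a hypothetical reading. A rank-$r$ LoRA update $BAx$ can be written as a sum of $r$ rank-one dyads $u_k (a_k^\top x)$, where $a_k$ is the $k$-th row of $A$ and $u_k$ is the $k$-th column of $B$. Sharing $A$ and letting a router weight each dyad yields a mixture of dyadic experts. The sketch computes that gated update for a single input. It does not model the router itself, and all names are illustrative.

```python
def dyadic_mixture_update(x, A, U, gate):
    """Hypothetical gated sum of rank-one (dyadic) experts sharing one down-projection.

    x:    input vector of length d
    A:    shared down-projection, r rows of length d (row k is a_k)
    U:    r up-projection vectors of length d_out (U[k] is u_k)
    gate: r router weights, one per dyad (router not modeled here)
    Returns sum_k gate[k] * u_k * (a_k . x).
    """
    z = [sum(a * xi for a, xi in zip(row, x)) for row in A]  # shared projection A x
    out = [0.0] * len(U[0])
    for k, zk in enumerate(z):
        for i, u in enumerate(U[k]):
            out[i] += gate[k] * zk * u
    return out
```

When every gate equals 1, the result reduces to an ordinary LoRA update $BAx$.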
# Quantum noise modeling through Reinforcement Learning ( http://arxiv.org/abs/2408.01506v1 ) License: see link	Simone Bordoni, Andrea Papaluca, Piergiorgio Buttarini, Alejandro Sopena, Stefano Giagu, Stefano Carrazza	In the current era of quantum computing, robust and efficient tools are essential to bridge the gap between simulations and quantum hardware execution. In this work, we introduce a machine learning approach to characterize the noise affecting a quantum chip and emulate it during simulations. Our algorithm leverages reinforcement learning (RL), offering more flexibility in reproducing various noise models than conventional techniques such as randomized benchmarking or heuristic noise models. The effectiveness of the RL agent has been validated through simulations and tests on real superconducting qubits. Additionally, we provide practical use-case examples for the study of well-known quantum algorithms.	Translated: 2024-08-06 19:49:47 Published: 2024-08-02
# Blockchain Economic Denial of Sustainability Attack: Exploiting Latency Optimization in Ethereum Transaction Forwarding ( http://arxiv.org/abs/2408.01508v1 ) License: see link	Taro Tsuchiya, Liyi Zhou, Kaihua Qin, Arthur Gervais, Nicolas Christin	Strategies related to the blockchain concept of Extractable Value (MEV/BEV), such as arbitrage, front-running, and back-running, create an economic incentive for network nodes to reduce latency, including by minimizing transaction validation time (transaction validation being a core feature for securing blockchain networks). A modified node that neglects to filter invalid transactions in the Ethereum P2P network introduces novel attack vectors. In this work, we formalize and evaluate a Blockchain Economic Denial of Sustainability (EDoS) attack, which can cause financial losses in traffic costs for operators of modified nodes. We 1) mathematically define the attack model, 2) identify thousands of empirical instances of a similar attack in the wild, 3) empirically measure the model parameters from our two monitoring nodes, and 4) conduct attack simulations on a local network to compare its performance with existing Denial-of-Service attacks. 
We show that an attacker can amplify network traffic at modified nodes by a factor of 3,600 and cause economic damage 13,800 times greater than the amount needed to carry out the attack. Despite these risks, aggressive latency reduction may still be profitable enough to justify the existence of modified nodes. To assess this trade-off, we 1) simulate the transaction validation process in a local network and 2) empirically measure the latency reduction by deploying our modified node in the Ethereum testnet. We conclude with a cost-benefit analysis of skipping validation and provide mitigation strategies against this attack.	Translated: 2024-08-06 19:49:47 Published: 2024-08-02
# Adaptive Planning with Generative Models under Uncertainty ( http://arxiv.org/abs/2408.01510v1 ) License: see link	Pascal Jutras-Dubé, Ruqi Zhang, Aniket Bera	Planning with generative models has emerged as an effective decision-making paradigm across a wide range of domains, including reinforcement learning and autonomous navigation. Continuous replanning at each timestep might seem intuitive because it allows decisions to be based on the most recent environmental observations. However, it incurs substantial computational cost, primarily due to the complexity of the generative model's underlying deep learning architecture. Our work addresses this challenge by introducing a simple adaptive planning policy that leverages the generative model's ability to predict long-horizon state trajectories, enabling multiple actions to be executed consecutively without immediate replanning. We propose to use the predictive uncertainty derived from a Deep Ensemble of inverse dynamics models to dynamically adjust the intervals between planning sessions. In experiments on locomotion tasks within the OpenAI Gym framework, we demonstrate that our adaptive planning policy reduces the replanning frequency to only about 10% of the steps without compromising performance. Our results underscore the potential of generative modeling as an efficient and effective tool for decision-making.	Translated: 2024-08-06 19:49:47 Published: 2024-08-02
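The sketch below shows one way to turn ensemble disagreement into a replanning trigger. The abstract says only that predictive uncertainty from a Deep Ensemble of inverse dynamics models adjusts the replanning interval. The specific rule and the threshold are therefore assumptions. Under this rule, the planner is called again when the mean per-dimension variance of the members' predicted actions exceeds a threshold.

```python
import statistics

def ensemble_disagreement(predicted_actions):
    """Mean over action dimensions of the variance across ensemble members.

    predicted_actions: one action vector per ensemble member, all the same length.
    """
    dims = len(predicted_actions[0])
    per_dim = [statistics.pvariance([a[i] for a in predicted_actions]) for i in range(dims)]
    return sum(per_dim) / dims

def should_replan(predicted_actions, threshold):
    # Assumed rule: keep executing the current plan while the inverse dynamics
    # models agree, and replan once their disagreement exceeds the threshold.
    return ensemble_disagreement(predicted_actions) > threshold
```

In a control loop, each ensemble member would predict the next action from the current state and the next planned state. The expensive generative planner would run only when `should_replan` returns `True`.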
# Relation of curvature and torsion of weighted graph states with graph properties and its studies on a quantum computer ( http://arxiv.org/abs/2408.01511v1 ) License: see link	Kh. P. Gnatenko	Quantum states of spin systems that can be represented with weighted graphs $G(V, E)$ are studied, and the geometric characteristics of these states are examined. We find that the velocity of quantum evolution is determined by the sum of the weighted degrees of the nodes in the graph constructed by squaring the weights in $G(V, E)$. The curvature depends on the sums of the weighted degrees of nodes in the graphs constructed by raising the weights in $G(V, E)$ to the second and fourth powers. It also depends on the sum of the products of the weights of edges forming squares in graph $G(V, E)$. The torsion is additionally related to the sum of the products of the weights of edges forming triangles $S_3$ in graph $G(V, E)$. The geometric properties of quantum graph states and the sum of the weighted degrees of nodes have been calculated via quantum programming on IBM's quantum computer for the case of a spin chain.	Translated: 2024-08-06 19:49:47 Published: 2024-08-02
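The graph quantities named in this abstract can be computed directly from the edge weights. The sketch below does so for a small weighted graph stored as a dictionary keyed by node pairs. The abstract does not give the prefactors of the paper's formulas for velocity, curvature, or torsion, so the sketch does not reproduce them. Counting each square once is an assumption, and the function names are hypothetical.

```python
from itertools import combinations

def weight(w, a, b):
    # Edge weight between a and b, or 0.0 if there is no edge
    return w.get((a, b), w.get((b, a), 0.0))

def weighted_degree_sum(w, power):
    # Sum over nodes of sum_j w_ij**power; every edge is counted from both endpoints
    return sum(2 * wij ** power for wij in w.values())

def triangle_product_sum(w, nodes):
    # Sum over triangles of the product of their three edge weights
    return sum(weight(w, a, b) * weight(w, b, c) * weight(w, a, c)
               for a, b, c in combinations(nodes, 3))

def square_product_sum(w, nodes):
    # Sum over 4-cycles (each counted once) of the product of their four edge weights
    total = 0.0
    for a, b, c, d in combinations(nodes, 4):
        # the three distinct cycles through four given nodes
        for p, q, r, s in ((a, b, c, d), (a, b, d, c), (a, c, b, d)):
            total += weight(w, p, q) * weight(w, q, r) * weight(w, r, s) * weight(w, s, p)
    return total
```

For a spin chain, as in the paper's IBM experiments, there are no triangles or squares, so only the weighted-degree sums are nonzero.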
# Gibbs Sampling gives Quantum Advantage at Constant Temperatures with $O(1)$-Local Hamiltonians ( http://arxiv.org/abs/2408.01516v1 ) License: see link	Joel Rajakumar, James D. Watson	Sampling from Gibbs states (states corresponding to a system in thermal equilibrium) has recently been shown to be a task for which quantum computers are expected to achieve a super-polynomial speed-up over classical computers, provided the locality of the Hamiltonian increases with the system size (Bergamaschi et al., arXiv: 2404.14639). We extend these results to show that this quantum advantage still occurs for Gibbs states of Hamiltonians with $O(1)$-local interactions at constant temperature. We do so by showing classical hardness of sampling and by demonstrating that such Gibbs states can be prepared efficiently on a quantum computer. In particular, we show that hardness of sampling is maintained even for 5-local Hamiltonians on a 3D lattice. We additionally show that the hardness of sampling is robust when we are only able to make imperfect measurements. Beyond these hardness results, we present a lower bound, in terms of the maximum degree of the Hamiltonian's interaction graph, on the temperatures at which Gibbs states become easy to sample from classically.	Translated: 2024-08-06 19:49:47 Published: 2024-08-02
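For reference, the Gibbs (thermal) state of a Hamiltonian $H$ at inverse temperature $\beta$ has the standard form below. "$O(1)$-local" means that $H$ is a sum of terms, each acting nontrivially on a bounded number of qubits. "Constant temperature" means that $\beta$ does not grow with the number of qubits.

$$
\rho_\beta = \frac{e^{-\beta H}}{\operatorname{Tr} e^{-\beta H}}, \qquad H = \sum_j h_j, \quad \lvert \operatorname{supp}(h_j) \rvert = O(1), \qquad \beta = \frac{1}{k_B T} = O(1).
$$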
# Gradient flow in parameter space is equivalent to linear interpolation in output space ( http://arxiv.org/abs/2408.01517v1 ) License: see link	Thomas Chen, Patrícia Muñoz Ewald	We prove that the usual gradient flow in parameter space that underlies many training algorithms for neural networks in deep learning can be continuously deformed into an adapted gradient flow which yields (constrained) Euclidean gradient flow in output space. Moreover, if the Jacobian of the outputs with respect to the parameters is full rank (for fixed training data), then the time variable can be reparametrized so that the resulting flow is simply linear interpolation, and a global minimum can be achieved.	Translated: 2024-08-06 19:49:47 Published: 2024-08-02
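A short worked derivation may make the claim in this abstract concrete. It rests on three choices made here for illustration, which may differ from the paper's construction. First, it assumes a quadratic loss $C = \tfrac12 \lVert x - y \rVert^2$ on the outputs $x = x(\theta)$ with target $y$. Second, it writes $J = \partial x / \partial \theta$. Third, it uses a pseudo-inverse construction for the adapted flow.

$$
\begin{aligned}
\dot\theta &= -\nabla_\theta C = -J^\top (x - y) &&\Longrightarrow\quad \dot x = J\dot\theta = -JJ^\top (x - y),\\
\dot\theta &= -J^\top (JJ^\top)^{-1} (x - y) &&\Longrightarrow\quad \dot x = -(x - y) \qquad (J \text{ of full row rank}),\\
x(t) &= y + e^{-t}\bigl(x(0) - y\bigr), && s := 1 - e^{-t} \in [0, 1),\\
x &= (1 - s)\,x(0) + s\,y. &&
\end{aligned}
$$

The first line is ordinary gradient flow, whose output dynamics are distorted by $JJ^\top$. The second line removes that distortion. The last two lines show that, with the reparametrized time $s$, the resulting Euclidean flow becomes straight-line interpolation toward $y$, which is reached as $s \to 1$.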
# Multi-Unit Floor Plan Recognition and Reconstruction Using Improved Semantic Segmentation of Raster-Wise Floor Plans ( http://arxiv.org/abs/2408.01526v1 ) License: see link	Lukas Kratochvila, Gijs de Jong, Monique Arkesteijn, Simon Bilik, Tomas Zemcik, Karel Horak, Jan S. Rellermeyer	Digital twins have major potential to form a significant part of urban management in emergency planning, as they allow more efficient design of escape routes, better orientation in exceptional situations, and faster rescue intervention. Nevertheless, creating the twins remains a largely manual effort due to a lack of 3D representations, which are available only in limited amounts for some new buildings. Thus, in this paper we aim to synthesize 3D information from commonly available 2D architectural floor plans. We propose two novel pixel-wise segmentation methods based on the MDA-Unet and MACU-Net architectures, with improved skip connections, an attention mechanism, and a training objective. The pipeline also includes a reconstruction stage that vectorizes the segmented plans to create a 3D model. The proposed methods are compared with two other state-of-the-art techniques on several benchmark datasets. On the commonly used CubiCasa benchmark dataset, our methods achieved a mean F1 score of 0.86 over the five examined classes, outperforming the other pixel-wise approaches tested. We have also made our code publicly available to support research in the field.	Translated: 2024-08-06 19:49:47 Published: 2024-08-02
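The mean F1 score reported in this abstract averages per-class F1 over the examined classes. The sketch below computes it from flattened pixel labels. The abstract does not say how the score is aggregated, so two choices here are assumptions. One is pooling all pixels of the evaluation set rather than averaging per image. The other is scoring a class that is absent everywhere as 0.

```python
def per_class_f1(pred, target, classes):
    """F1 = 2TP / (2TP + FP + FN) for each class, from flat lists of pixel labels
    (assumption: all pixels of the evaluation set are pooled)."""
    scores = {}
    for c in classes:
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0  # class absent everywhere: scored 0 here
    return scores

def mean_f1(pred, target, classes):
    scores = per_class_f1(pred, target, classes)
    return sum(scores.values()) / len(scores)
```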
# Analyzing LLMs' Capabilities to Establish Implicit User Sentiment of Software Desirability ( http://arxiv.org/abs/2408.01527v1 ) License: see link	Sherri Weitl-Harms, John D. Hastings, Jonah Lum	This study explores the use of several LLMs for quantitative zero-shot sentiment analysis of implicit software desirability expressed by users. Unlike other methods, which simply classify sentiment as positive, neutral, or negative, the study provides scaled numerical sentiment analysis. Numerical analysis provides deeper insight into the magnitude of sentiment, which can drive better decisions regarding product desirability. Data is collected with the Microsoft Product Desirability Toolkit (PDT), a well-known qualitative user experience analysis tool. For initial exploration, the PDT metric was given to users of ZORQ, a gamification system used in undergraduate computer science education. For quantitative sentiment analysis, the collected PDT data was fed through several LLMs (Claude Sonnet 3 and 3.5, GPT4, and GPT4o). It was also fed through a leading transfer learning technique, Twitter-Roberta-Base-Sentiment (TRBS), and through Vader, a leading sentiment analysis tool. 
Each system was asked to evaluate the data in two ways. First, it looked at the sentiment expressed in the PDT word/explanation pairs. Second, it looked at the sentiment expressed by the users in their grouped selection of five words and explanations as a whole. Each LLM was also asked to provide its confidence (low, medium, high) in its sentiment score, along with an explanation of why it selected the sentiment value. All LLMs tested were able to statistically detect user sentiment from the users' grouped data, whereas TRBS and Vader were not. The confidence and explanations of confidence provided by the LLMs helped in understanding the user sentiment. This study contributes to a deeper understanding of evaluating user experiences, toward the goal of creating a universal tool that quantifies expressed implicit sentiment.	Translated: 2024-08-06 19:49:47 Published: 2024-08-02
# Can multivariate Granger causality detect directed connectivity of a multistable and dynamic biological decision network model? ( http://arxiv.org/abs/2408.01528v1 ) License: see link	Abdoreza Asadpour, KongFatt Wong-Lin	Extracting causal connections can advance interpretable AI and machine learning. Granger causality (GC) is a robust statistical method for estimating directed influences, or directed connectivity (DC), between signals. While GC has been widely applied to analysing neuronal signals in biological neural networks and other domains, its application to complex, nonlinear, and multistable neural networks is less explored. In this study, we applied time-domain multivariate Granger causality (MVGC) to the time-series neural activity of all nodes in a trained, multistable, biologically based decision neural network model with real-time decision uncertainty monitoring. Our analysis demonstrated that the original model's DC could readily be revealed by combining two conditions. The first was challenging two-choice decisions, where input signals could be closely matched. The second was the appropriate application of fine-grained sliding time windows. Furthermore, the identified DC varied depending on whether the network made correct or erroneous decisions. Integrating the identified DC from different decision outcomes recovered most of the original model's architecture, despite some spurious and missing connectivity. 
This approach could be used as an initial exploration to enhance the interpretability and transparency of dynamic multistable and nonlinear biological or AI systems by revealing causal connections throughout different phases of neural network dynamics and outcomes.	翻訳日:2024-08-06 19:40:03 公開日:2024-08-02
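To make the core quantity concrete, here is a minimal sketch of time-domain Granger causality, assuming a lag-1 linear model and only two signals. GC from x to y is the log ratio of the residual variance of the restricted model (y's own past) to that of the full model (y's and x's past), evaluated here over sliding windows. A full MVGC analysis, as in the paper, would also condition on the pasts of all other nodes and select the model order; the function names, lag, window sizes, and toy system below are illustrative assumptions.

```python
import math
import random


def _ols_residual_var(y, regressors):
    """Residual variance of y regressed (no intercept) on 1 or 2 regressors."""
    if len(regressors) == 1:
        a = regressors[0]
        beta = sum(ai * yi for ai, yi in zip(a, y)) / sum(ai * ai for ai in a)
        resid = [yi - beta * ai for ai, yi in zip(a, y)]
    else:
        a, b = regressors
        saa = sum(v * v for v in a)
        sbb = sum(v * v for v in b)
        sab = sum(u * v for u, v in zip(a, b))
        say = sum(u * v for u, v in zip(a, y))
        sby = sum(u * v for u, v in zip(b, y))
        det = saa * sbb - sab * sab
        # Closed-form solution of the 2x2 normal equations
        beta_a = (sbb * say - sab * sby) / det
        beta_b = (saa * sby - sab * say) / det
        resid = [yi - beta_a * ai - beta_b * bi for ai, bi, yi in zip(a, b, y)]
    return sum(r * r for r in resid) / len(resid)


def granger_lag1(x, y):
    """Lag-1 Granger causality x -> y: ln(var_restricted / var_full)."""
    target = y[1:]
    y_past, x_past = y[:-1], x[:-1]
    var_restricted = _ols_residual_var(target, [y_past])
    var_full = _ols_residual_var(target, [y_past, x_past])
    return math.log(var_restricted / var_full)


def sliding_granger(x, y, window, step):
    """GC x -> y evaluated in successive (possibly overlapping) windows."""
    return [granger_lag1(x[s:s + window], y[s:s + window])
            for s in range(0, len(x) - window + 1, step)]


# Toy system: x drives y with a one-step delay; y does not drive x.
random.seed(0)
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(0.5 * y[-1] + 0.8 * x[t - 1] + 0.2 * random.gauss(0, 1))

gc_xy = granger_lag1(x, y)
gc_yx = granger_lag1(y, x)
windows = sliding_granger(x, y, window=200, step=100)
print(f"GC x->y = {gc_xy:.3f}, GC y->x = {gc_yx:.3f}, windows = {len(windows)}")
```

The asymmetry (large x->y, near-zero y->x) is what a directed-connectivity estimate reads off; shrinking the window trades temporal resolution for estimation noise.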
# A Structured Framework for Predicting Sustainable Aviation Fuel Properties using Liquid-Phase FTIR and Machine Learning ( http://arxiv.org/abs/2408.01530v1 ) License: see link	Ana E. Comesana, Sharon S. Chen, Kyle E. Niemeyer, Vi H. Rapp,	Sustainable aviation fuels have the potential to reduce emissions and environmental impact. To help identify viable sustainable aviation fuels and accelerate research, several machine learning models have been developed to predict relevant physicochemical properties. However, many of these models have limited applicability, leverage data from complex analytical techniques with confined spectral ranges, or use feature decomposition methods with limited interpretability. Using liquid-phase Fourier Transform Infrared (FTIR) spectra, this study presents a structured method for creating accurate and interpretable property prediction models for neat molecules, aviation fuels, and blends. Liquid-phase FTIR spectra can be measured quickly and consistently, offering high reliability, sensitivity, and component specificity while using less than 2 mL of sample. The method first decomposes FTIR spectra into fundamental building blocks using Non-negative Matrix Factorization (NMF) to enable scientific analysis of FTIR spectral attributes and fuel properties.
The NMF features are then used to create five ensemble models for predicting final boiling point, flash point, freezing point, density at 15 °C, and kinematic viscosity at -20 °C. All models were trained using experimental property data from neat molecules, aviation fuels, and blends. The models accurately predict properties while enabling interpretation of the relationships between a fuel's compositional elements, such as functional groups or chemical classes, and its properties. To support sustainable aviation fuel research and development, the models and data are available in an interactive web tool.	Translated: 2024-08-06 19:40:03 Published: 2024-08-02
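A minimal sketch of the decomposition step, assuming the classic Lee-Seung multiplicative-update NMF applied to a toy matrix of synthetic "spectra" (rows are samples, columns are wavenumber bins). The abstract does not name the NMF solver, so the update rule, the rank k = 2, and the Gaussian toy bands are illustrative assumptions. In this setup the rows of H act as non-negative basis spectra, and the rows of W hold each sample's weights, which are the kind of features a downstream property model would consume.

```python
import math
import random


def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    bt = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def transpose(a):
    return [list(r) for r in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iters=200, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates: V (n x m) ~= W (n x k) @ H (k x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


# Toy "spectra": four samples mixing two absorption bands
grid = [i / 49 for i in range(50)]
band_a = [math.exp(-((g - 0.3) / 0.05) ** 2) for g in grid]
band_b = [math.exp(-((g - 0.7) / 0.08) ** 2) for g in grid]
mix = [(0.9, 0.1), (0.5, 0.5), (0.2, 0.8), (0.7, 0.3)]
spectra = [[c1 * a + c2 * b for a, b in zip(band_a, band_b)] for c1, c2 in mix]

w0, h0 = nmf(spectra, k=2, iters=0)      # random initialization only
w, h = nmf(spectra, k=2, iters=300)      # same seed, after updates
err_before = frobenius_error(spectra, w0, h0)
err_after = frobenius_error(spectra, w, h)
print(round(err_before, 4), round(err_after, 6))
```

Multiplicative updates keep every entry non-negative, which is what makes the learned components readable as additive spectral building blocks; NMF solutions are not unique, so the basis rows need not match the toy bands exactly.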
# Contextual Cross-Modal Attention for Audio-Visual Deepfake Detection and Localization ( http://arxiv.org/abs/2408.01532v1 ) License: see link	Vinaya Sree Katamneni, Ajita Rattani,	In the digital age, the emergence of deepfakes and synthetic media presents a significant threat to societal and political integrity. Deepfakes based on multi-modal manipulation, such as audio-visual manipulation, are more realistic and pose a greater threat. Current multi-modal deepfake detectors are often based on attention-based fusion of heterogeneous data streams from multiple modalities. However, the heterogeneous nature of the data (such as audio and visual signals) creates a distributional modality gap, which poses a significant challenge to effective fusion and hence to multi-modal deepfake detection. In this paper, we propose a novel multi-modal attention framework based on recurrent neural networks (RNNs) that leverages contextual information for audio-visual deepfake detection. The proposed approach applies attention to multi-modal, multi-sequence representations and learns the contributing features among them for deepfake detection and localization. Thorough experimental validation on the audio-visual deepfake datasets FakeAVCeleb, AV-Deepfake1M, TVIL, and LAV-DF demonstrates the efficacy of our approach.
Cross-comparison with published studies demonstrates the superior performance of our approach, with improvements in accuracy and precision of 3.47% and 2.05% in deepfake detection and localization, respectively, thus achieving state-of-the-art performance. To facilitate reproducibility, the code and dataset information are available at https://github.com/vcbsl/audiovisual-deepfake/.	Translated: 2024-08-06 19:40:03 Published: 2024-08-02
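The fusion idea can be illustrated with generic scaled dot-product cross-modal attention, in which each frame of one modality attends over all frames of the other. This is a sketch under stated assumptions: it omits the learned query/key/value projections and the RNN-based contextual encoders the paper describes, and the toy 2-d frame features are made up.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted sum of the value vectors
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights


# Toy sequences: 3 audio frames and 4 video frames with 2-d features (hypothetical)
audio = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
video = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.0], [0.0, 0.0]]

fused_av, attn_av = cross_attention(audio, video, video)   # audio attends to video
fused_va, attn_va = cross_attention(video, audio, audio)   # video attends to audio
print([[round(w, 3) for w in row] for row in attn_av])
```

Running attention in both directions yields audio-conditioned video features and video-conditioned audio features; a mismatch between them is the kind of cue a detector could learn from.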
# An Adaptive Tensor-Train Decomposition Approach for Efficient Deep Neural Network Compression ( http://arxiv.org/abs/2408.01534v1 ) License: see link	Shiyi Luo, Mingshuo Liu, Pu Sun, Yifeng Yu, Shangping Ren, Yu Bai,	In the field of model compression, choosing an appropriate rank for tensor decomposition is pivotal for balancing model compression rate and efficiency. However, this selection, whether done manually or through optimization-based automatic methods, often increases computational complexity. Manual rank selection lacks efficiency and scalability and often requires extensive trial and error, while optimization-based automatic methods significantly increase the computational burden. To address this, we introduce a novel, automatic, and budget-aware rank selection method for efficient model compression, which employs Layer-Wise Imprinting Quantitation (LWIQ). LWIQ quantifies each layer's significance within a neural network by integrating a proxy classifier. This classifier assesses the layer's impact on overall model performance, allowing a more informed adjustment of tensor rank. Furthermore, our approach includes a scaling factor to accommodate varying computational budget constraints. This budget awareness eliminates the need for repeated rank recalculation for different budget scenarios.
Experimental results on the CIFAR-10 dataset show that, compared with the state-of-the-art proxy-based automatic tensor rank selection method, LWIQ improves rank search efficiency by 63.2%, while accuracy drops by only 0.86% with a 3.2x smaller ResNet-56 model.	Translated: 2024-08-06 19:40:03 Published: 2024-08-02
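To make the rank/compression trade-off concrete, the helper below counts the parameters of a tensor train: a d-way tensor with TT-ranks r_0 = 1, r_1, ..., r_{d-1}, r_d = 1 stores cores of size r_{k-1} x n_k x r_k. The 1024x1024 layer reshaped to 32x32x32x32, the uniform ranks, and the `scaled_ranks` budget knob are hypothetical; LWIQ's actual per-layer rank rule and scaling factor are not specified in this abstract.

```python
from math import prod


def tt_param_count(shape, ranks):
    """Parameters in a tensor train with cores of size r_{k-1} x n_k x r_k.

    `ranks` are the internal TT-ranks r_1..r_{d-1}; boundary ranks r_0 = r_d = 1.
    """
    if len(ranks) != len(shape) - 1:
        raise ValueError("need d-1 internal ranks for a d-way tensor")
    full_ranks = [1, *ranks, 1]
    return sum(full_ranks[k] * n * full_ranks[k + 1] for k, n in enumerate(shape))


def scaled_ranks(ranks, scale):
    """Hypothetical budget knob: shrink every rank by `scale`, keeping ranks >= 1."""
    return [max(1, round(r * scale)) for r in ranks]


# A 1024x1024 weight matrix reshaped into a 32x32x32x32 tensor (illustrative)
shape = (32, 32, 32, 32)
dense = prod(shape)
ranks = [8, 8, 8]
tt = tt_param_count(shape, ranks)
tt_half = tt_param_count(shape, scaled_ranks(ranks, 0.5))
print(dense, tt, round(dense / tt, 1), tt_half)
```

Because the middle cores dominate the count (they scale with r_{k-1} * r_k), halving all ranks cuts parameters by more than half, which is why per-layer rank choice matters so much for accuracy.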
# Active Learning for Neural PDE Solvers ( http://arxiv.org/abs/2408.01536v1 ) License: see link	Daniel Musekamp, Marimuthu Kalimuthu, David Holzmüller, Makoto Takamoto, Mathias Niepert,	Solving partial differential equations (PDEs) is a fundamental problem in engineering and science. While neural PDE solvers can be more efficient than established numerical solvers, they often require large amounts of training data that are costly to obtain. Active Learning (AL) could help surrogate models reach the same accuracy with smaller training sets by querying classical solvers with more informative initial conditions and PDE parameters. While AL is more common in other domains, it has yet to be studied extensively for neural PDE solvers. To bridge this gap, we introduce AL4PDE, a modular and extensible active learning benchmark. It provides multiple parametric PDEs and state-of-the-art surrogate models for the solver-in-the-loop setting, enabling the evaluation of existing AL methods and the development of new ones for PDE solving. We use the benchmark to evaluate batch active learning algorithms such as uncertainty- and feature-based methods. We show that AL reduces the average error by up to 71% compared to random sampling and significantly reduces worst-case errors.
Moreover, AL generates similar datasets across repeated runs, with consistent distributions over the PDE parameters and initial conditions. The acquired datasets are reusable, benefiting surrogate models that were not involved in the data generation.	Translated: 2024-08-06 19:40:03 Published: 2024-08-02
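The simplest uncertainty-based acquisition can be sketched as ensemble disagreement: score each candidate PDE parameter by the variance of several surrogates' predictions and send the top-k to the classical solver. This is an assumption-laden illustration, not AL4PDE's API; the candidate viscosities and prediction values are made up, and the batch methods the paper evaluates may also enforce diversity within a batch, which plain top-k does not.

```python
import statistics


def ensemble_variance(predictions):
    """Per-candidate variance across ensemble members.

    `predictions[m][i]` is member m's scalar prediction for candidate i.
    """
    return [statistics.pvariance(col) for col in zip(*predictions)]


def select_batch(candidates, predictions, batch_size):
    """Pick the `batch_size` candidates the ensemble disagrees on most."""
    scores = ensemble_variance(predictions)
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in ranked[:batch_size]]


# Hypothetical candidate PDE parameters (e.g. viscosities) and 3 surrogate outputs
candidates = [0.001, 0.01, 0.1, 1.0, 10.0]
predictions = [
    [0.50, 0.40, 0.30, 0.20, 0.10],
    [0.52, 0.10, 0.31, 0.60, 0.11],
    [0.49, 0.70, 0.29, 0.90, 0.10],
]
batch = select_batch(candidates, predictions, batch_size=2)
print(batch)  # these parameters would be simulated next and added to training
```

In a full loop, the solver outputs for `batch` are appended to the training set, the ensemble is retrained, and selection repeats until the simulation budget is spent.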
# SceneMotion: From Agent-Centric Embeddings to Scene-Wide Forecasts ( http://arxiv.org/abs/2408.01537v1 ) License: see link	Royden Wagner, Ömer Sahin Tas, Marlon Steiner, Fabian Konstantinidis, Hendrik Königshof, Marvin Klemp, Carlos Fernandez, Christoph Stiller,	Self-driving vehicles rely on multimodal motion forecasts to interact effectively with their environment and plan safe maneuvers. We introduce SceneMotion, an attention-based model for forecasting scene-wide motion modes of multiple traffic agents. Our model transforms local agent-centric embeddings into scene-wide forecasts using a novel latent context module. This module learns a scene-wide latent space from multiple agent-centric embeddings, enabling joint forecasting and interaction modeling. Competitive performance in the Waymo Open Interaction Prediction Challenge demonstrates the effectiveness of our approach. Moreover, we cluster future waypoints in time and space to quantify the interaction between agents. We merge all modes and analyze each mode independently to determine which clusters are resolved through interaction or result in conflict. Our implementation is available at: https://github.com/kit-mrt/future-motion	Translated: 2024-08-06 19:40:03 Published: 2024-08-02
# 3D $\mathcal{N}=1$ supergravity from Virasoro TQFT: Gravitational partition function and Out-of-time-order correlator ( http://arxiv.org/abs/2408.01538v1 ) License: see link	Arpan Bhattacharyya, Saptaswa Ghosh, Poulami Nandi, Sounak Pal,	In this paper, we compute the partition functions of $\mathcal{N}=1$ SUGRA for different boundary topologies, namely the sphere and the torus, using super-Virasoro TQFT. We use the fusion and modular kernels of super-Liouville theory to compute the necklace-channel conformal block, and we showcase the formalism by proving that the inner product holds for superconformal blocks, defined as states in the Hilbert space. Finally, we compute the out-of-time-order correlator for the torus topology with superconformal primary insertions as matter, using the tools of super-Virasoro TQFT, and investigate its early-time behaviour.	Translated: 2024-08-06 19:40:03 Published: 2024-08-02
# Guardians of Image Quality: Benchmarking Defenses Against Adversarial Attacks on Image Quality Metrics ( http://arxiv.org/abs/2408.01541v1 ) License: see link	Alexander Gushchin, Khaled Abud, Georgii Bychkov, Ekaterina Shumitskaya, Anna Chistyakova, Sergey Lavrushkin, Bader Rasheed, Kirill Malyshev, Dmitriy Vatolin, Anastasia Antsiferova,	In the field of Image Quality Assessment (IQA), the adversarial robustness of metrics is a critical concern. This paper presents a comprehensive benchmarking study of defense mechanisms in response to the rise in adversarial attacks on IQA. We systematically evaluate 25 defense strategies, including adversarial purification, adversarial training, and certified robustness methods. We applied 14 adversarial attack algorithms of various types, in both non-adaptive and adaptive settings, and tested these defenses against them. We analyze the differences between defenses and their applicability to IQA tasks, considering that they should preserve IQA scores and image quality. The proposed benchmark aims to guide future developments and accepts submissions of new methods, with the latest results available online: https://videoprocessing.ai/benchmarks/iqa-defenses.html.	Translated: 2024-08-06 19:40:03 Published: 2024-08-02
# Non-linear Analysis Based ECG Classification of Cardiovascular Disorders ( http://arxiv.org/abs/2408.01542v1 ) License: see link	Suraj Kumar Behera, Debanjali Bhattacharya, Ninad Aithal, Neelam Sinha,	Multi-channel ECG-based detection of cardiac disorders has an impact on cardiac care and treatment. Limitations of existing methods include variation in ECG waveforms due to electrode location, high non-linearity in the signal, and amplitudes measured in millivolts. The present study reports a non-linear analysis-based methodology that uses Recurrence plot visualization. The patterned occurrence of well-defined structures, such as the QRS complex, can be exploited effectively using Recurrence plots. This Recurrence-based method is applied to the publicly available Physikalisch-Technische Bundesanstalt (PTB) dataset from the PhysioNet database, where we studied four classes of cardiac disorders (myocardial infarction, bundle branch block, cardiomyopathy, and dysrhythmia) and healthy controls, achieving a classification accuracy of 100%. Additionally, t-SNE visualizations of the latent-space embeddings derived from Recurrence plots and Recurrence Quantification Analysis features reveal a clear demarcation between the considered cardiac disorders and healthy individuals, demonstrating the potential of this approach.	Translated: 2024-08-06 19:40:03 Published: 2024-08-02
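A recurrence plot marks pairs of time points whose (delay-embedded) states lie within a threshold eps of each other, so periodic structure such as repeated QRS complexes shows up as diagonal lines. The sketch below builds the binary recurrence matrix and one basic RQA feature, the recurrence rate. The abstract does not give the embedding dimension, delay, threshold, or RQA features used, so `dim`, `delay`, `eps`, and the sine toy signal are assumptions; for ECG, `signal` would be one lead's samples.

```python
import math


def recurrence_matrix(signal, eps, dim=1, delay=1):
    """Binary recurrence matrix R[i][j] = 1 if ||x_i - x_j|| <= eps.

    States x_i are delay-embedded vectors (signal[i], signal[i+delay], ...).
    """
    n = len(signal) - (dim - 1) * delay
    states = [[signal[i + k * delay] for k in range(dim)] for i in range(n)]
    return [[1 if math.dist(states[i], states[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]


def recurrence_rate(r):
    """RQA recurrence rate: fraction of recurrent points in the plot."""
    n = len(r)
    return sum(map(sum, r)) / (n * n)


# A periodic toy signal recurs every `period` samples (diagonal lines in R)
period = 20
signal = [math.sin(2 * math.pi * t / period) for t in range(100)]
r = recurrence_matrix(signal, eps=0.1, dim=2, delay=5)
print(len(r), round(recurrence_rate(r), 4))
```

Treating `r` as a 2-D image is what lets standard image classifiers consume it, while RQA statistics such as the recurrence rate give compact hand-crafted features.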
# Momentum Capture and Prediction System Based on Wimbledon Open2023 Tournament Data ( http://arxiv.org/abs/2408.01544v1 ) License: see link	Chang Liu, Tongyuan Yang, Yan Zhao,	There is a hidden energy in tennis that cannot be seen or touched. It is the force that controls the flow of the game, and it is present in all types of matches. This mysterious force is momentum. This study introduces an evaluation model that combines the Entropy Weight Method (EWM) and Gray Relational Analysis (GRA) to quantify momentum's impact on match outcomes. Empirical validation was conducted with Mann-Whitney U and Kolmogorov-Smirnov tests, which yielded p-values of 0.0043 and 0.00128, respectively. These results underscore the non-random association between momentum shifts and match outcomes, highlighting the critical role of momentum in tennis. Furthermore, our investigation focuses on creating a predictive model that combines the machine learning algorithm XGBoost with the SHAP framework. This model predicts match swings with exceptional accuracy (0.999013 for multiple matches and 0.992738 for finals).
The model's ability to identify the influence of specific factors on match dynamics, such as the distance run by both players during points, demonstrates its strength. The model's generalizability was thoroughly evaluated using datasets from the four Grand Slam tournaments. The results demonstrate its adaptability to different match scenarios, despite minor variations in predictive accuracy. It offers strategic insights that can help players respond effectively to opponents' shifts in momentum, enhancing their competitive edge.	Translated: 2024-08-06 19:40:03 Published: 2024-08-02
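The EWM + GRA pairing can be sketched as follows: EWM gives more weight to indicators whose normalized values are more dispersed (lower entropy), and GRA scores each observation by how close it is to an ideal reference row, combining per-indicator relational coefficients with those weights. The paper's indicator set is not given here, so the toy matrix (rows are points, columns are hypothetical momentum indicators), min-max normalization, the "higher is better" convention, and the common default distinguishing coefficient rho = 0.5 are assumptions.

```python
import math


def minmax_normalize(matrix):
    """Scale each indicator (column) to [0, 1]; higher is assumed better."""
    cols = list(zip(*matrix))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(x - l) / (h - l) if h > l else 1.0 for x, l, h in zip(row, lo, hi)]
            for row in matrix]


def entropy_weights(norm):
    """Entropy Weight Method: more dispersed indicators get more weight."""
    m = len(norm)
    weights = []
    for col in zip(*norm):
        total = sum(col)
        ps = [x / total for x in col] if total > 0 else [1 / m] * m
        e = -sum(p * math.log(p) for p in ps if p > 0) / math.log(m)
        weights.append(1 - e)
    s = sum(weights)
    return [w / s for w in weights]


def grey_relational_grades(norm, weights, rho=0.5):
    """Weighted grey relational grade of each row against the ideal row (all 1s)."""
    deltas = [[abs(1.0 - x) for x in row] for row in norm]
    d_min = min(min(r) for r in deltas)
    d_max = max(max(r) for r in deltas)
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(w * c for w, c in zip(weights, coeffs)))
    return grades


# Hypothetical indicators per point: recent points won, serve win rate, aces
data = [
    [3, 0.60, 1],
    [5, 0.75, 2],
    [1, 0.50, 0],
    [4, 0.70, 3],
]
norm = minmax_normalize(data)
weights = entropy_weights(norm)
grades = grey_relational_grades(norm, weights)
ranking = sorted(range(len(grades)), key=lambda i: grades[i], reverse=True)
print([round(w, 3) for w in weights], [round(g, 3) for g in grades], ranking)
```

A higher grade means the point sits closer to the ideal on the weighted indicators, which is one way to turn several noisy signals into a single momentum-like score per point.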
# Operator space fragmentation in perturbed Floquet-Clifford circuits ( http://arxiv.org/abs/2408.01545v1 ) License: see link	Marcell D. Kovács, Christopher J. Turner, Lluis Masanes, Arijeet Pal,	Floquet quantum circuits are able to realise a wide range of non-equilibrium quantum states, exhibiting quantum chaos, topological order, and localisation. In this work, we investigate the stability of operator localisation and the emergence of chaos in random Floquet-Clifford circuits subjected to unitary perturbations that drive them away from the Clifford limit. We construct a nearest-neighbour Clifford circuit with a brickwork pattern and study the effect of including disordered non-Clifford gates. The perturbations are uniformly sampled from single-qubit unitaries and applied with probability $p$ on each qubit. We show that the interacting model exhibits strong localisation of operators for $0 \le p < 1$, characterised by the fragmentation of operator space into disjoint sectors due to the appearance of wall configurations. Such walls give rise to emergent local integrals of motion for the circuit, which we construct exactly. We analytically establish the stability of localisation against generic perturbations and calculate the average length of operator spreading, which is tunable by $p$.
Although our circuit is not separable across any bipartition, we further show that operator localisation leads to an entanglement bottleneck, where initially unentangled states remain weakly entangled across typical fragment boundaries. Finally, we study the spectral form factor (SFF) to characterise the chaotic properties of the operator fragments and spectral fluctuations as a probe of non-ergodicity. In the $p = 1$ model, a fragmentation time scale emerges before random matrix theory sets in, after which the SFF can be approximated by that of the circular unitary ensemble. Our work provides an explicit description of quantum phases in operator dynamics and circuit ergodicity that can be realised on current NISQ devices.	Translated: 2024-08-06 19:40:03 Published: 2024-08-02
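For reference, the spectral form factor of a Floquet unitary U with eigenvalues exp(-i theta_n) is K(t) = <|Tr U^t|^2> = <|sum_n exp(-i theta_n t)|^2>, averaged over disorder realisations, with t counting Floquet periods. The helper below computes it from eigenphases. Obtaining the eigenphases of the perturbed Floquet-Clifford unitary requires diagonalising a 2^L x 2^L matrix, which is out of scope here, so the inputs are two textbook limits chosen as assumptions: uncorrelated (Poisson) phases, for which K(t) is about N at t > 0, and a rigid "picket-fence" spectrum, for which K vanishes until t = N. For comparison, the circular unitary ensemble gives K(t) = min(t, N) for t >= 1.

```python
import cmath
import math
import random


def spectral_form_factor(eigenphase_sets, t):
    """Ensemble-averaged SFF K(t) = <|sum_n exp(-i theta_n t)|^2>.

    Each entry of `eigenphase_sets` holds the eigenphases theta_n of one
    Floquet unitary U (eigenvalues exp(-i theta_n)); t counts Floquet periods.
    """
    total = 0.0
    for phases in eigenphase_sets:
        trace = sum(cmath.exp(-1j * theta * t) for theta in phases)
        total += abs(trace) ** 2
    return total / len(eigenphase_sets)


N = 8
picket = [[2 * math.pi * n / N for n in range(N)]]            # rigid spectrum
rng = random.Random(1)
poisson = [[rng.uniform(0, 2 * math.pi) for _ in range(N)]     # uncorrelated
           for _ in range(2000)]

picket_sff = [spectral_form_factor(picket, t) for t in range(N + 1)]
poisson_k5 = spectral_form_factor(poisson, 5)
print([round(k, 6) for k in picket_sff], round(poisson_k5, 2))
```

Comparing a measured K(t) against these reference curves is how deviations from random-matrix behaviour, such as the fragmentation time scale mentioned above, become visible.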
# Trainable Pointwise Decoder Module for Point Cloud Segmentation ( http://arxiv.org/abs/2408.01548v1 ) License: see link	Bike Chen, Chen Gong, Antti Tikanmäki, Juha Röning,	Point cloud segmentation (PCS) aims to make per-point predictions and enables robots and autonomous cars to understand their environment. The range image is a dense representation of a large-scale outdoor point cloud, and segmentation models built upon it commonly run efficiently. However, projecting the point cloud onto the range image inevitably drops points because, at each image coordinate, only one point is kept even when multiple points are projected onto the same location. More importantly, it is challenging to assign correct predictions to dropped points that belong to classes different from that of the kept point. Besides, existing post-processing methods, such as K-nearest neighbor (KNN) search and kernel point convolution (KPConv), either cannot be trained end-to-end with the models or cannot handle varying-density outdoor point clouds well, leaving the models with sub-optimal performance.
To alleviate this problem, we propose a trainable pointwise decoder module (PDM) as the post-processing approach, which gathers weighted features from the neighbors and then makes the final prediction for the query point. In addition, we introduce a virtual range image-guided copy-rotate-paste (VRCrop) strategy for data augmentation. VRCrop constrains the total number of points and eliminates undesirable artifacts in the augmented point cloud. With PDM and VRCrop, existing range image-based segmentation models consistently outperform their counterparts on the SemanticKITTI, SemanticPOSS, and nuScenes datasets.	Translated: 2024-08-06 19:40:03 Published: 2024-08-02
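Why points get dropped can be seen in a sketch of the spherical projection commonly used by range-image models: each point's yaw and pitch select a pixel, only the closest point per pixel is kept, and the rest are discarded. The 64 x 2048 image size and the +3°/-25° vertical field of view are assumed values for illustration; the dropped indices are exactly the points a post-processing module would need to label from their neighbors.

```python
import math


def project_to_range_image(points, width, height, fov_up_deg, fov_down_deg):
    """Spherical projection of (x, y, z) points onto a height x width range image.

    Returns (image, dropped): image[v][u] holds the index of the closest point
    that landed in that pixel (or -1), and `dropped` lists indices of points
    discarded because a closer point occupies the same pixel.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    image = [[-1] * width for _ in range(height)]
    depth = [[math.inf] * width for _ in range(height)]
    dropped = []
    for idx, (x, y, z) in enumerate(points):
        r = math.sqrt(x * x + y * y + z * z)
        yaw = math.atan2(y, x)
        pitch = math.asin(z / r)
        u = int(0.5 * (1.0 - yaw / math.pi) * width)
        v = int((1.0 - (pitch - fov_down) / fov) * height)
        u = min(max(u, 0), width - 1)
        v = min(max(v, 0), height - 1)
        if r < depth[v][u]:
            if image[v][u] != -1:
                dropped.append(image[v][u])   # farther point loses the pixel
            image[v][u] = idx
            depth[v][u] = r
        else:
            dropped.append(idx)
    return image, dropped


# Three points on the same ray collide in one pixel; one point lies elsewhere
points = [(10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (0.0, 10.0, 0.0), (5.0, 0.0, 0.0)]
image, dropped = project_to_range_image(points, width=2048, height=64,
                                        fov_up_deg=3.0, fov_down_deg=-25.0)
print(dropped)
```

If a dropped point belongs to a different class than the kept one (say, a pole behind a pedestrian), simply copying the pixel's label is wrong, which motivates a learned per-point decoder.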
# Multi-task SAR Image Processing via GAN-based Unsupervised Manipulation ( http://arxiv.org/abs/2408.01553v1 ) License: see link	Xuran Hu, Mingzhe Zhu, Ziqiang Xu, Zhenpeng Feng, Ljubisa Stankovic,	Generative Adversarial Networks (GANs) have shown tremendous potential for synthesizing large numbers of realistic SAR images by learning patterns in the data distribution. Some GANs can achieve image editing by introducing latent codes, showing significant promise for SAR image processing. Compared to traditional SAR image processing methods, editing based on GAN latent-space control is entirely unsupervised, allowing image processing to be conducted without any labeled data. Additionally, the information extracted from the data is more interpretable. This paper proposes a novel SAR image processing framework called GAN-based Unsupervised Editing (GUE), aiming to address two issues: (1) disentangling semantic directions in the GAN latent space and finding meaningful directions; and (2) establishing a comprehensive SAR image processing framework that achieves multiple image processing functions. In the implementation of GUE, we decompose the entangled semantic directions in the GAN latent space by training a carefully designed network.
Moreover, we can accomplish multiple SAR image processing tasks (including despeckling, localization, auxiliary identification, and rotation editing) in a single training process without any form of supervision. Extensive experiments validate the effectiveness of the proposed method.	Translated: 2024-08-06 19:40:03 Published: 2024-08-02
# Robot-Enabled Machine Learning-Based Diagnosis of Gastric Cancer Polyps Using Partial Surface Tactile Imaging ( http://arxiv.org/abs/2408.01554v1 ) License: see link	Siddhartha Kapuria, Jeff Bonyun, Yash Kulkarni, Naruhiko Ikoma, Sandeep Chinchali, Farshid Alambeigi,	In this paper, to address the existing limitations of endoscopic diagnosis of Advanced Gastric Cancer (AGC) tumors, we propose, for the first time, (i) the use and evaluation of our recently developed Vision-based Tactile Sensor (VTS), and (ii) a complementary Machine Learning (ML) algorithm for classifying tumors by their textural features. Leveraging a seven-DoF robotic manipulator and unique, custom-designed, additively manufactured realistic AGC tumor phantoms, we demonstrated the advantages of automated data collection with the VTS in addressing the data scarcity and biases encountered in traditional ML-based approaches. Our ML model trained on synthetic data was successfully evaluated and compared with traditional ML models using various statistical metrics, even under mixed morphological characteristics and partial sensor contact.	Translated: 2024-08-06 19:40:03 Published: 2024-08-02
# Enhanced Knee Kinematics: Leveraging Deep Learning and Morphing Algorithms for 3D Implant Modeling ( http://arxiv.org/abs/2408.01557v1 ) License: see link	Viet-Dung Nguyen, Michael T. LaCour, Richard D. Komistek,	Accurate reconstruction of implanted knee models is crucial in orthopedic surgery and biomedical engineering, enhancing preoperative planning, optimizing implant design, and improving surgical outcomes. Traditional methods rely on labor-intensive and error-prone manual segmentation. This study proposes a novel approach using machine learning (ML) algorithms and morphing techniques for precise 3D reconstruction of implanted knee models. The methodology begins with acquiring preoperative imaging data, such as fluoroscopy or X-ray images of the patient's knee joint. A convolutional neural network (CNN) is then trained to automatically segment the femur contour of the implanted components, significantly reducing manual effort and ensuring high accuracy. Following segmentation, a morphing algorithm generates a personalized 3D model of the implanted knee joint using the segmented data and biomechanical principles.
This algorithm considers implant position, size, and orientation to simulate the knee joint's shape. By integrating morphological data with implant-specific parameters, the reconstructed models accurately reflect the patient's implant anatomy and configuration. The approach's effectiveness is demonstrated through quantitative evaluations, including comparisons with ground-truth data and existing techniques. In 19 test cases involving various implant types, the ML-based segmentation method showed superior accuracy and consistency compared to manual segmentation, with an average RMS error of 0.58 ± 0.14 mm. This research advances orthopedic surgery by providing a robust framework for the automated reconstruction of implanted knee models. With ML and morphing algorithms, clinicians and researchers gain valuable insights into patient-specific knee anatomy, implant biomechanics, and surgical planning, leading to improved patient outcomes and enhanced quality of care.	Translated: 2024-08-06 19:40:03 Published: 2024-08-02
# Accelerating Domain-Aware Electron Microscopy Analysis Using Deep Learning Models with Synthetic Data and Image-Wide Confidence Scoring ( http://arxiv.org/abs/2408.01558v1 ) License: see link	Matthew J. Lynch, Ryan Jacobs, Gabriella Bruno, Priyam Patki, Dane Morgan, Kevin G. Field,	The integration of machine learning (ML) models enhances the efficiency, affordability, and reliability of feature detection in microscopy, yet their development and applicability are hindered by dependence on scarce and often flawed manually labeled datasets and by a lack of domain awareness. We addressed these challenges by creating a physics-based synthetic image and data generator, resulting in a machine learning model that achieves precision (0.86), recall (0.63), F1 score (0.71), and engineering property predictions (R² = 0.82) comparable to those of a model trained on human-labeled data. We enhanced both models by using feature prediction confidence scores to derive an image-wide confidence metric, enabling simple thresholding to eliminate ambiguous and out-of-domain images, which yields performance boosts of 5-30% at a filtering-out rate of 25%. Our study demonstrates that synthetic data can eliminate human reliance in ML and provides a means of domain awareness in cases where many feature detections per image are needed.	Translated: 2024-08-06 19:30:18 Published: 2024-08-02
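The image-wide confidence idea can be sketched by aggregating per-detection confidence scores into one number per image and thresholding it. The abstract does not say how the scores are aggregated, so using the mean, scoring images with no detections as 0.0, the threshold of 0.7, and the toy scores are all hypothetical choices made for illustration.

```python
from statistics import mean


def image_confidence(detection_scores):
    """Image-wide confidence as the mean per-feature confidence (an assumption).

    Images with no detections get 0.0 so they are always flagged for review.
    """
    return mean(detection_scores) if detection_scores else 0.0


def filter_images(images, threshold):
    """Split images into (kept, rejected) lists by image-wide confidence."""
    kept, rejected = [], []
    for name, scores in images.items():
        (kept if image_confidence(scores) >= threshold else rejected).append(name)
    return kept, rejected


# Hypothetical per-detection confidence scores for four micrographs
images = {
    "img_a": [0.92, 0.88, 0.95, 0.90],
    "img_b": [0.55, 0.40, 0.62],           # ambiguous detections
    "img_c": [0.81, 0.77, 0.85, 0.90, 0.79],
    "img_d": [],                           # nothing detected, e.g. out of domain
}
kept, rejected = filter_images(images, threshold=0.7)
print(kept, rejected, len(rejected) / len(images))
```

The threshold sets the trade-off the abstract reports: raising it filters out more images but makes the remaining automated measurements more trustworthy.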
# 空間的異質性を考慮したモード選択モデルを用いたニューヨーク市インターボロー・エクスプレスの厚生・持続可能性・公平性評価 Welfare, sustainability, and equity evaluation of the New York City Interborough Express using spatially heterogeneous mode choice models ( http://arxiv.org/abs/2408.01562v1 ) ライセンス: Link先を確認	Hai Yang, Hongying Wu, Lauren Whang, Xiyuan Ren, Joseph Y. J. Chow,	(参考訳) メトロポリタン・トランジット・オーソリティ(MTA)は、クイーンズとブルックリンの間を高速で直結するインターボロー・エクスプレス(IBX)と呼ばれる新しいライトレール路線の建設を提案した。 IBXがニューヨーク市(NYC)にもたらしうる影響を評価するために、オープンアクセスの都市全体の合成トリップアジェンダデータセットとブロックグループレベルのモード選択モデルを用いる。 IBXは市内の潜在的な利用者の移動時間を28.1分短縮しうる。 IBX周辺の地域を発着する旅行者については、平均短縮時間は29.7分と見積もられる。 IBXの完成後の1日あたり乗車人員は25万4千人を超えると予測される(公式のIBX提案で報告された値より69%高い)。そのうち7万8千人以上(30.8%)が低所得世帯の利用者であり、16万5千人(64.7%)がIBX回廊沿いで出発または到着する。 IBXの追加により、1日あたり5万回以上の追加トリップが公共交通へ転換し、そのうち1万6千回以上は自家用車からの転換であり、温室効果ガス(GHG)の排出量を1日あたり29.28トン削減しうる。 IBXはまた地域社会に大きな消費者余剰便益をもたらし、その額は1トリップあたり1.25米ドル、低所得の旅行者によるトリップでは最大1.64米ドルと見積もられる。低所得の利用者ほど便益は相対的に大きいが、消費者余剰が人口平均の10%未満(すでにかなり低い)の旅行者の割合を大きく減らすようには見えない。 The Metropolitan Transit Authority (MTA) proposed building a new light rail route called the Interborough Express (IBX) to provide a direct, fast transit linkage between Queens and Brooklyn. An open-access synthetic citywide trip agenda dataset and a block-group-level mode choice model are used to assess the potential impact IBX could bring to New York City (NYC). IBX could save potential riders across the city 28.1 minutes. For travelers either going to or departing from areas close to IBX, the average time saving is projected to be 29.7 minutes. IBX is projected to have more than 254 thousand daily ridership after its completion (69% higher than reported in the official IBX proposal). Among those riders, more than 78 thousand people (30.8%) would come from low-income households while 165 thousand people (64.7%) would start or end along the IBX corridor. 
The addition of IBX would attract more than 50 thousand additional daily trips to transit mode, among which more than 16 thousand would be switched from using private vehicles, reducing potential greenhouse gas (GHG) emissions by 29.28 metric tons per day. IBX can also bring significant consumer surplus benefits to the communities, which are estimated to be $1.25 USD per trip, or as high as $1.64 per trip made by a low-income traveler. While benefits are proportionately higher for lower-income users, the service does not appear to significantly reduce the proportion of travelers whose consumer surpluses fall below 10% of the population average (already quite low).	翻訳日:2024-08-06 19:30:18 公開日:2024-08-02
# カメラモデルに基づく自己教師あり深度推定 Self-Supervised Depth Estimation Based on Camera Models ( http://arxiv.org/abs/2408.01565v1 ) ライセンス: Link先を確認	Jinchang Zhang, Praveen Kumar Reddy, Xue-Iuan Wong, Guoyu Lu,	(参考訳) 深度推定はロボット工学と視覚関連タスクにとって重要なトピックである。単眼深度推定では、高価な正解ラベル付けを必要とする教師あり学習と比較して、自己教師あり手法はラベル付けコストがかからないため、大きな可能性を秘めている。しかし、自己教師あり学習は、深度推定性能において教師あり学習との間にいまだ大きなギャップがある。また、スケールの問題も単眼の教師なし深度推定における大きな課題であり、補正のためにGPSやLiDAR、あるいは既存の地図から正解スケールを得る必要があることが多い。深層学習の時代において、既存の手法は主に画像間の関係の探索に依存して教師なしニューラルネットワークを訓練しているが、カメラ自体が提供する基本的な情報は一般に無視されてきた。この情報は、監視信号を得るための追加機器を必要とせずに、広範な教師情報を無償で提供しうる。カメラ自体の内部パラメータと外部パラメータを利用することで、物理的原理に基づいて地面領域および地面に接する領域の深度情報を計算でき、他のセンサーを用いずに教師情報を無償で得られる。この手法は容易に実装でき、あらゆる教師なし手法の効果を高める構成要素となりうる。 Depth estimation is a critical topic for robotics and vision-related tasks. In monocular depth estimation, in comparison with supervised learning that requires expensive ground truth labeling, self-supervised methods possess great potential due to no labeling cost. However, self-supervised learning still has a large gap with supervised learning in depth estimation performance. Meanwhile, scaling is also a major issue for monocular unsupervised depth estimation, which commonly still needs ground truth scale from GPS, LiDAR, or existing maps to correct. In the deep learning era, while existing methods mainly rely on the exploration of image relationships to train the unsupervised neural networks, fundamental information provided by the camera itself has been generally ignored, which can provide extensive supervision information for free, without the need for any extra equipment to provide supervision signals. Utilizing the camera's own intrinsics and extrinsics, depth information can be calculated for ground regions and regions connecting ground based on physical principles, providing free supervision information without any other sensors. The method is easy to realize and can be a component to enhance the effects of all the unsupervised methods.	翻訳日:2024-08-06 19:30:18 公開日:2024-08-02
# フルレンジヘッドポース幾何データ拡張 Full-range Head Pose Geometric Data Augmentations ( http://arxiv.org/abs/2408.01566v1 ) ライセンス: Link先を確認	Huei-Chung Hu, Xuyang Wu, Haowei Liu, Ting-Ruen Wei, Hsin-Tai Wu,	(参考訳) 多くの頭部姿勢推定(HPE)手法は、フルレンジのデータセットを作成できることを謳っており、理論上はさまざまな角度から頭部の回転と位置を推定できる。しかし、これらの手法は限られた頭部角度の範囲内でのみ正確であり、この特定の範囲を超えると大きな誤差が生じる。これは主に、基礎となる回転行列計算で用いられる座標系とオイラー角が明確に定義されていないことによって説明される。そこで我々は,(1)正しい座標系と正しい軸順序のオイラー角を正確に推定する手法,(2)特定の座標系の下での回転行列の2次元幾何拡張のための新しい式,(3)回転行列と姿勢を正しく描画するルーチンの導出,(4)フルレンジの頭部姿勢データセット生成のための適切なピッチ・ヨーのカバレッジを可能にする数学的実験と検証を提示することによって,これらの制限に対処した。提案する拡張手法を既存の頭部姿勢推定法に適用することで,モデルの性能が大幅に向上した。コードは論文採択後に公開される。 Many head pose estimation (HPE) methods promise the ability to create full-range datasets, theoretically allowing the estimation of the rotation and positioning of the head from various angles. However, these methods are only accurate within a range of head angles; exceeding this specific range leads to significant inaccuracies. This is largely explained by the coordinate systems and Euler angles used in the foundational rotation matrix calculations not being clearly specified. Here, we addressed these limitations by presenting (1) methods that accurately infer the correct coordinate system and Euler angles in the correct axis-sequence, (2) novel formulae for 2D geometric augmentations of the rotation matrices under the (SPECIFIC) coordinate system, (3) derivations for the correct drawing routines for rotation matrices and poses, and (4) mathematical experimentation and verification that allow proper pitch-yaw coverage for full-range head pose dataset generation. Applying our augmentation techniques to existing head pose estimation methods significantly improved model performance. Code will be released upon paper acceptance.	翻訳日:2024-08-06 19:30:18 公開日:2024-08-02
# デジタル病理における組織画像の検索・取得の検証について On Validation of Search & Retrieval of Tissue Images in Digital Pathology ( http://arxiv.org/abs/2408.01570v1 ) ライセンス: Link先を確認	H. R. Tizhoosh,	(参考訳) 医療画像は、診断、治療計画、疾病モニタリングに不可欠な情報を提供することによって、現代医療において重要な役割を担っている。放射線学や病理学などの分野は正確な画像解釈に大きく依存しており、放射線科医はX線、CTスキャン、MRIを調べて骨折から癌までの病態を診断し、病理医は顕微鏡とデジタル画像を用いて細胞の異常を検出し、がんや感染症を診断する。技術の進歩により医療画像の量と複雑さは指数関数的に増加しており、その管理と検索のための効率的なツールが必要となっている。 CBIR(Content-Based Image Retrieval)システムは、視覚的内容に基づいて画像の検索と取得を行い、臨床医が類似症例を見つけて病理パターンを比較できるようにすることで診断精度を高め、このニーズに対処する。医療応用における画像検索エンジンの包括的な検証には、精度、インデックス作成時間と検索時間、ストレージオーバーヘッドなどの性能指標の評価が含まれ、最近の病理組織学における検証で示されているように、正確な結果を信頼性高く効率的に検索できることを保証する。 Medical images play a crucial role in modern healthcare by providing vital information for diagnosis, treatment planning, and disease monitoring. Fields such as radiology and pathology rely heavily on accurate image interpretation, with radiologists examining X-rays, CT scans, and MRIs to diagnose conditions from fractures to cancer, while pathologists use microscopy and digital images to detect cellular abnormalities for diagnosing cancers and infections. The technological advancements have exponentially increased the volume and complexity of medical images, necessitating efficient tools for management and retrieval. Content-Based Image Retrieval (CBIR) systems address this need by searching and retrieving images based on visual content, enhancing diagnostic accuracy by allowing clinicians to find similar cases and compare pathological patterns. Comprehensive validation of image search engines in medical applications involves evaluating performance metrics like accuracy, indexing and search times, and storage overhead, ensuring reliable and efficient retrieval of accurate results, as demonstrated by recent validations in histopathology.	翻訳日:2024-08-06 19:30:18 公開日:2024-08-02
# 拡散オートエンコーダを用いた医用画像の分類と回帰のための反事実的説明 Counterfactual Explanations for Medical Image Classification and Regression using Diffusion Autoencoder ( http://arxiv.org/abs/2408.01571v1 ) ライセンス: Link先を確認	Matan Atad, David Schinz, Hendrik Moeller, Robert Graf, Benedikt Wiestler, Daniel Rueckert, Nassir Navab, Jan S. Kirschke, Matthias Keicher,	(参考訳) 反事実的説明(CE)は、入力特徴の変化が予測結果にどのように影響するかを示すことによって、機械学習モデルの解釈可能性を高めることを目的としている。一般的なCE手法は追加のモデルを必要とし、通常は二値の反事実に制約される。対照的に、生成モデル、具体的には拡散オートエンコーダ(DAE)の潜在空間を直接操作する新しい手法を提案する。このアプローチは、CEの生成と、決定境界をまたいだモデルの内部表現の連続的な可視化を可能にすることで、固有の解釈可能性を提供する。提案手法は,画像を意味的に豊かな潜在空間へ教師なしでエンコードするDAEの能力を活用し,ラベル付きデータや別個の特徴抽出モデルを不要にする。これらの潜在表現が、病状の分類や、脊椎圧迫骨折(VCF)や糖尿病性網膜症(DR)などの重症度の順序回帰に有用であることを示す。二値のCEにとどまらず、本手法は線形モデルを用いた順序CEの可視化をサポートし,モデルの意思決定過程へのより深い洞察と解釈可能性の向上をもたらす。様々な医用画像データセットに対する実験により、本手法の解釈可能性と汎用性における利点が実証された。DAEの潜在空間の線形多様体は意味のある補間と操作を可能にし、医用画像の特性を探索するための強力なツールとなる。コードはhttps://github.com/matanat/dae_counterfactualで公開されている。 Counterfactual explanations (CEs) aim to enhance the interpretability of machine learning models by illustrating how alterations in input features would affect the resulting predictions. Common CE approaches require an additional model and are typically constrained to binary counterfactuals. In contrast, we propose a novel method that operates directly on the latent space of a generative model, specifically a Diffusion Autoencoder (DAE). This approach offers inherent interpretability by enabling the generation of CEs and the continuous visualization of the model's internal representation across decision boundaries. Our method leverages the DAE's ability to encode images into a semantically rich latent space in an unsupervised manner, eliminating the need for labeled data or separate feature extraction models. We show that these latent representations are helpful for medical condition classification and the ordinal regression of severity pathologies, such as vertebral compression fractures (VCF) and diabetic retinopathy (DR). 
Beyond binary CEs, our method supports the visualization of ordinal CEs using a linear model, providing deeper insights into the model's decision-making process and enhancing interpretability. Experiments across various medical imaging datasets demonstrate the method's advantages in interpretability and versatility. The linear manifold of the DAE's latent space allows for meaningful interpolation and manipulation, making it a powerful tool for exploring medical image properties. Our code is available at https://github.com/matanat/dae_counterfactual.	翻訳日:2024-08-06 19:30:18 公開日:2024-08-02
# リンドープダイヤモンドショットキーダイオードにおけるカラーセンターの電気励起 Electrical excitation of color centers in phosphorus-doped diamond Schottky diodes ( http://arxiv.org/abs/2408.01572v1 ) ライセンス: Link先を確認	Florian Sledz, Igor A. Khramtsov, Assegid M. Flatae, Stefano Lagomarsino, Silvio Sciortino, Shannon S. Nicley, Rozita Rouzbahani, Paulius Pobedinskas, Tianxiao Guo, Xin Jiang, Paul Kienitz, Peter Haring Bolivar, Ken Haenen, Dmitry Yu. Fedyanin, Mario Agio,	(参考訳) 大気環境下で電気注入により動作する堅牢な量子光源は、量子鍵配送やメトロロジーのような量子技術の実用化に望ましい。ダイヤモンドのカラーセンターは、室温および高温で光安定な発光体であるため、有望な候補である。それらの電気励起の可能性は既にp-i-nダイオードで実証されている。しかし、これには複雑なダイヤモンド構造の成長が必要となる。これらの従来手法とは対照的に, 我々は水素パッシベーションされたn型ダイヤモンドに基づく新しいショットキーダイオード構成において, 電気的ポンピング下でのカラーセンターの発光を実証する。この構成は, ダイヤモンド中のカラーセンターに基づく集積型単一光子発光デバイスの実現に有望である。 A robust quantum light source operating upon electrical injection at ambient conditions is desirable for practical implementation of quantum technologies, such as quantum key distribution or metrology. Color centers in diamond are promising candidates as they are photostable emitters at room and elevated temperatures. The possibility of their electrical excitation has already been demonstrated within p-i-n diodes. However, this requires the growth of complex diamond structures. In contrast to these conventional approaches, we demonstrate the emission of color centers under electrical pumping in a novel Schottky diode configuration based on hydrogen passivated n-type diamond, which holds promise for integrated single-photon emitting devices based on color centers in diamond.	翻訳日:2024-08-06 19:30:18 公開日:2024-08-02
# 4次元地震探査データと観測井データを用いたCO2貯留の履歴マッチングのための深層学習フレームワーク Deep Learning Framework for History Matching CO2 Storage with 4D Seismic and Monitoring Well Data ( http://arxiv.org/abs/2408.01575v1 ) ライセンス: Link先を確認	Nanzhe Wang, Louis J. Durlofsky,	(参考訳) 地質学的炭素貯留では、数メガトン規模の超臨界CO2を地下の地層に注入する。これらの地層の特性は通常非常に不確実であり、大規模な貯留操業の設計と最適化を困難にしている。本稿では,早期の観測に基づいて地層特性を較正できる履歴マッチング戦略を提案する。早期評価は、操業が計画通りに進んでいることを確認するために不可欠である。本フレームワークは、原位置の観測井データと解釈済みのタイムラプス(4D)地震飽和度データをそれぞれ予測する、目的に特化した2つの深層学習サロゲートモデルから構成される。これら2種類のデータは解像度のスケールが大きく異なるため、それぞれの予測に特化した深層学習ネットワークを個別に構築するのが適切である。このアプローチにより、大域的な高忠実度予測を提供する単一のサロゲートよりも設計が容易で、訓練も効率的なワークフローが得られる。深層学習モデルは階層的なマルコフ連鎖モンテカルロ(MCMC)履歴マッチング手順に統合される。 4次元地震データを用いる場合と用いない場合について合成ケースで履歴マッチングを行い, 不確実性低減に対する4次元地震データの影響を定量化する。両データ型を用いることで,主要な地質モデルパラメータの不確実性が大幅に低減され,CO2プルーム挙動の正確な予測が可能になることが示された。本研究で開発された履歴マッチングフレームワーク全体は,複数のデータ型を統合し,不確実性低減と性能予測に対するそれぞれの影響を評価するための効率的な方法である。 Geological carbon storage entails the injection of megatonnes of supercritical CO2 into subsurface formations. The properties of these formations are usually highly uncertain, which makes design and optimization of large-scale storage operations challenging. In this paper we introduce a history matching strategy that enables the calibration of formation properties based on early-time observations. Early-time assessments are essential to assure the operation is performing as planned. Our framework involves two fit-for-purpose deep learning surrogate models that provide predictions for in-situ monitoring well data and interpreted time-lapse (4D) seismic saturation data. These two types of data are at very different scales of resolution, so it is appropriate to construct separate, specialized deep learning networks for their prediction. This approach results in a workflow that is more straightforward to design and more efficient to train than a single surrogate that provides global high-fidelity predictions. The deep learning models are integrated into a hierarchical Markov chain Monte Carlo (MCMC) history matching procedure. 
History matching is performed on a synthetic case with and without 4D seismic data, which allows us to quantify the impact of 4D seismic on uncertainty reduction. The use of both data types is shown to provide substantial uncertainty reduction in key geomodel parameters and to enable accurate predictions of CO2 plume dynamics. The overall history matching framework developed in this study represents an efficient way to integrate multiple data types and to assess the impact of each on uncertainty reduction and performance predictions.	翻訳日:2024-08-06 19:30:18 公開日:2024-08-02
# THOR2: 色空間のトポロジカルソフトクラスタリングを活用した未知環境における人間に着想を得た物体認識 THOR2: Leveraging Topological Soft Clustering of Color Space for Human-Inspired Object Recognition in Unseen Environments ( http://arxiv.org/abs/2408.01579v1 ) ライセンス: Link先を確認	Ekta U. Samani, Ashis G. Banerjee,	(参考訳) 未知の散らかった屋内環境における視覚的物体認識は、移動ロボットにとって難しい問題である。本研究では,RGB-D画像から生成された点群に対する3次元形状と色に基づく記述子TOPS2と,それに付随する認識フレームワークTHOR2を提案する。 TOPS2記述子は、TOPS記述子が持つスライスに基づく3次元形状のトポロジカル表現を保持しつつ、粗い色領域のネットワークを用いて計算されるスライスベースの色埋め込みによって物体の色情報を捉えることで、人間の認知機構である物体の統一性(object unity)を具現化する。これらの色領域は、人間の色知覚で同定されたマカダム楕円に類似しており, トポロジカルなソフトクラスタリング手法であるMapperアルゴリズムを用いて得られる。合成データを用いて訓練したTHOR2は、2つの実世界ベンチマークデータセット、すなわち異なる視点から散らかったシーンを撮影したOCIDデータセットと、市販のハードウェアで記録され異なる環境条件と物体の遮蔽度を反映したUW-IS Occludedデータセットにおいて、3次元形状ベースの前身であるTHORと比較して認識精度が著しく向上した。 THOR2はまた、両データセットにおいて、ベースラインの深層学習ネットワークや、RGB-D入力に適応させた広く使われているViTよりも優れた性能を示した。したがって、THOR2は低コストロボットにおける堅牢な認識の実現に向けた有望な一歩である。 Visual object recognition in unseen and cluttered indoor environments is a challenging problem for mobile robots. This study presents a 3D shape and color-based descriptor, TOPS2, for point clouds generated from RGB-D images and an accompanying recognition framework, THOR2. The TOPS2 descriptor embodies object unity, a human cognition mechanism, by retaining the slicing-based topological representation of 3D shape from the TOPS descriptor while capturing object color information through slicing-based color embeddings computed using a network of coarse color regions. These color regions, analogous to the MacAdam ellipses identified in human color perception, are obtained using the Mapper algorithm, a topological soft-clustering technique. 
THOR2, trained using synthetic data, demonstrates markedly improved recognition accuracy compared to THOR, its 3D shape-based predecessor, on two benchmark real-world datasets: the OCID dataset capturing cluttered scenes from different viewpoints and the UW-IS Occluded dataset reflecting different environmental conditions and degrees of object occlusion recorded using commodity hardware. THOR2 also outperforms baseline deep learning networks and a widely used ViT adapted for RGB-D inputs on both datasets. Therefore, THOR2 is a promising step toward achieving robust recognition in low-cost robots.	翻訳日:2024-08-06 19:30:18 公開日:2024-08-02
# 巨大アンサンブル 第II部: 球面フーリエニューラル演算子を用いて生成したハインドキャストの巨大アンサンブルの特性 Huge Ensembles Part II: Properties of a Huge Ensemble of Hindcasts Generated with Spherical Fourier Neural Operators ( http://arxiv.org/abs/2408.01581v1 ) ライセンス: Link先を確認	Ankur Mahesh, William Collins, Boris Bonev, Noah Brenowitz, Yair Cohen, Peter Harrington, Karthik Kashinath, Thorsten Kurth, Joshua North, Travis OBrien, Michael Pritchard, David Pruitt, Mark Risser, Shashank Subramanian, Jared Willard,	(参考訳) 第I部では,球面フーリエニューラル演算子に基づくアンサンブルを作成した。初期条件摂動としてブレッドベクトルを用い、モデル摂動としてスクラッチから独立に訓練した複数のチェックポイントを用いた。アンサンブルの物理的忠実度を評価する診断に基づくと、我々のアンサンブルは運用天気予報システムに匹敵する性能を持つ。しかも、必要な計算資源は数桁少ない。第II部では,2023年夏の各日に初期化した7,424メンバーからなる巨大アンサンブル(HENS)を生成する。この規模で巨大アンサンブルを実行するための技術的要件を列挙する。 HENSは予測分布の裾を精密にサンプリングし、内部変動を詳細にサンプリングする。極端な気候統計については、HENSはアンサンブル平均から4$\sigma$離れた事象をサンプリングする。各格子セルにおいて、HENSは最も正確なアンサンブルメンバーのスキルを改善し、起こりうる将来の軌道のカバレッジを高める。天気予報モデルとして、HENSは不確実性の定量化を改善した極端気象予報を発行する。また、検証値がアンサンブル予測分布の外側にあるような外れ値事象の確率を下げる。 In Part I, we created an ensemble based on Spherical Fourier Neural Operators. As initial condition perturbations, we used bred vectors, and as model perturbations, we used multiple checkpoints trained independently from scratch. Based on diagnostics that assess the ensemble's physical fidelity, our ensemble has comparable performance to operational weather forecasting systems. However, it requires several orders of magnitude fewer computational resources. Here in Part II, we generate a huge ensemble (HENS), with 7,424 members initialized each day of summer 2023. We enumerate the technical requirements for running huge ensembles at this scale. HENS precisely samples the tails of the forecast distribution and presents a detailed sampling of internal variability. For extreme climate statistics, HENS samples events 4$\sigma$ away from the ensemble mean. At each grid cell, HENS improves the skill of the most accurate ensemble member and enhances coverage of possible future trajectories. 
As a weather forecasting model, HENS issues extreme weather forecasts with better uncertainty quantification. It also reduces the probability of outlier events, in which the verification value lies outside the ensemble forecast distribution.	翻訳日:2024-08-06 19:30:18 公開日:2024-08-02
# 個別治療効果の推定と推論のための共形拡散モデル Conformal Diffusion Models for Individual Treatment Effect Estimation and Inference ( http://arxiv.org/abs/2408.01582v1 ) ライセンス: Link先を確認	Hengrui Cai, Huaqing Jin, Lexin Li,	(参考訳) 観察データから治療効果を推定することは、多くの応用分野において中心的な関心事である。個別治療効果は、個人レベルで最もきめ細かい治療効果の指標であり、個別化医療の推進に最も有用である。しかし、いくつかの課題により、その推定と推論はまだ十分に発展していない。本稿では、これらの複雑な課題に対処する新しい共形拡散モデルに基づくアプローチを提案する。我々は、高い柔軟性を持つ拡散モデリング、共形推論というモデルフリーな統計的推論パラダイム、そして分布シフトに対処する傾向スコアと共変量の局所近似を統合した。個別治療効果に関する潜在的結果の分布を不偏に推定し、情報量の多い信頼区間を構築し、厳密な理論的保証を確立する。広範な数値実験により、提案手法が既存手法に対して競争力のある性能を持つことを示す。 Estimating treatment effects from observational data is of central interest across numerous application domains. Individual treatment effect offers the most granular measure of treatment effect on an individual level, and is the most useful to facilitate personalized care. However, its estimation and inference remain underdeveloped due to several challenges. In this article, we propose a novel conformal diffusion model-based approach that addresses those intricate challenges. We integrate the highly flexible diffusion modeling, the model-free statistical inference paradigm of conformal inference, along with propensity score and covariate local approximation that tackle distributional shifts. We unbiasedly estimate the distributions of potential outcomes for individual treatment effect, construct an informative confidence interval, and establish rigorous theoretical guarantees. We demonstrate the competitive performance of the proposed method over existing solutions through extensive numerical studies.	翻訳日:2024-08-06 19:30:18 公開日:2024-08-02
# GPUDrive: 100万FPSのデータ駆動型マルチエージェント運転シミュレーション GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS ( http://arxiv.org/abs/2408.01584v1 ) ライセンス: Link先を確認	Saman Kazemkhani, Aarav Pandya, Daphne Cornelisse, Brennan Shacklett, Eugene Vinitsky,	(参考訳) マルチエージェント学習アルゴリズムは多種多様なゲームにおいて超人的な計画の生成に成功してきたが、実際に展開されるマルチエージェントプランナーの設計にはほとんど影響を与えていない。これらの手法をマルチエージェント計画に適用する上での重要なボトルネックは、数十億ステップもの経験を必要とすることだ。この規模でのマルチエージェント計画の研究を可能にするため,Madrona Game Engine上に構築され、1秒あたり100万ステップ以上の経験を生成できるGPUアクセラレーションによるマルチエージェントシミュレータGPUDriveを紹介する。観測、報酬、ダイナミクスの関数はC++で直接記述されており、ユーザーは高性能なCUDAへとローワリングされる複雑で異種混合なエージェントの振る舞いを定義できる。 GPUDriveを使用することで、Waymo Motionデータセットの多数のシーンで強化学習エージェントを効果的に訓練でき、個々のシーンについては数分で高性能な目標到達エージェントが、数時間で汎用的に有能なエージェントが得られることを示す。訓練済みエージェントは、https://github.com/Emerge-Lab/gpudrive のコードベースの一部として提供される。 Multi-agent learning algorithms have been successful at generating superhuman planning in a wide variety of games but have had little impact on the design of deployed multi-agent planners. A key bottleneck in applying these techniques to multi-agent planning is that they require billions of steps of experience. To enable the study of multi-agent planning at this scale, we present GPUDrive, a GPU-accelerated, multi-agent simulator built on top of the Madrona Game Engine that can generate over a million steps of experience per second. Observation, reward, and dynamics functions are written directly in C++, allowing users to define complex, heterogeneous agent behaviors that are lowered to high-performance CUDA. We show that using GPUDrive we are able to effectively train reinforcement learning agents over many scenes in the Waymo Motion dataset, yielding highly effective goal-reaching agents in minutes for individual scenes and generally capable agents in a few hours. We ship these trained agents as part of the code base at https://github.com/Emerge-Lab/gpudrive.	翻訳日:2024-08-06 19:30:18 公開日:2024-08-02
# OpenLogParser: オープンソースの大規模言語モデルによる教師なしのパース OpenLogParser: Unsupervised Parsing with Open-Source Large Language Models ( http://arxiv.org/abs/2408.01585v1 ) ライセンス: Link先を確認	Zeyang Ma, Dong Jae Kim, Tse-Hsun Chen,	(参考訳) ログ解析は、非構造化ログデータを構造化フォーマットに変換する重要なステップであり、その後のログベースの分析を容易にする。従来の構文ベースのログパーサは効率的で効果的だが、事前に定義されたルールから外れたログを処理すると、精度が低下することが多い。近年,大規模言語モデル(LLM)に基づくログパーサが優れた解析精度を示している。しかし、既存のLLMベースのパーサは、1) 微調整やインコンテキスト学習のための時間と労力を要する手動ラベリング、2) 膨大なログデータ量とLLMの限られたコンテキストサイズによる解析コストの増加、3) 機密性の高いログ情報をChatGPTのような商用モデルで扱うことによるプライバシリスク、という3つの主要な課題に直面している。これらの制限を克服するために,本論文ではOpenLogParserを導入する。これはオープンソースのLLM(Llama3-8B)を活用して,最先端の解析精度を達成しつつ,プライバシの向上と運用コストの削減を実現する,教師なしのログ解析アプローチである。 OpenLogParserはまず、固定深さのグルーピングツリーを用いて、静的テキストが類似し動的変数が異なるログをグループ化する。次に、これらのグループ内のログを3つのコンポーネントを使って解析する。 i) 類似度スコアリングに基づく検索拡張生成: Jaccard類似度に基づいて各グループ内の多様なログを選択し、LLMが静的テキストと動的変数を区別できるようにする。ii) 自己省察(self-reflection): LLMに反復的に問い合わせてログテンプレートを洗練し、解析精度を向上させる。iii) ログテンプレートメモリ: 解析済みのテンプレートを保存してLLMへの問い合わせを減らし、解析効率を向上させる。 LogHub-2.0での評価では,OpenLogParserは最先端のLLMベースのパーサと比べて解析精度が25%高く,ログ処理が2.7倍高速であった。要するに、OpenLogParserは商用LLMの使用に伴うプライバシとコストの懸念に対処しつつ、最先端の解析効率と精度を実現している。 Log parsing is a critical step that transforms unstructured log data into structured formats, facilitating subsequent log-based analysis. Traditional syntax-based log parsers are efficient and effective, but they often experience decreased accuracy when processing logs that deviate from the predefined rules. Recently, large language models (LLM) based log parsers have shown superior parsing accuracy. However, existing LLM-based parsers face three main challenges: 1) time-consuming and labor-intensive manual labeling for fine-tuning or in-context learning, 2) increased parsing costs due to the vast volume of log data and limited context size of LLMs, and 3) privacy risks from using commercial models like ChatGPT with sensitive log information. 
To overcome these limitations, this paper introduces OpenLogParser, an unsupervised log parsing approach that leverages open-source LLMs (i.e., Llama3-8B) to enhance privacy and reduce operational costs while achieving state-of-the-art parsing accuracy. OpenLogParser first groups logs with similar static text but varying dynamic variables using a fixed-depth grouping tree. It then parses logs within these groups using three components: i) similarity scoring-based retrieval-augmented generation: selects diverse logs within each group based on Jaccard similarity, helping the LLM distinguish between static text and dynamic variables; ii) self-reflection: iteratively queries LLMs to refine log templates to improve parsing accuracy; and iii) log template memory: stores parsed templates to reduce LLM queries for improved parsing efficiency. Our evaluation on LogHub-2.0 shows that OpenLogParser achieves 25% higher parsing accuracy and processes logs 2.7 times faster compared to state-of-the-art LLM-based parsers. In short, OpenLogParser addresses privacy and cost concerns of using commercial LLMs while achieving state-of-the-art parsing efficiency and accuracy.	翻訳日:2024-08-06 19:30:18 公開日:2024-08-02
# 子どもの耳認識と縦断的評価のための深層学習アプローチ Deep Learning Approach for Ear Recognition and Longitudinal Evaluation in Children ( http://arxiv.org/abs/2408.01588v1 ) ライセンス: Link先を確認	Afzal Hossain, Tipu Sultan, Stephanie Schuckers,	(参考訳) バイオメトリック・モダリティとしての耳認識はますます人気を集めており、より幅広い応用分野が期待されている。現在の応用は成人を対象としているが、子どもの耳認識における課題の1つは、成長に伴って耳の構造が急速に変化することである。本研究は,4歳から14歳までの子どもから2.5年間にわたって収集した基礎的な縦断データセットを導入し,この年齢層における耳認識性能を評価する。本稿では,VGG16とMobileNetのアンサンブルを用いて,成人・小児両方のデータセットに着目し,子どもの縦断的評価を重視した深層学習に基づく耳認識手法を提案する。 Ear recognition as a biometric modality is becoming increasingly popular, with promising broader application areas. While current applications involve adults, one of the challenges in ear recognition for children is the rapid structural changes in the ear as they age. This work introduces a foundational longitudinal dataset collected from children aged 4 to 14 years over a 2.5-year period and evaluates ear recognition performance in this demographic. We present a deep learning based approach for ear recognition, using an ensemble of VGG16 and MobileNet, focusing on both adult and child datasets, with an emphasis on longitudinal evaluation for children.	翻訳日:2024-08-06 19:30:18 公開日:2024-08-02
# テキスト画像生成モデルにおけるカーストの解釈・表現・ステレオタイプ Interpretations, Representations, and Stereotypes of Caste within Text-to-Image Generators ( http://arxiv.org/abs/2408.01590v1 ) ライセンス: Link先を確認	Sourojit Ghosh,	(参考訳) テキスト画像生成モデル(T2I)の人気の高まりに伴い、公平性と公平な結果の確保、そしてそれらが社会に与える影響に焦点を当てた研究も広く行われてきた。しかし、このような研究は一般に、世界的に共有されるアイデンティティや西洋の文脈を中心としたものであった。本稿では,T2I研究において悲劇的なほど未開拓の文脈であるカーストを取り巻く解釈,表現,ステレオタイプを扱う。我々は、T2IであるStable Diffusionが、様々なカーストの人々をどのように描写し、彼らをどのような職業に従事する者として描くかを調べる。 1プロンプトあたり100枚の画像を生成し、Stable Diffusionによる「インド人(an 'Indian person')」の既定の描写とのCLIPコサイン類似度を比較して、類似性のパターンを探る。その結果、Stable Diffusionの出力が「カーストレスネス(castelessness)」の体系を永続させ、インド人らしさを上位カーストと同一視し、カーストにより抑圧されたアイデンティティを貧困の指標とともに描写していることが明らかになった。特に、歴史的に周縁化されてきたダリットに対するステレオタイプ化と表象上の害に注目する。ダリットは農村部に住み、常に抗議活動に参加している姿で顕著に描かれていた。以上の結果は、カーストを考慮したT2I設計の必要性を浮き彫りにしており,設計上の提言をもって結論とする。 The surge in the popularity of text-to-image generators (T2Is) has been matched by extensive research into ensuring fairness and equitable outcomes, with a focus on how they impact society. However, such work has typically focused on globally-experienced identities or centered Western contexts. In this paper, we address interpretations, representations, and stereotypes surrounding a tragically underexplored context in T2I research: caste. We examine how the T2I Stable Diffusion displays people of various castes, and what professions they are depicted as performing. Generating 100 images per prompt, we perform CLIP-cosine similarity comparisons with default depictions of an 'Indian person' by Stable Diffusion, and explore patterns of similarity. Our findings reveal how Stable Diffusion outputs perpetuate systems of 'castelessness', equating Indianness with high-castes and depicting caste-oppressed identities with markers of poverty. In particular, we note the stereotyping and representational harm towards the historically-marginalized Dalits, prominently depicted as living in rural areas and always at protests. Our findings underscore a need for a caste-aware approach towards T2I design, and we conclude with design recommendations.	翻訳日:2024-08-06 19:30:18 公開日:2024-08-02
# 円形の箱に閉じ込められた電場中の二次元調和振動子について On the two-dimensional harmonic oscillator with an electric field confined to a circular box ( http://arxiv.org/abs/2408.01593v1 ) ライセンス: Link先を確認	Francisco M. Fernández, Javier Garcia, Norberto Aquino, Antonio Flores-Riveros,	(参考訳) 電場中にあり、侵入不可能な壁を持つ円形の箱に閉じ込められた量子力学的2次元調和振動子を再検討する。エネルギースペクトルを得るために、多項式基底およびガウス基底を用いたレイリー・リッツ法を用いる。本研究の結果を、他の著者らが最近導出した結果と比較する。また、箱の半径が大きい極限と小さい極限について議論し、摂動論を用いた計算もいくつか行う。 We revisit the quantum-mechanical two-dimensional harmonic oscillator with an electric field confined to a circular box of impenetrable walls. In order to obtain the energy spectrum we resort to the Rayleigh-Ritz method with polynomial and Gaussian basis sets. We compare present results with those derived recently by other authors. We discuss the limits of large and small box radius and also do some calculations with perturbation theory.	翻訳日:2024-08-06 19:20:31 公開日:2024-08-02
# 「ここには自分がまったく表現されていない」: 性自認と国籍を横断する表象上の害を含むStable Diffusion出力のユーザ体験 "I don't see myself represented here at all": User Experiences of Stable Diffusion Outputs Containing Representational Harms across Gender Identities and Nationalities ( http://arxiv.org/abs/2408.01594v1 ) ライセンス: Link先を確認	Sourojit Ghosh, Nina Lutz, Aylin Caliskan,	(参考訳) Stable Diffusionのようなテキスト画像生成モデル(T2I)に関する研究は、それらが社会的バイアスを増幅し害を引き起こしうることを示してきたが、そうした研究は主に計算的手法に依存しており、実際に害を経験するユーザから情報を得てはいない。これは重要な知識のギャップである。本稿では,133人のクラウドワーカーから得たクラウドソーシングデータと,多様な国・性別にわたる14件の半構造化インタビューを組み合わせ,Stable Diffusionに関する最大規模の人間を対象とした研究を行う。集合内コサイン類似度階層(すなわち、同じプロンプトに対する複数のStable Diffusion出力を互いに比較し、どの結果がプロンプトに「最も近い」かを調べる)と質的テーマ分析を組み合わせた混合研究法により、まず、Stable Diffusionの出力に対するユーザの期待と実際に生成されたものとの間の大きな乖離を示す。これは、「a Person(ある人)」に対するStable Diffusionの描画が、そうした期待からかけ離れた画像を生成したことからも明らかである。次に、この全般的な不満という知見を拡張し、Stable Diffusionが被験者、特に伝統的に周縁化されたアイデンティティを持つ人々にもたらす表象上の害を明らかにする。そうした人々は、自らのアイデンティティについて不正確で、しばしば非人間的なステレオタイプにさらされている。Stable Diffusionや他のT2Iの将来のバージョンを(再)設計するための、害を考慮した(harm-aware)アプローチに向けた提言を示す。 Though research into text-to-image generators (T2Is) such as Stable Diffusion has demonstrated their amplification of societal biases and potentials to cause harm, such research has primarily relied on computational methods instead of seeking information from real users who experience harm, which is a significant knowledge gap. In this paper, we conduct the largest human subjects study of Stable Diffusion, with a combination of crowdsourced data from 133 crowdworkers and 14 semi-structured interviews across diverse countries and genders. Through a mixed-methods approach of intra-set cosine similarity hierarchies (i.e., comparing multiple Stable Diffusion outputs for the same prompt with each other to examine which result is 'closest' to the prompt) and qualitative thematic analysis, we first demonstrate a large disconnect between user expectations for Stable Diffusion outputs and those generated, evidenced by a set of Stable Diffusion renditions of 'a Person' providing images far away from such expectations. 
We then extend this finding of general dissatisfaction into highlighting representational harms caused by Stable Diffusion upon our subjects, especially those with traditionally marginalized identities, subjecting them to incorrect and often dehumanizing stereotypes about their identities. We provide recommendations for a harm-aware approach to (re)design future versions of Stable Diffusion and other T2Is.	翻訳日:2024-08-06 19:20:31 公開日:2024-08-02
# Trustworthy Machine Learning under Social and Adversarial Data Sources ( http://arxiv.org/abs/2408.01596v1 ) License: see link	Han Shao,	Machine learning has witnessed remarkable breakthroughs in recent years. As machine learning permeates various aspects of daily life, individuals and organizations increasingly interact with these systems, exhibiting a wide range of social and adversarial behaviors. These behaviors may have a notable impact on the behavior and performance of machine learning systems. Specifically, during these interactions, data may be generated by strategic individuals, collected by self-interested data collectors, possibly poisoned by adversarial attackers, and used to create predictors, models, and policies satisfying multiple objectives. As a result, the outputs of machine learning systems may degrade, as exemplified by the susceptibility of deep neural networks to adversarial examples (Shafahi et al., 2018; Szegedy et al., 2013) and the diminished performance of classic algorithms in the presence of strategic individuals (Ahmadi et al., 2021). Addressing these challenges is imperative for the success of machine learning in societal settings.	Translated: 2024-08-06 19:20:31 Published: 2024-08-02
# Physics-Informed Geometry-Aware Neural Operator ( http://arxiv.org/abs/2408.01600v1 ) License: see link	Weiheng Zhong, Hadi Meidani,	Engineering design problems often involve solving parametric Partial Differential Equations (PDEs) under variable PDE parameters and domain geometry. Recently, neural operators have shown promise in learning PDE operators and quickly predicting the PDE solutions. However, training these neural operators typically requires large datasets, the acquisition of which can be prohibitively expensive. To overcome this, physics-informed training offers an alternative way of building neural operators, eliminating the high computational costs associated with Finite Element generation of training data. Nevertheless, current physics-informed neural operators struggle with limitations, either in handling varying domain geometries or varying PDE parameters. In this research, we introduce a novel method, the Physics-Informed Geometry-Aware Neural Operator (PI-GANO), designed to simultaneously generalize across both PDE parameters and domain geometries. We adopt a geometry encoder to capture the domain geometry features, and design a novel pipeline to integrate this component within the existing DCON architecture. Numerical results demonstrate the accuracy and efficiency of the proposed method.	Translated: 2024-08-06 19:20:31 Published: 2024-08-02
# FIVB ranking: Misstep in the right direction ( http://arxiv.org/abs/2408.01603v1 ) License: see link	Salma Tenni, Daniel Gomes de Pinho Zanco, Leszek Szczecinski,	This work uses a statistical framework to present and evaluate the ranking algorithm that has been used by the Fédération Internationale de Volleyball (FIVB) since 2020. The salient feature of the FIVB ranking is the use of a probabilistic model, which explicitly calculates the probabilities of the outcomes of upcoming games. This explicit modeling is new in the context of official rankings, and we study the optimality of its parameters as well as its relationship with the ranking algorithm as such. The analysis is carried out using both analytical and numerical methods. We conclude that, from the modeling perspective, the use of the home-field advantage (HFA) would be beneficial and that the weighting of the game results is counterproductive. Regarding the algorithm itself, we explain the rationale behind the approximations currently used and explain how to find new parameters which improve the performance. Finally, we propose a new model that drastically simplifies both the implementation and interpretation of the resulting algorithm.	Translated: 2024-08-06 19:20:31 Published: 2024-08-02
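To make "a probabilistic model that feeds a rating update" concrete, here is a generic Elo-style sketch. It is not the FIVB's actual model (whose match-outcome categories and parameters are not given in this abstract); the logistic curve, the scale of 400, `k`, and the home-field bonus `hfa` are all illustrative assumptions:

```python
def expected_score(r_a, r_b, scale=400.0):
    """Logistic win probability of team A over team B (generic Elo curve, assumed)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))

def elo_update(r_a, r_b, score_a, k=20.0, hfa=0.0):
    """Update both ratings after one game.

    score_a: 1.0 for an A win, 0.0 for a loss (a hypothetical binary outcome).
    hfa: optional rating bonus added to A when A plays at home (hypothetical).
    """
    e_a = expected_score(r_a + hfa, r_b)   # predicted probability before the game
    delta = k * (score_a - e_a)            # surprise-weighted correction
    return r_a + delta, r_b - delta        # zero-sum: rating points are conserved

new_a, new_b = elo_update(1500.0, 1480.0, score_a=1.0, hfa=30.0)
```

The home-field term only shifts the predicted probability; the update itself stays zero-sum, which is one simple way to model the HFA effect the abstract discusses.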
# CYBERSECEVAL 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models ( http://arxiv.org/abs/2408.01605v1 ) License: see link	Shengye Wan, Cyrus Nikolaidis, Daniel Song, David Molnar, James Crnkovich, Jayson Grace, Manish Bhatt, Sahana Chennabasappa, Spencer Whitman, Stephanie Ding, Vlad Ionescu, Yue Li, Joshua Saxe,	We are releasing a new suite of security benchmarks for LLMs, CYBERSECEVAL 3, to continue the conversation on empirically measuring LLM cybersecurity risks and capabilities. CYBERSECEVAL 3 assesses 8 different risks across two broad categories: risk to third parties, and risk to application developers and end users. Compared to previous work, we add new areas focused on offensive security capabilities: automated social engineering, scaling manual offensive cyber operations, and autonomous offensive cyber operations. In this paper we discuss applying these benchmarks to the Llama 3 models and a suite of contemporaneous state-of-the-art LLMs, enabling us to contextualize risks both with and without mitigations in place.	Translated: 2024-08-06 19:20:31 Published: 2024-08-02
# Deep Learning Meets OBIA: Tasks, Challenges, Strategies, and Perspectives ( http://arxiv.org/abs/2408.01607v1 ) License: see link	Lei Ma, Ziyun Yan, Mengmeng Li, Tao Liu, Liqin Tan, Xuan Wang, Weiqiang He, Ruikun Wang, Guangjun He, Heng Lu, Thomas Blaschke,	Deep learning has gained significant attention in remote sensing, especially in pixel- or patch-level applications. Despite initial attempts to integrate deep learning into object-based image analysis (OBIA), its full potential remains largely unexplored. In this article, as OBIA usage becomes more widespread, we conducted a comprehensive review and expansion of its task subdomains, with or without the integration of deep learning. Furthermore, we have identified and summarized five prevailing strategies to address the challenge of deep learning's limitations in directly processing unstructured object data within OBIA, and this review also recommends some important future research directions. Our goal with these endeavors is to inspire more exploration in this fascinating yet overlooked area and facilitate the integration of deep learning into OBIA processing workflows.	Translated: 2024-08-06 19:20:31 Published: 2024-08-02
# Post-Quantum Cryptography (PQC) Network Instrument: Measuring PQC Adoption Rates and Identifying Migration Pathways ( http://arxiv.org/abs/2408.00054v2 ) License: see link	Jakub Sowa, Bach Hoang, Advaith Yeluru, Steven Qie, Anita Nikolich, Ravishankar Iyer, Phuong Cao,	The problem of adopting quantum-resistant cryptographic network protocols or post-quantum cryptography (PQC) is critically important to democratizing quantum computing. The problem is urgent because practical quantum computers will break classical encryption in the next few decades. Past encrypted data has already been collected and can be decrypted in the near future. The main challenges of adopting post-quantum cryptography lie in algorithmic complexity and hardware/software/network implementation. The grand question of how existing cyberinfrastructure will support post-quantum cryptography remains unanswered.
This paper describes: i) the design of a novel Post-Quantum Cryptography (PQC) network instrument placed at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign and a part of the FABRIC testbed; ii) the latest results on PQC adoption rate across a wide spectrum of network protocols (Secure Shell (SSH), Transport Layer Security (TLS), etc.); iii) the current state of PQC implementation in key scientific applications (e.g., OpenSSH or SciTokens); iv) the challenges of being quantum-resistant; and v) discussion of potential novel attacks. This is the first large-scale measurement of PQC adoption at national-scale supercomputing centers and FABRIC testbeds. Our results show that only OpenSSH and Google Chrome have successfully implemented PQC and achieved an initial adoption rate of 0.029% (6,044 out of 20,556,816) for OpenSSH connections at NCSA coming from major Internet Service Providers or Autonomous Systems (ASes) such as OARNET, GTT, Google Fiber Webpass (U.S.) and Uppsala Lans Landsting (Sweden), with an overall increasing adoption rate year-over-year for 2023-2024. Our analyses identify pathways to migrate current applications to be quantum-resistant.	Translated: 2024-08-06 12:36:51 Published: 2024-08-02
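The 0.029% figure in the PQC abstract follows directly from the reported connection counts; this short check reproduces it and also expresses it as "one PQC connection in N":

```python
pqc_connections = 6_044          # OpenSSH connections using PQC key exchange (from the abstract)
total_connections = 20_556_816   # all OpenSSH connections observed at NCSA (from the abstract)

rate_percent = 100 * pqc_connections / total_connections
one_in_n = total_connections / pqc_connections

print(f"{rate_percent:.3f}%")         # 0.029%
print(f"about 1 in {one_in_n:,.0f}")  # about 1 in 3,401
```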
# Weighed l1 on the simplex: Compressive sensing meets locality ( http://arxiv.org/abs/2104.13894v2 ) License: see link	Abiy Tasissa, Pranay Tankala, Demba Ba,	Sparse manifold learning algorithms combine techniques in manifold learning and sparse optimization to learn features that could be utilized for downstream tasks. The standard setting of compressive sensing cannot be immediately applied to this setup. Due to the intrinsic geometric structure of data, dictionary atoms might be redundant and may not satisfy the restricted isometry property or coherence condition. In addition, manifold learning emphasizes learning local geometry, which is not reflected in a standard $\ell_1$ minimization problem. We propose weighted $\ell_0$ and weighted $\ell_1$ metrics that encourage representation via neighborhood atoms suited for dictionary-based manifold learning. Assuming that the data is generated from a Delaunay triangulation, we show the equivalence of weighted $\ell_0$ and weighted $\ell_1$. We discuss an optimization program that learns the dictionaries and sparse coefficients and demonstrate the utility of our regularization on synthetic and real datasets.	Translated: 2024-08-05 19:02:21 Published: 2024-08-02
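One way to see why a *weighted* $\ell_1$ is needed on the simplex: if the coefficients $x$ of a data point $y$ over dictionary atoms $d_1, \dots, d_m$ are constrained to be nonnegative and sum to one, then $\sum_i |x_i| = 1$ for every feasible $x$, so the plain $\ell_1$ norm cannot prefer any representation. A sketch of the weighted program, assuming (as an illustration, not necessarily the paper's exact choice) that each weight is the squared distance from $y$ to the atom:

$$
\min_{x \in \mathbb{R}^m} \; \sum_{i=1}^{m} w_i \, |x_i|
\quad \text{s.t.} \quad D x = y, \quad x \ge 0, \quad \mathbf{1}^\top x = 1,
\qquad w_i = \lVert y - d_i \rVert_2^2 .
$$

With these weights, distant atoms are expensive, so the solution favors the nearby atoms that describe the local geometry around $y$.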
# Doubly Robust Interval Estimation for Optimal Policy Evaluation in Online Learning ( http://arxiv.org/abs/2110.15501v4 ) License: see link	Ye Shen, Hengrui Cai, Rui Song,	Evaluating the performance of an ongoing policy plays a vital role in many areas such as medicine and economics, providing crucial guidance on early stopping of the online experiment and timely feedback from the environment. Policy evaluation in online learning thus attracts increasing attention by inferring the mean outcome of the optimal policy (i.e., the value) in real time. Yet, such a problem is particularly challenging due to the dependent data generated in the online environment, the unknown optimal policy, and the complex exploration and exploitation trade-off in the adaptive experiment. In this paper, we aim to overcome these difficulties in policy evaluation for online learning. We explicitly derive the probability of exploration that quantifies the probability of exploring non-optimal actions under commonly used bandit algorithms. We use this probability to conduct valid inference on the online conditional mean estimator under each action and develop the doubly robust interval estimation (DREAM) method to infer the value under the estimated optimal policy in online learning. The proposed value estimator provides double protection for consistency and is asymptotically normal, with a Wald-type confidence interval provided.
Extensive simulation studies and real data applications are conducted to demonstrate the empirical validity of the proposed DREAM method.	Translated: 2024-08-05 19:02:21 Published: 2024-08-02
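The "double protection for consistency" in the DREAM abstract is the defining property of doubly robust estimators. The sketch below is the textbook augmented inverse-probability-weighted (AIPW) estimator for a fixed deterministic policy, not the DREAM method itself (which additionally handles an estimated optimal policy and the exploration probabilities of bandit algorithms); all function names are hypothetical:

```python
import numpy as np

def doubly_robust_value(x, a, y, target_policy, outcome_model, propensity):
    """AIPW (doubly robust) estimate of the mean outcome under a deterministic policy.

    x, a, y: logged contexts, actions, outcomes (1-D arrays of equal length).
    target_policy(x) -> action the evaluated policy would take for each context.
    outcome_model(x, act) -> predicted outcome for taking `act` in context `x`.
    propensity(x, act) -> probability the logging policy chose `act` in context `x`.
    """
    pi_x = target_policy(x)
    mu = outcome_model(x, pi_x)                       # direct (model-based) term
    match = (a == pi_x).astype(float)                 # 1 where the logged action agrees with pi
    p = propensity(x, pi_x)
    return float(np.mean(mu + match / p * (y - mu)))  # correct the model with weighted residuals

# Toy log: actions 0/1 chosen with probability 0.5; action 1 always yields 1, action 0 yields 0.
x = np.zeros(100)
a = np.array([0, 1] * 50)
y = (a == 1).astype(float)
always_1 = lambda ctx: np.ones_like(ctx, dtype=int)
uniform = lambda ctx, act: np.full_like(ctx, 0.5, dtype=float)
good_model = lambda ctx, act: act.astype(float)
bad_model = lambda ctx, act: np.zeros_like(ctx, dtype=float)

v_good = doubly_robust_value(x, a, y, always_1, good_model, uniform)  # 1.0
v_bad = doubly_robust_value(x, a, y, always_1, bad_model, uniform)    # 1.0: propensities rescue it
```

Even with a deliberately wrong outcome model, correct propensities still recover the true value of 1; symmetrically, a correct outcome model makes the residual term vanish.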
# Weakly Supervised Text-to-SQL Parsing through Question Decomposition ( http://arxiv.org/abs/2112.06311v4 ) License: see link	Tomer Wolfson, Daniel Deutch, Jonathan Berant,	Text-to-SQL parsers are crucial in enabling non-experts to effortlessly query relational data. Training such parsers, by contrast, generally requires expertise in annotating natural language (NL) utterances with corresponding SQL queries. In this work, we propose a weak supervision approach for training text-to-SQL parsers. We take advantage of the recently proposed question meaning representation called QDMR, an intermediate between NL and formal query languages. Given questions, their QDMR structures (annotated by non-experts or automatically predicted), and the answers, we are able to automatically synthesize SQL queries that are used to train text-to-SQL models. We test our approach by experimenting on five benchmark datasets. Our results show that the weakly supervised models perform competitively with those trained on annotated NL-SQL data. Overall, we effectively train text-to-SQL parsers while using zero SQL annotations.	Translated: 2024-08-05 19:02:21 Published: 2024-08-02
# Vision Transformers: From Semantic Segmentation to Dense Prediction ( http://arxiv.org/abs/2207.09339v4 ) License: see link	Li Zhang, Jiachen Lu, Sixiao Zheng, Xinxuan Zhao, Xiatian Zhu, Yanwei Fu, Tao Xiang, Jianfeng Feng, Philip H. S. Torr,	The emergence of vision transformers (ViTs) in image classification has shifted the methodologies for visual representation learning. In particular, ViTs learn visual representation at full receptive field per layer across all the image patches, in comparison to the increasing receptive fields of CNNs across layers and other alternatives (e.g., large kernels and atrous convolution). In this work, for the first time we explore the global context learning potential of ViTs for dense visual prediction (e.g., semantic segmentation). Our motivation is that through learning global context at full receptive field layer by layer, ViTs may capture stronger long-range dependency information, critical for dense prediction tasks.
We first demonstrate that, by encoding an image as a sequence of patches, a vanilla ViT without local convolution or resolution reduction can yield stronger visual representation for semantic segmentation. For example, our model, termed SEgmentation TRansformer (SETR), excels on ADE20K (50.28% mIoU, the first position in the test leaderboard on the day of submission) and performs competitively on Cityscapes. However, the basic ViT architecture falls short in broader dense prediction applications, such as object detection and instance segmentation, due to its lack of a pyramidal structure, high computational demand, and insufficient local context. For tackling general dense visual prediction tasks in a cost-effective manner, we further formulate a family of Hierarchical Local-Global (HLG) Transformers, characterized by local attention within windows and global attention across windows in a pyramidal architecture. Extensive experiments show that our methods achieve appealing performance on a variety of dense prediction tasks (e.g., object detection, instance segmentation, and semantic segmentation) as well as image classification.	Translated: 2024-08-05 19:02:21 Published: 2024-08-02
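"Encoding an image as a sequence of patches" is the standard ViT tokenization step. A minimal NumPy sketch of the patch split (the learned linear projection and position embeddings that follow it in a real ViT, and SETR's own code, are not shown; the 16-pixel patch size is the usual ViT default, assumed here):

```python
import numpy as np

def image_to_patch_sequence(img, patch=16):
    """Split an H x W x C image into a sequence of flattened, non-overlapping patches."""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    gh, gw = h // patch, w // patch
    x = img.reshape(gh, patch, gw, patch, c)      # split rows and columns into grid cells
    x = x.transpose(0, 2, 1, 3, 4)                # (gh, gw, patch, patch, c)
    return x.reshape(gh * gw, patch * patch * c)  # (num_patches, patch_dim), row-major order

seq = image_to_patch_sequence(np.zeros((224, 224, 3)), patch=16)
# seq.shape == (196, 768): a 14 x 14 grid of patches, each 16 * 16 * 3 values long
```

Because every token can attend to every other token, each transformer layer sees the full image, which is the "full receptive field per layer" property the abstract contrasts with CNNs.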
# Modelling Assessment Rubrics through Bayesian Networks: a Pragmatic Approach ( http://arxiv.org/abs/2209.05467v3 ) License: see link	Francesca Mangili, Giorgia Adorni, Alberto Piatti, Claudio Bonesana, Alessandro Antonucci,	Automatic assessment of learner competencies is a fundamental task in intelligent tutoring systems. An assessment rubric typically and effectively describes relevant competencies and competence levels. This paper presents an approach to deriving a learner model directly from an assessment rubric defining some (partial) ordering of competence levels. The model is based on Bayesian networks and exploits logical gates with uncertainty (often referred to as noisy gates) to reduce the number of parameters of the model, so as to simplify their elicitation by experts and allow real-time inference in intelligent tutoring systems. We illustrate how the approach can be applied to automate the human assessment of an activity developed for testing computational thinking skills. The simple elicitation of the model starting from the assessment rubric opens up the possibility of quickly automating the assessment of several tasks, making them more easily exploitable in the context of adaptive assessment tools and intelligent tutoring systems.	Translated: 2024-08-05 19:02:21 Published: 2024-08-02
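The rubric abstract credits "noisy gates" with reducing the number of parameters. The noisy-OR gate below is one common example (the abstract does not say which noisy gates the authors use, so this choice is an assumption): a child with n binary parents needs only n activation probabilities plus an optional leak, instead of a full table with 2^n rows.

```python
def noisy_or(parent_states, activation_probs, leak=0.0):
    """P(child = 1 | parents) under a noisy-OR gate.

    parent_states: 0/1 value of each parent.
    activation_probs: probability that each active parent alone switches the child on.
    leak: probability the child is on even when no parent is active.
    """
    p_fail = 1.0 - leak
    for on, p in zip(parent_states, activation_probs):
        if on:
            p_fail *= 1.0 - p  # each active parent independently fails to trigger the child
    return 1.0 - p_fail

p = noisy_or([1, 1], [0.8, 0.5])  # 1 - 0.2 * 0.5 = 0.9
```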
# Mixed moving average field guided learning for spatio-temporal data ( http://arxiv.org/abs/2301.00736v4 ) License: see link	Imma Valentina Curato, Orkun Furat, Lorenzo Proietti, Bennet Stroeh,	Influenced mixed moving average fields are a versatile modeling class for spatio-temporal data. However, their predictive distribution is not generally known. Under this modeling assumption, we define a novel spatio-temporal embedding and a theory-guided machine learning approach that employs a generalized Bayesian algorithm to make ensemble forecasts. We use Lipschitz predictors and determine fixed-time and any-time PAC Bayesian bounds in the batch learning setting. Performing causal forecasts is a highlight of our methodology, as is its potential application to data with spatial and temporal short- and long-range dependence. We then test the performance of our learning methodology by using linear predictors and data sets simulated from a spatio-temporal Ornstein-Uhlenbeck process.	Translated: 2024-08-05 19:02:21 Published: 2024-08-02
# A Reparameterized Discrete Diffusion Model for Text Generation ( http://arxiv.org/abs/2302.05737v3 ) License: see link	Lin Zheng, Jianbo Yuan, Lei Yu, Lingpeng Kong,	This work studies discrete diffusion probabilistic models with applications to natural language generation. We derive an alternative yet equivalent formulation of the sampling from discrete diffusion processes and leverage this insight to develop a family of reparameterized discrete diffusion models. The derived generic framework is highly flexible, offers a fresh perspective on the generation process in discrete diffusion models, and features more effective training and decoding techniques. We conduct extensive experiments to evaluate the text generation capability of our model, demonstrating significant improvements over existing diffusion models.	Translated: 2024-08-05 19:02:21 Published: 2024-08-02
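For readers new to discrete diffusion over text, the forward (noising) process is easy to state. The sketch below samples from the absorbing-state ("mask") variant, one common instance of discrete diffusion; it shows only the forward corruption, not the paper's reparameterized sampler, and `MASK_ID` and the keep-probability `alpha_t` are hypothetical names:

```python
import numpy as np

MASK_ID = -1  # hypothetical id of the absorbing [MASK] token

def absorbing_forward_sample(x0, alpha_t, rng):
    """Sample x_t ~ q(x_t | x_0): keep each token with probability alpha_t, else absorb into MASK."""
    x0 = np.asarray(x0)
    keep = rng.random(x0.shape) < alpha_t  # independent per-token coin flips
    return np.where(keep, x0, MASK_ID)

rng = np.random.default_rng(0)
tokens = np.array([5, 7, 9, 11])
noisy = absorbing_forward_sample(tokens, alpha_t=0.5, rng=rng)  # some positions become MASK_ID
```

As `alpha_t` decays from 1 to 0 over the diffusion steps, the sequence moves from the clean text to all masks; the model is trained to reverse this corruption.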
# Robust Millimeter Beamforming via Self-Supervised Hybrid Deep Learning ( http://arxiv.org/abs/2303.12653v3 ) License: see link	Fenghao Zhu, Bohao Wang, Zhaohui Yang, Chongwen Huang, Zhaoyang Zhang, George C. Alexandropoulos, Chau Yuen, Merouane Debbah,	Beamforming with large-scale antenna arrays has been widely used in recent years and is acknowledged as an important part of 5G and the upcoming 6G. Thus, various techniques are leveraged to improve its performance, e.g., deep learning, advanced optimization algorithms, etc. Although its performance in many previous research scenarios with deep learning is quite attractive, it usually drops rapidly when the environment or dataset is changed. Therefore, designing an effective beamforming network with strong robustness is an open issue for intelligent wireless communications. In this paper, we propose a robust self-supervised beamforming network and verify it on two different kinds of datasets with various scenarios. Simulation results show that the proposed self-supervised network with hybrid learning performs well on both the classic DeepMIMO and the new WAIR-D datasets, with strong robustness under various environments. We also present the principle that explains the rationale of this kind of hybrid learning, which can guide its application to more kinds of datasets.	Translated: 2024-08-05 19:02:21 Published: 2024-08-02
# Accurate and Efficient Event-based Semantic Segmentation Using Adaptive Spiking Encoder-Decoder Network ( http://arxiv.org/abs/2304.11857v3 ) License: see link	Rui Zhang, Luziwei Leng, Kaiwei Che, Hu Zhang, Jie Cheng, Qinghai Guo, Jiangxing Liao, Ran Cheng,	Spiking neural networks (SNNs), known for their low-power, event-driven computation and intrinsic temporal dynamics, are emerging as promising solutions for processing dynamic, asynchronous signals from event-based sensors. Despite their potential, SNNs face challenges in training and architectural design, resulting in limited performance in challenging event-based dense prediction tasks compared to artificial neural networks (ANNs). In this work, we develop an efficient spiking encoder-decoder network (SpikingEDN) for large-scale event-based semantic segmentation tasks. To enhance the learning efficiency from dynamic event streams, we harness the adaptive threshold, which improves network accuracy, sparsity and robustness in streaming inference. Moreover, we develop a dual-path Spiking Spatially-Adaptive Modulation module, which is specifically tailored to enhance the representation of sparse events and multi-modal inputs, thereby considerably improving network performance.
Our SpikingEDN attains a mean intersection over union (MIoU) of 72.57% on the DDD17 dataset and 58.32% on the larger DSEC-Semantic dataset, showing competitive results to the state-of-the-art ANNs while requiring substantially fewer computational resources. Our results shed light on the untapped potential of SNNs in event-based vision applications. The source code will be made publicly available.	Translated: 2024-08-05 19:02:21 Published: 2024-08-02
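The "adaptive threshold" in SpikingEDN refers to spiking neurons whose firing threshold changes with recent activity. The sketch below is a generic adaptive leaky integrate-and-fire (ALIF-style) neuron, not the paper's exact neuron model; the time constants, `beta`, and the hard reset are illustrative assumptions:

```python
import numpy as np

def adaptive_lif(inputs, tau_mem=2.0, v_th0=1.0, beta=0.5, tau_th=5.0):
    """Leaky integrate-and-fire neuron whose firing threshold rises after each spike."""
    v, a = 0.0, 0.0  # membrane potential and threshold-adaptation variable
    spikes = []
    for x in inputs:
        v = v + (x - v) / tau_mem          # leaky integration toward the input
        th = v_th0 + beta * a              # current (adapted) threshold
        s = 1 if v >= th else 0
        if s:
            v = 0.0                        # hard reset after a spike
        a = a * (1.0 - 1.0 / tau_th) + s   # adaptation decays, jumps on each spike
        spikes.append(s)
    return np.array(spikes)

plain = adaptive_lif([3.0] * 20, beta=0.0)    # fixed threshold: fires on every step
adapted = adaptive_lif([3.0] * 20, beta=0.5)  # adaptive threshold: fires less often
```

Under a constant strong input the adaptive neuron settles into sparser firing, which illustrates how an adaptive threshold can trade a little responsiveness for sparsity.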
# Geometrical description and Faddeev-Jackiw quantization of electrical networks ( http://arxiv.org/abs/2304.12252v3 ) License: see link	A. Parra-Rodriguez, I. L. Egusquiza,	In lumped-element electrical circuit theory, the problem of solving Maxwell's equations in the presence of media is reduced to two sets of equations: the constitutive equations, encapsulating the local geometry and dynamics of a confined energy density, and the Kirchhoff equations, enforcing conservation of charge and energy on a larger, topological scale. We develop a new geometric and systematic description of the dynamics of general lumped-element electrical circuits as first-order differential equations, derivable from a Lagrangian and a Rayleigh dissipation function. Through the Faddeev-Jackiw method we identify and classify the singularities that arise in the search for Hamiltonian descriptions of general networks. The core of our solution relies on the correct identification of the reduced manifold in which the circuit state is expressible, e.g., a mix of flux and charge degrees of freedom, including the presence of compact ones.
We apply our fully programmable method to obtain (canonically quantizable) Hamiltonian descriptions of nonlinear and nonreciprocal circuits which would be cumbersome or singular if pure node-flux or loop-charge variables were used as a starting configuration space. We also propose a specific assignment of topology for the branch variables of energetic elements that, when used as input to the procedure, gives results consistent with classical descriptions as well as with spectra of more involved quantum circuits. This work unifies diverse existing geometrical pictures of electrical network theory, and will prove useful, for instance, to automate the computation of exact Hamiltonian descriptions of superconducting quantum chips.	Translated: 2024-08-05 19:02:21 Published: 2024-08-02
# Answering Questions by Meta-Reasoning over Multiple Chains of Thought ( http://arxiv.org/abs/2304.13007v4 ) License: see link	Ori Yoran, Tomer Wolfson, Ben Bogin, Uri Katz, Daniel Deutch, Jonathan Berant,	Modern systems for multi-hop question answering (QA) typically break questions into a sequence of reasoning steps, termed chain-of-thought (CoT), before arriving at a final answer. Often, multiple chains are sampled and aggregated through a voting mechanism over the final answers, but the intermediate steps themselves are discarded. While such approaches improve performance, they do not consider the relations between intermediate steps across chains and do not provide a unified explanation for the predicted answer. We introduce Multi-Chain Reasoning (MCR), an approach which prompts large language models to meta-reason over multiple chains of thought, rather than aggregating their answers. MCR examines different reasoning chains, mixes information between them and selects the most relevant facts in generating an explanation and predicting the answer. MCR outperforms strong baselines on 7 multi-hop QA datasets. Moreover, our analysis reveals that MCR explanations exhibit high quality, enabling humans to verify its answers.	Translated: 2024-08-05 19:02:21 Published: 2024-08-02
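The MCR abstract contrasts its approach with voting over sampled chains. This minimal sketch shows that baseline aggregation (the chain format is a hypothetical `(steps, final_answer)` pair), making explicit that the intermediate steps never reach the aggregator:

```python
from collections import Counter

def majority_vote(chains):
    """Aggregate sampled chains of thought by voting over their final answers only.

    Each chain is a (steps, final_answer) pair; the intermediate steps are ignored,
    which is the information MCR instead meta-reasons over.
    """
    answers = [final for _steps, final in chains]
    return Counter(answers).most_common(1)[0][0]

chains = [
    (["Paris is the capital of France."], "Paris"),
    (["The largest French city by area is..."], "Lyon"),
    (["France's seat of government is Paris."], "Paris"),
]
best = majority_vote(chains)  # "Paris"
```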
# Conditional quantum thermometry -- enhancing precision by measuring less ( http://arxiv.org/abs/2304.13595v2 ) License: see link	Akira Sone, Diogo O. Soares-Pinto, Sebastian Deffner,	Taking accurate measurements of the temperature of quantum systems is a challenging task. The mathematical peculiarities of quantum information make it virtually impossible to measure with infinite precision. In the present paper, we introduce a generalized thermal state, which is conditioned on the pointer states of the available measurement apparatus. We show that this conditional thermal state outperforms the Gibbs state in quantum thermometry. The origin of the enhanced precision can be traced to its asymmetry, as quantified by the Wigner-Yanase-Dyson skew information. This additional resource is further clarified in a fully resource-theoretic analysis, and we show that there is a Gibbs-preserving map that converts a target state into the conditional thermal state. We relate the quantum J-divergence between the conditional thermal state and the same target state to quantum heat.	Translated: 2024-08-05 18:53:04 Published: 2024-08-02
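For reference, the Gibbs-state baseline has a closed-form temperature Fisher information, F_T = Var(H) / T^4 (units with k_B = 1). The sketch below evaluates it for a two-level system with gap `eps`. It cross-checks the value against the classical Fisher information of the level populations, which equals the quantum Fisher information here because a Gibbs state is diagonal in the energy basis. It does not implement the paper's conditional thermal state. The two-level model and the function names are illustrative assumptions.

```python
import math


def excited_population(eps: float, T: float) -> float:
    """Thermal population of the upper level of a two-level system (k_B = 1)."""
    return 1.0 / (1.0 + math.exp(eps / T))


def gibbs_qfi(eps: float, T: float) -> float:
    """Temperature QFI of a Gibbs state, F_T = Var(H) / T**4.

    For levels {0, eps}, Var(H) = eps**2 * p * (1 - p).
    """
    p = excited_population(eps, T)
    return eps**2 * p * (1.0 - p) / T**4


def population_fisher(eps: float, T: float, h: float = 1e-6) -> float:
    """Classical Fisher information of the two populations (central difference)."""
    p = excited_population(eps, T)
    dp = (excited_population(eps, T + h) - excited_population(eps, T - h)) / (2 * h)
    # Two outcomes with probabilities p and 1 - p share the same |dp/dT|.
    return dp**2 / p + dp**2 / (1.0 - p)


for T in (0.2, 0.5, 1.0, 2.0):
    print(T, gibbs_qfi(1.0, T), population_fisher(1.0, T))
```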
# Critical sensing with a single bosonic mode without boson-boson interactions ( http://arxiv.org/abs/2305.17656v3 ) License: see link	Ken Chen, Jia-Hao Lü, Xin Zhu, Hao-Long Zhang, Wen Ning, Zhen-Biao Yang, Shi-Biao Zheng,	Critical phenomena of quantum systems are useful for enhancing quantum sensing. However, experimental realizations of criticality enhancement have been confined to very few systems, owing to stringent requirements, including the thermodynamic or scaling limit and fine control of interacting quantum subsystems or particles. We here propose a simple critical quantum sensing scheme that requires neither of these conditions. The critical system is realized with a single parametrically driven bosonic mode involving many non-interacting bosons. We calculate the quantum Fisher information and perform a simulation, which confirms the criticality-enabled enhancement. We further detail the response of one of the quadratures to variations of the control parameter. The numerical results reveal that its inverted variance exhibits diverging behavior at the critical point. Based on presently available control techniques for parametric driving, we expect that our scheme can be realized in different systems, e.g., ion traps and superconducting circuits.	Translated: 2024-08-05 18:53:04 Published: 2024-08-02
# Insights from the Design Space Exploration of Flow-Guided Nanoscale Localization ( http://arxiv.org/abs/2305.18493v3 ) License: see link	Filip Lemic, Gerard Calvo Bartra, Arnau Brosa López, Jorge Torres Gómez, Jakob Struye, Falko Dressler, Sergi Abadal, Xavier Costa Perez,	Nanodevices with Terahertz (THz)-based wireless communication capabilities provide a primer for flow-guided localization within the human bloodstream. Such localization allows the locations of sensed events to be assigned to the events themselves, offering benefits such as early and precise diagnostics and reduced costs and invasiveness. Flow-guided localization is still in a rudimentary phase, with only a handful of works targeting the problem. Nonetheless, the performance assessments of the proposed solutions are already carried out in a non-standardized way, usually along a single performance metric, and ignore various aspects that are relevant at such a scale (e.g., nanodevices' limited energy) and for such a challenging environment (e.g., extreme attenuation of in-body THz propagation). As such, these assessments feature low levels of realism and cannot be compared objectively.
Toward addressing this issue, we account for the environmental and scale-related peculiarities of the scenario and assess the performance of two state-of-the-art flow-guided localization approaches along a set of heterogeneous performance metrics, such as the accuracy and reliability of localization.	Translated: 2024-08-05 18:53:04 Published: 2024-08-02
# Exponential speedups for quantum walks in random hierarchical graphs ( http://arxiv.org/abs/2307.15062v2 ) License: see link	Shankar Balasubramanian, Tongyang Li, Aram Harrow,	There are few known exponential speedups for quantum algorithms, and these tend to fall into even fewer families. One speedup that has mostly resisted generalization is the use of quantum walks to traverse the welded-tree graph, due to Childs, Cleve, Deotto, Farhi, Gutmann, and Spielman. We show how to generalize this to a large class of hierarchical graphs in which the vertices are grouped into "supervertices" arranged according to a $d$-dimensional lattice. Supervertices can have different sizes, and edges between supervertices correspond to random connections between their constituent vertices. The hitting times of quantum walks on these graphs are related to the localization properties of zero modes in certain disordered tight-binding Hamiltonians. The speedups range from superpolynomial to exponential, depending on the underlying dimension and the random graph model. We also provide concrete realizations of these hierarchical graphs and introduce a general method for constructing graphs with efficient quantum traversal times using graph sparsification.	Translated: 2024-08-05 18:53:04 Published: 2024-08-02
# SemiSFL: Split Federated Learning on Unlabeled and Non-IID Data ( http://arxiv.org/abs/2307.15870v5 ) License: see link	Yang Xu, Yunming Liao, Hongli Xu, Zhipeng Sun, Liusheng Huang, Chunming Qiao,	Federated Learning (FL) has emerged to allow multiple clients to collaboratively train machine learning models on their private data at the network edge. However, training and deploying large-scale models on resource-constrained devices is challenging. Fortunately, Split Federated Learning (SFL) offers a feasible solution by alleviating the computation and/or communication burden on clients. However, existing SFL works often assume sufficient labeled data on clients, which is usually impractical. Besides, data non-IIDness poses another challenge to efficient model training. To the best of our knowledge, these two issues have not been addressed simultaneously in SFL. Herein, we propose a novel Semi-supervised SFL system, termed SemiSFL, which incorporates clustering regularization to perform SFL with unlabeled and non-IID client data.
Moreover, our theoretical and experimental investigations into model convergence reveal that the inconsistent training processes on labeled and unlabeled data affect the effectiveness of clustering regularization. To mitigate this training inconsistency, we develop an algorithm that dynamically adjusts the global updating frequency so as to improve training performance. Extensive experiments on benchmark models and datasets show that our system provides a 3.8x speed-up in training time, reduces the communication cost by about 70.3% while reaching the target accuracy, and achieves up to 5.8% improvement in accuracy under non-IID scenarios compared to the state-of-the-art baselines.	Translated: 2024-08-05 18:53:04 Published: 2024-08-02
# Arithmetic with Language Models: from Memorization to Computation ( http://arxiv.org/abs/2308.01154v4 ) License: see link	Davide Maltoni, Matteo Ferrara,	A better understanding of the emergent computation and problem-solving capabilities of recent large language models is of paramount importance for further improving them and broadening their applicability. This work investigates how a language model, trained to predict the next token, can perform arithmetic computations that generalize beyond the training data. Binary addition and multiplication constitute a good testbed for this purpose, since they require a very small vocabulary and exhibit relevant input/output discontinuities that make smooth input interpolation ineffective for novel data. We successfully trained a light language model to learn these tasks and ran a number of experiments to investigate its extrapolation capabilities and internal information processing. Our findings support the hypothesis that the language model works as an Encoding-Regression-Decoding machine, where the computation takes place in the value space once the input token representation is mapped to an appropriate internal representation.	Translated: 2024-08-05 18:53:04 Published: 2024-08-02
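To make the testbed concrete, the sketch below generates binary addition and multiplication examples as character-token sequences, e.g. the input `101+011=` with target `1000`. The fixed-width, zero-padded format, the five-token vocabulary, and the function names are assumptions for illustration. The paper's exact encoding may differ.

```python
import random
from typing import List, Tuple

# Assumed vocabulary: two digits, two operators, and the '=' separator.
VOCAB = ["0", "1", "+", "*", "="]
Example = Tuple[List[str], List[str]]


def to_bits(n: int, width: int) -> str:
    """Zero-padded binary string of a fixed width."""
    return format(n, f"0{width}b")


def make_example(a: int, b: int, width: int, op: str = "+") -> Example:
    """Return (input_tokens, target_tokens) for a + b or a * b in binary."""
    result = a + b if op == "+" else a * b
    # width + 1 bits hold any sum; 2 * width bits hold any product.
    out_width = width + 1 if op == "+" else 2 * width
    prompt = to_bits(a, width) + op + to_bits(b, width) + "="
    return list(prompt), list(to_bits(result, out_width))


def make_dataset(n: int, width: int, op: str = "+", seed: int = 0) -> List[Example]:
    """Sample n random operand pairs uniformly from [0, 2**width - 1]."""
    rng = random.Random(seed)
    hi = 2**width - 1
    return [make_example(rng.randint(0, hi), rng.randint(0, hi), width, op) for _ in range(n)]


x, y = make_example(5, 3, width=3)
print("".join(x), "".join(y))
```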
# Semi-Supervised Dual-Stream Self-Attentive Adversarial Graph Contrastive Learning for Cross-Subject EEG-based Emotion Recognition ( http://arxiv.org/abs/2308.11635v2 ) License: see link	Weishan Ye, Zhiguo Zhang, Fei Teng, Min Zhang, Jianhong Wang, Dong Ni, Fali Li, Peng Xu, Zhen Liang,	Electroencephalography (EEG) is an objective tool for emotion recognition with promising applications. However, the scarcity of labeled data remains a major challenge in this field, limiting the widespread use of EEG-based emotion recognition. In this paper, a semi-supervised Dual-stream Self-Attentive Adversarial Graph Contrastive learning framework (termed DS-AGC) is proposed to tackle the challenge of limited labeled data in cross-subject EEG-based emotion recognition. The DS-AGC framework includes two parallel streams for extracting non-structural and structural EEG features. The non-structural stream incorporates a semi-supervised multi-domain adaptation method to alleviate the distribution discrepancy among the labeled source domain, the unlabeled source domain, and the unknown target domain.
The structural stream develops a graph contrastive learning method to extract effective graph-based feature representations from multiple EEG channels in a semi-supervised manner. Further, a self-attentive fusion module is developed for feature fusion, sample selection, and emotion recognition; it highlights EEG features that are more relevant to emotions, as well as data samples in the labeled source domain that are closer to the target domain. Extensive experiments conducted on two benchmark databases (SEED and SEED-IV) using a semi-supervised cross-subject leave-one-subject-out cross-validation evaluation scheme show that the proposed model outperforms existing methods under different incomplete-label conditions (with an average improvement of 5.83% on SEED and 6.99% on SEED-IV), demonstrating its effectiveness in addressing the label scarcity problem in cross-subject EEG-based emotion recognition.	Translated: 2024-08-05 18:53:04 Published: 2024-08-02
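The leave-one-subject-out protocol in the evaluation can be sketched as a split generator. Each subject in turn becomes the unseen target domain, and the remaining subjects form the source pool. The sketch splits the source samples into labeled and unlabeled parts with a fixed labeled fraction. That split is an assumed stand-in for the paper's "incomplete label conditions", and the function name is hypothetical.

```python
import random
from typing import Dict, Iterator, List, Tuple

Fold = Tuple[str, List[int], List[int], List[int]]


def loso_splits(
    samples_by_subject: Dict[str, List[int]],
    labeled_frac: float,
    seed: int = 0,
) -> Iterator[Fold]:
    """Yield (target_subject, labeled_source, unlabeled_source, target_samples).

    Each subject is held out once as the unseen target domain. The samples of
    all other subjects form the source domain, and only a fraction of them
    keep their labels (an assumed way to simulate incomplete labels).
    """
    rng = random.Random(seed)
    for target in sorted(samples_by_subject):
        source = [
            sample
            for subject, ids in sorted(samples_by_subject.items())
            if subject != target
            for sample in ids
        ]
        rng.shuffle(source)
        n_labeled = int(round(labeled_frac * len(source)))
        yield target, source[:n_labeled], source[n_labeled:], list(samples_by_subject[target])


data = {"s1": list(range(0, 10)), "s2": list(range(10, 20)), "s3": list(range(20, 30))}
for target, labeled, unlabeled, test in loso_splits(data, labeled_frac=0.1):
    print(target, len(labeled), len(unlabeled), len(test))
```

Each fold would train on the labeled and unlabeled source samples plus the unlabeled target samples, then score on the target subject. Averaging over folds gives the cross-subject result.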
# Rule-Based Error Detection and Correction to Operationalize Movement Trajectory Classification ( http://arxiv.org/abs/2308.14250v3 ) License: see link	Bowen Xi, Kevin Scaria, Divyagna Bavikadi, Paulo Shakarian,	Classification of movement trajectories has many applications in transportation and is a key component of large-scale movement trajectory generation and anomaly detection, which have key safety applications in the aftermath of a disaster or other external shock. However, current state-of-the-art (SOTA) methods are based on supervised deep learning, which leads to challenges when the distribution of trajectories changes due to such a shock. We provide a neuro-symbolic rule-based framework that performs error detection and correction for these models, to be integrated into our movement trajectory platform. We provide a suite of experiments on several recent SOTA models showing highly accurate error detection, the ability to improve accuracy under a changing test distribution, and accuracy improvement for the base use case, in addition to a suite of theoretical properties that informed algorithm development. Specifically, we show F1 scores of up to 0.984 for predicting errors, a significant increase in out-of-distribution accuracy (8.51% improvement over SOTA for zero-shot accuracy), and accuracy improvement over the SOTA model.	Translated: 2024-08-05 18:53:04 Published: 2024-08-02
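The F1 score reported for error detection treats "the base model's prediction is wrong" as the positive class. The minimal sketch below shows how such a score follows from a detector's flags and the ground-truth correctness of the base model. The function name and input format are hypothetical. This is not the authors' evaluation code.

```python
from typing import Sequence


def error_detection_f1(flagged: Sequence[bool], is_error: Sequence[bool]) -> float:
    """F1 of an error detector; positives are samples the base model got wrong."""
    if len(flagged) != len(is_error):
        raise ValueError("inputs must have the same length")
    tp = sum(f and e for f, e in zip(flagged, is_error))        # errors caught
    fp = sum(f and not e for f, e in zip(flagged, is_error))    # false alarms
    fn = sum((not f) and e for f, e in zip(flagged, is_error))  # errors missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Detector flags vs. whether the base classifier was actually wrong.
print(error_detection_f1([True, True, False, False, True], [True, False, False, True, True]))
```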
# Black Hole from Entropy Maximization ( http://arxiv.org/abs/2309.00602v4 ) License: see link	Yuki Yokokura,	One quantum characterization of a black hole, motivated by (local) holography and thermodynamics, is that it maximizes thermodynamic entropy for a given surface area. In the context of quantum gravity, this could be more fundamental than the classical characterization by a horizon. As a step, we explore this possibility by solving the 4D semi-classical Einstein equation with many matter fields, and find a picture of a black hole. For spherical static highly excited configurations, we apply local typicality and estimate the entropy including self-gravity to derive its upper bound. The saturation condition uniquely determines the entropy-maximized configuration: self-gravitating quanta condense into a radially uniform dense configuration with no horizon, where the self-gravity and a large quantum pressure induced by the curvatures are balanced and no singularity appears. The interior metric is a self-consistent solution that is non-perturbative in Planck's constant. The maximum entropy, given by the volume integral of the entropy density, agrees with the Bekenstein-Hawking formula through self-gravity, deriving the Bousso bound for thermodynamic entropy. Finally, 10 future prospects are discussed, leading to the speculative view that the configuration represents semi-classically a quantum-gravitational condensate with holographic bulk dynamics.	Translated: 2024-08-05 18:53:04 Published: 2024-08-02
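For reference, the Bekenstein-Hawking formula assigns the entropy below to a surface of area $A$, where $\ell_P$ is the Planck length. The Schwarzschild form follows from substituting $A = 4\pi r_s^2$ with $r_s = 2GM/c^2$. This is the standard textbook expression, included for orientation only; it is not the paper's derivation of the maximum entropy.

$$
S_{\mathrm{BH}} = \frac{k_B c^3 A}{4 G \hbar} = \frac{k_B A}{4 \ell_P^2}, \qquad \ell_P^2 = \frac{G \hbar}{c^3}, \qquad S_{\mathrm{BH}}^{\mathrm{Schw}} = \frac{4 \pi k_B G M^2}{\hbar c}.
$$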
# Multimodal Guidance Network for Missing-Modality Inference in Content Moderation ( http://arxiv.org/abs/2309.03452v2 ) License: see link	Zhuokai Zhao, Harish Palani, Tianyi Liu, Lena Evans, Ruth Toner,	Multimodal deep learning, especially vision-language models, has gained significant traction in recent years, greatly improving performance on many downstream tasks, including content moderation and violence detection. However, standard multimodal approaches often assume consistent modalities between training and inference, which limits their application in many real-world use cases, as some modalities may not be available during inference. While existing research mitigates this problem by reconstructing the missing modalities, doing so unavoidably adds unnecessary computational cost, which can be just as critical, especially for large infrastructures deployed in industry. To this end, we propose a novel guidance network that promotes knowledge sharing during training, taking advantage of multimodal representations to train better single-modality models for use at inference. Real-world experiments in violence detection show that our proposed framework trains single-modality models that significantly outperform traditionally trained counterparts, while avoiding any increase in computational cost at inference.	Translated: 2024-08-05 18:53:04 Published: 2024-08-02
# A Family of Pretrained Transformer Language Models for Russian ( http://arxiv.org/abs/2309.10931v4 ) License: see link	Dmitry Zmitrovich, Alexander Abramov, Andrey Kalmykov, Maria Tikhonova, Ekaterina Taktasheva, Danil Astafurov, Mark Baushenko, Artem Snegirev, Vitalii Kadulin, Sergey Markov, Tatiana Shavrina, Vladislav Mikhailov, Alena Fenogenova,	Transformer language models (LMs) are fundamental to NLP research methodologies and applications in various languages. However, developing such models specifically for the Russian language has received little attention. This paper introduces a collection of 13 Russian Transformer LMs, which span encoder (ruBERT, ruRoBERTa, ruELECTRA), decoder (ruGPT-3), and encoder-decoder (ruT5, FRED-T5) architectures. We report on the model architecture design and pretraining, and on the results of evaluating their generalization abilities on Russian language understanding and generation datasets and benchmarks. By pretraining and releasing these specialized Transformer LMs, we aim to broaden the scope of NLP research directions and enable the development of industrial solutions for the Russian language.	Translated: 2024-08-05 18:53:04 Published: 2024-08-02
# Flux-pulse-assisted Readout of a Fluxonium Qubit ( http://arxiv.org/abs/2309.17286v2 ) License: see link	Taryn V. Stefanski, Christian Kraglund Andersen,	Much attention has focused on the transmon architecture for large-scale superconducting quantum devices; however, the fluxonium qubit has emerged as a possible successor. With a shunting inductor in parallel to a Josephson junction, the fluxonium offers larger anharmonicity and stronger protection against dielectric loss, leading to higher coherence times than conventional transmon qubits. The interplay between the inductive and Josephson energy potentials of the fluxonium qubit leads to a rich dispersive-shift landscape when tuning the external flux. Here we propose to exploit features of the dispersive shift to improve qubit readout. Specifically, we report on theoretical simulations showing improved readout times and error rates when the readout is performed at a flux bias point with a large dispersive shift. We expand the scheme to include different error channels and show that, with an integration time of 155 ns, flux-pulse-assisted readout offers about a fivefold improvement in the signal-to-noise ratio.
Moreover, we show that the performance improvement persists in the presence of finite measurement efficiency combined with quasi-static flux noise, and also when considering the increased Purcell rate at the flux-pulse-assisted readout point. We suggest a set of reasonable energy parameters for the fluxonium architecture that will allow for the implementation of our proposed flux-pulse-assisted readout scheme.	Translated: 2024-08-05 18:43:16 Published: 2024-08-02
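The sketch below gives a feel for what an SNR gain can mean for readout, using a common idealization rather than the paper's model. It assumes two Gaussian pointer distributions of equal width sigma, separated by a distance d, and thresholded at the midpoint. SNR is taken as d / sigma, which is an assumption, since SNR conventions vary (some use d / (2 sigma) or a power ratio). Under these assumptions, the separation error is 0.5 * erfc(SNR / (2 * sqrt(2))).

```python
import math


def separation_error(snr: float) -> float:
    """Misassignment probability for two equal-width Gaussians cut at the midpoint.

    Assumes SNR = d / sigma, with d the distance between the two means.
    A state is misread when its noise exceeds d / 2, which happens with
    probability Phi(-d / (2 sigma)) = 0.5 * erfc(SNR / (2 * sqrt(2))).
    """
    return 0.5 * math.erfc(snr / (2.0 * math.sqrt(2.0)))


for snr in (1.0, 5.0):
    print(f"SNR={snr}: separation error={separation_error(snr):.2e}")
```

In this idealization, a fivefold SNR increase from 1 to 5 lowers the separation error from about 31% to about 0.6%. Real readout errors also include qubit decay and measurement-induced transitions during integration, which this model ignores.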
# DragD3D: Realistic Mesh Editing with Rigidity Control Driven by 2D Diffusion Priors ( http://arxiv.org/abs/2310.04561v2 ) License: see link	Tianhao Xie, Eugene Belilovsky, Sudhir Mudur, Tiberiu Popa,	Direct mesh editing and deformation are key components in the geometric modeling and animation pipeline. Mesh editing methods are typically framed as optimization problems combining user-specified vertex constraints with a regularizer that determines the positions of the remaining vertices. The choice of the regularizer is key to the realism and authenticity of the final result. Physics- and geometry-based regularizers are not aware of the global context and semantics of the object, and the more recent deep learning priors are limited to a specific class of 3D object deformations. Our main contribution is a vertex-based mesh editing method called DragD3D, based on (1) a novel optimization formulation that decouples the rotation and stretch components of the deformation and combines a 3D geometric regularizer with (2) the recently introduced DDS loss, which scores the faithfulness of the rendered 2D image to one from a diffusion model. Thus, our deformation method achieves globally realistic shape deformation that is not restricted to any class of objects.
Our new formulation directly optimizes the transformation of the neural Jacobian field, explicitly separating the rotational and stretching components. The objective function of the optimization combines the approximate gradients of DDS and the gradients of the geometric loss to satisfy the vertex constraints. Additional user control over the desired global shape deformation is made possible by allowing explicit per-triangle deformation control as well as explicit separation of the rotational and stretching components of the deformation. We show that our deformations can be controlled to yield realistic shape deformations that are aware of the global context of the objects, and that they provide better results than using geometric regularizers alone.	Translated: 2024-08-05 18:43:16 Published: 2024-08-02
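In linear-algebra terms, the rotation/stretch split applied to the neural Jacobian field is a polar decomposition J = R S, with R a rotation and S symmetric positive definite. For a 2x2 Jacobian with positive determinant, this decomposition has a closed form, sketched below in plain Python. The paper works with per-triangle Jacobians of a 3D mesh and optimizes them. This sketch only illustrates the decomposition itself, and all names are illustrative.

```python
import math
from typing import List, Tuple

Mat = List[List[float]]


def matmul(a: Mat, b: Mat) -> Mat:
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def rotation(theta: float) -> Mat:
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def polar_2x2(m: Mat) -> Tuple[Mat, Mat]:
    """Polar decomposition M = R S of a 2x2 matrix with det(M) > 0.

    Requiring S = R^T M to be symmetric gives (a + d) sin(theta) = (c - b) cos(theta);
    the atan2 branch makes trace(S) > 0, so S is positive definite.
    """
    (a, b), (c, d) = m
    if a * d - b * c <= 0:
        raise ValueError("expects a matrix with positive determinant")
    theta = math.atan2(c - b, a + d)
    r = rotation(theta)
    r_t = [[r[0][0], r[1][0]], [r[0][1], r[1][1]]]
    return r, matmul(r_t, m)


stretch = [[2.0, 0.3], [0.3, 0.5]]          # symmetric, positive definite
m = matmul(rotation(0.7), stretch)          # rotate a stretched element by 0.7 rad
r, s = polar_2x2(m)
print(math.atan2(r[1][0], r[0][0]), s)      # recovers the angle and the stretch
```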
# Detector-based measurements for QFT: two issues and an AQFT proposal ( http://arxiv.org/abs/2310.06596v2 ) License: see link	Nicola Pranzini, Esko Keski-Vakkuri,	We present and investigate two issues within the measurement scheme for QFT presented by J. Polo-Gómez, L. J. Garay and E. Martín-Martínez in "A detector-based measurement theory for quantum field theory". We point out some discrepancies that arise when the measurement scheme is applied to contextual field states and show that $n$-point function assignments based on local processing regions sometimes lead to inconsistencies. To solve these issues, we modify the measurement scheme to use non-relativistic detectors to induce an update rule for algebraic states in the Haag-Kastler formulation of quantum field theory. In this way, $n$-point functions can be consistently evaluated across any region that has a definite causal relation with the measurements.	Translated: 2024-08-05 18:43:16 Published: 2024-08-02
# A Hyperparameter Study for Quantum Kernel Methods ( http://arxiv.org/abs/2310.11891v3 ) License: see link	Sebastian Egginger, Alona Sakhnenko, Jeanette Miriam Lorenz,	Quantum kernel methods are a promising approach in quantum machine learning thanks to the guarantees connected to them. Their accessibility for analytic considerations also opens up the possibility of prescreening datasets based on their potential for a quantum advantage. To do so, earlier works developed the geometric difference, which can be understood as a closeness measure between two kernel-based machine learning approaches, most importantly between a quantum kernel and a classical kernel. This metric links the quantum and classical model complexities, and it was developed to bound the generalization error. This raises the question of how the metric behaves in an empirical setting. In this work, we investigate the effects of hyperparameter choice on model performance and on the generalization gap between classical and quantum kernels. The importance of hyperparameters is also well known in classical machine learning. Of special interest are hyperparameters associated with the quantum Hamiltonian evolution feature map, as well as the number of qubits to trace out before computing a projected quantum kernel.
We conduct a thorough investigation of the hyperparameters across 11 datasets and identify certain aspects that can be exploited. Analyzing the effects of certain hyperparameter settings on empirical performance, as measured by cross-validation accuracy, and on generalization ability, as measured by the geometric difference described above, brings us one step closer to understanding the potential of quantum kernel methods on classical datasets.	Translated: 2024-08-05 18:43:16 Published: 2024-08-02
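A deliberately tiny fidelity kernel can illustrate the role of a feature-map scaling hyperparameter. The sketch encodes a scalar x as the single-qubit state RY(scale * x)|0>, so the kernel is k(x, x') = cos^2(scale * (x - x') / 2). This encoding is an assumption chosen for brevity. It is much simpler than the Hamiltonian-evolution feature maps studied in the paper, and it uses plain Python rather than a quantum SDK. A very small scale makes every pair of inputs look alike, with off-diagonal entries near 1. Computing the geometric difference itself would also require matrix square roots and inverses of the Gram matrices, which are omitted here.

```python
import math
from typing import List


def encode(x: float, scale: float) -> List[float]:
    """Amplitudes of RY(scale * x)|0>, i.e. [cos(theta/2), sin(theta/2)]."""
    theta = scale * x
    return [math.cos(theta / 2), math.sin(theta / 2)]


def fidelity_kernel(xs: List[float], scale: float) -> List[List[float]]:
    """Gram matrix K_ij = |<psi(x_i)|psi(x_j)>|^2 (real amplitudes here)."""
    states = [encode(x, scale) for x in xs]
    return [[(u[0] * v[0] + u[1] * v[1]) ** 2 for v in states] for u in states]


def mean_offdiag(k: List[List[float]]) -> float:
    """Average similarity between distinct inputs."""
    n = len(k)
    vals = [k[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(vals) / len(vals)


xs = [0.0, 0.5, 1.0, 1.5]
for scale in (0.1, 1.0, 3.0):
    print(scale, round(mean_offdiag(fidelity_kernel(xs, scale)), 3))
```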
# RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models ( http://arxiv.org/abs/2310.16340v3 ) License: see link	Zefan Wang, Zichuan Liu, Yingying Zhang, Aoxiao Zhong, Jihong Wang, Fengbin Yin, Lunting Fan, Lingfei Wu, Qingsong Wen,	Large language model (LLM) applications in cloud root cause analysis (RCA) have been actively explored recently. However, current methods still rely on manual workflow settings and do not unleash LLMs' decision-making and environment-interaction capabilities. We present RCAgent, a tool-augmented LLM autonomous agent framework for practical and privacy-aware industrial RCA usage. Running on an internally deployed model rather than GPT-family models, RCAgent is capable of free-form data collection and comprehensive analysis with tools. Our framework combines a variety of enhancements, including a unique Self-Consistency for action trajectories and a suite of methods for context management, stabilization, and importing domain knowledge. Our experiments show RCAgent's evident and consistent superiority over ReAct across all aspects of RCA (predicting root causes, solutions, evidence, and responsibilities) and on tasks both covered and not covered by current rules, as validated by both automated metrics and human evaluations.
Furthermore, RCAgent has already been integrated into the diagnosis and issue-discovery workflow of the Real-time Compute Platform for Apache Flink of Alibaba Cloud.	Translated: 2024-08-05 18:43:16 Published: 2024-08-02
# 絡み合い支援符号語安定化量子符号のサイズに関する半定値プログラミング境界 Semidefinite programming bounds on the size of entanglement-assisted codeword stabilized quantum codes ( http://arxiv.org/abs/2311.07111v2 ) ライセンス: Link先を確認	Ching-Yi Lai, Pin-Chieh Tseng, Wei-Hsuan Yu,	(参考訳) 本稿では,量子符号の領域における半定値プログラミングの適用について検討し,特に絡み合い支援付き符号語安定化符号(CWS)に着目した。特に、CWS群の等方部分群とCWS型量子符号のワード演算子の集合を利用して、最小距離の上限を導出する。さらに、この特徴付けは関連する距離列挙子に組み込むことができ、CWS型量子符号の最小距離またはサイズに関するSDP境界につながる半定値制約を構築することができる。 SDP境界がLP境界を上回るいくつかの例を示す。LPが有意義な結果を得られない場合さえある一方で、SDPは一貫してより厳密で意味のある境界を与える。最後に、符号語安定化符号に対するShor-Laflamme重み列挙子とシャドー列挙子の解釈を与え、量子符号の理解を深める。 In this paper, we explore the application of semidefinite programming to the realm of quantum codes, specifically focusing on codeword stabilized (CWS) codes with entanglement assistance. Notably, we utilize the isotropic subgroup of the CWS group and the set of word operators of a CWS-type quantum code to derive an upper bound on the minimum distance. Furthermore, this characterization can be incorporated into the associated distance enumerators, enabling us to construct semidefinite constraints that lead to SDP bounds on the minimum distance or size of CWS-type quantum codes. We illustrate several instances where SDP bounds outperform LP bounds, and there are even cases where LP fails to yield meaningful results, while SDP consistently provides tighter and relevant bounds. Finally, we also provide interpretations of the Shor-Laflamme weight enumerators and shadow enumerators for codeword stabilized codes, enhancing our understanding of quantum codes.	翻訳日:2024-08-05 18:43:16 公開日:2024-08-02
# フィードバックループを用いたインクリメンタルオブジェクトベースノベルティ検出 Incremental Object-Based Novelty Detection with Feedback Loop ( http://arxiv.org/abs/2311.09004v2 ) ライセンス: Link先を確認	Simone Caldarella, Elisa Ricci, Rahaf Aljundi,	(参考訳) オブジェクトベースノベルティ検出(ND)は、物体検出モデルがトレーニング中に見たクラスに属さない未知の物体を識別することを目的としている。このタスクは、例えば自動運転車や自律ロボットで採用されている物体検出モデルの場合のように、潜在的に有害な挙動を回避できるため、現実世界のアプリケーションでは特に重要である。従来のNDのアプローチは、事前訓練された物体検出出力に対する一度きりのオフライン後処理に焦点を当てており、トレーニング後にモデルのロバスト性を改善する余地がなく、デプロイ中に遭遇する大量の分布外データも破棄されてしまう。本研究では,予測出力に対して人間のフィードバックを要求でき,それを後から取り込んで主たる物体検出性能に悪影響を与えることなくNDモデルを改良できると仮定して,オブジェクトベースNDの新しいフレームワークを提案する。この改良操作は、新しいフィードバックが利用可能になるたびに繰り返される。この新たな問題定式化に取り組むため,事前訓練された物体検出モデル上に付加され,フィードバックループを通じて漸進的に更新される軽量NDモジュールを提案する。また,この新たな設定で手法を評価するための新しいベンチマークを提案し,我々のNDアプローチをベースラインに対して広範囲に検証して,ロバスト性の向上とフィードバックの取り込みに成功していることを示す。 Object-based Novelty Detection (ND) aims to identify unknown objects that do not belong to classes seen during training by an object detection model. The task is particularly crucial in real-world applications, as it allows one to avoid potentially harmful behaviours, e.g. as in the case of object detection models adopted in a self-driving car or in an autonomous robot. Traditional approaches to ND focus on one-time offline post-processing of the pretrained object detection output, leaving no possibility to improve the model robustness after training and discarding the abundant amount of out-of-distribution data encountered during deployment. In this work, we propose a novel framework for object-based ND, assuming that human feedback can be requested on the predicted output and later incorporated to refine the ND model without negatively affecting the main object detection performance. This refinement operation is repeated whenever new feedback is available. To tackle this new formulation of the problem for object detection, we propose a lightweight ND module attached on top of a pre-trained object detection model, which is incrementally updated through a feedback loop. 
We also propose a new benchmark to evaluate methods on this new setting and extensively test our ND approach against baselines, showing increased robustness and a successful incorporation of the received feedback.	翻訳日:2024-08-05 18:43:16 公開日:2024-08-02
# スパイクニューラルネットワークの視覚的位置認識への応用 Applications of Spiking Neural Networks in Visual Place Recognition ( http://arxiv.org/abs/2311.13186v2 ) ライセンス: Link先を確認	Somayeh Hussaini, Michael Milford, Tobias Fischer,	(参考訳) ロボット工学において、スパイキングニューラルネットワーク(SNN)は、特にニューロモルフィックハードウェアに実装された場合の、大部分がまだ実現されていない潜在的なエネルギー効率と低レイテンシによって、ますます注目されている。本稿では,視覚的位置認識(VPR)におけるSNNの3つの進歩について述べる。まず,各SNNが重複しない地理的に異なる場所の集合を表現し,大規模環境におけるスケーラブルなネットワークを実現するモジュラーSNNを提案する。次に,複数のネットワークが同じ場所を表現し,シングルネットワークモデルと比較して精度を著しく向上させるモジュラーSNNのアンサンブルを提案する。モジュラーSNNの各モジュールはコンパクトで、1500のニューロンと474kのシナプスのみで構成されており、その小ささからアンサンブル化に理想的に適している。最後に,連続する画像を用いて場所認識を洗練する手法であるシーケンスマッチングが,SNNに基づくVPRにおいて果たす役割について検討する。我々は,他のVPR手法と比較して,SNNのアンサンブルとシーケンスマッチングに対する応答性を解析した。我々の貢献は、VPRのためのSNNの実用性を示し、スケーラブルで堅牢なソリューションを提供し、さまざまなエネルギーに敏感なロボットタスクへの応用への道を開くものである。 In robotics, Spiking Neural Networks (SNNs) are increasingly recognized for their largely-unrealized potential energy efficiency and low latency, particularly when implemented on neuromorphic hardware. Our paper highlights three advancements for SNNs in Visual Place Recognition (VPR). Firstly, we propose Modular SNNs, where each SNN represents a set of non-overlapping geographically distinct places, enabling scalable networks for large environments. Secondly, we present Ensembles of Modular SNNs, where multiple networks represent the same place, significantly enhancing accuracy compared to single-network models. Each of our Modular SNN modules is compact, comprising only 1500 neurons and 474k synapses, making them ideally suited for ensembling due to their small size. Lastly, we investigate the role of sequence matching in SNN-based VPR, a technique where consecutive images are used to refine place recognition. We analyze the responsiveness of SNNs to ensembling and sequence matching compared to other VPR techniques. Our contributions highlight the viability of SNNs for VPR, offering scalable and robust solutions, and paving the way for their application in various energy-sensitive robotic tasks.	翻訳日:2024-08-05 18:43:16 公開日:2024-08-02
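Sequence matching uses consecutive images to refine place recognition. The sketch below is a minimal SeqSLAM-style version. It assumes a query-by-reference distance matrix `D` (for example, distances between SNN output responses) and a constant traversal speed, so that matching sequences follow diagonals of `D`. It is a generic illustration, not the paper's implementation, and the function name is hypothetical.

```python
import numpy as np

def sequence_match(D, L):
    """D[q, r]: distance between query image q and reference place r.
    For every query q >= L-1, return the reference r whose length-L diagonal
    sequence ending at (q, r) has the smallest summed distance."""
    nq, nr = D.shape
    matches = {}
    for q in range(L - 1, nq):
        best_r, best_score = None, np.inf
        for r in range(L - 1, nr):
            score = sum(D[q - k, r - k] for k in range(L))
            if score < best_score:
                best_r, best_score = r, score
        matches[q] = best_r
    return matches

# Query 3 is corrupted: on its own it looks closest to reference 0 ...
D = 1.0 - np.eye(6)
D[3, 3], D[3, 0] = 1.0, 0.0
print(int(np.argmin(D[3])))      # -> 0 (single-image match is wrong)
# ... but the length-3 sequence still matches it to reference 3.
print(sequence_match(D, L=3))    # -> {2: 2, 3: 3, 4: 4, 5: 5}
```

Summing along the diagonal lets neighbouring frames outvote a single ambiguous image. The trade-off is that the first `L-1` queries receive no match.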
# 有限熱力学資源を用いた量子符号化 Quantum Coding with Finite Thermodynamic Resources ( http://arxiv.org/abs/2311.14561v2 ) ライセンス: Link先を確認	Jake Xuereb, Tiago Debarba, Marcus Huber, Paul Erker,	(参考訳) 量子直接符号化あるいはシューマッハ圧縮はシャノン理論の考えを一般化し、フォン・ノイマンのエントロピーに操作的意味を与え、qubitという用語を確立した。しかし、情報処理が物理的プロセスによって実行されることを思い出すと、量子情報の圧縮にどのような熱力学的資源が必要であり、それらがこのタスクを実行する能力をどのように制約するのかという疑問が生じる。つまり、アリスとボブが熱的量子状態と有限精度の時計にしかアクセスできないとしたら、純粋量子状態のメッセージをどの程度うまく測定、符号化、復号できるだろうか? 本研究では、アリスの典型的な測定を測定プローブを含むユニタリとしてモデル化し、符号化と復号における不完全な時間計測を調べ、ボブが付加する量子ビットにおける温度の役割を考察することで、これらの問いを検討する。そうすることで、アリスが測定プローブとの間に形成できる相関、時計の刻みのばらつき、ボブの量子ビットの温度を含むこのプロトコルの忠実度の境界を導出する。最後に、これら2つのエージェントが使用する資源を量子熱力学的冷却プロトコルに関連付けることで、圧縮プロトコル全体を通じて彼らが生成するエントロピーについての洞察を与える。 Quantum direct coding or Schumacher compression generalised the ideas of Shannon theory, gave an operational meaning to the von Neumann entropy and established the term qubit. But remembering that information processing is carried out by physical processes prompts one to wonder what thermodynamic resources are required to compress quantum information and how they constrain one's ability to perform this task. That is, if Alice and Bob only have access to thermal quantum states and clocks with finite accuracy, how well can they measure, encode and decode pure quantum state messages? In this work we examine these questions by modelling Alice's typical measurement as a unitary involving a measurement probe, investigating imperfect timekeeping on encoding and decoding and considering the role of temperature in Bob's appended qubits. In doing so, we derive fidelity bounds for this protocol involving the correlations Alice can form with their measurement probe, the variance of the clock's ticks and the temperature of Bob's qubits. Finally, we give an insight into the entropy produced by these two agents throughout the compression protocol by relating the resources they use to a quantum thermodynamic cooling protocol.	翻訳日:2024-08-05 18:43:16 公開日:2024-08-02
# デジタル病理基盤モデルにおける下流ネットワークの重要性 The Importance of Downstream Networks in Digital Pathology Foundation Models ( http://arxiv.org/abs/2311.17804v3 ) ライセンス: Link先を確認	Gustav Bredell, Marcel Fischer, Przemyslaw Szostak, Samaneh Abbasi-Sureshjani, Alvaro Gomariz,	(参考訳) デジタル病理学は、ギガピクセルの全スライド画像(WSI)の解析を通じて、疾患の検出と病理医の効率を大幅に向上させた。このプロセスでは、まずWSIをパッチに分割し、特徴抽出モデルを適用して特徴ベクトルを取得し、その後集約モデルで処理して各WSIラベルを予測する。表現学習の急速な進化に伴い、多くの新しい特徴抽出モデル(しばしば基礎モデルと呼ばれる)が出現した。従来の評価方法は、固定されたアーキテクチャとハイパーパラメータを含む静的な下流集約モデルの設定に依存しているが、我々はこれを結果にバイアスを与えうる慣行であると指摘する。本研究は、特徴抽出モデルが集約モデルの構成に対して敏感であることを明らかにし、選択した構成によって性能の比較可能性が歪められうることを示す。この感度を考慮すると、現在の多くの特徴抽出モデルの性能が顕著に類似していることが分かる。 162通りの集約モデル構成を用いて、3つのデータセットにまたがる7つの特徴抽出モデルを評価することで、この洞察を裏付ける。この包括的なアプローチは、様々な集約モデル構成に対する特徴抽出器の感度をより詳細に理解させ、デジタル病理学における新しい基礎モデルのより公平かつ正確な評価につながる。 Digital pathology has significantly advanced disease detection and pathologist efficiency through the analysis of gigapixel whole-slide images (WSI). In this process, WSIs are first divided into patches, for which a feature extractor model is applied to obtain feature vectors, which are subsequently processed by an aggregation model to predict the respective WSI label. With the rapid evolution of representation learning, numerous new feature extractor models, often termed foundational models, have emerged. Traditional evaluation methods rely on a static downstream aggregation model setup, encompassing a fixed architecture and hyperparameters, a practice we identify as potentially biasing the results. Our study uncovers a sensitivity of feature extractor models towards aggregation model configurations, indicating that performance comparability can be skewed based on the chosen configurations. By accounting for this sensitivity, we find that the performance of many current feature extractor models is notably similar. We support this insight by evaluating seven feature extractor models across three different datasets with 162 different aggregation model configurations. 
This comprehensive approach provides a more nuanced understanding of the feature extractors' sensitivity to various aggregation model configurations, leading to a fairer and more accurate assessment of new foundation models in digital pathology.	翻訳日:2024-08-05 18:43:16 公開日:2024-08-02
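The pipeline described above runs from patches, through a frozen feature extractor, to an aggregation model. The sketch below uses one common aggregation model, attention-based multiple-instance pooling. This choice is an assumption for illustration: the abstract does not say which architectures its 162 configurations use, and the parameter names `V` and `w` and the hidden size are hypothetical. They are, however, exactly the kind of downstream choices the study varies.

```python
import numpy as np

def attention_mil_pool(H, V, w):
    """H: (num_patches, d) patch features from a frozen feature extractor.
    V: (h, d) and w: (h,) are trainable parameters of the aggregation model.
    Returns the slide-level embedding (d,) and per-patch attention weights."""
    scores = np.tanh(H @ V.T) @ w            # one relevance score per patch
    scores = scores - scores.max()           # stabilise the softmax
    a = np.exp(scores) / np.exp(scores).sum()
    return a @ H, a

rng = np.random.default_rng(0)
H = rng.normal(size=(50, 16))                # 50 patches, 16-dim features
V = rng.normal(size=(8, 16))                 # hidden size 8 (a hyperparameter)
w = rng.normal(size=8)
z, a = attention_mil_pool(H, V, w)
# The slide embedding does not depend on the order of the patches.
perm = rng.permutation(50)
z_perm, _ = attention_mil_pool(H[perm], V, w)
print(z.shape, np.allclose(z, z_perm))
```

In a full pipeline, a classifier on `z` would produce the WSI label. Swapping this pooling for another aggregator, or changing `h`, changes the downstream model without touching the feature extractor.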
# 大規模言語モデルのトレーニングのためのデータ管理:サーベイ Data Management For Training Large Language Models: A Survey ( http://arxiv.org/abs/2312.01700v3 ) ライセンス: Link先を確認	Zige Wang, Wanjun Zhong, Yufei Wang, Qi Zhu, Fei Mi, Baojun Wang, Lifeng Shang, Xin Jiang, Qun Liu,	(参考訳) データは、大規模言語モデル(LLM)のトレーニングにおいて基本的な役割を果たす。効率的なデータ管理、特に適切なトレーニングデータセットの構成は、事前トレーニングおよび教師付き微調整段階におけるモデル性能の向上とトレーニング効率の向上に重要である。データ管理の重要性は大きいが、現在の主要なプラクティスの基盤となるメカニズムはまだ解明されていない。その結果、データ管理の探究が研究コミュニティの間でますます注目を集めている。本調査は、データ管理戦略設計の様々な側面を網羅し、LLMの事前訓練および教師付き微調整段階におけるデータ管理に関する現在の研究の包括的な概要を提供することを目的としている。今後の展望として、既存の課題を整理し、この分野の発展に向けた有望な方向性を概説する。したがって、この調査は、効率的なデータ管理の実践を通じて強力なLLMを構築したいと考える実践者の指針となる。最新の論文のコレクションは https://github.com/ZigeW/data_management_LLM で公開されている。 Data plays a fundamental role in training Large Language Models (LLMs). Efficient data management, particularly in formulating a well-suited training dataset, is significant for enhancing model performance and improving training efficiency during pretraining and supervised fine-tuning stages. Despite the considerable importance of data management, the underlying mechanisms of current prominent practices are still unknown. Consequently, the exploration of data management has attracted more and more attention among the research community. This survey aims to provide a comprehensive overview of current research in data management within both the pretraining and supervised fine-tuning stages of LLMs, covering various aspects of data management strategy design. Looking into the future, we extrapolate existing challenges and outline promising directions for development in this field. Therefore, this survey serves as a guiding resource for practitioners aspiring to construct powerful LLMs through efficient data management practices. The collection of the latest papers is available at https://github.com/ZigeW/data_management_LLM.	翻訳日:2024-08-05 18:43:16 公開日:2024-08-02
# 3次元インスタンス分割のためのSAM誘導グラフカット SAM-guided Graph Cut for 3D Instance Segmentation ( http://arxiv.org/abs/2312.08372v3 ) ライセンス: Link先を確認	Haoyu Guo, He Zhu, Sida Peng, Yuang Wang, Yujun Shen, Ruizhen Hu, Xiaowei Zhou,	(参考訳) 本稿では,3次元幾何情報と多視点画像情報の同時利用による3次元インスタンス分割の課題に対処する。これまでの多くの研究は、3Dポイントクラウドにディープラーニング技術を適用してインスタンスセグメンテーションを行っている。しかし,これらの手法は,ラベル付き3Dポイントクラウドデータの不足と多様性の低さのため,様々な種類のシーンに一般化できないことが多かった。最近のいくつかの研究では、ボトムアップフレームワーク内で2Dインスタンスセグメンテーションを3Dに持ち上げようとしている。しかし、ビュー間の2Dインスタンスセグメンテーションの不整合は、3Dセグメンテーションの性能を著しく低下させうる。本研究では,3次元インスタンスセグメンテーションのために2次元セグメンテーションモデルを効果的に活用する新しい3D-to-2Dクエリフレームワークを提案する。具体的には、シーンを3次元のいくつかのスーパーポイントに事前分割し、タスクをグラフカット問題として定式化する。スーパーポイントグラフは2次元セグメンテーションモデルに基づいて構築され、ノード特徴は多視点画像特徴から得られ、エッジ重みは多視点セグメンテーション結果に基づいて計算されるため、より優れた汎化能力が得られる。グラフを処理するために、2Dセグメンテーションモデルから得た擬似3Dラベルを用いてグラフニューラルネットワークを訓練する。 ScanNet, ScanNet++, KITTI-360データセットによる実験結果から, 本手法がロバストなセグメンテーション性能を実現し, 異なるタイプのシーンにまたがって一般化可能であることが示された。私たちのプロジェクトページは https://zju3dv.github.io/sam_graph で公開されている。 This paper addresses the challenge of 3D instance segmentation by simultaneously leveraging 3D geometric and multi-view image information. Many previous works have applied deep learning techniques to 3D point clouds for instance segmentation. However, these methods often failed to generalize to various types of scenes due to the scarcity and low-diversity of labeled 3D point cloud data. Some recent works have attempted to lift 2D instance segmentations to 3D within a bottom-up framework. The inconsistency in 2D instance segmentations among views can substantially degrade the performance of 3D segmentation. In this work, we introduce a novel 3D-to-2D query framework to effectively exploit 2D segmentation models for 3D instance segmentation. Specifically, we pre-segment the scene into several superpoints in 3D, formulating the task into a graph cut problem. 
The superpoint graph is constructed based on 2D segmentation models, where node features are obtained from multi-view image features and edge weights are computed based on multi-view segmentation results, enabling better generalization. To process the graph, we train a graph neural network using pseudo 3D labels from 2D segmentation models. Experimental results on the ScanNet, ScanNet++ and KITTI-360 datasets demonstrate that our method achieves robust segmentation performance and can generalize across different types of scenes. Our project page is available at https://zju3dv.github.io/sam_graph.	翻訳日:2024-08-05 18:43:16 公開日:2024-08-02
# ゼロ階プロセス忠実度を状態準備と測定誤差から独立にすること Making the zeroth-order process fidelity independent of state preparation and measurement errors ( http://arxiv.org/abs/2312.08590v2 ) ライセンス: Link先を確認	Yu-Hao Chen, Renata Wong, Hsi-Sheng Goan,	(参考訳) 本研究では, プロセス忠実度に対する近似であるゼロ忠実度が, ランダム化ベンチマークと組み合わせると, 状態準備と測定(SPAM)誤差に対して頑健になることを示す。しかし、ランダム化されたベンチマークでは、量子ビット数が増加するとクリフォード群からより多くのクリフォード要素をランダムに選択する必要があるため、この組み合わせは最大3つの量子ビットを持つ量子系に制限される。 SPAMエラーとは無関係に、同時にマルチキュービットシステムにも適用できるようにするため、大域的ユニタリ折り畳み法や、量子誤差軽減に使用されるアイデンティティスケーリングと同様のチャネルノイズスケーリング手法を用いる。 In this work, we demonstrate that the zero-fidelity, an approximation to the process fidelity, when combined with randomized benchmarking, becomes robust to state preparation and measurement (SPAM) errors. However, as randomized benchmarking requires randomly choosing an increasingly large number of Clifford elements from the Clifford group when the qubit number increases, this combination is also limited to quantum systems with up to three qubits. To make the zero-fidelity independent of SPAM errors and, at the same time, applicable to multi-qubit systems, we employ a channel noise scaling method similar to the method of global unitary folding, or identity scaling, used for quantum error mitigation.	翻訳日:2024-08-05 18:43:16 公開日:2024-08-02
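The standard RB decay model $P(m) = A p^m + B$ shows why randomized benchmarking removes SPAM errors: state preparation and measurement errors enter only through the constants $A$ and $B$. The sketch below assumes survival probabilities measured at consecutive sequence lengths and a depolarizing-noise interpretation of $p$. It illustrates the generic RB argument, not the paper's zero-fidelity estimator, and the function names are hypothetical.

```python
import numpy as np

def rb_decay_parameter(survival):
    """Estimate p from survival[m] = A * p**m + B at consecutive lengths m.
    First differences cancel B (SPAM offset); the ratio of consecutive
    differences cancels A (SPAM scale), leaving p."""
    d = np.diff(np.asarray(survival, dtype=float))
    # least-squares fit of d[m + 1] ≈ p * d[m]
    return float(np.dot(d[1:], d[:-1]) / np.dot(d[:-1], d[:-1]))

def process_fidelity(p, dim):
    """Process fidelity of a depolarizing channel with parameter p."""
    return p + (1.0 - p) / dim**2

m = np.arange(1, 101)
for A, B in [(0.45, 0.50), (0.30, 0.60)]:    # two different SPAM settings
    p_hat = rb_decay_parameter(A * 0.98**m + B)
    print(round(p_hat, 6), round(process_fidelity(p_hat, 2), 6))
```

Both SPAM settings yield the same estimate of $p$, which is the property the abstract relies on. With shot noise, a nonlinear least-squares fit of all three parameters is the more usual choice.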
# 衝突と部分グラフの検出のインスタンス最適性について On the instance optimality of detecting collisions and subgraphs ( http://arxiv.org/abs/2312.10196v2 ) ライセンス: Link先を確認	Omri Ben-Eliezer, Tomer Grossman, Moni Naor,	(参考訳) 関数 $f\colon [n] \to [n]$ が、関数への(ブラックボックス)クエリアクセスを通じて与えられているとする。衝突(すなわち $f(x)=f(y)$ を満たすペア $x \neq y$)のような局所的な構造を探したい。問題は、関数の「形」を知ることが役に立つかどうかである(ここで形とは、関数のある置換が既知であることを意味する)。形式的には、グラフと関数における部分構造検出問題のラベルなしインスタンス最適性を調べる。問題が$g(n)$-インスタンス最適であるとは、任意の入力に対して、$A$の(ランダム化)クエリ複雑性が、入力のラベルなしコピーを保持しつつ同じ問題を解く任意のアルゴリズム$A'$(すなわち「入力の構造を知っている」任意の$A'$)のクエリ複雑性の高々$g(n)$倍であるようなアルゴリズム$A$が存在することをいう。我々の結果は、グラフと関数における部分構造検出問題の間で、ラベルなしインスタンス最適性に三分法があることを示している: 1. ごく単純ないくつかの性質は$O(1)$-インスタンス最適なアルゴリズムを持つ。 2. 関数における不動点や$3$-衝突の存在、グラフにおける三角形の存在など、グラフと関数のほとんどの性質は、インスタンス最適性から$n^{\Omega(1)}$だけ離れている。 3. 関数における衝突検出とグラフにおける爪(claw)の発見の問題は、両者の中間に位置する。この2つの性質はインスタンス最適性から$\Omega(\log n)$だけ離れていることを示し、この境界がタイトであると予想する。この予想を支持する証拠として、ラベルなし証明書を持つアルゴリズムのクエリ複雑性が$O\left(\sqrt{\frac{n}{\log n}}\right)$であるすべての入力グラフの中で、グラフ内の爪の発見が$O(\log(n))$-インスタンス最適であることを証明する。 Suppose you are given a function $f\colon [n] \to [n]$ via (black-box) query access to the function. You are looking to find something local, like a collision (a pair $x \neq y$ s.t. $f(x)=f(y)$). The question is whether knowing the "shape" of the function helps you or not (by shape we mean that some permutation of the function is known). Formally, we investigate the unlabeled instance optimality of substructure detection problems in graphs and functions. A problem is $g(n)$-instance optimal if it admits an algorithm $A$ satisfying that for any possible input, the (randomized) query complexity of $A$ is at most $g(n)$ times larger than the query complexity of any algorithm $A'$ which solves the same problem while holding an unlabeled copy of the input (i.e., any $A'$ that "knows the structure of the input"). Our results point to a trichotomy of unlabeled instance optimality among substructure detection problems in graphs and functions: 1. 
A few very simple properties have an $O(1)$-instance optimal algorithm. 2. Most properties of graphs and functions, with examples such as containing a fixed point or a $3$-collision in functions, or a triangle in graphs, are $n^{\Omega(1)}$-far from instance optimality. 3. The problems of collision detection in functions and finding a claw in a graph serve as a middle ground between the two regimes. We show that these two properties are $\Omega(\log n)$-far from instance optimality, and conjecture that this bound is tight. We provide evidence towards this conjecture, by proving that finding a claw in a graph is $O(\log(n))$-instance optimal among all input graphs for which the query complexity of an algorithm holding an unlabeled certificate is $O\left(\sqrt{\frac{n}{\log n}}\right)$.	翻訳日:2024-08-05 18:33:20 公開日:2024-08-02
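For intuition on the query model, a natural label-oblivious baseline for collision detection is the birthday-style algorithm below: query random points and stop at the first repeated value. It is a generic sketch, not an algorithm from the paper. It indexes $[n]$ as $\{0, \dots, n-1\}$, and the function name and query budget are assumptions. For a 2-to-1 function it typically needs on the order of $\sqrt{n}$ queries. The question studied above is how much holding an unlabeled copy of $f$ could improve on such strategies.

```python
import random

def find_collision(f, n, max_queries, seed=0):
    """Query f on random points of {0, ..., n-1}; return (x, y) with
    x != y and f(x) == f(y), or None if the query budget runs out."""
    rng = random.Random(seed)
    seen = {}                          # value f(x) -> first x that produced it
    for _ in range(max_queries):
        x = rng.randrange(n)
        y = f(x)                       # one black-box query
        if y in seen and seen[y] != x:
            return seen[y], x
        seen.setdefault(y, x)
    return None

two_to_one = lambda x: x // 2          # every value has exactly two preimages
print(find_collision(two_to_one, 1000, max_queries=1000))
print(find_collision(lambda x: x, 1000, max_queries=1000))   # injective: None
```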
# 6Gサブネットワークにおけるサブバンド配置のための教師なしグラフベース学習法 Unsupervised Graph-based Learning Method for Sub-band Allocation in 6G Subnetworks ( http://arxiv.org/abs/2401.00950v2 ) ライセンス: Link先を確認	Daniel Abode, Ramoni Adeogun, Lou Salaün, Renato Abreu, Thomas Jacobsen, Gilberto Berardinelli,	(参考訳) 本稿では,グラフ学習を用いた無線ネットワークにおける周波数サブバンド割り当ての教師なし手法を提案する。我々は、サブネットワーク間の干渉を調整するために最適に割り当てなければならないサブバンドの数が限られている工場環境におけるサブネットワークの密集配置について検討する。サブネットワーク配置をコンフリクトグラフとしてモデル化し,グラフカラーヒューリスティックとポッツモデルにインスパイアされた教師なし学習アプローチを提案し,グラフニューラルネットワークを用いたサブバンド割り当てを最適化する。数値評価により,提案手法は,計算時間の複雑度が低い集中グリーディーカラー化サブバンド割り当てヒューリスティックに密接な性能を実現することを示す。さらに、全ての相互干渉チャネル情報を必要とする反復的最適化ヒューリスティックと比べて、信号のオーバーヘッドを低減させる。さらに,本手法は異なるネットワーク設定に対して堅牢であることを示す。 In this paper, we present an unsupervised approach for frequency sub-band allocation in wireless networks using graph-based learning. We consider a dense deployment of subnetworks in the factory environment with a limited number of sub-bands which must be optimally allocated to coordinate inter-subnetwork interference. We model the subnetwork deployment as a conflict graph and propose an unsupervised learning approach inspired by the graph colouring heuristic and the Potts model to optimize the sub-band allocation using graph neural networks. The numerical evaluation shows that the proposed method achieves close performance to the centralized greedy colouring sub-band allocation heuristic with lower computational time complexity. In addition, it incurs reduced signalling overhead compared to iterative optimization heuristics that require all the mutual interfering channel information. We further demonstrate that the method is robust to different network settings.	翻訳日:2024-08-05 18:33:20 公開日:2024-08-02
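The abstract compares against a centralized greedy colouring heuristic on the conflict graph. The sketch below shows one plausible form of such a baseline. It assumes the conflict graph is given as an adjacency dictionary and that, when every sub-band is already used by some neighbour, the node takes the sub-band shared with the fewest neighbours. The paper's exact heuristic and its GNN-based method are not reproduced here, and all names are hypothetical.

```python
def greedy_subband_allocation(adj, num_bands):
    """adj: dict mapping each subnetwork to the set of subnetworks it conflicts with.
    Returns a dict mapping each subnetwork to a sub-band in range(num_bands)."""
    band = {}
    # allocate to high-degree (most interfered) subnetworks first
    for v in sorted(adj, key=lambda u: len(adj[u]), reverse=True):
        counts = [0] * num_bands
        for u in adj[v]:
            if u in band:
                counts[band[u]] += 1
        # lowest-index sub-band shared with the fewest already-allocated neighbours
        band[v] = min(range(num_bands), key=lambda b: counts[b])
    return band

def conflicts(adj, band):
    """Number of conflict-graph edges whose endpoints share a sub-band."""
    return sum(band[u] == band[v] for u in adj for v in adj[u]) // 2

cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(conflicts(cycle5, greedy_subband_allocation(cycle5, 3)))      # 3 bands suffice
print(conflicts(triangle, greedy_subband_allocation(triangle, 2)))  # 2 bands cannot
```

In a real deployment the edges would come from thresholding the mutual interference between subnetworks. Collecting that information centrally is the signalling overhead the learned method aims to reduce.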
# レート制限チャネルを用いたモデルフリーLQR制御に向けて Towards Model-Free LQR Control over Rate-Limited Channels ( http://arxiv.org/abs/2401.01258v2 ) ライセンス: Link先を確認	Aritra Mitra, Lintao Ye, Vijay Gupta,	(参考訳) 多くの問題設定におけるモデルフリーな制御設計手法の成功を考えると、現実的な通信チャネルを勾配や方策の伝送に利用した場合に何が変わるのかを問うことは自然である。結果として生じる問題は、ネットワーク制御システムの枠組みで研究されてきた定式化と類似しているが、その分野の豊富な文献は一般にシステムのモデルが既知であると仮定している。モデルフリー制御設計とネットワーク制御システムの分野を橋渡しするステップとして、我々は次のように問う: 線形2次レギュレータ(LQR)問題のような基本的な制御問題を、レート制限チャネル上でモデルフリーに解くことは可能か? この問いに答えるために、ワーカーエージェントが(LQRコストの)量子化された方策勾配を、有限ビットレートのノイズのないチャネルを介してサーバに送信する設定を検討する。我々はAdaptively Quantized Gradient Descent (AQGD)と題する新しいアルゴリズムを提案し、ある有限のしきい値ビットレートを超えると、AQGDが大域的最適方策への指数関数的に高速な収束を保証し、しかもその指数が量子化を行わない設定と比べて劣化しないことを証明する。より一般に、我々の手法は高速な線形収束率を保つうえでの適応量子化の利点を明らかにしており、圧縮最適化に関する文献にとっても独立した関心の対象となりうる。 Given the success of model-free methods for control design in many problem settings, it is natural to ask how things will change if realistic communication channels are utilized for the transmission of gradients or policies. While the resulting problem has analogies with the formulations studied under the rubric of networked control systems, the rich literature in that area has typically assumed that the model of the system is known. As a step towards bridging the fields of model-free control design and networked control systems, we ask: Is it possible to solve basic control problems - such as the linear quadratic regulator (LQR) problem - in a model-free manner over a rate-limited channel? Toward answering this question, we study a setting where a worker agent transmits quantized policy gradients (of the LQR cost) to a server over a noiseless channel with a finite bit-rate. We propose a new algorithm titled Adaptively Quantized Gradient Descent (AQGD), and prove that above a certain finite threshold bit-rate, AQGD guarantees exponentially fast convergence to the globally optimal policy, with no deterioration of the exponent relative to the unquantized setting. 
More generally, our approach reveals the benefits of adaptive quantization in preserving fast linear convergence rates, and, as such, may be of independent interest to the literature on compressed optimization.	翻訳日:2024-08-05 18:33:20 公開日:2024-08-02
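The benefit of adaptive quantization can be illustrated on a much simpler problem than LQR. The sketch below is a toy that rests on its own assumptions and is not the paper's AQGD. It runs gradient descent on a diagonal quadratic $f(x) = \tfrac12 x^\top Q x$, sending each gradient with `bits` bits per coordinate on a range $[-R_t, R_t]$ that both sides shrink by a factor `rho` every step. Suppose the diagonal of $Q$ lies in $[\mu, L]$, the step size is $1/L$, and $\rho \ge 1 - \mu/L + 1/(2^{b}-1)$. Then induction shows that the gradient never leaves the range, so the iterates still converge linearly at rate $\rho$. Function names and defaults are hypothetical.

```python
import numpy as np

def quantize(g, radius, bits):
    """Uniform quantizer with 2**bits levels on [-radius, radius]."""
    intervals = 2 ** bits - 1
    step = 2.0 * radius / intervals
    clipped = np.clip(g, -radius, radius)
    idx = np.round((clipped + radius) / step)   # integer in [0, intervals]: what gets sent
    return -radius + idx * step

def adaptive_quantized_gd(Q, x0, steps, bits=4, rho=0.85):
    """Toy gradient descent on f(x) = 0.5 * x^T Q x with a shrinking quantizer range.
    Assumes Q is diagonal with entries in [mu, L] and
    rho >= 1 - mu/L + 1/(2**bits - 1), so the gradient stays inside the range."""
    L = float(np.max(np.diag(Q)))
    eta = 1.0 / L
    x = np.array(x0, dtype=float)
    radius = float(np.max(np.abs(Q @ x)))       # initial range covers the first gradient
    for _ in range(steps):
        g = Q @ x                               # worker: exact gradient
        q = quantize(g, radius, bits)           # only `bits` bits per coordinate are sent
        x = x - eta * q                         # server: step with the quantized gradient
        radius *= rho                           # both sides shrink the range in lockstep
    return x

Q = np.diag([1.0, 2.0, 4.0])                    # mu = 1, L = 4: 1 - 1/4 + 1/15 < 0.85
x = adaptive_quantized_gd(Q, [1.0, -1.0, 1.0], steps=60)
print(np.max(np.abs(x)))
```

A fixed range would leave a constant quantization error floor. Shrinking the range with the gradient makes the error decay at the same geometric rate as the iterates.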
# OCLと検索によるMC/DCの効率的なテストデータ生成 Efficient Test Data Generation for MC/DC with OCL and Search ( http://arxiv.org/abs/2401.03469v3 ) ライセンス: Link先を確認	Hassan Sartaj, Muhammad Zohaib Iqbal, Atif Aftab Ahmed Jilani, Muhammad Uzair Khan,	(参考訳) アビオニクスのソフトウェアシステムのシステムレベルのテストは、DO-178Cのような異なる国際安全基準に準拠する必要がある。アビオニクス産業の重要な考慮事項は、安全基準によって提案される基準に従って自動テストデータ生成である。 DO-178Cの推奨基準の1つは、修正条件/決定カバレッジ(MC/DC)基準である。現在のモデルベースのテストデータ生成アプローチでは、Object Constraint Language(OCL)で記述された制約を使用し、テストデータを生成するために検索技術を適用します。これらのアプローチはMC/DC基準をサポートしないか、大規模アビオニクスシステムのテストデータを生成する際にパフォーマンス上の問題に悩まされる。本稿では,モデルベーステストにおいてMC/DCテストデータの自動生成を効果的に行う方法を提案する。ケースベース推論 (CBR) と範囲縮小ヒューリスティックスを用いて, MC/DC に適合した OCL 制約を解く手法を開発した。我々は,CBRを用いたMC/DCテストデータ生成のための提案手法と,CBRと範囲縮小の双方を,元の探索アルゴリズムとランダム検索と比較する実験的検討を行った。また、我々の戦略を既存の制約解決アプローチと経験的に比較した。その結果, MC/DCテストデータ生成におけるCBRと範囲の低減は, ベースライン法よりも優れていた。さらに, MC/DCテストデータ生成におけるCBRと範囲削減の組み合わせは, 既存の制約解法と比較して有効である。 System-level testing of avionics software systems requires compliance with different international safety standards such as DO-178C. An important consideration of the avionics industry is automated test data generation according to the criteria suggested by safety standards. One of the recommended criteria by DO-178C is the modified condition/decision coverage (MC/DC) criterion. The current model-based test data generation approaches use constraints written in Object Constraint Language (OCL), and apply search techniques to generate test data. These approaches either do not support MC/DC criterion or suffer from performance issues while generating test data for large-scale avionics systems. In this paper, we propose an effective way to automate MC/DC test data generation during model-based testing. We develop a strategy that utilizes case-based reasoning (CBR) and range reduction heuristics designed to solve MC/DC-tailored OCL constraints. 
We performed an empirical study comparing three variants of our proposed MC/DC test data generation strategy (CBR, range reduction, and both CBR and range reduction combined) with an original search algorithm and random search. We also empirically compared our strategy with existing constraint-solving approaches. The results show that both CBR and range reduction for MC/DC test data generation outperform the baseline approach. Moreover, the combination of both CBR and range reduction for MC/DC test data generation is an effective approach compared to existing constraint solvers.	翻訳日:2024-08-05 18:33:20 公開日:2024-08-02
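MC/DC requires showing, for every condition in a decision, a pair of test cases that differ only in that condition and produce different decision outcomes. The brute-force sketch below works over the truth table and is feasible only for a handful of conditions, which is why approaches like the paper's solve MC/DC-tailored OCL constraints with search instead. The function name and the example decision are hypothetical.

```python
from itertools import product

def mcdc_pairs(decision, n):
    """Return {i: [(v, w), ...]} where test vectors v and w differ only in
    condition i and decision(*v) != decision(*w) (unique-cause MC/DC pairs)."""
    pairs = {i: [] for i in range(n)}
    for v in product([False, True], repeat=n):
        for i in range(n):
            if v[i]:
                continue               # enumerate each pair once, from its False side
            w = v[:i] + (True,) + v[i + 1:]
            if decision(*v) != decision(*w):
                pairs[i].append((v, w))
    return pairs

# Hypothetical decision with conditions a, b, c.
decision = lambda a, b, c: a and (b or c)
for cond, found in mcdc_pairs(decision, 3).items():
    print(cond, found)
```

Picking one pair per condition, the four vectors (T,F,F), (T,T,F), (T,F,T) and (F,T,F) already satisfy MC/DC for this decision. That is n + 1 tests instead of the 2^n rows of the full truth table.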
# MERA:ロシア語における総合的なLLM評価 MERA: A Comprehensive LLM Evaluation in Russian ( http://arxiv.org/abs/2401.04531v3 ) ライセンス: Link先を確認	Alena Fenogenova, Artem Chervyakov, Nikita Martynov, Anastasia Kozlova, Maria Tikhonova, Albina Akhmetgareeva, Anton Emelyanov, Denis Shevelev, Pavel Lebedev, Leonid Sinev, Ulyana Isaeva, Katerina Kolomeytseva, Daniil Moskovskiy, Elizaveta Goncharova, Nikita Savushkin, Polina Mikhailova, Denis Dimitrov, Alexander Panchenko, Sergei Markov,	(参考訳) 過去数年間、AI研究の最も顕著な進歩の1つは基礎モデル(FM)であり、その筆頭が言語モデル(LM)の台頭である。モデルのサイズが大きくなるにつれて、LMは測定可能な側面での向上と新しい定性的特徴の発現を示す。しかし、研究者の注目とLM応用の急速な成長にもかかわらず、その能力、限界、関連するリスクはまだよりよく理解される必要がある。これらの課題に対処するために,ロシア語を指向した基礎モデルを評価するための新しいインストラクション・ベンチマークである,ロシア語アーキテクチャのオープンなマルチモーダル評価(Multimodal Evaluation of Russian-language Architectures, MERA)を導入する。このベンチマークは、11のスキルドメインで生成モデルを評価する21のタスクを含み、データ漏洩の排除を保証するブラックボックステストとして設計されている。本稿では,FMとLMを,他のモダリティに拡張可能なゼロショットおよび少数ショットの固定命令設定で評価する手法を提案する。本稿では,評価手法,MERA評価のためのオープンソースコードベース,投稿システムを備えたリーダーボードを提案する。オープンなLMをベースラインとして評価し,それらが依然として人間のレベルをはるかに下回っていることを確認した。我々はMERAを公開し、今後の研究をガイドし、画期的なモデルの特徴を予測し、評価手順を標準化し、潜在的な社会的欠点に対処する。 Over the past few years, one of the most notable advancements in AI research has been in foundation models (FMs), headlined by the rise of language models (LMs). As the models' size increases, LMs demonstrate enhancements in measurable aspects and the development of new qualitative features. However, despite researchers' attention and the rapid growth in LM application, the capabilities, limitations, and associated risks still need to be better understood. To address these issues, we introduce an open Multimodal Evaluation of Russian-language Architectures (MERA), a new instruction benchmark for evaluating foundation models oriented towards the Russian language. The benchmark encompasses 21 evaluation tasks for generative models in 11 skill domains and is designed as a black-box test to ensure the exclusion of data leakage. 
The paper introduces a methodology to evaluate FMs and LMs in zero- and few-shot fixed instruction settings that can be extended to other modalities. We propose an evaluation methodology, an open-source code base for the MERA assessment, and a leaderboard with a submission system. We evaluate open LMs as baselines and find that they are still far behind the human level. We publicly release MERA to guide forthcoming research, anticipate groundbreaking model features, standardize the evaluation procedure, and address potential societal drawbacks.	翻訳日:2024-08-05 18:33:20 公開日:2024-08-02
# エッジコンピューティングを実現するブロックチェーンを用いたIoTのセキュアなターゲットメッセージ拡散 Secure Targeted Message Dissemination in IoT Using Blockchain Enabled Edge Computing ( http://arxiv.org/abs/2401.06384v2 ) ライセンス: Link先を確認	Muhammad Baqer Mollah, Md Abul Kalam Azad, Yinghui Zhang,	(参考訳) スマートデバイスはIoT(Internet of Things)の不可欠な部分と見なされており、情報交換、データ収集、分析、自律的な方法で最適な決定を行い、より効率的で自動的で経済的なサービスを実現するための動的ネットワークの実現を目的としている。これらのスマートデバイス間のメッセージの拡散により、新しい機能の追加、更新された命令、アラートまたは安全メッセージの送信、価格情報または請求金額の通知、インセンティブ、セキュリティパッチのインストールが可能になる。一方で、このようなメッセージの拡散は、IoTシステムに関わるすべての関係者にとって直接的に有益である。他方で、遠隔での処理となるため、スマートデバイス、ベンダー、その他の関係機関は、ターゲットデバイス間でメッセージを拡散する際に、セキュリティ、プライバシ、パフォーマンスに関する多くの懸念に対処する必要があるかもしれない。そこで本論文では,IoTにおけるセキュリティとプライバシを意識したターゲットメッセージ拡散方式であるSTarEdgeChainを設計し,ブロックチェーンと高度な暗号技術がこのような懸念への対処にどのように役立つかを示す。実際、STarEdgeChainは、ターゲットとするデバイスグループ間での単一のサインクリプションされたメッセージの拡散を迅速化すると同時に、複数のユニキャスト方式への依存を回避するために、許可型ブロックチェーン支援エッジコンピューティングを採用している。最後に,STarEdgeChainのソフトウェアプロトタイプを開発し,スマートデバイスにおける実用性を示す。コードは https://github.com/mbaqer/Blockchain-IoT で公開されている。 Smart devices, considered an integral part of the Internet of Things (IoT), aim to form a dynamic network that exchanges information, collects and analyzes data, and makes optimal decisions autonomously to achieve more efficient, automatic, and economical services. Message dissemination among these smart devices allows adding new features, sending updated instructions, alerts or safety messages, informing the pricing information or billing amount, incentives, and installing security patches. On one hand, such message disseminations are directly beneficial to all parties involved in the IoT system. On the other hand, due to remote procedures, smart devices, vendors, and other involved authorities might have to address a number of security-, privacy-, and performance-related concerns while disseminating messages among targeted devices. 
To this end, in this paper, we design STarEdgeChain, a security and privacy aware targeted message dissemination scheme for IoT, to show how blockchain along with advanced cryptographic techniques can be devoted to addressing such concerns. In fact, STarEdgeChain employs permissioned blockchain-assisted edge computing in order to expedite a single signcrypted message dissemination among targeted groups of devices, while avoiding dependence on multiple unicasting approaches. Finally, we develop a software prototype of STarEdgeChain and show its practicability for smart devices. The code is publicly available at https://github.com/mbaqer/Blockchain-IoT	翻訳日:2024-08-05 18:33:20 公開日:2024-08-02
# 自律走行におけるロバスト性を考慮した3次元物体検出:レビューと展望 Robustness-Aware 3D Object Detection in Autonomous Driving: A Review and Outlook ( http://arxiv.org/abs/2401.06542v2 ) ライセンス: Link先を確認	Ziying Song, Lin Liu, Feiyang Jia, Yadan Luo, Guoxin Zhang, Lei Yang, Li Wang, Caiyan Jia,	(参考訳) 現代の自律運転の領域では、周囲環境の状態を正確に評価するために認識システムが不可欠であり、それによって情報に基づく予測と計画が可能になる。このシステムの重要なステップは、LiDARやカメラなどの車両に搭載されたセンサーを使って、近くの物体のサイズ、カテゴリ、位置を識別する3D物体検出に関連している。検出精度と効率性の向上を目的とした3次元物体検出手法の急増にもかかわらず, 環境変動, ノイズ, 気象変化に対する耐性を系統的に検討する文献には空白がある。本研究は,現実シナリオ下での知覚システム評価において,精度と遅延とともに頑健性が重要であることを強調する。我々の研究は、カメラのみ、LiDARのみ、マルチモーダルな3D物体検出アルゴリズムを広範囲に調査し、精度、レイテンシ、堅牢性の間のトレードオフを、特にKITTI-CやnuScenes-Cのようなデータセットで徹底的に評価し、公正な比較を保証する。これらのうち、マルチモーダル3D検出手法は優れた堅牢性を示し、文献をより明確に整理するために新しい分類法を導入する。本調査は、現実のアプリケーションにおける3次元物体検出アルゴリズムの現在の能力と制約について、より実用的な視点を提供し、今後の研究をロバスト性中心の進展へと導くことを目的としている。 In the realm of modern autonomous driving, the perception system is indispensable for accurately assessing the state of the surrounding environment, thereby enabling informed prediction and planning. The key step to this system is related to 3D object detection that utilizes vehicle-mounted sensors such as LiDAR and cameras to identify the size, the category, and the location of nearby objects. Despite the surge in 3D object detection methods aimed at enhancing detection precision and efficiency, there is a gap in the literature that systematically examines their resilience against environmental variations, noise, and weather changes. This study emphasizes the importance of robustness, alongside accuracy and latency, in evaluating perception systems under practical scenarios. Our work presents an extensive survey of camera-only, LiDAR-only, and multi-modal 3D object detection algorithms, thoroughly evaluating their trade-off between accuracy, latency, and robustness, particularly on datasets like KITTI-C and nuScenes-C to ensure fair comparisons. 
Among these, multi-modal 3D detection approaches exhibit superior robustness, and a novel taxonomy is introduced to reorganize the literature for enhanced clarity. This survey aims to offer a more practical perspective on the current capabilities and the constraints of 3D object detection algorithms in real-world applications, thus steering future research towards robustness-centric advancements.	翻訳日:2024-08-05 18:33:20 公開日:2024-08-02
# PT対称量子系における量子カオス Quantum chaos in PT symmetric quantum systems ( http://arxiv.org/abs/2401.07215v2 ) ライセンス: Link先を確認	Kshitij Sharma, Himanshu Sahu, Subroto Mukerjee,	(参考訳) 本研究では、非エルミート力学系における$\mathcal{PT}$対称性と量子カオスの相互作用について検討する。量子カオスの標準的な診断指標、すなわち複素レベル間隔比と時間外順序相関関数(OTOC)を拡張し、$\mathcal{PT}$対称な量子キックローターモデルを調べる。キックローターは、古典カオスおよび量子カオスを研究するための典型的な力学系として長く用いられてきた。量子キックローターに非エルミート性を導入することで、エルミート系には存在しない新しい相と転移を明らかにする。複素レベル間隔比の研究から、積分可能で$\mathcal{PT}$対称な領域、カオス的で$\mathcal{PT}$対称な領域、カオス的だが$\mathcal{PT}$対称性が破れた領域の3つを特定する。複素レベル間隔比はこれら3つの相をすべて区別できることがわかった。OTOCの計算は半古典極限において古典的リャプノフ指数の計算と関係づけられるため、これらの領域および相境界におけるその性質を調べる。$\mathcal{PT}$対称性をもつ相では、OTOCは積分可能領域とカオス領域の両方で、エルミート系で観察されるものと似た振る舞いを示す。さらに、$\mathcal{PT}$対称性の破れた相では、OTOCは後期における固有値スペクトルの複素的性質に起因する追加の指数的成長を示す。我々はOTOCの後期の振る舞いの解析的な形を導出する。$\mathcal{PT}$対称性の破れによる影響を軽減する規格化OTOCを定義することで、OTOCが$\mathcal{PT}$対称なカオス相から$\mathcal{PT}$対称性の破れたカオス相への転移において特異な振る舞いを示すことを示す。 In this study, we explore the interplay between $\mathcal{PT}$-symmetry and quantum chaos in a non-Hermitian dynamical system. We consider an extension of the standard diagnostics of quantum chaos, namely the complex level spacing ratio and out-of-time-ordered correlators (OTOCs), to study the $\mathcal{PT}$-symmetric quantum kicked rotor model. The kicked rotor has long been regarded as a paradigmatic dynamical system to study classical and quantum chaos. By introducing non-Hermiticity in the quantum kicked rotor, we uncover new phases and transitions that are absent in the Hermitian system. From the study of the complex level spacing ratio, we locate three regimes -- one which is integrable and $\mathcal{PT}$-symmetric, another which is chaotic with $\mathcal{PT}$-symmetry and a third which is chaotic but with broken $\mathcal{PT}$-symmetry. We find that the complex level spacing ratio can distinguish between all three phases. 
Since calculations of the OTOC can be related to those of the classical Lyapunov exponent in the semi-classical limit, we investigate its nature in these regimes and at the phase boundaries. In the phases with $\mathcal{PT}$-symmetry, the OTOC exhibits behaviour akin to what is observed in the Hermitian system in both the integrable and chaotic regimes. Moreover, in the $\mathcal{PT}$-symmetry broken phase, the OTOC demonstrates additional exponential growth stemming from the complex nature of the eigenvalue spectrum at later times. We derive the analytical form of the late-time behaviour of the OTOC. By defining a normalized OTOC to mitigate the effects caused by $\mathcal{PT}$-symmetry breaking, we show that the OTOC exhibits singular behaviour at the transition from the $\mathcal{PT}$-symmetric chaotic phase to the $\mathcal{PT}$-symmetry broken, chaotic phase.	翻訳日:2024-08-05 18:33:20 公開日:2024-08-02
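The complex level spacing ratio used above can be computed directly from a list of eigenvalues. The sketch below uses the common definition $z_k = (E_k^{\mathrm{NN}} - E_k)/(E_k^{\mathrm{NNN}} - E_k)$, where NN and NNN are the nearest and next-nearest neighbours of $E_k$ in the complex plane. The function names and the NumPy brute-force search are our own choices and are not the authors' code. For uncorrelated points in the plane, $z$ is uniformly distributed in the unit disk, so $\langle |z| \rangle = 2/3$ and $\langle \cos\arg z \rangle = 0$. Correlated (chaotic) spectra depart from these values.

```python
import numpy as np


def complex_spacing_ratios(eigenvalues):
    """Return z_k = (NN_k - E_k) / (NNN_k - E_k) for every eigenvalue E_k.

    NN_k and NNN_k are the nearest and next-nearest neighbours of E_k in the
    complex plane (Euclidean distance). Brute force, O(n^2).
    """
    e = np.asarray(eigenvalues, dtype=complex)
    if e.size < 3:
        raise ValueError("need at least three eigenvalues")
    z = np.empty(e.size, dtype=complex)
    for k in range(e.size):
        dist = np.abs(e - e[k])
        dist[k] = np.inf                  # exclude the eigenvalue itself
        nn, nnn = np.argsort(dist)[:2]    # indices of nearest and next-nearest
        z[k] = (e[nn] - e[k]) / (e[nnn] - e[k])
    return z


def summary(z):
    """Mean modulus <r> = <|z|> and mean cos(theta), theta = arg z."""
    return float(np.mean(np.abs(z))), float(np.mean(np.cos(np.angle(z))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Uncorrelated points in a square: expect <r> near 2/3 and <cos> near 0
    # (boundary effects cause small deviations).
    pts = rng.uniform(-1, 1, 4000) + 1j * rng.uniform(-1, 1, 4000)
    print(summary(complex_spacing_ratios(pts)))
```

In an actual study the input would be the Floquet or Hamiltonian spectrum of the model, typically after resolving symmetries.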
# ロジスティック写像を用いたリザバーコンピューティング Reservoir computing with logistic map ( http://arxiv.org/abs/2401.09501v2 ) ライセンス: Link先を確認	R. Arun, M. Sathish Aravindh, A. Venkatesan, M. Lakshmanan,	(参考訳) リザバーコンピューティングに関する最近の研究は、本質的に、入力を高次元状態へ変換・保持するリザバーとして高次元力学系を用い、時間的および非時間的データ処理を行うものである。ここでは、非線形写像、すなわちロジスティック写像と単純な有限三角級数を用いてリザバーを構成する仮想ノードを構築することにより、時間的および非時間的タスクを予測する方法を示す。時間的タスクとしてLorenz, Rossler, Hindmarsh-Roseの3つの非線形系を、非時間的タスクとして7次多項式を高精度に予測する。また、ノイズの存在下でも予測を行い、ターゲットと密接に一致することがわかった。注目すべきは、ロジスティック写像がうまく機能し、実際の値(目標値)に近い値を予測することである。二乗平均平方根誤差の値が小さいことは、効率性の観点から本手法の精度を裏付けている。本手法により、リザバーコンピューティングにおけるリザバーの構築に連続力学系を用いる必要がなくなる。さらに、3つの異なる非線形系を正確に予測できたことは、本手法が汎用的なものとみなせ、多くの系の予測に適用できることを示唆している。最後に、本手法がロスラー系の3変数すべての将来の時系列も正確に予測できること(自己予測)を示す。 Recent studies on reservoir computing essentially involve a high-dimensional dynamical system as the reservoir, which transforms and stores the input as a higher-dimensional state, for temporal and nontemporal data processing. We demonstrate here a method for temporal and nontemporal prediction tasks in which the virtual nodes constituting the reservoir are constructed from a nonlinear map, namely the logistic map, and a simple finite trigonometric series. We predict three nonlinear systems, namely Lorenz, Rossler, and Hindmarsh-Rose, for temporal tasks and a seventh-order polynomial for nontemporal tasks with great accuracy. Predictions made in the presence of noise also closely agree with the target. Remarkably, the logistic map performs well and predicts close to the actual or target values. The low values of the root mean square error confirm the accuracy of this method in terms of efficiency. Our approach removes the necessity of continuous dynamical systems for constructing the reservoir in reservoir computing. Moreover, the accurate prediction for the three different nonlinear systems suggests that this method can be considered a general one and can be applied to predict many systems. 
Finally, we show that the method also accurately anticipates the future time series of all three variables of the Rossler system (self prediction).	翻訳日:2024-08-05 18:33:20 公開日:2024-08-02
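The sketch below illustrates building reservoir nodes from a discrete map instead of a continuous-time system. Successive iterates of the logistic map $x_{n+1} = r x_n (1 - x_n)$, seeded by the input, serve as virtual nodes, and only a linear ridge-regression readout is trained. The seeding rule `x0 = 0.1 + 0.8 * u`, the value `r = 3.9`, the node count and the ridge penalty are assumptions for illustration. The paper's actual construction, which also uses a finite trigonometric series, is not reproduced here. The task mirrors the abstract's nontemporal example: fitting a seventh-order polynomial, with arbitrarily chosen coefficients.

```python
import numpy as np


def logistic_nodes(u, n_nodes=40, r=3.9):
    """Virtual-node states for scalar inputs u in [0, 1].

    Hypothetical encoding: the input sets the initial condition
    x0 = 0.1 + 0.8 * u, and node j holds the j-th logistic-map iterate.
    """
    x = 0.1 + 0.8 * np.asarray(u, dtype=float)
    states = []
    for _ in range(n_nodes):
        states.append(x)
        x = r * x * (1.0 - x)          # for r <= 4 the map keeps x inside [0, 1]
    return np.stack(states, axis=1)    # shape (len(u), n_nodes)


def _with_bias(states):
    return np.hstack([states, np.ones((states.shape[0], 1))])


def fit_readout(states, target, ridge=1e-6):
    """Ridge-regression readout weights (a bias column is appended)."""
    X = _with_bias(states)
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ target)


def predict(states, W):
    return _with_bias(states) @ W


if __name__ == "__main__":
    u = np.linspace(0.0, 1.0, 200)
    # Illustrative seventh-order polynomial target (coefficients are arbitrary).
    y = np.polyval([1.0, -2.0, 0.5, 1.0, -0.3, 0.2, -0.1, 0.05], u)
    S = logistic_nodes(u)
    W = fit_readout(S, y)
    rmse = np.sqrt(np.mean((predict(S, W) - y) ** 2))
    print(f"training RMSE = {rmse:.2e}, target std = {y.std():.2e}")
```

Only the readout is trained and the reservoir itself is fixed, which is the defining trait of reservoir computing. A temporal task would instead feed past samples of the series through the nodes and regress the next value.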
# 細粒度シーングラフ生成のための適応的自己学習フレームワーク Adaptive Self-training Framework for Fine-grained Scene Graph Generation ( http://arxiv.org/abs/2401.09786v5 ) ライセンス: Link先を確認	Kibum Kim, Kanghoon Yoon, Yeonjun In, Jinyoung Moon, Donghyun Kim, Chanyoung Park,	(参考訳) シーングラフ生成(SGG)モデルは、ロングテールな述語分布やアノテーションの欠落といった、ベンチマークデータセットに固有の問題に悩まされてきた。本研究では、注釈のない三つ組(triplet)を活用して、SGGのロングテール問題を緩和することを目指す。そのために、注釈のない三つ組に擬似ラベルを割り当て、それに基づいてSGGモデルを学習する、SGGのための自己学習フレームワーク(ST-SGG)を導入する。画像認識のための自己学習には大きな進歩があったが、意味的な曖昧さや述語クラスのロングテール分布といった固有の性質から、SGGタスクのための自己学習フレームワークの設計はより困難である。そこで我々は、SGGのための新しい擬似ラベリング手法Class-specific Adaptive Thresholding with Momentum (CATM)を提案する。これは既存の任意のSGGモデルに適用可能なモデル非依存のフレームワークである。さらに、提案する自己学習フレームワークを最先端のメッセージパッシングニューラルネットワーク(MPNN)ベースのSGGモデルに適用する際に有用なグラフ構造学習器(GSL)を考案する。広範な実験により、様々なSGGモデルにおけるST-SGGの有効性、特に細粒度の述語クラスにおける性能向上を検証した。 Scene graph generation (SGG) models have suffered from inherent problems in the benchmark datasets, such as the long-tailed predicate distribution and missing annotations. In this work, we aim to alleviate the long-tailed problem of SGG by utilizing unannotated triplets. To this end, we introduce a Self-Training framework for SGG (ST-SGG) that assigns pseudo-labels to unannotated triplets, on which the SGG models are then trained. While there has been significant progress in self-training for image recognition, designing a self-training framework for the SGG task is more challenging due to its inherent nature such as the semantic ambiguity and the long-tailed distribution of predicate classes. Hence, we propose a novel pseudo-labeling technique for SGG, called Class-specific Adaptive Thresholding with Momentum (CATM), which is a model-agnostic framework that can be applied to any existing SGG model. Furthermore, we devise a graph structure learner (GSL) that is beneficial when applying our proposed self-training framework to the state-of-the-art message-passing neural network (MPNN)-based SGG models. 
Our extensive experiments verify the effectiveness of ST-SGG on various SGG models, particularly in enhancing the performance on fine-grained predicate classes.	翻訳日:2024-08-05 18:33:20 公開日:2024-08-02
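The sketch below shows one way to implement class-specific, momentum-updated confidence thresholds. Each predicate class keeps its own threshold, which an exponential moving average pulls toward the mean confidence of that class's predictions in the current batch. An unannotated triplet receives a pseudo-label only if its confidence clears the threshold of its predicted class. The EMA update, the initial value and the momentum are assumptions for illustration. CATM's actual update rule is defined in the paper and may differ.

```python
import numpy as np


class ClassMomentumThresholds:
    """Per-class confidence thresholds updated by an exponential moving average."""

    def __init__(self, num_classes, init=0.5, momentum=0.9):
        self.tau = np.full(num_classes, init, dtype=float)
        self.momentum = momentum

    def update(self, probs):
        """Move each predicted class's threshold toward its mean confidence in this batch."""
        pred, conf = probs.argmax(axis=1), probs.max(axis=1)
        for c in np.unique(pred):
            batch_mean = conf[pred == c].mean()
            self.tau[c] = self.momentum * self.tau[c] + (1.0 - self.momentum) * batch_mean

    def pseudo_label(self, probs):
        """Predicted class where confidence >= that class's threshold, else -1 (left unlabeled)."""
        pred, conf = probs.argmax(axis=1), probs.max(axis=1)
        return np.where(conf >= self.tau[pred], pred, -1)


if __name__ == "__main__":
    th = ClassMomentumThresholds(num_classes=2, init=0.5, momentum=0.5)
    th.update(np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]))
    print(th.tau)  # approximately [0.675, 0.6]
    print(th.pseudo_label(np.array([[0.6, 0.4], [0.2, 0.8]])))  # first rejected, second -> class 1
```

Because each class has its own threshold, rare (tail) predicates whose predictions tend to be less confident are not held to the same bar as frequent ones. A single global threshold would tend to filter those tail predicates out.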
# 非中心対称量子井戸におけるクーパー対再結合によるもつれ光子生成 Entangled Photon Generation through Cooper Pair Recombination in a Noncentrosymmetric Quantum Well ( http://arxiv.org/abs/2401.11577v2 ) ライセンス: Link先を確認	Mehdi Biderang, Erfan Hosseini, Alireza Akbari,	(参考訳) 我々は、RashbaとDresselhausのスピン軌道結合が混在する超伝導層を備えた順方向バイアスのp-n接合によって駆動される、非中心対称な[001]量子井戸超伝導体において、クーパー対の再結合によりもつれた2光子対が生成される過程を理論的に検討する。達成可能な最高の光子対の純度は、純粋な一重項クーパー対、具体的には従来の$s$波ギャップ関数を含む場合に現れることを示す。我々の結果は、純度の高いもつれ状態を実現するには、電荷キャリア準位濃度を最小化し、RashbaとDresselhausのスピン軌道結合の大きさをバランスさせることが重要であることを示している。これは反対称スピン軌道結合の振幅を小さくすることで実現できる。純度に加えて、2光子状態の分布を調べるため、考えられる超伝導ペアリングについて、もつれたペア間での占有数を比較する。 We explore theoretically the generation of entangled two-photon pairs by Cooper pair recombination in a noncentrosymmetric [001]-quantum well superconductor, driven by a forward-biased p-n junction with a superconducting layer which exhibits an admixture of Rashba and Dresselhaus spin-orbit couplings. We show that the highest achievable purity of entangled photon pairs emerges within scenarios involving pure singlet Cooper pairs, specifically, the conventional $s$-wave gap function. Our results highlight the importance of minimizing the charge-carrier level concentration and balancing the magnitudes of Rashba and Dresselhaus spin-orbit couplings to achieve entangled states with enhanced purity, which can be realized by reducing the amplitudes of antisymmetric spin-orbit couplings. Beyond purity, we explore the distribution of two-photon states by comparing their populations across entangled pairs for potential superconducting pairings.	翻訳日:2024-08-05 18:33:20 公開日:2024-08-02
# 微分可能木探索ネットワーク Differentiable Tree Search Network ( http://arxiv.org/abs/2401.11660v2 ) ライセンス: Link先を確認	Dixant Mittal, Wee Sun Lee,	(参考訳) 訓練データが限られた意思決定問題では、ディープニューラルネットワークで近似された方策関数はしばしば準最適な性能を示す。別のアプローチとして、限られたデータから世界モデルを学習し、オンライン探索によって行動を決定する方法がある。しかし、学習した世界モデルの不正確さに起因する誤差の累積により、性能が悪化する。TreeQNのような手法は、ニューラルネットワークアーキテクチャにアルゴリズム的帰納バイアスを組み込むことでこうした不正確さに対処しようとしているが、導入されるバイアスはしばしば弱く、複雑な意思決定タスクには不十分である。本研究では、最良優先(best-first)オンライン探索アルゴリズムのアルゴリズム構造を組み込むことで帰納バイアスを大幅に強化する新しいニューラルネットワークアーキテクチャ、微分可能木探索ネットワーク(D-TSN)を紹介する。D-TSNは、学習された世界モデルを用いて完全に微分可能なオンライン探索を行う。世界モデルは探索アルゴリズムと同時に最適化されるため、頑健な世界モデルの学習が可能になり、予測の不正確さの影響が緩和される。さらに、最良優先探索を素朴に組み込むと、パラメータ空間において不連続な損失関数が生じうることを指摘する。我々は、確率的な木展開方策を採用し、探索木の展開を別の意思決定タスクとして定式化し、勾配計算のための効果的な分散低減手法を導入することでこの問題に対処する。Procgenゲームとグリッドナビゲーションタスクにおいて、訓練データが限られたオフラインRLの設定でD-TSNを評価し、D-TSNが一般的なモデルフリーおよびモデルベースのベースラインを上回ることを示す。 In decision-making problems with limited training data, policy functions approximated using deep neural networks often exhibit suboptimal performance. An alternative approach involves learning a world model from the limited data and determining actions through online search. However, the performance is adversely affected by compounding errors arising from inaccuracies in the learned world model. While methods like TreeQN have attempted to address these inaccuracies by incorporating algorithmic inductive biases into the neural network architectures, the biases they introduce are often weak and insufficient for complex decision-making tasks. In this work, we introduce Differentiable Tree Search Network (D-TSN), a novel neural network architecture that significantly strengthens the inductive bias by embedding the algorithmic structure of a best-first online search algorithm. D-TSN employs a learned world model to conduct a fully differentiable online search. The world model is jointly optimized with the search algorithm, enabling the learning of a robust world model and mitigating the effect of prediction inaccuracies. 
Further, we note that a naive incorporation of best-first search could lead to a discontinuous loss function in the parameter space. We address this issue by adopting a stochastic tree expansion policy, formulating search tree expansion as another decision-making task, and introducing an effective variance reduction technique for the gradient computation. We evaluate D-TSN in an offline-RL setting with limited training data on Procgen games and a grid navigation task, and demonstrate that D-TSN outperforms popular model-free and model-based baselines.	翻訳日:2024-08-05 18:33:20 公開日:2024-08-02
# Routoo: 大規模言語モデルへのルーティングを効果的に学習する Routoo: Learning to Route to Large Language Models Effectively ( http://arxiv.org/abs/2401.13979v2 ) ライセンス: Link先を確認	Alireza Mohammadshahi, Arshad Rafiq Shaikh, Majid Yazdani,	(参考訳) 基盤となる大規模言語モデル(LLM)の開発は、ますますコストがかかり非効率になりつつある。また、クローズドソースおよび大規模なオープンソースモデルは一般に応答品質が高いが、小規模モデルよりも推論コストが高い。本稿では、性能、コスト、効率に基づいて、特定のプロンプトに対するLLMの選択を最適化するよう設計されたアーキテクチャであるRoutooを紹介する。Routooは、性能予測器とコストを考慮したデコーディングという2つの主要コンポーネントで構成される。性能予測器は軽量なLLMであり、基盤となる様々なLLMを実行・評価することなくその性能を推定する。コストを考慮したデコーディングは、これらの予測とコストやレイテンシといった他の制約に基づいて、最も適切なモデルを選択する。オープンソースモデルを用い、57領域にわたるMMLUベンチマークでRoutooを評価した。その結果、RoutooはMixtral 8x7bモデルの性能に匹敵しつつ、推論コストを3分の1削減できることがわかった。さらに、コストの増加を許容すると、同等のコストでRoutooはMixtralの精度を5%以上上回り、75.9%の精度を達成する。モデルプールにGPT4を組み込むと、RoutooはGPT4の性能に半分のコストでほぼ匹敵し、25%のコスト削減でそれを上回る。これらの結果は、複数のLLMの集合知を活用することで、費用対効果の高い方法で新たなSOTAを生み出すRoutooの可能性を示している。 Developing foundational large language models (LLMs) is becoming increasingly costly and inefficient. Also, closed-source and larger open-source models generally offer better response quality but come with higher inference costs than smaller models. In this paper, we introduce Routoo, an architecture designed to optimize the selection of LLMs for specific prompts based on performance, cost, and efficiency. Routoo consists of two key components: a performance predictor and cost-aware decoding. The performance predictor is a lightweight LLM that estimates the performance of various underlying LLMs without needing to execute and evaluate them. The cost-aware decoding then selects the most suitable model based on these predictions and other constraints like cost and latency. We evaluated Routoo using the MMLU benchmark across 57 domains employing open-source models. Our results show that Routoo matches the performance of the Mixtral 8x7b model while reducing inference costs by one-third. Additionally, by allowing increased costs, Routoo surpasses Mixtral's accuracy by over 5% at equivalent costs, achieving an accuracy of 75.9%. 
When integrating GPT4 into our model pool, Routoo nearly matches GPT4's performance at half the cost and exceeds it with a 25% cost reduction. These outcomes highlight Routoo's potential to create new SOTA in a cost-effective manner by leveraging the collective knowledge of multiple LLMs.	翻訳日:2024-08-05 18:23:26 公開日:2024-08-02
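The split described above has two parts: a predictor that scores candidate models without running them, and a decoding step that trades predicted quality against cost. In its simplest form this is a constrained argmax, sketched below. The candidate names, scores, costs, budget fallback and tie-breaking rule are all hypothetical. In Routoo the predictor is a learned LLM, and its decoding also accounts for latency.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    name: str               # hypothetical model identifier
    predicted_score: float  # output of a performance predictor for this prompt
    cost: float             # e.g. price per query, in arbitrary units


def select_model(candidates: Sequence[Candidate], budget: float) -> Candidate:
    """Pick the highest predicted score among candidates that fit the budget.

    Ties are broken in favour of the cheaper model. If nothing fits the
    budget, fall back to the cheapest candidate.
    """
    affordable = [c for c in candidates if c.cost <= budget]
    if not affordable:
        return min(candidates, key=lambda c: c.cost)
    return max(affordable, key=lambda c: (c.predicted_score, -c.cost))


if __name__ == "__main__":
    pool = [
        Candidate("small-model", 0.62, 1.0),
        Candidate("medium-model", 0.71, 3.0),
        Candidate("large-model", 0.74, 10.0),
    ]
    for budget in (2.0, 5.0, 20.0):
        print(budget, select_model(pool, budget).name)
```

Raising the budget lets the router pick stronger but pricier models. This matches, qualitatively, the abstract's trade-off between matching Mixtral at lower cost and exceeding it at equal cost.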
# 格子誘起波動関数が捕捉された超流動体に及ぼす影響 Lattice-induced wavefunction effects on trapped superfluids ( http://arxiv.org/abs/2401.14004v3 ) ライセンス: Link先を確認	Yeyang Zhang,	(参考訳) 非相関系の波動関数効果はベリー曲率と量子計量によって特徴づけられる。さらに、相関粒子間の局所的相互作用に対するブロッホ波動関数効果を記述するゲージ独立テンソルを提案する。光学格子中の超低温ボソンに対する有効流体力学理論を導出する。高対称格子に対して等方性調和トラップの基底状態と超流動の集合モードを解く。動的過程において、波動関数効果は励起呼吸モードの固有周波数、振幅、位相シフトによって特徴づけられ、実験で観察できる。また、非自明な波動関数効果を持つ二部格子の密結合モデルを与え、その結果を典型的な実験パラメータで推定する。我々の発見は、現代のバンド理論と量子多体物理学のつながりを前進させる。 Wavefunction effects in uncorrelated systems are characterized by the Berry curvature and quantum metric. Beyond those, we propose gauge-independent tensors describing Bloch wavefunction effects on local interaction between correlated particles. We derive an effective hydrodynamic theory for ultracold bosons in optical lattices. Ground states and collective modes of superfluids in isotropic harmonic traps are solved for highly symmetric lattices. In a dynamic process, the wavefunction effects are featured by the eigenfrequency, amplitude, and phase shift of an excited breathing mode and can be observed in experiments. We also give a tight-binding model of a bipartite square lattice with nontrivial wavefunction effects, where results are estimated with typical experimental parameters. Our discovery advances the connections between the modern band theory and quantum many-body physics.	翻訳日:2024-08-05 18:23:26 公開日:2024-08-02
# 遺伝的アルゴリズムとスライディングウィンドウアルゴリズムによる、空間・時間変動係数をもつCOVID-19 SEIRモデルの新しい校正フレームワーク A new framework for calibrating COVID-19 SEIR models with spatial-/time-varying coefficients using genetic and sliding window algorithms ( http://arxiv.org/abs/2402.08524v2 ) ライセンス: Link先を確認	Huan Zhou, Ralf Schneider,	(参考訳) 感受性-曝露-感染-除去(SEIR)モデルは、非薬物的介入(NPI)がCOVID-19流行の地域的・時間的分布に与える影響をモデル化するために、空間的・時間的に変動する係数を仮定する。このようなモデルを使用する上での重要な課題は、地理参照付きの入院データなどの観測データに対して高速かつ正確に校正すること、すなわち空間的・時間的に変動するパラメータを効率的に推定することである。本研究では、SEIRモデルの空間的・時間的変動パラメータを最適化するための新しい校正フレームワークを提案する。また、重なり合うスライディングウィンドウ手法(OSW)と遺伝的アルゴリズム(GA)による校正ルーチンを組み合わせ、分割されたパラメータ空間を自動的に探索する手法も考案した。計算負荷を軽減するために並列化GAを用いる。本フレームワークは、手法の実装の複雑さをユーザから隠蔽し、カスタマイズされた校正システムを構築して最適化されたパラメータ値を利用するための高レベルAPIを提供する。COVID-19関連のICU需要の観測値からなる単一の目的関数を用いて、空間的な年齢構造化マイクロシミュレーションモデルの校正に本手法を適用し評価した。その結果は、変化する環境下でパラメータを推定する上での提案手法の有効性を示している。 A susceptible-exposed-infected-removed (SEIR) model assumes spatial-/time-varying coefficients to model the effect of non-pharmaceutical interventions (NPIs) on the regional and temporal distribution of COVID-19 disease epidemics. A significant challenge in using such models is their fast and accurate calibration to observed, geo-referenced hospitalization data, i.e., efficient estimation of the spatial-/time-varying parameters. In this work, a new calibration framework is proposed for optimizing the spatial-/time-varying parameters of the SEIR model. We also devise a method for combining the overlapping sliding window technique (OSW) with a genetic algorithm (GA) calibration routine to automatically search the segmented parameter space. Parallelized GA is used to reduce the computational burden. Our framework abstracts the implementation complexity of the method away from the user. It provides high-level APIs for setting up a customized calibration system and consuming the optimized values of parameters. 
We evaluated the application of our method on the calibration of a spatial age-structured microsimulation model using a single objective function that comprises observed COVID-19-related ICU demand. The results reflect the effectiveness of the proposed method towards estimating the parameters in a changing environment.	翻訳日:2024-08-05 18:23:26 公開日:2024-08-02
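The sketch below shows how overlapping sliding windows and a genetic algorithm can combine: the series is cut into overlapping windows, and a small GA fits one time-varying parameter per window. It is a toy stand-in under explicit assumptions. A log-linear growth model, with a single growth rate per window, replaces the SEIR microsimulation, and the GA is serial, not parallel. All function names, window sizes and GA settings (population 40, 60 generations, elitism, blend crossover, Gaussian mutation) are hypothetical. A real calibration would run the SEIR model inside the fitness function and fit several parameters against ICU data.

```python
import numpy as np


def overlapping_windows(n, size, step):
    """(start, end) index pairs of windows of length `size`, shifted by `step`."""
    return [(s, s + size) for s in range(0, n - size + 1, step)]


def window_sse(beta, t, log_y):
    """SSE of log_y ~ a + beta * t, with the intercept a solved in closed form."""
    resid = log_y - beta * t
    resid = resid - resid.mean()
    return float(np.sum(resid ** 2))


def ga_fit_rate(t, log_y, bounds=(0.0, 1.0), pop=40, gens=60, sigma=0.05, seed=0):
    """Tiny real-coded GA: elitism, blend crossover, Gaussian mutation."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    population = rng.uniform(lo, hi, pop)
    n_elite = pop // 4
    for _ in range(gens):
        fitness = np.array([window_sse(b, t, log_y) for b in population])
        elites = population[np.argsort(fitness)[:n_elite]]  # best quarter survives
        parents = rng.choice(elites, size=(pop - n_elite, 2))
        w = rng.uniform(0.0, 1.0, pop - n_elite)
        children = w * parents[:, 0] + (1.0 - w) * parents[:, 1]  # blend crossover
        children = children + rng.normal(0.0, sigma, pop - n_elite)  # mutation
        population = np.concatenate([elites, np.clip(children, lo, hi)])
    fitness = np.array([window_sse(b, t, log_y) for b in population])
    return float(population[np.argmin(fitness)])


def calibrate(log_y, size, step):
    """Fit one growth rate per overlapping window; returns [((start, end), rate), ...]."""
    results = []
    for s, e in overlapping_windows(len(log_y), size, step):
        t = np.arange(e - s, dtype=float)
        results.append(((s, e), ga_fit_rate(t, log_y[s:e], seed=s)))
    return results


if __name__ == "__main__":
    # Synthetic log-incidence: growth rate 0.3 up to day 9, then 0.1 (e.g. after an NPI).
    increments = np.where(np.arange(1, 20) < 10, 0.3, 0.1)
    log_y = np.concatenate([[0.0], np.cumsum(increments)])
    for window, beta in calibrate(log_y, size=6, step=2):
        print(window, round(beta, 3))
```

Windows that straddle day 10 return intermediate rates. Overlap smooths the transition in the estimated parameter, while each GA run stays a small, independent search, which is what makes the per-window runs easy to parallelize.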
# 非局在量子源からのニュートンポテンシャルを超えた重力における量子効果 Quantum effects in gravity beyond the Newton potential from a delocalised quantum source ( http://arxiv.org/abs/2402.10288v2 ) ライセンス: Link先を確認	Lin-Qing Chen, Flaminia Giacomini,	(参考訳) テーブルトップ実験の最近の進歩により、重力が古典的な記述と両立しないことを初めて示す機会が生まれている。重力の2つの量子源間の重力誘起もつれの生成など、現在提案されているすべての実験では、重力効果はニュートンポテンシャルで説明できる。これは一般相対性理論の弱場極限と整合し、重力の場としての性質を調べない領域である。したがって、効果がニュートン的な起源をもつことは、これらの実験から重力の性質について導ける結論を制限する。ここでは、この制限を克服する2つの効果を特定する。これらはニュートンポテンシャルでは再現できず、重力子の放出とも無関係である。第一に、例えば広がったガウス状態にある2つの一般的な重力の量子源間の相互作用は、ニュートンポテンシャルでも既知の古典的重力理論でも再現できないことを示す。したがって、この相互作用の形を観測するには、古典重力の修正か、その量子的記述のいずれかが必要となる。第二に、重力場とその正準共役運動量との間の量子交換子が、試験粒子と相互作用する一般的な量子源の相対位相に追加項として現れることを示す。位相にこの項を観測することは、量子的媒介者としての重力場の検証となる。ニュートンポテンシャルで再現できるものよりも強い重力の量子的側面を特定することは、重力場の非古典性を証明し、これまで提案されてきたよりも広い意味で重力の量子的側面を検証する新世代の実験を計画するうえで極めて重要である。 Recent progress in table-top experiments offers the opportunity to show for the first time that gravity is not compatible with a classical description. In all current experimental proposals, such as the generation of gravitationally induced entanglement between two quantum sources of gravity, gravitational effects can be explained with the Newton potential, namely in a regime that is consistent with the weak-field limit of general relativity and does not probe the field nature of gravity. Hence, the Newtonian origin of the effects is a limitation to the conclusions on the nature of gravity that can be drawn from these experiments. Here, we identify two effects that overcome this limitation: they cannot be reproduced using the Newton potential and are independent of graviton emission. First, we show that the interaction between two generic quantum sources of gravity, e.g. in wide Gaussian states, can be reproduced with neither the Newton potential nor a known classical theory of gravity. Hence, observing the form of this interaction would require either a modification to classical gravity or its quantum description. 
Second, we show that the quantum commutator between the gravitational field and its canonically conjugate momentum appears as an additional term in the relative phase of a generic quantum source interacting with a test particle. Observing this term in the phase would be a test of the gravitational field as a quantum mediator. Identifying stronger quantum aspects of gravity than those reproducible with the Newton potential is crucial to prove the nonclassicality of the gravitational field and to plan a new generation of experiments testing quantum aspects of gravity in a broader sense than has been proposed so far.	翻訳日:2024-08-05 18:23:26 公開日:2024-08-02
# AIM: メタモルフィックセキュリティテストのための入力セットの最小化を自動化する AIM: Automated Input Set Minimization for Metamorphic Security Testing ( http://arxiv.org/abs/2402.10773v3 ) ライセンス: Link先を確認	Nazanin Bayati Chaleshtari, Yoann Marquer, Fabrizio Pastore, Lionel C. Briand,	(参考訳) Webシステムのセキュリティテストは細工した入力を生成することで自動化できるが、テストオラクルの自動化、すなわち正しい出力と誤った出力を区別する手法は、まだ初期段階にある。具体的には、先行研究はメタモルフィックテストの可能性を示している。実際、セキュリティ上の欠陥は、有効な入力を悪意のある入力に変換するメタモルフィック関係によって判定できる。しかし、それ以上の指針がなければ、メタモルフィック関係は通常大規模な入力セットに対して実行されるため時間がかかり、メタモルフィックテストは実用的でなくなる。本稿では、脆弱性検出能力を保ちながらテストコストを削減するよう入力を自動的に選択するアプローチであるAIMを提案する。AIMは、セキュリティ特性に基づいて類似した入力を特定する、クラスタリングベースのブラックボックスアプローチを含む。また、総コストを最小化しながら多様な入力を効率的に選択できる新しい遺伝的アルゴリズムに依拠している。さらに、探索空間を縮小して最小化処理を高速化する問題縮約コンポーネントを含む。脆弱性が文書化されている2つの有名なWebシステム、JenkinsとJoomlaでAIMの有効性を評価した。AIMの結果を4つのベースラインと比較した。全体として、AIMは脆弱性検出を保ちながら、メタモルフィックテストの時間をJenkinsで84%、Joomlaで82%削減した。さらに、AIMは脆弱性カバレッジに関して、考慮したすべてのベースラインを上回った。 Although the security testing of Web systems can be automated by generating crafted inputs, solutions to automate the test oracle, i.e., distinguishing correct from incorrect outputs, remain preliminary. Specifically, previous work has demonstrated the potential of metamorphic testing; indeed, security failures can be determined by metamorphic relations that turn valid inputs into malicious inputs. However, without further guidance, metamorphic relations are typically executed on a large set of inputs, which is time-consuming and thus makes metamorphic testing impractical. We propose AIM, an approach that automatically selects inputs to reduce testing costs while preserving vulnerability detection capabilities. AIM includes a clustering-based black-box approach to identify similar inputs based on their security properties. It also relies on a novel genetic algorithm able to efficiently select diverse inputs while minimizing their total cost. Further, it contains a problem-reduction component to reduce the search space and speed up the minimization process. 
We evaluated the effectiveness of AIM on two well-known Web systems, Jenkins and Joomla, with documented vulnerabilities. We compared AIM's results with four baselines. Overall, AIM reduced metamorphic testing time by 84% for Jenkins and 82% for Joomla, while preserving vulnerability detection. Furthermore, AIM outperformed all the considered baselines regarding vulnerability coverage.	翻訳日:2024-08-05 18:23:26 公開日:2024-08-02
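The selection problem described for AIM is a weighted set-cover problem: keep a subset of inputs that still covers every cluster of security-relevant behaviour, at minimum total execution cost. AIM solves it with a genetic algorithm. The sketch below substitutes the classic greedy heuristic, which repeatedly picks the input with the most newly covered clusters per unit cost, as a simpler illustration. The input names, cluster ids and costs are made up.

```python
def greedy_min_cost_cover(inputs):
    """Greedy weighted set cover.

    inputs: dict mapping input name -> (set of covered cluster ids, cost > 0).
    Returns the selected names (in selection order) and their total cost.
    """
    uncovered = set().union(*(clusters for clusters, _ in inputs.values()))
    selected, total = [], 0.0
    while uncovered:
        # Newly covered clusters per unit cost; sorted names make ties deterministic.
        best = max(
            sorted(inputs),
            key=lambda name: len(inputs[name][0] & uncovered) / inputs[name][1],
        )
        clusters, cost = inputs[best]
        if not clusters & uncovered:
            break  # defensive only: every cluster comes from some input
        selected.append(best)
        total += cost
        uncovered -= clusters
    return selected, total


if __name__ == "__main__":
    # Hypothetical inputs: covered cluster ids and execution cost (e.g. seconds).
    inputs = {
        "login_form": ({1, 2}, 3.0),
        "search_query": ({2, 3}, 1.0),
        "profile_url": ({1}, 1.0),
        "admin_page": ({1, 2, 3}, 5.0),
    }
    print(greedy_min_cost_cover(inputs))  # (['search_query', 'profile_url'], 2.0)
```

Greedy set cover carries only a logarithmic approximation guarantee. A search-based method such as AIM's genetic algorithm can explore trade-offs the greedy rule misses, at the price of more computation.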
# SymBa: 構造化自然言語推論のための記号的後方連鎖 SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning ( http://arxiv.org/abs/2402.12806v2 ) ライセンス: Link先を確認	Jinu Lee, Wonseok Hwang,	(参考訳) 大規模言語モデル(LLM)は目覚ましい推論能力を示しているが、説明可能性を保証するために構造化された説明可能な証明を提供すること、すなわち構造化推論は依然として難しい。構造化推論の2つの方向性のうち、本研究では特に、推論規則を適用してクエリを再帰的にサブゴールへ分解する後方連鎖に着目する。現在普及している後方連鎖の実装(Least-to-most promptingとLAMBADA)は、任意の深さの再帰や束縛(バインディング)の伝播といった、後方連鎖に必要な機能を実装していないことを指摘する。そこで本研究では、新しい後方連鎖フレームワークSymBa(Symbolic Backward Chaining)を提案する。SymBaでは、記号ソルバが証明プロセス全体を制御し、LLMは関連する自然言語の前提を検索してソルバ用の記号形式に変換する。このLLMとソルバの統合により、SymBaは記号的に検証された完全に構造化された証明を生成しつつ、ベースラインと比較して様々な構造化推論ベンチマークにおいて性能、証明精度、効率を大幅に向上させる。 While Large Language Models (LLMs) have demonstrated remarkable reasoning ability, providing a structured, explainable proof to ensure explainability, i.e. structured reasoning, still remains challenging. Of the two directions of structured reasoning, we specifically focus on backward chaining, where the query is recursively decomposed into subgoals by applying inference rules. We point out that current popular backward chaining implementations (Least-to-most prompting and LAMBADA) fail to implement the necessary features of backward chaining, such as arbitrary-depth recursion and binding propagation. To this end, we propose a novel backward chaining framework, SymBa (Symbolic Backward Chaining). In SymBa, a symbolic solver controls the whole proof process, and an LLM searches for the relevant natural language premises and translates them into a symbolic form for the solver. Through this LLM-solver integration, SymBa produces a completely structured, symbolically verified proof while achieving significant improvements in performance, proof accuracy, and efficiency on diverse structured reasoning benchmarks compared to baselines.	翻訳日:2024-08-05 18:23:26 公開日:2024-08-02
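The abstract names two features that earlier prompting pipelines lack: recursion to arbitrary depth, and propagation of variable bindings between subgoals. A classic symbolic backward chainer provides both, as the minimal Prolog-style prover below illustrates. In SymBa an LLM supplies and translates the rules. Here the rule base is hand-written, and so are the convention that variables start with an uppercase letter and the depth limit. Unification omits the occurs check for brevity.

```python
import itertools

_fresh = itertools.count()  # suffixes used to rename rule variables apart


def is_var(t):
    """Variables are strings starting with an uppercase letter; constants are lowercase."""
    return isinstance(t, str) and t[:1].isupper()


def walk(t, subst):
    """Follow variable bindings until reaching a non-variable or an unbound variable."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def unify(a, b, subst):
    """Return an extended substitution, or None if a and b do not unify (no occurs check)."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None


def rename(t, k):
    """Standardise apart: give each rule application its own copy of the rule's variables."""
    if is_var(t):
        return f"{t}_{k}"
    if isinstance(t, tuple):
        return tuple(rename(x, k) for x in t)
    return t


def prove(goals, rules, subst, depth=0, max_depth=25):
    """Backward chaining: yield every substitution that proves all goals."""
    if not goals:
        yield subst
        return
    if depth > max_depth:
        return  # guard against unbounded recursion
    first, rest = goals[0], goals[1:]
    for head, body in rules:
        k = next(_fresh)
        new_subst = unify(first, rename(head, k), subst)
        if new_subst is not None:
            # Bindings made here propagate to the rule body and the remaining goals.
            subgoals = [rename(g, k) for g in body] + rest
            yield from prove(subgoals, rules, new_subst, depth + 1, max_depth)


def resolve(t, subst):
    """Apply a substitution fully to a term."""
    t = walk(t, subst)
    if isinstance(t, tuple):
        return tuple(resolve(x, subst) for x in t)
    return t


if __name__ == "__main__":
    rules = [  # (head, body); facts have an empty body
        (("parent", "alice", "bob"), []),
        (("parent", "bob", "carol"), []),
        (("parent", "carol", "dave"), []),
        (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
        (("ancestor", "X", "Y"), [("parent", "X", "Z"), ("ancestor", "Z", "Y")]),
    ]
    answers = {resolve("W", s) for s in prove([("ancestor", "alice", "W")], rules, {})}
    print(sorted(answers))  # ['bob', 'carol', 'dave']
```

The recursive `ancestor` rule exercises arbitrary-depth decomposition. The variable `Z`, bound by the first subgoal and consumed by the second, shows binding propagation.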
# 少数事例を用いた学習データの自動生成による文埋め込みの改善 Improving Sentence Embeddings with Automatic Generation of Training Data Using Few-shot Examples ( http://arxiv.org/abs/2402.15132v2 ) ライセンス: Link先を確認	Soma Sato, Hayato Tsukagoshi, Ryohei Sasano, Koichi Takeda,	(参考訳) デコーダベースの大規模言語モデル(LLM)は、自然言語処理の多くのタスクで高い性能を示している。これは文埋め込み学習にも当てはまり、デコーダベースのモデルであるPromptEOLは意味的テキスト類似度(STS)タスクで最高の性能を達成している。しかし、PromptEOLのファインチューニングには、人手でアノテーションされた自然言語推論(NLI)データセットが必要である。我々は、LLMを用いてNLIデータセットを自動生成し、それをPromptEOLのファインチューニングに用いることで、人手でアノテーションされた大規模データセットを用いずに文埋め込みを改善することを目指す。そのため本研究では、文埋め込み学習に適したデータ生成手法を検討する。具体的には、少数事例学習(few-shot learning)による自動データセット生成に焦点を当て、少数事例を活用する適切な方法を探る。STSタスクでの実験結果から、提案手法は人手でアノテーションされた大規模データセットを用いない設定において既存のモデルを上回ることが示された。 Decoder-based large language models (LLMs) have shown high performance on many tasks in natural language processing. This is also true for sentence embedding learning, where a decoder-based model, PromptEOL, has achieved the best performance on semantic textual similarity (STS) tasks. However, PromptEOL requires a manually annotated natural language inference (NLI) dataset for fine-tuning. We aim to improve sentence embeddings without using large manually annotated datasets by automatically generating an NLI dataset with an LLM and using it for fine-tuning of PromptEOL. To achieve this, we explore methods of data generation suitable for sentence embedding learning in this study. Specifically, we focus on automatic dataset generation through few-shot learning and explore the appropriate methods to leverage few-shot examples. Experimental results on the STS tasks demonstrate that our approach outperforms existing models in settings without large manually annotated datasets.	翻訳日:2024-08-05 18:23:26 公開日:2024-08-02
# 言語モデルのデータ選択に関する調査 A Survey on Data Selection for Language Models ( http://arxiv.org/abs/2402.16827v3 ) ライセンス: Link先を確認	Alon Albalak, Yanai Elazar, Sang Michael Xie, Shayne Longpre, Nathan Lambert, Xinyi Wang, Niklas Muennighoff, Bairu Hou, Liangming Pan, Haewon Jeong, Colin Raffel, Shiyu Chang, Tatsunori Hashimoto, William Yang Wang,	(参考訳) 近年の大規模言語モデルの成功の大きな要因は、教師なし事前学習に巨大で増え続けるテキストデータセットを用いることである。しかし、利用可能なテキストデータの質はまちまちであるため、利用可能なすべてのデータで素朴にモデルを学習することは最適ではない(あるいは実行可能ではない)場合がある。データをフィルタリングすれば、必要な学習量が減ることで、モデル学習のカーボンフットプリントと金銭的コストも削減できる。データ選択手法は、どの候補データ点を学習データセットに含めるか、また選択したデータ点からどのように適切にサンプリングするかを決定することを目的とする。データ選択手法の改善が期待されることから、この分野の研究量は急速に拡大している。しかし、ディープラーニングは主に実証的な証拠によって駆動され、大規模データでの実験には費用がかかるため、広範なデータ選択研究を行うリソースを持つ組織はほとんどない。その結果、効果的なデータ選択の実践に関する知識は少数の組織に集中するようになり、その多くは知見や手法を公開していない。この知識のギャップを狭めるため、データ選択手法および関連研究分野に関する既存文献を包括的にレビューし、既存のアプローチの分類体系を提示する。本研究は、現在の研究状況を描き出すことで新規および既存の研究者のための入り口を確立し、データ選択の進展を加速することを目指す。さらに、レビュー全体を通じて文献上の顕著な空白に注意を向け、有望な今後の研究の方向性を提案して論文を締めくくる。 A major factor in the recent success of large language models is the use of enormous and ever-growing text datasets for unsupervised pre-training. However, naively training a model on all available data may not be optimal (or feasible), as the quality of available text data can vary. Filtering out data can also decrease the carbon footprint and financial costs of training models by reducing the amount of training required. Data selection methods aim to determine which candidate data points to include in the training dataset and how to appropriately sample from the selected data points. The promise of improved data selection methods has caused the volume of research in the area to rapidly expand. However, because deep learning is mostly driven by empirical evidence and experimentation on large-scale data is expensive, few organizations have the resources for extensive data selection research. 
Consequently, knowledge of effective data selection practices has become concentrated within a few organizations, many of which do not openly share their findings and methodologies. To narrow this gap in knowledge, we present a comprehensive review of existing literature on data selection methods and related research areas, providing a taxonomy of existing approaches. By describing the current landscape of research, this work aims to accelerate progress in data selection by establishing an entry point for new and established researchers. Additionally, throughout this review we draw attention to noticeable holes in the literature and conclude the paper by proposing promising avenues for future research.	翻訳日:2024-08-05 18:23:26 公開日:2024-08-02
# 文脈内学習の2つの動作モード Dual Operating Modes of In-Context Learning ( http://arxiv.org/abs/2402.18819v2 ) ライセンス: Link先を確認	Ziqian Lin, Kangwook Lee,	(参考訳) 文脈内学習(ICL)は、文脈内のサンプルから新しいスキルを獲得するタスク学習と、関連する事前学習済みスキルを特定して活性化するタスク検索という、2つの動作モードを示す。最近の理論的研究ではICLを解析するための様々な数学的モデルが検討されているが、既存のモデルは一度に1つの動作モードしか説明できない。本稿では、ICLの2つの動作モードを同時に説明できる確率モデルを導入する。線形関数の文脈内学習に着目し、複数のタスク群とタスク依存の入力分布を導入することで、事前学習データに関する既存のモデルを拡張する。次に、二乗損失の下で最適に事前学習されたモデル、すなわち文脈内の例が与えられたときのラベルのMMSE推定量の挙動を解析する。事前学習のタスク分布を事前分布、文脈内の例を観測とみなして、タスクの事後分布の閉形式表現を導出する。この閉形式表現により、ICLの2つの動作モードを定量的に理解できる。さらに、実際に観察されながら説明されていなかった現象、すなわち特定の設定ではICLのリスクが文脈内の例の増加とともに最初は増加し、その後減少する現象に光を当てる。我々のモデルは、この「初期上昇(early ascent)」現象に対して妥当な説明を与える。すなわち、文脈内サンプルが少ないと誤ったスキルが検索されてリスクが増大しうるが、サンプルが増えてタスク学習が効果を発揮するにつれて最終的にリスクは減少する。また、文脈内の例にランダムなラベルが割り当てられるゼロショットICLなど、偏ったラベルを伴うICLを理論的に解析する。最後に、Transformerと大規模言語モデルを用いた実験により、我々の知見と予測を検証する。 In-context learning (ICL) exhibits dual operating modes: task learning, i.e., acquiring a new skill from in-context samples, and task retrieval, i.e., locating and activating a relevant pretrained skill. Recent theoretical work investigates various mathematical models to analyze ICL, but existing models explain only one operating mode at a time. We introduce a probabilistic model, with which one can explain the dual operating modes of ICL simultaneously. Focusing on in-context learning of linear functions, we extend existing models for pretraining data by introducing multiple task groups and task-dependent input distributions. We then analyze the behavior of the optimally pretrained model under the squared loss, i.e., the MMSE estimator of the label given in-context examples. Regarding pretraining task distribution as prior and in-context examples as the observation, we derive the closed-form expression of the task posterior distribution. With the closed-form expression, we obtain a quantitative understanding of the two operating modes of ICL. 
Furthermore, we shed light on an unexplained phenomenon observed in practice: under certain settings, the ICL risk initially increases and then decreases with more in-context examples. Our model offers a plausible explanation for this "early ascent" phenomenon: a limited number of in-context samples may lead to the retrieval of an incorrect skill, thereby increasing the risk, which will eventually diminish as task learning takes effect with more in-context samples. We also theoretically analyze ICL with biased labels, e.g., zero-shot ICL, where in-context examples are assigned random labels. Lastly, we validate our findings and predictions via experiments involving Transformers and large language models.	翻訳日:2024-08-05 18:23:26 公開日:2024-08-02
# ソフトウェアエンジニアリングの公平さを理解する - Stack Exchangeからの洞察 Understanding Fairness in Software Engineering: Insights from Stack Exchange ( http://arxiv.org/abs/2402.19038v3 ) ライセンス: Link先を確認	Emeralda Sesari, Federica Sarro, Ayushi Rastogi,	(参考訳) ソフトウェア実践者は、仕事上の問題について同僚と対面やオンラインで議論する。これらの議論は技術的なもの(例:バグをどう修正するか)の場合もあれば、社会的なもの(例:作業をどう公平に割り当てるか)の場合もある。ソフトウェア工学の人的・社会的要因における公平性の問題と解決策を探る知見は増えているものの、その多くは特定の問題に焦点を当ててきた。本研究は、Stack Exchangeサイトにおけるソフトウェア実践者による公平性の議論を調査する。本稿では、ソフトウェア実践者の公平性に関する経験と、ソフトウェアチームにおける公平性への期待を明らかにする探索的研究を示す。また、ソフトウェア実践者が最もよく話題にする公平性の側面を特定することも目指す。例えば、彼らは収入の公平性と職場での扱われ方のどちらをより気にしているのだろうか。Stack Exchangeの8つのサイトにおける公平性の議論を調査した結果、4,178件の候補投稿から手作業で選別した136件の投稿(質問28件、回答108件)のリストが得られた。調査の結果、公平性に関する議論の中で最も多い(24件)のは収入に関する話題であり、多くのソフトウェア実践者が自身の給与とそれが公平に分配されているかに強い関心を持っていることが示唆された。さらに、議論される頻度は高くないものの、採用における公平性の議論は最も多くの閲覧数とスコアを集める傾向があることも分かった。興味深いことに、不公平な経験は保護属性の範囲を超えて広がっていることが示された。本研究では、136件中、保護属性に言及した投稿は25件に過ぎず、主に性別が議論されている。 Software practitioners discuss problems at work with peers, in-person and online. These discussions can be technical (e.g., how to fix a bug?) and social (e.g., how to assign work fairly?). While there is a growing body of knowledge exploring fairness problems and solutions in the human and social factors of software engineering, most focus has been on specific problems. This study examines fairness discussions by software practitioners on Stack Exchange sites. We present an exploratory study of the fairness experience of software practitioners and fairness expectations in software teams. We also want to identify the fairness aspects software practitioners talk about the most. For example, do they care more about fairness in income or how they are treated in the workplace? Our investigation of fairness discussions on eight Stack Exchange sites resulted in a list of 136 posts (28 questions and 108 answers) manually curated from 4,178 candidate posts. 
The study reveals that the majority of fairness discussions (24 posts) revolve around the topic of income suggesting that many software practitioners are highly interested in matters related to their pay and how it is fairly distributed. Further, we noted that while not discussed as often, discussions on fairness in recruitment tend to receive the highest number of views and scores. Interestingly, the study shows that unfairness experiences extend beyond the protected attributes. In this study, only 25 out of 136 posts mention protected attributes, with gender mainly being discussed.	翻訳日:2024-08-05 18:23:26 公開日:2024-08-02
# Anisotropy-Induced Spin Parity Effects ( http://arxiv.org/abs/2402.19311v4 ) License: see link	Shuntaro Sumita, Akihiro Tanaka, Yusuke Kato,	Spin parity effects are situations in which a dichotomy in the physical behavior of a system arises solely from whether the relevant spin quantum number is integral or half-odd-integral. As with the Haldane conjecture for antiferromagnetic spin chains, their pursuit often yields deep insights and spurs new developments in quantum condensed matter physics. Here we put forth a simple and general scheme for generating such effects in any spatial dimension through anisotropic interactions, together with a setup within reasonable reach of state-of-the-art cold-atom implementations. We demonstrate its utility through a detailed analysis of the magnetization behavior of a specific one-dimensional spin chain model, an anisotropic antiferromagnet in a transverse magnetic field. Along the way, we unravel the quantum origin of finite-size effects in the magnetization curve that had previously been noted but not clearly understood.	Translated: 2024-08-05 18:23:26 Published: 2024-08-02
# DISORF: A Distributed Online 3D Reconstruction Framework for Mobile Robots ( http://arxiv.org/abs/2403.00228v3 ) License: see link	Chunlin Li, Hanrui Fan, Xiaorui Huang, Ruofan Liang, Sankeerth Durvasula, Nandita Vijaykumar,	We present DISORF, a framework that enables online 3D reconstruction and visualization of scenes captured by resource-constrained mobile robots and edge devices. To address the limited computing capabilities of edge devices and potentially limited network availability, we design a framework that efficiently distributes computation between the edge device and a remote server. We leverage on-device SLAM systems to generate posed keyframes and transmit them to remote servers, which perform high-quality 3D reconstruction and visualization at runtime by leveraging recent advances in neural 3D methods. We identify a key challenge in online training: naive image sampling strategies can significantly degrade rendering quality. We propose a novel shifted exponential frame sampling method that addresses this challenge. We demonstrate the effectiveness of our framework in enabling high-quality real-time reconstruction and visualization of unknown scenes as they are captured and streamed from cameras on mobile robots and edge devices.	Translated: 2024-08-05 18:23:26 Published: 2024-08-02
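The abstract does not define its shifted exponential frame sampling. The sketch below shows one hypothetical reading. Recent keyframes receive exponentially larger sampling weights, and a constant `shift` keeps older keyframes from being starved during online training. The functional form and the parameter names `decay` and `shift` are illustrative assumptions and are not taken from DISORF.

```python
import math
import random

def shifted_exponential_weights(num_frames, decay=0.1, shift=0.05):
    """Hypothetical weighting (not DISORF's published formula): frame i of
    num_frames gets shift + exp(-decay * age), where age 0 is the newest frame."""
    raw = [shift + math.exp(-decay * (num_frames - 1 - i)) for i in range(num_frames)]
    total = sum(raw)
    return [w / total for w in raw]  # normalize to a probability distribution

def sample_keyframes(num_frames, batch_size, decay=0.1, shift=0.05, rng=random):
    """Draw keyframe indices (with replacement) for one online training step."""
    weights = shifted_exponential_weights(num_frames, decay, shift)
    return rng.choices(range(num_frames), weights=weights, k=batch_size)
```

A purely uniform sampler would underrepresent newly streamed frames, and sampling only the latest frames would make the model forget earlier views. The shift term is one simple way to balance the two.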
# A Tutorial on the Pretrain-Finetune Paradigm for Natural Language Processing ( http://arxiv.org/abs/2403.02504v3 ) License: see link	Yu Wang, Wen Qu,	Given that natural language serves as the primary conduit for expressing thoughts and emotions, text analysis has become a key technique in psychological research. It enables the extraction of valuable insights from natural language, facilitating endeavors such as personality trait assessment, mental health monitoring, and sentiment analysis in interpersonal communication. In text analysis, existing studies often resort to human coding, which is time-consuming; to pre-built dictionaries, which often fail to cover all possible scenarios; or to training models from scratch, which requires large amounts of labeled data. In this tutorial, we introduce the pretrain-finetune paradigm, a transformative approach in text analysis and natural language processing. This paradigm distinguishes itself through the use of large pretrained language models, which demonstrate remarkable efficiency in finetuning tasks even with limited training data. This efficiency is especially beneficial for research in the social sciences, where the number of annotated samples is often quite limited. Our tutorial offers a comprehensive introduction to the pretrain-finetune paradigm. We first cover the fundamental concepts of pretraining and finetuning, followed by practical exercises using real-world applications. We demonstrate the application of the paradigm across various tasks, including multi-class classification and regression. Emphasizing its efficacy and user-friendliness, the tutorial aims to encourage broader adoption of this paradigm. To this end, we provide open access to all our code and datasets. The tutorial is intended to benefit researchers across psychology disciplines, providing a comprehensive guide to employing text analysis in diverse research settings.	Translated: 2024-08-05 18:23:26 Published: 2024-08-02
# Adiabatic Bottlenecks in Quantum Annealing and Nonequilibrium Dynamics of Paramagnons ( http://arxiv.org/abs/2403.11548v2 ) License: see link	Tim Bode, Frank K. Wilhelm,	The correspondence between long-range interacting quantum spin glasses and combinatorial optimization problems underpins the physical motivation for adiabatic quantum computing. In disordered (quantum) spin systems, the focus is on exact methods such as the replica trick, which allow system quantities to be calculated in the limit of infinite system and ensemble size. When solving a given instance of an optimization problem, however, disorder-averaged quantities are irrelevant, because one is interested solely in instance-specific, finite-size properties, in particular the true solution. Here, we apply the nonequilibrium Green-function formalism to the spin coherent-state path integral to obtain the statistical fluctuations and the collective-excitation spectrum along the annealing path. For the example of the quantum Sherrington-Kirkpatrick spin glass, by comparing to extensive numerically exact results, we show that this method provides access to the instance-specific bottlenecks of the annealing protocol.	Translated: 2024-08-05 18:13:29 Published: 2024-08-02
# A multi-criteria approach for selecting an explanation from the set of counterfactuals produced by an ensemble of explainers ( http://arxiv.org/abs/2403.13940v2 ) License: see link	Ignacy Stępka, Mateusz Lango, Jerzy Stefanowski,	Counterfactuals are widely used to explain ML model predictions by providing alternative scenarios that lead to a more desired prediction. They can be generated by a variety of methods that optimize different, sometimes conflicting, quality measures and produce quite different solutions. However, choosing the most appropriate explanation method and one of the generated counterfactuals is not an easy task. Instead of forcing the user to test many different explanation methods and analyse conflicting solutions, in this paper we propose a multi-stage ensemble approach that selects a single counterfactual based on multiple-criteria analysis. It offers a compromise solution that scores well on several popular quality measures. The approach exploits the dominance relation and the ideal point decision aid method, which selects one counterfactual from the Pareto front. The conducted experiments demonstrate that the proposed approach generates fully actionable counterfactuals with attractive compromise values of the considered quality measures.	Translated: 2024-08-05 18:13:29 Published: 2024-08-02
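The selection step in this abstract has two parts. First, dominated candidates are discarded. Then the Pareto-optimal candidate closest to the ideal point is picked. The sketch below shows both parts under stated assumptions: every quality measure is to be minimized, each criterion is min-max normalized over the front, and "closest" means Euclidean distance. The paper's actual measures and distance may differ. The score tuples are made-up placeholders.

```python
import math

def dominates(a, b):
    """a dominates b if it is no worse on every criterion and strictly better on
    at least one (all criteria are assumed to be minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    # A vector never dominates itself, so no identity check is needed.
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]

def select_by_ideal_point(candidates):
    front = pareto_front(candidates)
    k = len(front[0])
    ideal = [min(c[j] for c in front) for j in range(k)]
    worst = [max(c[j] for c in front) for j in range(k)]

    def distance(c):
        # Min-max normalize each criterion so different scales are comparable.
        return math.sqrt(sum(
            ((c[j] - ideal[j]) / (worst[j] - ideal[j])) ** 2 if worst[j] > ideal[j] else 0.0
            for j in range(k)))

    return min(front, key=distance)

# Each tuple: (proximity, sparsity, implausibility) for one counterfactual.
cands = [(0.2, 3, 0.5), (0.1, 5, 0.4), (0.3, 4, 0.6), (0.9, 1, 0.9)]
print(select_by_ideal_point(cands))
```

Here (0.3, 4, 0.6) is dominated by (0.2, 3, 0.5) and is therefore excluded. Among the three remaining candidates, (0.2, 3, 0.5) is the balanced compromise nearest the ideal point.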
# An Optimization Framework to Enforce Multi-View Consistency for Texturing 3D Meshes ( http://arxiv.org/abs/2403.15559v2 ) License: see link	Zhengyi Zhao, Chen Song, Xiaodong Gu, Yuan Dong, Qi Zuo, Weihao Yuan, Liefeng Bo, Zilong Dong, Qixing Huang,	A fundamental problem in texturing 3D meshes with pre-trained text-to-image models is ensuring multi-view consistency. State-of-the-art approaches typically use diffusion models to aggregate multi-view inputs, where common issues are blurriness caused by the averaging operation in the aggregation step and inconsistencies in local features. This paper introduces an optimization framework that proceeds in four stages to achieve multi-view consistency. Specifically, the first stage generates an over-complete set of 2D textures from a predefined set of viewpoints using an MV-consistent diffusion process. The second stage selects a subset of views that are mutually consistent while covering the underlying 3D model. We show how to achieve this goal by solving semi-definite programs. The third stage performs non-rigid alignment to align the selected views across overlapping regions. The fourth stage solves an MRF problem to associate each mesh face with a selected view. The third and fourth stages are iterated, with the cuts obtained in the fourth stage encouraging the non-rigid alignment in the third stage to focus on regions close to the cuts. Experimental results show that our approach significantly outperforms baseline approaches both qualitatively and quantitatively. Project page: https://aigc3d.github.io/ConsistenTex.	Translated: 2024-08-05 18:13:29 Published: 2024-08-02
# Automated System-level Testing of Unmanned Aerial Systems ( http://arxiv.org/abs/2403.15857v2 ) License: see link	Hassan Sartaj, Asmar Muqeet, Muhammad Zohaib Iqbal, Muhammad Uzair Khan,	Unmanned aerial systems (UAS) rely on various avionics systems that are safety-critical and mission-critical. A major requirement of international safety standards is rigorous system-level testing of avionics software systems. Current industrial practice is to create test scenarios manually, execute them manually or automatically using simulators, and evaluate the outcomes manually. The test scenarios typically set certain flight or environmental conditions and test the system under test in these settings. State-of-the-art approaches for this purpose also require manual test scenario development and evaluation. In this paper, we propose a novel approach to automate system-level testing of UAS. The proposed approach (AITester) uses model-based testing and artificial intelligence (AI) techniques to automatically generate, execute, and evaluate various test scenarios. The test scenarios are generated on the fly, i.e., during test execution, based on the environmental context at runtime. The approach is supported by a toolset. We empirically evaluate the proposed approach on two core components of UAS: the autopilot system of an unmanned aerial vehicle (UAV) and the cockpit display systems (CDS) of the ground control station (GCS). The results show that AITester effectively generates test scenarios that cause deviations from the expected behavior of the UAV autopilot and reveals potential flaws in the GCS-CDS.	Translated: 2024-08-05 18:13:29 Published: 2024-08-02
# DASA: Delay-Adaptive Multi-Agent Stochastic Approximation ( http://arxiv.org/abs/2403.17247v3 ) License: see link	Nicolò Dal Fabbro, Arman Adibi, H. Vincent Poor, Sanjeev R. Kulkarni, Aritra Mitra, George J. Pappas,	We consider a setting in which $N$ agents aim to speed up a common Stochastic Approximation (SA) problem by acting in parallel and communicating with a central server. We assume that the up-link transmissions to the server are subject to asynchronous and potentially unbounded time-varying delays. To mitigate the effect of delays and stragglers while reaping the benefits of distributed computation, we propose DASA, a Delay-Adaptive algorithm for multi-agent Stochastic Approximation. We provide a finite-time analysis of DASA assuming that the agents' stochastic observation processes are independent Markov chains. Significantly advancing existing results, DASA is the first algorithm whose convergence rate depends only on the mixing time $\tau_{mix}$ and the average delay $\tau_{avg}$ while jointly achieving an $N$-fold convergence speedup under Markovian sampling. Our work is relevant to various SA applications, including multi-agent and distributed temporal difference (TD) learning, Q-learning, and stochastic optimization with correlated data.	Translated: 2024-08-05 18:13:29 Published: 2024-08-02
# Learning Visual Quadrupedal Loco-Manipulation from Demonstrations ( http://arxiv.org/abs/2403.20328v2 ) License: see link	Zhengmao He, Kun Lei, Yanjie Ze, Koushil Sreenath, Zhongyu Li, Huazhe Xu,	Quadruped robots are progressively being integrated into human environments. Despite the growing locomotion capabilities of quadrupedal robots, their interaction with objects in realistic scenes is still limited. While additional robotic arms on quadrupedal robots enable object manipulation, they are sometimes redundant, given that a quadruped robot is essentially a mobile unit equipped with four limbs, each with 3 degrees of freedom (DoFs). Hence, we aim to enable a quadruped robot to execute real-world manipulation tasks using only its legs. We decompose the loco-manipulation process into a low-level reinforcement learning (RL)-based controller and a high-level Behavior Cloning (BC)-based planner. By parameterizing the manipulation trajectory, we synchronize the efforts of the upper and lower layers, thereby leveraging the advantages of both RL and BC. Our approach is validated through simulations and real-world experiments, demonstrating the robot's ability to perform tasks that demand mobility and high precision, such as lifting a basket from the ground while moving, closing a dishwasher, pressing a button, and pushing a door. Project website: https://zhengmaohe.github.io/leg-manip	Translated: 2024-08-05 18:13:29 Published: 2024-08-02
# Deep Reinforcement Learning for Traveling Purchaser Problems ( http://arxiv.org/abs/2404.02476v3 ) License: see link	Haofeng Yuan, Rongping Zhu, Wanlu Yang, Shiji Song, Keyou You, Yuli Zhang, C. L. Philip Chen,	The traveling purchaser problem (TPP) is an important combinatorial optimization problem with broad applications. Because routing and purchasing are coupled, existing work on TPPs commonly addresses route construction and purchase planning simultaneously. This leads to exact methods with high computational cost and to heuristics with sophisticated designs but limited performance. In contrast, we propose a novel approach based on deep reinforcement learning (DRL) that addresses route construction and purchase planning separately while evaluating and optimizing the solution from a global perspective. The key components of our approach are a bipartite graph representation of TPPs that captures market-product relations, and a policy network that extracts information from the bipartite graph and uses it to construct the route sequentially. One significant benefit of our framework is that the route can be constructed efficiently with the policy network. Once the route is determined, the associated purchasing plan can be derived easily through linear programming, and with DRL we can train the policy network to optimize the global solution objective. Furthermore, by introducing a meta-learning strategy, the policy network can be trained stably on large TPP instances and generalizes well across instances of varying sizes and distributions, even to much larger instances never seen during training. Experiments on various synthetic TPP instances and the TPPLIB benchmark demonstrate that our DRL-based approach can significantly outperform well-established TPP heuristics, reducing the optimality gap by 40%-90% while also showing an advantage in runtime, especially on large instances.	Translated: 2024-08-05 18:13:29 Published: 2024-08-02
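Once the route is fixed, the purchasing step is easy for a reason that can be shown concretely. Consider an LP that minimizes the total purchase cost subject to meeting each product's demand within market supplies. Under this assumed formulation, the LP separates by product, and each product becomes a continuous knapsack. Buying greedily from the cheapest visited market first is then optimal. The sketch below takes that greedy route instead of calling an LP solver. The data layout (`offers[market][product] = (price, supply)`) is hypothetical, and travel cost is left out.

```python
def purchase_plan(route, demand, offers):
    """route: list of visited market ids (entries without offers, e.g. a depot, are skipped).
    demand: {product: quantity}; offers: {market: {product: (price, supply)}}.
    Returns (plan, cost) with plan[(market, product)] = quantity, or None if the
    markets on the route cannot cover the demand."""
    plan, cost = {}, 0.0
    for product, needed in demand.items():
        # Offers for this product at markets the route visits, cheapest first.
        options = sorted(
            (offers[m][product][0], offers[m][product][1], m)
            for m in route if product in offers.get(m, {}))
        for price, supply, market in options:
            if needed <= 0:
                break
            qty = min(supply, needed)
            plan[(market, product)] = qty
            cost += price * qty
            needed -= qty
        if needed > 0:
            return None  # this route is infeasible for the given demand
    return plan, cost

offers = {"A": {"milk": (2.0, 3), "bread": (1.5, 5)},
          "B": {"milk": (1.0, 2)},
          "C": {"bread": (1.0, 1)}}
print(purchase_plan(["depot", "A", "B", "C"], {"milk": 4, "bread": 2}, offers))
```

The TPP objective would add the route's travel cost to this purchase cost.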
# QDsim: A user-friendly toolbox for simulating large-scale quantum dot devices ( http://arxiv.org/abs/2404.02712v2 ) License: see link	Valentina Gualtieri, Charles Renshaw-Whitman, Vinicius Hernandes, Eliska Greplova,	We introduce QDsim, a Python package tailored for the rapid generation of charge stability diagrams in large-scale quantum dot devices, extending beyond traditional double or triple dots. QDsim is founded on the constant interaction model, from which we rephrase the task of finding the lowest-energy charge configuration as a convex optimization problem. We can therefore leverage the existing package CVXPY, combined with an appropriate powerful solver, for the convex optimization, which streamlines the creation of stability diagrams and polytopes. Through multiple examples, we demonstrate how QDsim enables the generation of large-scale datasets that can serve as a basis for training machine-learning models for automated tuning algorithms. While the package currently does not support quantum effects beyond the constant interaction model, QDsim directly addresses the critical need for cost-effective and rapid data acquisition for better tuning algorithms, in order to accelerate the development of semiconductor quantum devices.	Translated: 2024-08-05 18:13:29 Published: 2024-08-02
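In the constant interaction model, the electrostatic energy of a charge configuration $N$ is a quadratic form $E(N) = \tfrac12 (N - Q)^\top C^{-1} (N - Q)$, with $e = 1$ and gate-induced charge $Q = C_g V$. A stability diagram records which integer $N$ minimizes this energy at each gate-voltage point. QDsim solves that minimization as a convex program via CVXPY. The sketch below is not QDsim's API. It brute-forces the integers for a couple of dots, which is enough to make the idea concrete. The capacitance values are illustrative assumptions.

```python
import itertools

def ci_energy(n, induced, c_inv):
    """Constant-interaction energy 0.5 * (n - q)^T C^{-1} (n - q) in units with e = 1.
    n: electrons per dot; induced: q = C_g V; c_inv: inverse dot capacitance matrix."""
    d = [ni - qi for ni, qi in zip(n, induced)]
    return 0.5 * sum(d[i] * c_inv[i][j] * d[j]
                     for i in range(len(d)) for j in range(len(d)))

def ground_state(gate_voltages, c_gate, c_inv, n_max=3):
    """Lowest-energy charge configuration, found by enumerating 0..n_max per dot."""
    num_dots = len(c_inv)
    induced = [sum(c_gate[i][g] * v for g, v in enumerate(gate_voltages))
               for i in range(num_dots)]
    configs = itertools.product(range(n_max + 1), repeat=num_dots)
    return min(configs, key=lambda n: ci_energy(n, induced, c_inv))

def stability_diagram(v1_values, v2_values, c_gate, c_inv, n_max=3):
    """Grid of ground-state configurations; rows follow v2, columns follow v1."""
    return [[ground_state((v1, v2), c_gate, c_inv, n_max) for v1 in v1_values]
            for v2 in v2_values]

c_inv = [[1.0, 0.2], [0.2, 1.0]]   # assumed inverse capacitance (with cross-coupling)
c_gate = [[1.0, 0.0], [0.0, 1.0]]  # assumed: each gate couples only to its own dot
print(ground_state((0.7, 0.2), c_gate, c_inv))
```

Enumeration grows exponentially with the number of dots. That growth is why a convex-optimization formulation matters for the large arrays QDsim targets.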
# Generalization Gap in Data Augmentation: Insights from Illumination ( http://arxiv.org/abs/2404.07514v2 ) License: see link	Jianqiang Xiao, Weiwen Guo, Junfeng Liu, Mengze Li,	In computer vision, data augmentation is widely used to enrich the feature complexity of training datasets for deep learning. However, with respect to model generalization, the difference between artificial features generated by data augmentation and natural visual features has not been fully revealed. This study introduces the concept of "visual representation variables" and defines the possible visual variations in a task as a joint distribution of these variables. We focus on the visual representation variable "illumination", simulating the degradation of its distribution and examining how data augmentation techniques enhance model performance on a classification task. Our goal is to investigate the differences in generalization between models trained with augmented data and those trained under real-world illumination conditions. Results indicate that applying various data augmentation methods significantly improves model performance. Yet a noticeable generalization gap remains, emphasizing the critical role of feature diversity in the training set for enhancing model generalization.	Translated: 2024-08-05 18:13:29 Published: 2024-08-02
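Illumination augmentation of the kind studied here usually perturbs brightness and the tone curve of an image. The sketch below is a generic example of that idea and is not the paper's augmentation pipeline. It applies a random gain and gamma to a single-channel image stored as nested lists of floats in [0, 1]. The parameter ranges are arbitrary assumptions.

```python
import random

def adjust_illumination(image, gain=1.0, gamma=1.0):
    """Apply gamma correction, then scale brightness by `gain`, clipping to [0, 1].
    `image` is an H x W nested list of floats in [0, 1]."""
    return [[min(1.0, max(0.0, gain * (p ** gamma))) for p in row] for row in image]

def random_illumination(image, rng=random, gain_range=(0.5, 1.5), gamma_range=(0.7, 1.4)):
    """Draw a random gain and gamma (assumed ranges) and apply them."""
    gain = rng.uniform(*gain_range)
    gamma = rng.uniform(*gamma_range)
    return adjust_illumination(image, gain, gamma)
```

Such synthetic shifts only span the space covered by the chosen transform family. The abstract's point is that this coverage can still differ from real-world lighting variation.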
# Comprehensive Library of Variational LSE Solvers ( http://arxiv.org/abs/2404.09916v2 ) License: see link	Nico Meyer, Martin Röhn, Jakob Murauer, Axel Plinge, Christopher Mutschler, Daniel D. Scherer,	Linear systems of equations arise in various mathematical domains as well as in machine learning. By employing noisy intermediate-scale quantum devices, variational solvers promise to accelerate finding solutions for large systems. Although there is a wealth of theoretical research on these algorithms, only fragmentary implementations exist. To fill this gap, we have developed the variational-lse-solver framework, which realizes existing approaches from the literature and introduces several enhancements. Its user-friendly interface is designed for researchers who work at the abstraction level of identifying and developing end-to-end applications.	Translated: 2024-08-05 18:13:29 Published: 2024-08-02
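Variational LSE solvers train a parameterized state $|x\rangle$ so that $A|x\rangle$ points along $|b\rangle$. A common choice is the global cost $C_G = 1 - |\langle b|A|x\rangle|^2 / (\langle x|A^\dagger A|x\rangle\langle b|b\rangle)$, which is zero exactly when $A|x\rangle \propto |b\rangle$. On hardware, these overlaps are estimated with circuits. The sketch below evaluates them directly on small classical vectors for illustration only, and it does not show the variational-lse-solver API.

```python
def mat_vec(a, v):
    """Dense matrix-vector product for small lists of (complex) numbers."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]

def inner(u, v):
    """<u|v>, conjugating the bra."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def global_vqls_cost(a, x, b):
    """1 - |<b|A|x>|^2 / (<x|A^dag A|x> <b|b>); zero iff A|x> is parallel to |b>."""
    ax = mat_vec(a, x)
    return 1.0 - abs(inner(b, ax)) ** 2 / (inner(ax, ax).real * inner(b, b).real)

A = [[2.0, 0.0], [0.0, 1.0]]
b = [1.0, 1.0]
print(global_vqls_cost(A, [0.5, 1.0], b))  # x proportional to A^{-1} b
```

The cost ignores the norm and global phase of $|x\rangle$. This suits quantum states, and the classical solution is recovered only up to scale.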
# SPIdepth:自己教師型単眼深度推定のための強化ポーズ情報 SPIdepth: Strengthened Pose Information for Self-supervised Monocular Depth Estimation ( http://arxiv.org/abs/2404.12501v2 ) ライセンス: Link先を確認	Mykola Lavreniuk,	(参考訳) 自律走行とロボット工学への応用について、自己監督された単眼深度推定が注目されている。最近の手法では、Self Query Layer(SQL)のようなテクニックを活用して動きから奥行きを推測する手法が採用されているが、多くの場合、ポーズ情報を強化する可能性を見落としている。本稿では、ポーズネットワークの強化を優先して深度推定を改善する新しいアプローチであるSPIdepthを紹介する。 SQLによって構築された基盤の上に構築されているSPIdepthは、きめ細かいシーン構造をキャプチャする上で、ポーズ情報の重要性を強調している。 SPIdepthは、ポーズネットワークの能力を高めることにより、シーン理解と深さ推定における顕著な進歩を実現する。 KITTI、Cityscapes、Make3Dといったベンチマークデータセットの実験結果は、SPIdepthの最先端のパフォーマンスを示し、従来の手法をかなり上回っている。具体的には、SPIdepthが自己監督型のKITTIベンチマークを上回っている。さらに、SPIdepthは、KITTI上のAbsRel (0.029)、SqRel (0.069)、RMSE (1.394) の最低値を獲得し、新しい最先端の結果を確立する。 Cityscapesでは、SPIdepthはAbsRelの21.7%、SqRelの36.8%、RMSEの16.5%のSQLdepthの改善を示している。 Make3Dでは、ゼロショットのSPIdepthは他のすべてのモデルより優れている。興味深いことに、SPIdepthは推論のために1つの画像のみを使用してこれらの結果を達成し、推論にビデオシーケンスを利用する方法さえ超え、実世界のアプリケーションにおいてその有効性と効率を実証する。本手法は, 実世界におけるシーン理解の促進を目的としたポーズ情報強化の重要性を強調し, 自己教師型単眼深度推定における飛躍的な進歩を示す。 Self-supervised monocular depth estimation has garnered considerable attention for its applications in autonomous driving and robotics. While recent methods have made strides in leveraging techniques like the Self Query Layer (SQL) to infer depth from motion, they often overlook the potential of strengthening pose information. In this paper, we introduce SPIdepth, a novel approach that prioritizes enhancing the pose network for improved depth estimation. Building upon the foundation laid by SQL, SPIdepth emphasizes the importance of pose information in capturing fine-grained scene structures. By enhancing the pose network's capabilities, SPIdepth achieves remarkable advancements in scene understanding and depth estimation. Experimental results on benchmark datasets such as KITTI, Cityscapes, and Make3D showcase SPIdepth's state-of-the-art performance, surpassing previous methods by significant margins. 
Specifically, SPIdepth tops the self-supervised KITTI benchmark. Additionally, SPIdepth achieves the lowest AbsRel (0.029), SqRel (0.069), and RMSE (1.394) on KITTI, establishing new state-of-the-art results. On Cityscapes, SPIdepth shows improvements over SQLdepth of 21.7% in AbsRel, 36.8% in SqRel, and 16.5% in RMSE, even without using motion masks. On Make3D, SPIdepth in zero-shot outperforms all other models. Remarkably, SPIdepth achieves these results using only a single image for inference, surpassing even methods that utilize video sequences for inference, thus demonstrating its efficacy and efficiency in real-world applications. Our approach represents a significant leap forward in self-supervised monocular depth estimation, underscoring the importance of strengthening pose information for advancing scene understanding in real-world applications.	翻訳日:2024-08-05 18:13:29 公開日:2024-08-02
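The error metrics quoted above have standard definitions. AbsRel is the mean of $|d - d^*|/d^*$, SqRel is the mean of $(d - d^*)^2/d^*$, and RMSE is the root of the mean of $(d - d^*)^2$. In each case $d$ is the predicted depth and $d^*$ the ground truth over valid pixels. The sketch below computes them on flat lists. It also shows the relative-reduction arithmetic that percentages such as "21.7% in AbsRel" presumably follow, which is an assumption here. Benchmark-specific protocol details such as depth capping or scale alignment are omitted.

```python
import math

def depth_metrics(pred, gt):
    """AbsRel, SqRel and RMSE over pixels with valid ground truth (gt > 0)."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    return {"AbsRel": abs_rel, "SqRel": sq_rel, "RMSE": rmse}

def relative_improvement(baseline, new):
    """Percentage reduction of an error metric where lower is better."""
    return 100.0 * (baseline - new) / baseline

print(depth_metrics([2.0, 4.0], [2.0, 5.0]))
```
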
# Breaching the Bottleneck: Evolutionary Transition from Reward-Driven Learning to Reward-Agnostic Domain-Adapted Learning in Neuromodulated Neural Nets ( http://arxiv.org/abs/2404.12631v2 ) License: see link	Solvi Arnold, Reiji Suzuki, Takaya Arita, Kimitoshi Yamazaki,	Advanced biological intelligence learns efficiently from an information-rich stream of stimuli, even when feedback on behaviour quality is sparse or absent. Such learning exploits implicit assumptions about task domains. We refer to such learning as Domain-Adapted Learning (DAL). In contrast, AI learning algorithms rely on explicit, externally provided measures of behaviour quality to acquire fit behaviour. This imposes an information bottleneck that precludes learning from diverse non-reward stimulus information, limiting learning efficiency. We consider how biological evolution circumvents this bottleneck to produce DAL. We propose that species first evolve the ability to learn from reward signals, which provides inefficient (bottlenecked) but broad adaptivity. From there, non-reward information can be integrated into the learning process via the gradual accumulation of biases that such information induces on specific task domains. This scenario provides a biologically plausible pathway towards bottleneck-free, domain-adapted learning. Focusing on the second phase of this scenario, we set up a population of NNs whose reward-driven learning is modelled as Reinforcement Learning (A2C), and allow evolution to improve learning efficiency by integrating non-reward information into the learning process through a neuromodulatory update mechanism. On a navigation task in continuous 2D space, evolved DAL agents show a 300-fold increase in learning speed compared to pure RL agents. Evolution is found to eliminate reliance on reward information altogether, allowing DAL agents to learn exclusively from non-reward information using only local, neuromodulation-based connection weight updates. Code is available at github.com/aislab/dal.	Translated: 2024-08-05 18:13:29 Published: 2024-08-02
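The phrase "local neuromodulation-based connection weight updates" describes a family of rules in which a weight change depends only on quantities available at the synapse. These are presynaptic activity, postsynaptic activity, and a modulatory signal. The sketch below shows a generic three-factor rule of that family. It is an illustrative assumption, not the evolved mechanism from this paper, whose exact form is in the linked code. The modulatory signal `modulation` need not be a reward.

```python
def neuromodulated_update(weights, pre, post, modulation, lr=0.01):
    """Generic three-factor rule: dw_ij = lr * m_i * post_i * pre_j.
    weights: one row per postsynaptic unit; pre/post: activations;
    modulation: per-postsynaptic-unit modulatory signal m_i."""
    return [[w + lr * modulation[i] * post[i] * pre[j] for j, w in enumerate(row)]
            for i, row in enumerate(weights)]
```

Setting a unit's modulatory signal to zero freezes its incoming weights. This is one way a modulatory network can gate where and when learning happens.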
# Motion-aware Latent Diffusion Models for Video Frame Interpolation ( http://arxiv.org/abs/2404.13534v3 ) License: see link	Zhilin Huang, Yijie Yu, Ling Yang, Chujun Qin, Bing Zheng, Xiawu Zheng, Zikun Zhou, Yaowei Wang, Wenming Yang,	With the advancement of AIGC, video frame interpolation (VFI) has become a crucial component in existing video generation frameworks, attracting widespread research interest. For the VFI task, motion estimation between neighboring frames plays a crucial role in avoiding motion ambiguity. However, existing VFI methods always struggle to accurately predict the motion information between consecutive frames, and this imprecise estimation leads to blurred and visually incoherent interpolated frames. In this paper, we propose a novel diffusion framework, motion-aware latent diffusion models (MADiff), which is specifically designed for the VFI task. By incorporating motion priors between the conditional neighboring frames with the target interpolated frame predicted throughout the diffusion sampling procedure, MADiff progressively refines the intermediate outcomes, culminating in generating both visually smooth and realistic results. Extensive experiments conducted on benchmark datasets demonstrate that our method achieves state-of-the-art performance, significantly outperforming existing approaches, especially under challenging scenarios involving dynamic textures with complex motion.	Translated: 2024-08-05 18:13:29 Published: 2024-08-02
# Nonlocal order parameter of pair superfluids ( http://arxiv.org/abs/2404.15972v3 ) License: see link	Nitya Cuzzuol, Arianna Montorsi, Luca Barbiero,	Order parameters represent a fundamental resource to characterize quantum matter. We show that pair superfluids can be rigorously defined in terms of a nonlocal order parameter, named odd parity, whose derivation is experimentally accessible via local density measurements. As a case study, we first investigate a constrained Bose-Hubbard model at different densities, in both one and two spatial dimensions. Here, our analysis finds pair superfluidity for relatively strong attractive interactions. The odd parity operator acts as the unique order parameter for such a phase, irrespective of the density of the system and its dimensionality. To reinforce our finding, we also confirm the generality of our approach on a two-component Bose-Hubbard Hamiltonian, whose experimental realization is a timely topic in ultracold atomic systems. Our results shed new light on the role of correlated density fluctuations in pair superfluids. In addition, they provide a powerful tool for the experimental detection of such exotic phases and the characterization of their transition to the atomic superfluid phase.	Translated: 2024-08-05 18:03:40 Published: 2024-08-02
# LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation ( http://arxiv.org/abs/2404.16054v2 ) License: see link	Li Zhang, Shihe Wang, Xianqing Jia, Zhihan Zheng, Yunhe Yan, Longxi Gao, Yuanchun Li, Mengwei Xu,	Emerging large language/multimodal models facilitate the evolution of mobile agents, especially in mobile UI task automation. However, existing evaluation approaches, which rely on human validation or on established datasets to compare agent-predicted actions with predefined action sequences, are unscalable and unfaithful. To overcome these limitations, this paper presents LlamaTouch, a testbed for on-device mobile UI task execution and faithful, scalable task evaluation. By observing that the task execution process only transfers UI states, LlamaTouch employs a novel evaluation approach that only assesses whether an agent traverses all manually annotated, essential application/system states. LlamaTouch comprises three key techniques: (1) On-device task execution that enables mobile agents to interact with realistic mobile environments for task execution. 
(2) Fine-grained UI component annotation that merges pixel-level screenshots and textual screen hierarchies to explicitly identify and precisely annotate essential UI components with a rich set of designed annotation primitives. (3) A multi-level application state matching algorithm that utilizes exact and fuzzy matching to accurately detect critical information in each screen, even with unpredictable UI layout/content dynamics. LlamaTouch currently incorporates four mobile agents and 496 tasks, encompassing both tasks from widely used datasets and self-constructed ones that cover more diverse mobile applications. Evaluation results demonstrate the high evaluation faithfulness of LlamaTouch in real-world mobile environments and its better scalability than human validation. LlamaTouch also enables easy task annotation and integration of new mobile agents. Code and dataset are publicly available at https://github.com/LlamaTouch/LlamaTouch.	Translated: 2024-08-05 18:03:40 Published: 2024-08-02
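The evaluation idea is to check whether an agent's trajectory passes through every annotated essential state, using exact or fuzzy matching on critical screen content. The sketch below illustrates it under a simplifying assumption: each screen is a plain dict mapping a component id to its text. LlamaTouch itself works on screenshots and view hierarchies with a richer set of annotation primitives. The function names and the 0.8 fuzzy threshold are hypothetical.

```python
from difflib import SequenceMatcher


def component_matches(screen, comp_id, expected, mode="exact", threshold=0.8):
    """Check one annotated UI component on a screen (dict: component id -> text)."""
    text = screen.get(comp_id)
    if text is None:
        return False  # the essential component is not on this screen
    if mode == "exact":
        return text == expected
    # Fuzzy mode tolerates small textual differences, e.g. "Wi-Fi" vs "WiFi".
    return SequenceMatcher(None, text.lower(), expected.lower()).ratio() >= threshold


def state_reached(screen, essential_state):
    """An essential state is a list of (component_id, expected_text, mode) tuples."""
    return all(component_matches(screen, cid, exp, mode) for cid, exp, mode in essential_state)


def trace_covers(trace, essential_states):
    """True if the screens in `trace` visit every essential state, in order."""
    idx = 0
    for screen in trace:
        if idx < len(essential_states) and state_reached(screen, essential_states[idx]):
            idx += 1
    return idx == len(essential_states)


essential = [
    [("title", "Settings", "exact")],
    [("title", "WiFi", "fuzzy"), ("toggle", "On", "exact")],
]
trace_ok = [{"title": "Settings"}, {"title": "Wi-Fi", "toggle": "On"}]
trace_bad = [{"title": "Settings"}, {"title": "Wi-Fi", "toggle": "Off"}]
```

Matching only essential states, rather than a fixed action sequence, means any path that reaches the annotated states counts as a success.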
# Canonical Decision Diagrams Modulo Theories ( http://arxiv.org/abs/2404.16455v3 ) License: see link	Massimo Michelutti, Gabriele Masina, Giuseppe Spallitta, Roberto Sebastiani,	Decision diagrams (DDs) are powerful tools to effectively represent propositional formulas and are widely used in many domains, in particular in formal verification and in knowledge compilation. Some forms of DDs (e.g., OBDDs, SDDs) are canonical, that is, (under given conditions on the atom list) they univocally represent equivalence classes of formulas. Given the limited expressiveness of propositional logic, a few attempts to leverage DDs to SMT level have been presented in the literature. Unfortunately, these techniques still suffer from some limitations: most procedures are theory-specific; some produce theory DDs (T-DDs) which do not univocally represent T-valid formulas or T-inconsistent formulas; none of these techniques provably produces theory-canonical T-DDs, which (under given conditions on the T-atom list) univocally represent T-equivalence classes of formulas. Also, these procedures are not easy to implement, and very few implementations are actually available. 
In this paper, we present a novel, very general technique to leverage DDs to SMT level, which has several advantages: it is very easy to implement on top of an AllSMT solver and a DD package, which are used as black boxes; it works for every form of DDs and every theory, or combination thereof, supported by the AllSMT solver; it produces theory-canonical T-DDs if the propositional DD is canonical. We have implemented a prototype tool for both T-OBDDs and T-SDDs on top of OBDD and SDD packages and the MathSAT SMT solver. Some preliminary empirical evaluation supports the effectiveness of the approach.	Translated: 2024-08-05 18:03:40 Published: 2024-08-02
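Theory-canonicity can be illustrated with a toy example. Enumerate only the T-consistent total assignments that satisfy a formula, which is what an AllSMT solver provides. Two T-equivalent formulas then yield the same model set, even when their propositional truth tables differ.

The sketch assumes a hypothetical one-variable integer theory, checked by brute force over a small domain. A `frozenset` of models stands in for the canonical propositional DD. It is not the paper's procedure, which feeds AllSMT output into a real OBDD/SDD package without enumerating every assignment.

```python
from itertools import product

# Hypothetical toy theory: atoms over one integer variable x, decided by brute
# force on a small domain (a stand-in for a real SMT theory solver).
ATOMS = {"x<1": lambda x: x < 1, "x<2": lambda x: x < 2}
DOMAIN = range(-5, 6)


def t_consistent(assignment):
    """Is there some x in DOMAIN that gives every atom its assigned truth value?"""
    return any(all(ATOMS[a](x) == v for a, v in assignment.items()) for x in DOMAIN)


def all_smt(formula, atom_list):
    """All T-consistent total assignments satisfying `formula`, as a canonical set.

    With a fixed atom order this set is unique for each T-equivalence class;
    a DD package would compile it into a canonical diagram instead.
    """
    models = set()
    for values in product([False, True], repeat=len(atom_list)):
        assignment = dict(zip(atom_list, values))
        if formula(assignment) and t_consistent(assignment):
            models.add(values)
    return frozenset(models)


atom_list = ["x<1", "x<2"]
f1 = lambda a: a["x<1"]                 # x < 1
f2 = lambda a: a["x<1"] and a["x<2"]    # x < 1 and x < 2, T-equivalent to f1
```

Propositionally, `f1` also accepts `x<1 = True, x<2 = False`. That assignment is T-inconsistent, so both formulas collapse to the same representation.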
# A Comprehensive Evaluation on Event Reasoning of Large Language Models ( http://arxiv.org/abs/2404.17513v2 ) License: see link	Zhengwei Tao, Zhi Jin, Yifan Zhang, Xiancai Chen, Haiyan Zhao, Jia Li, Bing Liang, Chongyang Tao, Qun Liu, Kam-Fai Wong,	Event reasoning is a fundamental ability that underlies many applications. It requires event schema knowledge to perform global reasoning and needs to deal with the diversity of inter-event relations and reasoning paradigms. How well LLMs accomplish event reasoning on various relations and reasoning paradigms remains unknown. To address this gap, we comprehensively evaluate the event reasoning abilities of LLMs. We introduce a novel benchmark, EV2, for EValuation of EVent reasoning. EV2 consists of two levels of evaluation, schema and instance, and is comprehensive in relations and reasoning paradigms. We conduct extensive experiments on EV2. We find that LLMs have the ability to accomplish event reasoning, but their performance is far from satisfactory. We also notice an imbalance in the event reasoning abilities of LLMs. Besides, LLMs have event schema knowledge; however, they are not aligned with humans on how to utilize that knowledge. Based on these findings, we guide the LLMs in utilizing the event schema knowledge as memory, leading to improvements in event reasoning.	Translated: 2024-08-05 18:03:40 Published: 2024-08-02
# MiPa: Mixed Patch Infrared-Visible Modality Agnostic Object Detection ( http://arxiv.org/abs/2404.18849v2 ) License: see link	Heitor R. Medeiros, David Latortue, Eric Granger, Marco Pedersoli,	In real-world scenarios, using multiple modalities like visible (RGB) and infrared (IR) can greatly improve the performance of a predictive task such as object detection (OD). Multimodal learning is a common way to leverage these modalities, where multiple modality-specific encoders and a fusion module are used to improve performance. In this paper, we tackle a different way to employ RGB and IR modalities, where only one modality or the other is observed by a single shared vision encoder. This realistic setting requires a lower memory footprint and is more suitable for applications such as autonomous driving and surveillance, which commonly rely on RGB and IR data. However, when learning a single encoder on multiple modalities, one modality can dominate the other, producing uneven recognition results. This work investigates how to efficiently leverage RGB and IR modalities to train a common transformer-based OD vision encoder, while countering the effects of modality imbalance. 
For this, we introduce a novel training technique to Mix Patches (MiPa) from the two modalities, in conjunction with a patch-wise modality agnostic module, for learning a common representation of both modalities. Our experiments show that MiPa can learn a representation to reach competitive results on traditional RGB/IR benchmarks while only requiring a single modality during inference. Our code is available at: https://github.com/heitorrapela/MiPa.	Translated: 2024-08-05 18:03:40 Published: 2024-08-02
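Mixing patches means that one training image is assembled from patches taken from either the RGB or the IR view of the same scene. The sketch below shows the general idea on single-channel grids, with an independent per-patch coin flip. The sampling scheme, the probability `p_ir`, and the function name are illustrative assumptions. MiPa's actual mixing schedule and its modality-agnostic module are not reproduced here.

```python
import random


def mix_patches(rgb, ir, patch, p_ir=0.5, seed=None):
    """Assemble one image whose patches come from either the RGB or the IR image.

    rgb and ir are H x W grids (lists of lists) of the same, aligned scene;
    `patch` must divide both H and W. Returns the mixed grid and a per-patch
    modality map (0 = RGB, 1 = IR), e.g. for a patch-wise auxiliary loss.
    """
    rng = random.Random(seed)
    h, w = len(rgb), len(rgb[0])
    if len(ir) != h or len(ir[0]) != w or h % patch or w % patch:
        raise ValueError("images must be aligned and divisible by the patch size")
    modality = [[int(rng.random() < p_ir) for _ in range(w // patch)]
                for _ in range(h // patch)]
    mixed = [[(ir if modality[i // patch][j // patch] else rgb)[i][j] for j in range(w)]
             for i in range(h)]
    return mixed, modality


rgb = [[0] * 4 for _ in range(4)]   # all-zero "RGB" image
ir = [[1] * 4 for _ in range(4)]    # all-one "IR" image
mixed, modality = mix_patches(rgb, ir, patch=2, seed=0)
```

Because each pixel value here encodes its source, every pixel in `mixed` equals the modality flag of its patch. At inference time a single modality is fed in unmixed, matching the single-modality setting described in the abstract.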
# Detecting single photons is not always necessary to evidence interference of photon probability amplitudes ( http://arxiv.org/abs/2405.01050v4 ) License: see link	Eric Lantz, Fabrice Devaux, Serge Massar,	Subtracting accidental coincidences is a common practice in quantum optics experiments. For zero-mean Gaussian states, such as squeezed vacuum, we show that if one removes accidental coincidences, the measurement results are quantitatively the same, both for photon coincidences at very low flux and for intensity covariances. Consequently, pure quantum effects at the photon level, like interference of photon wave functions or photon bunching, are reproduced in the correlation of fluctuations of macroscopic beams produced by spontaneous down-conversion. This holds both in experiments, if the detection resolution is smaller than the coherence cell (the size of the mode), and in stochastic simulations based on sampling the Wigner function. We discuss the limitations of this correspondence, such as Bell inequalities (for which one cannot subtract accidental coincidences), highly multimode situations such as quantum imaging, and higher-order correlations.	Translated: 2024-08-05 18:03:40 Published: 2024-08-02
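Subtracting accidental coincidences from the raw coincidence rate gives ⟨n₁n₂⟩ − ⟨n₁⟩⟨n₂⟩. This has the same form as the intensity covariance, which is why the abstract can compare photon-counting and macroscopic-beam measurements. The sketch below computes this estimator from per-frame detector readings. The readings can be photon counts at low flux or integrated intensities for bright beams. It is an illustration only: detector resolution, mode structure and normalization are ignored, and the function name is hypothetical.

```python
def covariance_after_accidentals(n1, n2):
    """Coincidences minus accidentals, <n1 n2> - <n1><n2>, averaged over frames.

    n1[k] and n2[k] are the readings of the two detectors in frame k (photon
    counts or intensities). The product of the means is the accidental rate
    expected for statistically independent detectors.
    """
    if len(n1) != len(n2) or not n1:
        raise ValueError("need the same, non-zero number of frames on both detectors")
    frames = len(n1)
    coincidences = sum(a * b for a, b in zip(n1, n2)) / frames
    accidentals = (sum(n1) / frames) * (sum(n2) / frames)
    return coincidences - accidentals


correlated = covariance_after_accidentals([1, 0, 1, 0], [1, 0, 1, 0])
anticorrelated = covariance_after_accidentals([1, 0, 1, 0], [0, 1, 0, 1])
independent = covariance_after_accidentals([1, 1, 0, 0], [1, 0, 1, 0])
```

Uncorrelated readings give zero after subtraction. Any remaining positive or negative value reflects genuine correlations between the two arms.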
# Don't Waste Your Time: Early Stopping Cross-Validation ( http://arxiv.org/abs/2405.03389v2 ) License: see link	Edward Bergman, Lennart Purucker, Frank Hutter,	State-of-the-art automated machine learning systems for tabular data often employ cross-validation, ensuring that measured performances generalize to unseen data, or that subsequent ensembling does not overfit. However, using k-fold cross-validation instead of holdout validation drastically increases the computational cost of validating a single configuration. While ensuring better generalization and, by extension, better performance, the additional cost is often prohibitive for effective model selection within a time budget. We aim to make model selection with cross-validation more effective. Therefore, we study early stopping the process of cross-validation during model selection. We investigate the impact of early stopping on random search for two algorithms, MLP and random forest, across 36 classification datasets. We further analyze the impact of the number of folds by considering 3-, 5-, and 10-folds. In addition, we investigate the impact of early stopping with Bayesian optimization instead of random search and also repeated cross-validation. 
Our exploratory study shows that even a simple-to-understand and easy-to-implement method consistently allows model selection to converge faster; in ~94% of all datasets, on average by ~214%. Moreover, stopping cross-validation early enables model selection to explore the search space more exhaustively by considering +167% configurations on average within one hour, while also obtaining better overall performance.	Translated: 2024-08-05 18:03:40 Published: 2024-08-02
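The abstract reports that even simple early-stopping rules speed up cross-validated model selection. Below is one such rule as a sketch. The remaining folds of a configuration are skipped as soon as the mean of its completed folds drops below the best full cross-validation mean seen so far. The rule is an assumption chosen for illustration and is not necessarily the paper's exact criterion. `fold_score` is a hypothetical callable that trains on all folds but one and scores the held-out fold.

```python
from statistics import mean


def select_with_early_stopped_cv(configs, fold_score, n_folds=5):
    """Model selection over `configs` with early-stopped k-fold cross-validation.

    fold_score(config, fold) -> validation score on that fold (higher is better).
    Returns (best_config, best_mean, number_of_fold_evaluations_spent).
    """
    best_config, best_mean = None, float("-inf")
    evaluated_folds = 0
    for config in configs:
        scores = []
        for fold in range(n_folds):
            scores.append(fold_score(config, fold))
            evaluated_folds += 1
            if mean(scores) < best_mean:
                break  # unlikely to beat the incumbent: skip the remaining folds
        else:
            # All folds completed, so this configuration is a valid candidate.
            if mean(scores) > best_mean:
                best_config, best_mean = config, mean(scores)
    return best_config, best_mean, evaluated_folds


# Toy per-fold scores standing in for real training runs.
table = {"a": [0.8, 0.8, 0.8], "b": [0.5, 0.9, 0.9], "c": [0.9, 0.85, 0.9]}
best, best_score, spent = select_with_early_stopped_cv(
    ["a", "b", "c"], lambda c, f: table[c][f], n_folds=3)
```

Here configuration `b` is abandoned after one fold, so 7 fold evaluations are spent instead of 9. The same budget can then cover more configurations, which is the effect the abstract quantifies. Note that this rule may drop a configuration whose later folds would have recovered, such as `b`.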
# The MovieLens Beliefs Dataset: Collecting Pre-Choice Data for Online Recommender Systems ( http://arxiv.org/abs/2405.11053v3 ) License: see link	Guy Aridor, Duarte Goncalves, Ruoyan Kong, Daniel Kluver, Joseph Konstan,	An increasingly important aspect of designing recommender systems involves considering how recommendations will influence consumer choices. This paper addresses this issue by introducing a method for collecting user beliefs about un-experienced items, a critical predictor of choice behavior. We implemented this method on the MovieLens platform, resulting in a rich dataset that combines user ratings, beliefs, and observed recommendations. We document challenges to such data collection, including selection bias in response and limited coverage of the product space. This unique resource empowers researchers to delve deeper into user behavior and analyze user choices absent recommendations, measure the effectiveness of recommendations, and prototype algorithms that leverage user belief data, ultimately leading to more impactful recommender systems. The dataset can be found at https://grouplens.org/datasets/movielens/ml_belief_2024/.	Translated: 2024-08-05 18:03:40 Published: 2024-08-02
# Identifying Bottlenecks of NISQ-friendly HHL algorithms ( http://arxiv.org/abs/2406.06288v2 ) License: see link	Marc Andreu Marfany, Alona Sakhnenko, Jeanette Miriam Lorenz,	Quantum computing promises to enable solving large problem instances, e.g. large linear equation systems with the HHL algorithm, once the hardware stack matures. For the foreseeable future, quantum computing will remain in the so-called NISQ era, in which algorithms need to account for hardware flaws such as noise. In this work, we perform an empirical study to test the scaling properties and directly related noise resilience of the most resource-intensive component of the HHL algorithm, namely QPE, and its NISQ adaptation, Iterative QPE. We explore the effectiveness of noise mitigation techniques for these algorithms and investigate whether we can keep the gate number low by enforcing sparsity constraints on the input or by using circuit optimization techniques provided by the Qiskit package. Our results indicate that currently available noise mitigation techniques, such as the Qiskit readout and Mthree readout packages, are insufficient for enabling results recovery even in the small instances tested here. 
Moreover, our results indicate that the scaling of these algorithms with increasing precision seems to be the most substantial obstacle. These insights allowed us to deduce an approximate bottleneck for algorithms that consider a similar time evolution as QPE. Such observations provide evidence of weaknesses of such algorithms on NISQ devices and help us formulate meaningful future research directions.	Translated: 2024-08-05 18:03:40 Published: 2024-08-02
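To make "Iterative QPE" concrete, the sketch below classically simulates the textbook, noiseless version for an eigenphase φ ∈ [0, 1). One ancilla is reused, and the bits of φ are read from least to most significant. Each round applies controlled-U^(2^(k−1)), then a feedback rotation built from the bits already measured, then a measurement. The only noise modelled is a hypothetical symmetric readout flip with probability `flip_prob`. This is not the paper's Qiskit setup or noise model.

```python
import math
import random


def iterative_qpe(phi, m, flip_prob=0.0, seed=None):
    """Classical simulation of iterative phase estimation with m bits of precision.

    Round k (k = m, ..., 1) gives the ancilla the phase 2*pi*2**(k-1)*phi from
    controlled-U^(2^(k-1)), corrects it with the feedback angle omega_k computed
    from bits phi_{k+1}..phi_m, and measures after a Hadamard.
    """
    rng = random.Random(seed)
    bits = [0] * (m + 1)  # bits[j] stores phi_j (1-indexed); bits[0] is unused
    for k in range(m, 0, -1):
        omega = -2 * math.pi * sum(bits[j] / 2 ** (j - k + 1) for j in range(k + 1, m + 1))
        theta = 2 * math.pi * (2 ** (k - 1)) * phi + omega
        p0 = math.cos(theta / 2) ** 2          # probability of measuring 0
        outcome = 0 if p0 >= 0.5 else 1        # most likely outcome; exact for m-bit phases
        if rng.random() < flip_prob:           # hypothetical readout error
            outcome ^= 1
        bits[k] = outcome
    return sum(bits[j] / 2 ** j for j in range(1, m + 1))
```

For φ = 0.625 = 0.101₂ and m = 3, the rounds return bits 1, 0, 1, which recovers the phase exactly. A readout error in an early round also corrupts every later feedback angle. If U^(2^(k−1)) is implemented by repeated application of U, each extra bit of precision doubles the controlled-evolution depth, which is consistent with precision scaling being the main obstacle.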
# Dissipationless topological quantum computation for Majorana objects in sparse-dense mixed encoding process ( http://arxiv.org/abs/2407.11544v2 ) License: see link	Ye-Min Zhan, Guan-Dong Mao, Yu-Ge Chen, Yue Yu, Xi Luo,	Topological quantum computation based on Majorana objects is subject to a significant challenge because at least some of the two-qubit quantum gates rely on the fermion (either charge or spin) parity of the qubits. This dependency renders the quantum operations involving these gates probabilistic when attempting to advance quantum processes within the quantum circuit model. Such an approach leads to significant information loss whenever measurements yield the undesired fermion parity. To resolve the problem of wasting information, we devise topological operations that allow for the non-dissipative correction of information from undesired fermion parity to the desired one. We will use the sparse-dense mixed encoding process for the controlled-NOT gate as an example to explain how corrections can be implemented without affecting the quantum information carried by the computational qubits. This correction process can be applied to either the undesired input qubits or the fermion parity-dependent quantum gates, and it works for both Majorana-zero-mode-based and Majorana-edge-mode-based topological quantum computation.	Translated: 2024-08-05 18:03:40 Published: 2024-08-02
# A Sufficient Criterion for Divisibility of Quantum Channels ( http://arxiv.org/abs/2407.17103v2 ) License: see link	Frederik vom Ende,	We present a simple, dimension-independent criterion which guarantees that some quantum channel $\Phi$ is divisible, i.e. that there exists a non-trivial factorization $\Phi=\Phi_1\Phi_2$. The idea is to first define an "elementary" channel $\Phi_2$ and then to analyze when $\Phi\Phi_2^{-1}$ is completely positive. The sufficient criterion obtained this way, which even yields an explicit factorization of $\Phi$, is that one has to find orthogonal unit vectors $x,x^\perp$ such that $\langle x^\perp|\mathcal K_\Phi\mathcal K_\Phi^\perp|x\rangle=\langle x|\mathcal K_\Phi\mathcal K_\Phi^\perp|x\rangle=\{0\}$, where $\mathcal K_\Phi$ is the Kraus subspace of $\Phi$ and $\mathcal K_\Phi^\perp$ is its orthogonal complement. Of course, using linearity this criterion can be reduced to finitely many equalities. Generically, this division even lowers the Kraus rank, which is why repeated application (if possible) results in a factorization of $\Phi$ into channels that are "simple" in some sense. Finally, note that our techniques are not limited to the particular elementary channel we chose.	Translated: 2024-08-05 18:03:40 Published: 2024-08-02
# The Sociotechnical Stack: Opportunities for Social Computing Research in Non-consensual Intimate Media ( http://arxiv.org/abs/2405.03585v2 ) License: see link	Li Qiwei, Allison McDonald, Oliver L. Haimson, Sarita Schoenebeck, Eric Gilbert,	Non-consensual intimate media (NCIM) involves sharing intimate content without the depicted person's consent, including "revenge porn" and sexually explicit deepfakes. While NCIM has received attention in legal, psychological, and communication fields over the past decade, it is not sufficiently addressed in computing scholarship. This paper addresses this gap by linking NCIM harms to the specific technological components that facilitate them. We introduce the sociotechnical stack, a conceptual framework designed to map the technical stack to its corresponding social impacts. The sociotechnical stack allows us to analyze sociotechnical problems like NCIM, and points toward opportunities for computing research. We propose a research roadmap for computing and social computing communities to deter NCIM perpetration and support victim-survivors through building and rebuilding technologies.	Translated: 2024-08-05 17:53:28 Published: 2024-08-02
# SemiCD-VL: Visual-Language Model Guidance Makes Better Semi-supervised Change Detector ( http://arxiv.org/abs/2405.04788v3 ) License: see link	Kaiyu Li, Xiangyong Cao, Yupeng Deng, Junmin Liu, Deyu Meng, Zhi Wang,	Change Detection (CD) aims to identify pixels with semantic changes between images. However, annotating massive numbers of pixel-level images is labor-intensive and costly, especially for multi-temporal images, which require pixel-wise comparisons by human experts. Considering the excellent performance of visual language models (VLMs) in zero-shot, open-vocabulary, and similar settings with prompt-based reasoning, it is promising to utilize VLMs for better CD under limited labeled data. In this paper, we propose a VLM guidance-based semi-supervised CD method, namely SemiCD-VL. The insight of SemiCD-VL is to synthesize free change labels using VLMs to provide additional supervision signals for unlabeled data. However, almost all current VLMs are designed for single-temporal images and cannot be directly applied to bi- or multi-temporal images. 
Motivated by this, we first propose a VLM-based mixed change event generation (CEG) strategy to yield pseudo labels for unlabeled CD data. Since the additional supervision signals provided by these VLM-driven pseudo labels may conflict with the pseudo labels from the consistency regularization paradigm (e.g. FixMatch), we propose a dual projection head for disentangling the different signal sources. Further, we explicitly decouple the semantic representations of the bi-temporal images through two auxiliary segmentation decoders, which are also guided by the VLM. Finally, to make the model capture change representations more adequately, we introduce metric-aware supervision via a feature-level contrastive loss in the auxiliary branches. Extensive experiments show the advantage of SemiCD-VL. For instance, SemiCD-VL improves the FixMatch baseline by +5.3 IoU on WHU-CD and by +2.4 IoU on LEVIR-CD with 5% labels. In addition, our CEG strategy, in an unsupervised manner, can achieve performance far superior to state-of-the-art unsupervised CD methods.	Translated: 2024-08-05 17:53:28 Published: 2024-08-02
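The core of deriving change pseudo labels from a single-temporal VLM is to segment each date separately and then compare the two maps. The sketch below shows a simplified version of that step. The per-pixel class maps and confidences are hypothetical outputs of some open-vocabulary segmenter. The rules are assumptions, not the paper's CEG strategy: a pixel counts as changed when its class differs between dates and one of the classes is of interest, and low-confidence pixels are marked as ignored.

```python
IGNORE = 255  # pixels excluded from the unsupervised loss


def change_pseudo_label(seg_t1, seg_t2, classes, conf_t1, conf_t2, tau=0.5):
    """Binary change pseudo label from two per-date segmentation maps.

    seg_t*: H x W grids of class ids from a (hypothetical) VLM segmenter;
    conf_t*: matching per-pixel confidences; classes: ids whose appearance or
    disappearance counts as a change (e.g. {building_id}).
    """
    label = []
    for r1, r2, c1, c2 in zip(seg_t1, seg_t2, conf_t1, conf_t2):
        row = []
        for a, b, ca, cb in zip(r1, r2, c1, c2):
            if min(ca, cb) < tau:
                row.append(IGNORE)           # too uncertain to supervise
            elif a != b and (a in classes or b in classes):
                row.append(1)                # class of interest appeared/vanished
            else:
                row.append(0)
        label.append(row)
    return label


pseudo = change_pseudo_label(
    seg_t1=[[0, 1], [1, 2]], seg_t2=[[1, 1], [2, 2]], classes={1},
    conf_t1=[[0.9, 0.9], [0.9, 0.9]], conf_t2=[[0.9, 0.9], [0.9, 0.2]])
```

Labels built this way can disagree with FixMatch-style pseudo labels on the same pixels. That conflict is the one the dual projection head is meant to keep apart.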
# A Novel Method for News Article Event-Based Embedding ( http://arxiv.org/abs/2405.13071v2 ) License: see link	Koren Ishlach, Itzhak Ben-David, Michael Fire, Lior Rokach,	Embedding news articles is a crucial tool for multiple fields, such as media bias detection, identifying fake news, and making news recommendations. However, existing news embedding methods are not optimized to capture the latent context of news events. Most embedding methods rely on full-text information and neglect time-relevant embedding generation. In this paper, we propose a novel lightweight method that optimizes news embedding generation by focusing on entities and themes mentioned in articles and their historical connections to specific events. We suggest a method composed of three stages. First, we process and extract events, entities, and themes from the given news articles. Second, we generate periodic time embeddings for themes and entities by training time-separated GloVe models on current and historical data. Lastly, we concatenate the news embeddings generated by two distinct approaches: Smooth Inverse Frequency (SIF) for article-level vectors and Siamese Neural Networks for embeddings with nuanced event-related information. 
We leveraged over 850,000 news articles and 1,000,000 events from the GDELT project to test and evaluate our method. We conducted a comparative analysis of different news embedding generation methods for validation. Our experiments demonstrate that our approach can both improve and outperform state-of-the-art methods on shared event detection tasks.	Translated: 2024-08-05 17:53:28 Published: 2024-08-02
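Smooth Inverse Frequency (SIF) builds article-level vectors in two steps. First, it averages word vectors with weights a / (a + p(w)), so that rare words count more. Second, it removes each vector's projection on the first singular vector of the embedding matrix, the "common component". The pure-Python sketch below follows that recipe. The token lists, word vectors and frequencies are hypothetical inputs, and how the paper selects tokens or combines SIF with its time-separated GloVe models is not shown.

```python
def sif_embeddings(sentences, word_vectors, word_freq, a=1e-3, power_iters=50):
    """SIF embeddings: frequency-weighted averages with the common component removed.

    sentences: lists of tokens; word_vectors: token -> list of floats;
    word_freq: token -> corpus count, used to estimate p(w).
    """
    total = sum(word_freq.values())
    dim = len(next(iter(word_vectors.values())))
    embs = []
    for tokens in sentences:
        known = [t for t in tokens if t in word_vectors]
        v = [0.0] * dim
        for t in known:
            weight = a / (a + word_freq.get(t, 0) / total)  # rare words weigh more
            for d in range(dim):
                v[d] += weight * word_vectors[t][d]
        embs.append([x / len(known) for x in v] if known else v)

    # Power iteration on X^T X approximates the first right singular vector u.
    u = [1.0] * dim
    for _ in range(power_iters):
        xu = [sum(e[d] * u[d] for d in range(dim)) for e in embs]
        u = [sum(xu[i] * embs[i][d] for i in range(len(embs))) for d in range(dim)]
        norm = sum(x * x for x in u) ** 0.5
        if norm == 0.0:
            return embs  # nothing to remove (e.g. all-zero embeddings)
        u = [x / norm for x in u]

    result = []
    for e in embs:
        proj = sum(e[d] * u[d] for d in range(dim))
        result.append([e[d] - proj * u[d] for d in range(dim)])
    return result


# Two articles with orthogonal word vectors: the larger one defines the common direction.
out = sif_embeddings([["a"], ["b"]], {"a": [1.0, 0.0], "b": [0.0, 2.0]}, {"a": 1, "b": 1})
```

Removing the common component strips the direction that all articles share, which otherwise dominates cosine similarities between documents.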
# 多変量時系列に対するマルチフィジックス情報ニューラルネットワークのためのボンドグラフ Bond Graphs for multi-physics informed Neural Networks for multi-variate time series ( http://arxiv.org/abs/2405.13586v2 ) ライセンス: Link先を確認	Alexis-Raja Brachet, Pierre-Yves Richard, Céline Hudelot,	(参考訳) ハイブリッド人工知能技術のトレンドの中で、物理情報機械学習(Physics-Informed Machine Learning)はますます関心を集めている。主に、シミュレーションデータ、偏微分方程式、あるいは同変性・不変性を用いて、データ、学習、アーキテクチャのバイアスを課すことで機能する。流体力学のような単一の物理領域に関わるタスクでは大きな成功を収めてきたが、既存の手法は複雑なマルチフィジックス・マルチドメイン現象を伴うタスクには適応していない。また、主にエンドツーエンドの学習方式として定式化されている。これらの課題に対処するために、マルチフィジックスのモデリング手法であるボンドグラフを、メッセージパッシンググラフニューラルネットワークとともに活用することを提案する。任意のタスク固有モデルに入力可能なマルチフィジックス情報表現を生成するニューラルボンドグラフエンコーダ(NBgE)を提案する。これは、データバイアスとアーキテクチャバイアスの両方をディープラーニングに統合する統一的な方法を提供する。直流モータと呼吸器系という2つの挑戦的なマルチドメイン物理システムに関する実験により,多変量時系列予測タスクにおける本アプローチの有効性を実証した。 In the trend of hybrid Artificial Intelligence techniques, Physical-Informed Machine Learning has seen a growing interest. It operates mainly by imposing data, learning, or architecture bias with simulation data, Partial Differential Equations, or equivariance and invariance properties. While it has shown great success on tasks involving one physical domain, such as fluid dynamics, existing methods are not adapted to tasks with complex multi-physical and multi-domain phenomena. In addition, it is mainly formulated as an end-to-end learning scheme. To address these challenges, we propose to leverage Bond Graphs, a multi-physics modeling approach, together with Message Passing Graph Neural Networks. We propose a Neural Bond graph Encoder (NBgE) producing multi-physics-informed representations that can be fed into any task-specific model. It provides a unified way to integrate both data and architecture biases in deep learning. Our experiments on two challenging multi-domain physical systems - a Direct Current Motor and the Respiratory System - demonstrate the effectiveness of our approach on a multivariate time-series forecasting task.	翻訳日:2024-08-05 17:53:28 公開日:2024-08-02
# 大規模言語モデルを用いた情報量の多いテキスト評価の引き出し Eliciting Informative Text Evaluations with Large Language Models ( http://arxiv.org/abs/2405.15077v3 ) ライセンス: Link先を確認	Yuxuan Lu, Shengwei Xu, Yichi Zhang, Yuqing Kong, Grant Schoenebeck,	(参考訳) ピア予測メカニズムは、証明可能な保証のもとで質の高いフィードバックを動機付ける。しかし、現在の手法は、多肢選択やスカラー値のような比較的単純なレポートにしか適用できない。我々は,近年の大規模言語モデルの発展を活用して,これらの手法をテキストベースのレポートというより広い領域に拡張することを目指す。ピアレビュー、eコマースの顧客レビュー、ソーシャルメディアへのコメントなど、多様なフィードバックチャネルではテキストフィードバックが標準であるため、これによりピア予測メカニズムの適用範囲は大幅に広がる。本稿では,GPPM(Generative Peer Prediction Mechanism)とGSPPM(Generative Synopsis Peer Prediction Mechanism)の2つのメカニズムを紹介する。これらのメカニズムはLLMを予測器として利用し、あるエージェントのレポートから仲間のレポートの予測へと写像する。理論的には、LLMの予測が十分に正確であれば、我々のメカニズムは(近似)ベイズナッシュ均衡として高い努力と正直な報告を動機付けられることを示す。実証的には,Yelp レビューデータセットと ICLR OpenReview データセットという2つの実データセットでの実験を通じて,我々のメカニズムの有効性を確認した。特にICLRデータセットでは、人間が書いたレビュー、GPT-4生成レビュー、GPT-3.5生成レビューという3つの品質レベルを期待スコアによって区別できるという結果を強調する。さらに、GSPPMはGPPMよりも効果的にLLM生成レビューにペナルティを与える。 Peer prediction mechanisms motivate high-quality feedback with provable guarantees. However, current methods only apply to rather simple reports, like multiple-choice or scalar numbers. We aim to broaden these techniques to the larger domain of text-based reports, drawing on the recent developments in large language models. This vastly increases the applicability of peer prediction mechanisms as textual feedback is the norm in a large variety of feedback channels: peer reviews, e-commerce customer reviews, and comments on social media. We introduce two mechanisms, the Generative Peer Prediction Mechanism (GPPM) and the Generative Synopsis Peer Prediction Mechanism (GSPPM). These mechanisms utilize LLMs as predictors, mapping from one agent's report to a prediction of her peer's report. Theoretically, we show that when the LLM prediction is sufficiently accurate, our mechanisms can incentivize high effort and truth-telling as an (approximate) Bayesian Nash equilibrium. 
Empirically, we confirm the efficacy of our mechanisms through experiments conducted on two real datasets: the Yelp review dataset and the ICLR OpenReview dataset. We highlight the results that on the ICLR dataset, our mechanisms can differentiate three quality levels -- human-written reviews, GPT-4-generated reviews, and GPT-3.5-generated reviews in terms of expected scores. Additionally, GSPPM penalizes LLM-generated reviews more effectively than GPPM.	翻訳日:2024-08-05 17:53:28 公開日:2024-08-02
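The core idea can be sketched with a log scoring rule. The agent is paid according to how well a predictor, conditioned on her report, predicts her peer's report. In the paper the predictor is an LLM. Here a hypothetical Laplace-smoothed unigram model built from the agent's own report stands in for it. `toy_predictor`, `peer_prediction_score`, the assumed vocabulary size, and the use of a mean log-probability are all illustrative assumptions, not the authors' GPPM/GSPPM definitions.

```python
import math
from collections import Counter
from typing import Callable


def toy_predictor(report: str, vocab_size: int = 1000) -> Callable[[str], float]:
    """Hypothetical stand-in for an LLM: Laplace-smoothed unigram model of the report.

    Probabilities are normalized over an assumed vocabulary of `vocab_size` word types.
    """
    counts = Counter(report.lower().split())
    total = sum(counts.values())

    def prob(token: str) -> float:
        return (counts[token.lower()] + 1) / (total + vocab_size)

    return prob


def peer_prediction_score(own_report: str, peer_report: str,
                          predictor: Callable[[str], Callable[[str], float]] = toy_predictor) -> float:
    """Log scoring rule: mean log-probability of the peer's words given the agent's report."""
    prob = predictor(own_report)
    words = peer_report.lower().split()
    return sum(math.log(prob(w)) for w in words) / len(words)


peer = "the method is novel but the experiments are weak"
informative = peer_prediction_score("novel method but experiments are weak and limited", peer)
uninformative = peer_prediction_score("good paper", peer)
```

In this toy setting, a report that carries information correlated with the peer's review earns a higher score than a generic one. That is the behavior the mechanisms aim to reward, though the sketch makes no incentive-compatibility claim.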
# モデルフリー強化学習のための多状態TDターゲット Multi-State TD Target for Model-Free Reinforcement Learning ( http://arxiv.org/abs/2405.16522v4 ) ライセンス: Link先を確認	Wuhao Wang, Zhiyong Chen, Lepeng Zhang,	(参考訳) 時間差学習(TD learning)は、TDターゲットを用いて状態または状態-行動対の価値推定を更新する、強化学習の基本的な技術である。このターゲットは、即時報酬と後続状態の推定価値の両方を組み込むことで、真の価値のより良い推定を表す。伝統的に、TD学習は1つの後続状態の価値に依存している。本稿では、複数の後続状態の推定値を利用する拡張多状態TD(MSTD)ターゲットを提案する。この新たなMSTD概念に基づいて,2つのモードでのリプレイバッファ管理を含み,深層決定論的方策勾配(DDPG)およびソフトアクター・クリティック(SAC)と統合した,完全なアクター・クリティックアルゴリズムを開発した。実験の結果,MSTDターゲットを用いたアルゴリズムは従来の手法に比べて学習性能を大幅に向上させることがわかった。コードはGitHubで公開されている。 Temporal difference (TD) learning is a fundamental technique in reinforcement learning that updates value estimates for states or state-action pairs using a TD target. This target represents an improved estimate of the true value by incorporating both immediate rewards and the estimated value of subsequent states. Traditionally, TD learning relies on the value of a single subsequent state. We propose an enhanced multi-state TD (MSTD) target that utilizes the estimated values of multiple subsequent states. Building on this new MSTD concept, we develop complete actor-critic algorithms that include management of replay buffers in two modes, and integrate with deep deterministic policy gradient (DDPG) and soft actor-critic (SAC). Experimental results demonstrate that algorithms employing the MSTD target significantly improve learning performance compared to traditional methods. The code is provided on GitHub.	翻訳日:2024-08-05 17:53:28 公開日:2024-08-02
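The sketch below makes the idea concrete. It contrasts the classic one-step target r_t + γV(s_{t+1}) with one plausible multi-state target, which averages the k-step targets for k = 1..n. Several things are assumptions for illustration: the uniform averaging, the absence of terminal-state handling, and the function names. The paper's exact MSTD weighting and its two replay-buffer modes are not reproduced here.

```python
from typing import Sequence


def k_step_target(rewards: Sequence[float], next_values: Sequence[float],
                  k: int, gamma: float) -> float:
    """sum_{i<k} gamma^i * r_{t+i} + gamma^k * V(s_{t+k}).

    rewards[i] is r_{t+i}; next_values[i] is the critic's estimate V(s_{t+i+1}).
    """
    discounted = sum((gamma ** i) * rewards[i] for i in range(k))
    return discounted + (gamma ** k) * next_values[k - 1]


def multi_state_td_target(rewards: Sequence[float], next_values: Sequence[float],
                          gamma: float) -> float:
    """Uniform average of the 1..n step targets (n = len(rewards)); no terminal handling."""
    n = len(rewards)
    if n == 0 or len(next_values) != n:
        raise ValueError("need one value estimate per reward")
    return sum(k_step_target(rewards, next_values, k, gamma) for k in range(1, n + 1)) / n


# The one-step TD target is the n = 1 special case.
single = multi_state_td_target([1.0], [2.0], gamma=0.5)           # 1 + 0.5 * 2 = 2.0
multi = multi_state_td_target([1.0, 1.0], [2.0, 3.0], gamma=0.5)  # (2.0 + 2.25) / 2 = 2.125
```

In an actor-critic setting, a target of this kind would replace the one-step bootstrap when fitting the critic. This requires storing short trajectory segments rather than single transitions in the replay buffer.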
# 動的トンネル法による変分量子アルゴリズムの大域的最適化 Global optimization in variational quantum algorithms via dynamic tunneling method ( http://arxiv.org/abs/2405.18783v2 ) ライセンス: Link先を確認	Seung Park, Kyunghyun Baek, Seungjin Lee, Mahn-Soo Choi,	(参考訳) 動的トンネル流を利用した、変分量子アルゴリズムのための大域的最適化ルーチンを提案する。従来の動的トンネル流はもともと、勾配に基づく最適化器が局所最小値の周辺で収集した情報を活用するために設計されたものであるが、我々はこれを量子状態の距離尺度を利用するよう適応させ、量子状態のパラメータ化に起因する外因的縮退の問題を解消する。横磁場イジングモデルに対する変分量子固有値ソルバーに本大域的最適化アルゴリズムを適用し、パラメータ空間上のユークリッド距離に基づく従来の動的トンネル法と比較することで、本ルーチンの性能を実証する。 We present a global optimization routine for the variational quantum algorithms, which utilizes the dynamic tunneling flow. Originally designed to leverage information gathered by a gradient-based optimizer around local minima, we adapt the conventional dynamic tunneling flow to exploit the distance measure of quantum states, resolving issues of extrinsic degeneracy arising from the parametrization of quantum states. Our global optimization algorithm is applied to the variational quantum eigensolver for the transverse-field Ising model to demonstrate the performance of our routine while comparing it with the conventional dynamic tunneling method, which is based on the Euclidean distance measure on the parameter space.	翻訳日:2024-08-05 17:53:28 公開日:2024-08-02
# Transformerを用いた量子回路アンサッツの表現可能性の学習 Learning the expressibility of quantum circuit ansatz using transformer ( http://arxiv.org/abs/2405.18837v2 ) ライセンス: Link先を確認	Fei Zhang, Jie Li, Zhimin He, Haozhen Situ,	(参考訳) 特定の問題に対して指数関数的に高速な計算が可能であることから、量子コンピューティングは近年大きな注目を集めている。変分量子アルゴリズム(VQA)は量子コンピューティングを実装するための重要な手法であり、適切なタスク固有の量子回路アンサッツは、VQAの量子優位性を効果的に高めることができる。しかし、探索空間が膨大であるため、最適なタスク固有のアンサッツを見つけることは難しい。ヒルベルト空間を効果的に探索するための量子回路アンサッツ状態の多様性を定量化する表現可能性(expressibility)は、あるアンサッツが別のアンサッツより優れているかどうかを評価するために用いることができる。本研究では,量子回路アンサッツの表現可能性を予測するためにTransformerモデルを用いることを提案する。量子ビット数とゲート数を変化させながら、ゲートワイズパイプラインで生成したランダムなPQCを含むデータセットを構築する。回路の表現可能性は、KLダイバージェンス、相対KLダイバージェンス、最大平均不一致(MMD)の3つの尺度を用いて計算する。このデータセットでTransformerモデルを訓練し、回路特性と表現可能性の間の複雑な関係を捉える。Transformerの性能評価には4つの評価指標を用いる。数値結果から, 訓練されたモデルは様々な表現可能性尺度にわたって高い性能とロバスト性を達成することが示された。本研究は、量子回路アンサッツの表現可能性の理解を深め、量子アーキテクチャ探索アルゴリズムを前進させうる。 With the exponentially faster computation for certain problems, quantum computing has garnered significant attention in recent years. Variational quantum algorithms are crucial methods to implement quantum computing, and an appropriate task-specific quantum circuit ansatz can effectively enhance the quantum advantage of VQAs. However, the vast search space makes it challenging to find the optimal task-specific ansatz. Expressibility, quantifying the diversity of quantum circuit ansatz states to explore the Hilbert space effectively, can be used to evaluate whether one ansatz is superior to another. In this work, we propose using a transformer model to predict the expressibility of quantum circuit ansatze. We construct a dataset containing random PQCs generated by the gatewise pipeline, with varying numbers of qubits and gates. The expressibility of the circuits is calculated using three measures: KL divergence, relative KL divergence, and maximum mean discrepancy. A transformer model is trained on the dataset to capture the intricate relationships between circuit characteristics and expressibility. Four evaluation metrics are employed to assess the performance of the transformer. 
Numerical results demonstrate that the trained model achieves high performance and robustness across various expressibility measures. This research can enhance the understanding of the expressibility of quantum circuit ansatze and advance quantum architecture search algorithms.	翻訳日:2024-08-05 17:53:28 公開日:2024-08-02
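The KL-divergence expressibility label can be sketched using the commonly used definition. Sample pairs of output states from the circuit and histogram their fidelities. Then compare that histogram with the Haar fidelity distribution P_Haar(F) = (N - 1)(1 - F)^(N - 2), where N = 2^n. A real PQC simulator is out of scope, so Haar-random states stand in for circuit outputs, and their expressibility should come out close to 0. The 75-bin histogram, the sample count, and the function names are assumptions.

```python
import math
import random
from typing import List, Sequence


def haar_bin_probs(num_qubits: int, bins: int) -> List[float]:
    """Probability mass of each fidelity bin under P_Haar(F) = (N-1)(1-F)^(N-2)."""
    n = 2 ** num_qubits
    edges = [i / bins for i in range(bins + 1)]
    # The CDF is 1 - (1-F)^(N-1), so each bin gets (1-lo)^(N-1) - (1-hi)^(N-1).
    return [(1 - edges[i]) ** (n - 1) - (1 - edges[i + 1]) ** (n - 1) for i in range(bins)]


def kl_expressibility(fidelities: Sequence[float], num_qubits: int, bins: int = 75) -> float:
    """KL(P_circuit || P_Haar) estimated from a histogram of sampled fidelities."""
    counts = [0] * bins
    for f in fidelities:
        counts[min(int(f * bins), bins - 1)] += 1  # F = 1 falls into the last bin
    q = haar_bin_probs(num_qubits, bins)
    total = len(fidelities)
    return sum((c / total) * math.log((c / total) / qi) for c, qi in zip(counts, q) if c)


def random_state(dim: int, rng: random.Random) -> List[complex]:
    """Haar-random pure state: normalized vector of i.i.d. complex Gaussians."""
    amps = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in amps))
    return [a / norm for a in amps]


def fidelity(a: Sequence[complex], b: Sequence[complex]) -> float:
    """|<a|b>|^2 for normalized state vectors."""
    return abs(sum(x.conjugate() * y for x, y in zip(a, b))) ** 2


rng = random.Random(0)
haar_like = kl_expressibility(
    [fidelity(random_state(4, rng), random_state(4, rng)) for _ in range(5000)], num_qubits=2)
collapsed = kl_expressibility([1.0] * 100, num_qubits=2)  # circuit that always outputs one state
```

Lower values mean the circuit's states are spread more like Haar-random states, that is, the circuit is more expressive. A circuit stuck on one state gets a large value.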
# LightDE: ダングリングポインタを除去するための軽量な手法 LightDE: A Lightweight Method for Eliminating Dangling Pointers ( http://arxiv.org/abs/2405.20697v2 ) ライセンス: Link先を確認	Xun An,	(参考訳) UAF(Use-After-Free)脆弱性が広く存在していることはソフトウェアのセキュリティに深刻な脅威をもたらしており、ダングリングポインタがこれらの脆弱性の主な原因と考えられている。しかし、ダングリングポインタを除去することでUAF脆弱性を防御する既存の手法は、ポインタのメモリアドレスを特定のデータ構造に格納するために、ポインタの代入操作に遭遇するたびにプログラムの実行を中断する必要がある。そのため、これらの手法は軽量ではない。この欠点を克服するために、LightDEと呼ばれる新しいアプローチを提案する。この手法では、プログラム実行中にポインタのメモリアドレスを保存する必要がない。LightDEは,我々が提案する構造感応型ポインタ解析手法を用いて,プログラムのコンパイル時に各ポインタがどのオブジェクトを指すかを判定し,その指し示し関係をプログラムのデータセグメントに格納する。LightDEは、ダングリングポインタを除去する際に、ポインタ解析によって特定されたポインタが解放済みオブジェクトを指しているかどうかを検証するだけでよいため、非常に軽量である。実験の結果、LightDEはUAF脆弱性を効果的に防御でき、導入される性能オーバーヘッドも非常に小さいことが示された。 The widespread presence of Use-After-Free (UAF) vulnerabilities poses a serious threat to software security, with dangling pointers being considered the primary cause of these vulnerabilities. However, existing methods for defending against UAF vulnerabilities by eliminating dangling pointers need to interrupt the program's execution when encountering pointer assignment operations in order to store the memory addresses of the pointers in a specific data structure. This makes these methods not lightweight. To overcome this drawback, we propose a novel approach called LightDE. This method does not require storing the memory addresses of pointers during program execution. LightDE uses our proposed structure-sensitive pointer analysis method to determine which objects pointers point to and stores the pointing relationships in the program's data segment during program compilation. Since LightDE only needs to verify if pointers identified by the pointer analysis point to released objects when eliminating dangling pointers, it is very lightweight. Our experimental results show that LightDE can effectively defend against UAF vulnerabilities and the performance overhead it introduces is very low.	翻訳日:2024-08-05 17:53:28 公開日:2024-08-02
# L-PR: 順序付けされていない低重複マルチビュー点群レジストレーションのためのLiDARフィデューシャルマーカーの活用 L-PR: Exploiting LiDAR Fiducial Marker for Unordered Low Overlap Multiview Point Cloud Registration ( http://arxiv.org/abs/2406.03298v2 ) ライセンス: Link先を確認	Yibo Liu, Jinjun Shan, Amaldev Haridevan, Shuo Zhang,	(参考訳) 点群レジストレーションは、コンピュータビジョンとロボティクスにおける多くのアプリケーションの前提条件である。既存の手法の多くは、重複の大きい2つの点群のペアワイズレジストレーションに焦点を当てている。重複の小さいケースに対する手法もいくつかあるが、それらは劣化したシナリオでは苦戦する。本稿では,LiDARフィデューシャルマーカーを活用して、順序付けされていない低重複のマルチビュー点群をレジストレーションするための新しいフレームワークL-PRを紹介する。我々はこれらをLiDARフィデューシャルマーカーと呼んでいるが、広く使われているAprilTagやArUcoマーカーと同じものであり、環境の3次元形状に影響を与えない薄い紙である。まず, 点群間の視点が大きく変化する場合にも頑健な検出結果を得るため,改良型の適応しきい値マーカー検出法を提案する。次に,順序付けされていないマルチビュー点群レジストレーション問題を最大事後確率(MAP)推定問題として定式化し,これに対処するための2階層のグラフからなるフレームワークを開発する。重み付きグラフとして構築された第1階層のグラフは、順序付けされていない集合からスキャン姿勢の初期値を効率よく最適に推定するように設計されている。第2階層のグラフは因子グラフとして構築される。スキャン姿勢,マーカー姿勢,マーカーのコーナー位置など,グラフ上の変数を大域的に最適化することで,MAP問題に取り組む。提案手法が従来の最先端(SOTA)手法を上回ること、およびL-PRが3Dアセット収集や訓練データ収集のための低コストかつ効率的なツールとして機能しうることを示すため,定性的・定量的な実験を行った。特に,L-PRを用いてLivox-3DMatchという新しいデータセットを収集し,それをSOTAの学習ベース手法SGHRの訓練に組み込むことで,各種ベンチマークにおいてSGHRの明確な改善が得られた。 Point cloud registration is a prerequisite for many applications in computer vision and robotics. Most existing methods focus on pairwise registration of two point clouds with high overlap. Although there have been some methods for low overlap cases, they struggle in degraded scenarios. This paper introduces a novel framework dubbed L-PR, designed to register unordered low overlap multiview point clouds leveraging LiDAR fiducial markers. We refer to them as LiDAR fiducial markers, but they are the same as the popular AprilTag and ArUco markers, thin sheets of paper that do not affect the 3D geometry of the environment. We first propose an improved adaptive threshold marker detection method to provide robust detection results when the viewpoints among point clouds change dramatically. 
Then, we formulate the unordered multiview point cloud registration problem as a maximum a-posteriori (MAP) problem and develop a framework consisting of two levels of graphs to address it. The first-level graph, constructed as a weighted graph, is designed to efficiently and optimally infer initial values of scan poses from the unordered set. The second-level graph is constructed as a factor graph. By globally optimizing the variables on the graph, including scan poses, marker poses, and marker corner positions, we tackle the MAP problem. We conduct both qualitative and quantitative experiments to demonstrate that the proposed method surpasses previous state-of-the-art (SOTA) methods and to showcase that L-PR can serve as a low-cost and efficient tool for 3D asset collection and training data collection. In particular, we collect a new dataset named Livox-3DMatch using L-PR and incorporate it into the training of the SOTA learning-based method, SGHR, which brings evident improvements for SGHR on various benchmarks.	翻訳日:2024-08-05 17:53:28 公開日:2024-08-02
# 文脈化Vendiスコア誘導による生成画像の地理的多様性の向上 Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance ( http://arxiv.org/abs/2406.04551v2 ) ライセンス: Link先を確認	Reyhane Askari Hemmat, Melissa Hall, Alicia Sun, Candace Ross, Michal Drozdzal, Adriana Romero-Soriano,	(参考訳) テキストから画像への生成モデルの人気が高まるにつれ、そのリスクやバイアスを理解することへの関心が高まっている。近年の研究では、最先端のモデルは日常的な物体を現実世界の真の多様性をもって描写することに苦戦しており、地理的地域間に顕著な格差があることが分かっている。本研究では,地域ごとの変動が現実世界を代表するように、一般的な物体の生成画像の多様性を高めることを目指す。我々は推論時の介入手法である文脈化Vendiスコア誘導(c-VSG)を導入する。c-VSGは潜在拡散モデルの逆方向ステップを誘導し、以前に生成された画像の「メモリバンク」と比べてサンプルの多様性を高めつつ、その変動量を現実世界の文脈化画像からなる例示集合の範囲内に制約する。地理的に代表性のある2つのデータセットを用いてc-VSGを評価した結果、最も性能の低い地域と平均の両方において生成画像の多様性が大幅に向上し、同時に画像の品質と一貫性が維持または改善されることがわかった。さらに、定性的分析により、元のモデルに見られた地域の還元的な描写に関する点も含めて、生成画像の多様性が大幅に改善されていることが明らかになった。本研究が、世界の真の地理的多様性を反映したテキストから画像への生成モデルに向けた一歩となることを願う。 With the growing popularity of text-to-image generative models, there has been increasing focus on understanding their risks and biases. Recent work has found that state-of-the-art models struggle to depict everyday objects with the true diversity of the real world and have notable gaps between geographic regions. In this work, we aim to increase the diversity of generated images of common objects such that per-region variations are representative of the real world. We introduce an inference time intervention, contextualized Vendi Score Guidance (c-VSG), that guides the backwards steps of latent diffusion models to increase the diversity of a sample as compared to a "memory bank" of previously generated images while constraining the amount of variation within that of an exemplar set of real-world contextualizing images. We evaluate c-VSG with two geographically representative datasets and find that it substantially increases the diversity of generated images, both for the worst performing regions and on average, while simultaneously maintaining or improving image quality and consistency. 
Additionally, qualitative analyses reveal that diversity of generated images is significantly improved, including along the lines of reductive region portrayals present in the original model. We hope that this work is a step towards text-to-image generative models that reflect the true geographic diversity of the world.	翻訳日:2024-08-05 17:43:44 公開日:2024-08-02
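c-VSG builds on the Vendi Score. For an n × n similarity matrix K with unit diagonal, the score is the exponential of the Shannon entropy of the eigenvalues of K / n. The sketch below computes only that score, not the guidance step inside the diffusion sampler. The pure-Python Jacobi eigenvalue routine only avoids external dependencies; in practice one would call a linear-algebra library. The function names and the toy similarity matrices are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def symmetric_eigenvalues(a: Matrix, max_sweeps: int = 100) -> List[float]:
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    m = [row[:] for row in a]
    for _ in range(max_sweeps):
        off = sum(m[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-20:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if abs(m[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (m[q][q] - m[p][p]) / (2.0 * m[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p and q: M <- M J
                    mkp, mkq = m[k][p], m[k][q]
                    m[k][p] = c * mkp - s * mkq
                    m[k][q] = s * mkp + c * mkq
                for k in range(n):  # rows p and q: M <- J^T M
                    mpk, mqk = m[p][k], m[q][k]
                    m[p][k] = c * mpk - s * mqk
                    m[q][k] = s * mpk + c * mqk
    return [m[i][i] for i in range(n)]


def vendi_score(similarity: Matrix) -> float:
    """exp(-sum_i l_i log l_i) over the eigenvalues l_i of K / n."""
    n = len(similarity)
    scaled = [[similarity[i][j] / n for j in range(n)] for i in range(n)]
    eigenvalues = symmetric_eigenvalues(scaled)
    entropy = -sum(lam * math.log(lam) for lam in eigenvalues if lam > 1e-12)
    return math.exp(entropy)


identical = [[1.0] * 3 for _ in range(3)]                                   # three copies of one image
distinct = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]  # fully dissimilar images
```

The score can be read as an "effective number" of distinct samples. It is 1 for three identical images and 3 for three mutually dissimilar ones. A guidance method that pushes this number up encourages the sampler away from images it has already produced.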
# 言語モデルは特定の認知プロファイルをエミュレートする:予測可能性指標と個人差の相互作用に関する研究 Language models emulate certain cognitive profiles: An investigation of how predictability measures interact with individual differences ( http://arxiv.org/abs/2406.04988v2 ) ライセンス: Link先を確認	Patrick Haller, Lena S. Bolliger, Lena A. Jäger,	(参考訳) これまで, 読解におけるサプライザル(surprisal)とエントロピーの効果に関する研究のほとんどは集団レベルで行われ、個人差は考慮されてこなかった。本研究では,言語使用者の認知能力に関する情報を取り入れることで,様々な言語モデル(LM)から推定したサプライザルとエントロピーが、処理負荷の指標である人間の読み時間データに対して持つ予測力を再検討する。そのために、幅広い心理測定テストも受けた個人から得られた読解データに対して,生成的LMから推定したサプライザルとエントロピーの予測力を評価する。具体的には,サプライザルとエントロピーを認知スコアに応じて調整することで読み時間の予測精度が向上するかどうかを調べるとともに,認知的に成績の高い群または低い群の読み時間の予測においてLMが系統的なバイアスを示すかどうかを検証し,あるLMがどのようなタイプの心理言語学的被験者をエミュレートしているかを明らかにする。その結果、ほとんどの場合、認知能力を取り入れることで読み時間に対するサプライザルとエントロピーの予測力が向上し、一般に心理測定テストの成績が高いほど予測可能性効果への感度が低いことがわかった。最後に, 分析したLMは言語性知能の低い読み手をエミュレートしていることが示唆され、したがって特定の対象群(すなわち言語性知能の高い個人)に対しては、これらのLMが与える予測可能性の推定値の精度が低いことが示唆された。 To date, most investigations on surprisal and entropy effects in reading have been conducted on the group level, disregarding individual differences. In this work, we revisit the predictive power of surprisal and entropy measures estimated from a range of language models (LMs) on data of human reading times as a measure of processing effort by incorporating information of language users' cognitive capacities. To do so, we assess the predictive power of surprisal and entropy estimated from generative LMs on reading data obtained from individuals who also completed a wide range of psychometric tests. Specifically, we investigate if modulating surprisal and entropy relative to cognitive scores increases prediction accuracy of reading times, and we examine whether LMs exhibit systematic biases in the prediction of reading times for cognitively high- or low-performing groups, revealing what type of psycholinguistic subject a given LM emulates. 
Our study finds that in most cases, incorporating cognitive capacities increases predictive power of surprisal and entropy on reading times, and that generally, high performance in the psychometric tests is associated with lower sensitivity to predictability effects. Finally, our results suggest that the analyzed LMs emulate readers with lower verbal intelligence, suggesting that for a given target group (i.e., individuals with high verbal intelligence), these LMs provide less accurate predictability estimates.	翻訳日:2024-08-05 17:43:44 公開日:2024-08-02
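Both predictability measures can be computed from a language model's next-word distribution. Surprisal is -log2 p(w | context) for the word actually read. Entropy is the expected surprisal over the whole distribution, so it reflects uncertainty before the word is seen. The toy distribution and the function names below are hypothetical. A real study would take the probabilities from an LM and typically aggregate subword tokens into words.

```python
import math
from typing import Dict


def surprisal(dist: Dict[str, float], word: str) -> float:
    """Surprisal in bits: -log2 p(word | context)."""
    return -math.log2(dist[word])


def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy in bits of the next-word distribution (expected surprisal)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


# Hypothetical next-word distribution after the context "The cat sat on the".
next_word = {"mat": 0.5, "sofa": 0.25, "roof": 0.125, "moon": 0.125}
```

Here reading "mat" carries 1 bit of surprisal and "moon" carries 3 bits, while the entropy of the context is 1.75 bits. These per-word values are what would be entered as predictors of reading times, possibly modulated by a reader's cognitive scores.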
# 二分割リウェイト・アニーリングアルゴリズムによるエンタングルメントエントロピーとその導関数の大規模データの高精度抽出 Bipartite reweight-annealing algorithm to extract large-scale data of entanglement entropy and its derivative in high precision ( http://arxiv.org/abs/2406.05324v3 ) ライセンス: Link先を確認	Zhe Wang, Zhiyan Wang, Yi-Ming Ding, Bin-Bin Mao, Zheng Yan,	(参考訳) 本稿では,エンタングルメントエントロピー(EE)とその導関数の大規模データを、高精度かつ低い技術的障壁で抽出できる量子モンテカルロ(QMC)法を提案する。異なる時空多様体上の2つの分配関数の重なりを直接計算することを避け、代わりにリウェイト・アニーリング法によってそれぞれを別々に求める。この枠組みでは、漸進的な過程を実際の物理パラメータの経路に沿って設計でき、すべての中間状態が対応するパラメータでのEEとなるため、アルゴリズムの効率は$10^4$倍以上向上する。EEの計算ははるかに安価かつ単純になる。これにより、2次元以上の系において広いパラメータ領域でEEを走査することで、新奇な相や相転移を数値的に検出する道が開かれる。さらに、EEとその導関数を用いて相転移点を見つけ、新奇な相を探索できることを示す。 We propose a quantum Monte Carlo (QMC) scheme able to extract large-scale data of entanglement entropy (EE) and its derivative with high precision and low technical barrier. We avoid directly computing the overlap of two partition functions within different spacetime manifolds and instead obtain them separately via reweight-annealing scheme. The incremental process can be designed along the path of real physical parameters in this frame, and all intermediates are EEs of corresponding parameters, so the algorithm efficiency is improved by a factor of more than $10^4$. The calculation of EE becomes much cheaper and simpler. It opens a way to numerically detect the novel phases and phase transitions by scanning EE in a wide parameter-region in two and higher dimensional systems. We then show the feasibility of using EE and its derivative to find phase transition points and to probe novel phases.	翻訳日:2024-08-05 17:43:44 公開日:2024-08-02
# 連続および離散量子浴を用いた光誘起ダイナミクス Photo-induced dynamics with continuous and discrete quantum baths ( http://arxiv.org/abs/2406.07047v2 ) ライセンス: Link先を確認	Zhaoxuan Xie, Mattia Moroder, Ulrich Schollwöck, Sebastian Paeckel,	(参考訳) 複雑な分子における光物理過程の超高速量子ダイナミクスは、量子化学と生物学に多様で興味深い応用を持つ、極めて難しい計算問題である。開放量子系における最近の進展に触発され、マルコフ埋め込みを用いて、離散的で有効なボソン自由度の集合により連続的な環境を記述する、純粋状態アンラベリングに基づくハイブリッド浴法を導入する。本手法は, 連続的なスペクトル密度と、その中に埋め込まれた鋭いピークの両方を記述できる。これにより、離散的な振動モードの集合のユニタリダイナミクスを用いて長時間の記憶効果を捉えるか、リンドブラッド方程式やレッドフィールドのマスター方程式を用いた記憶のないマルコフ環境を用いるかのいずれかであった従来手法の限界を克服する。量子化学と生物学の2つの典型的な問題に対して,本手法をベンチマークする。ユニタリな記述と比べて、はるかに少ない数のボソンモードで励起子ダイナミクスを正確に記述でき、ほぼ1桁の計算高速化が得られることを示す。さらに、光捕集複合体のスペクトル密度における$\delta$ピークの効果を明示的に考慮し、環境の長時間記憶がダイナミクスに強い影響を及ぼすことを示す。 The ultrafast quantum dynamics of photophysical processes in complex molecules is an extremely challenging computational problem with a wide variety of fascinating applications in quantum chemistry and biology. Inspired by recent developments in open quantum systems, we introduce a pure-state unraveled hybrid-bath method that describes a continuous environment via a set of discrete, effective bosonic degrees of freedom using a Markovian embedding. Our method is capable of describing both, a continuous spectral density and sharp peaks embedded into it. Thereby, we overcome the limitations of previous methods, which either capture long-time memory effects using the unitary dynamics of a set of discrete vibrational modes or use memoryless Markovian environments employing a Lindblad or Redfield master equation. We benchmark our method against two paradigmatic problems from quantum chemistry and biology. We demonstrate that compared to unitary descriptions, a significantly smaller number of bosonic modes suffices to describe the excitonic dynamics accurately, yielding a computational speed-up of nearly an order of magnitude. 
Furthermore, we take into account explicitly the effect of a $\delta$-peak in the spectral density of a light-harvesting complex, demonstrating the strong impact of the long-time memory of the environment on the dynamics.	翻訳日:2024-08-05 17:43:44 公開日:2024-08-02
# リアルな会話をしよう:対面会話のための音声対話モデル Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation ( http://arxiv.org/abs/2406.07867v2 ) ライセンス: Link先を確認	Se Jin Park, Chae Won Kim, Hyeongseop Rha, Minsu Kim, Joanna Hong, Jeong Hun Yeo, Yong Man Ro,	(参考訳) 本稿では,新しい対面(Face-to-Face)音声対話モデルを提案する。本モデルはユーザ入力の音声・視覚音声を処理し、応答として音声・視覚音声を生成するもので、中間テキストに頼らないアバターチャットボットシステムの構築に向けた最初の一歩となる。この目的のために,オープンドメイン対話データセットであるTopicalChatに基づいて収録された、約9,000対話・340時間からなる初の大規模マルチモーダル(音声と視覚)音声対話コーパスMultiDialogを新たに導入する。MultiDialogには、与えられたスクリプトに従って演じる会話相手の音声・視覚の並列収録が感情アノテーション付きで含まれており、マルチモーダル合成の研究機会を広げることが期待される。我々の対面音声対話モデルは、テキストで事前学習された大規模言語モデルを取り入れ、音声とテキストの共同事前学習を導入することでそれを音声・視覚音声対話ドメインに適応させる。大規模な実験を通して, 対面会話の促進における本モデルの有効性を検証した。デモとデータはそれぞれhttps://multidialog.github.ioとhttps://huggingface.co/datasets/IVLLab/MultiDialogで公開されている。 In this paper, we introduce a novel Face-to-Face spoken dialogue model. It processes audio-visual speech from user input and generates audio-visual speech as the response, marking the initial step towards creating an avatar chatbot system without relying on intermediate text. To this end, we newly introduce MultiDialog, the first large-scale multimodal (i.e., audio and visual) spoken dialogue corpus containing 340 hours of approximately 9,000 dialogues, recorded based on the open domain dialogue dataset, TopicalChat. The MultiDialog contains parallel audio-visual recordings of conversation partners acting according to the given script with emotion annotations, which we expect to open up research opportunities in multimodal synthesis. Our Face-to-Face spoken dialogue model incorporates a textually pretrained large language model and adapts it into the audio-visual spoken dialogue domain by incorporating speech-text joint pretraining. Through extensive experiments, we validate the effectiveness of our model in facilitating a face-to-face conversation. Demo and data are available at https://multidialog.github.io and https://huggingface.co/datasets/IVLLab/MultiDialog, respectively.	翻訳日:2024-08-05 17:43:44 公開日:2024-08-02
# ハイパースペクトル画像ノイズ除去のためのハイブリッド空間スペクトルニューラルネットワーク Hybrid Spatial-spectral Neural Network for Hyperspectral Image Denoising ( http://arxiv.org/abs/2406.08782v2 ) ライセンス: Link先を確認	Hao Liang, Chengjie, Kun Li, Xin Tian,	(参考訳) ハイパースペクトル画像(HSI)のノイズ除去は、HSIアプリケーションに不可欠な処理である。残念なことに、既存のTransformerベースの手法は主に非局所的なモデリングに焦点を当てており、画像のノイズ除去における局所性の重要性を無視している。さらに、深層学習手法は複雑なスペクトル学習機構を用いるため、計算コストが大きくなる。これらの問題に対処するために,ハイブリッド空間スペクトルノイズ除去ネットワーク(HSSD)を提案する。HSSDでは、CNNとTransformerの特性に着想を得た新しいハイブリッドなデュアルパスネットワークを設計し、ノイズを効率的に抑制しつつ局所的および非局所的な空間的詳細の両方を捉える。さらに、計算量を削減するために、空間チャネルとスペクトルチャネルの学習を分離する単純だが効果的な分離戦略を採用し、パラメータの少ない多層パーセプトロンを用いてスペクトル間の大域的な相関を学習する。合成データと実データによる実験から,提案手法が空間的およびスペクトル的な再構成において最先端の手法を上回ることが示された。コードと詳細はhttps://github.com/HLImg/HSSDで公開されている。 Hyperspectral image (HSI) denoising is an essential procedure for HSI applications. Unfortunately, the existing Transformer-based methods mainly focus on non-local modeling, neglecting the importance of locality in image denoising. Moreover, deep learning methods employ complex spectral learning mechanisms, thus introducing large computation costs. To address these problems, we propose a hybrid spatial-spectral denoising network (HSSD), in which we design a novel hybrid dual-path network inspired by CNN and Transformer characteristics, leading to capturing both local and non-local spatial details while suppressing noise efficiently. Furthermore, to reduce computational complexity, we adopt a simple but effective decoupling strategy that disentangles the learning of space and spectral channels, where a multilayer perceptron with few parameters is utilized to learn the global correlations among spectra. The synthetic and real experiments demonstrate that our proposed method outperforms state-of-the-art methods on spatial and spectral reconstruction. The code and details are available on https://github.com/HLImg/HSSD.	翻訳日:2024-08-05 17:43:44 公開日:2024-08-02
# ポスト量子ステアリングの活性化 Activation of post-quantum steering ( http://arxiv.org/abs/2406.10570v2 ) ライセンス: Link先を確認	Ana Belén Sainz, Paul Skrzypczyk, Matty J. Hoban,	(参考訳) 対応するチレルソン境界よりも大きなベルの不等式の破れを与える物理理論が存在しうる。これは量子後非局所性と呼ばれる。このような理論は特殊相対性理論に反するものではないが、特定の情報処理タスクにおいて有利となる可能性がある。エンタングルした量子状態が非古典的な現象を示す別の方法もあり、その顕著な例の一つがアインシュタイン=ポドルスキー=ローゼン(EPR)ステアリングである。二者間のベルの不等式の破れはEPRステアリングを意味するが、その逆は必ずしも成り立たない。量子後のEPRステアリングの研究はより複雑であるが,従来のベル試験では必ずしも量子後非局所性を意味しないことが示されている。本研究では,個々には量子後非局所性を示さないが、ネットワークに対するチレルソン境界を破る資源を、より大きなネットワーク内にどのように分配するかを示す。すなわち,量子後ステアリングを活性化し,ベルシナリオにおいて量子後相関として確認できるようにする方法を示す。それ自体独立した関心を持たれうる本研究の要素として、量子後資源を仮定した場合であっても、ネットワーク内で二者間の量子アセンブラージュを自己検定する方法を示す。 There are possible physical theories that give greater violations of Bell's inequalities than the corresponding Tsirelson bound, termed post-quantum non-locality. Such theories do not violate special relativity, but could give an advantage in certain information processing tasks. There is another way in which entangled quantum states exhibit non-classical phenomena, with one notable example being Einstein-Podolsky-Rosen (EPR) steering; a violation of a bipartite Bell inequality implies EPR steering, but the converse is not necessarily true. The study of post-quantum EPR steering is more intricate, but it has been shown that it does not always imply post-quantum non-locality in a conventional Bell test. In this work we show how to distribute resources in a larger network that individually do not demonstrate post-quantum non-locality but violate a Tsirelson bound for the network. That is, we show how to activate post-quantum steering so that it can now be witnessed as post-quantum correlations in a Bell scenario. One element of our work that may be of independent interest is we show how to self-test a bipartite quantum assemblage in a network, even assuming post-quantum resources.	翻訳日:2024-08-05 17:43:44 公開日:2024-08-02
# 臨界フェルミオンは普遍エンベズラーである Critical Fermions are Universal Embezzlers ( http://arxiv.org/abs/2406.11747v2 ) ライセンス: Link先を確認	Lauritz van Luijk, Alexander Stottmeister, Henrik Wilming,	(参考訳) 普遍エンベズラー(universal embezzler)とは、局所操作を用いて任意のエンタングル状態を任意の精度で取り出すことができ、しかもその際に系の状態をほとんど乱さない二部量子系である。本稿では,普遍エンベズラーが多体物理においてありふれた存在であることを示す。すなわち、一次元格子上のあらゆる局所的、並進不変、かつ臨界的な自由フェルミオン多体系の基底状態セクターは、2つの半鎖に二分割すると普遍エンベズラーとなる。同じ性質は、ジョルダン・ウィグナー変換を介して双対な、局所相互作用するスピン鎖でも成り立つ。普遍エンベズルメントは熱力学極限に限らず、有限の系サイズで既に現れる。任意の有限誤差と任意の目標エンタングル状態に対して、有限長の鎖があれば、その状態を与えられた誤差内でエンベズルするのに十分である。技術的なレベルでは、主結果として、与えられたモデルの基底状態セクターに付随する半鎖可観測量代数がIII$_1$型因子であることを示す。 Universal embezzlers are bipartite quantum systems from which any entangled state may be extracted to arbitrary precision using local operations while perturbing the state of the system arbitrarily little. Here, we show that universal embezzlers are ubiquitous in many-body physics: The ground state sector of every local, translation-invariant, and critical free-fermionic many-body system on a one-dimensional lattice is a universal embezzler if bi-partitioned into two half-chains. The same property holds in locally-interacting, dual spin chains via the Jordan-Wigner transformation. Universal embezzlement manifests already for finite system sizes, not only in the thermodynamic limit: For any finite error and any targeted entangled state, a finite length of the chain is sufficient to embezzle said state within the given error. On a technical level, our main result establishes that the half-chain observable algebras associated with ground state sectors of the given models are type III$_1$ factors.	翻訳日:2024-08-05 17:43:44 公開日:2024-08-02
# 表形式時系列のための階層型Transformerにおける細粒度アテンション Fine-grained Attention in Hierarchical Transformers for Tabular Time-series ( http://arxiv.org/abs/2406.15327v2 ) ライセンス: Link先を確認	Raphael Azorin, Zied Ben Houidi, Massimo Gallo, Alessandro Finamore, Pietro Michiardi,	(参考訳) 表形式データは、多くの実世界のシステムにおいてユビキタスである。特に、行が時系列的に関連付けられている時間依存の表形式データは、典型的には金融取引、医療記録、株価履歴などの過去のイベントを記録するために使用される。近年,Transformerアーキテクチャのアテンション機構の階層的な変種が、表形式時系列データのモデル化に利用されている。まず、行(または列)は、それぞれのフィールド間でアテンションを計算することによって個別に符号化される。その後、符号化された行(または列)同士でアテンションを計算し、表形式時系列全体をモデル化する。このアプローチは効率的だが、アテンションの粒度を制約し、異なる行や列にまたがるフィールドレベルのパターンを学習する能力を制限する。このギャップに対処する第一歩として、行と列の両方のレベルでフィールドを文脈化する細粒度の階層モデルFieldyを提案する。公開されている表形式時系列データセットを用いた回帰・分類タスクにおいて,提案手法を最先端モデルと比較した。その結果,行方向と列方向のアテンションを組み合わせることで,モデルサイズを増大させることなく性能が向上することがわかった。コードとデータはhttps://github.com/raphaaal/fieldyで公開されている。 Tabular data is ubiquitous in many real-life systems. In particular, time-dependent tabular data, where rows are chronologically related, is typically used for recording historical events, e.g., financial transactions, healthcare records, or stock history. Recently, hierarchical variants of the attention mechanism of transformer architectures have been used to model tabular time-series data. At first, rows (or columns) are encoded separately by computing attention between their fields. Subsequently, encoded rows (or columns) are attended to one another to model the entire tabular time-series. While efficient, this approach constrains the attention granularity and limits its ability to learn patterns at the field-level across separate rows, or columns. We take a first step to address this gap by proposing Fieldy, a fine-grained hierarchical model that contextualizes fields at both the row and column levels. We compare our proposal against state of the art models on regression and classification tasks using public tabular time-series datasets. Our results show that combining row-wise and column-wise attention improves performance without increasing model size. 
Code and data are available at https://github.com/raphaaal/fieldy.	翻訳日:2024-08-05 17:43:44 公開日:2024-08-02
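Plain scaled dot-product self-attention over a table of field embeddings can illustrate the difference between row-level and column-level contextualization. One pass lets each field attend to the other fields in its row. Another pass lets it attend to the same field in the other rows (time steps). The two contexts are then summed. Several elements are simplifying assumptions: the absence of learned query/key/value projections, a single head, the summation rule, and the function names. This is not Fieldy's actual architecture.

```python
import math
from typing import List

Vec = List[float]
Table = List[List[Vec]]  # table[row][col] is the embedding of one field


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq: List[Vec]) -> List[Vec]:
    """Scaled dot-product self-attention with identity Q/K/V projections (simplification)."""
    dim = len(seq[0])
    out = []
    for query in seq:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in seq]
        weights = softmax(scores)
        out.append([sum(w * value[i] for w, value in zip(weights, seq)) for i in range(dim)])
    return out


def row_and_column_attention(table: Table) -> Table:
    """Contextualize every field along its row and along its column, then sum the two."""
    n_rows, n_cols = len(table), len(table[0])
    by_row = [self_attention(table[r]) for r in range(n_rows)]
    by_col = [self_attention([table[r][c] for r in range(n_rows)]) for c in range(n_cols)]
    return [
        [[a + b for a, b in zip(by_row[r][c], by_col[c][r])] for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Two time steps (rows) x three fields (columns), each field embedded in 2 dimensions.
example = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]],
]
contextualized = row_and_column_attention(example)
```

The output keeps one vector per field, so a field can be informed by both its neighbors in the same transaction and its own history. A row-then-sequence hierarchy, by contrast, only mixes fields through pooled row encodings.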
# 計算生命:単純な相互作用から整った自己複製プログラムがいかに生まれるか Computational Life: How Well-formed, Self-replicating Programs Emerge from Simple Interaction ( http://arxiv.org/abs/2406.19108v2 ) ライセンス: Link先を確認	Blaise Agüera y Arcas, Jyrki Alakuijala, James Evans, Ben Laurie, Alexander Mordvintsev, Eyvind Niklasson, Ettore Randazzo, Luca Versari,	(参考訳) 生命の起源と人工生命の分野はどちらも、生命とは何か、そしてそれが「前生命」的なダイナミクスの特定の集合からどのように生まれるのかを問うている。生命が出現するほとんどの基質に共通する特徴は、自己複製が現れるとダイナミクスが顕著に変化することである。自己複製子が自然界でどのように出現したかについてはいくつかの仮説があるが、自己複製子が出現するための一般的なダイナミクス、計算原理、必要条件についてはほとんど分かっていない。これは、相互作用が論理的、数学的、あるいはプログラミング的な規則を伴う「計算基質」において特に当てはまる。本稿では,様々な単純なプログラミング言語と機械命令セットに基づくいくつかの計算基質を研究することによって,自己複製子がどのように生じるかの理解に向けて一歩を踏み出す。ランダムで自己複製しないプログラムを、明示的な適応度地形を持たない環境に置くと、自己複製子が出現する傾向があることを示す。これがランダムな相互作用と自己修正によって起こり、背景的なランダム突然変異の有無にかかわらず起こりうることを示す。また,自己複製子の出現に続いて、ますます複雑なダイナミクスが現れ続けることを示す。最後に,自己複製子が可能であるにもかかわらず、これまでのところその出現が観察されていない最小主義的なプログラミング言語という反例を示す。 The fields of Origin of Life and Artificial Life both question what life is and how it emerges from a distinct set of "pre-life" dynamics. One common feature of most substrates where life emerges is a marked shift in dynamics when self-replication appears. While there are some hypotheses regarding how self-replicators arose in nature, we know very little about the general dynamics, computational principles, and necessary conditions for self-replicators to emerge. This is especially true on "computational substrates" where interactions involve logical, mathematical, or programming rules. In this paper we take a step towards understanding how self-replicators arise by studying several computational substrates based on various simple programming languages and machine instruction sets. We show that when random, non self-replicating programs are placed in an environment lacking any explicit fitness landscape, self-replicators tend to arise. We demonstrate how this occurs due to random interactions and self-modification, and can happen with and without background random mutations. 
We also show how increasingly complex dynamics continue to emerge following the rise of self-replicators. Finally, we show a counterexample of a minimalistic programming language where self-replicators are possible, but so far have not been observed to arise.	翻訳日:2024-08-05 17:43:44 公開日:2024-08-02
# Chat AI: HPCベースのサービスのためのシームレスなSlurmネイティブソリューション Chat AI: A Seamless Slurm-Native Solution for HPC-Based Services ( http://arxiv.org/abs/2407.00110v2 ) ライセンス: Link先を確認	Ali Doosthosseini, Jonathan Decker, Hendrik Nolte, Julian M. Kunkel,	(参考訳) 大規模言語モデル(LLM)の普及により、効率的でセキュアかつプライベートなサービス提供インフラストラクチャの必要性が高まっている。最先端GPUを備えた高性能コンピューティング(HPC)システムは、LLMのトレーニングに適しているが、そのバッチスケジューリングパラダイムは、AIアプリケーションのリアルタイム提供をサポートするようには設計されていない。一方、クラウドシステムはWebサービスには適しているが、一般にHPCクラスタの計算能力、特に最適な推論速度に必要な高価で希少なハイエンドGPUにアクセスできない。本稿では,HPCシステム上で多数のLLMを実行するスケーラブルなバックエンドにセキュアにアクセスできる,クラウドVM上で動作するWebサービスから成る実装を伴うアーキテクチャを提案する。LLMをホストするHPCインフラストラクチャを使用したWebサービスを提供することで、地域の大学や研究センターの信頼された環境を活用し、商用LLMサービスに代わるプライベートでセキュアな選択肢を提供する。我々のソリューションは、HPCバッチスケジューラSlurmとネイティブに統合され、HPCクラスタへのシームレスなデプロイを可能にし、Slurmが生成したスケジュールのギャップを利用して、通常のSlurmワークロードと並行して実行できる。HPCシステムのセキュリティを確保するため、SSH ForceCommandディレクティブを用いて、Web公開サーバへの攻撃が成功してもクラスタに影響が及ばないようにする堅牢なサーキットブレーカーを構築する。我々はシステムを実運用サービスとして展開することに成功し、ソースコードを https://github.com/gwdg/chat-ai で公開した。 The widespread adoption of large language models (LLMs) has created a pressing need for an efficient, secure and private serving infrastructure, which allows researchers to run open-source or custom fine-tuned LLMs and assures users that their data remains private and is not stored without their consent. While high-performance computing (HPC) systems equipped with state-of-the-art GPUs are well-suited for training LLMs, their batch scheduling paradigm is not designed to support real-time serving of AI applications. Cloud systems, on the other hand, are well suited for web services but commonly lack access to the computational power of HPC clusters, especially expensive and scarce high-end GPUs, which are required for optimal inference speed. We propose an architecture with an implementation consisting of a web service that runs on a cloud VM with secure access to a scalable backend running a multitude of LLMs on HPC systems. 
By offering a web service using our HPC infrastructure to host LLMs, we leverage the trusted environment of local universities and research centers to offer a private and secure alternative to commercial LLM services. Our solution natively integrates with the HPC batch scheduler Slurm, enabling seamless deployment on HPC clusters, and is able to run side by side with regular Slurm workloads, while utilizing gaps in the schedule created by Slurm. In order to ensure the security of the HPC system, we use the SSH ForceCommand directive to construct a robust circuit breaker, which prevents successful attacks on the web-facing server from affecting the cluster. We have successfully deployed our system as a production service, and made the source code available at https://github.com/gwdg/chat-ai.	翻訳日:2024-08-05 17:43:44 公開日:2024-08-02
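The abstract uses OpenSSH's `ForceCommand` directive as a circuit breaker. With it set, sshd ignores whatever command the web-facing server requests and runs a fixed program instead. The requested command is passed to that program in the `SSH_ORIGINAL_COMMAND` environment variable. Below is a minimal sketch of such a gate. It assumes a hypothetical allowlist of actions (`status`, `start_model`, `stop_model`). It is not the Chat AI implementation, and the hand-off to Slurm is left out.

```python
import os
import shlex
import sys

# Hypothetical allowlist: action name -> number of arguments it accepts.
ALLOWED_ACTIONS = {"status": 0, "start_model": 1, "stop_model": 1}


def parse_request(raw: str) -> list[str]:
    """Validate the command requested over SSH; raise ValueError if it is not allowed."""
    try:
        argv = shlex.split(raw)
    except ValueError as exc:  # e.g. unbalanced quotes
        raise ValueError(f"unparseable command: {exc}") from None
    if not argv or argv[0] not in ALLOWED_ACTIONS:
        raise ValueError(f"action not allowed: {raw!r}")
    if len(argv) - 1 != ALLOWED_ACTIONS[argv[0]]:
        raise ValueError(f"wrong number of arguments for {argv[0]!r}")
    # Arguments must be plain ASCII identifiers: no options, paths or shell metacharacters.
    for arg in argv[1:]:
        plain = arg.replace("-", "").replace("_", "")
        if arg.startswith("-") or not arg.isascii() or not plain.isalnum():
            raise ValueError(f"suspicious argument: {arg!r}")
    return argv


def main() -> int:
    """Entry point for the forced command; returns a process exit status."""
    raw = os.environ.get("SSH_ORIGINAL_COMMAND", "")
    try:
        argv = parse_request(raw)
    except ValueError as err:
        print(f"rejected: {err}", file=sys.stderr)
        return 1
    # A real gate would hand the validated request to the scheduler here; this sketch only echoes it.
    print("accepted:", " ".join(argv))
    return 0
```

In `sshd_config`, such a script would be attached with a line like `ForceCommand /usr/local/bin/chat_gate.py` inside a `Match User` block for the service account. The path and user are hypothetical. The installed script would end with `sys.exit(main())`. Because only allowlisted actions with identifier-like arguments get through, a compromised web server cannot open a shell on the cluster.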
# 近似ベイズ計算による量子系パラメータの効率的な推論 Efficient inference of quantum system parameters by Approximate Bayesian Computation ( http://arxiv.org/abs/2407.00724v2 ) ライセンス: Link先を確認	Lewis A. Clark, Jan Kolodynski,	(参考訳) システムパラメータを効率的に推論する能力は、高速な動作を必要とするあらゆる信号処理タスクにおいて不可欠である。量子系を扱う場合、基礎となるヒルベルト空間が系のサイズとともに大きく増大するため、深刻な課題が生じる。観測された測定データの統計、すなわち尤度がもはや容易に計算できなくなるため、最尤推定器や粒子フィルタのような一般的な手法は実用的でなくなる。この問題に対処するために、与えられた量子デバイスに対して事前に用意された測定データのライブラリからサンプリングすることで尤度計算を回避する、近似ベイズ計算(ABC)アルゴリズムの利用を提案する。本研究では,2準位原子とオプトメカニカル系をリアルタイムに探査する際に生じる光検出クリックパターンの解釈にABCを適用する。後者については、線形と非線形の両方の領域を考察し、量子測定統計を理解することによってABCアルゴリズムを調整する方法を示す。我々の研究は、量子デバイスや関与する測定方式の複雑さにかかわらず、高速なパラメータ推論が可能となりうることを示している。 The ability to efficiently infer system parameters is essential in any signal-processing task that requires fast operation. When dealing with quantum systems, a serious challenge arises from the substantial growth of the underlying Hilbert space with system size. As the statistics of the observed measurement data, i.e. the likelihood, can no longer be easily computed, common approaches such as maximum-likelihood estimators or particle filters become impractical. To address this issue, we propose the use of the Approximate Bayesian Computation (ABC) algorithm, which evades likelihood computation by sampling from a library of measurement data prepared a priori for a given quantum device. We apply ABC to interpret photodetection click-patterns arising when probing a two-level atom and an optomechanical system in real time. For the latter, we consider both linear and non-linear regimes, in order to show how to tailor the ABC algorithm by understanding the quantum measurement statistics. Our work demonstrates that fast parameter inference may be possible no matter the complexity of a quantum device and the measurement scheme involved.	翻訳日:2024-08-05 17:43:44 公開日:2024-08-02
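The key point of ABC in the abstract is that no likelihood is ever evaluated. Observed data is compared with simulated data taken from a library prepared in advance. Below is a minimal rejection-ABC sketch under two assumptions. The detector is replaced by a toy stand-in in which each time bin clicks independently with probability `rate`. The click fraction serves as the summary statistic. The paper's quantum models, summaries and tolerances are not reproduced here.

```python
import random
import statistics


def simulate_clicks(rate: float, n_bins: int, rng: random.Random) -> list[int]:
    """Toy detector record: each time bin independently registers a click with probability `rate`."""
    return [1 if rng.random() < rate else 0 for _ in range(n_bins)]


def summary(clicks: list[int]) -> float:
    """Summary statistic compared by ABC: the fraction of bins with a click."""
    return sum(clicks) / len(clicks)


def build_library(n_entries: int, n_bins: int, seed: int = 0) -> list[tuple[float, float]]:
    """Prepared a priori: (parameter, simulated summary) pairs, parameters drawn from a uniform prior."""
    rng = random.Random(seed)
    library = []
    for _ in range(n_entries):
        theta = rng.uniform(0.0, 1.0)
        library.append((theta, summary(simulate_clicks(theta, n_bins, rng))))
    return library


def abc_rejection(observed: list[int], library: list[tuple[float, float]], epsilon: float) -> list[float]:
    """Approximate posterior samples: parameters whose simulated summary lies within epsilon of the data's."""
    s_obs = summary(observed)
    return [theta for theta, s in library if abs(s - s_obs) <= epsilon]


# Usage: infer an unknown click rate of 0.3 without ever writing down a likelihood.
observed = simulate_clicks(0.3, 200, random.Random(42))
library = build_library(5000, 200, seed=1)
posterior = abc_rejection(observed, library, epsilon=0.02)
print(len(posterior), statistics.mean(posterior))
```

Shrinking `epsilon` tightens the approximation but accepts fewer library entries. Because the library is built once, inference at run time is only a scan over the stored summaries. This is what makes the approach attractive for fast operation.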
# OpenVid-1M:テキスト・ビデオ・ジェネレーションのための大規模高品質データセット OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation ( http://arxiv.org/abs/2407.02371v2 ) ライセンス: Link先を確認	Kepan Nan, Rui Xie, Penghao Zhou, Tiehan Fan, Zhenheng Yang, Zhijie Chen, Xiang Li, Jian Yang, Ying Tai,	(参考訳) テキスト・ツー・ビデオ(T2V)生成は、大規模なマルチモダリティモデルであるSoraのおかげで、近年大きな注目を集めている。しかし、T2V生成には依然として2つの重要な課題がある。1) 正確なオープンソースの高品質データセットの欠如。WebVid-10MやPanda-70Mといった従来の人気ビデオデータセットは、低品質であるか、ほとんどの研究機関にとって大きすぎる。したがって、T2V生成のために正確な高品質のテキスト・ビデオペアを収集することは困難であるが、極めて重要である。2) テキスト情報を十分に活用していないこと。近年のT2V手法は、ビデオ生成に単純なクロスアテンションモジュールを用いる視覚トランスフォーマーに焦点を当てており、テキストプロンプトから意味情報を十分に抽出できていない。これらの問題に対処するために,表現力豊かなキャプションを備えた高精度な高品質データセットOpenVid-1Mを導入する。このオープンシナリオデータセットには100万以上のテキスト・ビデオペアが含まれており、T2V生成の研究を促進する。さらに、OpenVid-1Mから433Kの1080pビデオを厳選してOpenVidHD-0.4Mを作成し、高精細ビデオ生成を前進させる。加えて,視覚トークンから構造情報を、テキストトークンから意味情報を抽出できる新しいマルチモーダルビデオ拡散トランスフォーマー(MVDiT)を提案する。大規模な実験とアブレーション研究により,OpenVid-1Mの従来のデータセットに対する優位性とMVDiTの有効性が検証された。 Text-to-video (T2V) generation has recently garnered significant attention thanks to the large multi-modality model Sora. However, T2V generation still faces two important challenges: 1) Lack of a precise, open-source, high-quality dataset. Previously popular video datasets, e.g. WebVid-10M and Panda-70M, are either of low quality or too large for most research institutions. Therefore, it is challenging but crucial to collect precise, high-quality text-video pairs for T2V generation. 2) Failure to fully utilize textual information. Recent T2V methods have focused on vision transformers, using a simple cross-attention module for video generation, which falls short of thoroughly extracting semantic information from text prompts. To address these issues, we introduce OpenVid-1M, a precise high-quality dataset with expressive captions. This open-scenario dataset contains over 1 million text-video pairs, facilitating research on T2V generation. 
Furthermore, we curate 433K 1080p videos from OpenVid-1M to create OpenVidHD-0.4M, advancing high-definition video generation. Additionally, we propose a novel Multi-modal Video Diffusion Transformer (MVDiT) capable of mining both structure information from visual tokens and semantic information from text tokens. Extensive experiments and ablation studies verify the superiority of OpenVid-1M over previous datasets and the effectiveness of our MVDiT.	翻訳日:2024-08-05 15:50:45 公開日:2024-08-02
# D-Rax:マルチモーダルデータとeXpertモデル予測を利用したドメイン固有放射線アシスタント D-Rax: Domain-specific Radiologic assistant leveraging multi-modal data and eXpert model predictions ( http://arxiv.org/abs/2407.02604v2 ) ライセンス: Link先を確認	Hareem Nisar, Syed Muhammad Anwar, Zhifan Jiang, Abhijeet Parida, Ramon Sanchez-Jacob, Vishwesh Nath, Holger R. Roth, Marius George Linguraru,	(参考訳) 大規模視覚言語モデル(VLM)は、研究段階から汎用的なユースケースへの適用に至るまで、驚くほど進歩している。LLaVA-Medは、生物医学のための先駆的な大規模言語・視覚アシスタントであり、マルチモーダルな生物医学画像およびデータの解析を行い、放射線科医に自然言語インタフェースを提供することができる。高い汎化性を持ち、マルチモーダルデータを扱えるものの、現在は大規模言語モデルの分野に存在するよく知られた課題によって制限されている。幻覚や応答の不正確さは誤診につながる可能性があり、これが現在VLMの臨床応用を妨げている。医療において正確でユーザフレンドリなモデルを作成するために、我々はD-Raxを提案する。D-Raxは、特定の放射線画像についての洞察を得るために使用できる、ドメイン固有の会話型放射線診断支援ツールである。本研究では,胸部X線画像(CXR)の会話的解析を強化して放射線科の報告作成を支援し,医用画像からの包括的な洞察を提供するとともに正確な診断の策定を支援する。D-Raxは、MIMIC-CXR画像データから得られた画像、指示、疾患診断および人口統計学的予測、CXR関連の視覚的質問応答(VQA)ペア、ならびに複数の専門家AIモデルによる予測結果から成る、我々が整備した拡張指示追従データでLLaVA-Medアーキテクチャを微調整することで実現される。オープンエンド型とクローズドエンド型の両方の会話で評価したところ、応答に統計的に有意な改善が認められた。最先端の診断モデルの能力をVLMと組み合わせることで、D-Raxは臨床医が自然言語を使って医療画像と対話することを可能にし、意思決定プロセスの合理化、診断精度の向上、時間の節約につながる可能性がある。 Large vision-language models (VLMs) have progressed remarkably, moving from research to general-purpose applications. LLaVA-Med, a pioneering large language and vision assistant for biomedicine, can perform multi-modal biomedical image and data analysis to provide a natural language interface for radiologists. While it is highly generalizable and works with multi-modal data, it is currently limited by well-known challenges that exist in the large language model space. Hallucinations and imprecision in responses can lead to misdiagnosis, and they currently hinder the clinical adaptability of VLMs. To create precise, user-friendly models in healthcare, we propose D-Rax, a domain-specific, conversational, radiologic assistance tool that can be used to gain insights about a particular radiologic image. 
In this study, we enhance the conversational analysis of chest X-ray (CXR) images to support radiological reporting, offering comprehensive insights from medical imaging and aiding in the formulation of accurate diagnoses. D-Rax is achieved by fine-tuning the LLaVA-Med architecture on our curated enhanced instruction-following data, comprising images, instructions, as well as disease diagnosis and demographic predictions derived from MIMIC-CXR imaging data, CXR-related visual question answer (VQA) pairs, and predictive outcomes from multiple expert AI models. We observe statistically significant improvement in responses when evaluated for both open- and closed-ended conversations. Leveraging the power of state-of-the-art diagnostic models combined with VLMs, D-Rax empowers clinicians to interact with medical images using natural language, which could potentially streamline their decision-making process, enhance diagnostic accuracy, and conserve their time.	翻訳日:2024-08-05 15:50:45 公開日:2024-08-02
# LLMs Plagiarize:知識グラフ比較による大規模言語モデル学習データの責任ある調達の確保 LLMs Plagiarize: Ensuring Responsible Sourcing of Large Language Model Training Data Through Knowledge Graph Comparison ( http://arxiv.org/abs/2407.02659v2 ) ライセンス: Link先を確認	Devam Mondal, Carlo Lipizzi,	(参考訳) 近年,出版社,新聞社,その他の著作権付きコーパスの作成者が,自らの著作物を訓練や微調整に利用する大規模言語モデル開発者に対して起こした法的申し立てを踏まえ,ある知識源が大規模言語モデルの訓練や微調整に使用されたかどうかを評価する,盗作検出システムの一種である新しいシステムを提案する。現在の手法とは異なり、我々はResource Description Framework(RDF)トリプルを使用して、ソース文書とその文書のLLMによる続きの両方から知識グラフを作成するアプローチを利用する。これらのグラフは、内容についてはコサイン類似度を用いて、構造については同型の度合いを示すグラフ編集距離の正規化版を用いて分析される。ソースとターゲットコーパス間の内容の一致やキーワードの特定に重点を置く従来の盗作検出システムとは異なり,提案手法は,アイデア間の関係とそれらの相互の構成に着目することで,ソース文書とLLMによる続きとの類似性を,より広く,より正確に評価することを可能にする。さらに,クローズドな大規模言語モデルの「ブラックボックス」システムでは利用できない場合があるパープレキシティなどのLLMメトリクスや,訓練コーパスへのアクセスも必要としない。こうして我々は,LLMがその続きの生成においてコーパスを「盗用」したかどうかを類似度指標によって評価する。本システムのプロトタイプはリンク先のGitHubリポジトリで公開される予定である。 In light of recent legal allegations brought by publishers, newspapers, and other creators of copyrighted corpora against large language model developers who use their copyrighted materials for training or fine-tuning purposes, we propose a novel system, a variant of a plagiarism detection system, that assesses whether a knowledge source has been used in the training or fine-tuning of a large language model. Unlike current methods, we utilize an approach that uses Resource Description Framework (RDF) triples to create knowledge graphs from both a source document and an LLM continuation of that document. These graphs are then analyzed with respect to content using cosine similarity and with respect to structure using a normalized version of graph edit distance that shows the degree of isomorphism. 
Unlike traditional plagiarism systems that focus on content matching and keyword identification between a source and a target corpus, our approach enables a broader and more accurate evaluation of similarity between a source document and an LLM continuation by focusing on relationships between ideas and how they are organized with regard to one another. Additionally, our approach requires access neither to LLM metrics like perplexity, which may be unavailable in closed large language model "black-box" systems, nor to the training corpus. We thus assess whether an LLM has "plagiarized" a corpus in its continuation through similarity measures. A prototype of our system will be made available in a linked GitHub repository.	翻訳日:2024-08-05 15:50:45 公開日:2024-08-02
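The two measures named in the abstract can be illustrated on RDF-style triples. This minimal sketch assumes the (subject, predicate, object) string triples have already been extracted. Content similarity is the cosine between bag-of-terms vectors. For the structural measure, nodes are assumed to be identified by their labels, with unit-cost insertions and deletions and no substitutions. Under that assumption, graph edit distance reduces to counting symmetric differences. The paper's exact graph construction and normalization may differ.

```python
import math
from collections import Counter

Triple = tuple[str, str, str]  # (subject, predicate, object)


def cosine_similarity(a: list[Triple], b: list[Triple]) -> float:
    """Content similarity: cosine between bag-of-terms vectors built from all triple elements."""
    ca = Counter(term for triple in a for term in triple)
    cb = Counter(term for triple in b for term in triple)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def normalized_edit_distance(a: list[Triple], b: list[Triple]) -> float:
    """Structural distance in [0, 1], assuming nodes are identified by their labels.

    With unit-cost node/edge insertions and deletions (no substitutions), the cheapest edit
    script adds or removes exactly the nodes and edges in the symmetric differences.
    """
    nodes_a = {n for s, _, o in a for n in (s, o)}
    nodes_b = {n for s, _, o in b for n in (s, o)}
    edges_a, edges_b = set(a), set(b)
    ged = len(nodes_a ^ nodes_b) + len(edges_a ^ edges_b)
    total = len(nodes_a) + len(nodes_b) + len(edges_a) + len(edges_b)
    return ged / total if total else 0.0


source = [("paris", "capital_of", "france"), ("france", "in", "europe")]
continuation = [("paris", "capital_of", "france"), ("paris", "in", "europe")]
print(cosine_similarity(source, continuation), normalized_edit_distance(source, continuation))
```

The example shows why both views are useful. The continuation reuses almost every term, so the cosine similarity is high. But it attaches "in europe" to a different node, so the structural distance is non-zero.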
# 機械学習を用いた連星中性子星のリアルタイム重力波推論 Real-time gravitational-wave inference for binary neutron stars using machine learning ( http://arxiv.org/abs/2407.09602v2 ) ライセンス: Link先を確認	Maximilian Dax, Stephen R. Green, Jonathan Gair, Nihar Gupte, Michael Pürrer, Vivien Raymond, Jonas Wildberger, Jakob H. Macke, Alessandra Buonanno, Bernhard Schölkopf,	(参考訳) 連星中性子星(BNS)の合体は、重力波(GW)と電磁波(EM)の両方のスペクトルで信号を放出する。よく知られているように、2017年のGW170817のマルチメッセンジャー観測は、宇宙論、原子核物理学、重力にわたる科学的発見につながった。これらの成果の中心となったのは、GWデータから得られた天球上の位置と距離であり、GW170817の場合、GW信号の11時間後に、関連するEM突発天体AT 2017gfoを特定するのに役立った。GWデータの高速解析は、時間に敏感なEM観測を誘導するために重要であるが、信号の長さと複雑さから生じる課題のため、精度を犠牲にする近似を行う必要があることが多い。本稿では,そのような近似を一切行わずに,わずか1秒で完全なBNS推論を行う機械学習フレームワークを提案する。提案手法は、(i) 合体前であっても正確な位置推定、(ii) 近似的な低遅延手法と比較して$\sim30\%$向上した位置推定精度、(iii) 高価な望遠鏡時間の優先付けに利用できる光度距離、軌道傾斜角、質量に関する詳細な情報、を提供することでマルチメッセンジャー観測を強化する。さらに、我々の手法の柔軟性と低コストは、状態方程式研究の新たな機会を開く。最後に,提案手法が最大1時間に及ぶ極めて長い信号にもスケールし,次世代の地上・宇宙重力波検出器のデータ解析の青写真となることを示す。 Mergers of binary neutron stars (BNSs) emit signals in both the gravitational-wave (GW) and electromagnetic (EM) spectra. Famously, the 2017 multi-messenger observation of GW170817 led to scientific discoveries across cosmology, nuclear physics, and gravity. Central to these results were the sky localization and distance obtained from GW data, which, in the case of GW170817, helped to identify the associated EM transient, AT 2017gfo, 11 hours after the GW signal. Fast analysis of GW data is critical for directing time-sensitive EM observations; however, due to challenges arising from the length and complexity of signals, it is often necessary to make approximations that sacrifice accuracy. Here, we present a machine learning framework that performs complete BNS inference in just one second without making any such approximations. 
Our approach enhances multi-messenger observations by providing (i) accurate localization even before the merger; (ii) improved localization precision by $\sim30\%$ compared to approximate low-latency methods; and (iii) detailed information on luminosity distance, inclination, and masses, which can be used to prioritize expensive telescope time. Additionally, the flexibility and reduced cost of our method open new opportunities for equation-of-state studies. Finally, we demonstrate that our method scales to extremely long signals, up to an hour in length, thus serving as a blueprint for data analysis for next-generation ground- and space-based detectors.	翻訳日:2024-08-05 15:50:45 公開日:2024-08-02
# 大規模言語モデルの LoRA に関する調査 A Survey on LoRA of Large Language Models ( http://arxiv.org/abs/2407.11046v2 ) ライセンス: Link先を確認	Yuren Mao, Yuhang Ge, Yijiang Fan, Wenyi Xu, Yu Mi, Zhonghao Hu, Yunjun Gao,	(参考訳) Low-Rank Adaptation (LoRA)は、密なニューラルネットワーク層をプラグ可能な低ランク行列で更新する、最も性能の高いパラメータ効率的微調整パラダイムの1つである。さらに、タスク横断的な汎化とプライバシ保護において大きな利点がある。したがって、LoRAは近年大きな注目を集めており、関連文献の数は指数関数的に増加している。LoRAの現在の進展を包括的に概観する必要がある。本調査は,(1)下流タスクにおけるLoRAの性能を向上させる、下流適応改善のための変種、(2)複数のLoRAプラグインを混合してタスク横断的汎化を実現する手法、(3)LoRAの計算効率を高める効率改善手法、(4)連合学習においてLoRAを用いるデータプライバシ保護手法、(5)応用、の観点から進展を分類し,レビューする。また,本調査ではこの分野の今後の方向性についても論じる。最後に、読者が本調査論文の更新を確認し議論を始められるよう、Githubページ(https://github.com/ZJU-LLMs/Awesome-LoRAs.git)を提供する。 Low-Rank Adaptation (LoRA), which updates dense neural network layers with pluggable low-rank matrices, is one of the best-performing parameter-efficient fine-tuning paradigms. Furthermore, it has significant advantages in cross-task generalization and privacy preservation. Hence, LoRA has gained much attention recently, and the related literature has grown exponentially. It is necessary to conduct a comprehensive overview of the current progress on LoRA. This survey categorizes and reviews the progress from the perspectives of (1) downstream adaptation improving variants that improve LoRA's performance on downstream tasks; (2) cross-task generalization methods that mix multiple LoRA plugins to achieve cross-task generalization; (3) efficiency-improving methods that boost the computational efficiency of LoRA; (4) data privacy-preserving methods that use LoRA in federated learning; and (5) applications. Besides, this survey also discusses future directions in this field. Finally, we provide a Github page (https://github.com/ZJU-LLMs/Awesome-LoRAs.git) for readers to check the updates and initiate discussions on this survey paper.	翻訳日:2024-08-05 15:50:45 公開日:2024-08-02
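The update the survey describes can be written as y = W x + (alpha / r) B A x. The pre-trained weight W stays frozen. Only the low-rank factors A (r x d_in) and B (d_out x r) are trained. Below is a dependency-free sketch. It assumes the initialization of the original LoRA paper: Gaussian A and zero B, so the adapter starts as a no-op. Production code would use a tensor library instead of nested lists.

```python
import random


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen linear layer plus a pluggable rank-r update: y = W x + (alpha / r) * B (A x)."""

    def __init__(self, weight: list[list[float]], rank: int, alpha: float, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen, d_out x d_in
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # r x d_in
        self.B = [[0.0] * rank for _ in range(d_out)]  # d_out x r, zero so training starts from W
        self.scale = alpha / rank

    def trainable_params(self) -> int:
        """Only A and B are trained: r * (d_in + d_out) numbers instead of d_in * d_out."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def forward(self, x: list[float]) -> list[float]:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def merged_weight(self) -> list[list[float]]:
        """Fold the adapter into W (W + scale * B A) so inference costs nothing extra."""
        rank = len(self.A)
        return [
            [
                w_ij + self.scale * sum(self.B[i][k] * self.A[k][j] for k in range(rank))
                for j, w_ij in enumerate(row)
            ]
            for i, row in enumerate(self.weight)
        ]


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1, alpha=2.0)
print(layer.forward([1.0, 1.0]))  # [3.0, 7.0]: identical to W x while B is zero
```

Because B starts at zero, a freshly attached adapter reproduces the frozen model exactly. Swapping in a different (A, B) pair, or merging it into W, is what makes the adapters "pluggable".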
# 学習型画像圧縮の再考: 必要なのはコンテキストだけ Rethinking Learned Image Compression: Context is All You Need ( http://arxiv.org/abs/2407.11590v3 ) ライセンス: Link先を確認	Jixiang Luo,	(参考訳) 学習型画像圧縮(LIC)は近年、従来の手法と比較して急速に進歩しているため,本稿では「LICの限界はどこにあるのか?」という問いを論じる。そこで本稿では,この問題を2つの部分問題に分割する: 1) PSNRにおけるレート歪み性能の限界はどこにあるか? 2) 圧縮利得をさらに改善し、その限界に到達するにはどうすればよいか? そのために本稿では,LICの3つの構成要素であるエンコーダ,デコーダ,コンテキストモデルのパラメータをスケーリングすることの有効性を解析する。その結果、LICのスケーリングとは、LIC内のコンテキストモデルとデコーダをスケーリングすることであると結論付ける。大規模な実験により、過学習が実際に効果的なコンテキストとして機能しうることが示された。コンテキストを最適化することにより、本稿はPSNRをさらに改善して最先端の性能を達成し、VVCに対してBD-rateで14.39%の性能向上を示す。 Since learned image compression (LIC) has recently made rapid progress compared with traditional methods, this paper attempts to address the question 'Where is the boundary of LIC?'. It splits this problem into two sub-problems: 1) Where is the boundary of rate-distortion performance in terms of PSNR? 2) How can the compression gain be further improved to reach that boundary? To this end, this paper analyzes the effectiveness of scaling the parameters of the encoder, decoder and context model, which are the three components of LIC. It then concludes that scaling LIC means scaling the context model and decoder within LIC. Extensive experiments demonstrate that overfitting can actually serve as an effective context. By optimizing the context, this paper further improves PSNR and achieves state-of-the-art performance, showing a performance gain of 14.39% in BD-rate over VVC.	翻訳日:2024-08-05 15:50:45 公開日:2024-08-02
# CCVA-FL:医療画像のための適応的フェデレーション学習 CCVA-FL: Cross-Client Variations Adaptive Federated Learning for Medical Imaging ( http://arxiv.org/abs/2407.11652v3 ) ライセンス: Link先を確認	Sunny Gupta, Amit Sethi,	(参考訳) Federated Learning(FL)は、分散データ上でモデルをトレーニングするためのプライバシ保護アプローチを提供する。医療におけるそのポテンシャルは重要であるが、制限されたアノテーションによって悪化する医療画像データの横断的変動によって、課題が生じる。本稿では,これらの問題に対処するため,CCVA-FL(Cross-Client Variations Adaptive Federated Learning)を提案する。 CCVA-FLは、画像を共通の特徴空間に変換することで、クロスクライアントの変動を最小限にすることを目的としている。各クライアントからのイメージのサブセットを専門的にアノテーションし、続いてターゲットとして最もデータ複雑性の低いクライアントを選択する。次に、ターゲットクライアントの注釈付き画像に基づいて、変換器付きスケーラブル拡散モデル(DiT)を用いて合成医療画像を生成する。これらの合成画像は多様性を捉え、元のデータを表現し、他のクライアントと共有する。各クライアントは、画像から画像への変換を使用して、そのローカル画像を対象のイメージ空間に変換する。翻訳された画像は、その後、サーバモデルを開発するための連合学習設定で使用される。その結果、CCVA-FLはプライバシーを損なうことなく、クライアント間でのデータ分散の違いを効果的に解決することで、Vanilla Federated Averagingよりも優れていることが示された。 Federated Learning (FL) offers a privacy-preserving approach to train models on decentralized data. Its potential in healthcare is significant, but challenges arise due to cross-client variations in medical image data, exacerbated by limited annotations. This paper introduces Cross-Client Variations Adaptive Federated Learning (CCVA-FL) to address these issues. CCVA-FL aims to minimize cross-client variations by transforming images into a common feature space. It involves expert annotation of a subset of images from each client, followed by the selection of a client with the least data complexity as the target. Synthetic medical images are then generated using Scalable Diffusion Models with Transformers (DiT) based on the target client's annotated images. These synthetic images, capturing diversity and representing the original data, are shared with other clients. Each client then translates its local images into the target image space using image-to-image translation. The translated images are subsequently used in a federated learning setting to develop a server model. 
Our results demonstrate that CCVA-FL outperforms Vanilla Federated Averaging by effectively addressing data distribution differences across clients without compromising privacy.	翻訳日:2024-08-05 15:50:45 公開日:2024-08-02
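CCVA-FL is compared against vanilla Federated Averaging (FedAvg). In FedAvg's server step, client model parameters are averaged with weights proportional to how much local data each client holds. Below is a minimal sketch of that baseline aggregation on flattened parameter vectors. It does not include CCVA-FL's image-translation step, and the plain-list parameter representation is only an illustrative assumption.

```python
def fed_avg(client_weights: list[list[float]], client_sizes: list[int]) -> list[float]:
    """FedAvg server step: data-size-weighted mean of the clients' parameter vectors."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one data size per client")
    if any(len(w) != len(client_weights[0]) for w in client_weights):
        raise ValueError("all clients must share the same parameter shape")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(len(client_weights[0]))
    ]


# Two clients: the one with 3x more data pulls the global model 3x harder.
print(fed_avg([[1.0, 0.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.0]
```

This averaging treats all clients' data as if it came from one distribution. That mismatch is the gap CCVA-FL targets by first translating every client's images into a shared target space.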
# GenRC: スパースな画像集合からの生成的3D部屋補完 GenRC: Generative 3D Room Completion from Sparse Image Collections ( http://arxiv.org/abs/2407.12939v3 ) ライセンス: Link先を確認	Ming-Feng Li, Yueh-Feng Ku, Hong-Xuan Yen, Chi Liu, Yu-Lun Liu, Albert Y. C. Chen, Cheng-Hao Kuo, Min Sun,	(参考訳) スパースRGBDシーンの補完は、特にシーン全体を通して一貫したテクスチャやジオメトリを考慮する場合、難しい課題である。人間が設計したテキストプロンプトや事前定義されたカメラ軌跡に依存する既存のソリューションとは異なり,高忠実度テクスチャを備えた部屋規模の3Dメッシュを補完する,自動化された訓練不要のパイプラインであるGenRCを提案する。これを実現するために、まず、スパースRGBD画像を非常に不完全な3Dメッシュに投影する。空白を埋めるために新規ビューを反復的に生成する代わりに,提案したE-Diffusionを用いて,大域的な幾何と外観の一貫性を保証するビュー一貫性のあるパノラマRGBD画像を生成する。さらに,人間が設計したテキストプロンプトを置き換えるテキスト反転(textual inversion)により,入力シーンと出力シーンのスタイルの一貫性を維持する。データセット間のドメインギャップを埋めるために、E-Diffusionは大規模データセットで訓練されたモデルを活用して、多様な外観を生成する。GenRCは、これらのデータセットで訓練されておらず、事前定義されたカメラ軌跡も使用していないにもかかわらず、ScanNetとARKitScenesデータセットにおいて、ほとんどの外観および幾何の指標で最先端の手法を上回る。プロジェクトページ:https://minfenli.github.io/GenRC Sparse RGBD scene completion is a challenging task, especially when considering consistent textures and geometries throughout the entire scene. Unlike existing solutions that rely on human-designed text prompts or predefined camera trajectories, we propose GenRC, an automated training-free pipeline to complete a room-scale 3D mesh with high-fidelity textures. To achieve this, we first project the sparse RGBD images to a highly incomplete 3D mesh. Instead of iteratively generating novel views to fill in the void, we use our proposed E-Diffusion to generate a view-consistent panoramic RGBD image, which ensures global geometry and appearance consistency. Furthermore, we maintain stylistic consistency between the input and output scenes through textual inversion, which replaces human-designed text prompts. To bridge the domain gap among datasets, E-Diffusion leverages models trained on large-scale datasets to generate diverse appearances. 
GenRC outperforms state-of-the-art methods under most appearance and geometric metrics on the ScanNet and ARKitScenes datasets, even though it is not trained on these datasets and does not use predefined camera trajectories. Project page: https://minfenli.github.io/GenRC	翻訳日:2024-08-05 15:50:45 公開日:2024-08-02
# 生成型AIと大規模言語モデルの最近の進歩:現状,課題,展望 Recent Advances in Generative AI and Large Language Models: Current Status, Challenges, and Perspectives ( http://arxiv.org/abs/2407.14962v3 ) ライセンス: Link先を確認	Desta Haileselassie Hagos, Rick Battle, Danda B. Rawat,	(参考訳) 生成人工知能(AI)とLarge Language Models(LLMs)の出現は、さまざまなドメインに革命をもたらす前例のない機能を導入し、自然言語処理(NLP)の新しい時代を象徴している。本稿では,これらの最先端技術の現状を概観し,その顕著な進歩と広範囲な応用を実証する。本稿では,ジェネレーティブAIとLLMの進化途上における技術的基盤,実践的応用,新たな課題に関する総合的な視点の提供に寄与する。我々は、AIシステムの生成能力とLLMの特定のコンテキストを理解することは、研究者、実践者、政策立案者にとって、これらの技術の責任と倫理的統合を様々な領域に協調的に形成することが不可欠であると考えている。さらに、主要な研究ギャップを特定し、対処し、AI研究コミュニティにおける将来の研究成果をガイドするための貴重な洞察を提供する。 The emergence of Generative Artificial Intelligence (AI) and Large Language Models (LLMs) has marked a new era of Natural Language Processing (NLP), introducing unprecedented capabilities that are revolutionizing various domains. This paper explores the current state of these cutting-edge technologies, demonstrating their remarkable advancements and wide-ranging applications. Our paper contributes to providing a holistic perspective on the technical foundations, practical applications, and emerging challenges within the evolving landscape of Generative AI and LLMs. We believe that understanding the generative capabilities of AI systems and the specific context of LLMs is crucial for researchers, practitioners, and policymakers to collaboratively shape the responsible and ethical integration of these technologies into various domains. Furthermore, we identify and address main research gaps, providing valuable insights to guide future research endeavors within the AI research community.	翻訳日:2024-08-05 15:50:45 公開日:2024-08-02
# サーキットブレーカのロバストアライメントの再検討 Revisiting the Robust Alignment of Circuit Breakers ( http://arxiv.org/abs/2407.15902v2 ) ライセンス: Link先を確認	Leo Schwinn, Simon Geisler,	(参考訳) 過去10年間で、敵対的訓練は、敵対的攻撃に対するモデルの堅牢性を高める数少ない信頼性の高い方法の1つとして登場した(Szegedy et al., 2014; Madry et al., 2018; Xhonneux et al., 2024)一方で、他の多くの代替手法はその後の厳密な評価に耐えられなかった。近年,LLMのアライメントにおいて有望な結果を示す新たな防御機構として,サーキットブレーカー(Zou et al., 2024)が提案されている。本報告では,入力トークンの埋め込み空間における制約のない連続攻撃に対する「Improving Alignment and Robustness with Circuit Breakers」のロバスト性の主張が過大評価されている可能性があることを示す(Zou et al., 2024)。具体的には、埋め込み空間攻撃(Schwinn et al., 2024a, b)にいくつかの簡単な変更を加えることで、サーキットブレーカーモデルに対して100%の攻撃成功率(ASR)を達成できることを実証する。追加のハイパーパラメータ調整を一切行わずとも、これらの調整により元の評価と比べてASRが80%以上増加する。コードは https://github.com/SchwinnL/circuit-breakers-eval で入手できる。 Over the past decade, adversarial training has emerged as one of the few reliable methods for enhancing model robustness against adversarial attacks [Szegedy et al., 2014; Madry et al., 2018; Xhonneux et al., 2024], while many alternative approaches have failed to withstand rigorous subsequent evaluations. Recently, an alternative defense mechanism, namely "circuit breakers" [Zou et al., 2024], has shown promising results for aligning LLMs. In this report, we show that the robustness claims of "Improving Alignment and Robustness with Circuit Breakers" against unconstrained continuous attacks in the embedding space of the input tokens may be overestimated [Zou et al., 2024]. Specifically, we demonstrate that by implementing a few simple changes to embedding space attacks [Schwinn et al., 2024a,b], we achieve a 100% attack success rate (ASR) against circuit breaker models. Without conducting any further hyperparameter tuning, these adjustments increase the ASR by more than 80% compared to the original evaluation. Code is accessible at: https://github.com/SchwinnL/circuit-breakers-eval	翻訳日:2024-08-05 15:50:45 公開日:2024-08-02
# 適応的勾配正則化法 An Adaptive Gradient Regularization Method ( http://arxiv.org/abs/2407.16944v3 ) ライセンス: Link先を確認	Huixiu Jiang, Ling Yang, Yu Bao, Rutong Si,	(参考訳) オプティマイザは、ニューラルネットワークを高い効率と性能で訓練するうえで重要な役割を果たす。勾配に基づく重みの更新は、オプティマイザの中心部分である。重みと勾配に対する正規化および標準化操作は、重み標準化(WS)、重み正規化(WN)、勾配正規化(GN)のように訓練過程を加速し性能を向上させることが示されており、勾配中心化(GC)もある。本研究では,勾配ベクトル内の勾配の大きさに基づく新しい最適化手法である適応的勾配正則化(AGR)を導入する。AGRは勾配ベクトルを全次元にわたって正規化して係数ベクトルとし,勾配とその係数ベクトルの積をバニラ勾配から減算する。これは適応的な勾配クリッピング法と見なすことができる。AGRが損失関数のリプシッツ性を改善し、より安定した訓練過程とより優れた汎化性能をもたらすことを示す。AGRはわずか3行のコードで、AdanやAdamWといったバニラオプティマイザに組み込むことができる。画像生成,画像分類,言語表現において実験を行い,AGRが訓練結果を改善することを示す。 The optimizer plays an important role in training neural networks with high efficiency and performance. The weight update based on the gradient is the central part of the optimizer. It has been shown that normalization and standardization operations on weights and gradients, such as weight standardization (WS), weight normalization (WN) and gradient normalization (GN), can accelerate the training process and improve performance; there is also gradient centralization (GC). In this work, we introduce a new optimization technique based on the gradient magnitudes in a gradient vector, named adaptive gradient regularization (AGR), which normalizes the gradient vector in all dimensions as a coefficient vector and subtracts the product of the gradient and its coefficient vector from the vanilla gradient. It can be viewed as an adaptive gradient clipping method. We show that AGR can improve the Lipschitzness of the loss function, giving a more stable training process and better generalization performance. AGR can be embedded into vanilla optimizers such as Adan and AdamW with only three lines of code. Our experiments in image generation, image classification and language representation show that AGR improves the training results.	翻訳日:2024-08-05 15:50:45 公開日:2024-08-02
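Read literally, the abstract says AGR builds a coefficient vector alpha with alpha_i = |g_i| / sum_j |g_j|. It then replaces the gradient g by g - alpha * g (element-wise), so the largest components are shrunk the most. Below is a minimal sketch under that reading. It is applied to one parameter tensor flattened to a list and is followed by a plain SGD step. The paper integrates AGR into AdamW and Adan, and its exact normalization details should be checked against the source.

```python
def agr(grad: list[float]) -> list[float]:
    """Adaptive gradient regularization as read from the abstract: g - alpha * g, alpha_i = |g_i| / sum|g|."""
    total = sum(abs(g) for g in grad)
    if total == 0.0:
        return list(grad)  # nothing to regularize
    return [g - (abs(g) / total) * g for g in grad]


def sgd_step(params: list[float], grad: list[float], lr: float) -> list[float]:
    """Plain SGD update using the regularized gradient in place of the raw one."""
    return [p - lr * g for p, g in zip(params, agr(grad))]


print(agr([1.0, -3.0]))  # [0.75, -0.75]: the dominant component is damped more
```

Each component keeps its sign and is scaled by 1 - alpha_i, which lies in [0, 1]. That is why the abstract describes AGR as an adaptive form of gradient clipping: the larger a component's share of the total magnitude, the more strongly it is damped.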
# 表現論的多重性のための量子アルゴリズム Quantum Algorithms for Representation-Theoretic Multiplicities ( http://arxiv.org/abs/2407.17649v2 ) ライセンス: Link先を確認	Martin Larocca, Vojtech Havlicek,	(参考訳) Kostka, Littlewood-Richardson, Plethysm および Kronecker 係数は、既約表現の制限および積における、対称群の既約表現(irrep)の重複度である。これらは表現論において重要な役割を担い、計算が難しいことで知られている。我々は、表現の次元の比が多項式である場合に、これらの係数を効率的に計算する量子アルゴリズムを与える。Kostka数が組合せ論的解釈を持つことを用いて、多項式で抑えられるKostka数に対する効率的な古典アルゴリズムが存在することを示し、Littlewood-Richardson係数に対しても同様のアルゴリズムが存在すると予想する。同じ古典アルゴリズムがPlethysm係数やKronecker係数に対してはそのままでは機能しない理由を論じ、我々の量子アルゴリズムがそれらの計算における困難さの障害の一部をどのように回避しうるかについての証拠を示し、この問題が一部の入力に対して超多項式的な量子高速化につながりうると予想する。最後に、Frobeniusの相互律を用いて、誘導表現を用いてこれらの係数を推定する別の量子アルゴリズムを導出する。このアルゴリズムは入力に対するコストの依存性が異なる。 Kostka, Littlewood-Richardson, Plethysm and Kronecker coefficients are multiplicities of irreducible representations (irreps) of the symmetric group in restrictions and products of irreps. They play an important role in representation theory and are notoriously hard to compute. We give quantum algorithms that efficiently compute these coefficients whenever the ratio of dimensions of the representations is polynomial. Using the fact that Kostka numbers admit a combinatorial interpretation, we show that there is an efficient classical algorithm for polynomially bounded Kostka numbers and conjecture the existence of a similar algorithm for the Littlewood-Richardson coefficients. We argue why the same classical algorithm does not straightforwardly work for the Plethysm and Kronecker coefficients, give evidence of how our quantum algorithm may avoid some hardness obstructions in their computation, and conjecture that the problem could lead to superpolynomial quantum speedups on some inputs. Finally, we use Frobenius reciprocity to derive another quantum algorithm that estimates these coefficients using induction and has a different cost-to-input dependence.	翻訳日:2024-08-05 15:50:45 公開日:2024-08-02
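The combinatorial interpretation mentioned in the abstract is that the Kostka number K_{λ,μ} counts semistandard Young tableaux of shape λ and content μ. The small classical sketch below counts them by peeling off the largest entry, whose boxes must form a horizontal strip. Its running time is exponential in general. It only makes the definition concrete and does not reproduce the paper's efficient classical algorithm for polynomially bounded cases.

```python
from functools import lru_cache


def horizontal_strip_removals(shape: tuple[int, ...], k: int) -> list[tuple[int, ...]]:
    """All partitions nu with shape/nu a horizontal strip of exactly k boxes.

    shape/nu is a horizontal strip iff the rows interlace: shape[i+1] <= nu[i] <= shape[i].
    """
    results = []

    def rec(i: int, remaining: int, prefix: list[int]) -> None:
        if i == len(shape):
            if remaining == 0:
                results.append(tuple(p for p in prefix if p > 0))  # drop empty rows
            return
        lower = shape[i + 1] if i + 1 < len(shape) else 0
        for nu_i in range(shape[i], lower - 1, -1):
            removed = shape[i] - nu_i
            if removed > remaining:
                break
            rec(i + 1, remaining - removed, prefix + [nu_i])

    rec(0, k, [])
    return results


@lru_cache(maxsize=None)
def kostka(shape: tuple[int, ...], content: tuple[int, ...]) -> int:
    """K_{shape,content}: semistandard Young tableaux of `shape` with content[i] entries equal to i+1."""
    if sum(shape) != sum(content):
        return 0
    if not content:
        return 1  # only the empty tableau remains
    *rest, last = content
    # The boxes holding the largest entry form a horizontal strip; remove it and recurse.
    return sum(kostka(nu, tuple(rest)) for nu in horizontal_strip_removals(shape, last))


print(kostka((2, 1), (1, 1, 1)))  # 2
```

With content (1, 1, ..., 1), the count reduces to the number of standard Young tableaux of the shape. This is a quick sanity check: K_{(3,2),(1^5)} = 5 by the hook length formula.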
# データと知識の組み合わせの力:GPT-4oは肺癌リンパ節転移の予測に機械学習モデルを効果的に解釈する The Power of Combining Data and Knowledge: GPT-4o is an Effective Interpreter of Machine Learning Models in Predicting Lymph Node Metastasis of Lung Cancer ( http://arxiv.org/abs/2407.17900v3 ) ライセンス: Link先を確認	Danqing Hu, Bing Liu, Xiaofeng Zhu, Nan Wu,	(参考訳) リンパ節転移 (LNM) は肺癌患者の早期治療を決定する重要な因子であるが, 正確な術前診断は困難である。近年,大きな言語モデル (LLM) が注目されている。巨大なコーパスから学んだ広範な医学知識を活用して、LLMは臨床上の問題に対する確率を推定できるが、その性能は歴史的にデータ駆動機械学習モデルよりも劣っている。本稿では,LNM予測性能を向上させるために,LLMが取得した医療知識と機械学習モデルが同定した潜伏パターンを組み合わせた新しいアンサンブル手法を提案する。当初,患者データを用いた機械学習モデルを開発した。次に、患者データを機械学習モデルから予測される確率と統合するプロンプトテンプレートを設計した。その後,OpenAIが開発した最も先進的なLCMであるGPT-4oに,患者データに基づいてLNMの確率を推定し,機械学習出力を用いて推定を調整するように指示した。最後に,同じプロンプトを用いてGPT-4oから3つのアウトプットを収集し,これらの結果を最終予測としてアンサンブルした。提案手法を用いて,LNM予測におけるAUC値0.765,AP値0.415を達成し,ベースライン機械学習モデルと比較して予測性能を著しく向上させた。実験の結果, GPT-4oは, より正確なLNM予測を実現するために, 機械学習モデルによって予測される医療知識と確率を効果的に活用できることが示唆された。これらの結果から,LSMは臨床リスク予測タスクにおいて良好に機能し,臨床リスク予測に医療知識と患者データを統合するための新たなパラダイムを提供することが明らかとなった。 Lymph node metastasis (LNM) is a crucial factor in determining the initial treatment for patients with lung cancer, yet accurate preoperative diagnosis of LNM remains challenging. Recently, large language models (LLMs) have garnered significant attention due to their remarkable text generation capabilities. Leveraging the extensive medical knowledge learned from vast corpora, LLMs can estimate probabilities for clinical problems, though their performance has historically been inferior to data-driven machine learning models. In this paper, we propose a novel ensemble method that combines the medical knowledge acquired by LLMs with the latent patterns identified by machine learning models to enhance LNM prediction performance. Initially, we developed machine learning models using patient data. We then designed a prompt template to integrate the patient data with the predicted probability from the machine learning model. 
Subsequently, we instructed GPT-4o, the most advanced LLM developed by OpenAI, to estimate the likelihood of LNM based on patient data and then adjust the estimate using the machine learning output. Finally, we collected three outputs from GPT-4o using the same prompt and ensembled these results as the final prediction. Using the proposed method, our models achieved an AUC value of 0.765 and an AP value of 0.415 for LNM prediction, significantly improving predictive performance compared to baseline machine learning models. The experimental results indicate that GPT-4o can effectively leverage its medical knowledge and the probabilities predicted by machine learning models to achieve more accurate LNM predictions. These findings demonstrate that LLMs can perform well in clinical risk prediction tasks, offering a new paradigm for integrating medical knowledge and patient data in clinical predictions.	翻訳日:2024-08-05 15:50:45 公開日:2024-08-02
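The final step in the abstract ensembles three GPT-4o outputs and reports AUC. Below is a minimal sketch of that evaluation. It assumes the ensemble is a simple mean of the three probabilities per patient, since the abstract does not say how they were combined. AUC is computed as the probability that a randomly chosen positive case is scored above a randomly chosen negative one. The numbers are toy values, not data from the paper.

```python
from statistics import mean


def ensemble(runs: list[list[float]]) -> list[float]:
    """Average the probabilities from several runs of the same prompt (one list per run, one value per patient)."""
    return [mean(per_patient) for per_patient in zip(*runs)]


def roc_auc(scores: list[float], labels: list[int]) -> float:
    """AUC = P(score of a random positive > score of a random negative), ties counting one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy values for four patients from three runs; not data from the paper.
runs = [[0.2, 0.7, 0.4, 0.9], [0.4, 0.5, 0.4, 0.7], [0.3, 0.6, 0.1, 0.8]]
labels = [0, 1, 0, 1]
print(roc_auc(ensemble(runs), labels))  # 1.0
```

Averaging repeated runs smooths the run-to-run variability of sampled LLM outputs. AUC depends only on the ranking of these averaged scores, not on their calibration.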
# 小規模データセットでの効率的な学習のためのVision TransformerにおけるDepth-Wise畳み込み Depth-Wise Convolutions in Vision Transformers for Efficient Training on Small Datasets ( http://arxiv.org/abs/2407.19394v3 ) ライセンス: Link先を確認	Tianxiao Zhang, Wenju Xu, Bo Luo, Guanghui Wang,	(参考訳) Vision Transformer (ViT)はTransformerのエンコーダを利用し、画像をパッチに分割することでグローバル情報を捉え、様々なコンピュータビジョンタスクで優れた性能を達成している。しかし、ViTの自己注意機構は最初からグローバルなコンテキストを捉えるため、画像やビデオにおける隣り合うピクセル間の固有の関係を見落としている。Transformerは主にグローバル情報に焦点を当て、細かな局所的詳細を無視している。その結果、ViTは画像やビデオのデータセットでの学習時に帰納バイアスを欠いている。対照的に、畳み込みニューラルネットワーク(CNN)は局所的なフィルタに依存するため固有の帰納バイアスを持ち、少ないデータでもViTより効率的かつ高速に収束する。本稿では,Transformerブロック全体をバイパスするショートカットとして軽量なDepth-Wise ConvolutionモジュールをViTモデルに導入し,最小限のオーバーヘッドでローカルおよびグローバル両方の情報を捉えられるようにする。さらに、2つのアーキテクチャバリアントを導入する。1つはパラメータ削減のためにDepth-Wise Convolutionモジュールを複数のTransformerブロックで共有するもの、もう1つは異なるカーネルを持つ独立した並列Depth-Wise Convolutionモジュールを組み込み、ローカル情報の取得を強化するものである。提案手法は,画像分類のためのCIFAR-10, CIFAR-100, Tiny-ImageNet, ImageNet, および物体検出とインスタンスセグメンテーションのためのCOCOで評価され, 特に小規模データセットにおいて, 画像分類, 物体検出, インスタンスセグメンテーションにおけるViTモデルの性能を大幅に向上させる。ソースコードは https://github.com/ZTX-100/Efficient_ViT_with_DW でアクセスできる。 The Vision Transformer (ViT) leverages the Transformer's encoder to capture global information by dividing images into patches and achieves superior performance across various computer vision tasks. However, the self-attention mechanism of ViT captures the global context from the outset, overlooking the inherent relationships between neighboring pixels in images or videos. Transformers mainly focus on global information while ignoring the fine-grained local details. Consequently, ViT lacks inductive bias during image or video dataset training. In contrast, convolutional neural networks (CNNs), with their reliance on local filters, possess an inherent inductive bias, making them more efficient and quicker to converge than ViT with less data. 
In this paper, we present a lightweight Depth-Wise Convolution module as a shortcut in ViT models, bypassing entire Transformer blocks to ensure the models capture both local and global information with minimal overhead. Additionally, we introduce two architecture variants, allowing the Depth-Wise Convolution modules to be applied to multiple Transformer blocks for parameter savings, and incorporating independent parallel Depth-Wise Convolution modules with different kernels to enhance the acquisition of local information. The proposed approach significantly boosts the performance of ViT models on image classification, object detection and instance segmentation by a large margin, especially on small datasets, as evaluated on CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet for image classification, and COCO for object detection and instance segmentation. The source code can be accessed at https://github.com/ZTX-100/Efficient_ViT_with_DW.	翻訳日:2024-08-05 15:40:20 公開日:2024-08-02
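The shortcut idea can be sketched as a depth-wise convolution whose output is added to the output of the Transformer blocks it bypasses. A depth-wise convolution applies one k×k filter per channel and does not mix channels. This pure-Python sketch rests on three assumptions:

- The patch tokens have already been reshaped into a C×H×W grid.
- An identity function stands in for the bypassed blocks.
- `depthwise_conv2d`, `shortcut_block`, and `param_counts` are hypothetical names, not the API of the linked repository.

```python
def depthwise_conv2d(x, kernels):
    """Depth-wise 2D convolution (cross-correlation, as in DL frameworks),
    stride 1, zero padding that keeps the spatial size.

    x: list of C channels, each an H x W list of lists of numbers.
    kernels: list of C square kernels of odd size k; channel c uses only kernels[c].
    """
    if len(x) != len(kernels):
        raise ValueError("depth-wise convolution needs exactly one kernel per channel")
    out = []
    for chan, ker in zip(x, kernels):
        h, w, k = len(chan), len(chan[0]), len(ker)
        r = k // 2
        res = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - r, j + dj - r
                        if 0 <= ii < h and 0 <= jj < w:  # zero padding outside the grid
                            acc += ker[di][dj] * chan[ii][jj]
                res[i][j] = acc
        out.append(res)
    return out


def shortcut_block(x, kernels, blocks):
    """Hypothetical wiring: blocks(x) + depthwise_conv2d(x), i.e. the convolution
    path bypasses the wrapped Transformer blocks and is added element-wise."""
    y = blocks(x)
    d = depthwise_conv2d(x, kernels)
    return [[[a + b for a, b in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(y, d)]


def param_counts(channels, k):
    """Weight counts (no bias) of a k x k depth-wise conv vs a standard conv
    with the same number of input and output channels."""
    return channels * k * k, channels * channels * k * k


identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
x = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]  # C=2 channels on a 2x2 token grid
print(depthwise_conv2d(x, [identity, box]))
print(shortcut_block(x, [identity, box], blocks=lambda t: t))  # identity stand-in for the blocks
print(param_counts(384, 3))
```

The parameter count shows why the module is lightweight. A depth-wise filter needs C·k² weights, whereas a standard convolution with the same channel count needs C²·k².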
# X線画像におけるドメイン適応肺結節検出 Domain Adaptive Lung Nodule Detection in X-ray Image ( http://arxiv.org/abs/2407.19397v2 ) ライセンス: Link先を確認	Haifeng Zhao, Lixiang Jiang, Leilei Ma, Dengdi Sun, Yanping Fu,	(参考訳) 異なる医療センターの医療画像は様々なデータ分布を示しており、学習フェーズと適用フェーズの間のドメインシフトにより、肺結節検出の適応に大きな課題をもたらしている。従来の教師なしドメイン適応検出手法は、しばしばこのシフトに対応できず、最適とはいえない結果をもたらす。これらの課題を克服するために,mean teacherによる自己学習とコントラスト学習を活用した,肺結節検出のための新しいドメイン適応アプローチを提案する。まず、結節表現を洗練させ、結節と背景の区別を強化する階層的コントラスト学習戦略を提案する。第二に、異なるドメイン間の敵対的学習を通じてドメイン不変の特徴を捉えるために、nodule-level domain-invariant feature learning (NDL)モジュールを導入する。また,肺結節検出研究の進展を支援するために,X線画像の新たな注釈付きデータセットを提案する。複数のX線データセットで行った大規模な実験は、ドメインシフトの影響を緩和する本アプローチの有効性を示している。 Medical images from different healthcare centers exhibit varied data distributions, posing significant challenges for adapting lung nodule detection due to the domain shift between training and application phases. Traditional unsupervised domain adaptive detection methods often struggle with this shift, leading to suboptimal outcomes. To overcome these challenges, we introduce a novel domain adaptive approach for lung nodule detection that leverages mean teacher self-training and contrastive learning. First, we propose a hierarchical contrastive learning strategy to refine nodule representations and enhance the distinction between nodules and background. Second, we introduce a nodule-level domain-invariant feature learning (NDL) module to capture domain-invariant features through adversarial learning across different domains. Additionally, we propose a new annotated dataset of X-ray images to aid in advancing lung nodule detection research. Extensive experiments conducted on multiple X-ray datasets demonstrate the efficacy of our approach in mitigating domain shift impacts.	翻訳日:2024-08-05 15:40:20 公開日:2024-08-02
# XLIP:医療用言語画像事前学習のためのクロスモーダル・アテンション・マスクド・モデリング XLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training ( http://arxiv.org/abs/2407.19546v2 ) ライセンス: Link先を確認	Biao Wu, Yutong Xie, Zeyu Zhang, Minh Hieu Phan, Qi Chen, Ling Chen, Qi Wu,	(参考訳) 医療分野におけるVLP(Vision-and-Language Pretraining)は、画像テキストペアのコントラスト学習を利用して、タスク間の効果的な転移を実現する。しかし、マスク付きモデリング戦略を用いる現在のVLPアプローチは、医療領域に適用する際に2つの課題に直面している。第一に、現在のモデルは、医療データの不足のため、重要な病理的特徴を正確に再構築するのに苦労している。第二に、ほとんどの手法はペアの画像テキストデータか画像のみのデータのどちらか一方しか採用しておらず、ペアデータと非ペアデータの両方を組み合わせて活用できていない。そこで本稿では,非ペアデータを通じて病理学習と特徴学習を強化するXLIP(Masked modelling for medical Language-Image Pre-training)フレームワークを提案する。まず、注意マスク画像モデリング(AttMIM)とエンティティ駆動マスク言語モデリングモジュール(EntMLM)を導入する。これらはマルチモーダルな特徴の相互作用によって病理的な視覚トークンとテキストトークンの再構築を学習し、医療向けに強化された特徴を改善する。 AttMIMモジュールは、テキスト特徴に強く反応する画像特徴の一部をマスクする。これにより、XLIPは医療分野において非常に類似した画像データの再構成を効率的に改善できる。第二に、XLIPは疾患種別プロンプトを導入することで非ペアデータを活用し、マルチモーダル学習を強化する。実験結果から,XLIPは5つのデータセットにおけるゼロショットおよびファインチューニングの分類性能でSOTAを達成することがわかった。コードは https://github.com/White65534/XLIP で公開予定である。 Vision-and-language pretraining (VLP) in the medical field utilizes contrastive learning on image-text pairs to achieve effective transfer across tasks. Yet, current VLP approaches with the masked modelling strategy face two challenges when applied to the medical domain. First, current models struggle to accurately reconstruct key pathological features due to the scarcity of medical data. Second, most methods only adopt either paired image-text or image-only data, failing to exploit the combination of both paired and unpaired data. To this end, this paper proposes a XLIP (Masked modelling for medical Language-Image Pre-training) framework to enhance pathological learning and feature learning via unpaired data. First, we introduce the attention-masked image modelling (AttMIM) and entity-driven masked language modelling module (EntMLM), which learns to reconstruct pathological visual and textual tokens via multi-modal feature interaction, thus improving medical-enhanced features. 
The AttMIM module masks a portion of the image features that are highly responsive to textual features. This allows XLIP to improve the reconstruction of highly similar image data in medicine efficiency. Second, our XLIP capitalizes unpaired data to enhance multimodal learning by introducing disease-kind prompts. The experimental results show that XLIP achieves SOTA for zero-shot and fine-tuning classification performance on five datasets. Our code will be available at https://github.com/White65534/XLIP	翻訳日:2024-08-05 15:40:20 公開日:2024-08-02
# 明らかになったLLMの自然言語理解 LLMs' Understanding of Natural Language Revealed ( http://arxiv.org/abs/2407.19630v2 ) ライセンス: Link先を確認	Walid S. Saba,	(参考訳) 大規模言語モデル(LLM)は、言語を大規模にボトムアップかつデータ駆動でリバースエンジニアリングするという大規模な実験の結果である。多くの下流NLPタスクで有用であるにもかかわらず、LLMが記号変数に対する量化や記号変数の操作を必要とするタスク(例えば、計画や問題解決)で推論を行えないことは、多くの研究で示されている。しかし本稿では、LLMの得意分野とされる言語理解能力の検証に焦点をあてる。ここで示すように、LLMの言語理解能力は広く誇張されている。 LLMは(そのように設計されているため)人間のような一貫した言語を生成することが証明されているが、その言語理解能力は適切にテストされていない。特に、LLMの言語理解能力は、「テキスト生成」とは逆の操作を行うことで、具体的にはLLMにテキストの断片を入力として与え、LLMが何を「理解」したかを問うことで検証されるべきであると考えている。ここで示すように、そうすると、LLMは、大量に取り込んだテキストを記憶したことの副産物にすぎない非常に表層的な推論を超えて、言語を真に理解しているわけではないことが明らかになる。 Large language models (LLMs) are the result of a massive experiment in bottom-up, data-driven reverse engineering of language at scale. Despite their utility in a number of downstream NLP tasks, ample research has shown that LLMs are incapable of performing reasoning in tasks that require quantification over and the manipulation of symbolic variables (e.g., planning and problem solving); see for example [25][26]. In this document, however, we will focus on testing LLMs for their language understanding capabilities, their supposed forte. As we will show here, the language understanding capabilities of LLMs have been widely exaggerated. While LLMs have proven to generate human-like coherent language (since that's how they were designed), their language understanding capabilities have not been properly tested. In particular, we believe that the language understanding capabilities of LLMs should be tested by performing an operation that is the opposite of 'text generation' and specifically by giving the LLM snippets of text as input and then querying what the LLM "understood". As we show here, when doing so it will become apparent that LLMs do not truly understand language, beyond very superficial inferences that are essentially the byproduct of the memorization of massive amounts of ingested text.	翻訳日:2024-08-05 15:40:20 公開日:2024-08-02
# CP-Prompt:ドメイン・インクリメンタル継続学習のための構成に基づくクロスモーダル・プロンプト CP-Prompt: Composition-Based Cross-modal Prompting for Domain-Incremental Continual Learning ( http://arxiv.org/abs/2407.21043v2 ) ライセンス: Link先を確認	Yu Feng, Zhen Tian, Yifan Zhu, Zongfu Han, Haoran Luo, Guangwei Zhang, Meina Song,	(参考訳) クロスモーダルなドメイン・インクリメンタル学習(DIL)の鍵となる課題は、学習モデルが古いデータを忘れずに、同じタスクの下で異なる特徴分布を持つ新しいデータから継続的に学習できるようにすることである。しかし、ドメイン内知識の抽出とドメイン間の共通プロンプト戦略が欠如しているため、既存のトップ性能の手法は依然として高い忘却率を引き起こす。本稿では,限られたパラメータを学習して事前学習済みモデルに新しいドメインを学習させ,既存の特徴分布を忘れないようにする,シンプルで効果的なフレームワークCP-Promptを提案する。 CP-Promptは、マルチヘッド自己注意層にパーソナライズされたプロンプトを構成的に挿入することでドメイン内知識を捉え、その後、共通のプロンプト戦略でドメイン間知識を学習する。 CP-Promptは,広く評価されている3つのDILタスクにおいて,最先端のベースラインよりも優れていた。ソースコードは https://github.com/dannis97500/CP_Prompt で入手できる。 The key challenge of cross-modal domain-incremental learning (DIL) is to enable the learning model to continuously learn from novel data with different feature distributions under the same task without forgetting old ones. However, existing top-performing methods still cause high forgetting rates, by lacking intra-domain knowledge extraction and inter-domain common prompting strategy. In this paper, we propose a simple yet effective framework, CP-Prompt, by training limited parameters to instruct a pre-trained model to learn new domains and avoid forgetting existing feature distributions. CP-Prompt captures intra-domain knowledge by compositionally inserting personalized prompts on multi-head self-attention layers and then learns the inter-domain knowledge with a common prompting strategy. CP-Prompt shows superiority compared with state-of-the-art baselines among three widely evaluated DIL tasks. The source code is available at https://github.com/dannis97500/CP_Prompt.	翻訳日:2024-08-05 15:40:20 公開日:2024-08-02
# PERSOMA:パーソナライズされた言語プロンプティングのためのパーソナライズされたソフトプロンプトアダプタアーキテクチャ PERSOMA: PERsonalized SOft ProMpt Adapter Architecture for Personalized Language Prompting ( http://arxiv.org/abs/2408.00960v1 ) ライセンス: Link先を確認	Liam Hebert, Krishna Sayana, Ambarish Jash, Alexandros Karatzoglou, Sukhdeep Sodhi, Sumanth Doddapaneni, Yanli Cai, Dima Kuzmin,	(参考訳) ユーザの膨大なインタラクション履歴のニュアンスを理解することは、進化するユーザの好みに適応できる、正確でパーソナライズされた自然言語システムを構築するための鍵となる。そこで我々は,Personalized Soft Prompt AdapterアーキテクチャであるPERSOMAを紹介する。大規模言語モデルのための従来のパーソナライズされたプロンプト手法とは異なり、PERSOMAはユーザ履歴を効率的に捉えるための新しいアプローチを提供する。 LLMの入力として埋め込み表現を利用する最近の研究に基づき、自由形式のテキストとしてのインタラクションを再サンプリング・圧縮し、表現力豊かなソフトプロンプト埋め込みに変換することで、これを実現する。我々は,様々なアダプタアーキテクチャ,第1段階のサンプリング戦略,LoRAのようなパラメータ効率の良いチューニング手法,その他のパーソナライズ手法を評価することで,本アプローチを厳密に検証する。その結果、PERSOMAは既存の埋め込みベースおよびテキストプロンプトベースの手法と比較して,大規模かつ複雑なユーザ履歴を扱う能力に優れていることが示された。 Understanding the nuances of a user's extensive interaction history is key to building accurate and personalized natural language systems that can adapt to evolving user preferences. To address this, we introduce PERSOMA, Personalized Soft Prompt Adapter architecture. Unlike previous personalized prompting methods for large language models, PERSOMA offers a novel approach to efficiently capture user history. It achieves this by resampling and compressing interactions as free form text into expressive soft prompt embeddings, building upon recent research utilizing embedding representations as input for LLMs. We rigorously validate our approach by evaluating various adapter architectures, first-stage sampling strategies, parameter-efficient tuning techniques like LoRA, and other personalization methods. Our results demonstrate PERSOMA's superior ability to handle large and complex user histories compared to existing embedding-based and text-prompt-based techniques.	翻訳日:2024-08-05 14:46:34 公開日:2024-08-02
# MIS-ME:土壌水分推定のためのマルチモーダルフレームワーク MIS-ME: A Multi-modal Framework for Soil Moisture Estimation ( http://arxiv.org/abs/2408.00963v1 ) ライセンス: Link先を確認	Mohammed Rakib, Adil Aman Mohammed, Cole Diggins, Sumit Sharma, Jeff Michael Sadler, Tyson Ochsner, Arun Bagavathi,	(参考訳) 土壌水分推定は、灌漑、施肥、収穫のための最適な計画を作成する精密農業を実現するための重要な課題である。天気予報や土壌特性,作物特性といった従来のデータソースから土壌水分を推定するために,統計モデルや機械学習モデルを利用するのが一般的である。しかし, 土壌水分の推定に航空画像や地理空間画像を利用することへの関心が高まっている。これらの画像は作物の詳細を高解像度で捉えるが、収集・整備には費用がかかり、解釈も難しい。スマートフォンで撮影した視覚的手がかりと天気予報による統計データを使って土壌水分を予測するAI強化ソフトウェアツールを想像してみてほしい。本研究は、土壌水分推定のためのマルチモーダルアプローチを開発するという目標に向けた第一歩である。具体的には,地上局から撮影した実世界の画像とそれに対応する気象データからなるデータセットを構築する。また, 土壌水分推定のためのマルチモーダルフレームワークであるMIS-ME(Meteorological & Image based Soil Moisture Estimator)を提案する。広範な分析により、MIS-MEは10.79%のMAPEを達成し、従来の単一モーダル手法を上回って、気象データではMAPEを2.6%、画像データでは1.5%削減することが示され、目的に合わせたマルチモーダル手法の有効性が浮き彫りになった。 Soil moisture estimation is an important task to enable precision agriculture in creating optimal plans for irrigation, fertilization, and harvest. It is common to utilize statistical and machine learning models to estimate soil moisture from traditional data sources such as weather forecasts, soil properties, and crop properties. However, there is a growing interest in utilizing aerial and geospatial imagery to estimate soil moisture. Although these images capture high-resolution crop details, they are expensive to curate and challenging to interpret. Imagine, an AI-enhanced software tool that predicts soil moisture using visual cues captured by smartphones and statistical data given by weather forecasts. This work is a first step towards that goal of developing a multi-modal approach for soil moisture estimation. In particular, we curate a dataset consisting of real-world images taken from ground stations and their corresponding weather data. We also propose MIS-ME - Meteorological & Image based Soil Moisture Estimator, a multi-modal framework for soil moisture estimation. 
Our extensive analysis shows that MIS-ME achieves a MAPE of 10.79%, outperforming traditional unimodal approaches with a reduction of 2.6% in MAPE for meteorological data and 1.5% in MAPE for image data, highlighting the effectiveness of tailored multi-modal approaches.	翻訳日:2024-08-05 14:46:34 公開日:2024-08-02
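For reference, the MAPE reported above is the mean of |y − ŷ|/|y|, expressed in percent. A minimal sketch follows; the function name and the sample readings are illustrative. The abstract does not say whether the 2.6% and 1.5% reductions are absolute percentage points or relative changes.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent: 100/n * sum(|y - y_hat| / |y|)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, y_hat in zip(actual, predicted):
        if y == 0:
            raise ValueError("MAPE is undefined when an actual value is 0")
        total += abs(y - y_hat) / abs(y)
    return 100.0 * total / len(actual)


# Hypothetical volumetric soil moisture readings (m^3/m^3) and model estimates.
observed = [0.20, 0.25, 0.40]
estimated = [0.22, 0.25, 0.36]
print(round(mape(observed, estimated), 2))
```

MAPE is scale-free, which makes it easy to compare across sites. It is, however, undefined for zero readings and inflated for very dry soil, so it is usually paired with an absolute metric.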
# ディフェンダー・アタッカー逐次セキュリティゲームにおける量的応答解析 A Quantal Response Analysis of Defender-Attacker Sequential Security Games ( http://arxiv.org/abs/2408.00964v1 ) ライセンス: Link先を確認	Md Reya Shad Azim, Mustafa Abdallah,	(参考訳) 2つのサイトと、防御者と攻撃者の間の逐次ゲームに関するシナリオを探究する。防御者はサイトを保護する責任を負い、攻撃者はそれらを攻撃することを目指す。各サイトは、攻撃が成功する確率とともに、侵害されたときの防御者にとっての損失値を持つ。防御者は、各サイトへのセキュリティ投資を通じて、これらの確率を減らすことができる。攻撃者の目的は、防御者のセキュリティ投資を考慮したうえで、防御者の期待損失を最大化するサイトを標的にすることである。従来の研究ではこのようなシナリオにおけるセキュリティ投資が検討されてきたが,本研究では,行動経済学で指摘されている,防御者が示す限定合理性の影響を調査する。具体的には、人間が効率的な(純粋)戦略を選択する際に誤りを犯す量的行動バイアスを考察する。我々は逐次ゲームにおける量的応答均衡の存在を示し、このバイアスが防御者の最適なセキュリティ投資の選択にどのように影響するかを分析する。さらに, 行動バイアスのない最適解と比較して, 量的意思決定の下での均衡投資の非効率性を定量化する。主な結果を検証するために数値シミュレーションを示す。 We explore a scenario involving two sites and a sequential game between a defender and an attacker, where the defender is responsible for securing the sites while the attacker aims to attack them. Each site holds a loss value for the defender when compromised, along with a probability of successful attack. The defender can reduce these probabilities through security investments at each site. The attacker's objective is to target the site that maximizes the expected loss for the defender, taking into account the defender's security investments. While previous studies have examined security investments in such scenarios, our work investigates the impact of bounded rationality exhibited by the defender, as identified in behavioral economics. Specifically, we consider quantal behavioral bias, where humans make errors in selecting efficient (pure) strategies. We demonstrate the existence of a quantal response equilibrium in our sequential game and analyze how this bias affects the defender's choice of optimal security investments. Additionally, we quantify the inefficiency of equilibrium investments under quantal decision-making compared to an optimal solution devoid of behavioral biases. We provide numerical simulations to validate our main findings.	翻訳日:2024-08-05 14:46:34 公開日:2024-08-02
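Below is a toy illustration of logit quantal response in a two-site setting like the one described above. Everything in it is an assumption made for illustration, since the abstract does not give the paper's exact model:

- the exponential form of the attack-success probability;
- the site loss values and baseline success rates;
- the budget grid;
- the logit choice rule with rationality parameter `lam`. Setting λ = 0 gives uniformly random choices, and a large λ approaches the optimum.

```python
import math


def attack_success(p0, invest):
    # Assumed model: success probability decays exponentially with investment.
    return p0 * math.exp(-invest)


def defender_loss(alloc, losses, p0s):
    # The attacker observes the allocation and targets the site with the largest expected loss.
    return max(L * attack_success(p, x) for L, p, x in zip(losses, p0s, alloc))


def logit_quantal_response(options, cost, lam):
    """Logit quantal response: Pr(option) is proportional to exp(-lam * cost(option))."""
    costs = [cost(o) for o in options]
    best = min(costs)  # shifting by the minimum avoids overflow without changing the result
    weights = [math.exp(-lam * (c - best)) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


BUDGET, STEPS = 2.0, 40
OPTIONS = [(BUDGET * k / STEPS, BUDGET * (STEPS - k) / STEPS) for k in range(STEPS + 1)]
LOSSES, P0S = (10.0, 6.0), (0.8, 0.9)  # hypothetical site values and baseline success rates


def cost(alloc):
    return defender_loss(alloc, LOSSES, P0S)


def qr_expected_loss(lam):
    # Expected defender loss when the allocation itself is chosen by logit quantal response.
    probs = logit_quantal_response(OPTIONS, cost, lam)
    return sum(p * cost(o) for p, o in zip(probs, OPTIONS))


OPTIMUM = min(cost(o) for o in OPTIONS)
for lam in (0.0, 1.0, 10.0, 100.0):
    expected = qr_expected_loss(lam)
    print(f"lambda={lam:6.1f}  expected loss={expected:.3f}  ratio to optimum={expected / OPTIMUM:.3f}")
```

The printed ratio to the optimum is one simple way to express the inefficiency caused by noisy decision-making. Under the logit rule this ratio never increases as λ grows.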
# ESGとAIの統合: 包括的な責任あるAI評価フレームワーク Integrating ESG and AI: A Comprehensive Responsible AI Assessment Framework ( http://arxiv.org/abs/2408.00965v1 ) ライセンス: Link先を確認	Sung Une Lee, Harsha Perera, Yue Liu, Boming Xia, Qinghua Lu, Liming Zhu,	(参考訳) 人工知能(AI)は、あらゆる産業分野で広く開発され、採用されている技術である。環境・社会・ガバナンス(ESG)の観点をAI投資と統合することは、倫理的かつ持続可能な技術進歩を確保する上で不可欠である。特に投資家の視点では、この統合はリスクを軽減するだけでなく、AIイニシアティブをより広範な社会的目標と整合させることで、長期的な価値創造を促進する。しかし、この領域は学術界と産業界の両方であまり探究されていない。このギャップを埋めるために,28社との対話から得た洞察に基づいて開発され,3つの主要な構成要素からなる新しいESG-AIフレームワークを導入する。このフレームワークは業界の実務者との協力によって開発され、この統合に対する構造化されたアプローチを提供する。 ESG-AIフレームワークは、AIアプリケーションの環境的・社会的影響の概要を提供し、投資家などの利用者がAI利用の重要性(マテリアリティ)を評価するのに役立つ。さらに、投資家は、構造化されたエンゲージメントと特定のリスク領域の徹底的な評価を通じて、責任あるAIに対する企業のコミットメントを評価することができる。我々は2024年4月にフレームワークとツールキットを公開し、投資コミュニティから大きな注目と肯定的なフィードバックを受けている。本稿では、フレームワークの各構成要素を詳述し、現実世界の文脈における適用可能性と、倫理的なAI投資を導く可能性を示す。 Artificial Intelligence (AI) is a widely developed and adopted technology across entire industry sectors. Integrating environmental, social, and governance (ESG) considerations with AI investments is crucial for ensuring ethical and sustainable technological advancement. Particularly from an investor perspective, this integration not only mitigates risks but also enhances long-term value creation by aligning AI initiatives with broader societal goals. Yet, this area has been less explored in both academia and industry. To bridge the gap, we introduce a novel ESG-AI framework, which is developed based on insights from engagements with 28 companies and comprises three key components. The framework provides a structured approach to this integration, developed in collaboration with industry practitioners. The ESG-AI framework provides an overview of the environmental and social impacts of AI applications, helping users such as investors assess the materiality of AI use. Moreover, it enables investors to evaluate a company's commitment to responsible AI through structured engagements and thorough assessment of specific risk areas. 
We have publicly released the framework and toolkit in April 2024, which has received significant attention and positive feedback from the investment community. This paper details each component of the framework, demonstrating its applicability in real-world contexts and its potential to guide ethical AI investments.	翻訳日:2024-08-05 14:46:34 公開日:2024-08-02
# 自然言語テキストからの動機・感情・行動の関係の自動抽出 Automatic Extraction of Relationships among Motivations, Emotions and Actions from Natural Language Texts ( http://arxiv.org/abs/2408.00966v1 ) ライセンス: Link先を確認	Fei Yang,	(参考訳) 本稿では,自然言語テキストが与えられたときに,動機,感情,行動の間の関係を明示的に明らかにする,グラフベースの新しいフレームワークを提案する。人間の本性(nature)を記述するために有向非巡回グラフを設計する。外部の出来事と人間の本性グラフを結び付けるために、後天的(nurture)な信念を組み込む。大規模言語モデルの能力により、アノテーションのリソースは必要ない。コーパスとしてAmazon Fine Foods Reviewsデータセットを使用し、食品関連の動機に焦点を当てる。合計92,990個の関係グラフが生成され、そのうち63%が論理的に意味をなす。今後の研究における最適化の方向性を探るため、誤りの種類をさらに分析する。 We propose a new graph-based framework to reveal relationships among motivations, emotions and actions explicitly given natural language texts. A directed acyclic graph is designed to describe human's nature. Nurture beliefs are incorporated to connect outside events and the human's nature graph. No annotation resources are required due to the power of large language models. Amazon Fine Foods Reviews dataset is used as corpus and food-related motivations are focused. Totally 92,990 relationship graphs are generated, of which 63% make logical sense. We make further analysis to investigate error types for optimization direction in future research.	翻訳日:2024-08-05 14:46:34 公開日:2024-08-02
# LiDARと航空画像から物体の高さを抽出する Extracting Object Heights From LiDAR & Aerial Imagery ( http://arxiv.org/abs/2408.00967v1 ) ライセンス: Link先を確認	Jesus Guerrero,	(参考訳) 本研究は,LiDARと航空画像から物体の高さを抽出する手続き的手法を示す。高さの取得方法と,LiDARおよび画像処理の将来について論じる。 SOTAの物体セグメンテーションにより、ディープラーニングの予備知識がなくても物体の高さを取得できる。エンジニアは世代を超えて世界のデータを追跡し、再処理していくことになる。その際には、本論文のような従来の手続き的手法と、ここで論じる新しい手法の両方が使われるだろう。 SOTA手法は分析を超えて生成AIへと進んでいる。本稿では手続き的手法と、言語モデルを用いた新しい手法の両方を扱う。これらには点群、画像、テキストのエンコーディングが含まれ、空間を認識するAIを可能にする。 This work shows a procedural method for extracting object heights from LiDAR and aerial imagery. We discuss how to get heights and the future of LiDAR and imagery processing. SOTA object segmentation allows us to take get object heights with no deep learning background. Engineers will be keeping track of world data across generations and reprocessing them. They will be using older procedural methods like this paper and newer ones discussed here. SOTA methods are going beyond analysis and into generative AI. We cover both a procedural methodology and the newer ones performed with language models. These include point cloud, imagery and text encoding allowing for spatially aware AI.	翻訳日:2024-08-05 14:46:34 公開日:2024-08-02
# DNSSEC+:DNSSECのメリットと落とし穴に動機づけられた拡張DNSスキーム DNSSEC+: An Enhanced DNS Scheme Motivated by Benefits and Pitfalls of DNSSEC ( http://arxiv.org/abs/2408.00968v1 ) ライセンス: Link先を確認	Ali Sadeghi Jahromi, AbdelRahman Abdou, Paul C. van Oorschot,	(参考訳) DNS再帰リゾルバと権威ネームサーバ間にセキュリティ対策がないことは、インライン攻撃とオフパス攻撃の両方に悪用されてきた。実運用や先行文献で多くのセキュリティ提案がなされてきたが、それらは通常、導入上の障壁や不十分なセキュリティ特性、あるいはその両方に悩まされている。リゾルバとネームサーバの間に広く採用されたセキュリティソリューションが存在しないことから、従来の提案におけるこれらの問題を緩和する新しいスキームが求められている。本稿では、DNSSECの利点を維持しつつ、そのセキュリティおよび導入容易性の欠点に対処するDNSSEC+を提案する。 DNSSEC+は既存のDNSSECの信頼モデルを利用し、ゾーン内のネームサーバに短い期間だけゾーンデータを安全に提供する権限を与える。これにより、長期秘密鍵を権威ネームサーバ上に複製する(したがって危険にさらす)ことなく、DNS応答のリアルタイムなセキュリティ特性を実現する。名前解決のレイテンシに関しては、DNSSEC+は安全性の低いスキームに匹敵する性能を提供する。名前解決のための9つのセキュリティ、プライバシ、導入容易性の特性を定義し、DNSSEC+がこれらの特性をどのように満たすかを示す。 The absence of security measures between DNS recursive resolvers and authoritative nameservers has been exploited by both inline and off-path attacks. While many security proposals have been made in practice and previous literature, they typically suffer from deployability barriers and/or inadequate security properties. The absence of a broadly adopted security solution between resolvers and nameservers motivates a new scheme that mitigates these issues in previous proposals. We present DNSSEC+, which addresses security and deployability downsides of DNSSEC, while retaining its benefits. DNSSEC+ takes advantage of the existent DNSSEC trust model and authorizes the nameservers within a zone for short intervals to serve the zone data securely, facilitating real-time security properties for DNS responses, without requiring long-term private keys to be duplicated (thus put at risk) on authoritative nameservers. Regarding name resolution latency, DNSSEC+ offers a performance comparable to less secure schemes. We define nine security, privacy, and deployability properties for name resolution, and show how DNSSEC+ fulfills these properties.	翻訳日:2024-08-05 14:46:34 公開日:2024-08-02
# 可視・熱赤外複数物体追跡:大規模ビデオデータセットとプログレッシブ融合アプローチ Visible-Thermal Multiple Object Tracking: Large-scale Video Dataset and Progressive Fusion Approach ( http://arxiv.org/abs/2408.00969v1 ) ライセンス: Link先を確認	Yabin Zhu, Qianwu Wang, Chenglong Li, Jin Tang, Zhixiang Huang,	(参考訳) 可視光データと熱赤外データの相補的な利点は、視覚追跡、セマンティックセグメンテーション、物体検出など様々なコンピュータビジョンタスクで広く活用されているが、複数物体追跡(MOT)ではほとんど研究されていない。本研究では、VT-MOTと呼ばれるMOTのための大規模な可視・熱赤外ビデオベンチマークを提供する。 VT-MOTには以下の主な利点がある。 1) データが大規模で多様性が高い。 VT-MOTには、監視、ドローン、ハンドヘルドの各プラットフォームから得た582のビデオシーケンスペア、401kのフレームペアが含まれる。 2) クロスモーダルアライメントが非常に正確である。複数の専門家に依頼し、フレームごとに空間的・時間的アライメントを行った。 3) アノテーションが密で高品質である。 VT-MOTには、専門家によって付与・二重チェックされた399万個のアノテーションボックスがあり、重度のオクルージョンや物体の再取得(物体の消失と再出現)といった課題を含む。強力なベースラインを提供するため、2つのモダリティの時間的情報と相補的情報を段階的に効果的に融合する、頑健な可視・熱赤外MOTのためのシンプルかつ効果的な追跡フレームワークを設計する。 VT-MOTで包括的な実験を行い, 最先端手法と比較して提案手法の優位性と有効性を示した。評価結果と分析から,可視・熱赤外MOTの今後の有望な方向性をいくつか示す。本プロジェクトは https://github.com/wqw123wqw/PFTrack で公開されている。 The complementary benefits from visible and thermal infrared data are widely utilized in various computer vision task, such as visual tracking, semantic segmentation and object detection, but rarely explored in Multiple Object Tracking (MOT). In this work, we contribute a large-scale Visible-Thermal video benchmark for MOT, called VT-MOT. VT-MOT has the following main advantages. 1) The data is large scale and high diversity. VT-MOT includes 582 video sequence pairs, 401k frame pairs from surveillance, drone, and handheld platforms. 2) The cross-modal alignment is highly accurate. We invite several professionals to perform both spatial and temporal alignment frame by frame. 3) The annotation is dense and high-quality. VT-MOT has 3.99 million annotation boxes annotated and double-checked by professionals, including heavy occlusion and object re-acquisition (object disappear and reappear) challenges. 
To provide a strong baseline, we design a simple yet effective tracking framework, which effectively fuses temporal information and complementary information of two modalities in a progressive manner, for robust visible-thermal MOT. A comprehensive experiment are conducted on VT-MOT and the results prove the superiority and effectiveness of the proposed method compared with state-of-the-art methods. From the evaluation results and analysis, we specify several potential future directions for visible-thermal MOT. The project is released in https://github.com/wqw123wqw/PFTrack.	翻訳日:2024-08-05 14:46:34 公開日:2024-08-02
# META-ANOVA:解釈可能な機械学習のための交互作用のスクリーニング META-ANOVA: Screening interactions for interpretable machine learning ( http://arxiv.org/abs/2408.00973v1 ) ライセンス: Link先を確認	Yongchan Choi, Seokhun Park, Chanmoo Park, Dongha Kim, Yongdai Kim,	(参考訳) 予測モデルを評価する際に考慮すべきことは2つある。1つは予測精度、もう1つは解釈可能性である。近年、アンサンブルベースのモデルやディープニューラルネットワークなど、高性能な予測モデルが数多く開発されている。しかし、これらのモデルは複雑すぎることが多く、その予測を直感的に解釈することは難しい。この解釈の難しさは、医療、金融、大学入試など説明責任を必要とする多くの現実世界の分野での利用を制限している。本研究では,Meta-ANOVAと呼ばれる新しい手法を開発し,任意の予測モデルに対して解釈可能なモデルを提供する。 Meta-ANOVAの基本的な考え方は、与えられたブラックボックス予測モデルを関数ANOVAモデルに変換することである。 Meta-ANOVAの新たな技術的貢献は、与えられたブラックボックスモデルを関数ANOVAモデルに変換する前に不要な交互作用を除外するスクリーニング手順である。このスクリーニング手順により、計算上の困難を伴わずに、変換後の関数ANOVAモデルに高次の交互作用を含めることができる。我々はスクリーニング手順が漸近的に一致性を持つことを証明する。合成データセットおよび実世界のデータセットを用いた様々な実験を通じて,Meta-ANOVAの優位性を実証的に示す。 There are two things to be considered when we evaluate predictive models. One is prediction accuracy,and the other is interpretability. Over the recent decades, many prediction models of high performance, such as ensemble-based models and deep neural networks, have been developed. However, these models are often too complex, making it difficult to intuitively interpret their predictions. This complexity in interpretation limits their use in many real-world fields that require accountability, such as medicine, finance, and college admissions. In this study, we develop a novel method called Meta-ANOVA to provide an interpretable model for any given prediction model. The basic idea of Meta-ANOVA is to transform a given black-box prediction model to the functional ANOVA model. A novel technical contribution of Meta-ANOVA is a procedure of screening out unnecessary interaction before transforming a given black-box model to the functional ANOVA model. This screening procedure allows the inclusion of higher order interactions in the transformed functional ANOVA model without computational difficulties. We prove that the screening procedure is asymptotically consistent. 
Through various experiments with synthetic and real-world datasets, we empirically demonstrate the superiority of Meta-ANOVA	翻訳日:2024-08-05 14:36:49 公開日:2024-08-02
# 長寿命準安定量子ビットメモリ Long-lived metastable-qubit memory ( http://arxiv.org/abs/2408.00975v1 ) ライセンス: Link先を確認	Xiaoyang Shi, Jasmine Sinanan-Singh, Kyle DeBry, Susanna L. Todaro, Isaac L. Chuang, John Chiaverini,	(参考訳) 量子情報のコヒーレントな保存は多くの量子技術にとって不可欠である。長いコヒーレンス時間は捕捉イオン量子ビットで実証されており、典型的には単一イオンの基底状態内の超微細準位が用いられる。しかし最近の研究では、準安定状態に符号化された量子ビットが、単一種システムにおける実効的な二種操作の可能性や、フォールトトレラント量子計算のための消去エラー変換など、量子情報処理にアーキテクチャ上の利点をもたらす可能性が示唆されている。ここでは、捕捉イオンの準安定状態に量子状態を長寿命で符号化できることを示す。同じ種の別のイオンで共同冷却し、消去エラーを常に監視することで、単一の$^{137}$Ba$^+$イオンの準安定な$5D_{5/2}$状態に符号化した量子ビットで136(42)秒のコヒーレンス時間を実証する。動的デカップリングに基づくノイズ分光の実験結果に基づくモデルと一致して, 消去エラーを除去すると, 準安定準位のデフェージングが支配的なエラー源であることがわかった。 Coherent storage of quantum information is crucial to many quantum technologies. Long coherence times have been demonstrated in trapped-ion qubits, typically using the hyperfine levels within the ground state of a single ion. However, recent research suggests qubits encoded in metastable states could provide architectural benefits for quantum information processing, such as the possibility of effective dual-species operation in a single-species system and erasure-error conversion for fault-tolerant quantum computing. Here we demonstrate long-lived encoding of a quantum state in the metastable states of a trapped ion. By sympathetically cooling with another ion of the same species and constantly monitoring for erasure errors, we demonstrate a coherence time of 136(42) seconds with a qubit encoded in the metastable $5D_{5/2}$ state of a single $^{137}$Ba$^+$ ion. In agreement with a model based on empirical results from dynamical-decoupling-based noise spectroscopy, we find that dephasing of the metastable levels is the dominant source of error once erasure errors are removed.	翻訳日:2024-08-05 14:36:49 公開日:2024-08-02
# グラフマッチングによるクロスドメイン固有表現認識 Cross-domain Named Entity Recognition via Graph Matching ( http://arxiv.org/abs/2408.00981v1 ) ライセンス: Link先を確認	Junhao Zheng, Haibin Chen, Qianli Ma,	(参考訳) クロスドメインNERは、現実のシナリオにおけるデータ不足のため、実用的でありながら難しい問題である。一般的な手法は、まずリソースが豊富な汎用ドメインでNERモデルを学習し、その後そのモデルを特定のドメインに適応させることである。ドメイン間でエンティティタイプが一致しない問題のため、汎用ドメインにおける幅広い知識をターゲットドメインのNERモデルに効果的に転移できない。この目的のために、ラベル関係を確率分布としてモデル化し、ソースとターゲットの両方のラベル空間でラベルグラフを構築する。ラベル構造によって文脈表現を強化するため,BERTが出力する単語埋め込みにラベルグラフを融合する。ラベル関係をグラフとして表現することにより、クロスドメインNERをグラフマッチング問題として定式化する。さらに,提案手法は事前学習手法との適用性が高く,他のクロスドメイン予測タスクにも対応できる可能性がある。 4つのデータセットでの実験結果から,本手法は一連の転移学習,マルチタスク学習,少数ショット学習の手法を上回ることが示された。 Cross-domain NER is a practical yet challenging problem since the data scarcity in the real-world scenario. A common practice is first to learn a NER model in a rich-resource general domain and then adapt the model to specific domains. Due to the mismatch problem between entity types across domains, the wide knowledge in the general domain can not effectively transfer to the target domain NER model. To this end, we model the label relationship as a probability distribution and construct label graphs in both source and target label spaces. To enhance the contextual representation with label structures, we fuse the label graph into the word embedding output by BERT. By representing label relationships as graphs, we formulate cross-domain NER as a graph matching problem. Furthermore, the proposed method has good applicability with pre-training methods and is potentially capable of other cross-domain prediction tasks. Empirical results on four datasets show that our method outperforms a series of transfer learning, multi-task learning, and few-shot learning methods.	翻訳日:2024-08-05 14:36:49 公開日:2024-08-02
# 低次元特徴と注意機構ベースのニューラルネットワークを用いたノイズを含む放射線画像からのRichtmyer-Meshkov不安定性の再構成 Reconstructing Richtmyer-Meshkov instabilities from noisy radiographs using low dimensional features and attention-based neural networks ( http://arxiv.org/abs/2408.00985v1 ) ライセンス: Link先を確認	Daniel A. Serino, Marc L. Klasky, Balasubramanya T. Nadiga, Xiaojian Xu, Trevor Wilcox,	(参考訳) 学習済みの注意機構ベースのTransformerネットワークは、ブラー、散乱、ノイズで劣化した放射線画像から得られた一連の流体力学的特徴から、Richtmyer-Meshkov不安定性によって生じる複雑なトポロジーを頑健に復元できる。このアプローチは、ICFのような二重シェルの流体力学シミュレーションで実証される。このネットワークの重要な構成要素は、ノイズを含む放射線画像から抽出された一連の特徴に作用するTransformerエンコーダである。このエンコーダは、入力シーケンスにおける時間的依存関係を学習し、モデルの表現力を高める多数の自己注意層を含む。この手法は, ガス-金属界面が放射線画像のノイズによって著しく不明瞭になっている場合でも, Richtmyer-Meshkov不安定性の成長率を正確に復元する優れた能力を示すことが実証された。 A trained attention-based transformer network can robustly recover the complex topologies given by the Richtmyer-Meshkoff instability from a sequence of hydrodynamic features derived from radiographic images corrupted with blur, scatter, and noise. This approach is demonstrated on ICF-like double shell hydrodynamic simulations. The key component of this network is a transformer encoder that acts on a sequence of features extracted from noisy radiographs. This encoder includes numerous self-attention layers that act to learn temporal dependencies in the input sequences and increase the expressiveness of the model. This approach is demonstrated to exhibit an excellent ability to accurately recover the Richtmyer-Meshkov instability growth rates, even despite the gas-metal interface being greatly obscured by radiographic noise.	翻訳日:2024-08-05 14:36:49 公開日:2024-08-02
# SATによるベイズネットワークの厳密な検証 A SAT-based approach to rigorous verification of Bayesian networks ( http://arxiv.org/abs/2408.00986v1 ) ライセンス: Link先を確認	Ignacy Stępka, Nicholas Gisolfi, Artur Dubrawski,	(参考訳) 機械学習の最近の進歩により、様々な現実世界のアプリケーションでの採用が加速している。しかし、安全性が重要な領域では、機械学習モデルの複雑さ、解釈可能性の欠如、挙動に関する形式的保証の欠如といった理由から、その展開には多くの課題がある。本稿では、これらの欠点に対処するために設計された、ベイズネットワーク向けの検証フレームワークを提案する。本フレームワークは、(1) ベイズネットワークをブール論理リテラルに変換する2段階のコンパイルおよび符号化スキームと、(2) これらのリテラルを利用して制約として符号化された様々な性質を検証する形式的検証クエリ、の2つの主要な構成要素からなる。具体的には、if-thenルール(ITR)と特徴単調性(FMO)という2つの検証クエリを導入する。検証スキームの効率をベンチマークし、実世界のシナリオにおける実用性を実証する。 Recent advancements in machine learning have accelerated its widespread adoption across various real-world applications. However, in safety-critical domains, the deployment of machine learning models is riddled with challenges due to their complexity, lack of interpretability, and absence of formal guarantees regarding their behavior. In this paper, we introduce a verification framework tailored for Bayesian networks, designed to address these drawbacks. Our framework comprises two key components: (1) a two-step compilation and encoding scheme that translates Bayesian networks into Boolean logic literals, and (2) formal verification queries that leverage these literals to verify various properties encoded as constraints. Specifically, we introduce two verification queries: if-then rules (ITR) and feature monotonicity (FMO). We benchmark the efficiency of our verification scheme and demonstrate its practical utility in real-world scenarios.	翻訳日:2024-08-05 14:36:49 公開日:2024-08-02
# 悪意のあるエージェントを含むマルチエージェントシステムのレジリエンスについて On the Resilience of Multi-Agent Systems with Malicious Agents ( http://arxiv.org/abs/2408.00989v1 ) ライセンス: Link先を確認	Jen-tse Huang, Jiaxu Zhou, Tailin Jin, Xuhui Zhou, Zixi Chen, Wenxuan Wang, Youliang Yuan, Maarten Sap, Michael R. Lyu,	(参考訳) 大規模言語モデルを用いたマルチエージェントシステムは、それぞれが特定のドメインに特化した専門家エージェントの協調によって、様々なタスクで優れた能力を示している。しかし、エージェントが個別にデプロイされる場合、悪意のあるユーザーが、他の非専門エージェントには見抜けないほど巧妙に不正確な、または無関係な結果を生成する悪意のあるエージェントを紛れ込ませるリスクがある。そこで本稿では、次の2つの重要な問いを検討する。(1) 様々なマルチエージェントシステム構造(例: A$\rightarrow$B$\rightarrow$C、A$\leftrightarrow$B$\leftrightarrow$C)は、悪意のあるエージェントの存在下で、異なる下流タスクに対してどの程度のレジリエンスを持つか。(2) 悪意のあるエージェントから防御するために、システムのレジリエンスをどのように高められるか。悪意のあるエージェントをシミュレートするため、機能的な完全性を保ったまま任意のエージェントを悪意のあるエージェントに変換する2つの手法、AutoTransformとAutoInjectを考案した。コード生成、数学問題、翻訳、テキスト評価という4つの下流マルチエージェントシステムタスクで包括的な実験を行う。その結果、「階層型」のマルチエージェント構造、すなわちA$\rightarrow$(B$\leftrightarrow$C)は、性能低下が23.6%と最も小さく、他の2つの構造の46.4%および49.8%と比べて優れたレジリエンスを示した。さらに、メッセージをレビューして修正する追加エージェントを導入する方法と、各エージェントが他のエージェントの出力に異議を唱える仕組みを導入する方法という2つの防御手法がシステムのレジリエンスを高められることを示し、マルチエージェントシステムのレジリエンス向上の見込みを示す。コードとデータはhttps://github.com/CUHK-ARISE/MAS-Resilienceで公開されている。 Multi-agent systems, powered by large language models, have shown great abilities across various tasks due to the collaboration of expert agents, each focusing on a specific domain. However, when agents are deployed separately, there is a risk that malicious users may introduce malicious agents who generate incorrect or irrelevant results that are too stealthy to be identified by other non-specialized agents. Therefore, this paper investigates two essential questions: (1) What is the resilience of various multi-agent system structures (e.g., A$\rightarrow$B$\rightarrow$C, A$\leftrightarrow$B$\leftrightarrow$C) under malicious agents, on different downstream tasks? (2) How can we increase system resilience to defend against malicious agents? 
To simulate malicious agents, we devise two methods, AutoTransform and AutoInject, to transform any agent into a malicious one while preserving its functional integrity. We run comprehensive experiments on four downstream multi-agent system tasks, namely code generation, math problems, translation, and text evaluation. Results suggest that the "hierarchical" multi-agent structure, i.e., A$\rightarrow$(B$\leftrightarrow$C), exhibits superior resilience with the lowest performance drop of $23.6\%$, compared to $46.4\%$ and $49.8\%$ for the other two structures. Additionally, we show the promise of improving multi-agent system resilience by demonstrating that two defense methods (introducing an additional agent to review and correct messages, and adding a mechanism for each agent to challenge others' outputs) can enhance system resilience. Our code and data are available at https://github.com/CUHK-ARISE/MAS-Resilience.	翻訳日:2024-08-05 14:36:49 公開日:2024-08-02
# 3時間で学ぶ大規模言語モデルの公平性 Fairness in Large Language Models in Three Hours ( http://arxiv.org/abs/2408.00992v1 ) ライセンス: Link先を確認	Thang Doan Viet, Zichong Wang, Minh Nhat Nguyen, Wenbin Zhang,	(参考訳) 大規模言語モデル(LLM)は様々な領域で顕著な成功を収めてきたが、公平性への配慮が欠けていることが多く、社会的に周縁化された集団に対する差別的な結果につながる可能性がある。従来の機械学習における公平性とは異なり、LLMの公平性には独自の背景、分類体系、実現手法がある。本チュートリアルは、LLMを紹介する実世界のケーススタディから始め、続いてLLMにおけるバイアスの原因を分析しながら、公平なLLMに関する文献の最近の進歩を体系的に概観する。次にLLMにおける公平性の概念を考察し、バイアスを評価する戦略と公平性を促進するために設計されたアルゴリズムを要約する。さらに、ツールキットやデータセットを含むLLMのバイアス評価のためのリソースをまとめ、この分野における現在の研究課題と未解決の問題を議論する。リポジトリは https://github.com/LavinWong/Fairness-in-Large-Language-Models で公開されている。 Large Language Models (LLMs) have demonstrated remarkable success across various domains but often lack fairness considerations, potentially leading to discriminatory outcomes against marginalized populations. Unlike fairness in traditional machine learning, fairness in LLMs involves unique backgrounds, taxonomies, and fulfillment techniques. This tutorial provides a systematic overview of recent advances in the literature concerning fair LLMs, beginning with real-world case studies to introduce LLMs, followed by an analysis of bias causes therein. The concept of fairness in LLMs is then explored, summarizing the strategies for evaluating bias and the algorithms designed to promote fairness. Additionally, resources for assessing bias in LLMs, including toolkits and datasets, are compiled, and current research challenges and open questions in the field are discussed. The repository is available at https://github.com/LavinWong/Fairness-in-Large-Language-Models.	翻訳日:2024-08-05 14:36:49 公開日:2024-08-02
# ArchCode: 大規模言語モデルによるコード生成へのソフトウェア要件の組み込み ArchCode: Incorporating Software Requirements in Code Generation with Large Language Models ( http://arxiv.org/abs/2408.00994v1 ) ライセンス: Link先を確認	Hojae Han, Jaejin Kim, Jaeseok Yoo, Youngwon Lee, Seung-won Hwang,	(参考訳) 本稿では、大規模言語モデル(LLM)のコード生成能力を拡張し、与えられたテキスト記述から包括的なソフトウェア要件を自動的に管理できるようにすることを目指す。このような要件には、機能要件(例: 入力に対して期待される動作を実現すること)と非機能要件(例: 時間/空間性能、堅牢性、保守性)の両方が含まれる。しかし、テキスト記述は要件を冗長に表現していたり、その一部を省略していたりすることさえある。本稿では、文脈内学習を利用して記述中に見られる要件を整理し、そこから明示されていない要件を外挿する新しいフレームワークARCHCODEを提案する。ARCHCODEは与えられた記述から要件を生成し、それを条件としてコードスニペットとテストケースを生成する。各テストケースはいずれか1つの要件に合わせて作られており、実行結果が要件に準拠しているかどうかに基づいてコードスニペットをランク付けできる。公開ベンチマークでは、ARCHCODEが機能要件の充足を改善し、Pass@kスコアを大幅に向上させることが示された。さらに、コード生成におけるLLMの非機能要件を評価する初のベンチマークであるHumanEval-NFRを導入し、ARCHCODEがベースライン手法よりも優れていることを示す。ARCHCODEの実装とHumanEval-NFRベンチマークはいずれも公開されている。 This paper aims to extend the code generation capability of large language models (LLMs) to automatically manage comprehensive software requirements from given textual descriptions. Such requirements include both functional (i.e., achieving expected behavior for inputs) and non-functional (e.g., time/space performance, robustness, maintainability) requirements. However, textual descriptions may express requirements verbosely or may even omit some of them. We introduce ARCHCODE, a novel framework that leverages in-context learning to organize requirements observed in descriptions and to extrapolate unexpressed requirements from them. ARCHCODE generates requirements from given descriptions and conditions on them to produce code snippets and test cases. Each test case is tailored to one of the requirements, allowing for the ranking of code snippets based on the compliance of their execution results with the requirements. Public benchmarks show that ARCHCODE better satisfies functional requirements, significantly improving Pass@k scores. 
Furthermore, we introduce HumanEval-NFR, the first evaluation of LLMs on non-functional requirements in code generation, demonstrating ARCHCODE's superiority over baseline methods. The implementation of ARCHCODE and the HumanEval-NFR benchmark are both publicly accessible.	翻訳日:2024-08-05 14:36:49 公開日:2024-08-02
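The ArchCode abstract reports its gains as Pass@k scores. For reference, Pass@k is commonly computed with the unbiased estimator introduced alongside the original HumanEval benchmark: draw n samples per problem, count the c that pass the tests, and estimate 1 - C(n-c, k) / C(n, k). The Python sketch below assumes that definition. The function names are illustrative, and this is not ARCHCODE's own evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of generated samples
    c: number of samples that pass all tests
    k: number of samples the metric may "submit"
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # With fewer than k failing samples, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset has no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems given as (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # 10 samples with 3 correct: pass@1 is 0.3 (up to float rounding), pass@5 is higher.
    print(pass_at_k(10, 3, 1), pass_at_k(10, 3, 5))
```

Estimating from n > k samples, rather than drawing exactly k, lowers the variance of the estimate.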
# IncidentNet: スパースセンシングによる交通事故の検出・位置特定・重大度推定 IncidentNet: Traffic Incident Detection, Localization and Severity Estimation with Sparse Sensing ( http://arxiv.org/abs/2408.00996v1 ) ライセンス: Link先を確認	Sai Shashank Peddiraju, Kaustubh Harapanahalli, Edward Andert, Aviral Shrivastava,	(参考訳) 交通事故検出の先行技術は高いセンサーカバレッジに依存しており、主に表現能力が限られた決定木やランダムフォレストのモデルに基づいているため、高い精度で事故を検出することができない。本稿では、都市環境に疎に配置されたセンサから取得したデータで学習した深層学習モデルを用いて、交通事故の分類、位置特定、重大度推定を行う新しいアプローチであるIncidentNetを提案する。本モデルは、交差点に設置されたカメラで収集できる微視的交通データに基づいて動作する。微視的な交通の詳細と交通事故の詳細を同時に提供するデータセットが存在しないため、与えられた巨視的交通データと一致する合成微視的交通データセットを生成する手法も提示する。IncidentNetは、交差点の20%未満にしかカメラが設置されていない都市環境において、98%の交通事故検出率を達成し、誤報率は7%未満、検出に要する時間は平均197秒である。 Prior art in traffic incident detection relies on high sensor coverage and is primarily based on decision-tree and random forest models that have limited representation capacity and, as a result, cannot detect incidents with high accuracy. This paper presents IncidentNet, a novel approach for classifying, localizing, and estimating the severity of traffic incidents using deep learning models trained on data captured from sparsely placed sensors in urban environments. Our model works on microscopic traffic data that can be collected using cameras installed at traffic intersections. Due to the unavailability of datasets that provide microscopic traffic details and traffic incident details simultaneously, we also present a methodology to generate a synthetic microscopic traffic dataset that matches given macroscopic traffic data. IncidentNet achieves a traffic incident detection rate of 98%, with false alarm rates of less than 7%, in 197 seconds on average in urban environments with cameras on less than 20% of the traffic intersections.	翻訳日:2024-08-05 14:36:49 公開日:2024-08-02
# 安全制約付きグリッド環境におけるモデルフリーなタスク適応のための安全な探索戦略 A Safe Exploration Strategy for Model-free Task Adaptation in Safety-constrained Grid Environments ( http://arxiv.org/abs/2408.00997v1 ) ライセンス: Link先を確認	Erfan Entezami, Mahsa Sahebdel, Dhawal Gupta,	(参考訳) モデルフリーの強化学習エージェントを訓練するには、最適な方策を探索するためにエージェントが環境を十分に探索できるようにする必要がある。安全制約のある環境では、監視のない探索や最適でない方策を用いると、エージェントが望ましくない状態に陥り、エージェントと環境の双方にとってコストの高い、あるいは危険な結果を招く可能性がある。本稿では、モデルフリーエージェントが安全制約を守りながら環境と相互作用できる、グリッド環境をナビゲートするための新しい探索フレームワークを提案する。本フレームワークには事前学習フェーズが含まれ、その間にエージェントは、環境内の観測可能な特徴と指定された安全制約の両方に基づいて、潜在的に安全でない状態を識別することを学習する。その後、同様のダイナミクスを示す新しい環境においてそれらの安全でない状態を予測するため、二値分類モデルを訓練する。この学習済み分類器により、モデルフリーエージェントはランダム探索や準最適方策の採用が安全上のリスクをもたらし得る状況を判断でき、そのような場合、本フレームワークは危険な結果の可能性を軽減するため、事前に定義された安全な方策に従うようエージェントに促す。ランダムに生成された3つのグリッド環境で本フレームワークを評価し、モデルフリーエージェントが新しいタスクに安全に適応し、新しい環境に対する最適な方策を学習できることを示した。結果は、適切な安全方策を定義し、安全でない状態を検出する十分に訓練されたモデルを用いることで、本フレームワークによりモデルフリーエージェントが新たなタスクや環境に、大幅に少ない安全違反で適応できることを示している。 Training a model-free reinforcement learning agent requires allowing the agent to sufficiently explore the environment to search for an optimal policy. In safety-constrained environments, utilizing unsupervised exploration or a non-optimal policy may lead the agent to undesirable states, resulting in outcomes that are potentially costly or hazardous for both the agent and the environment. In this paper, we introduce a new exploration framework for navigating grid environments that enables model-free agents to interact with the environment while adhering to safety constraints. Our framework includes a pre-training phase, during which the agent learns to identify potentially unsafe states based on both observable features and specified safety constraints in the environment. Subsequently, a binary classification model is trained to predict those unsafe states in new environments that exhibit similar dynamics. 
This trained classifier empowers model-free agents to determine situations in which employing random exploration or a suboptimal policy may pose safety risks, in which case our framework prompts the agent to follow a predefined safe policy to mitigate the potential for hazardous consequences. We evaluated our framework on three randomly generated grid environments and demonstrated how model-free agents can safely adapt to new tasks and learn optimal policies for new environments. Our results indicate that by defining an appropriate safe policy and utilizing a well-trained model to detect unsafe states, our framework enables a model-free agent to adapt to new tasks and environments with significantly fewer safety violations.	翻訳日:2024-08-05 14:36:49 公開日:2024-08-02
# FBSDiff: 高い制御性を持つテキスト駆動画像変換のための拡散特徴のプラグアンドプレイ周波数帯域置換 FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation ( http://arxiv.org/abs/2408.00998v1 ) ライセンス: Link先を確認	Xiang Gao, Jiaying Liu,	(参考訳) 大規模なテキスト画像拡散モデルは、生成AIとマルチモーダル技術の進化における画期的なマイルストーンであり、自然言語のテキストプロンプトに基づく卓越した画像生成を可能にしている。しかし、このようなモデルは制御性に欠けるため、実際のコンテンツ制作への適用性が制限されており、参照画像を利用してテキストからの画像合成を制御することに注目が集まっている。参照画像と生成画像は密接に関連しているため、この問題は、テキストに従って参照画像を操作(あるいは編集)するタスク、すなわちテキスト駆動の画像間変換とみなすこともできる。本稿では、事前学習済みの大規模テキスト画像(T2I)拡散モデルをプラグアンドプレイ方式で画像間(I2I)パラダイムに適応させる、新規で簡潔かつ効率的なアプローチを提案する。本手法は、モデルの学習、微調整、オンライン最適化を一切必要とせずに、高品質で多目的なテキスト駆動I2I変換を実現する。参照画像でT2I生成を誘導するため、多様な誘導因子を、DCTスペクトル空間における拡散特徴のそれぞれ異なる周波数帯域でモデル化し、それに応じて、逆サンプリング過程において拡散特徴の特定のDCT周波数帯域を参照画像の対応する帯域で動的に置換する新しい周波数帯域置換層を考案する。置換する周波数帯域の種類と帯域幅をそれぞれ調整するだけで、参照画像の誘導因子と誘導強度の両面において、高い制御性を持つテキスト駆動I2I変換を柔軟に実現できることを示す。広範な定性的・定量的実験により、I2I変換の視覚的品質、汎用性、制御性において、本手法が関連手法よりも優れていることを検証した。 Large-scale text-to-image diffusion models have been a revolutionary milestone in the evolution of generative AI and multimodal technology, allowing extraordinary image generation based on natural-language text prompts. However, the lack of controllability of such models restricts their practical applicability for real-life content creation, for which attention has been focused on leveraging a reference image to control text-to-image synthesis. Due to the close correlation between the reference image and the generated image, this problem can also be regarded as the task of manipulating (or editing) the reference image as per the text, namely text-driven image-to-image translation. This paper contributes a novel, concise, and efficient approach that adapts the pre-trained large-scale text-to-image (T2I) diffusion model to the image-to-image (I2I) paradigm in a plug-and-play manner, realizing high-quality and versatile text-driven I2I translation without any model training, model fine-tuning, or online optimization process. 
To guide T2I generation with a reference image, we propose to model diverse guiding factors with correspondingly different frequency bands of diffusion features in the DCT spectral space, and accordingly devise a novel frequency band substitution layer that dynamically substitutes a certain DCT frequency band of the diffusion features with the corresponding counterpart of the reference image along the reverse sampling process. We demonstrate that our method flexibly enables highly controllable text-driven I2I translation both in the guiding factor and guiding intensity of the reference image, simply by tuning the type and bandwidth of the substituted frequency band, respectively. Extensive qualitative and quantitative experiments verify the superiority of our approach over related methods in I2I translation visual quality, versatility, and controllability.	翻訳日:2024-08-05 14:36:49 公開日:2024-08-02
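The frequency band substitution step described in the FBSDiff abstract can be illustrated on a single 2-D array. The following is a minimal sketch under stated assumptions, not FBSDiff's implementation. It uses an orthonormal 2-D DCT-II on a square, single-channel feature map, and it defines a "band" with the hypothetical rule lo <= u + v < hi on the frequency indices. The paper applies the substitution to diffusion features during the reverse sampling process, and its exact band definitions may differ.

```python
import math

Matrix = list[list[float]]


def dct_basis(n: int) -> Matrix:
    """Orthonormal DCT-II basis C[u][x], so that C @ C^T is the identity."""
    basis = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        basis.append([scale * math.cos(math.pi * (2 * x + 1) * u / (2 * n)) for x in range(n)])
    return basis


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def dct2(x: Matrix) -> Matrix:
    """2-D DCT-II of a square array: C X C^T."""
    c = dct_basis(len(x))
    return matmul(matmul(c, x), transpose(c))


def idct2(y: Matrix) -> Matrix:
    """Inverse 2-D DCT: C^T Y C (valid because C is orthonormal)."""
    c = dct_basis(len(y))
    return matmul(matmul(transpose(c), y), c)


def substitute_band(feat: Matrix, ref: Matrix, lo: int, hi: int) -> Matrix:
    """Replace feat's DCT coefficients with lo <= u + v < hi by those of ref."""
    if len(feat) != len(ref):
        raise ValueError("feat and ref must have the same size")
    f_hat, r_hat = dct2(feat), dct2(ref)
    n = len(feat)
    for u in range(n):
        for v in range(n):
            if lo <= u + v < hi:
                f_hat[u][v] = r_hat[u][v]
    return idct2(f_hat)
```

With lo = 0 and a small hi, only the low-frequency (coarse layout) coefficients come from the reference. Shifting or widening the band changes which information is transferred, loosely mirroring the "type and bandwidth" controls the abstract describes.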
# Community Cellular Networks Coverage Visualizer Community Cellular Networks Coverage Visualizer ( http://arxiv.org/abs/2408.00999v1 ) ライセンス: Link先を確認	Chanwut Kittivorawong, Sirapop Theeranantachai, Nussara Tieanklin, Esther Han Beol Jang, Kurtis Heimerl,	(参考訳) コミュニティ携帯電話ネットワークのボランティアや研究者は現在、各サイトのネットワークに関する情報にほとんどアクセスできない。そのため、ネットワーク性能の評価、障害やダウンタイムの特定、さらには現在のサイトの位置を示すことさえ困難である。本稿では、技術者の作業負担を軽減し、ネットワークの信頼性を可視化することで信頼を得るための性能ダッシュボード、Community Cellular Networks Coverage Visualizerを提案する。地図はプライバシーに配慮した実装で、現在および将来の各CCNサイトの全体的かつ詳細な性能を表示し、複数系列の折れ線グラフはネットワーク性能の経時変化を示すことに重点を置いている。本アプリケーションは、近くでより強く信頼性の高い信号が得られる場所をユーザーが特定するのに役立つだけでなく、ボランティアやエンジニアが新しいサイトを設置する最適な場所を決め、起こり得るネットワーク障害を素早く特定するための重要なツールにもなる。 Community cellular network volunteers and researchers currently rarely have access to information about the network at each site. This makes it difficult for them to evaluate network performance, identify outages and downtimes, or even show the current site locations. In this paper, we propose the Community Cellular Networks Coverage Visualizer, a performance dashboard that helps reduce the workload of technicians and builds trust by illustrating the reliability of the networks. The map displays the overall and in-depth performance of each current and future CCN site with a privacy-focused implementation, while the multi-series line chart emphasizes network performance over time. Not only will it help users identify locations with stronger and more reliable signals nearby, but our application will also be an essential tool for volunteers and engineers to determine the optimal locations to install a new site and to quickly identify possible network failures.	翻訳日:2024-08-05 14:36:49 公開日:2024-08-02
# 階層型マルチ指標予測とベイズ意思決定による適応型2段階クラウドリソーススケーリング Adaptive Two-Stage Cloud Resource Scaling via Hierarchical Multi-Indicator Forecasting and Bayesian Decision-Making ( http://arxiv.org/abs/2408.01000v1 ) ライセンス: Link先を確認	Yang Luo, Shiyu Wang, Zhemeng Yu, Wei Lu, Xiaofeng Gao, Lintao Ma, Guihai Chen,	(参考訳) 高度な大規模モデルとデータセンターの急速な成長によってクラウドコンピューティングリソースの需要が急増しており、効率的かつ適応的なリソース割り当ての重要性が浮き彫りになっている。大手テック企業が数千のGPUを備えた大規模インフラを展開する一方で、既存のクラウドプラットフォームは、階層的な指標構造の捕捉、非ガウス分布のモデリング、不確実性下での意思決定といった主要な課題のために、依然としてリソース利用率の低さに苦しんでいる。これらの課題に対処するため、適応型の階層的アテンションに基づくリソースモデリング・意思決定システムであるHARMONYを提案する。HARMONYは階層的なマルチ指標分布予測と不確実性を考慮したベイズ意思決定を組み合わせる。複雑な指標間の依存関係を包括的にモデル化する新しい階層的アテンション機構を導入し、変化する環境状態に適応できる正確な予測を可能にする。また、正規化フローによってガウス射影を適応的な非ガウス分布に変換する。重要な点として、HARMONYは適応的なベイズ過程において予測分布全体を活用し、不確実性を積極的に取り入れることで、様々な条件下でSLA制約を頑健に満たしつつリソース割り当てを最適化する。4つの大規模クラウドデータセットにわたる広範な評価により、HARMONYが9つの既存手法を大きく上回る最先端の性能を持つことが示された。1か月にわたる実環境での運用によりHARMONYの実用上の大きな効果が検証され、35,000 GPU時間以上を節約し、10万ドル以上のコスト削減につながった。これは、不確実性を考慮した適応的スケーリングによる顕著な経済的価値を示している。コードはhttps://github.com/Floating-LY/HARMONY1で公開されている。 The surging demand for cloud computing resources, driven by the rapid growth of sophisticated large-scale models and data centers, underscores the critical importance of efficient and adaptive resource allocation. As major tech enterprises deploy massive infrastructures with thousands of GPUs, existing cloud platforms still struggle with low resource utilization due to key challenges: capturing hierarchical indicator structures, modeling non-Gaussian distributions, and decision-making under uncertainty. To address these challenges, we propose HARMONY, an adaptive Hierarchical Attention-based Resource Modeling and Decision-Making System. HARMONY combines hierarchical multi-indicator distribution forecasting and uncertainty-aware Bayesian decision-making. 
It introduces a novel hierarchical attention mechanism that comprehensively models complex inter-indicator dependencies, enabling accurate predictions that can adapt to evolving environment states. It also transforms Gaussian projections into adaptive non-Gaussian distributions via Normalizing Flows. Crucially, HARMONY leverages the full predictive distributions in an adaptive Bayesian process, proactively incorporating uncertainties to optimize resource allocation while robustly meeting SLA constraints under varying conditions. Extensive evaluations across four large-scale cloud datasets demonstrate HARMONY's state-of-the-art performance, significantly outperforming nine established methods. A month-long real-world deployment validated HARMONY's substantial practical impact, realizing over 35,000 GPU hours in savings and translating to $100K+ in cost reduction, showcasing its remarkable economic value through adaptive, uncertainty-aware scaling. Our code is available at https://github.com/Floating-LY/HARMONY1.	翻訳日:2024-08-05 14:36:49 公開日:2024-08-02
# 対数深さの量子キャリールックアヘッド・モジュロ$(2^n-1)$加算器 A Logarithmic Depth Quantum Carry-Lookahead Modulo $(2^n-1)$ Adder ( http://arxiv.org/abs/2408.01002v1 ) ライセンス: Link先を確認	Bhaskar Gaur, Edgard Muñoz-Coreas, Himanshu Thapliyal,	(参考訳) 量子コンピューティングは、量子暗号、量子画像処理、最適化など様々な分野で量子アルゴリズムを実装できるマシンの実現に向けて大きく進歩している。モジュロ加算のための量子算術回路の開発は、これらの量子アルゴリズムの実装に不可欠である。ノイズやデコヒーレンスによるエラーを克服するには、フォールトトレラントなゲートに基づく量子回路を用いるのが理想的だが、現在のノイズあり中規模量子(NISQ)時代の量子コンピュータは、フォールトトレラント設計に伴う追加の計算コストに対応できない。本研究の目的は回路深さを最小化することであり、これによりノイズを低減し、NISQマシン上での量子モジュロ加算回路の実装を容易にできる。本研究では、2つのnビットの数を受け取り、その加算をO(log n)の深さで行うように設計された量子キャリールックアヘッド・モジュロ$(2^n - 1)$加算器(QCLMA)を提示する。深さO(n)の既存研究と比較して、提案するQCLMAは深さを削減し、ノイズ忠実度の向上に寄与する。エラー耐性を高めるため、既存研究のチェーン型キャリーパスとは異なり、木構造に基づくキャリーパスの構築にも注力している。量子コンピュータIBM Cairoで実験を行い、提案するQCLMAの性能を既存研究と比較評価するとともに、正しい出力が最上位の出力にどれだけ近いかを定量化する量子状態忠実度比(QSFR)を定義する。既存研究と比較して、提案するQCLMAは4量子ビットのモジュロ加算においてQSFRを47.21%向上させ、優れたノイズ忠実度を示した。 Quantum Computing is making significant advancements toward creating machines capable of implementing quantum algorithms in various fields, such as quantum cryptography, quantum image processing, and optimization. The development of quantum arithmetic circuits for modulo addition is vital for implementing these quantum algorithms. While it is ideal to use quantum circuits based on fault-tolerant gates to overcome noise and decoherence errors, the current Noisy Intermediate Scale Quantum (NISQ) era quantum computers cannot handle the additional computational cost associated with fault-tolerant designs. Our research aims to minimize circuit depth, which can reduce noise and facilitate the implementation of quantum modulo addition circuits on NISQ machines. This work presents a quantum carry-lookahead modulo $(2^n - 1)$ adder (QCLMA), which is designed to receive two n-bit numbers and perform their addition with an O(log n) depth. Compared to existing work of O(n) depth, our proposed QCLMA reduces the depth and helps increase the noise fidelity. 
In order to increase error resilience, we also focus on creating a tree-structure-based carry path, unlike the chain-based carry path of existing work. We run experiments on the IBM Cairo quantum computer to evaluate the performance of the proposed QCLMA against the existing work and define the Quantum State Fidelity Ratio (QSFR) to quantify the closeness of the correct output to the top output. When compared against existing work, the proposed QCLMA achieves a 47.21% increase in QSFR for 4-qubit modulo addition, showcasing its superior noise fidelity.	翻訳日:2024-08-05 14:36:49 公開日:2024-08-02
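To make concrete what a modulo $(2^n - 1)$ carry-lookahead adder computes, here is a classical, bit-level Python model. It is not the quantum circuit from the paper. The model assumes end-around-carry addition, in which the carry out of the plain n-bit sum is fed back in as the carry into bit 0. It uses a Kogge-Stone parallel prefix as one hypothetical example of a tree-structured carry path; that prefix needs ceil(log2 n) combining rounds, whereas a ripple chain needs n steps. The paper's tree may be organized differently. In this representation, both 0 and $2^n - 1$ encode zero.

```python
def mod_mersenne_add(a: int, b: int, n: int) -> int:
    """Return a value congruent to a + b modulo 2**n - 1 (n-bit operands).

    Classical model of an end-around-carry, carry-lookahead adder.
    The result lies in [0, 2**n - 1]; 2**n - 1 is the second encoding of zero.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    mask = (1 << n) - 1
    if not (0 <= a <= mask and 0 <= b <= mask):
        raise ValueError("operands must fit in n bits")
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # generate: both bits set
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # propagate: exactly one bit set
    # Kogge-Stone parallel prefix: after t rounds, (G[i], P[i]) summarise
    # bits max(0, i - 2**t + 1) .. i, so ceil(log2 n) rounds cover bits 0 .. i.
    G, P = g[:], p[:]
    dist = 1
    while dist < n:
        new_G, new_P = G[:], P[:]
        for i in range(dist, n):
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2
    # Carry out of the plain n-bit sum, fed back into bit 0 (end-around carry).
    c_in = G[n - 1]
    result = 0
    for i in range(n):
        # Carry into bit i: generated within bits 0..i-1, or c_in propagated through them.
        carry = c_in if i == 0 else G[i - 1] | (P[i - 1] & c_in)
        result |= (p[i] ^ carry) << i
    return result
```

For n = 4 (modulus 15), 9 + 8 = 17 overflows four bits. The carry is fed back to bit 0, and the function returns 2.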
# Piculet: 特化モデルに導かれたマルチモーダル大規模言語モデルの幻覚低減 Piculet: Specialized Models-Guided Hallucination Decrease for MultiModal Large Language Models ( http://arxiv.org/abs/2408.01003v1 ) ライセンス: Link先を確認	Kohou Wang, Xiang Liu, Zhaoxiang Liu, Kai Wang, Shiguo Lian,	(参考訳) マルチモーダル大規模言語モデル(MLLM)は、視覚と言語のモダリティ間のギャップを埋める上で大きな進歩を遂げた。しかし、生成されたテキストが画像の内容と一致しないMLLMの幻覚は、依然として大きな課題である。幻覚に対処する既存の手法は命令チューニングに依存することが多く、特定のデータでモデルを再訓練する必要があるため、MLLMを利用するコストがさらに増大する。本稿では、MLLMの入力表現を強化する、Piculetという新しい学習不要な手法を提案する。Piculetは複数の特化モデルを活用して入力画像から視覚情報の記述を抽出し、それらの記述を元の画像およびクエリと組み合わせてMLLMへの入力とする。本手法を定量的・定性的の両面から評価した結果、PiculetがMLLMの幻覚を大幅に減少させることが示された。本手法は汎用的であり、異なるMLLMにも容易に拡張できる。 Multimodal Large Language Models (MLLMs) have made significant progress in bridging the gap between visual and language modalities. However, hallucinations in MLLMs, where the generated text does not align with image content, continue to be a major challenge. Existing methods for addressing hallucinations often rely on instruction-tuning, which requires retraining the model with specific data and further increases the cost of utilizing MLLMs. In this paper, we introduce a novel training-free method, named Piculet, for enhancing the input representation of MLLMs. Piculet leverages multiple specialized models to extract descriptions of visual information from the input image and combines these descriptions with the original image and query as input to the MLLM. We evaluate our method both quantitatively and qualitatively, and the results demonstrate that Piculet greatly decreases hallucinations of MLLMs. Our method is universal and can be easily extended to different MLLMs.	翻訳日:2024-08-05 14:36:49 公開日:2024-08-02
# 金融市場予測の強化:因果関係に基づく特徴選択 Enhancing Financial Market Predictions: Causality-Driven Feature Selection ( http://arxiv.org/abs/2408.01005v1 ) ライセンス: Link先を確認	Wenhao Liang, Zhengyang Li, Weitong Chen,	(参考訳) 本稿では、197か国の経済・金融ニュース記事と株式市場データを統合することで金融市場分析に変革をもたらすFinSenデータセットを紹介する。このデータセットは2007年から2023年までの15年間を時間情報とともに広くカバーし、金融市場ニュースに関する16万件のレコードによる豊かでグローバルな視点を提供する。本研究では、市場予測の精度と信頼性を高めるため、因果的に検証された感情スコアとLSTMモデルを活用する。FinSenデータセットを用いて革新的なFocal Calibration Lossを導入し、DAN 3モデルで期待キャリブレーション誤差(ECE)を3.34%まで低減する。これは予測精度を向上させるだけでなく、確率的予測を実際の結果に密接に一致させるものであり、予測確率が極めて重要な金融分野にとって不可欠である。本手法は、誤った解釈のコストが高くなり得る信頼性の高い金融予測において、感情分析と精密なキャリブレーション手法を組み合わせることの有効性を示している。FinSenデータは[このgithub URL](https://github.com/EagleAdelaide/FinSen_Dataset.git)で入手できる。 This paper introduces the FinSen dataset that revolutionizes financial market analysis by integrating economic and financial news articles from 197 countries with stock market data. The dataset's extensive coverage spans 15 years from 2007 to 2023 with temporal information, offering a rich, global perspective with 160,000 records on financial market news. Our study leverages causally validated sentiment scores and LSTM models to enhance market forecast accuracy and reliability. Utilizing the FinSen dataset, we introduce an innovative Focal Calibration Loss, reducing Expected Calibration Error (ECE) to 3.34 percent with the DAN 3 model. This not only improves prediction accuracy but also aligns probabilistic forecasts closely with real outcomes, which is crucial for the financial sector, where predicted probability is paramount. Our approach demonstrates the effectiveness of combining sentiment analysis with precise calibration techniques for trustworthy financial forecasting where the cost of misinterpretation can be high. FinSen data can be found at [this github URL](https://github.com/EagleAdelaide/FinSen_Dataset.git).	翻訳日:2024-08-05 14:26:49 公開日:2024-08-02
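The FinSen abstract reports its calibration results as Expected Calibration Error (ECE). A common way to compute ECE is to assign each prediction to one of M equal-width bins by its top-class confidence, then take the sample-weighted average of |accuracy - mean confidence| over the bins. The sketch below assumes that equal-width, top-label definition, with a hypothetical default of 15 bins. The paper's binning may differ, and the sketch does not implement the Focal Calibration Loss itself.

```python
def expected_calibration_error(confidences, predictions, labels, n_bins=15):
    """Equal-width-bin ECE, returned as a fraction (multiply by 100 for percent).

    confidences: top-class probability for each example, in [0, 1]
    predictions: predicted class for each example
    labels:      true class for each example
    """
    n = len(confidences)
    if n == 0 or not (n == len(predictions) == len(labels)):
        raise ValueError("inputs must be non-empty and of equal length")
    bins = [[] for _ in range(n_bins)]
    for conf, pred, label in zip(confidences, predictions, labels):
        # Bin k covers [k/M, (k+1)/M); a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, pred == label))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, correct in members if correct) / len(members)
        # Weight each bin's calibration gap by the fraction of samples it holds.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

For example, two predictions made with confidence 0.9, of which only one is correct, fall into the same bin and give an ECE of |0.5 - 0.9| = 0.4.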
# テンソルトレイン低ランク近似(TT-LoRA):LLMの高速化によるAIの民主化 Tensor Train Low-rank Approximation (TT-LoRA): Democratizing AI with Accelerated LLMs ( http://arxiv.org/abs/2408.01008v1 ) ライセンス: Link先を確認	Afia Anjum, Maksim E. Eren, Ismael Boureima, Boian Alexandrov, Manish Bhattarai,	(参考訳) 近年、大規模言語モデル(LLM)は、質問応答、感情分析、テキスト要約、機械翻訳など、幅広い自然言語処理(NLP)タスクにおいて顕著な能力を示している。しかし、LLMの複雑さはますます増大しており、膨大な計算資源を必要とするため、これらのモデルのより広い研究と応用が妨げられている。これに対処するため、低ランク適応(LoRA)やアダプタ(Adapter)など、様々なパラメータ効率の良い微調整戦略が開発されてきた。これらの手法は有望である一方、圧縮性の面で限界に直面することが多い。特にLoRAは、現代の大規模LLMにおける学習可能パラメータ数の増加に対して効果的にスケールすることが難しい。さらに、テンソル・トレイン分解を利用する低ランク経済テンソル・トレイン適応(LoRETTA)も、限られた資源で超大規模モデルを微調整するのに必要な圧縮レベルにはまだ達していない。本稿では、最適化されたテンソル・トレイン(TT)分解の統合によってLoRETTAを拡張する、新しいパラメータ効率の良い微調整(PEFT)手法であるTensor Train Low-Rank Approximation (TT-LoRA)を提案する。アダプタや従来のLoRAベースの構造を排除することで、TT-LoRAは下流タスクの性能を損なうことなくより高いモデル圧縮を実現し、推論レイテンシと計算オーバーヘッドも削減する。モデル圧縮と性能のトレードオフを明らかにするベンチマークを確立するため、徹底的なパラメータ探索を行う。結果は、より大きなモデルに匹敵する性能を維持しながらLLMを大幅に圧縮できることを示しており、資源制約のあるプラットフォームへの展開を容易にする。 In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of natural language processing (NLP) tasks, such as question-answering, sentiment analysis, text summarization, and machine translation. However, the ever-growing complexity of LLMs demands immense computational resources, hindering the broader research and application of these models. To address this, various parameter-efficient fine-tuning strategies, such as Low-Rank Adaptation (LoRA) and Adapters, have been developed. Despite their potential, these methods often face limitations in compressibility. Specifically, LoRA struggles to scale effectively with the increasing number of trainable parameters in modern large-scale LLMs. Additionally, Low-Rank Economic Tensor-Train Adaptation (LoRETTA), which utilizes tensor train decomposition, has not yet achieved the level of compression necessary for fine-tuning very large-scale models with limited resources. 
This paper introduces Tensor Train Low-Rank Approximation (TT-LoRA), a novel parameter-efficient fine-tuning (PEFT) approach that extends LoRETTA with optimized tensor train (TT) decomposition integration. By eliminating Adapters and traditional LoRA-based structures, TT-LoRA achieves greater model compression without compromising downstream task performance, along with reduced inference latency and computational overhead. We conduct an exhaustive parameter search to establish benchmarks that highlight the trade-off between model compression and performance. Our results demonstrate significant compression of LLMs while maintaining comparable performance to larger models, facilitating their deployment on resource-constrained platforms.	翻訳日:2024-08-05 14:26:49 公開日:2024-08-02
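The compression argument for tensor-train methods comes down to parameter counts. The back-of-the-envelope sketch below assumes a TT-matrix (TT-operator) factorization of a weight update, with cores of shape (r_{k-1}, I_k, J_k, r_k) and boundary ranks r_0 = r_d = 1. It compares that factorization with a dense matrix and with a rank-r LoRA update built from two factors of shapes r x d_in and d_out x r. The 4096 x 4096 layer, the factorization 4096 = 8 * 8 * 8 * 8 and the rank 8 are hypothetical choices for illustration. TT-LoRA's actual tensorization, ranks and layer shapes may differ.

```python
from math import prod


def dense_params(d_in: int, d_out: int) -> int:
    """Parameters of a full d_out x d_in weight update."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Rank-r LoRA update delta_W = B @ A with A: r x d_in and B: d_out x r."""
    return r * (d_in + d_out)


def tt_matrix_params(in_modes: list[int], out_modes: list[int], ranks: list[int]) -> int:
    """Parameters of a TT-matrix whose k-th core has shape (r_{k-1}, I_k, J_k, r_k).

    ranks holds the d - 1 internal ranks; the boundary ranks r_0 = r_d = 1.
    """
    if len(in_modes) != len(out_modes) or len(ranks) != len(in_modes) - 1:
        raise ValueError("need d input modes, d output modes and d - 1 internal ranks")
    r = [1, *ranks, 1]
    return sum(r[k] * i_k * j_k * r[k + 1]
               for k, (i_k, j_k) in enumerate(zip(in_modes, out_modes)))


if __name__ == "__main__":
    modes = [8, 8, 8, 8]  # hypothetical: 4096 = 8 * 8 * 8 * 8
    d = prod(modes)
    print(dense_params(d, d))                         # 16777216
    print(lora_params(d, d, 8))                       # 65536
    print(tt_matrix_params(modes, modes, [8, 8, 8]))  # 9216
```

In this configuration the TT-matrix needs about 7x fewer parameters than the rank-8 LoRA update. The ratio depends strongly on the chosen modes and ranks.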
# EIUP: 暗黙の安全でないプロンプトに条件付けられた非準拠概念を消去する学習不要なアプローチ EIUP: A Training-Free Approach to Erase Non-Compliant Concepts Conditioned on Implicit Unsafe Prompts ( http://arxiv.org/abs/2408.01014v1 ) ライセンス: Link先を確認	Die Chen, Zhiwen Li, Mingyuan Fan, Cen Chen, Wenmeng Zhou, Yaliang Li,	(参考訳) テキストから画像への拡散モデルは、多様な概念を学習する能力を示している。しかし、望ましくない出力を生成する可能性もあり、その結果として重大なセキュリティ上の懸念が生じることに注意する必要がある。具体的には、Not Safe for Work(NSFW)コンテンツやスタイルの著作権侵害の可能性といった問題が生じうる。画像生成はテキストを条件としているため、プロンプトの浄化はコンテンツの安全性を確保する直接的な解決策となる。LLMで採られているアプローチと同様に、プロンプトを浄化することで安全な出力の生成を制御する取り組みがいくつか行われている。しかし、こうした取り組みがあっても、有害でないテキストが非準拠な画像を生成するリスクは依然として残ることに注意すべきであり、これを暗黙の安全でないプロンプトと呼ぶ。さらに、既存研究の中には、モデルを微調整してモデルの重みから望ましくない概念を消去するものもある。この種の手法は概念が更新されるたびに複数回の学習反復を必要とし、時間がかかるうえ、破滅的忘却を引き起こす可能性もある。これらの課題に対処するため、非準拠概念を消去プロンプトに組み込む、シンプルかつ効果的なアプローチを提案する。この消去プロンプトは、画像の空間特徴とテキスト埋め込みの融合に能動的に関与する。注意機構を通じて、本手法は画像空間における非準拠概念の特徴表現を特定できる。これらの特徴を再重み付けすることで、元の暗黙の安全でないプロンプトに条件付けられた安全でない画像の生成を効果的に抑制する。本手法は、最先端のベースラインと比較して、高い画像忠実度スコアを達成しながら優れた消去効果を示す。警告: 本論文には不快に感じられる可能性のあるモデル出力が含まれている。 Text-to-image diffusion models have shown the ability to learn a diverse range of concepts. However, it is worth noting that they may also generate undesirable outputs, consequently giving rise to significant security concerns. Specifically, issues such as Not Safe for Work (NSFW) content and potential violations of style copyright may be encountered. Since image generation is conditioned on text, prompt purification serves as a straightforward solution for content safety. Similar to the approaches taken for LLMs, some efforts have been made to control the generation of safe outputs by purifying prompts. However, it is also important to note that even with these efforts, non-toxic text still carries a risk of generating non-compliant images, which is referred to as implicit unsafe prompts. Furthermore, some existing works fine-tune the models to erase undesired concepts from model weights. 
This type of method necessitates multiple training iterations whenever the concept is updated, which can be time-consuming and may potentially lead to catastrophic forgetting. To address these challenges, we propose a simple yet effective approach that incorporates non-compliant concepts into an erasure prompt. This erasure prompt proactively participates in the fusion of image spatial features and text embeddings. Through attention mechanisms, our method is capable of identifying feature representations of non-compliant concepts in the image space. We re-weight these features to effectively suppress the generation of unsafe images conditioned on original implicit unsafe prompts. Our method exhibits superior erasure effectiveness while achieving high scores in image fidelity compared to the state-of-the-art baselines. WARNING: This paper contains model outputs that may be offensive.	翻訳日:2024-08-05 14:26:49 公開日:2024-08-02
# IBB交通グラフデータ:ベンチマークと道路交通予測モデル IBB Traffic Graph Data: Benchmarking and Road Traffic Prediction Model ( http://arxiv.org/abs/2408.01016v1 ) ライセンス: Link先を確認	Eren Olug, Kiymet Kaya, Resul Tugay, Sule Gunduz Oguducu,	(参考訳) 道路交通渋滞予測は、先を見越した交通管理を可能にし、郊外での体験を向上させ、環境への影響を軽減し、全体的な安全性と効率を高めるため、高度道路交通システムの重要な要素である。特に大都市圏についてはいくつかの公開データセットが存在するが、データ規模(センサ数や道路リンク数など)の不足や、都市部・高速道路といった対象地域の特性の違いやデータ収集場所といった外部要因により、これらのデータセットは実際のシナリオには適用できない可能性がある。そこで本稿では、これらの制約を緩和し、新たな地理的特性で文献を充実させる代替ベンチマークデータセットとして、新しいIBB交通グラフデータセットを提案する。IBB交通グラフデータセットは、2451の異なる地点で収集されたセンサデータを含む。さらに、特徴量エンジニアリングによって時間的なつながりを強化し、GLEEによるノード埋め込みで交通ネットワーク内の相互関係を表現し、ExtraTreesで交通予測を行う新しい道路交通予測モデルを提案する。結果から、提案モデルはベースラインモデルを一貫して上回り、平均で4%の精度向上を示した。 Road traffic congestion prediction is a crucial component of intelligent transportation systems, since it enables proactive traffic management, enhances suburban experience, reduces environmental impact, and improves overall safety and efficiency. Although there are several public datasets, especially for metropolitan areas, these datasets may not be applicable to practical scenarios due to insufficiency in the scale of data (i.e., the number of sensors and road links) and several external factors like different characteristics of the target area such as urban, highways and the data collection location. To address this, this paper introduces a novel IBB Traffic graph dataset as an alternative benchmark dataset to mitigate these limitations and enrich the literature with new geographical characteristics. The IBB Traffic graph dataset covers the sensor data collected at 2451 distinct locations. Moreover, we propose a novel Road Traffic Prediction Model that strengthens temporal links through feature engineering, node embedding with GLEE to represent inter-related relationships within the traffic network, and traffic prediction with ExtraTrees. 
The results indicate that the proposed model consistently outperforms the baseline models, demonstrating an average accuracy improvement of 4%.	翻訳日:2024-08-05 14:26:49 公開日:2024-08-02
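A common way to "strengthen temporal links through feature engineering" for one sensor is to turn its time series into lagged feature rows that a tree ensemble such as ExtraTrees can consume. The abstract does not say which features the authors actually use, so the sketch below rests on an assumption. `make_lag_features` and the speed readings are hypothetical. In a fuller pipeline, the GLEE node embeddings would be appended as extra columns.

```python
from typing import List, Sequence, Tuple


def make_lag_features(
    series: Sequence[float], lags: Sequence[int]
) -> Tuple[List[List[float]], List[float]]:
    """Turn one sensor's time series into (feature row, target) pairs.

    Row t holds the values observed `lag` steps before step t, for each lag.
    Steps whose lags would reach before the start of the series are skipped.
    """
    max_lag = max(lags)
    features: List[List[float]] = []
    targets: List[float] = []
    for t in range(max_lag, len(series)):
        features.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return features, targets


# Hypothetical speed readings (km/h) from one sensor at consecutive time steps.
speeds = [50.0, 48.0, 30.0, 22.0, 35.0, 47.0]
X, y = make_lag_features(speeds, lags=[1, 2])
print(X[0], y[0])  # [48.0, 50.0] 30.0
```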
# GNN-MolKAN:KANの力とGNNによる分子表現学習の促進 GNN-MolKAN: Harnessing the Power of KAN to Advance Molecular Representation Learning with GNNs ( http://arxiv.org/abs/2408.01018v1 ) ライセンス: Link先を確認	Ruifeng Li,	(参考訳) 分子特性予測と薬物設計には効果的な分子表現学習が不可欠である。しかし、既存のアプローチは、不十分なアノテーションと最適でないアーキテクチャ設計という制約に悩まされている。例えば、グラフニューラルネットワーク(GNN)は過剰圧縮(over-squashing)に悩まされ、分子内の重要な構造的詳細が失われ、分子表現が損なわれる。本稿では,AI + ScienceのKolmogorov-Arnold Networks(KAN)アーキテクチャをGNNに統合してこれらの課題に対処する,GNNの新しいクラスであるGNN-MolKANとその拡張型であるGNN-MolKAN+を提案する。さらに,安定性と速度を向上させ,標準GNNの性能をさらに高める先進的なKANであるAdFastKAN (Adaptive FastKAN) を導入する。本手法には3つの大きな利点がある。 1) 高性能: GNN-MolKAN と GNN-MolKAN+ は優れた予測能力, 未知の骨格(スキャフォールド)への頑健な汎化, 異なる GNN アーキテクチャ間での汎用的な転移性を示す。 2) 効率性: これらのモデルは,最先端(SOTA)の自己教師あり手法に匹敵するか上回る性能を達成しつつ,計算時間とパラメータ数が少なくて済む。 3) 少数ショット学習能力: GNN-MolKANは少数ショット学習のシナリオで大きな可能性を示し、少数ショットのベンチマーク全体で平均6.97%の改善を達成している。全体として、6つの分類データセット、6つの回帰データセット、4つの少数ショット学習データセットでアーキテクチャを検証し、そのすべてで一貫して高い競争力のある結果を得た。 Effective molecular representation learning is crucial for molecular property prediction and drug design. However, existing approaches struggle with limitations in insufficient annotations and suboptimal architecture design. For instance, Graph Neural Networks (GNNs) suffer from over-squashing, causing the loss of important structural details in molecules, thus impairing molecular representations. In this work, we propose a new class of GNNs, GNN-MolKAN and its augmented variant, GNN-MolKAN+, that integrate the Kolmogorov-Arnold Networks (KAN) architecture from AI + Science into GNNs to address these challenges. Additionally, we introduce Adaptive FastKAN (AdFastKAN), an advanced KAN that offers increased stability and speed, further enhancing the performance of standard GNNs. Notably, our approach holds three key benefits: 1) Superior Performance: GNN-MolKAN and GNN-MolKAN+ demonstrate superior prediction ability, robust generalization to unseen scaffolds, and versatile transferability across different GNN architectures. 
2) Efficiency: These models require less computational time and fewer parameters while matching or surpassing the state-of-the-art (SOTA) self-supervised methods. 3) Few-shot Learning Ability: GNN-MolKAN demonstrates great potential in few-shot learning scenarios, achieving an average improvement of 6.97% across few-shot benchmarks. Overall, we validate our architecture on 6 classification datasets, 6 regression datasets, and 4 few-shot learning datasets, consistently achieving highly competitive results across all of them.	翻訳日:2024-08-05 14:26:49 公開日:2024-08-02
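FastKAN-style layers replace KAN's B-spline edge functions with Gaussian radial basis functions. The sketch below assumes that form. The grid range, number of centres and width rule are illustrative choices, not values from the paper, and AdFastKAN's adaptive details are not modelled. We assume, based only on the abstract, that in GNN-MolKAN such a layer would stand in for an MLP somewhere in the GNN.

```python
import math
from typing import List, Sequence


class GaussianRBFKANLayer:
    """Toy KAN layer in which every input-output edge carries its own 1-D function.

    phi_{j,i}(x) = sum_k w[j][i][k] * exp(-((x - c_k) / h) ** 2)
    y_j          = sum_i phi_{j,i}(x_i)
    """

    def __init__(self, in_dim: int, out_dim: int,
                 grid_min: float = -2.0, grid_max: float = 2.0, num_centers: int = 8):
        step = (grid_max - grid_min) / (num_centers - 1)
        self.centers = [grid_min + k * step for k in range(num_centers)]
        self.h = step  # RBF width tied to the grid spacing (an assumption)
        # Learnable coefficients w[out][in][center], initialised to zero for clarity.
        self.weights = [[[0.0] * num_centers for _ in range(in_dim)]
                        for _ in range(out_dim)]

    def basis(self, x: float) -> List[float]:
        return [math.exp(-((x - c) / self.h) ** 2) for c in self.centers]

    def forward(self, x: Sequence[float]) -> List[float]:
        feats = [self.basis(xi) for xi in x]  # one basis vector per input
        return [
            sum(w * b for w_i, f_i in zip(w_j, feats) for w, b in zip(w_i, f_i))
            for w_j in self.weights
        ]


layer = GaussianRBFKANLayer(in_dim=2, out_dim=1)
layer.weights[0][0][3] = 1.0  # edge (input 0 -> output 0) uses one bump at centers[3]
print(layer.forward([layer.centers[3], 0.5]))  # [1.0]
```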
# 正負の依存性を制御するためのランダム部分集合の分布族 A Family of Distributions of Random Subsets for Controlling Positive and Negative Dependence ( http://arxiv.org/abs/2408.01022v1 ) ライセンス: Link先を確認	Takahiro Kawashima, Hideitsu Hino,	(参考訳) 正と負の依存性は、ランダム部分集合の引き合う(attractive)振る舞いと反発する(repulsive)振る舞いを特徴づける基本的な概念である。正または負の依存性を示す確率モデルはいくつか知られているが、それらを実用的な確率モデルでシームレスに橋渡しすることは難しい。本研究では,行列式点過程とボルツマンマシンの一部を含む新たな分布族として,離散カーネル点過程(DKPP)を導入する。また,周辺確率や条件付き確率の計算,パラメータの学習など,DKPPを用いた確率的演算と推論のための計算手法を開発する。数値実験により,正負の依存性の制御可能性とDKPPの計算手法の有効性が示された。 Positive and negative dependence are fundamental concepts that characterize the attractive and repulsive behavior of random subsets. Although some probabilistic models are known to exhibit positive or negative dependence, it is challenging to seamlessly bridge them with a practicable probabilistic model. In this study, we introduce a new family of distributions, named the discrete kernel point process (DKPP), which includes determinantal point processes and parts of Boltzmann machines. We also develop some computational methods for probabilistic operations and inference with DKPPs, such as calculating marginal and conditional probabilities and learning the parameters. Our numerical experiments demonstrate the controllability of positive and negative dependence and the effectiveness of the computational methods for DKPPs.	翻訳日:2024-08-05 14:26:49 公開日:2024-08-02
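The abstract names determinantal point processes (DPPs) as one member of the DKPP family. The sketch below illustrates only the negative (repulsive) end of that family. It enumerates a small L-ensemble DPP, P(Y = A) = det(L_A) / det(L + I), and checks that two similar items co-occur less often than they would under independence. This is not the DKPP itself. Brute-force enumeration also grows exponentially with the ground-set size, and the paper's own computational methods are not reproduced here.

```python
import itertools
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def det(m: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting (det of 0x0 is 1)."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def lensemble_probs(L: Matrix) -> Dict[Tuple[int, ...], float]:
    """P(Y = A) = det(L_A) / det(L + I), enumerated over all subsets A."""
    n = len(L)
    weights = {}
    for size in range(n + 1):
        for subset in itertools.combinations(range(n), size):
            weights[subset] = det([[L[i][j] for j in subset] for i in subset])
    z = sum(weights.values())  # equals det(L + I)
    return {s: w / z for s, w in weights.items()}


def inclusion_prob(probs: Dict[Tuple[int, ...], float], items: Sequence[int]) -> float:
    """P(items are all in Y), summed from the subset probabilities."""
    return sum(p for s, p in probs.items() if set(items) <= set(s))


# Items 0 and 1 are similar (large off-diagonal entry); item 2 is unrelated.
L = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]
probs = lensemble_probs(L)
p0, p1, p2 = (inclusion_prob(probs, [i]) for i in range(3))
print(inclusion_prob(probs, [0, 1]) < p0 * p1)  # True: similar items repel
```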
# 因果フォレストから解釈可能な因果木を蒸留する Distilling interpretable causal trees from causal forests ( http://arxiv.org/abs/2408.01023v1 ) ライセンス: Link先を確認	Patrick Rehill,	(参考訳) 治療効果の異質性を推定する機械学習手法は、あらかじめ特定したいくつかの仮説を検定する既存の方法よりも高い柔軟性をもたらす。しかし、これらの手法の問題の1つは、複雑な機械学習モデルから洞察を引き出すことが難しい点である。条件付き平均治療効果の高次元分布は、正確な個人レベルの推定値を与えうるが、根底にあるパターンを理解することも、分析が何を意味するのかを把握することも難しい。本論文では、因果フォレストから単一の解釈可能な因果木を蒸留する手法である蒸留因果木(Distilled Causal Tree)を提案する。この手法は、単一の木を抽出する既存の手法と比べて良好な結果を示し、特に相関する特徴量が多いノイズの多いデータや高次元データで優れている。ほとんどのシミュレーションでは、基となる因果フォレストさえ上回る。その推定値は二重に頑健(doubly robust)であり、因果フォレストの推定値と同様に漸近正規性を持つ。 Machine learning methods for estimating treatment effect heterogeneity promise greater flexibility than existing methods that test a few pre-specified hypotheses. However, one problem these methods can have is that it can be challenging to extract insights from complicated machine learning models. A high-dimensional distribution of conditional average treatment effects may give accurate, individual-level estimates, but it can be hard to understand the underlying patterns; hard to know what the implications of the analysis are. This paper proposes the Distilled Causal Tree, a method for distilling a single, interpretable causal tree from a causal forest. This compares well to existing methods of extracting a single tree, particularly in noisy data or high-dimensional data where there are many correlated features. Here it even outperforms the base causal forest in most simulations. Its estimates are doubly robust and asymptotically normal just as those of the causal forest are.	翻訳日:2024-08-05 14:26:49 公開日:2024-08-02
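The abstract describes the Distilled Causal Tree's estimates as "doubly robust". As background on that term only, the standard AIPW (augmented inverse-propensity-weighted) estimator below stays consistent as long as either the propensity model or the outcome model is correct. This is a generic textbook sketch with made-up toy data, not the paper's distillation procedure.

```python
from statistics import fmean
from typing import Sequence


def aipw_ate(y: Sequence[float], t: Sequence[int], e: Sequence[float],
             mu0: Sequence[float], mu1: Sequence[float]) -> float:
    """AIPW estimate of the average treatment effect.

    y: outcomes, t: treatment indicators (0/1), e: propensity scores P(T=1 | X),
    mu0 / mu1: outcome-model predictions under control / treatment.
    """
    terms = []
    for yi, ti, ei, m0, m1 in zip(y, t, e, mu0, mu1):
        # Outcome-model estimate plus an inverse-propensity-weighted residual correction.
        correction = ti * (yi - m1) / ei - (1 - ti) * (yi - m0) / (1 - ei)
        terms.append(m1 - m0 + correction)
    return fmean(terms)


# Toy data: every unit has Y(1) = 3 and Y(0) = 1, so the true effect is 2.
y, t = [3.0, 1.0, 3.0, 1.0], [1, 0, 1, 0]
# Correct propensities (0.5), deliberately wrong outcome models (all zeros):
print(aipw_ate(y, t, [0.5] * 4, [0.0] * 4, [0.0] * 4))  # 2.0
# Wrong propensities (0.9), correct outcome models:
print(aipw_ate(y, t, [0.9] * 4, [1.0] * 4, [3.0] * 4))  # 2.0
```

When both nuisance models are wrong, the estimate is no longer protected. For example, using propensities of 0.9 with zero outcome models gives a value far from 2 on the same data.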
# クロスドメイン環境における身体化指示追従のためのセマンティックスキルグラウンディング Semantic Skill Grounding for Embodied Instruction-Following in Cross-Domain Environments ( http://arxiv.org/abs/2408.01024v1 ) ライセンス: Link先を確認	Sangwoo Shin, Seunghyun Kim, Youngsoo Jang, Moontae Lee, Honguk Woo,	(参考訳) 身体化指示追従(EIF: Embodied Instruction-Following)では、事前学習済み言語モデル(LM)をタスクプランナとして統合するアプローチが重要な分野として台頭している。そこでは、事前学習済みスキルとユーザ指示をLMにプロンプトとして与えることで、スキルレベルでタスクを計画する。しかし、これらの事前学習済みスキルはドメイン固有の知識と複雑に絡み合っているため、異なるドメインにグラウンディングすることは依然として困難である。この課題に対処するために、セマンティックスキルの階層的性質を活用するセマンティックスキルグラウンディング(SemGro)フレームワークを提案する。SemGroは、ドメインを問わず普遍的に適用可能な短期(short-horizon)の低セマンティックなスキルから、特定のドメインに高度に特化・調整された長期(long-horizon)のリッチセマンティックなスキルまで、これらのスキルの幅広いスペクトルを認識する。本フレームワークは、セマンティックスキル階層の上位レベルから始めて下位へと移動し、計画された各スキルをターゲットドメイン内で実行可能なレベルにグラウンディングする、反復的なスキル分解アプローチを採用している。そのために、セマンティックスキルの合成と分解にはLMの推論能力を、ターゲットドメインにおけるスキルの実行可能性の評価にはそのマルチモーダル拡張を利用する。VirtualHomeベンチマークでの実験により、300のクロスドメインEIFシナリオにおけるSemGroの有効性が示された。 In embodied instruction-following (EIF), the integration of pretrained language models (LMs) as task planners emerges as a significant branch, where tasks are planned at the skill level by prompting LMs with pretrained skills and user instructions. However, grounding these pretrained skills in different domains remains challenging due to their intricate entanglement with the domain-specific knowledge. To address this challenge, we present a semantic skill grounding (SemGro) framework that leverages the hierarchical nature of semantic skills. SemGro recognizes the broad spectrum of these skills, ranging from short-horizon low-semantic skills that are universally applicable across domains to long-horizon rich-semantic skills that are highly specialized and tailored for particular domains. The framework employs an iterative skill decomposition approach, starting from the higher levels of semantic skill hierarchy and then moving downwards, so as to ground each planned skill to an executable level within the target domain. 
To do so, we use the reasoning capabilities of LMs for composing and decomposing semantic skills, as well as their multi-modal extension for assessing the skill feasibility in the target domain. Our experiments in the VirtualHome benchmark show the efficacy of SemGro in 300 cross-domain EIF scenarios.	翻訳日:2024-08-05 14:26:49 公開日:2024-08-02
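The top-down grounding loop described above can be sketched as a recursion: keep decomposing a planned skill until every piece passes a feasibility check in the target domain. This is a hypothetical illustration. The hierarchy, the skill names and `ground_skill` are all made up. The LM-based decomposition and the multimodal feasibility scoring are replaced by plain Python callables.

```python
from typing import Callable, Dict, List


def ground_skill(
    skill: str,
    decompose: Callable[[str], List[str]],
    is_executable: Callable[[str], bool],
    max_depth: int = 5,
) -> List[str]:
    """Top-down decomposition: split a skill until every leaf is executable.

    In SemGro the decomposer would be an LM and the feasibility check a
    multimodal LM looking at the target domain; here both are plain callables.
    """
    if is_executable(skill):
        return [skill]
    subskills = decompose(skill)
    if max_depth == 0 or not subskills:
        raise ValueError(f"cannot ground skill: {skill!r}")
    plan: List[str] = []
    for sub in subskills:
        plan.extend(ground_skill(sub, decompose, is_executable, max_depth - 1))
    return plan


# Hypothetical skill hierarchy and set of skills executable in the target domain.
HIERARCHY: Dict[str, List[str]] = {
    "make coffee": ["prepare mug", "brew coffee"],
    "prepare mug": ["walk to cabinet", "grab mug"],
    "brew coffee": ["walk to coffee maker", "switch on coffee maker"],
}
EXECUTABLE = {"walk to cabinet", "grab mug", "walk to coffee maker", "switch on coffee maker"}

plan = ground_skill("make coffee", lambda s: HIERARCHY.get(s, []), lambda s: s in EXECUTABLE)
print(plan)
# ['walk to cabinet', 'grab mug', 'walk to coffee maker', 'switch on coffee maker']
```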
# CALA-$n$:Bloch球アプローチ、Clifford+Tゲート、レイアウトを用いてIBM量子コンピュータ上でコスト効率の良い2・3・4・5ビットゲートを実現する量子ライブラリ CALA-$n$: A Quantum Library for Realizing Cost-Effective 2-, 3-, 4-, and 5-bit Gates on IBM Quantum Computers using Bloch Sphere Approach, Clifford+T Gates, and Layouts ( http://arxiv.org/abs/2408.01025v1 ) ライセンス: Link先を確認	Ali Al-Bayaty, Xiaoyu Song, Marek Perkowski,	(参考訳) 我々は,$2 \le n \le 5$ 量子ビットについて,Bloch球を用いて費用効率の良い$n$-bitゲートを実現する,新しい量子レイアウトを意識したアプローチを導入する。これらの$n$-bitゲートは,Bloch球上で可視化された回転の列を選択するというアプローチにより,すべてClifford+Tゲートから構成される。このBloch球アプローチにより,これらの$n$-bitゲートをIBM量子コンピュータ向けに合成(トランスパイル)する際に,量子レイアウトとの整合が保証される。各種の標準$n$-bitゲート(Toffoli,Fredkinなど)と,それらと動作が等価な提案$n$-bitゲートを,最終的な量子コスト,すなわち生成されるIBMネイティブゲートの最終的な数という観点で検討・評価した。本稿では,提案するすべての$n$-bitゲートが,トランスパイル後に常に標準$n$-bitゲートよりも低い量子コストを持つことを示す。したがって,我々のBloch球アプローチは,IBM量子コンピュータのさまざまなレイアウト向けに,費用効率のよい各種$n$-bitゲートの量子ライブラリを構築するために利用できる。 We introduce a new quantum layout-aware approach to realize cost-effective $n$-bit gates using the Bloch sphere, for $2 \le n \le 5$ qubits. These $n$-bit gates are entirely constructed from the Clifford+T gates, in the approach of selecting sequences of rotations visualized on the Bloch sphere. This Bloch sphere approach ensures to match the quantum layout for synthesizing (transpiling) these $n$-bit gates into an IBM quantum computer. Various standard $n$-bit gates (Toffoli, Fredkin, etc.) and their operational equivalent of our proposed $n$-bit gates are examined and evaluated, in the context of the final quantum costs, as the final counts of generated IBM native gates. In this paper, we demonstrate that all our $n$-bit gates always have lower quantum costs than those of standard $n$-bit gates after transpilation. Hence, our Bloch sphere approach can be used to build a quantum library of various cost-effective $n$-bit gates for different layouts of IBM quantum computers.	翻訳日:2024-08-05 14:26:49 公開日:2024-08-02
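For reference, the "standard" Toffoli that such libraries are compared against is usually the textbook Clifford+T decomposition with 7 T/T† gates and 6 CNOTs. To our understanding, it uses the same gate order as the `ccx` definition in OpenQASM 2's qelib1.inc. The tiny state-vector check below confirms that this baseline circuit equals the Toffoli gate exactly. It is not the authors' CALA-$n$ construction, and the helper functions are hypothetical.

```python
import cmath
import math
from typing import List, Tuple

N = 3  # qubits a=0, b=1 (controls), c=2 (target); qubit 0 is the most significant bit
State = List[complex]


def bit(index: int, q: int) -> int:
    return (index >> (N - 1 - q)) & 1


def apply_phase(state: State, q: int, angle: float) -> State:
    """diag(1, e^{i*angle}) on qubit q: T is angle=pi/4, T-dagger is -pi/4."""
    phase = cmath.exp(1j * angle)
    return [amp * phase if bit(i, q) else amp for i, amp in enumerate(state)]


def apply_h(state: State, q: int) -> State:
    out = [0j] * len(state)
    s = 1 / math.sqrt(2)
    for i, amp in enumerate(state):
        j = i ^ (1 << (N - 1 - q))  # same basis index with qubit q flipped
        if bit(i, q) == 0:  # |0> -> (|0> + |1>)/sqrt2
            out[i] += s * amp
            out[j] += s * amp
        else:  # |1> -> (|0> - |1>)/sqrt2
            out[j] += s * amp
            out[i] -= s * amp
    return out


def apply_cx(state: State, control: int, target: int) -> State:
    out = [0j] * len(state)
    for i, amp in enumerate(state):
        j = i ^ (1 << (N - 1 - target)) if bit(i, control) else i
        out[j] += amp
    return out


T = math.pi / 4
CIRCUIT: List[Tuple] = [
    ("h", 2),
    ("cx", 1, 2), ("p", 2, -T),
    ("cx", 0, 2), ("p", 2, T),
    ("cx", 1, 2), ("p", 2, -T),
    ("cx", 0, 2), ("p", 1, T), ("p", 2, T), ("h", 2),
    ("cx", 0, 1), ("p", 0, T), ("p", 1, -T),
    ("cx", 0, 1),
]


def run(state: State) -> State:
    for gate in CIRCUIT:
        if gate[0] == "h":
            state = apply_h(state, gate[1])
        elif gate[0] == "cx":
            state = apply_cx(state, gate[1], gate[2])
        else:
            state = apply_phase(state, gate[1], gate[2])
    return state


def matches_toffoli(tol: float = 1e-9) -> bool:
    """Check every basis input |abc> maps to |a, b, c XOR (a AND b)> with no stray phase."""
    for i in range(2 ** N):
        basis = [0j] * 2 ** N
        basis[i] = 1 + 0j
        out = run(basis)
        want = i ^ 1 if bit(i, 0) and bit(i, 1) else i
        if any(abs(out[k] - (1 if k == want else 0)) > tol for k in range(2 ** N)):
            return False
    return True


t_count = sum(1 for g in CIRCUIT if g[0] == "p")
print(t_count, matches_toffoli())  # 7 True
```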
# 医用画像解析のためのPINN:サーベイ PINNs for Medical Image Analysis: A Survey ( http://arxiv.org/abs/2408.01026v1 ) ライセンス: Link先を確認	Chayan Banerjee, Kien Nguyen, Olivier Salvado, Truyen Tran, Clinton Fookes,	(参考訳) 機械学習フレームワークへの物理情報の組み込みは、医用画像解析(MIA)を変革しつつある。基本的な知識と支配的な物理法則を統合することで、これらのモデルは高い頑健性と解釈可能性を実現する。本研究では,レジストレーション,生成,分類,再構成などのMIAタスクに対する物理インフォームドアプローチ(PIMIA)の有用性を検討する。MIAに特化した物理インフォームド手法に関する80本以上の論文について,系統的な文献レビューを提示する。どのような物理知識とプロセスがモデル化されるか,それらがどのように表現されるか,そしてそれらをMIAモデルに組み込む戦略を調べるための統一的な分類体系を提案する。イメージング,生成,予測,逆イメージング(超解像と再構成),レジストレーション,画像解析(セグメンテーションと分類)まで,幅広い画像解析タスクを深く掘り下げる。各タスクについて,中心となる物理誘導操作,関心領域(人体解剖学に関して),対応する画像モダリティ,モデル学習に用いたデータセット,採用された深層ネットワークアーキテクチャ,および主に利用された物理過程・方程式・原理を徹底的に検討し,表形式で提示する。さらに、異なるタスクやデータセット間でPIMIA手法の性能を比較するための新しい指標も導入する。本レビューに基づき,課題,未解決の研究課題,今後の研究の方向性について,我々の視点を要約し抽出する。PIMIAにおける重要な未解決課題として、適切な物理的事前知識の選択や標準化されたベンチマークプラットフォームの確立を挙げる。 The incorporation of physical information in machine learning frameworks is transforming medical image analysis (MIA). By integrating fundamental knowledge and governing physical laws, these models achieve enhanced robustness and interpretability. In this work, we explore the utility of physics-informed approaches for MIA (PIMIA) tasks such as registration, generation, classification, and reconstruction. We present a systematic literature review of over 80 papers on physics-informed methods dedicated to MIA. We propose a unified taxonomy to investigate what physics knowledge and processes are modelled, how they are represented, and the strategies to incorporate them into MIA models. We delve deep into a wide range of image analysis tasks, from imaging, generation, prediction, inverse imaging (super-resolution and reconstruction), registration, and image analysis (segmentation and classification). 
For each task, we thoroughly examine and present in a tabular format the central physics-guided operation, the region of interest (with respect to human anatomy), the corresponding imaging modality, the dataset used for model training, the deep network architecture employed, and the primary physical process, equation, or principle utilized. Additionally, we also introduce a novel metric to compare the performance of PIMIA methods across different tasks and datasets. Based on this review, we summarize and distil our perspectives on the challenges, open research questions, and directions for future research. We highlight key open challenges in PIMIA, including selecting suitable physics priors and establishing a standardized benchmarking platform.	翻訳日:2024-08-05 14:26:49 公開日:2024-08-02
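A common way to bring a governing physical law into training, in the PINN sense the survey's title refers to, is to add the law's residual at collocation points to the ordinary data loss. The sketch below makes several assumptions. The law is a toy mono-exponential decay du/dt = -k u, the form of, for example, a simple relaxation signal model. The function name, the weighting and the finite-difference derivative are illustrative choices and are not taken from any surveyed method.

```python
import math
from typing import Callable, Sequence


def physics_informed_loss(
    u: Callable[[float], float],
    t_data: Sequence[float],
    u_data: Sequence[float],
    t_colloc: Sequence[float],
    k: float,
    weight: float = 1.0,
    eps: float = 1e-5,
) -> float:
    """Mean squared data misfit + weight * mean squared residual of du/dt + k*u = 0.

    In a real PINN `u` is a neural network and du/dt comes from automatic
    differentiation; a central finite difference stands in for it here.
    """
    data_term = sum((u(t) - obs) ** 2 for t, obs in zip(t_data, u_data)) / len(t_data)

    def residual(t: float) -> float:
        du_dt = (u(t + eps) - u(t - eps)) / (2 * eps)
        return du_dt + k * u(t)

    physics_term = sum(residual(t) ** 2 for t in t_colloc) / len(t_colloc)
    return data_term + weight * physics_term


# Toy decay curve with rate k (hypothetical values).
k = 0.5
t_data = [0.0, 1.0, 2.0]
u_data = [math.exp(-k * t) for t in t_data]
t_colloc = [0.25 * i for i in range(1, 17)]  # points with no measurements, physics only


def exact(t: float) -> float:
    return math.exp(-k * t)


def constant(t: float) -> float:
    return 1.0


print(physics_informed_loss(exact, t_data, u_data, t_colloc, k) < 1e-8)  # True
print(physics_informed_loss(constant, t_data, u_data, t_colloc, k) > 0.2)  # True
```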
# POA: あらゆるサイズのモデルのための一度きりの事前学習 POA: Pre-training Once for Models of All Sizes ( http://arxiv.org/abs/2408.01031v1 ) ライセンス: Link先を確認	Yingying Zhang, Xin Guo, Jiangwei Lao, Lei Yu, Lixiang Ru, Jian Wang, Guo Ye, Huimei He, Jingdong Chen, Ming Yang,	(参考訳) 大規模な自己教師あり事前学習は、1つの基盤モデルで多くの異なるビジョンタスクを扱う道を開いた。ほとんどの事前学習手法は、一度に特定サイズの単一モデルを訓練する。しかし、実世界のシナリオにおけるさまざまな計算・ストレージの制約により、デプロイ用にサイズの異なる一連のモデルを開発するには多大な労力が必要となる。そこで本研究では,この問題に取り組むため,POA(Pre-training Once for All)と呼ばれる新しい三分岐の自己教師あり学習フレームワークを提案する。我々のアプローチは、革新的な弾性的(elastic)な生徒ブランチを現代的な自己蒸留パラダイムに導入する。事前学習の各ステップにおいて、元の生徒からサブネットワークをランダムにサンプリングして弾性的な生徒を形成し、すべてのブランチを自己蒸留の形で学習する。事前学習が完了すると、POAは下流タスク向けにさまざまなサイズの事前学習済みモデルを抽出できる。注目すべきことに、弾性的な生徒はサイズの異なる複数のモデルの同時事前学習を可能にし、さらに表現学習を強化するためのさまざまなサイズのモデルの追加的なアンサンブルとしても機能する。k近傍法による評価、線形プロービング評価、および複数の下流タスクでの評価を含む大規模な実験により、POAの有効性と利点が実証された。ViT、Swin Transformer、ResNetのバックボーンを用いて最先端の性能を達成し、単一の事前学習セッションでサイズの異なる約100個のモデルを生成する。コードは https://github.com/Qichuzyy/POA で入手できる。 Large-scale self-supervised pre-training has paved the way for one foundation model to handle many different vision tasks. Most pre-training methodologies train a single model of a certain size at one time. Nevertheless, various computation or storage constraints in real-world scenarios require substantial efforts to develop a series of models with different sizes to deploy. Thus, in this study, we propose a novel tri-branch self-supervised training framework, termed as POA (Pre-training Once for All), to tackle this aforementioned issue. Our approach introduces an innovative elastic student branch into a modern self-distillation paradigm. At each pre-training step, we randomly sample a sub-network from the original student to form the elastic student and train all branches in a self-distilling fashion. Once pre-trained, POA allows the extraction of pre-trained models of diverse sizes for downstream tasks. 
Remarkably, the elastic student facilitates the simultaneous pre-training of multiple models with different sizes, which also acts as an additional ensemble of models of various sizes to enhance representation learning. Extensive experiments, including k-nearest neighbors, linear probing evaluation and assessments on multiple downstream tasks demonstrate the effectiveness and advantages of our POA. It achieves state-of-the-art performance using ViT, Swin Transformer and ResNet backbones, producing around a hundred models with different sizes through a single pre-training session. The code is available at: https://github.com/Qichuzyy/POA.	翻訳日:2024-08-05 14:26:49 公開日:2024-08-02
# Structure from Motionに基づく未知形状の宇宙デブリの運動推定と3次元再構成 Structure from Motion-based Motion Estimation and 3D Reconstruction of Unknown Shaped Space Debris ( http://arxiv.org/abs/2408.01035v1 ) ライセンス: Link先を確認	Kentaro Uno, Takehiro Matsuoka, Akiyoshi Uchida, Kazuya Yoshida,	(参考訳) 近年の数十年で宇宙機の打ち上げ数が急増するにつれ、宇宙デブリ問題は日々その重要性を増している。持続可能な宇宙利用のためには、宇宙デブリの継続的な除去が人類にとって最も深刻な課題である。軌道上でのデブリ捕獲ミッションの信頼性を最大化するには、対象の正確な運動推定が不可欠である。宇宙デブリは姿勢・軌道制御能力を失っており、破損のためその形状も未知である。本稿では,入力として2次元画像のみを必要とし,限られたリソースで未知形状の宇宙デブリの運動推定を行う,Structure from Motionに基づくアルゴリズムを提案する。本手法は、未知物体の再構成形状と、対象とカメラの間の相対姿勢の軌跡を同時に出力し、これらを用いて対象の運動を推定する。本手法は,2次元空気浮上テストベッドでの微小重力実験および3次元運動学シミュレーションによって生成した現実的な画像データセットを用いて定量的に検証される。 With the boost in the number of spacecraft launches in the current decades, the space debris problem is daily becoming significantly crucial. For sustainable space utilization, the continuous removal of space debris is the most severe problem for humanity. To maximize the reliability of the debris capture mission in orbit, accurate motion estimation of the target is essential. Space debris has lost its attitude and orbit control capabilities, and its shape is unknown due to the break. This paper proposes the Structure from Motion-based algorithm to perform unknown shaped space debris motion estimation with limited resources, where only 2D images are required as input. The method then outputs the reconstructed shape of the unknown object and the relative pose trajectory between the target and the camera simultaneously, which are exploited to estimate the target's motion. The method is quantitatively validated with the realistic image dataset generated by the microgravity experiment in a 2D air-floating testbed and 3D kinematic simulation.	翻訳日:2024-08-05 14:26:49 公開日:2024-08-02
# パラメタライズド量子回路の解析--表現性と量子ゲートの種類との関係から- Analysis of Parameterized Quantum Circuits: on The Connection Between Expressibility and Types of Quantum Gates ( http://arxiv.org/abs/2408.01036v1 ) ライセンス: Link先を確認	Yu Liu, Kentaro Baba, Kazuya Kaneko, Naoyuki Takeda, Junpei Koyama, Koyiti Kimura,	(参考訳) 表現性はパラメータ化量子回路(PQC)の重要な要素である。変分量子アルゴリズム(VQA)に基づく量子機械学習(QML)の文脈では、高表現能なPQCと十分な数の量子ビットからなるQMLモデルは任意の連続関数を近似することができる。表現可能性と学習性能の関係やPQCの層数について多くの研究が行われてきたが、表現性とPQC構造との関係は比較的少ない。本稿では、勾配ブースティングツリーモデルとSHAP(SHapley Additive ExPlanations)の値を用いて、PQC内の表現可能性と量子ゲートのタイプとの関係を解析する。解析は19個のPQCトポロジから導出された1,615個のPQCに対して行われ,それぞれ2-18量子ビットと1-5層からなる。分析の結果,高表現能なPQCの設計指針が得られ,CNOTゲート数と注意的バランスを維持しつつ,より多くのRXゲートやRYゲートの統合が示唆された。さらに, この評価は, 従来研究で見られたように, 表現性飽和の新たな証拠となる。 Expressibility is a crucial factor of a Parameterized Quantum Circuit (PQC). In the context of Variational Quantum Algorithms (VQA) based Quantum Machine Learning (QML), a QML model composed of highly expressible PQC and sufficient number of qubits is theoretically capable of approximating any arbitrary continuous function. While much research has explored the relationship between expressibility and learning performance, as well as the number of layers in PQCs, the connection between expressibility and PQC structure has received comparatively less attention. In this paper, we analyze the connection between expressibility and the types of quantum gates within PQCs using a Gradient Boosting Tree model and SHapley Additive exPlanations (SHAP) values. Our analysis is performed on 1,615 instances of PQC derived from 19 PQC topologies, each with 2-18 qubits and 1-5 layers. The findings of our analysis provide guidance for designing highly expressible PQCs, suggesting the integration of more RX or RY gates while maintaining a careful balance with the number of CNOT gates. Furthermore, our evaluation offers an additional evidence of expressibility saturation, as observed by previous studies.	翻訳日:2024-08-05 14:26:49 公開日:2024-08-02
# MambaST:効率的な歩行者検出のためのプラグアンドプレイ型クロススペクトル時空間フューザ MambaST: A Plug-and-Play Cross-Spectral Spatial-Temporal Fuser for Efficient Pedestrian Detection ( http://arxiv.org/abs/2408.01037v1 ) ライセンス: Link先を確認	Xiangbo Gao, Asiegbu Miracle Kanu-Asiegbu, Xiaoxiao Du,	(参考訳) 本稿では,効率的な歩行者検出のためのプラグアンドプレイ型クロススペクトル時空間融合パイプラインであるMambaSTを提案する。自動運転アプリケーションにおける歩行者検出にはいくつかの課題がある。第一に、暗所や低照度条件下では、RGBカメラを用いて正確な検出を行うことが難しい。検出の頑健性を高めるには、サーマルカメラや可視光カメラなど複数のセンサモダリティからの相補的な情報を統合するクロススペクトルシステムを開発する必要がある。第二に、歩行者検出モデルはレイテンシに敏感である。自動運転のようなリアルタイムアプリケーションでは、パラメータが少なく効率的でスケールしやすい検出モデルが強く望まれる。第三に、歩行者の映像データは歩行者の動きの時空間的な相関をもたらす。空間情報に加えて時間情報を取り入れることは、歩行者検出の強化に有益である。本研究は状態空間モデル(Mamba)の最近の進歩を活用し、RGB画像と熱画像の両方から細粒度と粗粒度の情報を抽出する新しいMHHPA(Multi-head Hierarchical Patching and Aggregation)構造を提案する。実験結果から,提案したMHHPAは,クロススペクトル歩行者検出においてTransformerモデルに代わる有効かつ効率的な選択肢であることがわかった。また,提案モデルは小規模歩行者の検出でも優れた性能を達成する。コードは https://github.com/XiangboGaoBarry/MambaST で入手できる。 This paper proposes MambaST, a plug-and-play cross-spectral spatial-temporal fusion pipeline for efficient pedestrian detection. Several challenges exist for pedestrian detection in autonomous driving applications. First, it is difficult to perform accurate detection using RGB cameras under dark or low-light conditions. Cross-spectral systems must be developed to integrate complementary information from multiple sensor modalities, such as thermal and visible cameras, to improve the robustness of the detections. Second, pedestrian detection models are latency-sensitive. Efficient and easy-to-scale detection models with fewer parameters are highly desirable for real-time applications such as autonomous driving. Third, pedestrian video data provides spatial-temporal correlations of pedestrian movement. It is beneficial to incorporate temporal as well as spatial information to enhance pedestrian detection. 
This work leverages recent advances in the state space model (Mamba) and proposes a novel Multi-head Hierarchical Patching and Aggregation (MHHPA) structure to extract both fine-grained and coarse-grained information from both RGB and thermal imagery. Experimental results show that the proposed MHHPA is an effective and efficient alternative to a Transformer model for cross-spectral pedestrian detection. Our proposed model also achieves superior performance on small-scale pedestrian detection. The code is available at https://github.com/XiangboGaoBarry/MambaST.	翻訳日:2024-08-05 14:26:49 公開日:2024-08-02
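MambaST builds on Mamba, which is a state space model. As background only, the sketch below runs the basic recurrence of a linear, time-invariant, diagonal SSM with zero-order-hold discretisation. Mamba's "selective" variant additionally makes the step size and the B and C parameters depend on the input. The MHHPA patching and the cross-spectral fusion are not modelled, and the parameter values are arbitrary.

```python
import math
from typing import List, Sequence, Tuple


def zoh_discretize(a: Sequence[float], b: Sequence[float],
                   delta: float) -> Tuple[List[float], List[float]]:
    """Zero-order-hold discretisation of a diagonal continuous-time SSM.

    Per state channel (a != 0): a_bar = exp(delta * a), b_bar = (exp(delta * a) - 1) / a * b
    """
    a_bar = [math.exp(delta * ai) for ai in a]
    b_bar = [(abar - 1.0) / ai * bi for abar, ai, bi in zip(a_bar, a, b)]
    return a_bar, b_bar


def ssm_scan(x: Sequence[float], a_bar: Sequence[float],
             b_bar: Sequence[float], c: Sequence[float]) -> List[float]:
    """Run h_t = a_bar * h_{t-1} + b_bar * x_t and y_t = c . h_t over a 1-D input sequence."""
    h = [0.0] * len(a_bar)
    ys = []
    for xt in x:
        h = [ab * hn + bb * xt for ab, hn, bb in zip(a_bar, h, b_bar)]
        ys.append(sum(cn * hn for cn, hn in zip(c, h)))
    return ys


a_bar, b_bar = zoh_discretize(a=[-1.0, -0.5], b=[1.0, 1.0], delta=0.1)
impulse_response = ssm_scan([1.0, 0.0, 0.0, 0.0], a_bar, b_bar, c=[1.0, 2.0])
print([round(v, 4) for v in impulse_response])
```

For an impulse input, the output at step k is the sum over channels of c · a_bar^k · b_bar. This is why a stable SSM (negative a, so a_bar < 1) yields a decaying response.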
# UNER:ビジュアルリッチドキュメントにおける名前付きエンティティ認識のための統一予測ヘッド UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents ( http://arxiv.org/abs/2408.01038v1 ) ライセンス: Link先を確認	Yi Tu, Chong Zhang, Ya Guo, Huan Chen, Jinyang Tang, Huijia Zhu, Qi Zhang,	(参考訳) 視覚的にリッチなドキュメントにおける名前付きエンティティの認識(VrD-NER)は、様々な現実のシナリオやアプリケーションにおいて重要な役割を果たす。しかしながら、VrD-NERの研究は、複雑なドキュメントレイアウト、誤った読み順、不適切なタスク定式化という3つの大きな課題に直面している。これらの課題に対処するため、既存のマルチモーダル文書Transformerと協調してより頑健なVrD-NERモデルを構築するための、クエリ対応エンティティ抽出ヘッドであるUNERを提案する。UNERヘッドは、VrD-NERタスクをシーケンスラベリングと読み順予測の組み合わせとして捉え、文書中の不連続なエンティティの問題に効果的に対処する。多様なデータセットでの実験的評価により、UNERがエンティティ抽出性能を向上させることが示された。さらに、UNERヘッドは各種VrD-NERデータセットでの教師あり事前学習段階を可能にして文書Transformerバックボーンを強化し、事前学習段階から微調整段階への大幅な知識転移を示す。普遍的なレイアウト理解を取り入れることで、事前学習済みのUNERベースのモデルは、少数ショットおよび言語横断のシナリオで大きな利点を示し、ゼロショットのエンティティ抽出能力も示す。 The recognition of named entities in visually-rich documents (VrD-NER) plays a critical role in various real-world scenarios and applications. However, the research in VrD-NER faces three major challenges: complex document layouts, incorrect reading orders, and unsuitable task formulations. To address these challenges, we propose a query-aware entity extraction head, namely UNER, to collaborate with existing multi-modal document transformers to develop more robust VrD-NER models. The UNER head considers the VrD-NER task as a combination of sequence labeling and reading order prediction, effectively addressing the issues of discontinuous entities in documents. Experimental evaluations on diverse datasets demonstrate the effectiveness of UNER in improving entity extraction performance. Moreover, the UNER head enables a supervised pre-training stage on various VrD-NER datasets to enhance the document transformer backbones and exhibits substantial knowledge transfer from the pre-training stage to the fine-tuning stage. 
By incorporating universal layout understanding, a pre-trained UNER-based model demonstrates significant advantages in few-shot and cross-linguistic scenarios and exhibits zero-shot entity extraction abilities.	翻訳日:2024-08-05 14:26:49 公開日:2024-08-02
# 凍結ガウス近似を用いたインチワーム法によるカルデイラ・レゲットモデルの解法 Solving Caldeira-Leggett Model by Inchworm Method with Frozen Gaussian Approximation ( http://arxiv.org/abs/2408.01039v1 ) ライセンス: Link先を確認	Geshuo Wang, Siyao Yang, Zhenning Cai,	(参考訳) 本研究では, 量子粒子が熱的な調和振動子浴と結合したカルデイラ・レゲットモデルをシミュレートするために, インチワーム法と凍結ガウス近似を組み合わせたアルゴリズムを提案する。特に、縮約密度演算子のリアルタイムダイナミクスに関心がある。本アルゴリズムでは、凍結ガウス近似を用いて波動関数を積分形式の波束として近似する。求める縮約密度演算子はダイソン級数として書かれる。ダイソン級数は、相互作用系の量子力学における経路積分の級数表現である。ダイソン級数を計算するために、級数の各項をさらにガウス波束で近似し、インチワーム法のアイデアを用いて級数の収束を加速する。インチワーム法は、級数を「フルプロパゲータ」の積分微分方程式として定式化し、右辺の無限級数をこれらのフルプロパゲータを用いて書き直す。これにより和の項数を大幅に減らし、より速い収束を実現する。本アルゴリズムの性能は,様々な実験により数値的に検証される。 We propose an algorithm that combines the inchworm method and the frozen Gaussian approximation to simulate the Caldeira-Leggett model in which a quantum particle is coupled with thermal harmonic baths. In particular, we are interested in the real-time dynamics of the reduced density operator. In our algorithm, we use frozen Gaussian approximation to approximate the wave function as a wave packet in integral form. The desired reduced density operator is then written as a Dyson series, which is the series expression of path integrals in quantum mechanics of interacting systems. To compute the Dyson series, we further approximate each term in the series using Gaussian wave packets, and then employ the idea of the inchworm method to accelerate the convergence of the series. The inchworm method formulates the series as an integro-differential equation of "full propagators", and rewrites the infinite series on the right-hand side using these full propagators, so that the number of terms in the sum can be significantly reduced, and faster convergence can be achieved. The performance of our algorithm is verified numerically by various experiments.	翻訳日:2024-08-05 14:17:04 公開日:2024-08-02
# パッチ単位のランダムかつノイズ付きCutMixを用いたVision Transformerによるプライバシ保護スプリット学習 Privacy-Preserving Split Learning with Vision Transformers using Patch-Wise Random and Noisy CutMix ( http://arxiv.org/abs/2408.01040v1 ) ライセンス: Link先を確認	Seungeun Oh, Sihun Baek, Jihong Park, Hyelin Nam, Praneeth Vepakomma, Ramesh Raskar, Mehdi Bennis, Seong-Lyun Kim,	(参考訳) コンピュータビジョンでは、精度と頑健性の向上のため、Vision Transformer(ViT)が畳み込みニューラルネットワーク(CNN)に取って代わりつつある。しかし、ViTはモデルサイズが大きくサンプル複雑度も高いため、リソース制約のあるエッジデバイスで学習することは難しい。スプリット学習(SL)は、サーバ側のリソースを活用してViTを学習しつつ、分散したデバイス上のプライベートデータを利用できる現実的な解決策として登場した。しかし、SLはデバイスとサーバ間で重み更新のための追加の情報交換を必要とし、これがプライベートな学習データに対するさまざまな攻撃にさらされる可能性がある。分類タスクにおけるデータ漏洩のリスクを軽減するため,CutMix正則化に着想を得て,スマッシュドデータにガウスノイズを注入し,クライアント間でランダムに選択したスマッシュドデータのパッチを混合する新しいプライバシ保護SLフレームワークDP-CutMixSLを提案する。分析により,DP-CutMixSLは,順伝播時のメンバーシップ推論攻撃に対するプライバシ保護を強化する差分プライベート(DP)メカニズムであることが示された。シミュレーションにより、DP-CutMixSLは、DP-SLやDP-MixSLと比較して、メンバーシップ推論攻撃、再構成攻撃、ラベル推論攻撃に対するプライバシ保護を改善するとともに、精度も向上することを示した。 In computer vision, the vision transformer (ViT) has increasingly superseded the convolutional neural network (CNN) for improved accuracy and robustness. However, ViT's large model sizes and high sample complexity make it difficult to train on resource-constrained edge devices. Split learning (SL) emerges as a viable solution, leveraging server-side resources to train ViTs while utilizing private data from distributed devices. However, SL requires additional information exchange for weight updates between the device and the server, which can be exposed to various attacks on private training data. To mitigate the risk of data breaches in classification tasks, inspired from the CutMix regularization, we propose a novel privacy-preserving SL framework that injects Gaussian noise into smashed data and mixes randomly chosen patches of smashed data across clients, coined DP-CutMixSL. 
Our analysis demonstrates that DP-CutMixSL is a differentially private (DP) mechanism that strengthens privacy protection against membership inference attacks during forward propagation. Through simulations, we show that DP-CutMixSL improves privacy protection against membership inference attacks, reconstruction attacks, and label inference attacks, while also improving accuracy compared to DP-SL and DP-MixSL.	翻訳日:2024-08-05 14:17:04 公開日:2024-08-02
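Below is a minimal sketch of the mixing step the abstract describes: Gaussian noise is added to smashed data, and patches are mixed at random across clients. It assumes that each client's cut-layer output is already split into ViT patch tokens. Several details are assumptions in the spirit of CutMix, not the authors' implementation: the function name, the uniform choice of source client per patch, and the label mixing in proportion to patch counts. Calibrating `sigma` to a formal (epsilon, delta) guarantee is not shown.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def noisy_patch_cutmix(
    smashed: Sequence[Sequence[Vector]],
    labels: Sequence[Vector],
    sigma: float,
    rng: random.Random,
) -> Tuple[List[Vector], Vector]:
    """Build one mixed sample from several clients' smashed data.

    smashed[c][p] is patch p of client c's smashed (cut-layer) activations and
    labels[c] is client c's one-hot label. For every patch position a client is
    drawn uniformly at random, its patch is copied, and Gaussian noise with
    standard deviation sigma is added. The label is mixed in proportion to how
    many patches each client contributed.
    """
    num_clients, num_patches = len(smashed), len(smashed[0])
    counts = [0] * num_clients
    mixed: List[Vector] = []
    for p in range(num_patches):
        c = rng.randrange(num_clients)
        counts[c] += 1
        mixed.append([v + rng.gauss(0.0, sigma) for v in smashed[c][p]])
    num_classes = len(labels[0])
    mixed_label = [
        sum(counts[c] * labels[c][k] for c in range(num_clients)) / num_patches
        for k in range(num_classes)
    ]
    return mixed, mixed_label


# Two hypothetical clients, three patches of two features each.
smashed = [
    [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
    [[10.0, 10.0], [20.0, 20.0], [30.0, 30.0]],
]
labels = [[1.0, 0.0], [0.0, 1.0]]
mixed, mixed_label = noisy_patch_cutmix(smashed, labels, sigma=0.1, rng=random.Random(0))
print(len(mixed), mixed_label)
```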
# 線形光学を用いた高閾値の符号化フュージョンに基づく量子計算 Encoded-Fusion-Based Quantum Computation for High Thresholds with Linear Optics ( http://arxiv.org/abs/2408.01041v1 ) ライセンス: Link先を確認	Wooyeong Song, Nuri Kang, Yong-Su Kim, Seung-Woo Lee,	(参考訳) 本稿では,有限サイズのエンタングルした資源状態を用いた測定型の方式と,線形光学による符号化フュージョン方式に基づく,フォールトトレラントな量子計算方式を提案する。符号化フュージョン(encoded-fusion)は、量子誤り訂正符号に基づき、損失やエラーの存在下でのフュージョン成功確率を高めるために考案されたエンタングル測定である。線形光学と能動的フィードフォワードで実行でき,一般化ショア符号を実装する符号化フュージョン方式を適用し,表面符号に基づく3次元Raussendorf-Harrington-Goyal格子におけるフォールトトレラントなネットワーク構成を構築する。数値シミュレーションにより,フュージョンに用いる光子数が限られる条件で,非符号化フュージョン方式よりも最大10倍高い損失閾値を達成できることが示された。本方式は,有限サイズのエンタングルした資源状態と線形光学によるフォールトトレラント量子計算への効率的な道筋を拓く。 We propose a fault-tolerant quantum computation scheme in a measurement-based manner with finite-sized entangled resource states and encoded fusion scheme with linear optics. The encoded-fusion is an entangled measurement devised to enhance the fusion success probability in the presence of losses and errors based on a quantum error-correcting code. We apply an encoded-fusion scheme, which can be performed with linear optics and active feedforwards to implement the generalized Shor code, to construct a fault-tolerant network configuration in a three-dimensional Raussendorf-Harrington-Goyal lattice based on the surface code. Numerical simulations show that our scheme allows us to achieve up to 10 times higher loss thresholds than nonencoded fusion approaches with limited numbers of photons used in fusion. Our scheme paves an efficient route toward fault-tolerant quantum computing with finite-sized entangled resource states and linear optics.	翻訳日:2024-08-05 14:17:04 公開日:2024-08-02
# 共変的に帯域制限されたスカラー場における局所性とエンタングルメント・ハーベスティング Locality and entanglement harvesting in covariantly bandlimited scalar fields ( http://arxiv.org/abs/2408.01043v1 ) ライセンス: Link先を確認	Nicholas Funai, Nicolas C. Menicucci,	(参考訳) 滑らかな多様体上の場の量子論における高エネルギーの考察は、一般化された不確定性原理や、量子重力シナリオにおける物理的な最小長の可能性へとつながった。これらのモデルにおいて最小長は単なる数学的道具ではなく物理的な限界であり、ローレンツ不変であるべきである。本稿では,共変バンドリミット(最小長)が課された場における2量子ビット通信とエンタングルメント・ハーベスティングを調べ,このバンドリミットによって生じる変化を示す。このバンドリミットは、非共変なバンドリミットや他の量子光学的近似とは異なる形で、非局所性と非因果的(acausal)な通信をもたらすことがわかった。また、この共変バンドリミットが時間と時間順序に不確定性をもたらし,その特異な振る舞いは,共変カットオフによって修正される仮想粒子の振る舞いに起因することも観察した。 Considerations of high energies in quantum field theories on smooth manifolds have led to generalized uncertainty principles and the possibility of a physical minimal length in quantum gravitational scenarios. In these models, the minimal length would be a physical limit, not just a mathematical tool, and should be Lorentz invariant. In this paper, we study two-qubit communication and entanglement harvesting in a field subject to a covariant bandlimit (minimum length) and present the changes induced by this bandlimit. We find the bandlimit introduces nonlocality and acausal communication in a manner unlike non-covariant bandlimits or other quantum optical approximations. We also observe that this covariant bandlimit introduces uncertainties in time and temporal ordering with the unusual behavior attributed to the behavior of virtual particles being modified by the covariant cutoff.	翻訳日:2024-08-05 14:17:04 公開日:2024-08-02
# ビジョン基盤モデルからのピクセルレベル教師信号による視線オブジェクト予測の強化 Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model ( http://arxiv.org/abs/2408.01044v1 ) ライセンス: Link先を確認	Yang Jin, Lei Zhang, Shi Yan, Bin Fan, Binglu Wang,	(参考訳) 視線オブジェクト予測(GOP)は、人間が見ている物体のカテゴリと位置を予測することを目的とする。従来の手法はボックスレベルの教師信号を用いて人が見ている物体を特定していたが、意味的曖昧さに悩まされていた。すなわち、物体同士が近接しているため、1つのボックスに複数のアイテムが含まれうる。ビジョン基盤モデル(VFM)はボックスプロンプトを用いた物体セグメンテーションで進歩しており、物体をより正確に位置特定することで混乱を減らし、視線オブジェクトの細粒度予測に利点をもたらす。本稿では,人間の視線行動によって捉えられた物体に対応するピクセルレベルのマスクを推定する,より困難な視線オブジェクトセグメンテーション(GOS)タスクを提示する。特に,VFMが提供するピクセルレベルの教師信号を視線オブジェクト予測に統合することで,意味的曖昧さを軽減できることを提案する。これにより、正確なピクセルレベル予測が可能な視線オブジェクト検出・セグメンテーションフレームワークが実現される。追加の頭部入力を必要とする,あるいは頭部特徴を無視する従来の手法とは異なり,シーン特徴から頭部特徴を自動的に取得することで,実世界でのモデルの推論効率と柔軟性を確保することを提案する。さらに,物体の空間的位置や微細な細部を見落としかねない既存手法のように特徴を直接融合して視線ヒートマップを予測するのではなく,人と物体の視線相互作用を促進する空間から物体への(space-to-object)視線回帰手法を開発した。具体的には、まず人と物体の初期の空間的接続を構築し、次にセグメンテーションブランチの意味的に明確な特徴と相互作用させることでこの接続を洗練し、最終的に正確な位置特定のための視線ヒートマップを予測する。GOO-SynthおよびGOO-Realデータセットでの大規模な実験により,本手法の有効性が示された。 Gaze object prediction (GOP) aims to predict the category and location of the object that a human is looking at. Previous methods utilized box-level supervision to identify the object that a person is looking at, but struggled with semantic ambiguity, ie, a single box may contain several items since objects are close together. The Vision foundation model (VFM) has improved in object segmentation using box prompts, which can reduce confusion by more precisely locating objects, offering advantages for fine-grained prediction of gaze objects. This paper presents a more challenging gaze object segmentation (GOS) task, which involves inferring the pixel-level mask corresponding to the object captured by human gaze behavior. In particular, we propose that the pixel-level supervision provided by VFM can be integrated into gaze object prediction to mitigate semantic ambiguity. 
This leads to our gaze object detection and segmentation framework that enables accurate pixel-level predictions. Different from previous methods that require additional head input or ignore head features, we propose to automatically obtain head features from scene features to ensure the model's inference efficiency and flexibility in the real world. Moreover, rather than directly fuse features to predict gaze heatmap as in existing methods, which may overlook spatial location and subtle details of the object, we develop a space-to-object gaze regression method to facilitate human-object gaze interaction. Specifically, it first constructs an initial human-object spatial connection, then refines this connection by interacting with semantically clear features in the segmentation branch, ultimately predicting a gaze heatmap for precise localization. Extensive experiments on GOO-Synth and GOO-Real datasets demonstrate the effectiveness of our method.	翻訳日:2024-08-05 14:17:04 公開日:2024-08-02
# QUDSELECT: Questions Under Discussion解析のための選択的デコーディング QUDSELECT: Selective Decoding for Questions Under Discussion Parsing ( http://arxiv.org/abs/2408.01046v1 ) ライセンス: Link先を確認	Ashima Suvarna, Xiao Liu, Tanmay Parekh, Kai-Wei Chang, Nanyun Peng,	(参考訳) Question Under Discussion (QUD) は、暗黙の質問を用いて文間の談話関係を明らかにする談話フレームワークである。QUD解析では、各文は、先行文脈中のアンカー文によって引き起こされる質問への回答とみなされる。得られるQUD構造は、回答適合性(質問にどの程度よく答えているか)などのいくつかの理論的基準に適合することが求められる。このため、QUD解析は難しいタスクとなる。先行研究はQUDパーサをパイプライン方式で構築している(すなわち、文脈中のトリガー文を検出してから質問を生成する)。しかし、これらのパーサはタスク全体を俯瞰する視点を欠き、すべての基準を満たすことはほとんどできない。本研究では、QUD基準を考慮してQUD依存構造を選択的にデコードする共同学習フレームワークQUDSELECTを提案する。インストラクションチューニングを用いて、アンカー文の予測と関連する質問の生成を同時に行うモデルを訓練する。基準を明示的に組み込むために、選択的デコーディング戦略を採用する。この戦略では、推論時に複数のQUD候補をサンプリングし、基準スコアラーで最良の候補を選択する。提案手法は、最先端のベースラインモデルを人手評価で9%、自動評価で4%上回り、本フレームワークの有効性を示す。 Question Under Discussion (QUD) is a discourse framework that uses implicit questions to reveal discourse relationships between sentences. In QUD parsing, each sentence is viewed as an answer to a question triggered by an anchor sentence in prior context. The resulting QUD structure is required to conform to several theoretical criteria like answer compatibility (how well the question is answered), making QUD parsing a challenging task. Previous works construct QUD parsers in a pipelined manner (i.e., detect the trigger sentence in context and then generate the question). However, these parsers lack a holistic view of the task and can hardly satisfy all the criteria. In this work, we introduce QUDSELECT, a joint-training framework that selectively decodes the QUD dependency structures considering the QUD criteria. Using instruction-tuning, we train models to simultaneously predict the anchor sentence and generate the associated question. To explicitly incorporate the criteria, we adopt a selective decoding strategy of sampling multiple QUD candidates during inference, followed by selecting the best one with criteria scorers. 
Our method outperforms the state-of-the-art baseline models by 9% in human evaluation and 4% in automatic evaluation, demonstrating the effectiveness of our framework.	翻訳日:2024-08-05 14:17:04 公開日:2024-08-02
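上記の選択的デコーディングのうち、選択のステップ(推論時にサンプリングした複数のQUD候補から、基準スコアラーで最良のものを選ぶ処理)を示す最小限のスケッチを以下に示す。論文の実装ではない。候補の表現(アンカー文のインデックスと質問文の組)、`select_best` という関数名、重み付き和による集約、スコアラーの中身は、いずれも説明用の仮定である。実際のスコアラーは、回答適合性などの基準を評価する学習済みモデルになると想定している。

```python
from __future__ import annotations

from typing import Callable, Sequence

# Hypothetical candidate representation: (anchor sentence index, generated question).
Candidate = tuple[int, str]
Scorer = Callable[[Candidate], float]


def select_best(candidates: Sequence[Candidate],
                scorers: Sequence[Scorer],
                weights: Sequence[float] | None = None) -> Candidate:
    """Return the sampled candidate with the highest weighted sum of criterion scores."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if weights is None:
        weights = [1.0] * len(scorers)
    if len(weights) != len(scorers):
        raise ValueError("one weight per scorer is required")

    def total(candidate: Candidate) -> float:
        # Aggregate per-criterion scores (e.g. answer compatibility) into one number.
        return sum(w * score(candidate) for w, score in zip(weights, scorers))

    # max() keeps the first candidate among ties.
    return max(candidates, key=total)
```

集約方法を重み付き和にしたのは最も単純な選択肢だからであり、基準ごとの順位を平均するなどの方法に置き換えることもできる。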
# ハイパーパラメータが大規模言語モデル推論性能に及ぼす影響:vLLMとHuggingFace Pipelinesの評価 The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines ( http://arxiv.org/abs/2408.01050v1 ) ライセンス: Link先を確認	Matias Martinez,	(参考訳) 近年のオープンソース大規模言語モデル(LLM)の急増により、開発者はプライバシやコンプライアンスといった側面を管理しながらAIベースのソリューションを構築できるようになった。これにより、モデルデプロイメントプロセスのガバナンスとオーナーシップも確保できる。これらのLLMを利用するには推論エンジンが必要である。推論エンジンはGPUなどの利用可能なリソースにモデルの重みをロードし、クエリを処理して応答を生成する。LLMは推論ごとに数百万から数十億の浮動小数点演算を行うため、推論速度すなわち性能はリアルタイムアプリケーションにとって極めて重要である。近年、vLLMのような高度な推論エンジンが登場し、効率的なメモリ管理などの新しい仕組みを取り入れて最先端の性能を実現している。本稿では、2つの推論ライブラリ(vLLMとHuggingFaceのパイプライン)を用いて、20のLLMの性能、特にスループット(単位時間あたりに生成されるトークン数)を解析する。開発者が設定しなければならない様々なハイパーパラメータが推論性能にどのように影響するかを調べる。その結果、スループットのランドスケープは不規則で明確なピークを持つことがわかり、最大性能を得るためのハイパーパラメータ最適化の重要性が浮き彫りになった。また、推論に用いるGPUモデルをアップグレードまたはダウングレードする際にハイパーパラメータ最適化を適用することで、HuggingFaceパイプラインのスループットをそれぞれ平均9.16%、13.7%向上できることを示す。 The recent surge of open-source large language models (LLMs) enables developers to create AI-based solutions while maintaining control over aspects such as privacy and compliance, thereby providing governance and ownership of the model deployment process. To utilize these LLMs, inference engines are needed. These engines load the model's weights onto available resources, such as GPUs, and process queries to generate responses. The inference speed (performance) of an LLM is critical for real-time applications, as it computes millions or billions of floating point operations per inference. Recently, advanced inference engines such as vLLM have emerged, incorporating novel mechanisms such as efficient memory management to achieve state-of-the-art performance. In this paper, we analyze the performance, particularly the throughput (tokens generated per unit of time), of 20 LLMs using two inference libraries: vLLM and HuggingFace's pipelines. We investigate how various hyperparameters, which developers must configure, influence inference performance. 
Our results reveal that throughput landscapes are irregular, with distinct peaks, highlighting the importance of hyperparameter optimization to achieve maximum performance. We also show that applying hyperparameter optimization when upgrading or downgrading the GPU model used for inference can improve throughput from HuggingFace pipelines by an average of 9.16% and 13.7%, respectively.	翻訳日:2024-08-05 14:17:04 公開日:2024-08-02
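以下は、スループット(単位時間あたりに生成されるトークン数)を計算し、ハイパーパラメータの格子を総当たりして最速の設定を選ぶ最小限のスケッチである。上記のとおりランドスケープは不規則なので、単調性を仮定せずにすべての組み合わせを測定している。`measure` 関数は仮定であり、設定を受け取って生成トークン数と経過秒数を返すものとする。`batch_size` などのハイパーパラメータ名も説明用であって、vLLMやHuggingFaceのAPIそのものではない。

```python
import itertools
from typing import Callable, Iterable, Mapping


def throughput(generated_tokens: int, elapsed_seconds: float) -> float:
    """Tokens generated per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return generated_tokens / elapsed_seconds


def grid_search(measure: Callable[..., tuple[int, float]],
                grid: Mapping[str, Iterable]) -> tuple[dict, float]:
    """Try every hyperparameter combination; return (best_config, best_throughput).

    `measure(**config)` is a hypothetical benchmark hook that runs inference
    with `config` and returns (generated_tokens, elapsed_seconds).
    """
    keys = list(grid)
    best_config, best_tps = None, float("-inf")
    # Exhaustive search: no monotonicity is assumed, since the landscape can be irregular.
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        tokens, seconds = measure(**config)
        tps = throughput(tokens, seconds)
        if tps > best_tps:
            best_config, best_tps = config, tps
    return best_config, best_tps
```

GPUモデルを変更したときは、同じ格子で探索をやり直せばよい。ある環境での最適値が別の環境でも最適とは限らないからである。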
# 隅から隅まで:AIバリューチェーンに沿った異議申し立て可能性 From Stem to Stern: Contestability Along AI Value Chains ( http://arxiv.org/abs/2408.01051v1 ) ライセンス: Link先を確認	Agathe Balayn, Yulu Pi, David Gray Widder, Kars Alfrink, Mireia Yurrita, Sohini Upadhyay, Naveena Karusala, Henrietta Lyons, Cagatay Turkay, Christelle Tessono, Blair Attard-Frost, Ujwal Gadiraju,	(参考訳) 本ワークショップは、異議申し立て可能な(contestable)AIというテーマに取り組む学際的なCSCW研究者のコミュニティを拡大・統合する。ワークショップの成果として、AIバリューチェーンに沿った異議申し立て可能性に関する最も差し迫った機会と課題を、研究ロードマップの形でまとめる。このロードマップは、この分野における今後の研究を方向づけ、刺激するのに役立つだろう。AIバリューチェーンの長さと深さを考慮すると、特にこうしたチェーンのさまざまな地点におけるAIシステムの異議申し立て可能性について議論が促されるだろう。本ワークショップは、異議が申し立てられた(あるいは申し立てられ得た、申し立てられるべきだった)AIシステムの具体的な成功例・失敗例について対話し実演する場となる。それを通じて、さまざまな文脈で異議申し立て可能なAIを設計・展開するための要件、障害、機会を特定する。主に対面形式で開催し、一部ハイブリッド参加にも対応する。当日は個人発表とグループ活動で構成され、アイデア創出を促し、異議申し立て可能なAIの分野に対する幅広い省察を喚起する。我々の目標は、研究者、実践者、利害関係者を集めて異議申し立て可能なAIの設計と展開を促進し、学際的な対話を促すことである。 This workshop will grow and consolidate a community of interdisciplinary CSCW researchers focusing on the topic of contestable AI. As an outcome of the workshop, we will synthesize the most pressing opportunities and challenges for contestability along AI value chains in the form of a research roadmap. This roadmap will help shape and inspire imminent work in this field. Considering the length and depth of AI value chains, it will especially spur discussions around the contestability of AI systems along various sites of such chains. The workshop will serve as a platform for dialogue and demonstrations of concrete, successful, and unsuccessful examples of AI systems that (could or should) have been contested, to identify requirements, obstacles, and opportunities for designing and deploying contestable AI in various contexts. This will be held primarily as an in-person workshop, with some hybrid accommodation. The day will consist of individual presentations and group activities to stimulate ideation and inspire broad reflections on the field of contestable AI. 
Our aim is to facilitate interdisciplinary dialogue by bringing together researchers, practitioners, and stakeholders to foster the design and deployment of contestable AI.	翻訳日:2024-08-05 14:17:04 公開日:2024-08-02
# Simon類似暗号の差分線形識別器に対するMILP/MIQCPベース自動探索の改良 Enhancing the MILP/MIQCP-based Automatic Search for Differential-Linear Distinguishers of Simon-Like Ciphers ( http://arxiv.org/abs/2408.01052v1 ) ライセンス: Link先を確認	Siwei Chen, Zejun Xiang, Xiangyong Zeng, Guangxue Qin,	(参考訳) 本稿では、SimonおよびSimeckブロック暗号ファミリーの全メンバーについて、より優れた差分線形(DL)識別器を自動的に発見するための改良手法を提案する。この手法は混合整数線形計画/混合整数二次制約計画(MILP/MIQCP)に基づく。具体的には、まず線形部分を記述するための完全に正確なMILPモデルを与える。また、\textsf{Gurobi} ソルバの一般式(general expressions)を用いて、中間部分における連続差分の伝播を非常に簡単にモデル化する方法を説明する。第二に、MILP/MIQCPモデルを妥当な時間で解くために、分割統治の考え方に基づく2つのヒューリスティック戦略を提案し、探索過程を高速化する。第三に、DLトレイルのクラスタリング効果を利用する変換手法を導入し、DL近似の推定相関を改善する。本手法をSimonおよびSimeckブロック暗号ファミリーに適用する。その結果、Simon32/48/64/96について14/17/21/26ラウンドの理論的DL識別器を発見した。これは、Simon32/48/96の既存最長の識別器をそれぞれ1ラウンド、Simon64のものを2ラウンド延長するものである。Simeckについては、現在の最良結果より長い識別器は得られていないが、Zhouら(MILP/MIQCPを用いてSimon類似暗号のDL識別器の探索を自動化した最初の研究)の結果をすべて更新した。さらに、これらの識別器の正しさを検証するため、Simon32/Simeck32およびSimon48/Simeck48で実験的検証を行った。その結果、相関の理論的推定値は実験値に非常に近く、提案手法の有効性を具体的に裏付けるものとみなせる。 In this paper, we propose an improved method based on Mixed-Integer Linear Programming/Mixed-Integer Quadratic Constraint Programming (MILP/MIQCP) to automatically find better differential-linear (DL) distinguishers for all members of the Simon and Simeck block cipher families. To be specific, we first give a completely precise MILP model to describe the linear part, and explain how to utilize the general expressions of the \textsf{Gurobi} solver to model the propagation of continuous difference for the middle part in a quite easy way. Secondly, in order to solve the MILP/MIQCP model in a reasonable time, we propose two heuristic strategies based on the divide-and-conquer idea to speed up the search process. Thirdly, we introduce the transforming technique, which exploits the clustering effect on DL trails, to improve the estimated correlation of the DL approximation. We apply our method to Simon and Simeck block cipher families. 
Consequently, we find the 14/17/21/26-round theoretical DL distinguishers of Simon32/48/64/96, which extend the previous longest ones of Simon32/48/96 by one round and Simon64 by two rounds, respectively. For Simeck, we do not explore longer distinguishers compared to the currently best results, but refresh all the results of Zhou et al. (the first work to automate finding DL distinguishers for Simon-like ciphers using MILP/MIQCP). Besides, in order to validate the correctness of these distinguishers, the experimental verifications are conducted on Simon32/Simeck32 and Simon48/Simeck48. The results show that our theoretical estimations on correlations are very close to the experimental ones, which can be regarded as a concrete support for the effectiveness of our method.	翻訳日:2024-08-05 14:17:04 公開日:2024-08-02
# 実行時エラーハンドラとしてのLLM: ソフトウェアシステムの適応的自己修復への有望な道筋 LLM as Runtime Error Handler: A Promising Pathway to Adaptive Self-Healing of Software Systems ( http://arxiv.org/abs/2408.01055v1 ) ライセンス: Link先を確認	Zhensu Sun, Haotian Zhu, Bowen Xu, Xiaoning Du, Li Li, David Lo,	(参考訳) 事前に定義されたハンドラを持たない予期しない実行時エラーは、実行を突然終了させ、データ損失やシステムクラッシュなどの深刻な結果をもたらしうる。開発段階では潜在的なエラーを特定するために多大な努力が払われているが、こうした予期しないエラーを完全に排除することは依然として難しい。そのため、影響を最小限に抑えるための実行時の緩和策は今なお不可欠である。実行終了による損失を軽減するため、既存のハンドラの再利用などの自動自己修復技術が研究されてきた。しかし、既存手法の有用性は事前に定義されたヒューリスティックなルールによって制約されており、多様な実行時エラーに適応的に対処することができない。近年、大規模言語モデル(LLM)の登場により、この問題に取り組む新たな道が開かれた。コードの理解と生成におけるLLMの顕著な能力に着想を得て、LLMを用いて実行時エラーにリアルタイムに対処することを提案する。具体的には、実行時エラーを処理するための初のLLM支援自己修復フレームワークであるHealerを提案する。未処理の実行時エラーが発生すると、Healerが起動され、内部のLLMの助けを借りてエラー処理コードを生成する。生成されたコードはフレームワークが管理する実行時環境内で実行され、プログラムが実行を継続するための修正されたプログラム状態が得られる。探索的研究として、4つの異なるコードベンチマークと3つの最先端LLM(GPT-3.5、GPT-4、CodeQwen-7B)を用いてHealerの性能を評価する。その結果、ファインチューニングを一切必要とせずに、GPT-4は実行時エラーの72.8%からプログラムの回復を支援できた。これは、実行時エラー処理におけるLLMの可能性を示している。 Unanticipated runtime errors, lacking predefined handlers, can abruptly terminate execution and lead to severe consequences, such as data loss or system crashes. Despite extensive efforts to identify potential errors during the development phase, such unanticipated errors remain difficult to eliminate entirely, making runtime mitigation measures still indispensable to minimize their impact. Automated self-healing techniques, such as reusing existing handlers, have been investigated to reduce the loss caused by execution termination. However, the usability of existing methods is limited by their predefined heuristic rules, and they fail to handle diverse runtime errors adaptively. Recently, the advent of Large Language Models (LLMs) has opened new avenues for addressing this problem. 
Inspired by their remarkable capabilities in understanding and generating code, we propose to deal with the runtime errors in a real-time manner using LLMs. Specifically, we propose Healer, the first LLM-assisted self-healing framework for handling runtime errors. When an unhandled runtime error occurs, Healer will be activated to generate a piece of error-handling code with the help of its internal LLM and the code will be executed inside the runtime environment owned by the framework to obtain a rectified program state from which the program should continue its execution. Our exploratory study evaluates the performance of Healer using four different code benchmarks and three state-of-the-art LLMs, GPT-3.5, GPT-4, and CodeQwen-7B. Results show that, without the need for any fine-tuning, GPT-4 can successfully help programs recover from 72.8% of runtime errors, highlighting the potential of LLMs in handling runtime errors.	翻訳日:2024-08-05 14:17:04 公開日:2024-08-02
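上記の流れは、次の最小限のスケッチで示せる。未処理の例外を捕捉し、LLMにエラー処理コードを生成させ、そのコードをプログラム状態に対して実行し、修正された状態で処理を1回だけ再試行する。これはHealerの実装ではない。`run_with_healer` と `llm_generate`(プロンプト文字列を受け取り、Pythonコード文字列を返す関数)は説明用の仮定である。生成されたコードを `exec` でそのまま実行するのは危険であり、実際にはサンドボックス内で行う必要がある。

```python
from typing import Any, Callable


def run_with_healer(func: Callable[[dict], Any],
                    state: dict,
                    llm_generate: Callable[[str], str]) -> Any:
    """Run func(state); if it raises, let an LLM write handler code that repairs
    `state`, execute that code, and retry func once with the repaired state."""
    try:
        return func(state)
    except Exception as exc:  # an error with no predefined handler
        prompt = (
            f"Unhandled error: {type(exc).__name__}: {exc}\n"
            f"Program state: {state!r}\n"
            "Write Python code that modifies the dict `state` so execution can continue."
        )
        handler_code = llm_generate(prompt)
        namespace = {"state": state}
        # WARNING: model-generated code must be executed in a sandbox in practice.
        exec(handler_code, namespace)
        return func(namespace["state"])
```

たとえば `llm_generate` が `"state['divisor'] = 1"` を返すなら、`10 / state['divisor']` で発生した `ZeroDivisionError` から回復し、再試行の結果として `10.0` が返る。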
# 二次ボソン系におけるボソン正孔 Bosonic Holes in Quadratic Bosonic Systems ( http://arxiv.org/abs/2408.01059v1 ) ライセンス: Link先を確認	Jia-Ming Hu, Bo Wang, Ze-Liang Xiang,	(参考訳) 電子の正孔の概念は物性物理学において重要な役割を果たす。本研究では、二次ボソン系において負の粒子励起を示すボソン正孔の概念を構築する。電子正孔とは異なり、ボソン正孔のフォック状態は双直交である。その励起は、平均場背景のもとで平均粒子数から粒子を取り除くことと解釈できる。さらに、非ユニタリかつ局所的な粒子-正孔(PH)変換で結ばれた二次ボソンハミルトニアンは、異なる空間において同じ局所性構造とスペクトル特性を持つことを見出した。これはPH双対性を反映している。これに基づき、2モードの場合のPHエンタングルメントの生成と、3モードの場合のPHアハロノフ・ボーム効果を調べる。後者は、時間反転対称性の破れを伴うPHカイラル流をもたらす。本研究は、粒子数非保存系および非エルミート系における特異な物理現象を理解・探索する新たな方法を提供する。 The concept of electron holes plays a significant role in condensed matter physics. Here we develop the concept of bosonic holes, which exhibit negative particle excitations, in quadratic bosonic systems. Unlike electron holes, the Fock states of bosonic holes are biorthogonal, and their excitation can be interpreted as removing particles from a mean-particle number with a mean-field background. Furthermore, we find that quadratic bosonic Hamiltonians related by non-unitary and local particle-hole (PH) transformations possess the same locality structure and spectral properties in different spaces, reflecting the PH duality. Based on this, we study the generation of PH entanglement in the two-mode case and the PH Aharonov-Bohm effect in the three-mode case, which results in a PH chiral flow with time-reversal symmetry breaking. Our findings provide a new way to understand and explore unusual physical phenomena in particle-non-conserving and non-Hermitian systems.	翻訳日:2024-08-05 14:17:04 公開日:2024-08-02
# 二次スケーリング領域におけるカーネルランダム行列の普遍性とカーネル回帰 Universality of kernel random matrices and kernel regression in the quadratic regime ( http://arxiv.org/abs/2408.01062v1 ) ライセンス: Link先を確認	Parthe Pandit, Zhichao Wang, Yizhe Zhu,	(参考訳) カーネルリッジ回帰(KRR)は機械学習モデルの一般的なクラスであり、深層学習を理解するための重要な道具となっている。これまでの研究の多くは比例漸近領域 $n \asymp d$ に焦点を当ててきた。ここで $n$ は訓練サンプル数、$d$ はデータセットの次元である。この領域では、データ分布に関する一定の条件の下で、KRRに現れるカーネルランダム行列は線形カーネルと同様の振る舞いを示す。本研究では、カーネル回帰の研究を $n \asymp d^2$ となる二次漸近領域に拡張する。この領域において、幅広いクラスの内積カーネルが二次カーネルと同様の振る舞いを示すことを示す。具体的には、元のカーネルランダム行列と二次カーネルランダム行列との差について、作用素ノルムによる近似の上界を確立する。ここで二次カーネルランダム行列は、カーネル関数のテイラー展開と比べて追加の補正項を含む。この近似は、共分散構造を持つ一般のデータ分布に対して、ガウスモーメント一致の仮定の下で成り立つ。この新たな近似を用いて、元のカーネル行列の極限スペクトル分布を求める。さらに、$n/d^2$ が非ゼロの定数に収束するとき、二次領域におけるKRRの正確な漸近的訓練誤差と汎化誤差を特徴づける。汎化誤差は、決定論的およびランダムな教師モデルの両方について得られる。証明手法は、モーメント法、ウィックの公式、直交多項式、および相関のある成分を持つランダム行列のレゾルベント解析を組み合わせている。 Kernel ridge regression (KRR) is a popular class of machine learning models that has become an important tool for understanding deep learning. Much of the focus has been on studying the proportional asymptotic regime, $n \asymp d$, where $n$ is the number of training samples and $d$ is the dimension of the dataset. In this regime, under certain conditions on the data distribution, the kernel random matrix involved in KRR exhibits behavior akin to that of a linear kernel. In this work, we extend the study of kernel regression to the quadratic asymptotic regime, where $n \asymp d^2$. In this regime, we demonstrate that a broad class of inner-product kernels exhibit behavior similar to a quadratic kernel. Specifically, we establish an operator norm approximation bound for the difference between the original kernel random matrix and a quadratic kernel random matrix with additional correction terms compared to the Taylor expansion of the kernel functions. The approximation works for general data distributions under a Gaussian-moment-matching assumption with a covariance structure. 
This new approximation is utilized to obtain a limiting spectral distribution of the original kernel matrix and characterize the precise asymptotic training and generalization errors for KRR in the quadratic regime when $n/d^2$ converges to a non-zero constant. The generalization errors are obtained for both deterministic and random teacher models. Our proof techniques combine moment methods, Wick's formula, orthogonal polynomials, and resolvent analysis of random matrices with correlated entries.	翻訳日:2024-08-05 14:17:04 公開日:2024-08-02
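以下は、KRRを内積カーネル $k(x, z) = f(\langle x, z \rangle / d)$ で実装し、$n = d^2$ の二次スケーリング領域でデータを生成する最小限のスケッチである。$f = \exp$、ガウス入力、二次の教師関数 $y = (x^\top \beta)^2$、正則化パラメータ `lam` の値は、いずれも説明用の仮定である。論文の実験設定ではない。

```python
import numpy as np


def inner_product_kernel(X: np.ndarray, Z: np.ndarray, f=np.exp) -> np.ndarray:
    """Gram matrix with entries f(<x_i, z_j> / d)."""
    d = X.shape[1]
    return f(X @ Z.T / d)


def krr_fit_predict(X_train, y_train, X_test, lam=1e-3, f=np.exp):
    """Kernel ridge regression: alpha = (K + lam I)^{-1} y, prediction = K_test @ alpha."""
    K = inner_product_kernel(X_train, X_train, f)
    alpha = np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)
    return inner_product_kernel(X_test, X_train, f) @ alpha


def quadratic_regime_data(d: int, seed: int = 0):
    """Gaussian inputs with n = d**2 samples and a quadratic teacher (illustrative choice)."""
    rng = np.random.default_rng(seed)
    n = d ** 2
    X = rng.standard_normal((n, d))
    beta = rng.standard_normal(d) / np.sqrt(d)
    y = (X @ beta) ** 2
    return X, y
```

`np.linalg.solve` を使っているのは、$K + \lambda I$ の逆行列を明示的に求めるより数値的に安定だからである。`d` を大きくすると、サンプル数 `n = d**2` が二次的に増える。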
# モバイルアプリレビュー機能抽出のための大規模言語モデルの活用 Leveraging Large Language Models for Mobile App Review Feature Extraction ( http://arxiv.org/abs/2408.01063v1 ) ライセンス: Link先を確認	Quim Motger, Alessio Miaschi, Felice Dell'Orletta, Xavier Franch, Jordi Marco,	(参考訳) モバイルアプリレビュー分析では,ユーザ生成ドキュメントの低品質,主観的バイアス,ノイズのある内容など,ユニークな課題が提示される。これらのレビューから特徴を抽出することは、機能の優先順位付けや感情分析といったタスクには不可欠ですが、それでも難しい作業です。一方、Transformerアーキテクチャに基づくエンコーダのみのモデルでは、複数のソフトウェアエンジニアリングプロセスの分類と情報抽出タスクに有望な結果が示されている。本研究では,エンコーダのみの大規模言語モデルがモバイルアプリレビューから特徴抽出を促進できるという仮説を考察する。クラウドソーシングされたアノテーションを産業的文脈から活用することにより、特徴抽出を教師付きトークン分類タスクとして再定義する。我々のアプローチは、コンテキスト理解を改善するためにユーザーレビューの膨大なコーパスでこれらのモデルの事前学習を拡張し、モデル微調整を最適化するためにインスタンス選択技術を採用することである。実験により,抽出した特徴の精度とリコールが向上し,性能効率が向上することが確認された。主なコントリビューションには、特徴抽出に対する新しいアプローチ、注釈付きデータセット、拡張事前訓練されたモデル、コスト効率の良い微調整のためのインスタンス選択メカニズムなどがある。本研究は,モバイルアプリレビューにおける自然言語処理タスクに大規模言語モデルを適用するための実践的手法と実証的エビデンスを提供し,特徴抽出の性能向上を提供する。 Mobile app review analysis presents unique challenges due to the low quality, subjective bias, and noisy content of user-generated documents. Extracting features from these reviews is essential for tasks such as feature prioritization and sentiment analysis, but it remains a challenging task. Meanwhile, encoder-only models based on the Transformer architecture have shown promising results for classification and information extraction tasks for multiple software engineering processes. This study explores the hypothesis that encoder-only large language models can enhance feature extraction from mobile app reviews. By leveraging crowdsourced annotations from an industrial context, we redefine feature extraction as a supervised token classification task. Our approach includes extending the pre-training of these models with a large corpus of user reviews to improve contextual understanding and employing instance selection techniques to optimize model fine-tuning. 
Empirical evaluations demonstrate that this method improves the precision and recall of extracted features and enhances performance efficiency. Key contributions include a novel approach to feature extraction, annotated datasets, extended pre-trained models, and an instance selection mechanism for cost-effective fine-tuning. This research provides practical methods and empirical evidence in applying large language models to natural language processing tasks within mobile app reviews, offering improved performance in feature extraction.	翻訳日:2024-08-05 14:17:04 公開日:2024-08-02
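特徴抽出を教師ありトークン分類として定式化すると、モデルが出力したトークンごとのラベルを特徴のスパンに戻す後処理が必要になる。以下はその後処理の最小限のスケッチである。BIO形式のラベル(`B-FEAT`/`I-FEAT`/`O`)はここでの仮定であり、論文が用いるラベル体系は抄録からは分からない。

```python
from typing import List, Sequence


def bio_to_spans(tokens: Sequence[str], tags: Sequence[str]) -> List[str]:
    """Turn per-token BIO labels into feature strings.

    B-FEAT opens a new feature, I-FEAT extends the open feature, O closes it.
    An I-FEAT with no open feature is treated like O.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-FEAT":
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I-FEAT" and current:
            current.append(token)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans
```

たとえば `["I", "love", "the", "dark", "mode"]` に `["O", "O", "O", "B-FEAT", "I-FEAT"]` が付与されていれば、抽出される特徴は `["dark mode"]` となる。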
# 腹腔鏡手術ビデオにおける手術器具のアモーダルセグメンテーション Amodal Segmentation for Laparoscopic Surgery Video Instruments ( http://arxiv.org/abs/2408.01067v1 ) ライセンス: Link先を確認	Ruohua Shi, Zhaochen Liu, Lingyu Duan, Tingting Jiang,	(参考訳) 手術器具のセグメンテーションは、外科医のパフォーマンス向上と患者の安全確保に不可欠である。バイナリ、セマンティック、インスタンスセグメンテーションといった従来の手法には共通の欠点があり、組織や他の器具によって隠された器具の部分を扱うことができない。こうした隠れた器具の全体を正確に予測できれば、腹腔鏡手術を大きく改善できる。手術中に重要なガイダンスを提供し、潜在的な手術ミスの分析を支援し、教育目的にも役立つからである。本稿では、医療分野における手術器具の領域にアモーダルセグメンテーションを導入する。この手法は物体の可視部分と隠蔽部分の両方を識別する。これを実現するため、新しいAmodal Instruments Segmentation(AIS)データセットを導入する。このデータセットは、2017 MICCAI EndoVis Robotic Instrument Segmentation Challengeデータセットを用い、各器具にその完全なマスクを再アノテーションして作成した。さらに、いくつかの主要なアモーダルセグメンテーション手法を評価し、この新しいデータセットのベンチマークを確立する。 Segmentation of surgical instruments is crucial for enhancing surgeon performance and ensuring patient safety. Conventional techniques such as binary, semantic, and instance segmentation share a common drawback: they do not accommodate the parts of instruments obscured by tissues or other instruments. Precisely predicting the full extent of these occluded instruments can significantly improve laparoscopic surgeries by providing critical guidance during operations and assisting in the analysis of potential surgical errors, as well as serving educational purposes. In this paper, we introduce Amodal Segmentation to the realm of surgical instruments in the medical field. This technique identifies both the visible and occluded parts of an object. To achieve this, we introduce a new Amodal Instruments Segmentation (AIS) dataset, which was developed by reannotating each instrument with its complete mask, utilizing the 2017 MICCAI EndoVis Robotic Instrument Segmentation Challenge dataset. Additionally, we evaluate several leading amodal segmentation methods to establish a benchmark for this new dataset.	翻訳日:2024-08-05 14:17:04 公開日:2024-08-02
# シリカおよび金基板上の遷移金属ダイカルコゲナイドナノアンテナと結合した単層半導体中の単一光子エミッタ Single photon emitters in monolayer semiconductors coupled to transition metal dichalcogenide nanoantennas on silica and gold substrates ( http://arxiv.org/abs/2408.01070v1 ) ライセンス: Link先を確認	Panaiot G. Zotev, Sam A. Randerson, Xuerong Hu, Yue Wang, Alexander I. Tartakovskii,	(参考訳) 遷移金属ダイカルコゲナイド(TMD)の単一光子エミッタ(SPE)は、高い単一光子純度や決定論的な位置制御など、量子情報応用に多くの利点をもたらす。下にある誘電体Mie共振器によってホスト単層に誘起されるひずみは、SPEの形成を近接場フォトニックホットスポットと同じ位置に局在させることが知られており、その光学特性をさらに制御できる。しかし、シリコンやリン化ガリウム(GaP)などナノ共振器の作製に用いられる従来の材料は、しばしば高屈折率の基板を必要とする。その結果、放出光の損失が生じ、フォトニック増強も限られる。そこで本研究では、多層TMDから作製したナノアンテナ(NA)を用いる。NAはファンデルワールス力で基板に密着するため、基板を完全に自由に選択できる。これにより、高い屈折率コントラストや高反射率の金属表面を利用できる。SiO$_2$およびAu基板上のWS$_2$ NAに転写したWSe$_2$単層においてSPEが局在して形成されることを実証し、強いフォトニック増強と単一光子の収集効率の向上を可能にした。SiO$_2$(Au)基板上のWS$_2$ NA上のSPEについて、量子効率(QE)が平均43%(7%)に達するという増強の証拠を示す。さらに、誘電体基板と金属基板の両方の利点を組み合わせ、WSe$_2$の単一光子の励起、放出、収集を最大化するよう最適化したNA形状を数値シミュレーションする。その結果、蛍光は真空と比べて4桁以上、平坦なSiO$_2$/Si表面と比べて5桁以上増強される。本研究は、様々な基板上でTMD材料のナノ共振器を用いることが、SPEの形成とフォトニック増強に有利であることを示す。 Transition metal dichalcogenide (TMD) single photon emitters (SPEs) offer numerous advantages to quantum information applications, such as high single photon purity and deterministic positioning. Strain in the host monolayer, induced by underlying dielectric Mie resonators, is known to localize their formation to positions co-located with near-field photonic hotspots, providing further control over their optical properties. However, traditional materials used for the fabrication of nanoresonators, such as silicon or gallium phosphide (GaP), often require a high refractive index substrate resulting in losses of the emitted light and limited photonic enhancement. Here, we use nanoantennas (NAs) fabricated from multilayer TMDs, which allow complete flexibility with the choice of substrate due to the adhesive van der Waals forces, enabling high refractive index contrast or the use of highly reflective metallic surfaces. 
We demonstrate the localized formation of SPEs in WSe$_2$ monolayers transferred onto WS$_2$ NAs on both SiO$_2$ and Au substrates, enabling strong photonic enhancements and increased single photon collection. We provide evidence for enhanced quantum efficiencies (QE) reaching an average value of 43% (7%) for SPEs on WS$_2$ NAs on a SiO$_2$ (Au) substrate. We further combine the advantages offered by both dielectric and metallic substrates to numerically simulate an optimized NA geometry for maximum WSe$_2$ single photon excitation, emission, and collection. Thus, the fluorescence is enhanced by over 4 orders of magnitude compared to vacuum and 5 orders of magnitude compared to a flat SiO$_2$/Si surface. Our work showcases the advantages offered by employing TMD material nanoresonators on various substrates for SPE formation and photonic enhancement.	翻訳日:2024-08-05 14:17:04 公開日:2024-08-02
# 強化学習におけるセルフプレイ手法のサーベイ A Survey on Self-play Methods in Reinforcement Learning ( http://arxiv.org/abs/2408.01072v1 ) ライセンス: Link先を確認	Ruize Zhang, Zelai Xu, Chengdong Ma, Chao Yu, Wei-Wei Tu, Shiyu Huang, Deheng Ye, Wenbo Ding, Yaodong Yang, Yu Wang,	(参考訳) セルフプレイは、エージェントが自身のコピーや過去のバージョンと相互作用することを特徴とし、近年強化学習において注目を集めている。本稿ではまず、マルチエージェント強化学習の枠組みやゲーム理論の基本概念を含む、セルフプレイの前提知識を明らかにする。次に、統一的な枠組みを示し、既存のセルフプレイアルゴリズムをこの枠組みの中で分類する。さらに、さまざまなシナリオにおけるセルフプレイの役割を示すことで、アルゴリズムとその実用上の意義との間のギャップを埋める。最後に、セルフプレイにおける未解決の課題と今後の研究の方向性を示す。本稿は、強化学習におけるセルフプレイの多面的な全体像を理解するための必須のガイドマップである。 Self-play, characterized by agents' interactions with copies or past versions of themselves, has recently gained prominence in reinforcement learning. This paper first clarifies the preliminaries of self-play, including the multi-agent reinforcement learning framework and basic game theory concepts. Then it provides a unified framework and classifies existing self-play algorithms within this framework. Moreover, the paper bridges the gap between the algorithms and their practical implications by illustrating the role of self-play in different scenarios. Finally, the survey highlights open challenges and future research directions in self-play. This paper is an essential guide map for understanding the multifaceted landscape of self-play in RL.	翻訳日:2024-08-05 14:07:18 公開日:2024-08-02
# EAP-AIAS: 学術目的の英語(EAP)向けAIアセスメント尺度の適応 The EAP-AIAS: Adapting the AI Assessment Scale for English for Academic Purposes ( http://arxiv.org/abs/2408.01075v1 ) ライセンス: Link先を確認	Jasper Roe, Mike Perkins, Yulia Tregubova,	(参考訳) 生成的人工知能(GenAI)の急速な進歩は、学術目的の英語(EAP)教育に機会と課題の両方をもたらしている。本稿では、EAPの文脈に特化したAIアセスメント尺度(AIAS)の適応版であるEAP-AIASを提案する。この枠組みの目的は、GenAIツールをEAPの評価実践に統合するための構造化されたアプローチを提供することである。その際、学術的誠実性を維持し、言語能力の発達を支援する。EAP-AIASは「No AI」から「Full AI」までの5つのレベルで構成され、各レベルはEAPの課題における適切なGenAIの利用範囲を規定する。言語学習者特有のニーズと、EAPが言語習熟度と学術文化への適応の両方に重点を置いていることを踏まえ、この適応の理論的根拠を論じる。本稿では、ライティング課題、プレゼンテーション、研究プロジェクトなど、さまざまなEAP評価形式にわたるEAP-AIASの適用可能性を検討する。EAP-AIASは柔軟な枠組みを提供することで、教育におけるGenAI統合の複雑さに対処しようとするEAP実践者を支援する。また、AIによって強化された学術的・職業的未来に向けて学生を準備させることも目指す。この適応は、言語教育における倫理的かつ教育的に健全なAI統合という差し迫った必要性に対処するための一歩である。 The rapid advancement of Generative Artificial Intelligence (GenAI) presents both opportunities and challenges for English for Academic Purposes (EAP) instruction. This paper proposes an adaptation of the AI Assessment Scale (AIAS) specifically tailored for EAP contexts, termed the EAP-AIAS. This framework aims to provide a structured approach for integrating GenAI tools into EAP assessment practices while maintaining academic integrity and supporting language development. The EAP-AIAS consists of five levels, ranging from "No AI" to "Full AI", each delineating appropriate GenAI usage in EAP tasks. We discuss the rationale behind this adaptation, considering the unique needs of language learners and the dual focus of EAP on language proficiency and academic acculturation. This paper explores potential applications of the EAP-AIAS across various EAP assessment types, including writing tasks, presentations, and research projects. By offering a flexible framework, the EAP-AIAS seeks to empower EAP practitioners seeking to deal with the complexities of GenAI integration in education and prepare students for an AI-enhanced academic and professional future. 
This adaptation represents a step towards addressing the pressing need for ethical and pedagogically sound AI integration in language education.	翻訳日:2024-08-05 14:07:18 公開日:2024-08-02
# 継続学習のための事前学習テキストエンコーダの意味的知識の活用 Exploiting the Semantic Knowledge of Pre-trained Text-Encoders for Continual Learning ( http://arxiv.org/abs/2408.01076v1 ) ライセンス: Link先を確認	Lu Yu, Zhe Tao, Hantao Yao, Joost Van de Weijer, Changsheng Xu,	(参考訳) 深層ニューラルネットワーク(DNN)は固定されたデータセットでは優れた性能を示すが、現実世界のシナリオで逐次的に追加・変化するデータには苦戦する。継続学習は、以前に学習した知識を保持しつつ新しいデータから学習できるようにすることで、この課題に取り組む。既存の手法は主に視覚的特徴に依存しており、テキストに符号化された豊かな意味情報を見落としがちである。画像のラベル情報に含まれる意味的知識は重要な意味情報を提供し、以前に獲得した意味クラスの知識と関連づけることができる。したがって、継続学習全体を通じてこの情報を効果的に活用することは有益であると期待される。そこで本研究では、テキスト埋め込みを用いて意味的類似性を捉え、タスク内およびタスク間で意味的ガイダンスを統合することを提案する。事前学習済みのCLIPモデルから出発し、2つのモジュールを用いる。Semantically-guided Representation Learning (SG-RL) モジュールは、現在のタスクの全クラスに対するソフトアサインメントを行う。Semantically-guided Knowledge Distillation (SG-KD) モジュールは、知識転移を強化する。実験結果から、一般的なデータセットと細粒度データセットの両方で本手法の優位性が示された。コードは https://github.com/aprilsveryown/semantically-guided-continual-learning で公開されている。 Deep neural networks (DNNs) excel on fixed datasets but struggle with incremental and shifting data in real-world scenarios. Continual learning addresses this challenge by allowing models to learn from new data while retaining previously learned knowledge. Existing methods mainly rely on visual features, often neglecting the rich semantic information encoded in text. The semantic knowledge available in the label information of the images offers important semantic information that can be related to previously acquired knowledge of semantic classes. Consequently, effectively leveraging this information throughout continual learning is expected to be beneficial. To address this, we propose integrating semantic guidance within and across tasks by capturing semantic similarity using text embeddings. 
We start from a pre-trained CLIP model, employ the Semantically-guided Representation Learning (SG-RL) module for a soft-assignment towards all current task classes, and use the Semantically-guided Knowledge Distillation (SG-KD) module for enhanced knowledge transfer. Experimental results demonstrate the superiority of our method on general and fine-grained datasets. Our code is available at https://github.com/aprilsveryown/semantically-guided-continual-learning.	翻訳日:2024-08-05 14:07:18 公開日:2024-08-02
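以下は、クラス名のテキスト埋め込み同士のコサイン類似度から、現在のタスクの全クラスに対するソフトな目標分布を作る一般的な方法の最小限のスケッチである。SG-RLモジュールの実際の定式化ではない。入力はクラス名のテキスト埋め込みとし、たとえばCLIPのテキストエンコーダの出力を想定している。温度パラメータ `temperature` の値は仮定である。

```python
import numpy as np


def soft_targets(class_text_emb: np.ndarray, label: int, temperature: float = 0.1) -> np.ndarray:
    """Soft label distribution over the current task's classes.

    Row i of `class_text_emb` is the text embedding of class i. The target for an
    image of class `label` puts more mass on classes whose names are semantically
    closer (higher cosine similarity) to the true class name.
    """
    E = class_text_emb / np.linalg.norm(class_text_emb, axis=1, keepdims=True)
    sims = E @ E[label]               # cosine similarity to the true class
    logits = sims / temperature
    logits = logits - logits.max()    # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()
```

`temperature` を小さくするほど分布は正解クラスに集中し、one-hotラベルに近づく。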
# PhysMamba: リモート生理計測のためのデュアルストリームクロスアテンションSSDの活用 PhysMamba: Leveraging Dual-Stream Cross-Attention SSD for Remote Physiological Measurement ( http://arxiv.org/abs/2408.01077v1 ) ライセンス: Link先を確認	Zhixin Yan, Yan Zhong, Wenjun Zhang, Lin Shu, Hongbin Xu, Wenxiong Kang,	(参考訳) リモートフォトプレチスモグラフィ(rPPG)は、顔動画から生理信号を抽出する非接触技術であり、感情モニタリング、医療支援、顔なりすまし対策などの応用に用いられる。管理された実験室環境とは異なり、実世界の環境には体動アーチファクトやノイズが含まれることが多く、既存手法の性能に影響する。そこで本研究では、Mambaに基づくデュアルストリームの時間-周波数対話型モデルPhysMambaを提案する。PhysMambaは最先端のMamba-2モデルを統合し、デュアルストリームアーキテクチャで多様なrPPG特徴を学習することで、ノイズ条件下での頑健性を高める。さらに、2つのストリーム間の情報交換と特徴の相補性を改善するため、Cross-Attention State Space Duality (CASSD) モジュールを設計した。PURE、UBFC-rPPG、MMPDを用いてPhysMambaを検証した。実験の結果、PhysMambaは様々なシナリオ、特に複雑な環境で最先端の性能を達成し、実用的な遠隔心拍モニタリングへの応用の可能性を示した。 Remote Photoplethysmography (rPPG) is a non-contact technique for extracting physiological signals from facial videos, used in applications like emotion monitoring, medical assistance, and anti-face spoofing. Unlike controlled laboratory settings, real-world environments often contain motion artifacts and noise, affecting the performance of existing methods. To address this, we propose PhysMamba, a dual-stream time-frequency interactive model based on Mamba. PhysMamba integrates the state-of-the-art Mamba-2 model and employs a dual-stream architecture to learn diverse rPPG features, enhancing robustness in noisy conditions. Additionally, we designed the Cross-Attention State Space Duality (CASSD) module to improve information exchange and feature complementarity between the two streams. We validated PhysMamba using PURE, UBFC-rPPG and MMPD. Experimental results show that PhysMamba achieves state-of-the-art performance across various scenarios, particularly in complex environments, demonstrating its potential in practical remote heart rate monitoring applications.	翻訳日:2024-08-05 14:07:18 公開日:2024-08-02
# FCDFusion: 可視画像と赤外画像のペアを融合する高速・低色偏差手法 FCDFusion: a Fast, Low Color Deviation Method for Fusing Visible and Infrared Image Pairs ( http://arxiv.org/abs/2408.01080v1 ) ライセンス: Link先を確認	Hesong Li, Ying Fu,	(参考訳) 可視・赤外画像融合(VIF)は、可視画像と赤外画像の情報を1枚の融合画像に統合することを目的とする。従来のVIF手法は通常、元の可視画像の色相と彩度を保持するために色空間変換を用いる。しかし高速なVIF手法では、この操作が計算の大部分を占め、処理の高速化を妨げるボトルネックとなっている。本稿では、色のずれが小さい高速融合手法FCDFusionを提案する。本手法はRGB色空間で直接処理することで、色空間変換なしに色情報を保持する。わずかな追加コストでガンマ補正を組み込んでおり、色とコントラストを高速に改善できる。融合過程を3次元カラーベクトルに対するスケーリング操作とみなすことで、計算を大幅に単純化した。理論解析と実験により、本手法が1画素あたりわずか7 FLOPsで満足のいく結果を得られることを示す。HSV色空間を用いる最先端の高速な色保存手法と比べ、本手法は半分の計算コストでより高いコントラストを実現する。さらに、VIF手法の色保存能力を測定する新しい指標として色偏差を提案する。これはカラー可視光画像を用いるVIFタスク向けに特別に設計されており、この目的に用いられてきた既存のVIF指標の欠点を克服する。コードは https://github.com/HeasonLee/FCDFusion で公開されている。 Visible and infrared image fusion (VIF) aims to combine information from visible and infrared images into a single fused image. Previous VIF methods usually employ a color space transformation to keep the hue and saturation from the original visible image. However, for fast VIF methods, this operation accounts for the majority of the calculation and is the bottleneck preventing faster processing. In this paper, we propose a fast fusion method, FCDFusion, with little color deviation. It preserves color information without color space transformations, by directly operating in RGB color space. It incorporates gamma correction at little extra cost, allowing color and contrast to be rapidly improved. We regard the fusion process as a scaling operation on 3D color vectors, greatly simplifying the calculations. A theoretical analysis and experiments show that our method can achieve satisfactory results in only 7 FLOPs per pixel. Compared to state-of-the-art fast, color-preserving methods using HSV color space, our method provides higher contrast at only half of the computational cost. We further propose a new metric, color deviation, to measure the ability of a VIF method to preserve color. 
It is specifically designed for VIF tasks with color visible-light images, and overcomes deficiencies of existing VIF metrics used for this purpose. Our code is available at https://github.com/HeasonLee/FCDFusion.	翻訳日:2024-08-05 14:07:18 公開日:2024-08-02
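The abstract frames fusion as scaling each pixel's 3D RGB color vector. The benefit of that framing fits in a few lines: multiplying all three channels by one scalar leaves the channel ratios unchanged, and with them hue and HSV saturation, so no color-space conversion is needed. This is a hedged sketch, not FCDFusion's formula. The brightness measure (max channel), the target brightness (average of visible brightness and infrared), and the black-pixel fallback are our assumptions, and the paper's gamma correction is omitted.

```python
# Sketch only: NOT FCDFusion's published formula. It illustrates fusing a pixel
# by scaling its visible-light RGB vector with a single per-pixel factor.

def fuse_pixel(rgb, ir, eps=1e-6):
    """Fuse one pixel.

    rgb: (r, g, b) visible-light color, channels in [0, 1]
    ir:  infrared intensity in [0, 1]
    """
    r, g, b = rgb
    v = max(r, g, b)  # HSV-style brightness (assumed brightness measure)
    if v < eps:
        # A black visible pixel carries no color to preserve: fall back to gray IR.
        return (ir, ir, ir)
    target = 0.5 * (v + ir)  # hypothetical target brightness: plain average
    k = target / v           # one scale factor per pixel
    # Multiplying all three channels by k keeps r:g:b, so hue and saturation
    # are unchanged and no RGB -> HSV -> RGB round trip is required.
    return (k * r, k * g, k * b)
```

For example, `fuse_pixel((0.2, 0.4, 0.1), 0.8)` scales by 1.5 and returns approximately (0.3, 0.6, 0.15), keeping the 2:4:1 channel ratio. Because the target never exceeds 1 and no channel exceeds `v`, the outputs stay in [0, 1] without clipping.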
# Topological Casimir effect in models with helical compact dimensions ( http://arxiv.org/abs/2408.01082v1 ) License: see link	R. M. Avagyan, A. A. Saharian, D. H. Simonyan, G. H. Harutyunyan	We investigate the influence of the helical compactification of a spatial dimension on the local properties of the vacuum state for a charged scalar field with a general curvature coupling parameter. A general background geometry is considered with rotational symmetry in the subspace with the coordinates appearing in the helical periodicity condition. It is shown that by a coordinate transformation the problem is reduced to the problem with a standard quasiperiodicity condition in the same local geometry and with the effective compactification radius determined by the length of the compact dimension and the helicity parameter. As an application of the general procedure we have considered locally de Sitter spacetime with a helical compact dimension. By using the Hadamard function for the Bunch-Davies vacuum state, the vacuum expectation values of the field squared, current density, and energy-momentum tensor are studied. The topological contributions are explicitly separated and their asymptotics are described at early and late stages of cosmological expansion. An important difference, compared to the problem with quasiperiodic conditions, is the appearance of the nonzero off-diagonal component of the energy-momentum tensor and of the component of the current density along the noncompact dimension.	Translated: 2024-08-05 14:07:18 Published: 2024-08-02
# Adaptive Contrastive Decoding in Retrieval-Augmented Generation for Handling Noisy Contexts ( http://arxiv.org/abs/2408.01084v1 ) License: see link	Youna Kim, Hyuhng Joon Kim, Cheonbok Park, Choonghyun Park, Hyunsoo Cho, Junyeob Kim, Kang Min Yoo, Sang-goo Lee, Taeuk Kim	When using large language models (LLMs) in knowledge-intensive tasks, such as open-domain question answering, external context can bridge the gap between external knowledge and the LLM's parametric knowledge. Recent research has sought to amplify contextual knowledge over the LLM's parametric knowledge with contrastive decoding approaches. While these approaches can yield truthful responses when relevant context is provided, they are vulnerable when faced with noisy contexts. We extend the scope of previous studies to encompass noisy contexts and propose adaptive contrastive decoding (ACD) to leverage contextual influence effectively. ACD demonstrates improvements in open-domain question answering tasks compared to baselines, especially in robustness, remaining undistracted by noisy contexts in retrieval-augmented generation.	Translated: 2024-08-05 14:07:18 Published: 2024-08-02
# Effect of Fog Particle Size Distribution on 3D Object Detection Under Adverse Weather Conditions ( http://arxiv.org/abs/2408.01085v1 ) License: see link	Ajinkya Shinde, Gaurav Sharma, Manisha Pattanaik, Sri Niwas Singh	LiDAR-based sensors employing optical spectrum signals play a vital role in providing significant information about the target objects in autonomous driving vehicle systems. However, the presence of fog in the atmosphere severely degrades the overall system's performance. This manuscript analyzes the role of fog particle size distributions in 3D object detection under adverse weather conditions. We utilise Mie theory and meteorological optical range (MOR) to calculate the attenuation and backscattering coefficient values for point cloud generation and analyze the overall system's accuracy in Car, Cyclist, and Pedestrian case scenarios under easy, medium and hard detection difficulties. Gamma and Junge (Power-Law) distributions are employed to mathematically model the fog particle size distribution under strong and moderate advection fog environments. Subsequently, we modified the KITTI dataset based on the backscattering coefficient values and used it to train the PV-RCNN++ deep neural network model for Car, Cyclist, and Pedestrian cases under different detection difficulties. The result analysis shows a significant variation in the system's accuracy concerning the changes in target object dimensionality, the nature of the fog environment and increasing detection difficulties, with the Car exhibiting the highest accuracy of around 99% and the Pedestrian showing the lowest accuracy of around 73%.	Translated: 2024-08-05 14:07:18 Published: 2024-08-02
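This abstract derives attenuation from the meteorological optical range (MOR) and models droplet sizes with Junge and gamma distributions. The sketch below shows only the standard closed-form relations involved, not the paper's Mie-theory computation:

- the conversion from MOR to an extinction coefficient, using the WMO 5% threshold on luminous flux;
- two-way Beer-Lambert attenuation of a LiDAR return;
- the two size-distribution shapes.

Function names and example values are illustrative assumptions. Exponent conventions for the Junge law also differ between references.

```python
import math


def extinction_from_mor(mor_m):
    """Extinction coefficient (1/m) from MOR, using the WMO 5% flux threshold:
    exp(-beta * MOR) = 0.05  =>  beta = -ln(0.05) / MOR ~= 2.996 / MOR."""
    return -math.log(0.05) / mor_m


def two_way_transmittance(beta, range_m):
    """Beer-Lambert attenuation of a LiDAR pulse travelling to range_m and back."""
    return math.exp(-2.0 * beta * range_m)


def junge(r, c, nu):
    """Junge (power-law) size distribution, written here as n(r) = c * r**(-nu)."""
    return c * r ** (-nu)


def modified_gamma(r, a, alpha, b, gamma):
    """Modified gamma size distribution n(r) = a * r**alpha * exp(-b * r**gamma)."""
    return a * r ** alpha * math.exp(-b * r ** gamma)


# Illustrative (hypothetical) scenario: 200 m visibility, target at 30 m.
beta = extinction_from_mor(200.0)
print(f"beta = {beta:.4f} 1/m, two-way transmittance = {two_way_transmittance(beta, 30.0):.3f}")
```

Backscattering, which the paper also computes, depends on the full Mie phase function and is not covered by these relations.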
# Bridging Information Gaps in Dialogues With Grounded Exchanges Using Knowledge Graphs ( http://arxiv.org/abs/2408.01088v1 ) License: see link	Phillip Schneider, Nektarios Machner, Kristiina Jokinen, Florian Matthes	Knowledge models are fundamental to dialogue systems for enabling conversational interactions, which require handling domain-specific knowledge. Ensuring effective communication in information-providing conversations entails aligning user understanding with the knowledge available to the system. However, dialogue systems often face challenges arising from semantic inconsistencies in how information is expressed in natural language compared to how it is represented within the system's internal knowledge. To address this problem, we study the potential of large language models for conversational grounding, a mechanism to bridge information gaps by establishing shared knowledge between dialogue participants. Our approach involves annotating human conversations across five knowledge domains to create a new dialogue corpus called BridgeKG. Through a series of experiments on this dataset, we empirically evaluate the capabilities of large language models in classifying grounding acts and identifying grounded information items within a knowledge graph structure. Our findings offer insights into how these models use in-context learning for conversational grounding tasks and into common prediction errors, which we illustrate with examples from challenging dialogues. We discuss how the models handle knowledge graphs as a semantic layer between unstructured dialogue utterances and structured information items.	Translated: 2024-08-05 14:07:18 Published: 2024-08-02
# Prototypical Partial Optimal Transport for Universal Domain Adaptation ( http://arxiv.org/abs/2408.01089v1 ) License: see link	Yucheng Yang, Xiang Gu, Jian Sun	Universal domain adaptation (UniDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain without requiring the same label sets for both domains. The existence of domain and category shift makes the task challenging and requires us to distinguish "known" samples (i.e., samples whose labels exist in both domains) and "unknown" samples (i.e., samples whose labels exist in only one domain) in both domains before reducing the domain gap. In this paper, we consider the problem from the point of view of distribution matching, in which the two distributions need only be partially aligned. A novel approach, dubbed mini-batch Prototypical Partial Optimal Transport (m-PPOT), is proposed to conduct partial distribution alignment for UniDA. In the training phase, besides minimizing m-PPOT, we also leverage the transport plan of m-PPOT to reweight source prototypes and target samples, and design a reweighted entropy loss and a reweighted cross-entropy loss to distinguish "known" and "unknown" samples. Experiments on four benchmarks show that our method outperforms the previous state-of-the-art UniDA methods.	Translated: 2024-08-05 14:07:18 Published: 2024-08-02
# General-purpose Dataflow Model with Neuromorphic Primitives ( http://arxiv.org/abs/2408.01090v1 ) License: see link	Weihao Zhang, Yu Du, Hongyi Li, Songchen Ma, Rong Zhao	Neuromorphic computing exhibits great potential to provide high-performance benefits in various applications beyond neural networks. However, a general-purpose program execution model that aligns with the features of neuromorphic computing is required to bridge the gap between program versatility and neuromorphic hardware efficiency. The dataflow model offers a potential solution, but it faces high graph complexity and incompatibility with neuromorphic hardware when dealing with control flow programs, which reduces programmability and performance. Here, we present a dataflow model tailored for neuromorphic hardware, called neuromorphic dataflow, which provides a compact, concise, and neuromorphic-compatible program representation for control logic. The neuromorphic dataflow introduces "when" and "where" primitives, which restructure the view of control. The neuromorphic dataflow embeds these primitives in the dataflow schema with the plasticity inherited from the spiking algorithms. Our method enables the deployment of general-purpose programs on neuromorphic hardware with both programmability and plasticity, while fully utilizing the hardware's potential.	Translated: 2024-08-05 14:07:18 Published: 2024-08-02
# Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions ( http://arxiv.org/abs/2408.01091v1 ) License: see link	Jin Gao, Lei Gan, Yuankai Li, Yixin Ye, Dequan Wang	Large multimodal models (LMMs) excel in adhering to human instructions. However, self-contradictory instructions may arise due to the increasing trend of multimodal interaction and context length, which is challenging for language beginners and vulnerable populations. We introduce the Self-Contradictory Instructions benchmark to evaluate the capability of LMMs in recognizing conflicting commands. It comprises 20,000 conflicts, evenly distributed between language and vision paradigms. It is constructed by a novel automatic dataset creation framework, which expedites the process and enables us to encompass a wide range of instruction forms. Our comprehensive evaluation reveals that current LMMs consistently struggle to identify multimodal instruction discordance due to a lack of self-awareness. Hence, we propose Cognitive Awakening Prompting to inject cognition from external sources, greatly enhancing dissonance detection. The dataset and code are available at https://selfcontradiction.github.io/.	Translated: 2024-08-05 14:07:18 Published: 2024-08-02
# Emergent non-Hermitian conservation laws at exceptional points ( http://arxiv.org/abs/2408.01092v1 ) License: see link	Zuo Wang, Liang He	Non-Hermitian systems can manifest rich static and dynamical properties at their exceptional points (EPs). Here, we identify yet another class of distinct phenomena that hinge on EPs, namely, the emergence of a series of non-Hermitian conservation laws. We demonstrate these distinct phenomena concretely in the non-Hermitian Heisenberg chain and formulate a general theory for identifying these emergent non-Hermitian conservation laws at EPs. By establishing a one-to-one correspondence between the constants of motion at EPs and those in corresponding auxiliary Hermitian systems, we trace their physical origin back to the presence of emergent symmetries in the auxiliary systems. Concrete simulations on quantum circuits show that these emergent conserved dynamics can be readily observed in current digital quantum computing systems.	Translated: 2024-08-05 14:07:18 Published: 2024-08-02
# An Encoding–Searching Separation Perspective on Bi-Encoder Neural Search ( http://arxiv.org/abs/2408.01094v1 ) License: see link	Hung-Nghiep Tran, Akiko Aizawa, Atsuhiro Takasu	This paper reviews, analyzes, and proposes a new perspective on the bi-encoder architecture for neural search. While the bi-encoder architecture is widely used due to its simplicity and scalability at test time, it has some notable issues such as low performance on seen datasets and weak zero-shot performance on new datasets. In this paper, we analyze these issues and summarize two main critiques: the encoding information bottleneck problem and limitations of the basic assumption of embedding search. We then construct a thought experiment to logically analyze the encoding and searching operations and challenge the basic assumption of embedding search. Building on these observations, we propose a new perspective on the bi-encoder architecture called the *encoding–searching separation* perspective, which conceptually and practically separates the encoding and searching operations. This new perspective is applied to explain the root cause of the identified issues and discuss ways to mitigate the problems. Finally, we discuss the implications of the ideas underlying the new perspective, the design surface that it exposes and the potential research directions arising from it.	Translated: 2024-08-05 14:07:18 Published: 2024-08-02
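The "basic assumption of embedding search" in this abstract refers to the standard bi-encoder pipeline. Queries and documents are encoded independently, and search reduces to a nearest-neighbour lookup by a similarity such as the dot product. The sketch below makes that encoding/searching split visible. `toy_encode` is a hypothetical stand-in for a trained neural encoder (hashed character trigrams), chosen only so the example runs without model weights. It is not the paper's method.

```python
import math


def toy_encode(text, dim=64):
    """Hypothetical stand-in for a trained encoder: hashed character trigrams,
    L2-normalized so that a dot product equals cosine similarity."""
    vec = [0.0] * dim
    padded = f"  {text.lower()} "
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        vec[sum(map(ord, trigram)) % dim] += 1.0  # deterministic bucket
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


# Encoding: documents are embedded once, independently of any query.
DOCS = [
    "neural search with bi-encoders",
    "cooking pasta at home",
    "dense retrieval for open-domain search",
]
INDEX = [toy_encode(d) for d in DOCS]


def search(query, index, k=2):
    """Searching: embed the query on its own, then rank documents by dot product."""
    q = toy_encode(query)
    scores = [sum(a * b for a, b in zip(q, d)) for d in index]
    return sorted(range(len(index)), key=lambda i: scores[i], reverse=True)[:k]
```

All query-document interaction happens in the single dot product. This is the bottleneck the paper's critiques target, and it is also why the document index can be precomputed.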
# Six Dragons Fly Again: Reviving 15th-Century Korean Court Music with Transformers and Novel Encoding ( http://arxiv.org/abs/2408.01096v1 ) License: see link	Danbinaerin Han, Mark Gotham, Dongmin Kim, Hannah Park, Sihun Lee, Dasaem Jeong	We introduce a project that revives a piece of 15th-century Korean court music, Chihwapyeong and Chwipunghyeong, composed upon the poem Songs of the Dragon Flying to Heaven. Preserved in one of the earliest examples of Jeongganbo, a Korean musical notation system, the surviving version consists only of a rudimentary melody. Our research team, commissioned by the National Gugak (Korean Traditional Music) Center, aimed to transform this old melody into a performable arrangement for a six-part ensemble. Using Jeongganbo data acquired through bespoke optical music recognition, we trained a BERT-like masked language model and an encoder-decoder transformer model. We also propose an encoding scheme that strictly follows the structure of Jeongganbo and denotes note durations as positions. The resulting machine-transformed versions of Chihwapyeong and Chwipunghyeong were evaluated by experts and performed by the Court Music Orchestra of the National Gugak Center. Our work demonstrates that generative models can be applied successfully to traditional music with limited training data if combined with careful design.	Translated: 2024-08-05 14:07:18 Published: 2024-08-02
# Contribution-based Low-Rank Adaptation with Pre-training Model for Real Image Restoration ( http://arxiv.org/abs/2408.01099v1 ) License: see link	Donwon Park, Hayeon Kim, Se Young Chun	Recently, pre-trained models and efficient parameter tuning have achieved remarkable success in natural language processing and high-level computer vision with the aid of masked modeling and prompt tuning. In low-level computer vision, however, there have been limited investigations of pre-trained models, and efficient fine-tuning strategies have not yet been explored, despite their importance and benefit in various real-world tasks such as alleviating the memory inflation issue when integrating new tasks on AI edge devices. Here, we propose a novel efficient parameter tuning approach dubbed contribution-based low-rank adaptation (CoLoRA) for multiple image restorations, along with an effective pre-training method with random order degradations (PROD). Unlike prior art that tunes all network parameters, our CoLoRA effectively fine-tunes a small number of parameters by leveraging LoRA (low-rank adaptation) for each new vision task, with a contribution-based method that adaptively determines the layer-by-layer capacity for that task, yielding performance comparable to full tuning. Furthermore, our PROD strategy extends the capability of pre-trained models with improved performance as well as robustness, bridging synthetic pre-training and real-world fine-tuning. Our CoLoRA with PROD has demonstrated superior performance in various image restoration tasks across diverse degradation types on both synthetic and real-world datasets for known and novel tasks.	Translated: 2024-08-05 13:57:23 Published: 2024-08-02
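CoLoRA builds on LoRA. LoRA keeps a pre-trained weight W frozen and learns a low-rank update, so the layer computes W x + (α/r)·B A x. The sketch below shows that standard LoRA layer in plain Python. B is initialized to zero, so the adapted layer starts out identical to the frozen one. The paper's contribution-based choice of rank r per layer and the PROD pre-training are not implemented here, and the class and parameter names are illustrative.

```python
import random


def matvec(m, x):
    """Dense matrix-vector product on lists of lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update B @ A (standard LoRA)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight                                   # frozen, d_out x d_in
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)]  # rank x d_in, small random
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]          # d_out x rank, zero init
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)               # frozen path
        delta = matvec(self.B, matvec(self.A, x))   # low-rank trainable path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

With rank 1 on a 3×2 weight, the layer trains 5 parameters instead of 6. The saving grows quickly at realistic layer sizes, which is what makes per-layer rank allocation worth tuning.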
# Validation of an Analysability Model in Hybrid Quantum Software ( http://arxiv.org/abs/2408.01105v1 ) License: see link	Díaz-Muñoz Ana, Cruz-Lemus José A., Rodríguez Moisés, Piattini Mario, Baldassarre Maria Teresa	In the context of quantum-classical hybrid computing, evaluating analysability, which is the ease of understanding and modifying software, presents significant challenges due to the complexity and novelty of quantum algorithms. Although advances have been made in quantum software development, standard software quality evaluation methods do not fully address the specifics of quantum components, resulting in a gap in the ability to ensure and maintain the quality of hybrid software products. In this registered report proposal, we intend to validate a quality model focused on the analysability of hybrid software by means of a controlled experiment run as an international collaboration between academic institutions in Italy and Spain. This approach allows for a more detailed analysis and validation methodology and establishes a framework for future research and developments in software quality assessment in quantum computing.	Translated: 2024-08-05 13:57:23 Published: 2024-08-02
# BioRAG: A RAG-LLM Framework for Biological Question Reasoning ( http://arxiv.org/abs/2408.01107v1 ) License: see link	Chengrui Wang, Qingqing Long, Xiao Meng, Xunxin Cai, Chengjun Wu, Zhen Meng, Xuezhi Wang, Yuanchun Zhou	Question-answering systems for life science research, a field characterized by the rapid pace of discovery, evolving insights, and complex interactions among knowledge entities, face unique challenges in maintaining a comprehensive knowledge warehouse and accurate information retrieval. To address these issues, we introduce BioRAG, a novel Retrieval-Augmented Generation (RAG) framework built on Large Language Models (LLMs). Our approach starts with parsing, indexing, and segmenting an extensive collection of 22 million scientific papers as the basic knowledge, followed by training a specialized embedding model tailored to this domain. Additionally, we enhance the vector retrieval process by incorporating a domain-specific knowledge hierarchy, which aids in modeling the intricate interrelationships between each query and its context. For queries requiring the most current information, BioRAG deconstructs the question and employs an iterative retrieval process that incorporates a search engine for step-by-step reasoning. Rigorous experiments have demonstrated that our model outperforms fine-tuned LLMs, LLMs with search engines, and other scientific RAG frameworks across multiple life science question-answering tasks.	Translated: 2024-08-05 13:57:23 Published: 2024-08-02
# Epistemic Ensembles in Semantic and Symbolic Environments (Extended Version with Proofs) ( http://arxiv.org/abs/2408.01115v1 ) License: see link	Rolf Hennicker, Alexander Knapp, Martin Wirsing	An epistemic ensemble is composed of knowledge-based agents capable of retrieving and sharing knowledge and beliefs about themselves and their peers. These agents access a global knowledge state and use actions to communicate and cooperate, altering the collective knowledge state. We study two types of mathematical semantics for epistemic ensembles based on a common syntactic operational ensemble semantics: a semantic environment defined by a class of global epistemic states, and a symbolic environment consisting of a set of epistemic formulae. For relating these environments, we use the concept of Φ-equivalence, where a class of epistemic states and a knowledge base are Φ-equivalent if any formula of Φ holds in the class of epistemic states if, and only if, it is an element of the knowledge base. Our main theorem shows that Φ-equivalent configurations simulate each other and satisfy the same dynamic epistemic ensemble formulae.	Translated: 2024-08-05 13:57:23 Published: 2024-08-02
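The Φ-equivalence condition in this abstract can be written compactly. The symbols 𝒦 (a class of epistemic states), B (a knowledge base) and ⊨ are our notation, not the paper's. Reading "holds in the class" as "holds in every state of 𝒦" is also an assumption.

```latex
\mathcal{K} \equiv_{\Phi} B
\quad\iff\quad
\forall \varphi \in \Phi:\;
\bigl(\mathcal{K} \models \varphi \iff \varphi \in B\bigr),
\qquad\text{where}\quad
\mathcal{K} \models \varphi \;:\iff\; \forall K \in \mathcal{K}:\; K \models \varphi .
```

Read this way, B is exactly the set of Φ-formulae valid across the whole class. That is what lets the main theorem transfer satisfaction of dynamic ensemble formulae between the semantic and symbolic environments.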
# IAI Group at CheckThat! 2024: Transformer Models and Data Augmentation for Checkworthy Claim Detection ( http://arxiv.org/abs/2408.01118v1 ) License: see link	Peter Røysland Aarnes, Vinay Setty, Petra Galuščáková	This paper describes the IAI group's participation in automated check-worthiness estimation for claims, within the framework of the 2024 CheckThat! Lab "Task 1: Check-Worthiness Estimation". The task involves the automated detection of check-worthy claims in English, Dutch, and Arabic political debates and Twitter data. We utilized various pre-trained generative decoder and encoder transformer models, employing methods such as few-shot chain-of-thought reasoning, fine-tuning, data augmentation, and transfer learning from one language to another. Despite variable success in terms of performance, our models achieved notable placements on the organizer's leaderboard: ninth-best in English, third-best in Dutch, and the top placement in Arabic, utilizing multilingual datasets for enhancing the generalizability of check-worthiness detection. Despite a significant drop in performance on the unlabeled test dataset compared to the development test dataset, our findings contribute to the ongoing efforts in claim detection research, highlighting the challenges and potential of language-specific adaptations in claim verification systems.	Translated: 2024-08-05 13:57:23 Published: 2024-08-02
# Task Prompt Vectors: Effective Initialization through Multi-Task Soft-Prompt Transfer ( http://arxiv.org/abs/2408.01119v1 ) License: see link	Robert Belanec, Simon Ostermann, Ivan Srba, Maria Bielikova	Prompt tuning is a modular and efficient solution for training large language models (LLMs). One of its main advantages is task modularity, making it suitable for multi-task problems. However, current soft-prompt-based methods often sacrifice multi-task modularity, requiring the training process to be fully or partially repeated for each newly added task. While recent work on task vectors applied arithmetic operations on full model weights to achieve the desired multi-task performance, a similar approach for soft-prompts is still missing. To this end, we introduce Task Prompt Vectors, created by taking the element-wise difference between the weights of tuned soft-prompts and their random initialization. Experimental results on 12 NLU datasets show that task prompt vectors can be used in low-resource settings to effectively initialize prompt tuning on similar tasks. In addition, we show that task prompt vectors are independent of the random initialization of prompt tuning. This allows prompt arithmetic with the pre-trained vectors from different tasks. In this way, by arithmetic addition of task prompt vectors from multiple tasks, we are able to outperform a state-of-the-art baseline in some cases.	Translated: 2024-08-05 13:57:23 Published: 2024-08-02
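As defined in this abstract, a task prompt vector is the element-wise difference between tuned soft-prompt weights and their random initialization, and a multi-task prompt is formed by adding such vectors. A minimal sketch, with soft prompts as plain lists of shape (prompt tokens × embedding dim). Adding the summed vectors onto an initialization is our assumption about usage, and the function names are hypothetical.

```python
def task_prompt_vector(tuned, init):
    """Element-wise difference between a tuned soft prompt and its random init."""
    return [[t - i for t, i in zip(t_row, i_row)] for t_row, i_row in zip(tuned, init)]


def combine(init, *vectors):
    """Prompt arithmetic: add one or more task prompt vectors onto an initialization."""
    out = [row[:] for row in init]  # copy so `init` is not modified
    for vec in vectors:
        for r, row in enumerate(vec):
            for c, val in enumerate(row):
                out[r][c] += val
    return out


# Toy 2-token x 2-dim soft prompts (illustrative values only).
init = [[0.0, 1.0], [2.0, 3.0]]     # random initialization
tuned_a = [[0.5, 1.0], [2.0, 2.0]]  # prompt tuned on task A
tuned_b = [[0.0, 1.5], [2.5, 3.0]]  # prompt tuned on task B
multi = combine(init, task_prompt_vector(tuned_a, init), task_prompt_vector(tuned_b, init))
print(multi)  # [[0.5, 1.5], [2.5, 2.0]]
```

Adding a single vector back onto its own initialization recovers the tuned prompt exactly. Adding several vectors merges their task-specific shifts into one starting point for further prompt tuning.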
# マルチタスク視覚グラウンドのための効率的かつ効果的なトランスフォーマーデコーダベースフレームワーク An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding ( http://arxiv.org/abs/2408.01120v1 ) ライセンス: Link先を確認	Wei Chen, Long Chen, Yu Wu,	(参考訳) ほとんどの先進的な視覚接地法は、視覚言語的特徴融合のためのトランスフォーマーに依存している。しかし、これらのトランスフォーマーベースのアプローチは、特に高解像度の画像や長い文脈文を扱う場合、トランスフォーマーエンコーダの自己保持機構により、計算コストが2次的にエスカレートするなど、大きな欠点に直面する。この2次計算負荷の増加は、長い言語表現を含む会話に基づく推論セグメンテーションのような、より複雑なシーンへの視覚的グラウンドの適用性を制限している。本稿では,トランスフォーマーデコーダをベースとした効率的なマルチタスクビジュアルグラウンドティング(EEVG)フレームワークを提案する。言語的側面では、言語的特徴がメモリとして入力され、視覚的特徴がクエリとして入力される、視覚的特徴と言語的特徴を融合するためにTransformer Decoderを使用します。これにより、融合は言語表現長と線形にスケールすることができる。視覚的側面では、注目スコアに基づく背景視覚トークンを排除し、パラメータフリーで計算を削減できる手法を導入する。次に、残りのスパース特徴写像からセグメント化マスクを直接予測するために、ライトマスクヘッドを設計する。ベンチマークの大規模な結果とアブレーション研究は、我々のアプローチの有効性と有効性を示している。コードはhttps://github.com/chenwei746/EEVGで入手できる。 Most advanced visual grounding methods rely on Transformers for visual-linguistic feature fusion. However, these Transformer-based approaches encounter a significant drawback: the computational costs escalate quadratically due to the self-attention mechanism in the Transformer Encoder, particularly when dealing with high-resolution images or long context sentences. This quadratic increase in computational burden restricts the applicability of visual grounding to more intricate scenes, such as conversation-based reasoning segmentation, which involves lengthy language expressions. In this paper, we propose an efficient and effective multi-task visual grounding (EEVG) framework based on Transformer Decoder to address this issue, which reduces the cost in both language and visual aspects. In the language aspect, we employ the Transformer Decoder to fuse visual and linguistic features, where linguistic features are input as memory and visual features as queries. This allows fusion to scale linearly with language expression length. 
In the visual aspect, we introduce a parameter-free approach to reduce computation by eliminating background visual tokens based on attention scores. We then design a light mask head to directly predict segmentation masks from the remaining sparse feature maps. Extensive results and ablation studies on benchmarks demonstrate the efficiency and effectiveness of our approach. Code is available at https://github.com/chenwei746/EEVG.	Translated: 2024-08-05 13:57:23 Published: 2024-08-02
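The two cost-saving ideas in the EEVG abstract can be illustrated in plain Python. In this sketch, visual tokens are the queries and linguistic tokens are the memory, so the work per query grows linearly with the number of language tokens. Background tokens are dropped by ranking a per-token relevance score. Several parts are assumptions for illustration only and do not come from the EEVG implementation: the scoring rule (the best scaled dot product with any language token), the identity key/value projections, and all function names.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def relevance_scores(visual: Matrix, language: Matrix) -> List[float]:
    """Hypothetical per-token score: best scaled dot product with any language token."""
    dim = len(visual[0])
    return [max(sum(a * b for a, b in zip(v, t)) for t in language) / math.sqrt(dim) for v in visual]


def prune_background(scores: List[float], keep: int) -> List[int]:
    """Parameter-free pruning: indices of the `keep` highest-scoring tokens, in original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


def cross_attention(queries: Matrix, memory: Matrix) -> Matrix:
    """Visual tokens are queries; linguistic tokens serve as keys and values (identity projections).
    Cost is O(len(queries) * len(memory) * dim), i.e. linear in the language length."""
    dim = len(queries[0])
    fused = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in memory])
        fused.append([sum(w * m[d] for w, m in zip(weights, memory)) for d in range(dim)])
    return fused


visual = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.0, 0.0]]
language = [[1.0, 0.0], [0.8, 0.2]]
kept = prune_background(relevance_scores(visual, language), keep=2)
fused = cross_attention([visual[i] for i in kept], language)
print(kept)  # [0, 2]
```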
# Being Accountable is Smart: Navigating the Technical and Regulatory Landscape of AI-based Services for Power Grid ( http://arxiv.org/abs/2408.01121v1 ) License: see link	Anna Volkova, Mahdieh Hatamian, Alina Anapyanova, Hermann de Meer,	The emergence of artificial intelligence and the digitization of the power grid have introduced numerous effective application scenarios for AI-based services for the smart grid. Nevertheless, adopting AI in critical infrastructures presents challenges due to unclear regulations and a lack of risk quantification techniques. Regulated and accountable approaches for integrating AI-based services into the smart grid could accelerate the adoption of innovative methods in daily practices and address society's general safety concerns. This paper contributes to this objective by defining accountability and highlighting its importance for AI-based services in the energy sector. It underlines the current shortcomings of the AI Act and proposes an approach to address these issues in a potential delegated act. The proposed technical approach for developing and operating accountable AI-based smart grid services allows for assessing different service life cycle phases and identifying related accountability risks.	Translated: 2024-08-05 13:57:23 Published: 2024-08-02
# CFBench: A Comprehensive Constraints-Following Benchmark for LLMs ( http://arxiv.org/abs/2408.01122v1 ) License: see link	Tao Zhang, Yanjun Shen, Wenjing Luo, Yan Zhang, Hao Liang, Tao Zhang, Fan Yang, Mingan Lin, Yujing Qiao, Weipeng Chen, Bin Cui, Wentao Zhang, Zenan Zhou,	The adeptness of Large Language Models (LLMs) in comprehending and following natural language instructions is critical for their deployment in sophisticated real-world applications. Existing evaluations mainly focus on fragmented constraints or narrow scenarios, but they overlook the comprehensiveness and authenticity of constraints from the user's perspective. To bridge this gap, we propose CFBench, a large-scale Comprehensive Constraints Following Benchmark for LLMs, featuring 1,000 curated samples that cover more than 200 real-life scenarios and over 50 NLP tasks. CFBench meticulously compiles constraints from real-world instructions and constructs an innovative systematic framework for constraint types, which includes 10 primary categories and over 25 subcategories, and ensures each constraint is seamlessly integrated within the instructions. 
To make certain that the evaluation of LLM outputs aligns with user perceptions, we propose an advanced methodology that integrates multi-dimensional assessment criteria with requirement prioritization, covering various perspectives of constraints, instructions, and requirement fulfillment. Evaluating current leading LLMs on CFBench reveals substantial room for improvement in constraint following, and we further investigate influencing factors and enhancement strategies. The data and code are publicly available at https://github.com/PKU-Baichuan-MLSystemLab/CFBench	Translated: 2024-08-05 13:57:23 Published: 2024-08-02
# Molecular influence on nuclear-quadrupole-coupling effects in laser induced alignment ( http://arxiv.org/abs/2408.01125v1 ) License: see link	Linda V. Thesing, Andrey Yachmenev, Rosario González-Férez, Jochen Küpper,	We studied the effect of nuclear-quadrupole interactions on the field-free impulsive alignment of different asymmetric-top molecules. Our analysis is focused on the influence of the hyperfine- and rotational-energy-level structures. These depend on the number of nuclear spins, the rotational constants, and the symmetry of the tensors involved in the nuclear spin and external field interactions. Comparing the prototypical large-nuclear-spin molecules iodobenzene, 1,2-diiodobenzene, 1,3-diiodobenzene, and 2,5-diiodobenzonitrile, we demonstrate that the magnitude of the hyperfine splittings compared to the rotational-energy splittings plays a crucial role in the spin-rotational dynamics after the laser pulse. Moreover, we point out that the impact of the quadrupole coupling on the rotational dynamics decreases when highly excited rotational states dominate the dynamics.	Translated: 2024-08-05 13:57:23 Published: 2024-08-02
# IG-SLAM: Instant Gaussian SLAM ( http://arxiv.org/abs/2408.01126v1 ) License: see link	Furkan Aykut Sarikamis, Abdullah Aydin Alatan,	3D Gaussian Splatting has recently shown promising results as an alternative scene representation in SLAM systems to neural implicit representations. However, current methods either lack dense depth maps to supervise the mapping process or detailed training designs that consider the scale of the environment. To address these drawbacks, we present IG-SLAM, a dense RGB-only SLAM system that employs robust Dense-SLAM methods for tracking and combines them with Gaussian Splatting. A 3D map of the environment is constructed using accurate pose and dense depth provided by tracking. Additionally, we utilize depth uncertainty in map optimization to improve 3D reconstruction. Our decay strategy in map optimization enhances convergence and allows the system to run at 10 fps in a single process. We demonstrate competitive performance with state-of-the-art RGB-only SLAM systems while achieving faster operation speeds. We present our experiments on the Replica, TUM-RGBD, ScanNet, and EuRoC datasets. The system achieves photo-realistic 3D reconstruction in large-scale sequences, particularly in the EuRoC dataset.	Translated: 2024-08-05 13:57:23 Published: 2024-08-02
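The IG-SLAM abstract mentions using depth uncertainty during map optimization. One common way to do this is to divide each pixel's depth residual by the estimated depth variance, so that unreliable depths pull less on the map. This is shown here as an assumption, not as IG-SLAM's actual loss. The function name and the `eps` parameter are hypothetical.

```python
from typing import Sequence


def uncertainty_weighted_depth_loss(rendered: Sequence[float], estimated: Sequence[float],
                                    variance: Sequence[float], eps: float = 1e-6) -> float:
    """Mean L1 depth residual, each pixel divided by its (assumed) depth variance."""
    total = 0.0
    for r, e, var in zip(rendered, estimated, variance):
        # Pixels with uncertain depth estimates (large variance) contribute less.
        total += abs(r - e) / (var + eps)
    return total / len(rendered)


# The same 1.0 depth error on pixel 2, once with a confident and once with an uncertain estimate.
confident_error = uncertainty_weighted_depth_loss([1.0, 2.0], [1.0, 3.0], [100.0, 0.01])
uncertain_error = uncertainty_weighted_depth_loss([1.0, 2.0], [1.0, 3.0], [0.01, 100.0])
```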
# A Survey of Mamba ( http://arxiv.org/abs/2408.01129v1 ) License: see link	Haohao Qu, Liangbo Ning, Rui An, Wenqi Fan, Tyler Derr, Xin Xu, Qing Li,	Deep learning, as a vital technique, has sparked a notable revolution in artificial intelligence. As the most representative architecture, Transformers have empowered numerous advanced models, especially the large language models that comprise billions of parameters, becoming a cornerstone in deep learning. Despite the impressive achievements, Transformers still face inherent limitations, particularly the time-consuming inference resulting from the quadratic computation complexity of attention calculation. Recently, a novel architecture named Mamba, drawing inspiration from classical state space models, has emerged as a promising alternative for building foundation models, delivering comparable modeling abilities to Transformers while preserving near-linear scalability with respect to sequence length. This has sparked an increasing number of studies actively exploring Mamba's potential to achieve impressive performance across diverse domains. 
Given such rapid evolution, there is a critical need for a systematic review that consolidates existing Mamba-empowered models, offering a comprehensive understanding of this emerging model architecture. In this survey, we therefore conduct an in-depth investigation of recent Mamba-associated studies, covering three main aspects: the advancements of Mamba-based models, the techniques of adapting Mamba to diverse data, and the applications where Mamba can excel. Specifically, we first recall the foundational knowledge of various representative deep learning models and the details of Mamba as preliminaries. Then, to showcase the significance of Mamba, we comprehensively review the related studies focusing on Mamba models' architecture design, data adaptability, and applications. Finally, we present a discussion of current limitations and explore various promising research directions to provide deeper insights for future investigations.	Translated: 2024-08-05 13:57:23 Published: 2024-08-02
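For readers new to state space models, the near-linear scaling mentioned in the Mamba survey comes from a recurrent scan over the sequence. The sketch below is a simplified, single-channel selective SSM that rests on several stated assumptions:

- the state matrix is diagonal;
- A is discretized with a zero-order hold;
- B uses the simplified `delta * B` discretization;
- `delta`, B, and C depend on the input and are produced by hypothetical scalar weights.

It is not the reference Mamba kernel, which uses a hardware-aware parallel scan.

```python
import math
from typing import List


def softplus(z: float) -> float:
    return math.log1p(math.exp(z))


def selective_scan(xs: List[float], a: List[float], w_delta: float,
                   w_b: List[float], w_c: List[float]) -> List[float]:
    """Single-channel selective SSM with diagonal A = diag(a), a < 0.
    A_bar = exp(delta * a) (zero-order hold); B_bar is approximated by delta * b.
    delta, b and c depend on the current input, which makes the scan 'selective'.
    Runs in O(len(xs) * len(a)) time, i.e. linear in sequence length."""
    h = [0.0] * len(a)
    ys = []
    for x in xs:
        delta = softplus(w_delta * x)          # input-dependent step size
        b = [wb * x for wb in w_b]             # input-dependent B
        c = [wc * x for wc in w_c]             # input-dependent C
        h = [math.exp(delta * ai) * hi + delta * bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


ys = selective_scan([1.0, 0.5, -0.5, 2.0], a=[-1.0, -0.5], w_delta=0.7, w_b=[1.0, 0.3], w_c=[0.2, 1.0])
print(len(ys))  # 4
```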
# The Impact of Program Reduction on Automated Program Repair ( http://arxiv.org/abs/2408.01134v1 ) License: see link	Linas Vidziunas, David Binkley, Leon Moonen,	Correcting bugs using modern Automated Program Repair (APR) can be both time-consuming and resource-expensive. We describe a program repair approach that aims to improve the scalability of modern APR tools. The approach leverages program reduction in the form of program slicing to eliminate code irrelevant to fixing the bug, which improves the APR tool's overall performance. We investigate slicing's impact on all three phases of the repair process: fault localization, patch generation, and patch validation. Our empirical exploration finds that the proposed approach, on average, enhances the repair ability of the TBar APR tool, but we also discovered a few cases where it was less successful. Specifically, on examples from the widely used Defects4J dataset, we obtain a substantial reduction in median repair time, which falls from 80 minutes to just under 18 minutes. We conclude that program reduction can improve the performance of APR without degrading repair quality, but this improvement is not universal. A replication package is available via Zenodo at https://doi.org/10.5281/zenodo.13074333. Keywords: automated program repair, dynamic program slicing, fault localization, test-suite reduction, hybrid techniques.
Translated: 2024-08-05 13:57:23 Published: 2024-08-02
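The program-reduction abstract describes removing code that is irrelevant to the bug before repair. The sketch below is a much simpler stand-in for the authors' dynamic slicing. It computes a static backward slice over straight-line code, where each statement is reduced to the variable it defines and the variables it reads. The statement encoding and all names are hypothetical. Real slicers also track control dependences, and dynamic slicing additionally works from the executed trace.

```python
from typing import List, Set, Tuple

Statement = Tuple[int, str, Set[str]]  # (line number, variable defined, variables used)


def backward_slice(program: List[Statement], criterion_vars: Set[str]) -> List[int]:
    """Lines of straight-line code that can affect `criterion_vars` at the end of the program."""
    relevant = set(criterion_vars)
    kept = []
    for line, defined, used in reversed(program):
        if defined in relevant:
            kept.append(line)
            relevant.discard(defined)  # this definition satisfies the demand...
            relevant |= used           # ...but creates demands for the variables it reads
    return sorted(kept)


program = [
    (1, "a", set()),
    (2, "b", set()),
    (3, "log", {"b"}),        # irrelevant to `result`, so it is sliced away
    (4, "c", {"a"}),
    (5, "result", {"c", "a"}),
]
print(backward_slice(program, {"result"}))  # [1, 4, 5]
```

Only the kept lines would then be handed to fault localization and patch generation, which is where the abstract reports the time savings.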
# PGNeXt: High-Resolution Salient Object Detection via Pyramid Grafting Network ( http://arxiv.org/abs/2408.01137v1 ) License: see link	Changqun Xia, Chenxi Xie, Zhentao He, Tianshu Yu, Jia Li,	We present an advanced study on more challenging high-resolution salient object detection (HRSOD) from both dataset and network framework perspectives. To compensate for the lack of HRSOD datasets, we thoughtfully collect a large-scale high-resolution salient object detection dataset, called UHRSD, containing 5,920 images from real-world complex scenarios at 4K-8K resolutions. All the images are finely annotated at the pixel level, far exceeding previous low-resolution SOD datasets. Aiming at overcoming the contradiction between the sampling depth and the receptive field size in previous methods, we propose a novel one-stage framework for the HRSOD task using a pyramid grafting mechanism. In general, transformer-based and CNN-based backbones are adopted to extract features from images of different resolutions independently, and these features are then grafted from the transformer branch to the CNN branch. 
An attention-based Cross-Model Grafting Module (CMGM) is proposed to enable the CNN branch to combine broken detailed information more holistically, guided by features from different sources during the decoding process. Moreover, we design an Attention Guided Loss (AGL) to explicitly supervise the attention matrix generated by CMGM to help the network better interact with the attention from different branches. Comprehensive experiments on UHRSD and widely used SOD datasets demonstrate that our method can simultaneously locate salient objects and preserve rich details, outperforming state-of-the-art methods. To verify the generalization ability of the proposed framework, we apply it to the camouflaged object detection (COD) task. Notably, our method outperforms most state-of-the-art COD methods without bells and whistles.	Translated: 2024-08-05 13:57:23 Published: 2024-08-02
# Interpreting Global Perturbation Robustness of Image Models using Axiomatic Spectral Importance Decomposition ( http://arxiv.org/abs/2408.01139v1 ) License: see link	Róisín Luo, James McDermott, Colm O'Riordan,	Perturbation robustness evaluates the vulnerabilities of models arising from a variety of perturbations, such as data corruptions and adversarial attacks. Understanding the mechanisms of perturbation robustness is critical for global interpretability. We present a model-agnostic, global mechanistic interpretability method to interpret the perturbation robustness of image models. This research is motivated by two key aspects. First, previous global interpretability works, in tandem with robustness benchmarks, e.g. mean corruption error (mCE), are not designed to directly interpret the mechanisms of perturbation robustness within image models. Second, we notice that the spectral signal-to-noise ratios (SNR) of perturbed natural images exponentially decay over the frequency. This power-law-like decay implies that low-frequency signals are generally more robust than high-frequency signals, yet high classification accuracy cannot be achieved by low-frequency signals alone. 
By applying Shapley value theory, our method axiomatically quantifies the predictive powers of robust features and non-robust features within an information theory framework. Our method, dubbed I-ASIDE (Image Axiomatic Spectral Importance Decomposition Explanation), provides a unique insight into model robustness mechanisms. We conduct extensive experiments over a variety of vision models pre-trained on ImageNet to show that I-ASIDE can not only measure the perturbation robustness but also provide interpretations of its mechanisms.	Translated: 2024-08-05 13:57:23 Published: 2024-08-02
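I-ASIDE is described as applying Shapley value theory to spectral features. The sketch below is a generic illustration, not the authors' code. It computes exact Shapley values for a handful of players, which here stand for frequency bands, given a characteristic function `value(S)`. The toy function (additive band contributions plus an interaction bonus) is hypothetical. In practice, `value(S)` would come from evaluating a model on images that keep only the bands in `S`; that masking step is omitted.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List


def shapley_values(n_players: int, value: Callable[[FrozenSet[int]], float]) -> List[float]:
    """Exact Shapley values: phi_i = sum_S |S|!(n-|S|-1)!/n! * (v(S + {i}) - v(S))."""
    phi = []
    for i in range(n_players):
        others = [p for p in range(n_players) if p != i]
        total = 0.0
        for size in range(len(others) + 1):
            weight = factorial(size) * factorial(n_players - size - 1) / factorial(n_players)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


CONTRIB = [0.5, 0.3, 0.1]  # hypothetical contributions of a low, mid and high frequency band


def toy_value(bands: FrozenSet[int]) -> float:
    """Hypothetical characteristic function with a bonus when bands 0 and 1 are both kept."""
    bonus = 0.2 if {0, 1} <= bands else 0.0
    return sum(CONTRIB[b] for b in bands) + bonus


phi = shapley_values(3, toy_value)
print([round(p, 6) for p in phi])  # [0.6, 0.4, 0.1]
```

The interaction bonus is split equally between bands 0 and 1, and the values sum to `v(all) - v(empty)`. These are the symmetry and efficiency axioms that make Shapley attributions "axiomatic".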
# Machine learning topological energy braiding of non-Bloch bands ( http://arxiv.org/abs/2408.01141v1 ) License: see link	Shuwei Shi, Shibing Chu, Yuee Xie, Yuanping Chen,	Machine learning has been used to identify phase transitions in a variety of physical systems. However, there is still a lack of relevant research on non-Bloch energy braiding in non-Hermitian systems. In this work, we study non-Bloch energy braiding in one-dimensional non-Hermitian systems using unsupervised and supervised methods. In unsupervised learning, we use diffusion maps to successfully identify non-Bloch energy braiding without any prior knowledge and combine them with k-means to cluster different topological elements, such as the Unlink and the Hopf link. In supervised learning, we train a Convolutional Neural Network (CNN) on Bloch energy data to predict not only Bloch energy braiding but also non-Bloch energy braiding with an accuracy approaching 100%. By analysing the CNN, we can ascertain that the network has successfully acquired the ability to recognise the braiding topology of the energy bands. The present study demonstrates the considerable potential of machine learning in the identification of non-Hermitian topological phases and energy braiding.	Translated: 2024-08-05 13:47:29 Published: 2024-08-02
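The unsupervised pipeline in the energy-braiding abstract (diffusion maps followed by k-means) can be sketched with NumPy. This is a generic implementation that makes three assumptions:

- a Gaussian kernel with a hand-picked bandwidth `eps`;
- the trivial eigenvector is discarded;
- k-means is plain Lloyd's algorithm with deterministic farthest-point initialization.

The inputs are toy 2D clusters, not the Bloch or non-Bloch energy data used in the paper.

```python
import numpy as np


def diffusion_map(x: np.ndarray, eps: float, n_components: int = 2, t: int = 1) -> np.ndarray:
    """Diffusion-map embedding of the rows of x (Gaussian kernel, Markov normalization)."""
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    kernel = np.exp(-sq_dists / eps)
    degree = kernel.sum(axis=1)
    # Symmetric conjugate of the Markov matrix D^-1 K, so that eigh can be used.
    sym = kernel / np.sqrt(np.outer(degree, degree))
    eigvals, eigvecs = np.linalg.eigh(sym)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    psi = eigvecs / np.sqrt(degree)[:, None]  # right eigenvectors of D^-1 K
    # Drop the trivial constant eigenvector (eigenvalue 1).
    return psi[:, 1:n_components + 1] * eigvals[1:n_components + 1] ** t


def kmeans(points: np.ndarray, k: int, iters: int = 50) -> np.ndarray:
    """Lloyd's algorithm with farthest-point initialization (deterministic)."""
    centers = [points[0]]
    for _ in range(1, k):
        dists = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        centers = np.array([points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels


rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)), rng.normal(5.0, 0.3, size=(20, 2))])
labels = kmeans(diffusion_map(x, eps=10.0), k=2)
```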
# Enhanced Prediction of Ventilator-Associated Pneumonia in Patients with Traumatic Brain Injury Using Advanced Machine Learning Techniques ( http://arxiv.org/abs/2408.01144v1 ) License: see link	Negin Ashrafi, Armin Abdollahi, Maryam Pishgar,	Background: Ventilator-associated pneumonia (VAP) in traumatic brain injury (TBI) patients poses a significant mortality risk and imposes a considerable financial burden on patients and healthcare systems. Timely detection and prognostication of VAP in TBI patients are crucial to improve patient outcomes and alleviate the strain on healthcare resources. Methods: We implemented six machine learning models using the MIMIC-III database. Our methodology included preprocessing steps, such as feature selection with CatBoost and expert opinion, addressing class imbalance with the Synthetic Minority Oversampling Technique (SMOTE), and rigorous model tuning through 5-fold cross-validation to optimize hyperparameters. Key models evaluated included SVM, Logistic Regression, Random Forest, XGBoost, ANN, and AdaBoost. Additionally, we conducted SHAP analysis to determine feature importance and performed an ablation study to assess feature impacts on model performance. 
Results: XGBoost outperformed the baseline models and the best existing literature. We used metrics including AUC, Accuracy, Specificity, Sensitivity, F1 Score, PPV, and NPV. XGBoost demonstrated the highest performance with an AUC of 0.940 and an Accuracy of 0.875, which are 23.4 and 23.5 percentage points higher than the best results in the existing literature (an AUC of 0.706 and an Accuracy of 0.640, respectively). This enhanced performance underscores the models' effectiveness in clinical settings. Conclusions: This study enhances the predictive modeling of VAP in TBI patients, improving early detection and intervention potential. Refined feature selection and advanced ensemble techniques significantly boosted model accuracy and reliability, offering promising directions for future clinical applications and medical diagnostics research.	Translated: 2024-08-05 13:47:29 Published: 2024-08-02
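SMOTE, used in the VAP study to address class imbalance, creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The minimal pure-Python sketch below follows that textbook recipe. The parameter names and toy data are assumptions. In a cross-validation setup, oversampling is usually applied to the training folds only, so that synthetic points do not leak into evaluation.

```python
import random
from typing import List, Sequence

Point = List[float]


def smote(minority: Sequence[Point], n_new: int, k: int = 3, seed: int = 0) -> List[Point]:
    """Synthetic Minority Oversampling: place each new point at a random position on the
    segment between a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)

    def sq_dist(p: Point, q: Point) -> float:
        return sum((a - b) ** 2 for a, b in zip(p, q))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base), key=lambda p: sq_dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new = smote(minority, n_new=5, k=2)
```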
# Accurate Analytic Model for the Energy Spectrum of the Anharmonic Oscillator ( http://arxiv.org/abs/2408.01146v1 ) License: see link	Michel Caffarel,	In a recent work we have derived an analytic expression for the partition function of the quartic oscillator using a path integral formalism. Quite remarkably, the free energy was found to be accurate to a few percent over the entire range of temperatures and quartic coupling constant. In addition, the key features of the exact partition function were successfully reproduced. Accurate analytic expressions for the ground- and first-excited state energies as functions of $g$ were derived. In this work, we extend our results to the calculation of the full energy spectrum. We also generalize our study of the quartic oscillator to the case of the anharmonic oscillator with sextic and octic couplings. The energy levels found are accurate for all couplings and principal quantum numbers considered here (up to $n=8$), confirming this model partition function as a good and faithful approximation of the exact one.	Translated: 2024-08-05 13:47:29 Published: 2024-08-02
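Analytic approximations like the one in the anharmonic-oscillator abstract are usually checked against numerically exact levels. A minimal way to obtain such reference values is to diagonalize a finite-difference Hamiltonian on a grid. Three choices here are assumptions: the convention H = -1/2 d²/dx² + x²/2 + g x⁴ (with ħ = m = ω = 1), the box size, and the grid size. The paper's normalization of the coupling may differ. Sextic and octic couplings only require changing the `potential` line.

```python
import numpy as np


def anharmonic_levels(g: float, n_levels: int = 4, length: float = 10.0, n_grid: int = 1000) -> np.ndarray:
    """Lowest eigenvalues of H = -1/2 d^2/dx^2 + x^2/2 + g x^4 (hbar = m = omega = 1)
    from a second-order finite-difference Hamiltonian on [-length, length]."""
    x, dx = np.linspace(-length, length, n_grid, retstep=True)
    potential = 0.5 * x**2 + g * x**4
    main = 1.0 / dx**2 + potential             # diagonal: kinetic 2/(2 dx^2) plus potential
    off = -0.5 / dx**2 * np.ones(n_grid - 1)   # nearest-neighbour kinetic coupling
    hamiltonian = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(hamiltonian)[:n_levels]   # eigvalsh returns ascending eigenvalues


print(anharmonic_levels(0.0))  # close to [0.5, 1.5, 2.5, 3.5] (harmonic limit)
print(anharmonic_levels(1.0))  # reference levels for the quartic coupling g = 1
```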
# Information transfer by entangled photons without auxiliary non-quantum channel ( http://arxiv.org/abs/2408.01150v1 ) License: see link	Levente Szabó, Pál Maák,	In this paper we present a theoretical analysis of the possibility of faster-than-light communication based on entangled photons. We analyze designs that may be capable of solving the problem of direct information transfer between the members of an entangled photon pair. We consider that experimental verification could confirm or even refute this. Our hypothesis was that most proofs of the no-communication theorem are based on a certain set of conditions, and that it is possible to provide a broader set of conditions that allow the establishment of entangled states as quantum information channels without using a classical channel. One basic unit of the proposed design transforms the polarization state of one member of an entangled photon pair into a spatial superposition state. Thus, after the polarization measurement performed on one member, which eliminates the entanglement, the quantum information is maintained in the spatial superposition state of the other member. This can be recovered by a particular measurement based on spatial interference. We have shown that solutions with so-called symmetric functions lead to average results that correspond to the no-communication theorem. 
However, using asymmetric functions, the averaged measurement results calculated in a prescribed time window can distinguish the types of measurements performed on the other member of the pair. This can establish a communication code that enables faster-than-light information sharing under specific conditions. There may also be further theoretical consequences: a significant extension of the quantum mechanical nonlocality principle.	Translated: 2024-08-05 13:47:29 Published: 2024-08-02
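For reference, the standard no-communication result that the entangled-photon abstract positions itself against can be checked in a few lines for an idealized polarization Bell pair. The sketch averages Bob's conditional states over Alice's outcomes for a projective measurement in a rotated linear basis. For every angle the result is the maximally mixed state, so Bob's local statistics do not depend on Alice's choice of basis. It covers only ideal projective measurements on (|00> + |11>)/sqrt(2). It does not model the spatial-superposition scheme or the asymmetric functions proposed in the paper.

```python
import math
from typing import List

Matrix2 = List[List[float]]


def bob_average_state(theta: float) -> Matrix2:
    """Bob's outcome-averaged density matrix after Alice measures her photon of the
    Bell pair (|00> + |11>)/sqrt(2) in a real linear basis rotated by theta."""
    amp = [[1 / math.sqrt(2), 0.0], [0.0, 1 / math.sqrt(2)]]  # amp[a][b]: Alice index a, Bob index b
    basis = [[math.cos(theta), math.sin(theta)], [-math.sin(theta), math.cos(theta)]]
    rho = [[0.0, 0.0], [0.0, 0.0]]
    for u in basis:
        # Bob's unnormalized conditional state; its squared norm is the outcome probability,
        # so summing the outer products gives the probability-weighted average state.
        phi = [sum(u[a] * amp[a][b] for a in range(2)) for b in range(2)]
        for i in range(2):
            for j in range(2):
                rho[i][j] += phi[i] * phi[j]
    return rho


print(bob_average_state(0.3))  # approximately [[0.5, 0.0], [0.0, 0.5]] for any angle
```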
# DERA: Dense Entity Retrieval for Entity Alignment in Knowledge Graphs ( http://arxiv.org/abs/2408.01154v1 ) License: see link	Zhichun Wang, Xuan Chen,	Entity Alignment (EA) aims to match equivalent entities in different Knowledge Graphs (KGs), which is essential for knowledge fusion and integration. Recently, embedding-based EA has attracted significant attention and many approaches have been proposed. Early approaches primarily focus on learning entity embeddings from the structural features of KGs, defined by relation triples. Later methods incorporated entities' names and attributes as auxiliary information to enhance embeddings for EA. However, these approaches often used different techniques to encode structural and attribute information, limiting their interaction and mutual enhancement. In this work, we propose a dense entity retrieval framework for EA, leveraging language models to uniformly encode various features of entities and facilitate nearest entity search across KGs. Alignment candidates are first generated through entity retrieval, and are subsequently reranked to determine the final alignments. We conduct comprehensive experiments on both cross-lingual and monolingual EA datasets, demonstrating that our approach achieves state-of-the-art performance compared to existing EA methods.	Translated: 2024-08-05 13:47:29 Published: 2024-08-02
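The retrieval step in DERA amounts to nearest-neighbour search over entity embeddings. A minimal sketch follows. It assumes the embeddings already exist (in DERA, a language model produces them from entity features) and uses cosine similarity with exhaustive search. The reranking stage is not shown, and the entity names and vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def retrieve_candidates(query: Sequence[float], targets: Dict[str, Sequence[float]],
                        k: int) -> List[Tuple[str, float]]:
    """Rank target-KG entities by cosine similarity to one source-KG entity embedding."""
    scored = [(name, cosine(query, emb)) for name, emb in targets.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


source_entity = [0.9, 0.1, 0.0]  # hypothetical embedding of an entity in KG 1
kg2 = {"Paris_FR": [1.0, 0.0, 0.0], "Lyon": [0.5, 0.5, 0.0], "Berlin": [0.0, 0.0, 1.0]}
candidates = retrieve_candidates(source_entity, kg2, k=2)
print([name for name, _ in candidates])  # ['Paris_FR', 'Lyon']
```

At realistic KG sizes, an approximate nearest-neighbour index would usually replace the exhaustive loop. The top-k list is what a reranker would then rescore.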
# Efficient conversion from fermionic Gaussian states to matrix product states ( http://arxiv.org/abs/2408.01155v1 ) License: see link	Tong Liu, Ying-Hai Wu, Hong-Hao Tu, Tao Xiang,	Fermionic Gaussian states are eigenstates of quadratic Hamiltonians and are widely used in quantum many-body problems. We propose a highly efficient algorithm that converts fermionic Gaussian states to matrix product states. It can be formulated for finite-size systems without translation invariance, but becomes particularly appealing when applied to infinite systems with translation invariance. If the ground states of a topologically ordered system on infinite cylinders are expressed as matrix product states, then the fixed points of the transfer matrix can be harnessed to filter out the anyon eigenbasis, also known as minimally entangled states. This allows for efficient computation of universal properties such as the entanglement spectrum and modular matrices. The potential of our method is demonstrated by numerical calculations in two chiral spin liquids that have the same topological orders as the bosonic Laughlin and Moore-Read states, respectively. The anyon eigenbasis for the first one has been worked out before and serves as a useful benchmark. The anyon eigenbasis of the second one is, however, not transparent, and its successful construction provides a nontrivial corroboration of our method.	Translated: 2024-08-05 13:47:29 Published: 2024-08-02
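A number-conserving fermionic Gaussian state is fully specified by its correlation matrix C_ij = <c_i† c_j>, which makes that matrix a natural input for any such conversion. The sketch below builds the matrix for the ground state of an open tight-binding chain (an assumed example system). It then computes block entanglement entropies from the eigenvalues of the restricted matrix, the standard correlation-matrix method associated with Peschel. It does not implement the paper's conversion to matrix product states.

```python
import numpy as np


def ground_state_correlations(n_sites: int, n_particles: int) -> np.ndarray:
    """C_ij = <c_i^dagger c_j> for the ground state of the open chain
    H = -sum_i (c_i^dagger c_{i+1} + h.c.), a fermionic Gaussian state."""
    h = -(np.eye(n_sites, k=1) + np.eye(n_sites, k=-1))
    _, modes = np.linalg.eigh(h)          # single-particle orbitals, ascending energy
    occupied = modes[:, :n_particles]     # fill the lowest orbitals
    return occupied @ occupied.T          # orbitals are real, so no conjugation is needed


def block_entropy(corr: np.ndarray, block: int) -> float:
    """Von Neumann entropy of the first `block` sites from the restricted correlation matrix."""
    nu = np.linalg.eigvalsh(corr[:block, :block])
    nu = np.clip(nu, 1e-12, 1 - 1e-12)    # avoid log(0) for fully occupied or empty modes
    return float(-np.sum(nu * np.log(nu) + (1 - nu) * np.log(1 - nu)))


corr = ground_state_correlations(n_sites=20, n_particles=10)
print(block_entropy(corr, 10))  # half-chain entanglement entropy of the half-filled chain
```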
# TCR-GPT: Integrating Autoregressive Model and Reinforcement Learning for T-Cell Receptor Repertoires Generation ( http://arxiv.org/abs/2408.01156v1 ) License: see link	Yicheng Lin, Dandan Zhang, Yun Liu,	T-cell receptors (TCRs) play a crucial role in the immune system by recognizing and binding to specific antigens presented by infected or cancerous cells. Understanding the sequence patterns of TCRs is essential for developing targeted immune therapies and designing effective vaccines. Language models, such as auto-regressive transformers, offer a powerful solution to this problem by learning the probability distributions of TCR repertoires, enabling the generation of new TCR sequences that inherit the underlying patterns of the repertoire. We introduce TCR-GPT, a probabilistic model built on a decoder-only transformer architecture, designed to uncover and replicate sequence patterns in TCR repertoires. TCR-GPT demonstrates an accuracy of 0.953 in inferring sequence probability distributions, measured by the Pearson correlation coefficient. Furthermore, by leveraging Reinforcement Learning (RL), we adapted the distribution of TCR sequences to generate TCRs capable of recognizing specific peptides, offering significant potential for advancing targeted immune therapies and vaccine development. 
With the efficacy of RL, fine-tuned pretrained TCR-GPT models demonstrated the ability to produce TCR repertoires likely to bind specific peptides, illustrating RL's efficiency in enhancing the model's adaptability to the probability distributions of biologically relevant TCR sequences.	翻訳日:2024-08-05 13:47:29 公開日:2024-08-02
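The 0.953 figure above is a Pearson correlation between the probabilities the model infers and reference probabilities. Below is a minimal sketch of that metric. It assumes the comparison is done per sequence; the abstract does not say whether raw or log probabilities are used. The five values in the example are hypothetical.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


# Hypothetical log10 probabilities of five TCR sequences: reference vs. model estimate.
reference = [-8.1, -9.4, -7.2, -10.3, -8.8]
model = [-8.0, -9.6, -7.5, -10.0, -8.7]
print(round(pearson_r(reference, model), 3))
```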
# アトラクションフィールドを用いたボリューム医用画像におけるロバストな曲線検出 Robust Curve Detection in Volumetric Medical Imaging via Attraction Field ( http://arxiv.org/abs/2408.01159v1 ) ライセンス: Link先を確認	Farukh Yaushev, Daria Nogina, Valentin Samokhin, Mariya Dugova, Ekaterina Petrash, Dmitry Sevryukov, Mikhail Belyaev, Maxim Pisov,	(参考訳) 身体部位の幾何形状を理解することは、正確な診断に不可欠である。曲線は解剖学的構造を効果的に記述し、心血管疾患、呼吸器疾患、骨格疾患に関連する医用画像の応用で広く用いられている。従来の曲線検出手法はタスク固有であることが多く、ドメイン固有の特徴に大きく依存するため、適用範囲が限られる。本稿では、対象の向き、形状、位置に関する事前知識を必要としない、分岐のない曲線を検出する新しい手法を提案する。本手法はニューラルネットワークを用いて、(1)サブピクセル精度を与えるアトラクションフィールドと、(2)関心領域を限定し、目的の曲線から遠い外れ値を実質的に除去するクローズネスマップを予測する。形態の異なる臨床的に重要な複数のタスクで曲線検出器を評価し、既存手法を上回るサブピクセルレベルの精度を達成して、その汎用性と頑健性を示した。さらに、この分野の発展を支援するため、将来の研究のベンチマークとなりうる大動脈中心線とマスクの独自アノテーションを公開する。データセットはhttps://github.com/neuro-ml/curve-detectionで見ることができる。 Understanding body part geometry is crucial for precise medical diagnostics. Curves effectively describe anatomical structures and are widely used in medical imaging applications related to cardiovascular, respiratory, and skeletal diseases. Traditional curve detection methods are often task-specific, relying heavily on domain-specific features, limiting their broader applicability. This paper introduces a novel approach for detecting non-branching curves, which does not require prior knowledge of the object's orientation, shape, or position. Our method uses neural networks to predict (1) an attraction field, which offers subpixel accuracy, and (2) a closeness map, which limits the region of interest and essentially eliminates outliers far from the desired curve. We tested our curve detector on several clinically relevant tasks with diverse morphologies and achieved impressive subpixel-level accuracy results that surpass existing methods, highlighting its versatility and robustness. Additionally, to support further advancements in this field, we provide our private annotations of aortic centerlines and masks, which can serve as a benchmark for future research. 
The dataset can be found at https://github.com/neuro-ml/curve-detection.	翻訳日:2024-08-05 13:47:29 公開日:2024-08-02
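An attraction field assigns each voxel a vector pointing to the nearest point on the curve. Adding that vector to the voxel's coordinates therefore gives a sub-voxel location on the curve. The sketch below computes such ground-truth targets for a curve stored as a 3-D polyline. The paper instead predicts these quantities with a neural network. The closeness score here is a hypothetical choice (a Gaussian of the distance with width `sigma`) and is not the authors' definition.

```python
import math

Point = tuple[float, float, float]


def closest_point_on_segment(p: Point, a: Point, b: Point) -> Point:
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    denom = sum(c * c for c in ab)
    # Project p onto the segment's line and clamp the parameter to [0, 1].
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(ap[i] * ab[i] for i in range(3)) / denom))
    return tuple(a[i] + t * ab[i] for i in range(3))


def attraction_vector(p: Point, polyline: list[Point]) -> Point:
    """Vector from p to the nearest point on the polyline (p + vector lies on the curve)."""
    if len(polyline) < 2:
        raise ValueError("polyline needs at least two vertices")
    best = min(
        (closest_point_on_segment(p, a, b) for a, b in zip(polyline, polyline[1:])),
        key=lambda q: sum((q[i] - p[i]) ** 2 for i in range(3)),
    )
    return tuple(best[i] - p[i] for i in range(3))


def closeness(p: Point, polyline: list[Point], sigma: float = 2.0) -> float:
    """Hypothetical closeness score in (0, 1]: Gaussian of the distance to the curve."""
    d2 = sum(c * c for c in attraction_vector(p, polyline))
    return math.exp(-d2 / (2 * sigma ** 2))


curve = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (10.0, 10.0, 0.0)]
voxel = (3.4, 2.0, 0.0)
v = attraction_vector(voxel, curve)
print([voxel[i] + v[i] for i in range(3)])  # sub-voxel point on the curve
print(closeness(voxel, curve))
```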
# PreMix: バッチ内スライドミキシングを用いた事前学習によるデジタル病理学における複数インスタンス学習の強化 PreMix: Boosting Multiple Instance Learning in Digital Histopathology through Pre-training with Intra-Batch Slide Mixing ( http://arxiv.org/abs/2408.01162v1 ) ライセンス: Link先を確認	Bryan Wong, Mun Yong Yi,	(参考訳) 高解像度スキャナで取得された組織スライドのデジタル表現であるギガピクセルサイズの全スライド画像(WSI)の分類は、きめ細かいラベル付けが綿密で時間のかかる作業であることに起因する大きな課題に直面している。弱教師あり複数インスタンス学習(MIL)が有望なアプローチとして登場したが、現在のMIL手法は、ラベルのないWSIに埋め込まれた豊富な情報を活用する能力が限られている。この制限により、特徴抽出の後にMIL特徴アグリゲータをスクラッチから訓練する必要が生じることが多く、効率と精度が損なわれる。PreMixは、バッチ内スライドミキシング手法でMILアグリゲータを事前学習することで、一般的なMILフレームワークを拡張する。具体的には、PreMixは事前学習中にBarlow Twins Slide Mixingを取り入れ、多様なWSIサイズを扱う能力を高めるとともに、ラベルなしWSIの有用性を最大化する。ファインチューニング中にMixupとManifold Mixupを組み合わせることで、PreMixはCamelyon16データセットにおいて、ベースラインのMILフレームワークである階層的画像ピラミッドTransformer(HIPT)に対し平均4.7%の性能向上を達成した。さまざまな能動学習の獲得関数とWSIラベル付き訓練予算にわたって改善が見られたことは、本フレームワークが多様なデータセットやさまざまなリソース制約に適応できることを示している。最終的にPreMixは、WSIラベル付きデータが限られた状況で、より効率的で正確なWSI分類への道を開き、病理組織学研究におけるラベルなしWSIデータの幅広い活用を促す。コードはhttps://anonymous.4open.science/r/PreMixで公開されている。 The classification of gigapixel-sized whole slide images (WSIs), digital representations of histological slides obtained via a high-resolution scanner, faces significant challenges associated with the meticulous and time-consuming nature of fine-grained labeling. While weakly-supervised multiple instance learning (MIL) has emerged as a promising approach, current MIL methods are constrained by their limited ability to leverage the wealth of information embedded within unlabeled WSIs. This limitation often necessitates training MIL feature aggregators from scratch after the feature extraction process, hindering efficiency and accuracy. PreMix extends the general MIL framework by pre-training the MIL aggregator with an intra-batch slide mixing approach. Specifically, PreMix incorporates Barlow Twins Slide Mixing during pre-training, enhancing its ability to handle diverse WSI sizes and maximizing the utility of unlabeled WSIs. 
Combined with Mixup and Manifold Mixup during fine-tuning, PreMix achieves a mean of 4.7% performance improvement over the baseline MIL framework, the hierarchical image pyramid transformer (HIPT), on the Camelyon16 dataset. The observed improvement across a range of active learning acquisition functions and WSI-labeled training budgets highlights the framework's adaptability to diverse datasets and varying resource constraints. Ultimately, PreMix paves the way for more efficient and accurate WSI classification under limited WSI-labeled datasets, encouraging the broader adoption of unlabeled WSI data in histopathological research. The code is available at https://anonymous.4open.science/r/PreMix	翻訳日:2024-08-05 13:47:29 公開日:2024-08-02
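PreMix applies Mixup and Manifold Mixup during fine-tuning. The sketch below shows standard input-space Mixup: a convex combination of two examples and their labels, with mixing weight λ drawn from Beta(α, α). Manifold Mixup applies the same interpolation to hidden representations instead of inputs. This is not PreMix's Barlow Twins Slide Mixing. The value `alpha=0.4` and the slide-level embeddings in the example are hypothetical.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.4, rng=random):
    """Standard Mixup: interpolate two examples and their (one-hot or soft) labels
    with a weight lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Hypothetical slide-level embeddings with one-hot labels (normal, tumor).
x, y, lam = mixup([0.2, 1.5, -0.3], [1.0, 0.0], [0.9, -0.4, 0.1], [0.0, 1.0], rng=random.Random(0))
print(lam, x, y)
```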
# ドメイン適応強化サーチライト:視覚知覚から心的イメージへの脳デコーディングの実現 Domain Adaptation-Enhanced Searchlight: Enabling brain decoding from visual perception to mental imagery ( http://arxiv.org/abs/2408.01163v1 ) ライセンス: Link先を確認	Alexander Olza, David Soto, Roberto Santana,	(参考訳) 認知神経科学やブレイン・コンピュータ・インタフェースの研究では、想像された刺激を正確に予測することが重要である。本研究では、18人の被験者のfMRIスキャンから得た主に視覚データを用いて、心的イメージの予測を向上させるうえでのドメイン適応(DA)の有効性を検討する。まず、14の脳領域のデータを用いて、視覚刺激で学習したベースラインモデルにより想像された刺激を予測する。次に、イメージ予測を改善する複数のモデルを構築し、さまざまなDA手法を比較する。結果から、DAは特にRegular Transfer手法でイメージ予測を大きく向上させることが示された。続いて、Regular Transferを用いたDA強化サーチライト分析を行い、さらに置換に基づく統計検定によって、イメージの復号精度が被験者全体で一貫してチャンスレベルを上回る脳領域を特定する。我々のDA強化サーチライトは、視覚野や前頭頭頂皮質を含む広範に分布した脳領域でイメージ内容を予測し、標準的なクロスドメイン分類法を上回る。本論文のコードとデータはすべて、研究コミュニティが利用できるよう公開されている。 In cognitive neuroscience and brain-computer interface research, accurately predicting imagined stimuli is crucial. This study investigates the effectiveness of Domain Adaptation (DA) in enhancing imagery prediction using primarily visual data from fMRI scans of 18 subjects. Initially, we train a baseline model on visual stimuli to predict imagined stimuli, utilizing data from 14 brain regions. We then develop several models to improve imagery prediction, comparing different DA methods. Our results demonstrate that DA significantly enhances imagery prediction, especially with the Regular Transfer approach. We then conduct a DA-enhanced searchlight analysis using Regular Transfer, followed by permutation-based statistical tests to identify brain regions where imagery decoding is consistently above chance across subjects. Our DA-enhanced searchlight predicts imagery contents in a highly distributed set of brain regions, including the visual cortex and the frontoparietal cortex, thereby outperforming standard cross-domain classification methods. The complete code and data for this paper have been made openly available for the use of the scientific community.	翻訳日:2024-08-05 13:47:29 公開日:2024-08-02
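The searchlight analysis above uses permutation-based tests to find regions where imagery decoding is consistently above chance across subjects. One common group-level form is a sign-flip permutation test, sketched below. It assumes one decoding accuracy per subject and chance = 0.5. The abstract does not specify the exact permutation scheme, so this generic version should not be read as the authors' procedure.

```python
import random


def sign_flip_pvalue(accuracies, chance=0.5, n_perm=10000, seed=0):
    """One-sided group-level sign-flip test of mean(accuracy - chance) > 0.

    Under the null hypothesis each subject's (accuracy - chance) is symmetric
    around zero, so randomly flipping signs generates the null distribution.
    """
    diffs = [a - chance for a in accuracies]
    observed = sum(diffs) / len(diffs)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if sum(flipped) / len(flipped) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one permutation, so p is never 0.
    return (exceed + 1) / (n_perm + 1)


# Hypothetical per-subject accuracies for one searchlight sphere (18 subjects).
print(sign_flip_pvalue([0.6] * 18, n_perm=2000))
```

In a full searchlight analysis this test would run once per sphere, followed by a multiple-comparison correction.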
# 複雑な生体系における物理的に実現された分類器による多用途ナノダイヤモンドの識別的アドレッシング Discriminative Addressing of Versatile Nanodiamonds via Physically-Enabled Classifier in Complex Bio-Systems ( http://arxiv.org/abs/2408.01164v1 ) ライセンス: Link先を確認	Yayin Tan, Xiaolu Wang, Feng Xu, Xinhao Hu, Yuan Lin, Bo Gao, Zhiqin Chu,	(参考訳) 窒素空孔(NV)センターは、ナノスケールのバイオセンシングやバイオイメージングに大きな可能性を示す。しかし、想定される生体応用では、細胞や組織における避けられない光散乱と自家蛍光に起因する本質的な背景ノイズが問題となる。そこで本研究では、物理的に実現された分類器による新しい全光変調イメージング法を開発し、背景ノイズを効果的に除去しながら、ピクセル解像度でNV蛍光にオンデマンドかつ直接アクセスできるようにした。具体的には、NV蛍光を光学的に変調して正弦波状の変化を示させることができ、これが分類の基礎となる。本手法を、細胞から生物個体まで、蛍光干渉を伴うさまざまな複雑な生物学的シナリオで検証した。特に、分類に基づく本アプローチは、神経タンパク質イメージングにおいて蛍光ナノダイヤモンド(FND)の信号対背景比(SBR)を約10^6倍向上させる。また、染色細胞内のFNDの光検出磁気共鳴(ODMR)測定において、4倍のコントラスト改善を示す。本手法は、ノイズの多い困難なシナリオにおける現実的な高忠実度イメージングおよびセンシングに適用可能な、汎用的で説明可能かつ堅牢なソリューションを提供する。 Nitrogen-vacancy (NV) centers show great potential for nanoscale bio-sensing and bio-imaging. Nevertheless, their envisioned bio-applications suffer from intrinsic background noise due to unavoidable light scattering and autofluorescence in cells and tissues. Herein, we develop a novel all-optical modulated imaging method via a physically-enabled classifier, for on-demand and direct access to NV fluorescence at pixel resolution while effectively filtering out background noise. Specifically, NV fluorescence can be modulated optically to exhibit sinusoid-like variations, providing a basis for classification. We validate our method in various complex biological scenarios with fluorescence interference, ranging from cells to organisms. Notably, our classification-based approach achieves almost 10^6 times enhancement of signal-to-background ratio (SBR) for fluorescent nanodiamonds (FNDs) in neural protein imaging. We also demonstrate 4-fold contrast improvement in optically-detected magnetic resonance measurements (ODMR) of FNDs inside stained cells. Our technique offers a generic, explainable and robust solution, applicable for realistic high-fidelity imaging and sensing in challenging noise-laden scenarios.	
翻訳日:2024-08-05 13:47:29 公開日:2024-08-02
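The method above rests on one contrast: NV fluorescence can be optically modulated into sinusoid-like variations, whereas scattering and autofluorescence cannot. The sketch below illustrates that idea with a lock-in-style stand-in. It is not the authors' physically-enabled classifier. It assumes a known modulation frequency, uniform sampling over an integer number of periods, and a hypothetical amplitude threshold. Each pixel is labelled by the amplitude of its component at the modulation frequency.

```python
import math


def modulation_amplitude(trace, freq, dt):
    """Amplitude of the component of `trace` oscillating at `freq` (lock-in style).

    Assumes uniform sampling with step `dt` over an integer number of periods.
    """
    n = len(trace)
    mean = sum(trace) / n
    in_phase = sum((s - mean) * math.cos(2 * math.pi * freq * k * dt) for k, s in enumerate(trace))
    quadrature = sum((s - mean) * math.sin(2 * math.pi * freq * k * dt) for k, s in enumerate(trace))
    return 2.0 * math.hypot(in_phase, quadrature) / n


def classify_pixels(traces, freq, dt, threshold):
    """Label a pixel as NV fluorescence if it carries enough modulated signal."""
    return [modulation_amplitude(t, freq, dt) > threshold for t in traces]


# Hypothetical traces: a modulated NV pixel and a brighter but unmodulated background pixel.
dt, freq, n = 0.01, 1.0, 100
nv_pixel = [100.0 + 20.0 * math.sin(2 * math.pi * freq * k * dt) for k in range(n)]
background = [150.0] * n
print(classify_pixels([nv_pixel, background], freq, dt, threshold=5.0))
```

The background pixel is brighter on average but is rejected, because only the modulated component counts.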
# 連続時間ニューラルネットワークはランダムなスパイク列を安定して記憶できる Continuous-Time Neural Networks Can Stably Memorize Random Spike Trains ( http://arxiv.org/abs/2408.01166v1 ) ライセンス: Link先を確認	Hugo Aguettaz, Hans-Andrea Loeliger,	(参考訳) 本稿では、連続時間リカレントニューラルネットワークが正確なタイミングのスパイクパターンを記憶・想起する能力を検討する。(数値実験により)これが実際に可能であることを示す。すなわち、あるパラメータ範囲において、(ネットワーク内の全ニューロンに対する)スパイク列の任意のランダムなスコアを頑健に記憶し、すべてのスパイクの相対タイミングを安定かつ正確に保ったまま、1に近い確率で自律的に再生できる。また、ノイズのある条件下での連想想起も示す。これらの実験では、必要なシナプス重みをオフラインで計算し、時間的安定性を促すテンプレートを満たすようにしている。 The paper explores the capability of continuous-time recurrent neural networks to store and recall precisely timed spike patterns. We show (by numerical experiments) that this is indeed possible: within some range of parameters, any random score of spike trains (for all neurons in the network) can be robustly memorized and autonomously reproduced with stable accurate relative timing of all spikes, with probability close to one. We also demonstrate associative recall under noisy conditions. In these experiments, the required synaptic weights are computed offline, to satisfy a template that encourages temporal stability.	翻訳日:2024-08-05 13:47:29 公開日:2024-08-02
# 全スライド画像分類のための複数インスタンス学習における事前学習済み特徴抽出器選択の再考 Rethinking Pre-trained Feature Extractor Selection in Multiple Instance Learning for Whole Slide Image Classification ( http://arxiv.org/abs/2408.01167v1 ) ライセンス: Link先を確認	Bryan Wong, Mun Yong Yi,	(参考訳) 複数インスタンス学習(MIL)は、パッチラベルのアノテーションを必要とせずにギガピクセルの全スライド画像(WSI)を分類する手法として好まれている。現在のMIL研究の焦点は埋め込みベースのMILアプローチであり、事前学習済みの特徴抽出器を用いてパッチから特徴ベクトルを抽出する。これらの特徴ベクトルはMILアグリゲータに入力され、スライドレベルの予測が行われる。ImageNet-1Kで事前学習された最も一般的なResNet50教師ありモデルを改良する先行研究の提案はあるものの、WSIの性能を最大化する最適な特徴抽出器の選び方について明確な指針はない。本研究では、事前学習データセット、バックボーンモデル、事前学習手法の3つの観点からMILの特徴抽出器を検証し、このギャップに取り組む。4つのSOTA MILモデルを用いて、2つの公開WSIデータセット(TCGA-NSCLCとCamelyon16)で大規模な実験を行った。主な知見は以下のとおりである。1) CNNとTransformerのいずれのバックボーンでも、より大規模で多様な事前学習データセットにより性能が大幅に向上する。2) 「最新かつより深い」バックボーンは「標準的な」バックボーン(ResNetとViT)を大きく上回り、性能向上はTransformerベースのバックボーンでより確実である。3) 自己教師あり学習(SSL)手法の選択は極めて重要であり、Transformer(ViT)バックボーンに適用した場合に最も大きな効果が見られる。本研究の知見は、より効果的な病理基盤モデルの設計など、実践的な意義を持つ。コードはhttps://anonymous.4open.science/r/MIL-Feature-Extractor-Selectionで公開されている。 Multiple instance learning (MIL) has become a preferred method for classifying gigapixel whole slide images (WSIs), without requiring patch label annotation. The focus of the current MIL research stream is on the embedding-based MIL approach, which involves extracting feature vectors from patches using a pre-trained feature extractor. These feature vectors are then fed into an MIL aggregator for slide-level prediction. Despite prior research suggestions on enhancing the most commonly used ResNet50 supervised model pre-trained on ImageNet-1K, there remains a lack of clear guidance on selecting the optimal feature extractor to maximize WSI performance. This study aims at addressing this gap by examining MIL feature extractors across three dimensions: pre-training dataset, backbone model, and pre-training method. 
Extensive experiments were carried out on the two public WSI datasets (TCGA-NSCLC and Camelyon16) using four SOTA MIL models. The main findings indicate the following: 1) Performance significantly improves with larger and more varied pre-training datasets in both CNN and Transformer backbones. 2) `Modern and deeper' backbones greatly outperform `standard' backbones (ResNet and ViT), with performance improvements more guaranteed in Transformer-based backbones. 3) The choice of self-supervised learning (SSL) method is crucial, with the most significant benefits observed when applied to the Transformer (ViT) backbone. The study findings have practical implications, including designing more effective pathological foundation models. Our code is available at: https://anonymous.4open.science/r/MIL-Feature-Extractor-Selection	翻訳日:2024-08-05 13:47:29 公開日:2024-08-02
# LLMを誤解させる: 脆弱性、課題、機会 Misinforming LLMs: vulnerabilities, challenges and opportunities ( http://arxiv.org/abs/2408.01168v1 ) ライセンス: Link先を確認	Bo Zhou, Daniel Geißler, Paul Lukowicz,	(参考訳) 大規模言語モデル(LLM)は自然言語処理において大きな進歩を遂げているが、その基盤となるメカニズムはしばしば誤解されている。一貫性のある回答や見かけ上の推論行動を示すにもかかわらず、LLMは真の認知過程ではなく、単語埋め込みにおける統計的パターンに依存している。これが「幻覚」や誤情報といった脆弱性につながる。本論文は、現在のLLMアーキテクチャは単語埋め込みベクトルの系列パターンの相関に依存しているため、本質的に信頼できないと論じる。しかし、生成的なTransformerベースのモデルを事実ベースや論理プログラミング言語と組み合わせる進行中の研究は、与えられた真実に基づいて文を生成し、自らの推論過程を説明できる信頼できるLLMの開発につながる可能性がある。 Large Language Models (LLMs) have made significant advances in natural language processing, but their underlying mechanisms are often misunderstood. Despite exhibiting coherent answers and apparent reasoning behaviors, LLMs rely on statistical patterns in word embeddings rather than true cognitive processes. This leads to vulnerabilities such as "hallucination" and misinformation. The paper argues that current LLM architectures are inherently untrustworthy due to their reliance on correlations of sequential patterns of word embedding vectors. However, ongoing research into combining generative transformer-based models with fact bases and logic programming languages may lead to the development of trustworthy LLMs capable of generating statements based on given truth and explaining their self-reasoning process.	翻訳日:2024-08-05 13:47:29 公開日:2024-08-02
# 産業用サイバーフィジカルシステムにおける生成AI駆動デジタルツインのための持続可能な拡散ベースのインセンティブメカニズム Sustainable Diffusion-based Incentive Mechanism for Generative AI-driven Digital Twins in Industrial Cyber-Physical Systems ( http://arxiv.org/abs/2408.01173v1 ) ライセンス: Link先を確認	Jinbo Wen, Jiawen Kang, Dusit Niyato, Yang Zhang, Shiwen Mao,	(参考訳) 産業用サイバーフィジカルシステム(ICPS)は、現代の製造業と産業に不可欠な構成要素である。製品ライフサイクル全体にわたってデータをデジタル化することで、ICPSにおけるデジタルツイン(DT)は、現在の産業インフラからインテリジェントで適応的なインフラへの移行を可能にする。データ処理能力により、生成人工知能(GAI)はDTの構築と更新を推進して予測精度を高め、多様なスマート製造に備えることができる。しかし、センシングを行う産業用IoT(IIoT)デバイスを活用してDT構築のためのデータを共有する仕組みは、逆選択問題の影響を受けやすい。本稿ではまず、ICPSのためのGAI駆動DTアーキテクチャを開発する。情報の非対称性に起因する逆選択問題に対処するため、契約理論モデルを提案し、持続可能な拡散ベースのソフトアクター・クリティックアルゴリズムを開発して、最適な実行可能契約を特定する。具体的には、動的構造化プルーニング技術を利用してアクターネットワークのパラメータ数を削減し、提案アルゴリズムの持続可能性と効率的な実装を実現する。最後に、数値結果により提案手法の有効性を示す。 Industrial Cyber-Physical Systems (ICPSs) are an integral component of modern manufacturing and industries. By digitizing data throughout the product life cycle, Digital Twins (DTs) in ICPSs enable a shift from current industrial infrastructures to intelligent and adaptive infrastructures. Thanks to its data processing capability, Generative Artificial Intelligence (GAI) can drive the construction and update of DTs to improve predictive accuracy and prepare for diverse smart manufacturing. However, mechanisms that leverage sensing Industrial Internet of Things (IIoT) devices to share data for the construction of DTs are susceptible to adverse selection problems. In this paper, we first develop a GAI-driven DT architecture for ICPSs. To address the adverse selection problem caused by information asymmetry, we propose a contract theory model and develop the sustainable diffusion-based soft actor-critic algorithm to identify the optimal feasible contract. Specifically, we leverage the dynamic structured pruning technique to reduce parameter numbers of actor networks, allowing sustainability and efficient implementation of the proposed algorithm. 
Finally, numerical results demonstrate the effectiveness of the proposed scheme.	翻訳日:2024-08-05 13:47:29 公開日:2024-08-02
# 量子ネットワークにおける分数状態転送による決定論的多者間エンタングルメント Deterministic multipartite entanglement via fractional state transfer across quantum networks ( http://arxiv.org/abs/2408.01177v1 ) ライセンス: Link先を確認	G. F. Peñas, J. -J. García-Ripoll, R. Puebla,	(参考訳) 分散量子アーキテクチャにおける異なるノード間のエンタングルメント生成は、さまざまな応用において極めて重要な役割を果たす。特に、真の多者間エンタングル状態を準備する、決定論的で堅牢かつ高速なプロトコルが強く求められている。本稿では、エミッタの励起が量子通信チャネルを通じて部分的に伝送され、空間的に離れたノードで吸収される分数量子状態転送を提案する。このプロトコルは波束整形に基づいており、2つの量子レジスタ間のベル状態、および一般に$N$量子ビットの$W$状態を、ネットワークのトポロジーに応じて逐次的または同時に、高速かつ決定論的に生成できる。詳細な数値シミュレーションにより、真の多者間エンタングル状態が現在の実験プラットフォームで高忠実度に準備できることを示し、ネットワークトポロジーに応じた主なデコヒーレンス源である量子ビットの位相緩和とエネルギー緩和の役割について議論する。 The generation of entanglement across different nodes in distributed quantum architectures plays a pivotal role for different applications. In particular, deterministic, robust, and fast protocols that prepare genuine multipartite entangled states are highly desirable. In this article, we propose a fractional quantum state transfer, in which the excitation of an emitter is partially transmitted through the quantum communication channel and then absorbed at a spatially separated node. This protocol is based on wavepacket shaping allowing for a fast deterministic generation of Bell states among two quantum registers and $W$ states for a general setting of $N$ qubits, either in a sequential or simultaneous fashion, depending on the topology of the network. By means of detailed numerical simulations, we show that genuine multipartite entangled states can be faithfully prepared within current experimental platforms and discuss the role of the main decoherence sources, qubit dephasing and relaxation, depending on the network topology.	翻訳日:2024-08-05 13:37:26 公開日:2024-08-02
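For the sequential $W$-state case above, the populations can be tracked with simple bookkeeping. The sketch assumes an idealized, lossless setting. It ignores phases, wavepacket shapes and the dephasing and relaxation discussed in the paper, and it is not the paper's protocol. Suppose the emitter starts with the excitation and hands a fraction of its remaining population to each of the other N − 1 nodes in turn. Every node then ends with population 1/N, which are the magnitudes of the $W$-state amplitudes, only if step k transfers the fraction 1/(N − k + 1). For N = 2 this reduces to a 50/50 split, the Bell-state case.

```python
from fractions import Fraction


def transfer_fractions(n_qubits: int) -> list[Fraction]:
    """Fraction of the emitter's remaining population sent at steps k = 1 .. n-1."""
    return [Fraction(1, n_qubits - k + 1) for k in range(1, n_qubits)]


def final_populations(n_qubits: int) -> list[Fraction]:
    """Populations of the receiving nodes (in transfer order), then the emitter, last."""
    remaining = Fraction(1)
    populations = []
    for frac in transfer_fractions(n_qubits):
        sent = remaining * frac
        populations.append(sent)
        remaining -= sent
    populations.append(remaining)  # whatever the emitter keeps after the last step
    return populations


print(transfer_fractions(4))  # [Fraction(1, 4), Fraction(1, 3), Fraction(1, 2)]
print(final_populations(4))   # four entries, each Fraction(1, 4)
```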
# EmoBack:感情韻律を用いた話者識別に対するバックドア攻撃 EmoBack: Backdoor Attacks Against Speaker Identification Using Emotional Prosody ( http://arxiv.org/abs/2408.01178v1 ) ライセンス: Link先を確認	Coen Schoof, Stefanos Koffas, Mauro Conti, Stjepan Picek,	(参考訳) 話者識別(SI)は、発話に基づいて話者の身元を判定する。先行研究は、SIのディープニューラルネットワーク(DNN)がバックドア攻撃に対して脆弱であることを示している。バックドア攻撃は、DNNの訓練データに隠れたトリガーを埋め込み、推論時にそのトリガーが存在するとDNNに誤った出力を生成させる。本研究は、話者の感情的韻律を用いたバックドア攻撃に対するSI DNNの脆弱性を探究した初めての研究であり、動的で目立たないトリガーを実現する。このような攻撃は、法科学、認証、監視において現実的な影響を及ぼしうる。3つの異なるデータセットとDNNアーキテクチャを用いてパラメータ研究を行い、バックドアトリガーとしての感情がSIシステムの精度に与える影響を調べた。さらに、プルーニング、STRIP-ViTA、および3つの一般的な前処理手法(量子化、メディアンフィルタリング、スクイージング)といった防御を適用して、攻撃の頑健性を検証した。その結果、前述のモデルは本攻撃を受けやすく、感情的トリガー(悲しみおよび中立の韻律)がSIシステムの完全性を損なうために有効に利用できることが示された。しかし、プルーニング実験の結果は、本攻撃に対してモデルを強化する解決策の可能性を示唆しており、攻撃成功率は40%まで低下した。 Speaker identification (SI) determines a speaker's identity based on their spoken utterances. Previous work indicates that SI deep neural networks (DNNs) are vulnerable to backdoor attacks. Backdoor attacks involve embedding hidden triggers in DNNs' training data, causing the DNN to produce incorrect output when these triggers are present during inference. This is the first work that explores SI DNNs' vulnerability to backdoor attacks using speakers' emotional prosody, resulting in dynamic, inconspicuous triggers. Such an attack could have real-world implications in forensics, authentication, and surveillance. We conducted a parameter study using three different datasets and DNN architectures to determine the impact of emotions as backdoor triggers on the accuracy of SI systems. Additionally, we have explored the robustness of our attacks by applying defenses like pruning, STRIP-ViTA, and three popular preprocessing techniques: quantization, median filtering, and squeezing. Our findings show that the aforementioned models are prone to our attack, indicating that emotional triggers (sad and neutral prosody) can be effectively used to compromise the integrity of SI systems. 
However, the results of our pruning experiments suggest potential solutions for reinforcing the models against our attacks, decreasing the attack success rate up to 40%.	翻訳日:2024-08-05 13:37:26 公開日:2024-08-02
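Two of the preprocessing defenses named above, median filtering and quantization, can be illustrated as generic 1-D signal operations. The sketch below does that. The kernel size and bit depth are hypothetical and are not the authors' settings. Samples are assumed to lie in [-1, 1]. Squeezing is not shown.

```python
def median_filter(signal, kernel_size=3):
    """Sliding-window median with edge replication; removes isolated spikes."""
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    half = kernel_size // 2
    n = len(signal)
    out = []
    for i in range(n):
        # Indices outside [0, n-1] are clamped, i.e. the edge samples are repeated.
        window = sorted(signal[min(max(j, 0), n - 1)] for j in range(i - half, i + half + 1))
        out.append(window[half])
    return out


def quantize(signal, bits=8):
    """Clip samples to [-1, 1], then round them to a grid of step 1 / 2**(bits - 1)."""
    levels = 2 ** (bits - 1)
    return [round(max(-1.0, min(1.0, s)) * levels) / levels for s in signal]


print(median_filter([0.0, 0.0, 1.0, 0.0, 0.0]))  # [0.0, 0.0, 0.0, 0.0, 0.0]
print(quantize([0.5, 0.123456], bits=4))         # [0.5, 0.125]
```

Both operations discard fine-grained detail that a trigger might depend on. The emotional-prosody triggers studied here change the utterance as a whole, which suggests why such preprocessing may be a weaker defense against them.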
# Nested Music Transformer:シンボリック音楽とオーディオ生成における複合トークンの逐次デコード Nested Music Transformer: Sequentially Decoding Compound Tokens in Symbolic Music and Audio Generation ( http://arxiv.org/abs/2408.01180v1 ) ライセンス: Link先を確認	Jiwoo Ryu, Hao-Wen Dong, Jongmin Jung, Dasaem Jeong,	(参考訳) 各トークンが異なる音楽的特徴や属性を表す複数のサブトークンから構成される複合トークンでシンボリック音楽を表現すると、系列長を短縮できる利点がある。音楽系列モデリングにおける複合トークンの有効性は先行研究で検証されているが、すべてのサブトークンを同時に予測すると、それらの相互依存性を十分に捉えられず、最適ではない結果につながる可能性がある。本研究では、平坦化したトークンの処理と同様に複合トークンを自己回帰的にデコードしつつ、メモリ使用量を抑えるよう設計されたアーキテクチャであるNested Music Transformer(NMT)を提案する。NMTは、複合トークンの系列をモデル化するメインデコーダと、各複合トークンのサブトークンをモデル化するサブデコーダの2つのTransformerから構成される。実験の結果、複合トークンにNMTを適用することで、さまざまなシンボリック音楽データセットやMAESTROデータセットの離散音声トークンの処理において、パープレキシティの点で性能が向上することが示された。 Representing symbolic music with compound tokens, where each token consists of several different sub-tokens representing a distinct musical feature or attribute, offers the advantage of reducing sequence length. While previous research has validated the efficacy of compound tokens in music sequence modeling, predicting all sub-tokens simultaneously can lead to suboptimal results as it may not fully capture the interdependencies between them. We introduce the Nested Music Transformer (NMT), an architecture tailored for decoding compound tokens autoregressively, similar to processing flattened tokens, but with low memory usage. The NMT consists of two transformers: the main decoder that models a sequence of compound tokens and the sub-decoder for modeling sub-tokens of each compound token. The experiment results showed that applying the NMT to compound tokens can enhance the performance in terms of better perplexity in processing various symbolic music datasets and discrete audio tokens from the MAESTRO dataset.	翻訳日:2024-08-05 13:37:26 公開日:2024-08-02
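The NMT described above uses two decoders. The main decoder steps once per compound token. The sub-decoder then emits that token's sub-tokens one at a time, each conditioned on the sub-tokens already produced. This sequential step is what simultaneous prediction of all sub-tokens lacks. The sketch below shows only this decoding order, with greedy choices. The field names and both rule-based stand-ins for the Transformers are hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical sub-token fields of one compound token.
FIELDS = ("beat", "pitch", "duration")


def decode_compound_tokens(
    n_tokens: int,
    main_decoder: Callable[[Sequence[tuple]], object],
    sub_decoder: Callable[[object, str, dict], int],
) -> list[tuple]:
    """Greedy nested decoding: one main-decoder step per compound token, then one
    sub-decoder step per field, each conditioned on the fields decoded so far."""
    sequence: list[tuple] = []
    for _ in range(n_tokens):
        context = main_decoder(sequence)  # summary of all previous compound tokens
        partial: dict = {}
        for field in FIELDS:
            partial[field] = sub_decoder(context, field, dict(partial))
        sequence.append(tuple(partial[f] for f in FIELDS))
    return sequence


# Toy rule-based stand-ins for the two Transformers (hypothetical).
def toy_main_decoder(prefix):
    return len(prefix)


def toy_sub_decoder(context, field, partial):
    if field == "beat":
        return context % 4
    if field == "pitch":
        return 60 + partial["beat"]  # depends on the sub-token decoded just before
    return 1 if partial["pitch"] % 2 == 0 else 2


tokens = decode_compound_tokens(3, toy_main_decoder, toy_sub_decoder)
print(tokens)  # [(0, 60, 1), (1, 61, 2), (2, 62, 1)]
print(len(tokens), "main-decoder steps vs", len(tokens) * len(FIELDS), "flattened steps")
```

The main decoder runs over 3 positions rather than 9, which reflects the sequence-length saving of compound tokens.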
# VAR-CLIP:視覚的自己回帰モデリングによるテキスト画像生成器 VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling ( http://arxiv.org/abs/2408.01181v1 ) ライセンス: Link先を確認	Qian Zhang, Xiangzi Dai, Ninghua Yang, Xiang An, Ziyong Feng, Xingyu Ren,	(参考訳) VARは、「次トークン予測」ではなく「次スケール予測」を用いる新しい生成パラダイムである。この革新的な変換により、自己回帰(AR)Transformerは視覚分布を迅速に学習し、頑健な汎化を実現できる。しかし、オリジナルのVARモデルはクラス条件付き合成に制約されており、ガイダンスにテキストキャプションのみに依存している。本稿では、Visual Auto-Regressive技術とCLIPの機能を統合した新しいテキスト画像生成モデルVAR-CLIPを紹介する。VAR-CLIPフレームワークはキャプションをテキスト埋め込みにエンコードし、それを画像生成のテキスト条件として利用する。ImageNetのような大規模データセットでの学習を容易にするため、BLIP2を活用して大規模な画像テキストデータセットを構築した。さらに、キャプションによるガイダンスを目的として、CLIPにおける単語の位置の重要性を掘り下げた。大規模な実験により、VAR-CLIPが高い忠実度、テキストとの整合性、優れた美的品質を備えた幻想的な画像の生成に優れていることが確認された。プロジェクトページはhttps://github.com/daixiangzi/VAR-CLIPである。 VAR is a new generation paradigm that employs 'next-scale prediction' as opposed to 'next-token prediction'. This innovative transformation enables auto-regressive (AR) transformers to rapidly learn visual distributions and achieve robust generalization. However, the original VAR model is constrained to class-conditioned synthesis, relying solely on textual captions for guidance. In this paper, we introduce VAR-CLIP, a novel text-to-image model that integrates Visual Auto-Regressive techniques with the capabilities of CLIP. The VAR-CLIP framework encodes captions into text embeddings, which are then utilized as textual conditions for image generation. To facilitate training on extensive datasets, such as ImageNet, we have constructed a substantial image-text dataset leveraging BLIP2. Furthermore, we delve into the significance of word positioning within CLIP for the purpose of caption guidance. Extensive experiments confirm VAR-CLIP's proficiency in generating fantasy images with high fidelity, textual congruence, and aesthetic excellence. Our project page is https://github.com/daixiangzi/VAR-CLIP	翻訳日:2024-08-05 13:37:26 公開日:2024-08-02
# 強化学習におけるメタヒューリスティック戦略を用いた変分量子回路の最適化 Optimizing Variational Quantum Circuits Using Metaheuristic Strategies in Reinforcement Learning ( http://arxiv.org/abs/2408.01187v1 ) ライセンス: Link先を確認	Michael Kölle, Daniel Seidl, Maximilian Zorn, Philipp Altmann, Jonas Stein, Thomas Gabor,	(参考訳) 量子強化学習(QRL)は、特定のシナリオにおいてコンパクトな状態空間表現や高速な収束など、古典的な強化学習に対する潜在的な利点を提供する。しかし、実用上の利点にはさらなる検証が必要である。QRLは平坦な解ランドスケープのような課題に直面しており、そこでは従来の勾配ベースの手法が非効率であるため、勾配を用いないアルゴリズムが必要となる。本研究では、メタヒューリスティックアルゴリズム(粒子群最適化、アントコロニー最適化、タブー探索、遺伝的アルゴリズム、シミュレーテッドアニーリング、ハーモニーサーチ)のQRLへの統合を検討する。これらのアルゴリズムはパラメータ最適化に柔軟性と効率性をもたらす。$5\times5$のMiniGrid強化学習環境での評価では、すべてのアルゴリズムが最適に近い結果を得ており、特にシミュレーテッドアニーリングと粒子群最適化が最も良い性能を示した。カートポール環境では、シミュレーテッドアニーリング、遺伝的アルゴリズム、粒子群最適化が最適な結果を得た一方、他のアルゴリズムはランダムな行動選択よりわずかに良い程度であった。これらの結果は、効率的なQRL学習における粒子群最適化とシミュレーテッドアニーリングの可能性を示すとともに、アルゴリズムの慎重な選択と適応の必要性を強調している。 Quantum Reinforcement Learning (QRL) offers potential advantages over classical Reinforcement Learning, such as compact state space representation and faster convergence in certain scenarios. However, practical benefits require further validation. QRL faces challenges like flat solution landscapes, where traditional gradient-based methods are inefficient, necessitating the use of gradient-free algorithms. This work explores the integration of metaheuristic algorithms -- Particle Swarm Optimization, Ant Colony Optimization, Tabu Search, Genetic Algorithm, Simulated Annealing, and Harmony Search -- into QRL. These algorithms provide flexibility and efficiency in parameter optimization. Evaluations in $5\times5$ MiniGrid Reinforcement Learning environments show that all algorithms yield near-optimal results, with Simulated Annealing and Particle Swarm Optimization performing best. In the Cart Pole environment, Simulated Annealing, Genetic Algorithms, and Particle Swarm Optimization achieve optimal results, while the others perform slightly better than random action selection. 
These findings demonstrate the potential of Particle Swarm Optimization and Simulated Annealing for efficient QRL learning, emphasizing the need for careful algorithm selection and adaptation.	翻訳日:2024-08-05 13:37:26 公開日:2024-08-02
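Simulated annealing, one of the two best performers above, treats the circuit parameters as a black box. It needs only cost evaluations and no gradients, which is why it suits flat landscapes. The sketch below is a generic version with a Gaussian proposal and geometric cooling. It is not the paper's configuration. `toy_cost` is a hypothetical stand-in for the negative episode return of a VQC policy; it is periodic in each angle, like rotation-gate parameters. The step size and temperature schedule are also hypothetical.

```python
import math
import random


def simulated_annealing(cost, init, steps=2000, step_size=0.3, t0=1.0, t_min=1e-3, seed=0):
    """Minimize `cost` over a parameter vector with Metropolis acceptance and
    geometric cooling from t0 down to t_min; returns the best point seen."""
    rng = random.Random(seed)
    current = list(init)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    for k in range(steps):
        temp = t0 * (t_min / t0) ** (k / max(1, steps - 1))
        candidate = [p + rng.gauss(0.0, step_size) for p in current]
        c = cost(candidate)
        # Always accept improvements; accept worse moves with probability exp(-delta / T).
        if c <= current_cost or rng.random() < math.exp(-(c - current_cost) / temp):
            current, current_cost = candidate, c
            if c < best_cost:
                best, best_cost = list(candidate), c
    return best, best_cost


def toy_cost(thetas):
    # Hypothetical stand-in: minima at multiples of 2*pi in every angle.
    return sum(1.0 - math.cos(t) for t in thetas)


params, value = simulated_annealing(toy_cost, [2.0, -2.5, 1.0])
print(params, value)
```

In a real QRL setting each call to `cost` would run episodes with the circuit, so the number of evaluations would dominate the runtime.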
# 自律システムにおける最適化のための多目的深層強化学習 Multi-Objective Deep Reinforcement Learning for Optimisation in Autonomous Systems ( http://arxiv.org/abs/2408.01188v1 ) ライセンス: Link先を確認	Juan C. Rosero, Ivana Dusparic, Nicolás Cardozo,	(参考訳) 強化学習(Reinforcement Learning, RL)は、自律システム(AS)において、環境のモデルや事前定義されたアクションを必要とせず、実行時の学習を可能にするために広く使用されている。しかしながら、Q-learning のような AS における RL のほとんどの応用は、1つの目的のみを最適化することができ、複数の目的を1つの目的関数と事前定義された重みで組み合わせるために、多目的システムにおいて必要となる。 MORL(Multi-Objective Reinforcement Learning)技術はいくつか存在するが、実世界のASシステムではなくRLベンチマークで採用されている。本稿では,Deep W-Learning(DWN)と呼ばれるMORL技術を用いて,自己適応型サーバであるEmergent Web Servers exemplarに適用し,実行時のパフォーマンス最適化に最適な構成を求める。 DWNを2つの単目的最適化実装と比較する: {\epsilon}-greedyアルゴリズムとDeep Q-Networks。最初の評価では,DWN は DQN と {\epsilon}-greedy のアプローチと類似した結果と同時に複数の目的を最適化し,いくつかの指標の性能が向上し,複数の目的をひとつのユーティリティ関数に結合する問題を回避する。 Reinforcement Learning (RL) is used extensively in Autonomous Systems (AS) as it enables learning at runtime without the need for a model of the environment or predefined actions. However, most applications of RL in AS, such as those based on Q-learning, can only optimize one objective, making it necessary in multi-objective systems to combine multiple objectives in a single objective function with predefined weights. A number of Multi-Objective Reinforcement Learning (MORL) techniques exist but they have mostly been applied in RL benchmarks rather than real-world AS systems. In this work, we use a MORL technique called Deep W-Learning (DWN) and apply it to the Emergent Web Servers exemplar, a self-adaptive server, to find the optimal configuration for runtime performance optimization. We compare DWN to two single-objective optimization implementations: {\epsilon}-greedy algorithm and Deep Q-Networks. 
Our initial evaluation shows that DWN optimizes multiple objectives simultaneously with results similar to those of the DQN and {\epsilon}-greedy approaches, performs better on some metrics, and avoids issues associated with combining multiple objectives into a single utility function.	翻訳日:2024-08-05 13:37:26 公開日:2024-08-02
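DWN builds on W-learning. In W-learning each objective keeps its own Q-values and proposes its greedy action. A W-value estimates how much that objective loses if its proposal is not followed, and the objective with the largest W-value in the current state wins. The sketch below is tabular and illustrates the classic W-learning selection rule and the update for objectives that were not obeyed. The deep-network version in the paper is not reproduced. The objective names, configurations and numbers are hypothetical.

```python
def select_action(state, q_tables, w_tables):
    """Each objective proposes its greedy action; the objective with the largest
    W-value for this state wins and its action is executed."""
    proposals = {name: max(q[state], key=q[state].get) for name, q in q_tables.items()}
    winner = max(w_tables, key=lambda name: w_tables[name][state])
    return winner, proposals[winner], proposals


def update_w(state, next_state, winner, proposals, rewards, q_tables, w_tables,
             alpha=0.1, gamma=0.9):
    """W-learning update for objectives that were NOT obeyed: W moves toward the gap
    between what they expected from their own action and what they actually got."""
    for name, q in q_tables.items():
        if name == winner:
            continue
        expected = q[state][proposals[name]]
        obtained = rewards[name] + gamma * max(q[next_state].values())
        w_tables[name][state] = (1 - alpha) * w_tables[name][state] + alpha * (expected - obtained)


# Hypothetical objectives and server configurations.
q_tables = {
    "response_time": {"s0": {"cfgA": 1.0, "cfgB": 0.2}, "s1": {"cfgA": 0.5, "cfgB": 0.5}},
    "energy": {"s0": {"cfgA": 0.1, "cfgB": 0.9}, "s1": {"cfgA": 0.3, "cfgB": 0.4}},
}
w_tables = {"response_time": {"s0": 0.3, "s1": 0.0}, "energy": {"s0": 0.7, "s1": 0.0}}
winner, action, proposals = select_action("s0", q_tables, w_tables)
print(winner, action)  # energy cfgB
update_w("s0", "s1", winner, proposals, {"response_time": 0.2, "energy": 0.8}, q_tables, w_tables)
print(w_tables["response_time"]["s0"])
```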
# 脳腫瘍セグメンテーションのための弱教師ありかつ大域的に説明可能な学習フレームワーク A Weakly Supervised and Globally Explainable Learning Framework for Brain Tumor Segmentation ( http://arxiv.org/abs/2408.01191v1 ) ライセンス: Link先を確認	Ruitao Xie, Limai Jiang, Xiaoxi He, Yi Pan, Yunpeng Cai,	(参考訳) 機械による脳腫瘍セグメンテーションは、医師がより良い診断を下すのに役立つ。しかし、脳腫瘍の複雑な構造と高コストなピクセルレベルのアノテーションは、自動腫瘍セグメンテーションの課題となっている。本稿では、ピクセルレベルのアノテーションを必要とせずに優れた脳腫瘍セグメンテーション性能を実現するだけでなく、説明可能性も提供する反実仮想生成フレームワークを提案する。本フレームワークは、サンプルのクラス関連特徴とクラス非関連特徴を効果的に分離し、異なるクラス関連特徴を埋め込むことで、アイデンティティ特徴を保ちながらクラス属性を変更した新しいサンプルを生成する。抽出したクラス関連特徴に対してトポロジカルデータ解析を行い、大域的に説明可能な多様体を得る。そして、セグメント化する各異常サンプルについて、多様体内に設計されたルールベースの経路に導かれて意味のある正常サンプルを効果的に生成し、比較によって腫瘍領域を特定する。提案手法を2つのデータセットで評価し、脳腫瘍セグメンテーションにおける優れた性能を示した。コードはhttps://github.com/xrt11/tumor-segmentationで入手できる。 Machine-based brain tumor segmentation can help doctors make better diagnoses. However, the complex structure of brain tumors and expensive pixel-level annotations present challenges for automatic tumor segmentation. In this paper, we propose a counterfactual generation framework that not only achieves exceptional brain tumor segmentation performance without the need for pixel-level annotations, but also provides explainability. Our framework effectively separates class-related features from class-unrelated features of the samples, and generates new samples that preserve identity features while altering class attributes by embedding different class-related features. We perform topological data analysis on the extracted class-related features and obtain a globally explainable manifold, and for each abnormal sample to be segmented, a meaningful normal sample could be effectively generated with the guidance of the rule-based paths designed within the manifold for comparison for identifying the tumor regions. We evaluate our proposed method on two datasets, demonstrating superior brain tumor segmentation performance. The code is available at https://github.com/xrt11/tumor-segmentation.	翻訳日:2024-08-05 13:37:26 公開日:2024-08-02
# 認証可能なロバスト性を持つ符号化方式 Certifiably Robust Encoding Schemes ( http://arxiv.org/abs/2408.01200v1 ) ライセンス: Link先を確認	Aman Saxena, Tom Wollschläger, Nicola Franco, Jeanette Miriam Lorenz, Stephan Günnemann,	(参考訳) 量子機械学習は量子力学の原理を用いてデータを処理し、速度と性能の潜在的な向上をもたらす。しかし、先行研究は、これらのモデルが入力データを操作する攻撃や量子回路のノイズを悪用する攻撃の影響を受けやすいことを示している。これを受けて、これらのモデルの頑健性についてさまざまな研究が行われてきた。これらの研究は、量子状態の操作に対する頑健性の証明に焦点を当てている。本研究では、一般的なクラスのデータ符号化方式について、古典データの摂動に対する頑健性を調べることで、この研究の流れを拡張する。このような方式では、適切なノイズチャネルを追加することが、古典的機械学習のランダム化平滑化と同様に、平滑化されたデータにおけるノイズのない分類器の平均値を評価することと等価であることを示す。この一般的な枠組みを用いて、位相減衰ノイズチャネルを適切に追加することで、対象とする符号化方式の経験的および証明可能な頑健性が向上することを示す。 Quantum machine learning uses principles from quantum mechanics to process data, offering potential advances in speed and performance. However, previous work has shown that these models are susceptible to attacks that manipulate input data or exploit noise in quantum circuits. Following this, various studies have explored the robustness of these models. These works focus on the robustness certification of manipulations of the quantum states. We extend this line of research by investigating the robustness against perturbations in the classical data for a general class of data encoding schemes. We show that for such schemes, the addition of suitable noise channels is equivalent to evaluating the mean value of the noiseless classifier at the smoothed data, akin to Randomized Smoothing from classical machine learning. Using our general framework, we show that suitable additions of phase-damping noise channels improve empirical and provable robustness for the considered class of encoding schemes.	翻訳日:2024-08-05 13:37:26 公開日:2024-08-02
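The classical technique this work compares itself to is Randomized Smoothing. A base classifier f is replaced by a smoothed g(x), the class that f predicts most often under Gaussian noise added to x. The sketch below is a purely classical Monte Carlo version and involves no quantum circuit. It uses the well-known L2 radius σ·Φ⁻¹(p_top). The base classifier and `sigma` are hypothetical. The radius here is only a point estimate; a rigorous certificate would replace p_top with a lower confidence bound.

```python
import random
from collections import Counter
from statistics import NormalDist


def smoothed_predict(classify, x, sigma=0.25, n_samples=1000, seed=0):
    """Monte Carlo estimate of g(x) = argmax_c P(f(x + noise) = c), noise ~ N(0, sigma^2 I),
    together with a point-estimate L2 radius sigma * Phi^-1(p_top)."""
    rng = random.Random(seed)
    votes = Counter(
        classify([xi + rng.gauss(0.0, sigma) for xi in x]) for _ in range(n_samples)
    )
    label, count = votes.most_common(1)[0]
    p_top = count / n_samples
    if p_top <= 0.5:
        return label, p_top, 0.0  # no radius can be claimed
    # Clip away from 1 so inv_cdf stays finite. A rigorous certificate would use a lower
    # confidence bound on p_top (e.g. Clopper-Pearson), which this sketch omits.
    p_clipped = min(p_top, 1.0 - 0.5 / n_samples)
    return label, p_top, sigma * NormalDist().inv_cdf(p_clipped)


# Hypothetical base classifier: sign of the feature sum.
base = lambda v: 1 if sum(v) > 0 else 0
print(smoothed_predict(base, [1.0, 1.0]))
```

In the paper's setting, the phase-damping noise channel is what provides the averaging over perturbed inputs.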
# 大規模言語モデルを用いた臨床テキストのハイスループット表現型解析 High-Throughput Phenotyping of Clinical Text Using Large Language Models ( http://arxiv.org/abs/2408.01214v1 ) ライセンス: Link先を確認	Daniel B. Hier, S. Ilyas Munzir, Anne Stahlfeld, Tayo Obafemi-Ajayi, Michael D. Carrithers,	(参考訳) ハイスループット表現型解析は、患者の徴候を標準化されたオントロジー概念へマッピングする作業を自動化するものであり、精密医療に不可欠である。本研究では、大規模言語モデルを用いて、Online Mendelian Inheritance in Man(OMIM)データベースの臨床要約の表現型解析の自動化を評価する。これらの要約は表現型データが豊富であるため、医師の記録の代替となりうる。GPT-4とGPT-3.5-Turboの性能を比較した。その結果、GPT-4は徴候の識別、分類、正規化においてGPT-3.5-Turboを上回り、手作業によるアノテータとの一致度は評価者間一致に匹敵することが示された。徴候の正規化にはいくつかの限界があるものの、GPT-4の大規模な事前学習により、手作業でアノテーションされた訓練データを必要とせずに、複数の表現型解析タスクで高い性能と汎化性が得られる。大規模言語モデルは、臨床テキストのハイスループット表現型解析を自動化する主流の手法になると期待される。 High-throughput phenotyping automates the mapping of patient signs to standardized ontology concepts and is essential for precision medicine. This study evaluates the automation of phenotyping of clinical summaries from the Online Mendelian Inheritance in Man (OMIM) database using large language models. Due to their rich phenotype data, these summaries can be surrogates for physician notes. We conduct a performance comparison of GPT-4 and GPT-3.5-Turbo. Our results indicate that GPT-4 surpasses GPT-3.5-Turbo in identifying, categorizing, and normalizing signs, achieving concordance with manual annotators comparable to inter-rater agreement. Despite some limitations in sign normalization, the extensive pre-training of GPT-4 results in high performance and generalizability across several phenotyping tasks while obviating the need for manually annotated training data. Large language models are expected to be the dominant method for automating high-throughput phenotyping of clinical text.	翻訳日:2024-08-05 13:37:26 公開日:2024-08-02
# ZNorm: ニューラルネットワーク学習を加速するZスコア勾配正規化 ZNorm: Z-Score Gradient Normalization for Accelerating Neural Network Training ( http://arxiv.org/abs/2408.01215v1 ) ライセンス: Link先を確認	Juyoung Yun, Hoyoung Kim, Suin Cho, Hangil Kang,	(参考訳) ディープラーニングの急速な進歩に伴い、ディープニューラルネットワーク(DNN)の効率的な学習手法が求められている。モデルが複雑化するにつれて、勾配の消失と爆発が収束と性能を妨げる。本研究では,勾配のみを調整することで学習効率を高め、モデル性能を向上させる新しい手法として、Z-Score Normalization for Gradient Descent (ZNorm)を提案する。 ZNormは勾配全体を正規化して層間で一貫した勾配スケーリングを提供し、これにより勾配の消失と爆発のリスクを低減する。 CIFAR-10と医療データセットに関する広範な実験により、ZNormは収束を加速するだけでなく、性能指標も向上させることが示された。 ZNormは同じ計算設定のもとで既存の手法を一貫して上回る結果を得た。医用画像の応用において、ZNormは腫瘍予測とセグメンテーションの性能を改善し、その実用性を示している。これらの結果は、さまざまなアーキテクチャやアプリケーションにわたってディープニューラルネットワーク学習の効率と有効性を改善するための、堅牢で汎用的なツールとしてのZNormの可能性を示している。 The rapid advancements in deep learning necessitate efficient training methods for deep neural networks (DNNs). As models grow in complexity, vanishing and exploding gradients impede convergence and performance. We propose Z-Score Normalization for Gradient Descent (ZNorm), an innovative technique that adjusts only the gradients to enhance training efficiency and improve model performance. ZNorm normalizes the overall gradients, providing consistent gradient scaling across layers, thereby reducing the risks of vanishing and exploding gradients. Our extensive experiments on CIFAR-10 and medical datasets demonstrate that ZNorm not only accelerates convergence but also enhances performance metrics. ZNorm consistently outperforms existing methods, achieving superior results using the same computational settings. In medical imaging applications, ZNorm improves tumor prediction and segmentation performances, underscoring its practical utility. These findings highlight ZNorm's potential as a robust and versatile tool for improving the efficiency and effectiveness of deep neural network training across a wide range of architectures and applications.	翻訳日:2024-08-05 13:37:26 公開日:2024-08-02
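The abstract says ZNorm "normalizes the overall gradients" but gives no formula. The sketch below is an assumption, not the paper's exact method. It z-scores each layer's gradient tensor as (g - mean) / (std + eps) before a plain SGD step. Layers are plain Python lists, and `znorm` / `sgd_step` are hypothetical names.

```python
import math

def znorm(grads, eps=1e-8):
    """Z-score normalise each layer's gradient: (g - mean) / (std + eps) (assumed per-layer)."""
    normalised = {}
    for name, g in grads.items():
        mu = sum(g) / len(g)
        std = math.sqrt(sum((v - mu) ** 2 for v in g) / len(g))  # population std
        normalised[name] = [(v - mu) / (std + eps) for v in g]
    return normalised

def sgd_step(params, grads, lr=0.01):
    """Plain SGD update that consumes the z-scored gradients instead of the raw ones."""
    ng = znorm(grads)
    return {name: [p - lr * d for p, d in zip(params[name], ng[name])] for name in params}

# Usage: two layers whose raw gradients differ by two orders of magnitude.
out = znorm({"w1": [1.0, 2.0, 3.0], "w2": [100.0, 300.0]})
```

After normalisation both layers have zero mean and unit spread, whatever their raw scale. This is the kind of "consistent gradient scaling across layers" the abstract describes.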
# S2TD-Face:1枚のスケッチから制御可能なテクスチャをもつ詳細な3D顔を再構築する S2TD-Face: Reconstruct a Detailed 3D Face with Controllable Texture from a Single Sketch ( http://arxiv.org/abs/2408.01218v1 ) ライセンス: Link先を確認	Zidu Wang, Xiangyu Zhu, Jiang Yu, Tianshuo Zhang, Zhen Lei,	(参考訳) アニメーション、3Dアバター、芸術デザイン、行方不明者の捜索など多くのシナリオに応用できる、スケッチからのテクスチャ付き3D顔再構成は、非常に有望でありながら未開拓の研究課題である。一方では、スケッチのスタイルの多様性のために、既存のスケッチからの3D顔再構成手法は、ポーズが限定され写実的な陰影をもつスケッチしか扱えない。他方では、テクスチャは顔の外観を表現する上で重要な役割を担っているが、スケッチにはこの情報が欠けているため、再構成過程で追加のテクスチャ制御が必要となる。本稿では,スケッチから制御可能なテクスチャをもつ詳細な3次元顔を再構成する新しい手法S2TD-Faceを提案する。 S2TD-Faceは2段階の幾何再構成フレームワークを導入し、入力スケッチから詳細な幾何を直接再構築する。幾何形状をスケッチの繊細なストロークと整合させるため,えくぼやしわなどの入力特徴に再構成が正確に適合することを保証する新しいスケッチ・ツー・ジオメトリー損失を提案する。我々の学習戦略は、入手困難な3D顔スキャンデータや労働集約的な手描きスケッチに依存しない。さらに、S2TD-Faceは、テキストプロンプトを利用してライブラリから最適なテクスチャを選択し、それらを幾何形状にシームレスに統合するテクスチャ制御モジュールを導入することで、制御可能なテクスチャをもつ詳細な3D顔を実現する。 S2TD-Faceは、広範な定量的・定性的実験において既存の最先端手法を上回る。私たちのプロジェクトはhttps://github.com/wang-zidu/S2TD-Faceで利用可能です。 3D textured face reconstruction from sketches applicable in many scenarios such as animation, 3D avatars, artistic design, missing people search, etc., is a highly promising but underdeveloped research topic. On the one hand, the stylistic diversity of sketches leads to existing sketch-to-3D-face methods only being able to handle pose-limited and realistically shaded sketches. On the other hand, texture plays a vital role in representing facial appearance, yet sketches lack this information, necessitating additional texture control in the reconstruction process. This paper proposes a novel method for reconstructing controllable textured and detailed 3D faces from sketches, named S2TD-Face. S2TD-Face introduces a two-stage geometry reconstruction framework that directly reconstructs detailed geometry from the input sketch. To keep geometry consistent with the delicate strokes of the sketch, we propose a novel sketch-to-geometry loss that ensures the reconstruction accurately fits the input features like dimples and wrinkles. 
Our training strategies do not rely on hard-to-obtain 3D face scanning data or labor-intensive hand-drawn sketches. Furthermore, S2TD-Face introduces a texture control module utilizing text prompts to select the most suitable textures from a library and seamlessly integrate them into the geometry, resulting in a 3D detailed face with controllable texture. S2TD-Face surpasses existing state-of-the-art methods in extensive quantitative and qualitative experiments. Our project is available at https://github.com/wang-zidu/S2TD-Face .	翻訳日:2024-08-05 13:37:26 公開日:2024-08-02
# 計算論的思考スキル評価のためのノイジーゲート・ベイジアンネットワークを用いたルーブリックベースの学習者モデリング Rubric-based Learner Modelling via Noisy Gates Bayesian Networks for Computational Thinking Skills Assessment ( http://arxiv.org/abs/2408.01221v1 ) ライセンス: Link先を確認	Giorgia Adorni, Francesca Mangili, Alberto Piatti, Claudio Bonesana, Alessandro Antonucci,	(参考訳) 現代的でパーソナライズされた教育においては、学習者の能力を育成し、それを正確に評価することへの関心が高まっている。先行研究では、タスク固有の能力評価ルーブリックから自動スキル評価のための学習者モデルを導出する手順を提案し、自動評価ツールの実装を簡略化した。しかし、この従来のアプローチには2つの大きな制限があった。(i) 評価ルーブリックで定める能力間の順序関係は間接的にしかモデル化されていなかった。(ii) 評価対象ではないが課題の達成に必要な補助的スキルがモデルに含まれていなかった。本研究では、(i) についてはダミーの観測ノードを導入することで、ネットワークの構造を変えずにスキルの順序付けを厳密に課した。一方、(ii) については、2層のゲートをもつネットワークを設計した。一方の層はノイジーORゲートによる選言的演算を、もう一方の層は論理ANDによる連言的演算を行う。このような変更により、モデルのコンパクトなパラメータ化、解釈可能性、専門家からの簡便な知識抽出を損なうことなく、モデル出力の一貫性とモデリングツールの柔軟性が向上する。本研究では、このアプローチを用いて計算論的思考(CT)スキル評価のための学習者モデルを開発した。CT-cubeスキル評価フレームワークとCross Array Task (CAT)を用いて、これを例示し、その実現可能性を示す。 In modern and personalised education, there is a growing interest in developing learners' competencies and accurately assessing them. In a previous work, we proposed a procedure for deriving a learner model for automatic skill assessment from a task-specific competence rubric, thus simplifying the implementation of automated assessment tools. The previous approach, however, suffered two main limitations: (i) the ordering between competencies defined by the assessment rubric was only indirectly modelled; (ii) supplementary skills, not under assessment but necessary for accomplishing the task, were not included in the model. In this work, we address issue (i) by introducing dummy observed nodes, strictly enforcing the skills ordering without changing the network's structure. In contrast, for point (ii), we design a network with two layers of gates, one performing disjunctive operations by noisy-OR gates and the other conjunctive operations through logical ANDs. 
Such changes improve the model outcomes' coherence and the modelling tool's flexibility without compromising the model's compact parametrisation, interpretability and simple experts' elicitation. We used this approach to develop a learner model for Computational Thinking (CT) skills assessment. The CT-cube skills assessment framework and the Cross Array Task (CAT) are used to exemplify it and demonstrate its feasibility.	翻訳日:2024-08-05 13:37:26 公開日:2024-08-02
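The abstract describes two gate layers: noisy-OR for disjunctive operations and logical AND for conjunctive ones. The sketch below implements both gate types as conditional probabilities. The inhibitor probabilities and the optional leak term are illustrative assumptions, since the abstract does not give the paper's parametrisation. `noisy_or` and `logical_and` are hypothetical names.

```python
from math import prod

def noisy_or(parents, inhibitors, leak=0.0):
    """P(child = 1 | parent states) for a noisy-OR gate.

    parents:    0/1 states of the parent skills
    inhibitors: q_i = P(an active parent i fails to activate the child)
    leak:       P(child = 1) when no parent is active (assumed 0 by default)
    """
    fail = (1.0 - leak) * prod(q for x, q in zip(parents, inhibitors) if x)
    return 1.0 - fail

def logical_and(parents):
    """Deterministic conjunction: the child is active only if every parent is."""
    return 1.0 if all(parents) else 0.0

p_one = noisy_or([1, 0], [0.2, 0.5])    # one parent active
p_both = noisy_or([1, 1], [0.2, 0.5])   # both parents active
```

In a noisy-OR gate each additional active parent can only raise the child's probability, so each gate needs just one parameter per parent. This is consistent with the compact parametrisation the abstract emphasises.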
# ハイパースペクトル画像分類のためのマルチヘッド空間スペクトルマンバ Multi-head Spatial-Spectral Mamba for Hyperspectral Image Classification ( http://arxiv.org/abs/2408.01224v1 ) ライセンス: Link先を確認	Muhammad Ahmad, Muhammad Hassaan Farooq Butt, Muhammad Usama, Hamad Ahmed Altuwaijri, Manual Mazzara, Salvatore Distenano,	(参考訳) 空間スペクトルマンバ(SSM)は計算効率を改善し、トランスフォーマーの制限に対処して長距離依存をキャプチャする。しかし、伝統的なマンバモデルは、HSIの豊富なスペクトル情報を見落とし、高次元とシーケンシャルなデータに苦しむ。これらの課題に対処するため,マルチヘッド自己注意・トークン拡張(MHSSMamba)を用いたSSMを提案する。このモデルは、スペクトルトークンの強化とマルチヘッドアテンションを用いてスペクトルバンドと空間位置の複雑な関係を捉えることで、スペクトル情報と空間情報を統合する。また、スペクトル帯域にまたがるコンテキスト情報を保存し、長距離依存やHSIデータのシーケンシャルな性質も管理する。 MHSSMambaはパヴィア大学で97.62 %、ヒューストン大学で96.92 %、サリナスで96.85 %、武漢長クーのデータセットで99.49 %という顕著な分類精度を達成した。 Spatial-Spectral Mamba (SSM) improves computational efficiency and captures long-range dependencies, addressing Transformer limitations. However, traditional Mamba models overlook rich spectral information in HSIs and struggle with high dimensionality and sequential data. To address these issues, we propose the SSM with multi-head self-attention and token enhancement (MHSSMamba). This model integrates spectral and spatial information by enhancing spectral tokens and using multi-head attention to capture complex relationships between spectral bands and spatial locations. It also manages long-range dependencies and the sequential nature of HSI data, preserving contextual information across spectral bands. MHSSMamba achieved remarkable classification accuracies of 97.62\% on Pavia University, 96.92\% on the University of Houston, 96.85\% on Salinas, and 99.49\% on Wuhan-longKou datasets.	翻訳日:2024-08-05 13:37:26 公開日:2024-08-02
# ファントム・メナス:視覚・言語モデルにおけるプライバシー漏洩を暴く The Phantom Menace: Unmasking Privacy Leakages in Vision-Language Models ( http://arxiv.org/abs/2408.01228v1 ) ライセンス: Link先を確認	Simone Caldarella, Massimiliano Mancini, Elisa Ricci, Rahaf Aljundi,	(参考訳) VLM(Vision-Language Models)は、視覚的理解とテキスト的理解を組み合わせることで、画像キャプションの生成や、さまざまな領域にわたる視覚的質問への回答など、多様なタスクに適している。しかし、これらの能力は、Webからクロールされた大量の未精選データでの学習の上に成り立っている。こうしたデータには、VLMが記憶して漏洩しうる機密情報が含まれている可能性があり、重大なプライバシー上の懸念を生じさせる。本稿では,これらの脆弱性が存在するかどうかを,身元情報の漏洩に着目して評価する。本研究から3つの重要な知見が得られた。(i) VLMは、視覚言語アライメントと微調整に匿名化データを用いた場合でも、身元情報を漏洩する。(ii) 文脈は身元情報の漏洩にほとんど影響しない。(iii) ぼかしのような単純で広く用いられている匿名化技術は、この問題に対処するには不十分である。これらの知見は、VLMをデプロイする際の堅牢なプライバシー保護戦略の緊急の必要性を浮き彫りにしている。倫理的な認識と責任ある開発プラクティスは、これらのリスクを軽減するために不可欠である。 Vision-Language Models (VLMs) combine visual and textual understanding, rendering them well-suited for diverse tasks like generating image captions and answering visual questions across various domains. However, these capabilities are built upon training on large amount of uncurated data crawled from the web. The latter may include sensitive information that VLMs could memorize and leak, raising significant privacy concerns. In this paper, we assess whether these vulnerabilities exist, focusing on identity leakage. Our study leads to three key findings: (i) VLMs leak identity information, even when the vision-language alignment and the fine-tuning use anonymized data; (ii) context has little influence on identity leakage; (iii) simple, widely used anonymization techniques, like blurring, are not sufficient to address the problem. These findings underscore the urgent need for robust privacy protection strategies when deploying VLMs. Ethical awareness and responsible development practices are essential to mitigate these risks.	翻訳日:2024-08-05 13:37:26 公開日:2024-08-02
# HeteroMorpheus:形態的異質性モデリングに基づくユニバーサル制御 HeteroMorpheus: Universal Control Based on Morphological Heterogeneity Modeling ( http://arxiv.org/abs/2408.01230v1 ) ライセンス: Link先を確認	YiFan Hao, Yang Yang, Junru Song, Wei Peng, Weien Zhou, Tingsong Jiang, Wen Yao,	(参考訳) ロボット制御の分野では、ロボットごとに個別のコントローラを設計すると高い計算コストがかかる。多様なロボット形態に適用可能なユニバーサル制御方策は、この課題を軽減することが期待される。主に、ロボットの四肢間の関係ダイナミクスを捉える有効性から、グラフニューラルネットワーク(GNN)とトランスフォーマーに基づくモデルが採用されている。しかしながら、これらのモデルは典型的には、異なる四肢の機能的多様性を見落とす均質なグラフ構造を用いる。このギャップを埋めるために、異種グラフトランスフォーマーに基づく新しい手法であるHeteroMorpheusを導入する。この手法は四肢の異質性に独自に対処し、様々な形態のロボットのダイナミクスのより良い表現を促す。大規模な実験を通じて、ゼロショット汎化や未知のロボット形態へのサンプル効率のよい転移を含む方策汎化能力において、HeteroMorpheusが最先端手法より優れていることを実証する。 In the field of robotic control, designing individual controllers for each robot leads to high computational costs. Universal control policies, applicable across diverse robot morphologies, promise to mitigate this challenge. Predominantly, models based on Graph Neural Networks (GNN) and Transformers are employed, owing to their effectiveness in capturing relational dynamics across a robot's limbs. However, these models typically employ homogeneous graph structures that overlook the functional diversity of different limbs. To bridge this gap, we introduce HeteroMorpheus, a novel method based on heterogeneous graph Transformer. This method uniquely addresses limb heterogeneity, fostering better representation of robot dynamics of various morphologies. Through extensive experiments we demonstrate the superiority of HeteroMorpheus against state-of-the-art methods in the capability of policy generalization, including zero-shot generalization and sample-efficient transfer to unfamiliar robot morphologies.	翻訳日:2024-08-05 13:37:26 公開日:2024-08-02
# WaveMamba:ハイパースペクトル画像分類のための空間スペクトルウェーブレットMamba WaveMamba: Spatial-Spectral Wavelet Mamba for Hyperspectral Image Classification ( http://arxiv.org/abs/2408.01231v1 ) ライセンス: Link先を確認	Muhammad Ahmad, Muhammad Usama, Manual Mazzara,	(参考訳) ハイパースペクトルイメージング(HSI)は、様々なアプリケーションにおいて詳細なスペクトル情報と空間情報を取得するための強力なツールであることが証明されている。 HSI分類(HSIC)のためのディープラーニング(DL)とトランスフォーマーアーキテクチャの進歩にもかかわらず、計算効率や大量のラベル付きデータの必要性といった課題が残っている。本稿では、ウェーブレット変換を空間スペクトルMambaアーキテクチャと統合してHSICを強化する新しいアプローチであるWaveMambaを紹介する。 WaveMambaは、エンドツーエンドで学習可能なモデルにおいて、局所的なテクスチャパターンと大域的な文脈関係の両方を捉える。ウェーブレットに基づいて強化された特徴は、状態空間アーキテクチャによって処理され、空間-スペクトル関係と時間的依存関係がモデル化される。実験の結果、WaveMambaは既存のモデルを上回り、ヒューストン大学のデータセットで4.5%、パヴィア大学のデータセットで2.0%の精度向上を達成した。これらの結果は,HSIに固有の複雑なデータ相互作用に対処する上での本手法の有効性を裏付けている。 Hyperspectral Imaging (HSI) has proven to be a powerful tool for capturing detailed spectral and spatial information across diverse applications. Despite the advancements in Deep Learning (DL) and Transformer architectures for HSI Classification (HSIC), challenges such as computational efficiency and the need for extensive labeled data persist. This paper introduces WaveMamba, a novel approach that integrates wavelet transformation with the Spatial-Spectral Mamba architecture to enhance HSIC. WaveMamba captures both local texture patterns and global contextual relationships in an end-to-end trainable model. The Wavelet-based enhanced features are then processed through the state-space architecture to model spatial-spectral relationships and temporal dependencies. The experimental results indicate that WaveMamba surpasses existing models, achieving an accuracy improvement of 4.5\% on the University of Houston dataset and a 2.0\% increase on the Pavia University dataset. These findings validate its effectiveness in addressing the complex data interactions inherent in HSIs.	翻訳日:2024-08-05 13:27:42 公開日:2024-08-02
# CLIP4Sketch: 拡散モデルを用いたデータセット拡張によるスケッチとマグショットの照合の強化 CLIP4Sketch: Enhancing Sketch to Mugshot Matching through Dataset Augmentation using Diffusion Models ( http://arxiv.org/abs/2408.01233v1 ) ライセンス: Link先を確認	Kushal Kumar Jain, Steve Grosz, Anoop M. Namboodiri, Anil K. Jain,	(参考訳) 法医学的スケッチとマグショットの照合は顔認識における難しい課題であり、主に注釈付き法医学的スケッチの不足と、スケッチと写真の間のモダリティギャップによって妨げられている。これに対処するため,拡散モデルを利用して大規模かつ多様なスケッチ画像を生成し、スケッチとマグショットの照合における顔認識システムの性能向上に役立てる新しいアプローチCLIP4Sketchを提案する。本手法はノイズ除去拡散確率モデル(DDPM)を用いて,身元とスタイルを明示的に制御したスケッチを生成する。参照マグショットのCLIPとAdafaceの埋め込みを、スタイルのテキスト記述とともに拡散モデルの条件として組み合わせる。マグショットに対応するスケッチの包括的なデータセットを生成し,その合成データで顔認識モデルを訓練することで,本アプローチの有効性を実証する。結果は,既存の限られた量の実顔スケッチデータで訓練する場合と比べて,スケッチとマグショットの照合精度が大幅に向上することを示しており,モダリティをまたいだ顔認識システムの性能向上における拡散モデルの可能性を裏付けている。また、GANベースの手法で生成されたデータセットとも比較し、本データセットの優位性を示す。 Forensic sketch-to-mugshot matching is a challenging task in face recognition, primarily hindered by the scarcity of annotated forensic sketches and the modality gap between sketches and photographs. To address this, we propose CLIP4Sketch, a novel approach that leverages diffusion models to generate a large and diverse set of sketch images, which helps in enhancing the performance of face recognition systems in sketch-to-mugshot matching. Our method utilizes Denoising Diffusion Probabilistic Models (DDPMs) to generate sketches with explicit control over identity and style. We combine CLIP and Adaface embeddings of a reference mugshot, along with textual descriptions of style, as the conditions to the diffusion model. We demonstrate the efficacy of our approach by generating a comprehensive dataset of sketches corresponding to mugshots and training a face recognition model on our synthetic data. Our results show significant improvements in sketch-to-mugshot matching accuracy over training on an existing, limited amount of real face sketch data, validating the potential of diffusion models in enhancing the performance of face recognition systems across modalities. 
We also compare our dataset with datasets generated using GAN-based methods to show its superiority.	翻訳日:2024-08-05 13:27:42 公開日:2024-08-02
# 量子ネットワークにおける絡み合いルーティング:包括的サーベイ Entanglement Routing in Quantum Networks: A Comprehensive Survey ( http://arxiv.org/abs/2408.01234v1 ) ライセンス: Link先を確認	Amar Abane, Michael Cubeddu, Van Sy Mai, Abdella Battou,	(参考訳) 近い将来の量子ネットワークにおけるエンタングルメントルーティングは、2つの離れたノード間にエンドツーエンドのエンタングルメントを確立するために、スワップ操作によって結合する短距離エンタングルメントの最適な系列を選択することである。従来のルーティング技術と同様に、量子ルーティングプロトコルは、ネットワーク情報を使用して、一連のエンドツーエンドのエンタングルメント要求を満たす最適なパスを選択する。しかし、ネットワーク状態情報に加えて、量子ルーティングプロトコルは要求されるエンタングルメントの忠実度、スワップ操作の確率的性質、エンタングル状態の短い寿命も考慮しなければならない。本研究では,実用的なエンタングルメントルーティング問題を定式化し,それに取り組む主要なアプローチを解析・分類し,該当する場合には古典的なネットワークルーティング戦略と比較し,そこから着想を得る。調査した量子ルーティング方式を、リアクティブ型、プロアクティブ型、機会主義型、仮想ルーティングに分類して議論する。 Entanglement routing in near-term quantum networks consists of choosing the optimal sequence of short-range entanglements to combine through swapping operations to establish end-to-end entanglement between two distant nodes. Similar to traditional routing technologies, a quantum routing protocol uses network information to choose the best paths to satisfy a set of end-to-end entanglement requests. However, in addition to network state information, a quantum routing protocol must also take into account the requested entanglement fidelity, the probabilistic nature of swapping operations, and the short lifetime of entangled states. In this work, we formulate a practical entanglement routing problem and analyze and categorize the main approaches to address it, drawing comparisons to, and inspiration from, classical network routing strategies where applicable. We classify and discuss the studied quantum routing schemes into reactive, proactive, opportunistic, and virtual routing.	翻訳日:2024-08-05 13:27:42 公開日:2024-08-02
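The abstract contrasts quantum routing with classical routing and names probabilistic swapping as one extra constraint. The sketch below covers only that constraint, under simplifying assumptions. Each elementary link e succeeds independently with probability p_e. Each intermediate node performs one swap that succeeds with probability q. Fidelity and decoherence are ignored. The end-to-end success probability of a path is then prod(p_e) * q^(hops - 1). Maximising it becomes a classical shortest-path problem on weights -log(p_e * q), solved here with Dijkstra. `best_path` is a hypothetical name.

```python
import heapq
import math

def best_path(links, src, dst, q_swap):
    """Max-success-probability path; links maps undirected (u, v) -> elementary-link probability."""
    if src == dst:
        return [src], 1.0
    adj = {}
    for (u, v), p in links.items():
        w = -math.log(p * q_swap)          # each hop is charged one swap; corrected below
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:                    # stale heap entry
            continue
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None, 0.0
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    # a path with h hops needs only h - 1 swaps, so remove the one extra factor of q
    return path, math.exp(-dist[dst]) / q_swap

links = {("A", "B"): 0.9, ("B", "C"): 0.9, ("A", "C"): 0.5}
```

With unreliable swaps (q = 0.5) the direct A-C link wins. With q = 0.8 the two-hop route through B is better, because 0.81 * 0.8 = 0.648 > 0.5. This shows how swap probability changes the best route compared with classical hop or cost metrics.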
# グラフニューラルネットワークによる個別血流と活動へのフロー誘導型位置決め Tailoring Graph Neural Network-based Flow-guided Localization to Individual Bloodstreams and Activities ( http://arxiv.org/abs/2408.01239v1 ) ライセンス: Link先を確認	Pablo Galván, Filip Lemic, Gerard Calvo Bartra, Sergi Abadal, Xavier Costa Pérez,	(参考訳) 血流中ナノデバイスを用いたフローガイドの局在化は,早期疾患の検出,生物状態の連続モニタリング,標的治療に有用であることが期待される。ナノデバイスは、ローカライゼーション目的のために誤った生データを生成する、サイズと電力制約を呈する。オンボディアンカーはこのデータを受信し、興味のある診断イベントの場所を導出する。さまざまな機械学習(ML)アプローチが最近提案されているが、現在は安静患者の基準血流に制限されている。そのため、患者の血流の物理的多様性には対処できず、個々の患者の活動の変化による継続的なモニタリングもできない。グラフニューラルネットワーク(GNN)をベースとした現状のフローガイド型ローカライズ手法であるSotA(State-of-the-Art)に対するこれらの課題に対処するために,身長,体重,心拍数などの個々の生理指標に基づくGNN適応のためのパイプラインを提案する。以上の結果から,提案した適応は,血流と活動の個人差を和らげる上で有益であることが示唆された。 Flow-guided localization using in-body nanodevices in the bloodstream is expected to be beneficial for early disease detection, continuous monitoring of biological conditions, and targeted treatment. The nanodevices face size and power constraints that produce erroneous raw data for localization purposes. On-body anchors receive this data, and use it to derive the locations of diagnostic events of interest. Different Machine Learning (ML) approaches have been recently proposed for this task, yet they are currently restricted to a reference bloodstream of a resting patient. As such, they are unable to deal with the physical diversity of patients' bloodstreams and cannot provide continuous monitoring due to changes in individual patient's activities. Toward addressing these issues for the current State-of-the-Art (SotA) flow-guided localization approach based on Graph Neural Networks (GNNs), we propose a pipeline for GNN adaptation based on individual physiological indicators including height, weight, and heart rate. Our results indicate that the proposed adaptions are beneficial in reconciling the individual differences between bloodstreams and activities.	翻訳日:2024-08-05 13:27:42 公開日:2024-08-02
# 300mmウエハプロセスを用いたMOS二重量子ドットの交換制御 Exchange control in a MOS double quantum dot made using a 300 mm wafer process ( http://arxiv.org/abs/2408.01241v1 ) ライセンス: Link先を確認	Jacob F. Chittock-Wood, Ross C. C. Leon, Michael A. Fogarty, Tara Murphy, Sofia M. Patomäki, Giovanni A. Oakes, Felix-Ekkehard von Horstig, Nathan Johnson, Julien Jussot, Stefan Kubicek, Bogdan Govoreanu, David F. Wise, M. Fernando Gonzalez-Zalba, John J. L. Morton,	(参考訳) 半導体産業の先進的な製造能力を活用すれば、歩留まり、均一性、集積度を高めることで、シリコンベースの量子プロセッサのスケールアップに役立つと期待される。 300mmウエハの金属-酸化物-半導体(MOS)プロセスで作製された量子ドットに関する最近の研究は、個々のスピン量子ビットの制御と読み出しを示しているが、量子プロセッサの動作には2量子ビット相互作用が必要である。本研究では、スピン量子ビット向けにカスタマイズされた300mmウエハMOSプロセスを用い、スピン-スピン交換相互作用による2つの電子スピンのコヒーレント制御を実証する。これは$\sqrt{\text{SWAP}}$のようなエンタングリングゲートの基礎となる。最大$T_2^{}\approx500$ nsのゲートのデフェージング時間と、10のゲート品質係数を観測した。さらに、エコーシーケンスを用いることでコヒーレンスを最大1桁延長した。読み出しには、分散測定のスピン射影的な性質を保ちつつ信号を増幅する分散読み出し技術として、高周波電子カスケードを導入する。本研究の結果は,分散センシング技術との統合とともに,2量子ビット演算のための産業グレードのプラットフォームを示すものである。 Leveraging the advanced manufacturing capabilities of the semiconductor industry promises to help scale up silicon-based quantum processors by increasing yield, uniformity and integration. Recent studies of quantum dots fabricated on 300 mm wafer metal-oxide-semiconductor (MOS) processes have shown control and readout of individual spin qubits, yet quantum processors require two-qubit interactions to operate. Here, we use a 300 mm wafer MOS process customized for spin qubits and demonstrate coherent control of two electron spins using the spin-spin exchange interaction, forming the basis of an entangling gate such as $\sqrt{\text{SWAP}}$. We observe gate dephasing times of up to $T_2^{}\approx500$ ns and a gate quality factor of 10. We further extend the coherence by up to an order of magnitude using an echo sequence. For readout, we introduce a dispersive readout technique, the radiofrequency electron cascade, that amplifies the signal while retaining the spin-projective nature of dispersive measurements. 
Our results demonstrate an industrial grade platform for two-qubit operations, alongside integration with dispersive sensing techniques.	翻訳日:2024-08-05 13:27:42 公開日:2024-08-02
# XGBoostモデルとSVMモデルを用いた乾燥豆品種の自動分類 Automated Classification of Dry Bean Varieties Using XGBoost and SVM Models ( http://arxiv.org/abs/2408.01244v1 ) ライセンス: Link先を確認	Ramtin Ardeshirifar,	(参考訳) 本稿では,機械学習モデルを用いた7種類の乾燥豆の自動分類について比較検討する。外れ値除去と特徴抽出により当初の13,611個から絞り込んだ12,909個の乾燥豆サンプルのデータセットを用い,次元削減のために主成分分析 (PCA) を適用し, XGBoost と Support Vector Machine (SVM) の2種類のマルチクラス分類器を訓練した。頑健な性能評価とハイパーパラメータ調整を行うため,モデルはネスト交差検証で評価した。 XGBoostとSVMのモデルは、それぞれ94.00%と94.39%の全体正分類率を達成した。この結果は、農業応用、特に種子分類の均一性と効率の向上におけるこれらの機械学習アプローチの有効性を裏付けるものである。本研究は, 自動化システムが種子の品質管理と収量最適化を大きく支援できることを実証し, 精密農業に関する研究の蓄積に寄与する。今後は、より多様なデータセットと高度なアルゴリズムを取り入れて、分類精度をさらに向上させることを検討する。 This paper presents a comparative study on the automated classification of seven different varieties of dry beans using machine learning models. Leveraging a dataset of 12,909 dry bean samples, reduced from an initial 13,611 through outlier removal and feature extraction, we applied Principal Component Analysis (PCA) for dimensionality reduction and trained two multiclass classifiers: XGBoost and Support Vector Machine (SVM). The models were evaluated using nested cross-validation to ensure robust performance assessment and hyperparameter tuning. The XGBoost and SVM models achieved overall correct classification rates of 94.00% and 94.39%, respectively. The results underscore the efficacy of these machine learning approaches in agricultural applications, particularly in enhancing the uniformity and efficiency of seed classification. This study contributes to the growing body of work on precision agriculture, demonstrating that automated systems can significantly support seed quality control and crop yield optimization. Future work will explore incorporating more diverse datasets and advanced algorithms to further improve classification accuracy.	翻訳日:2024-08-05 13:27:42 公開日:2024-08-02
# MapComp: 結合・グループ集約のためのセキュアなビューベース協調分析フレームワーク MapComp: A Secure View-based Collaborative Analytics Framework for Join-Group-Aggregation ( http://arxiv.org/abs/2408.01246v1 ) ライセンス: Link先を確認	Xinyu Peng, Feng Han, Li Peng, Weiran Liu, Zheng Yan, Kai Kang, Xinyuan Zhang, Guoxing Wei, Jianling Sun, Jinfei Liu,	(参考訳) 本稿では、協調分析のための結合・グループ集約(JGA)クエリを容易にする、ビューベースの新しいフレームワークMapCompを紹介する。結合のために特別に設計したマテリアライズドビューと、グループ集約(GA)プロトコルの新たな設計により、MapCompは重複した結合処理を排除して後続のGAを高速化し、JGAクエリの実行効率を向上させる。継続的なデータ更新をサポートするため、本マテリアライズドビューはペイロード非依存性という特性を備え、MPCオーバーヘッドなしでビュー更新の効率を大幅に向上させる。この特性によりGAのさらなる高速化も可能となり、先行研究を上回る複数の新しいプロトコルを考案した。特に、本研究はマテリアライズドビューを用いてセキュアな協調JGAクエリを高速化する最初の試みである。実験ではMapCompの大きな利点が示され,クエリを8回実行する場合、ビューを用いないベースラインと比較して最大2189.9倍の効率向上を達成した。 This paper introduces MapComp, a novel view-based framework to facilitate join-group-aggregation (JGA) queries for collaborative analytics. Through specially crafted materialized view for join and novel design of group-aggregation (GA) protocols, MapComp removes duplicated join workload and expedites subsequent GA, improving the efficiency of JGA query execution. To support continuous data updates, our materialized view offers payload-independence feature and brings in significant efficiency improvement of view refreshing with free MPC overhead. This feature also allows further acceleration for GA, where we devised multiple novel protocols that outperform prior works. Notably, our work represents the first endeavor to expedite secure collaborative JGA queries using materialized views. Our experiments demonstrate a significant advantage of MapComp, achieving up to a 2189.9x efficiency improvement compared to the non-view based baseline when executing queries eight times.	翻訳日:2024-08-05 13:27:42 公開日:2024-08-02
# IRSおよびUAV支援MECシステムのための深層漸進的強化学習に基づく柔軟なリソーススケジューリングフレームワーク Deep progressive reinforcement learning-based flexible resource scheduling framework for IRS and UAV-assisted MEC system ( http://arxiv.org/abs/2408.01248v1 ) ライセンス: Link先を確認	Li Dong, Feibo Jiang, Minjie Wang, Yubo Peng, Xiaolong Li,	(参考訳) インテリジェント反射面 (IRS) と無人航空機 (UAV) を活用したモバイルエッジコンピューティング (MEC) システムは、一時的および緊急のシナリオで広く利用されている。我々の目標は、UAVの数が可変である状況で、UAV位置、IRS位相シフト、タスクオフロード、リソース割り当てを同時に最適化することで、MECシステムのエネルギー消費を最小化することである。この目的のために,新しい深層漸進的強化学習を用いた柔軟なリソーススケジューリング(FRES)フレームワークを提案する。その新規性は以下のとおりである。まず、混合整数非線形計画(MINLP)問題に対処するための新しいマルチタスクエージェントを提示する。このマルチタスクエージェントは、異なるタスク用に設計された2つの出力ヘッドを持ち、分類ヘッドが整数変数によるオフロード決定を行い、フィッティングヘッドが連続変数によるリソース割り当てを解く。次に、エージェント内のニューロンの一部を段階的に調整することで、エージェントを変動するUAV数に適応させるプログレッシブスケジューラを導入する。この構造は自然に経験を蓄積でき、破滅的忘却の影響を受けない。最後に、FRESの大域的探索を強化するために、軽量タブー探索(LTS)を導入する。数値結果は,動的なMECシステムにおいてもリアルタイムかつ最適なリソーススケジューリングを実現できるFRESフレームワークの優位性を示している。 The intelligent reflection surface (IRS) and unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) system is widely used in temporary and emergency scenarios. Our goal is to minimize the energy consumption of the MEC system by jointly optimizing UAV locations, IRS phase shift, task offloading, and resource allocation with a variable number of UAVs. To this end, we propose a Flexible REsource Scheduling (FRES) framework by employing a novel deep progressive reinforcement learning which includes the following innovations: Firstly, a novel multi-task agent is presented to deal with the mixed integer nonlinear programming (MINLP) problem. The multi-task agent has two output heads designed for different tasks, in which a classified head is employed to make offloading decisions with integer variables while a fitting head is applied to solve resource allocation with continuous variables. Secondly, a progressive scheduler is introduced to adapt the agent to the varying number of UAVs by progressively adjusting a part of neurons in the agent. 
This structure can naturally accumulate experiences and be immune to catastrophic forgetting. Finally, a light taboo search (LTS) is introduced to enhance the global search of the FRES. The numerical results demonstrate the superiority of the FRES framework which can make real-time and optimal resource scheduling even in dynamic MEC systems.	翻訳日:2024-08-05 13:27:42 公開日:2024-08-02
# 不確実な環境におけるメタ推論:メタBAMDPフレームワーク Metareasoning in uncertain environments: a meta-BAMDP framework ( http://arxiv.org/abs/2408.01253v1 ) ライセンス: Link先を確認	Prakhar Godara, Tilman Diego Aléman, Angela J. Yu,	(参考訳) 意思決定のシナリオでは、 \textit{reasoning}(推論)は、行動 $a^* \in \mathcal{A}$ を選択するアルゴリズム $P$ とみなすことができ、マルコフ決定過程(MDP)の価値関数の最大化などの結果の最適化を目的としている。しかしながら、$P$自体の実行にもコスト(時間、エネルギー、限られた容量など)がかかる場合があり、根底にある決定問題において選択によって得られる明示的な効用と併せて考慮する必要がある。すべての物理システムはリソースの制約に直面するため、人間の行動を正確にモデル化するためにも、AIのプランニングを最適化するためにも、このようなコストを考慮する必要がある。適切な$P$を見つけること自体も、推論プロセス$P$の空間上の最適化問題として定式化でき、これは一般に \textit{metareasoning}(メタ推論)と呼ばれる。従来、人間のメタ推論モデルでは、エージェントが基礎となるMDPの遷移分布と報酬分布を知っていると仮定していた。本稿では,未知の報酬/遷移分布をもつ環境でのメタ推論を扱うメタベイズ適応型MDP(meta-BAMDP)フレームワークを提案することで,そのようなモデルを一般化する。これは、人間やAIシステムが直面する、はるかに大規模で現実的な計画問題を包含する。最初のステップとして、人間の意思決定の研究によく用いられる2本腕ベルヌーイ・バンディット(TABB)タスクにこのフレームワークを適用する。メタ問題の複雑さのため、我々の解は必然的に近似的なものとなるが、それでも人間の意思決定シナリオにとって現実的といえる仮定の範囲内で頑健である。これらの結果は、認知的制約の下での人間の探索を理解するための規範的な枠組みを提供する。ベイズ適応戦略とメタ推論のこの統合は、意思決定研究の理論的展望と、不確実性とリソース制約の下で計画するAIシステムの設計という実践的応用の両方を豊かにする。 In decision-making scenarios, \textit{reasoning} can be viewed as an algorithm $P$ that makes a choice of an action $a^* \in \mathcal{A}$, aiming to optimize some outcome such as maximizing the value function of a Markov decision process (MDP). However, executing $P$ itself may bear some costs (time, energy, limited capacity, etc.) and needs to be considered alongside explicit utility obtained by making the choice in the underlying decision problem. Such costs need to be taken into account in order to accurately model human behavior, as well as optimizing AI planning, as all physical systems are bound to face resource constraints. Finding the right $P$ can itself be framed as an optimization problem over the space of reasoning processes $P$, generally referred to as \textit{metareasoning}. Conventionally, human metareasoning models assume that the agent knows the transition and reward distributions of the underlying MDP. 
This paper generalizes such models by proposing a meta Bayes-Adaptive MDP (meta-BAMDP) framework to handle metareasoning in environments with unknown reward/transition distributions, which encompasses a far larger and more realistic set of planning problems that humans and AI systems face. As a first step, we apply the framework to two-armed Bernoulli bandit (TABB) tasks, which have often been used to study human decision making. Owing to the meta problem's complexity, our solutions are necessarily approximate, but nevertheless robust within a range of assumptions that are arguably realistic for human decision-making scenarios. These results offer a normative framework for understanding human exploration under cognitive constraints. This integration of Bayesian adaptive strategies with metareasoning enriches both the theoretical landscape of decision-making research and practical applications in designing AI systems that plan under uncertainty and resource constraints.	翻訳日:2024-08-05 13:27:42 公開日:2024-08-02
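The abstract applies its framework to two-armed Bernoulli bandit (TABB) tasks, where the agent does not know the reward probabilities. The sketch below shows only the object-level Bayes-adaptive MDP (BAMDP), not the paper's meta-level method. The belief state is the success/failure counts of each arm under Beta priors. The exact Bayes-optimal value over a finite horizon is computed by memoised recursion. The uniform Beta(1, 1) prior is an assumption, and the function name is hypothetical. The number of belief states grows roughly as horizon^4, which illustrates why exact planning becomes expensive and approximate solutions are needed.

```python
from functools import lru_cache

def bayes_optimal_value(horizon, prior=(1, 1)):
    """Expected total reward of the Bayes-optimal policy on a TABB with Beta(prior) beliefs."""
    a0, b0 = prior

    @lru_cache(maxsize=None)
    def value(steps_left, s1, f1, s2, f2):
        if steps_left == 0:
            return 0.0
        best = 0.0
        for arm in (0, 1):
            s, f = (s1, f1) if arm == 0 else (s2, f2)
            p = (a0 + s) / (a0 + b0 + s + f)        # posterior mean success probability
            if arm == 0:
                win = value(steps_left - 1, s1 + 1, f1, s2, f2)
                lose = value(steps_left - 1, s1, f1 + 1, s2, f2)
            else:
                win = value(steps_left - 1, s1, f1, s2 + 1, f2)
                lose = value(steps_left - 1, s1, f1, s2, f2 + 1)
            best = max(best, p * (1.0 + win) + (1.0 - p) * lose)
        return best

    return value(horizon, 0, 0, 0, 0)
```

With a uniform prior and horizon 2, the optimal policy pulls an arm, keeps it after a success and switches after a failure. This gives 0.5 * (1 + 2/3) + 0.5 * 0.5 = 13/12.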
# TrIM:畳み込みニューラルネットワークのための三角入力移動シストリックアレイ-その1:データフローと解析モデル TrIM: Triangular Input Movement Systolic Array for Convolutional Neural Networks -- Part I: Dataflow and Analytical Modelling ( http://arxiv.org/abs/2408.01254v1 ) ライセンス: Link先を確認	Cristian Sestito, Shady Agwa, Themis Prodromakis,	(参考訳) 最先端AIモデルの増大し続ける計算複雑性とデータ量に対応するため、新しい計算パラダイムが提案されている。これらのパラダイムは、処理コアとメモリ間のデータ移動のエネルギーコストに関わるフォン・ノイマン・ボトルネックを緩和することで、高いエネルギー効率の達成を目指している。畳み込みニューラルネットワーク(CNN)は、扱うデータ量が膨大であるため、このボトルネックの影響を特に受けやすい。シストリックアレイ(SA)は、処理要素(PE)の配列による高いデータ再利用のおかげで、データ転送コストを軽減できる有望なアーキテクチャである。これらのPEは、特定のデータフロー(重み定常や行定常など)に基づいて局所的にデータを連続的に交換・処理し、その結果メインメモリへのアクセス回数を削減する。 SAのハードウェア特化により、行列乗算から多次元畳み込みまで、さまざまなワークロードに対応できる。本稿では,三角入力移動(Triangular Input Movement)に基づき、CNN計算に適合するSA向けの新しいデータフローTrIMを提案する。重み定常や行定常のような最先端のSAデータフローと比較して、TrIMが提供する高いデータ再利用により、メモリアクセスは約10分の1になる。さらに、PEが乗算と累積を連続的にオーバーラップさせることから、TrIMは必要なレジスタ数が少ない(行定常と比べて最大15.6分の1)うえに、高いスループット(行定常よりも最大81.8%高い)を達成する。 In order to follow the ever-growing computational complexity and data intensity of state-of-the-art AI models, new computing paradigms are being proposed. These paradigms aim at achieving high energy efficiency, by mitigating the Von Neumann bottleneck that relates to the energy cost of moving data between the processing cores and the memory. Convolutional Neural Networks (CNNs) are particularly susceptible to this bottleneck, given the massive data they have to manage. Systolic Arrays (SAs) are promising architectures to mitigate the data transmission cost, thanks to high data utilization carried out by an array of Processing Elements (PEs). These PEs continuously exchange and process data locally based on specific dataflows (like weight stationary and row stationary), in turn reducing the number of memory accesses to the main memory. The hardware specialization of SAs can meet different workloads, ranging from matrix multiplications to multi-dimensional convolutions. 
In this paper, we propose TrIM: a novel dataflow for SAs based on a Triangular Input Movement and compatible with CNN computing. When compared to state-of-the-art SA dataflows, like weight stationary and row stationary, the high data utilization offered by TrIM guarantees ~10x less memory access. Furthermore, considering that PEs continuously overlap multiplications and accumulations, TrIM achieves high throughput (up to 81.8% higher than row stationary), other than requiring a limited number of registers (up to 15.6x fewer registers than row stationary).	翻訳日:2024-08-05 13:27:42 公開日:2024-08-02
# SeCritMass: Threshold Secret Petitions ( http://arxiv.org/abs/2408.01255v1 ) License: see link	Florian Breuer	We introduce the notion of an $n$-threshold secret petition, in which users add encrypted signatures to a petition, and the signatures are decrypted if and only if at least $n$ signatures have been gathered. This solves a coordination problem: users wish to sign a petition or commit to a cause, but do not want to be identified as having signed it before enough others have signed it too. We present an implementation of such a petition based on the ElGamal cryptosystem. Applications include reporting misconduct in situations where complainants hesitate to come forward alone, such as allegations of sexual harassment or police brutality.	Translated: 2024-08-05 13:27:42 Published: 2024-08-02
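To show the building block named above, here is a toy sketch of textbook ElGamal encryption and decryption. It is wrapped in a hypothetical `ToyPetition` class that releases plaintexts only once $n$ ciphertexts exist. It does not reproduce the paper's construction. In this sketch a single key holder enforces the threshold, which is exactly the trust assumption a real threshold scheme would need to remove. The group parameters are assumptions chosen only for illustration: the Mersenne prime $2^{127}-1$ with base 3. They are not secure.

```python
import secrets

# Toy parameters (assumption, NOT secure): Mersenne prime 2**127 - 1 and base 3.
P = 2**127 - 1
G = 3


def keygen():
    """Return (private key x, public key h = G^x mod P)."""
    x = secrets.randbelow(P - 2) + 1  # x in [1, P-2]
    return x, pow(G, x, P)


def encrypt(h, m):
    """Textbook ElGamal: (G^k, m * h^k) mod P, for an integer message 1 <= m < P."""
    k = secrets.randbelow(P - 2) + 1  # fresh randomness per ciphertext
    return pow(G, k, P), (m * pow(h, k, P)) % P


def decrypt(x, c):
    """Recover m = c2 / c1^x mod P (modular inverse needs Python 3.8+)."""
    c1, c2 = c
    s = pow(c1, x, P)  # shared secret, equals h^k
    return (c2 * pow(s, -1, P)) % P


class ToyPetition:
    """Hypothetical petition: a trusted key holder reveals signer IDs only at >= n signatures."""

    def __init__(self, n):
        self.n = n
        self._x, self.public_key = keygen()
        self.ciphertexts = []

    def sign(self, signer_id):
        # Each signature is stored only in encrypted form.
        self.ciphertexts.append(encrypt(self.public_key, signer_id))

    def reveal(self):
        if len(self.ciphertexts) < self.n:
            return None  # below threshold: nothing is decrypted
        return [decrypt(self._x, c) for c in self.ciphertexts]


petition = ToyPetition(n=3)
petition.sign(101)
petition.sign(202)
print(petition.reveal())  # None
petition.sign(303)
print(petition.reveal())  # [101, 202, 303]
```

ElGamal is randomized, so encrypting the same signer ID twice gives different ciphertexts. An observer therefore cannot link repeated encryptions before the threshold is reached.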
# Detection and Characterization of Coordinated Online Behavior: A Survey ( http://arxiv.org/abs/2408.01257v1 ) License: see link	Lorenzo Mannocci, Michele Mazza, Anna Monreale, Maurizio Tesconi, Stefano Cresci	Coordination is a fundamental aspect of life. The advent of social media has made it integral to online human interactions as well, such as those that characterize thriving online communities and social movements. At the same time, coordination is also core to effective disinformation, manipulation, and hate campaigns. This survey collects, categorizes, and critically discusses the body of work produced as a result of the growing interest in coordinated online behavior. We reconcile industry and academic definitions, propose a comprehensive framework to study coordinated online behavior, and review and critically discuss the existing detection and characterization methods. Our analysis identifies open challenges and promising directions of research, serving as a guide for scholars, practitioners, and policymakers in understanding and addressing the complexities inherent to online coordination.	Translated: 2024-08-05 13:27:42 Published: 2024-08-02
# QCD evolution of entanglement entropy ( http://arxiv.org/abs/2408.01259v1 ) License: see link	Martin Hentschinski, Dmitri E. Kharzeev, Krzysztof Kutak, Zhoudunming Tu	Entanglement entropy has emerged as a novel tool for probing nonperturbative quantum chromodynamics (QCD) phenomena, such as color confinement in protons. While recent studies have demonstrated its significant capability in describing hadron production in deep inelastic scattering, the QCD evolution of entanglement entropy remains unexplored. In this work, we investigate the differential rapidity-dependent entanglement entropy within the proton and its connection to final-state hadrons, aiming to elucidate its QCD evolution. Our analysis reveals strong agreement between the rapidity dependence of the von Neumann entropy, obtained from QCD evolution equations, and the corresponding experimental data on hadron entropy. These findings provide compelling evidence for the emergence of a maximally entangled state, offering new insights into the nonperturbative structure of protons.	Translated: 2024-08-05 13:27:42 Published: 2024-08-02
# RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework ( http://arxiv.org/abs/2408.01262v1 ) License: see link	Kunlun Zhu, Yifan Luo, Dingling Xu, Ruobing Wang, Shi Yu, Shuo Wang, Yukun Yan, Zhenghao Liu, Xu Han, Zhiyuan Liu, Maosong Sun	Retrieval-Augmented Generation (RAG) systems have demonstrated their advantages in alleviating the hallucinations of Large Language Models (LLMs). Existing RAG benchmarks mainly focus on evaluating whether LLMs can correctly answer general-knowledge questions. However, they cannot evaluate how effectively a RAG system handles data from different vertical domains. This paper introduces RAGEval, a framework for automatically generating evaluation datasets that assess the knowledge usage ability of different LLMs in different scenarios. Specifically, RAGEval summarizes a schema from seed documents, applies configurations to generate diverse documents, and constructs question-answering pairs according to both the articles and the configurations. We propose three novel metrics, Completeness, Hallucination, and Irrelevance, to carefully evaluate the responses generated by LLMs. By benchmarking RAG models in vertical domains, RAGEval can better evaluate the knowledge usage ability of LLMs. It also avoids a confusion present in existing QA datasets about the source of the knowledge used to answer a question, that is, whether it comes from parameterized memory or from retrieval.	
Translated: 2024-08-05 13:27:42 Published: 2024-08-02
# The virtual CAT: A tool for algorithmic thinking assessment in Swiss compulsory education ( http://arxiv.org/abs/2408.01263v1 ) License: see link	Giorgia Adorni, Alberto Piatti	In today's digital era, having algorithmic thinking (AT) skills is crucial, and not only in computer science-related fields. These abilities enable individuals to break complex problems down into more manageable steps and to create a sequence of actions that solves them. To address the increasing demand for AT assessments in educational settings and the limitations of current methods, this paper introduces the virtual Cross Array Task (CAT). The virtual CAT is a digital adaptation of an unplugged assessment activity designed to evaluate algorithmic skills in Swiss compulsory education. This tool offers scalable and automated assessment, reducing human involvement and mitigating potential data collection errors. The platform features gesture-based and visual block-based programming interfaces, ensuring its usability for diverse learners, and is further supported by multilingual capabilities. To evaluate the virtual CAT platform, we conducted a pilot evaluation in Switzerland involving a heterogeneous group of students. The findings show the platform's usability, proficiency, and suitability for assessing AT skills among students of diverse ages, development stages, and educational backgrounds, as well as the feasibility of large-scale data collection.	
Translated: 2024-08-05 13:27:42 Published: 2024-08-02
# Quantum Delocalization of a Levitated Nanoparticle ( http://arxiv.org/abs/2408.01264v1 ) License: see link	Massimiliano Rossi, Andrei Militaru, Nicola Carlon Zambon, Andreu Riera-Campeny, Oriol Romero-Isart, Martin Frimmer, Lukas Novotny	According to quantum physics, every massive particle behaves like a wave. Yet this characteristic wave nature has only been observed in double-slit experiments with microscopic systems, such as atoms and molecules. The key aspect is that the wavefunction describing the motion of these systems extends coherently over a distance comparable to the slit separation, much larger than the size of the system itself. Preparing such states for more massive and complex objects remains an outstanding challenge. While the motion of solid-state oscillators can now be controlled at the level of single quanta, their coherence length remains comparable to the zero-point motion, limited to subatomic distances. Here, we prepare a delocalized state of a levitated solid-state nanosphere with a coherence length exceeding the zero-point motion. We first cool its motion to the ground state. Then, by modulating the stiffness of the confinement potential, we achieve more than a threefold increase in the initial coherence length with minimal added noise. Optical levitation gives us the necessary control over the confinement that other mechanical platforms lack.
Our work is a stepping stone towards generating delocalization scales comparable to the object size, a crucial regime for macroscopic quantum experiments, and towards quantum-enhanced force sensing with levitated particles.	Translated: 2024-08-05 13:17:55 Published: 2024-08-02
# Impurity-induced counter skin-effect and linear modes in non-Hermitian systems ( http://arxiv.org/abs/2408.01265v1 ) License: see link	Nico G. Leumer, Dario Bercioux	Non-reciprocal lattice systems are among the simplest non-Hermitian systems, exhibiting several key features absent in their Hermitian counterparts. In this study, we investigate the Hatano-Nelson model with an impurity and show how the impurity influences the intrinsic non-Hermitian skin effect of the system. We present an exact analytical solution to the problem under both open and periodic boundary conditions, irrespective of the impurity's position and strength. This exact solution is thoroughly validated by numerical simulations. Our analysis reveals a distinctive phenomenon: a specific impurity strength, determined by the non-reciprocal hopping parameters, induces a unique skin state at the impurity site. This impurity state exhibits a skin effect that counterbalances the boundary-induced skin effect, a phenomenon we term the impurity-induced counter skin-effect. These findings offer new insights into the dynamics of non-Hermitian systems with impurities, elucidating the complex interplay between impurities and the system's non-reciprocal nature. Furthermore, our results suggest potential applications in quantum sensing, where the unique characteristics of the impurity-induced counter skin-effect could be harnessed for enhanced sensitivity and precision.	Translated: 2024-08-05 13:17:55 Published: 2024-08-02
# A General Framework to Boost 3D GS Initialization for Text-to-3D Generation by Lexical Richness ( http://arxiv.org/abs/2408.01269v1 ) License: see link	Lutao Jiang, Hangyu Li, Lin Wang	Text-to-3D content creation has recently received much attention, especially with the prevalence of 3D Gaussian Splatting (3D GS). In general, GS-based methods comprise two key stages: initialization and rendering optimization. For initialization, existing works directly apply random sphere initialization or 3D diffusion models, e.g., Point-E, to derive the initial shapes. However, such strategies suffer from two critical and challenging problems: 1) the final shapes remain similar to the initial ones even after training; 2) shapes can be produced only from simple texts, e.g., "a dog", not from lexically richer texts, e.g., "a dog is sitting on the top of the airplane".
To address these problems, this paper proposes a novel general framework that boosts 3D GS initialization for text-to-3D generation based on lexical richness. Our key idea is to aggregate 3D Gaussians into spatially uniform voxels to represent complex shapes, while enabling spatial interaction among the 3D Gaussians and semantic interaction between the Gaussians and the texts. Specifically, we first construct a voxelized representation in which each voxel holds a 3D Gaussian whose position, scale, and rotation are fixed, with opacity as the sole factor that determines a position's occupancy. We then design an initialization network consisting mainly of two novel components: 1) a Global Information Perception (GIP) block and 2) a Gaussians-Text Fusion (GTF) block. This design enables each 3D Gaussian to assimilate spatial information from other areas and semantic information from the texts. Extensive experiments with lexically simple, medium, and hard texts show that our framework yields higher-quality 3D GS initialization than existing methods, e.g., Shap-E. Our framework can also be seamlessly plugged into SoTA training frameworks, e.g., LucidDreamer, for semantically consistent text-to-3D generation.	Translated: 2024-08-05 13:17:55 Published: 2024-08-02
# Certified Robust Invariant Polytope Training in Neural Controlled ODEs ( http://arxiv.org/abs/2408.01273v1 ) License: see link	Akash Harapanahalli, Samuel Coogan	We consider a nonlinear control system modeled as an ordinary differential equation subject to disturbance, with a state feedback controller parameterized as a feedforward neural network. We propose a framework for training controllers with certified robust forward invariant polytopes, where any trajectory initialized inside the polytope remains within the polytope regardless of the disturbance. First, we parameterize a family of lifted control systems in a higher-dimensional space, where the original neural controlled system evolves on an invariant subspace of each lifted system. We use interval analysis and neural network verifiers to further construct a family of lifted embedding systems, carefully capturing the knowledge of this invariant subspace. If the vector field of any lifted embedding system satisfies a sign constraint at a single point, then a certain convex polytope of the original system is robustly forward invariant. Treating the neural network controller and the lifted system parameters as variables, we propose an algorithm to train controllers with certified forward invariant polytopes in the closed-loop control system.
Through two examples, we demonstrate how the simplicity of the sign constraint allows our approach to scale to systems with over $50$ states and to outperform state-of-the-art Lyapunov-based sampling approaches in runtime.	Translated: 2024-08-05 13:17:55 Published: 2024-08-02
# Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement ( http://arxiv.org/abs/2408.01276v1 ) License: see link	Wenbin Zou, Hongxia Gao, Weipeng Yang, Tongtong Liu	Ultra-high-definition (UHD) technology has attracted widespread attention due to its exceptional visual quality, but it also poses new challenges for low-light image enhancement (LLIE) techniques. UHD images inherently incur high computational complexity, which leads existing UHD LLIE methods to employ high-magnification downsampling to reduce computational costs, in turn causing information loss. The wavelet transform not only allows downsampling without loss of information but also separates the image content from the noise. This enables state space models (SSMs) to avoid being affected by noise when modeling long sequences, thus making full use of their long-sequence modeling capability. On this basis, we propose Wave-Mamba, a novel approach based on two pivotal insights derived from the wavelet domain: 1) most of an image's content information resides in the low-frequency component rather than in the high-frequency component; 2) the high-frequency component has minimal influence on the outcome of low-light enhancement.
Specifically, to efficiently model global content information in UHD images, we propose a low-frequency state space block (LFSSBlock) that adapts SSMs to focus on restoring the information of the low-frequency sub-bands. Moreover, we propose a high-frequency enhance block (HFEBlock) for the high-frequency sub-band information; it uses the enhanced low-frequency information to correct the high-frequency information and effectively restore the correct high-frequency details. In comprehensive evaluations, our method demonstrates superior performance, significantly outperforming current leading techniques while maintaining a more streamlined architecture. The code is available at https://github.com/AlexZou14/Wave-Mamba.	Translated: 2024-08-05 13:17:55 Published: 2024-08-02
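The abstract's point that a wavelet transform "allows downsampling without loss of information" can be illustrated with one level of a 2D Haar transform. It splits an image into four half-resolution sub-bands: low-frequency LL plus three high-frequency detail bands. These sub-bands can be inverted exactly. This sketch assumes an orthonormal Haar basis on a single-channel array with even height and width. The abstract does not specify Wave-Mamba's actual wavelet, number of levels, or implementation.

```python
import numpy as np


def haar2d(x):
    """One level of an orthonormal 2D Haar transform on an (H, W) array, H and W even.

    Returns four half-resolution sub-bands (LL, LH, HL, HH).
    """
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2  # low-frequency approximation
    lh = (a - b + c - d) / 2  # detail from differences along the width
    hl = (a + b - c - d) / 2  # detail from differences along the height
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, lh, hl, hh


def ihaar2d(ll, lh, hl, hh):
    """Exact inverse of haar2d: rebuilds the full-resolution array."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x


img = np.random.default_rng(0).random((4, 6))
bands = haar2d(img)
print([b.shape for b in bands])           # [(2, 3), (2, 3), (2, 3), (2, 3)]
print(np.allclose(ihaar2d(*bands), img))  # True
```

The four sub-bands together hold exactly as many values as the input, and the transform preserves the sum of squares. A model can therefore operate on the quarter-size LL band without discarding what the detail bands carry.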
# Type-II pumping beyond resonance principle: From energetic to geometric rules ( http://arxiv.org/abs/2408.01282v1 ) License: see link	B. Q. Song, J. D. H. Smith, Y. X. Yao, J. Wang	Conventionally, pumping relies on energetic resonance: the energy quantum $\hbar\omega$ matches the gap $\Delta$. Under the linear approximation, this is known as the Fermi golden rule (FGR). However, this principle becomes difficult to apply in the "0/0" limit, where $\omega, \Delta \rightarrow 0$ simultaneously. In "0/0" scenarios, such as a topological phase transition (TPT), we identify a type-II pumping, geometric pumping (GP), which is governed by geometric rules, as distinguished from type-I pumping dictated by the FGR. Type-I pumping features an "arrow of energy", sending particles higher in energy, as reflected in the FGR's dependence on the Fermi-distribution factor $f_v-f_c$ (the occupation probabilities of the valence and conduction bands). GP, by contrast, is non-directional: its probability depends instead on $f_v+f_c-2f_vf_c$, a key signature for detection. In this work, we address: (1) the concept of GP; (2) its features of fractionality, irreversibility, and dependence on the TPT; (3) its experimental detection with ultrafast spectroscopy in coherent phonon driving of ZrTe$_5$.	Translated: 2024-08-05 13:17:55 Published: 2024-08-02
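The two occupation factors quoted above can be rewritten to show why one is directional and the other is not. The reading here is ours, an elementary algebraic identity and not a derivation taken from the paper. Take $f_v(1-f_c)$ as the weight of an occupied valence state paired with an empty conduction state, i.e. an upward transition, and $f_c(1-f_v)$ as the reverse. The FGR factor is then their difference, and the GP factor is their sum:

```latex
f_v - f_c = f_v(1-f_c) - f_c(1-f_v), \qquad
f_v + f_c - 2 f_v f_c = f_v(1-f_c) + f_c(1-f_v).
```

The second expression is symmetric under $f_v \leftrightarrow f_c$, which is consistent with GP being non-directional. Two examples make the contrast concrete. At full inversion ($f_v = 0$, $f_c = 1$), the FGR factor is $-1$ while the GP factor is $1$. At equal occupation ($f_v = f_c = 1/2$), the FGR factor vanishes but the GP factor equals $1/2$.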
# A Tiny Supervised ODL Core with Auto Data Pruning for Human Activity Recognition ( http://arxiv.org/abs/2408.01283v1 ) License: see link	Hiroki Matsutani, Radu Marculescu	In this paper, we introduce a low-cost, low-power, tiny supervised on-device learning (ODL) core that can address the distributional shift of input data for human activity recognition. Although ODL for resource-limited edge devices has been studied recently, how exactly to provide training labels to these devices at runtime remains an open issue. To address this problem, we propose combining automatic data pruning with supervised ODL. Pruning reduces the number of queries needed to acquire predicted labels from a nearby teacher device, and thus saves power during model retraining. The data pruning threshold is tuned automatically, eliminating manual threshold tuning. As a tinyML solution consuming a few mW for human activity recognition, we design a supervised ODL core that supports our automatic data pruning, using a 45nm CMOS process technology. We show that the memory required by the core is smaller than that of a same-shaped multilayer perceptron (MLP) and that its power consumption is only 3.39mW. Experiments on a human activity recognition dataset show that the proposed automatic data pruning reduces the communication volume by 55.7%, and the power consumption correspondingly, with only 0.9% accuracy loss.	Translated: 2024-08-05 13:17:55 Published: 2024-08-02
# Out-Of-Distribution Detection for Audio-visual Generalized Zero-Shot Learning: A General Framework ( http://arxiv.org/abs/2408.01284v1 ) License: see link	Liuyuan Wen	Generalized Zero-Shot Learning (GZSL) is a challenging task requiring accurate classification of both seen and unseen classes. Within this domain, audio-visual GZSL emerges as an extremely exciting yet difficult task, given the inclusion of both visual and acoustic features as multi-modal inputs. Existing efforts in this field mostly use either embedding-based or generative methods. However, generative training is difficult and unstable, while embedding-based methods often encounter the domain shift problem. We therefore find it promising to integrate both methods into a unified framework that leverages their advantages while mitigating their respective disadvantages. Our study introduces a general framework employing out-of-distribution (OOD) detection, aiming to harness the strengths of both approaches. We first employ generative adversarial networks to synthesize unseen features, enabling the training of an OOD detector alongside classifiers for seen and unseen classes. This detector determines whether a test feature belongs to seen or unseen classes, and the feature is then classified by the separate classifier for that feature type.
We test our framework on three popular audio-visual datasets and observe a significant improvement compared to existing state-of-the-art works. Code is available at https://github.com/liuyuan-wen/AV-OOD-GZSL.	Translated: 2024-08-05 13:17:55 Published: 2024-08-02
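The inference step described above can be sketched as a simple routing function. An OOD detector first decides whether a test feature comes from a seen or an unseen class, and the matching classifier then labels it. In this minimal sketch everything is a hypothetical placeholder: the score function, the `threshold` value, and the two classifiers. The paper's GAN-based feature synthesis and detector training are not shown.

```python
from typing import Callable, Sequence


def route_and_classify(
    feature: Sequence[float],
    ood_score: Callable[[Sequence[float]], float],
    seen_classifier: Callable[[Sequence[float]], str],
    unseen_classifier: Callable[[Sequence[float]], str],
    threshold: float = 0.5,
) -> str:
    """Send a test feature to the seen- or unseen-class classifier.

    A feature whose OOD score exceeds `threshold` is treated as coming from an
    unseen class; otherwise it is classified among the seen classes.
    """
    if ood_score(feature) > threshold:
        return unseen_classifier(feature)
    return seen_classifier(feature)


# Hypothetical stand-ins for trained models, for illustration only.
def first_component_score(feature):
    return feature[0]


def seen_model(feature):
    return "dog"


def unseen_model(feature):
    return "zebra"


print(route_and_classify([0.9, 0.1], first_component_score, seen_model, unseen_model))  # zebra
print(route_and_classify([0.2, 0.1], first_component_score, seen_model, unseen_model))  # dog
```

Each classifier only has to discriminate within its own label set, which is one motivation for separating the two stages. The trade-off is that detector errors propagate directly into the final label.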
# The Mismeasure of Man and Models: Evaluating Allocational Harms in Large Language Models ( http://arxiv.org/abs/2408.01285v1 ) License: see link	Hannah Chen, Yangfeng Ji, David Evans	Large language models (LLMs) are now being considered, and even deployed, for applications that support high-stakes decision-making, such as recruitment and clinical decisions. While several methods have been proposed for measuring bias, there remains a gap between predictions, which are what the proposed methods consider, and how those predictions are used to make decisions. In this work, we introduce the Rank-Allocational-Based Bias Index (RABBI), a model-agnostic bias measure that assesses potential allocational harms arising from biases in LLM predictions. We compare RABBI and current bias metrics on two allocation decision tasks, evaluating their predictive validity across ten LLMs and their utility for model selection. Our results reveal that commonly used bias metrics based on average performance gap and distribution distance fail to reliably capture group disparities in allocation outcomes, whereas RABBI exhibits a strong correlation with allocation disparities. Our work highlights the need to account for how models are used in resource-constrained contexts.	Translated: 2024-08-05 13:17:55 Published: 2024-08-02
# Deep Learning based Visually Rich Document Content Understanding: A Survey ( http://arxiv.org/abs/2408.01287v1 ) License: see link	Yihao Ding, Jean Lee, Soyeon Caren Han	Visually Rich Documents (VRDs) are essential in academia, finance, medical fields, and marketing due to their multimodal information content. Traditional methods for extracting information from VRDs depend on expert knowledge and manual labor, making them costly and inefficient. The advent of deep learning has revolutionized this process. It has introduced models that leverage multimodal information (vision, text, and layout) along with pretraining tasks to develop comprehensive document representations. These models have achieved state-of-the-art performance across various downstream tasks, significantly enhancing the efficiency and accuracy of information extraction from VRDs. In response to the growing demands and rapid developments in Visually Rich Document Understanding (VRDU), this paper provides a comprehensive review of deep learning-based VRDU frameworks. We systematically survey and analyze existing methods and benchmark datasets, categorizing them by their adopted strategies and downstream tasks.
Furthermore, we compare different techniques used in VRDU models, focusing on feature representation and fusion, model architecture, and pretraining methods, while highlighting their strengths, limitations, and appropriate scenarios. Finally, we identify emerging trends and challenges in VRDU, offering insights into future research directions and practical applications. This survey aims to provide a thorough understanding of VRDU advancements, benefiting both academic and industrial sectors.	Translated: 2024-08-05 13:17:55 Published: 2024-08-02
# TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling ( http://arxiv.org/abs/2408.01291v1 ) License: see link	Dong Huo, Zixin Guo, Xinxin Zuo, Zhihao Shi, Juwei Lu, Peng Dai, Songcen Xu, Li Cheng, Yee-Hong Yang	Given a 3D mesh, we aim to synthesize 3D textures that correspond to arbitrary textual descriptions. Current methods for generating and assembling textures from sampled views often result in prominent seams or excessive smoothing. To tackle these issues, we present TexGen, a novel multi-view sampling and resampling framework for texture generation that leverages a pre-trained text-to-image diffusion model. For view-consistent sampling, we first maintain a texture map in RGB space. This map is parameterized by the denoising step and updated after each sampling step of the diffusion model, progressively reducing the view discrepancy. An attention-guided multi-view sampling strategy is exploited to broadcast appearance information across views. To preserve texture details, we develop a noise resampling technique that aids in noise estimation. It generates the inputs for subsequent denoising steps, as directed by the text prompt and the current texture map.
Extensive qualitative and quantitative evaluations demonstrate that our proposed method produces significantly better texture quality for diverse 3D objects, with a high degree of view consistency and rich appearance details, outperforming current state-of-the-art methods. Furthermore, our texture generation technique can also be applied to texture editing while preserving the original identity. More experimental results are available at https://dong-huo.github.io/TexGen/	Translated: 2024-08-05 13:17:55 Published: 2024-08-02
# 3DPX: Progressive 2D-to-3D Oral Image Reconstruction with Hybrid MLP-CNN Networks ( http://arxiv.org/abs/2408.01292v1 ) License: see link	Xiaoshuang Li, Mingyuan Meng, Zimo Huang, Lei Bi, Eduardo Delamare, Dagan Feng, Bin Sheng, Jinman Kim	Panoramic X-ray (PX) is a prevalent modality in dental practice owing to its wide availability and low cost. However, as a 2D projection image, PX does not contain 3D anatomical information. It therefore has limited use in dental applications that can benefit from 3D information, e.g., tooth angular misalignment detection and classification. Reconstructing 3D structures directly from 2D PX has recently been explored to address this limitation, with existing methods relying primarily on Convolutional Neural Networks (CNNs) for direct 2D-to-3D mapping. These methods, however, are unable to correctly infer depth-axis spatial information. In addition, they are limited by the intrinsic locality of convolution operations, as the convolution kernels only capture the information of immediately neighboring pixels. In this study, we propose a progressive hybrid Multilayer Perceptron (MLP)-CNN pyramid network (3DPX) for 2D-to-3D oral PX reconstruction.
We introduce a progressive reconstruction strategy, in which 3D images are progressively reconstructed in the 3DPX with guidance imposed on the intermediate reconstruction result at each pyramid level. Further, motivated by the recent advancement of MLPs, which show promise in capturing fine-grained long-range dependency, our 3DPX integrates MLPs and CNNs to improve semantic understanding during reconstruction. Extensive experiments on two large datasets involving 464 studies demonstrate that our 3DPX outperforms state-of-the-art 2D-to-3D oral reconstruction methods, including standalone MLP and transformers, in reconstruction quality, and also improves the performance of downstream angular misalignment classification tasks.	Translated: 2024-08-05 13:17:55 Published: 2024-08-02
# Underwater Object Detection Enhancement via Channel Stabilization ( http://arxiv.org/abs/2408.01293v1 ) License: see link	Muhammad Ali, Salman Khan	The complex marine environment compounds the challenges of object detection manyfold. Marine trash endangers the aquatic ecosystem, presenting a persistent challenge. Accurate detection of marine deposits is crucial for mitigating this harm. Our work addresses underwater object detection by enhancing image quality and evaluating detection methods. We use Detectron2's backbone with various base models and configurations for this task. We propose a novel channel stabilization technique alongside a simplified image enhancement model to reduce haze and color cast in training images, improving multi-scale object detection. Following image processing, we test different Detectron2 backbones for optimal detection accuracy. Additionally, we apply a sharpening filter with augmentation techniques to highlight object profiles for easier recognition. Results are demonstrated on both the instance and material versions of the TrashCan dataset. The best-performing backbone method incorporates our channel stabilization and augmentation techniques. We also compare our Detectron2 detection results with the Deformable Transformer.
In the instance version of TrashCan 1.0, our method achieves a 9.53% absolute increase in average precision for small objects and a 7% absolute gain in bounding box detection compared to the baseline. The code will be available at https://github.com/aliman80/Underwater-Object-Detection-via-Channel-Stablization	Translated: 2024-08-05 13:17:55 Published: 2024-08-02
# Feature Clock: High-Dimensional Effects in Two-Dimensional Plots ( http://arxiv.org/abs/2408.01294v1 ) License: see link	Olga Ovcharenko, Rita Sevastjanova, Valentina Boeva	Humans struggle to perceive and interpret high-dimensional data. Therefore, high-dimensional data are often projected into two dimensions for visualization. Many applications benefit from complex nonlinear dimensionality reduction techniques, but the effects of individual high-dimensional features are hard to explain in the two-dimensional space. Most visualization solutions use multiple two-dimensional plots, each showing the effect of one high-dimensional feature in two dimensions. For a k-dimensional input space, this approach requires visually inspecting k plots. Our solution, Feature Clock, provides a novel approach that eliminates the need to inspect these k plots to grasp the influence of original features on the data structure depicted in two dimensions. Feature Clock enhances the explainability and compactness of visualizations of embedded data and is available in an open-source Python library.	Translated: 2024-08-05 13:17:55 Published: 2024-08-02
# Optimal Mixed Integer Linear Optimization Trained Multivariate Classification Trees ( http://arxiv.org/abs/2408.01297v1 ) License: see link	Brandon Alston, Illya V. Hicks	Multivariate decision trees are powerful machine learning tools for classification and regression that attract many researchers and industry professionals. An optimal binary tree has two types of vertices: (i) branching vertices, which have exactly two children and at which datapoints are assessed on a set of discrete features, and (ii) leaf vertices, at which datapoints are given a prediction. Such a tree can be obtained by solving a biobjective optimization problem that seeks to (i) maximize the number of correctly classified datapoints and (ii) minimize the number of branching vertices. Branching vertices are linear combinations of training features and can therefore be thought of as hyperplanes. In this paper, we propose two cut-based mixed integer linear optimization (MILO) formulations for designing optimal binary classification trees (leaf vertices assign discrete classes). Our models leverage on-the-fly identification of minimal infeasible subsystems (MISs), from which we derive cutting planes that take the form of packing constraints.
We show theoretical improvements on the strongest flow-based MILO formulation currently in the literature. We also conduct experiments on publicly available datasets that show our models' ability to scale, their strength against traditional branch-and-bound approaches, and their robustness in out-of-sample test performance. Our code and data are available on GitHub.	Translated: 2024-08-05 13:17:55 Published: 2024-08-02
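The following Python sketch makes the two objectives concrete by scoring a given multivariate tree. Each branching vertex routes a datapoint with a hyperplane test $w^\top x \le b$, and each leaf returns a class. The node layout (a dict of `"branch"`/`"leaf"` tuples) and the toy data are hypothetical. The sketch only evaluates a fixed tree. It does not implement the paper's MILO formulations or its MIS-based cutting planes.

```python
import numpy as np

# Hypothetical layout: node id -> ("branch", w, b, left_id, right_id) or ("leaf", label)
def predict(tree, x, node=0):
    """Route x through hyperplane tests w @ x <= b until a leaf is reached."""
    while tree[node][0] == "branch":
        _, w, b, left, right = tree[node]
        node = left if np.dot(w, x) <= b else right
    return tree[node][1]

def objectives(tree, X, y):
    """Return (number correctly classified, number of branching vertices)."""
    correct = sum(int(predict(tree, xi) == yi) for xi, yi in zip(X, y))
    n_branch = sum(1 for node in tree.values() if node[0] == "branch")
    return correct, n_branch

# Toy tree with a single oblique split x1 + x2 <= 1
tree = {
    0: ("branch", np.array([1.0, 1.0]), 1.0, 1, 2),
    1: ("leaf", 0),
    2: ("leaf", 1),
}
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.2, 0.3], [2.0, 0.0]])
y = [0, 1, 0, 1]
print(objectives(tree, X, y))  # (4, 1)
```

A MILO model searches over `w`, `b`, and the tree shape to trade off the first component of this pair against the second. In this sketch, both are simply read off a fixed tree.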
# Complete Self-Testing of a System of Remote Superconducting Qubits ( http://arxiv.org/abs/2408.01299v1 ) License: see link	Simon Storz, Anatoly Kulikov, Josua D. Schär, Victor Barizien, Xavier Valcarce, Florence Berterottière, Nicolas Sangouard, Jean-Daniel Bancal, Andreas Wallraff	Self-testing protocols enable the certification of quantum systems in a device-independent manner, i.e., without knowledge of the inner workings of the quantum devices under test. Here, we demonstrate this high standard for characterization routines with superconducting circuits, a prime platform for building large-scale quantum computing systems. We first develop the missing theory allowing for the self-testing of Pauli measurements. We then self-test Bell pair generation and measurements at the same time, performing a complete self-test in a system composed of two entangled superconducting circuits operated at a separation of 30 meters. In an experiment based on 17 million trials, we measure an average CHSH (Clauser-Horne-Shimony-Holt) S-value of 2.236. Without relying on additional assumptions about the experimental setup, we certify an average Bell state fidelity of at least 58.9% and an average measurement fidelity of at least 89.5% in a device-independent manner, both with 99% confidence. This enables applications in the field of distributed quantum computing and communication with superconducting circuits, such as delegated quantum computing.	Translated: 2024-08-05 13:07:59 Published: 2024-08-02
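The CHSH value combines four correlators: $S = E(a_0,b_0)+E(a_0,b_1)+E(a_1,b_0)-E(a_1,b_1)$. Local hidden-variable models satisfy $|S|\le 2$. Quantum mechanics allows values up to $2\sqrt{2}\approx 2.828$ (Tsirelson's bound), so the measured 2.236 lies between these two limits. The NumPy sketch below computes $S$ for an ideal $|\Phi^+\rangle$ state with measurements in the X-Z plane. The angles are the textbook optimal settings and are not the settings used in the experiment.

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=float)
X = np.array([[0, 1], [1, 0]], dtype=float)
phi_plus = np.array([1, 0, 0, 1], dtype=float) / np.sqrt(2)  # (|00> + |11>) / sqrt(2)

def obs(theta):
    """Spin observable in the X-Z plane at angle theta (eigenvalues +1/-1)."""
    return np.cos(theta) * Z + np.sin(theta) * X

def correlator(state, a, b):
    """E(a, b) = <state| A(a) (x) B(b) |state> for a real state vector."""
    return float(state @ np.kron(obs(a), obs(b)) @ state)

def chsh(state, a0, a1, b0, b1):
    return (correlator(state, a0, b0) + correlator(state, a0, b1)
            + correlator(state, a1, b0) - correlator(state, a1, b1))

S = chsh(phi_plus, 0.0, np.pi / 2, np.pi / 4, -np.pi / 4)
print(round(S, 4))  # 2.8284, i.e. 2*sqrt(2)
```

For this state, $E(a,b)=\cos(a-b)$. Any $S$ above 2 rules out a local model, which is what device-independent certification builds on.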
# Assessing Robustness of Machine Learning Models using Covariate Perturbations ( http://arxiv.org/abs/2408.01300v1 ) License: see link	Arun Prakash R, Anwesha Bhattacharyya, Joel Vaughan, Vijayan N. Nair	As machine learning models become increasingly prevalent in critical decision-making models and systems in fields such as finance and healthcare, ensuring their robustness against adversarial attacks and changes in the input data is paramount, especially in cases where models potentially overfit. This paper proposes a comprehensive framework for assessing the robustness of machine learning models through covariate perturbation techniques. We explore various perturbation strategies to assess robustness and examine their impact on model predictions. These include separate strategies for numeric and non-numeric variables, summaries of perturbations to assess and compare model robustness across different scenarios, and local robustness diagnosis to identify any regions in the data where a model is particularly unstable. Through empirical studies on real-world data, we demonstrate the effectiveness of our approach in comparing robustness across models, identifying instabilities in the model, and enhancing model robustness.	Translated: 2024-08-05 13:07:59 Published: 2024-08-02
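The NumPy sketch below shows one simple reading of "separate strategies for numeric and non-numeric variables". Numeric columns receive Gaussian noise scaled to each column's standard deviation. In categorical columns, a random fraction of entries is resampled uniformly from the observed levels. Robustness is then summarized as the mean absolute change in predictions. The noise scale, swap probability, and score are assumptions made for illustration and are not the paper's framework.

```python
import numpy as np

def perturb(X_num, X_cat, rng, scale=0.1, swap_prob=0.1):
    """Gaussian noise for numeric columns; random level resampling for categorical ones."""
    X_num_p = X_num + rng.normal(0.0, scale * X_num.std(axis=0), size=X_num.shape)
    X_cat_p = X_cat.copy()
    for j in range(X_cat.shape[1]):
        levels = np.unique(X_cat[:, j])
        mask = rng.random(X_cat.shape[0]) < swap_prob
        X_cat_p[mask, j] = rng.choice(levels, size=int(mask.sum()))
    return X_num_p, X_cat_p

def robustness_score(predict, X_num, X_cat, n_rep=20, seed=0, **kw):
    """Mean absolute prediction change over n_rep perturbation draws (lower = more robust)."""
    rng = np.random.default_rng(seed)
    base = predict(X_num, X_cat)
    diffs = [np.abs(predict(*perturb(X_num, X_cat, rng, **kw)) - base).mean()
             for _ in range(n_rep)]
    return float(np.mean(diffs))

# Toy data and two toy "models" that differ only in sensitivity to the first feature
rng0 = np.random.default_rng(42)
X_num = rng0.normal(size=(200, 2))
X_cat = rng0.choice(np.array(["a", "b", "c"]), size=(200, 1))

def gentle(Xn, Xc):
    return 1.0 * Xn[:, 0] + (Xc[:, 0] == "a")

def steep(Xn, Xc):
    return 10.0 * Xn[:, 0] + (Xc[:, 0] == "a")

print(robustness_score(gentle, X_num, X_cat), robustness_score(steep, X_num, X_cat))
```

Computing the same score separately on subsets of rows would give a crude local diagnosis of where a model is least stable.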
# A Decision-driven Methodology for Designing Uncertainty-aware AI Self-Assessment ( http://arxiv.org/abs/2408.01301v1 ) License: see link	Gregory Canal, Vladimir Leung, Philip Sage, Eric Heim, I-Jeng Wang	Artificial intelligence (AI) has revolutionized decision-making processes and systems throughout society and, in particular, has emerged as a significant technology in high-impact scenarios of national interest. Yet, despite AI's impressive predictive capabilities in controlled settings, it still suffers from a range of practical setbacks preventing its widespread use in various critical scenarios. In particular, it is generally unclear whether a given AI system's predictions can be trusted by decision-makers in downstream applications. To address the need for more transparent, robust, and trustworthy AI systems, a suite of tools has been developed to quantify the uncertainty of AI predictions and, more generally, enable AI to "self-assess" the reliability of its predictions. In this manuscript, we categorize methods for AI self-assessment along several key dimensions and provide guidelines for selecting and designing the appropriate method for a practitioner's needs.
In particular, we focus on uncertainty estimation techniques that consider the impact of self-assessment on the choices made by downstream decision-makers and on the resulting costs and benefits of decision outcomes. To demonstrate the utility of our methodology for self-assessment design, we illustrate its use for two realistic national-interest scenarios. This manuscript is a practical guide for machine learning engineers and AI system users to select the ideal self-assessment techniques for each problem.	Translated: 2024-08-05 13:07:59 Published: 2024-08-02
# A Systematic Mapping Study on SDN Controllers for Enhancing Security in IoT Networks ( http://arxiv.org/abs/2408.01303v1 ) License: see link	Charles Oredola, Adnan Ashraf	Context: The increase in Internet of Things (IoT) devices gives rise to an increase in deceptive manipulations by malicious actors. These actors should be prevented from targeting IoT networks. Cybersecurity threats have evolved and become dynamically sophisticated, such that they could exploit any vulnerability found in IoT networks. However, with the introduction of the Software Defined Network (SDN) into IoT networks as the central monitoring unit, IoT networks are less vulnerable and less prone to threats. Objective: To present a comprehensive and unbiased overview of the state of the art on IoT network security enhancement using SDN controllers. Method: We review the current body of knowledge on enhancing the security of IoT networks using SDN with a Systematic Mapping Study (SMS) following established guidelines. Results: The SMS comprises 33 primary studies analyzed against four major research questions. The SMS highlights current research trends and identifies gaps in SDN-IoT network security.
Conclusion: We conclude that the SDN controller architecture commonly used for securing IoT networks is the centralized controller architecture. However, this architecture is not without its limitations. Additionally, the predominant technique utilized for risk mitigation is machine learning.	Translated: 2024-08-05 13:07:59 Published: 2024-08-02
# Analyzing Quantum Circuit Depth Reduction with Ancilla Qubits in MCX Gates ( http://arxiv.org/abs/2408.01304v1 ) License: see link	Ahmad Bennakhi, Paul Franzon, Gregory T. Byrd	This paper aims to give readers a high-level overview of the different MCX depth reduction techniques that utilize ancilla qubits. We also present a brief analysis of how they would perform under different quantum topological settings. The techniques examined are recursion and v-chain, as they are the most commonly used techniques in Qiskit, the most popular quantum computing library. The target audience of this paper is people who do not have intricate mathematical or physics knowledge related to quantum computing.	Translated: 2024-08-05 13:07:59 Published: 2024-08-02
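The v-chain idea can be checked classically, because a multi-controlled X acting on computational basis states is a reversible Boolean gate. The Python sketch below builds the textbook compute, apply, uncompute ladder of Toffoli gates for $n \ge 3$ controls with $n-2$ clean ancillas, which uses $2n-3$ Toffolis. It then verifies, on every basis input, that the target flips exactly when all controls are 1 and that the ancillas return to 0. The wire layout and function names are assumptions. Qiskit's own v-chain construction may differ in detail, for example in how each Toffoli is decomposed.

```python
from itertools import product

def v_chain(controls, ancillas, target):
    """Toffoli triples (c1, c2, t) for an n-control MCX using n-2 clean ancillas."""
    n = len(controls)
    assert n >= 3 and len(ancillas) == n - 2
    # Compute: ancillas[k] ends up holding the AND of controls[0..k+1]
    compute = [(controls[0], controls[1], ancillas[0])]
    for i in range(2, n - 1):
        compute.append((controls[i], ancillas[i - 2], ancillas[i - 1]))
    # Flip the target with the last control AND the accumulated ancilla
    flip = [(controls[n - 1], ancillas[n - 3], target)]
    # Uncompute in reverse order so every ancilla returns to 0
    return compute + flip + compute[::-1]

def run(gates, bits):
    """Simulate Toffoli gates on a classical bit list."""
    bits = list(bits)
    for c1, c2, t in gates:
        bits[t] ^= bits[c1] & bits[c2]
    return bits

def check(n):
    controls, ancillas, target = list(range(n)), list(range(n, 2 * n - 2)), 2 * n - 2
    gates = v_chain(controls, ancillas, target)
    for ctrl_bits in product([0, 1], repeat=n):
        for t in (0, 1):
            out = run(gates, list(ctrl_bits) + [0] * (n - 2) + [t])
            if (out[:n] != list(ctrl_bits) or any(out[n:2 * n - 2])
                    or out[target] != t ^ int(all(ctrl_bits))):
                return False
    return True

print([check(n) for n in range(3, 7)], len(v_chain(list(range(5)), list(range(5, 8)), 8)))
# [True, True, True, True] 7
```

The ladder is sequential, so its depth grows linearly with the number of controls. How that cost changes under restricted qubit connectivity is the kind of topology question the paper discusses.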
# Decentralized Smoothing ADMM for Quantile Regression with Non-Convex Sparse Penalties ( http://arxiv.org/abs/2408.01307v1 ) License: see link	Reza Mirzaeifard, Diyako Ghaderyan, Stefan Werner	In the rapidly evolving internet-of-things (IoT) ecosystem, effective data analysis techniques are crucial for handling distributed data generated by sensors. Existing methods have limitations: for example, the sub-gradient approach fails to distinguish effectively between active and non-active coefficients. To address this, this paper introduces the decentralized smoothing alternating direction method of multipliers (DSAD) for penalized quantile regression. Our method leverages non-convex sparse penalties such as the minimax concave penalty (MCP) and the smoothly clipped absolute deviation (SCAD), improving the identification and retention of significant predictors. DSAD incorporates a total variation norm within a smoothing ADMM framework, achieving consensus among distributed nodes and ensuring uniform model performance across disparate data sources. This approach overcomes traditional convergence challenges associated with non-convex penalties in decentralized settings. We present theoretical proofs and extensive simulation results to validate the effectiveness of DSAD, demonstrating its superiority in achieving reliable convergence and enhancing estimation accuracy compared with prior methods.	Translated: 2024-08-05 13:07:59 Published: 2024-08-02
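The quantile (pinball) loss and the two non-convex penalties named above have standard closed forms, and the NumPy sketch below evaluates them. The defaults $\gamma = 3$ for MCP and $a = 3.7$ for SCAD are common choices in the literature, not values taken from this paper. The smoothing step that DSAD applies is not reproduced here. Unlike the $\ell_1$ penalty, both MCP and SCAD become constant for large coefficients, so they shrink large, significant coefficients less.

```python
import numpy as np

def pinball(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1[u < 0])."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def mcp(t, lam, gamma=3.0):
    """Minimax concave penalty; constant gamma*lam**2/2 once |t| > gamma*lam."""
    a = np.abs(np.asarray(t, dtype=float))
    return np.where(a <= gamma * lam, lam * a - a**2 / (2 * gamma), 0.5 * gamma * lam**2)

def scad(t, lam, a=3.7):
    """Smoothly clipped absolute deviation penalty (requires a > 2)."""
    x = np.abs(np.asarray(t, dtype=float))
    middle = (2 * a * lam * x - x**2 - lam**2) / (2 * (a - 1))
    return np.where(x <= lam, lam * x,
                    np.where(x <= a * lam, middle, 0.5 * lam**2 * (a + 1)))

coefs = np.array([0.0, 0.5, 1.0, 5.0, 50.0])
print(pinball([-1.0, 2.0], tau=0.9))
print(mcp(coefs, lam=1.0))
print(scad(coefs, lam=1.0))
print(np.abs(coefs))  # the l1 penalty keeps growing with |t|
```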
# Reconsidering Token Embeddings with the Definitions for Pre-trained Language Models ( http://arxiv.org/abs/2408.01308v1 ) License: see link	Ying Zhang, Dongyuan Li, Manabu Okumura	Learning token embeddings based on token co-occurrence statistics has proven effective for both pre-training and fine-tuning in natural language processing. However, recent studies have pointed out that the distribution of learned embeddings degenerates into anisotropy. They also show that even pre-trained language models (PLMs) suffer from a loss of semantics-related information in the embeddings of low-frequency tokens. This study first analyzes the fine-tuning dynamics of a PLM, BART-large, and demonstrates its robustness against degeneration. On the basis of this finding, we propose DefinitionEMB, a method that uses definitions to construct isotropically distributed and semantics-related token embeddings for PLMs while maintaining their original robustness during fine-tuning. Our experiments demonstrate the effectiveness of leveraging definitions from Wiktionary to construct such embeddings for RoBERTa-base and BART-large. Furthermore, the constructed embeddings for low-frequency tokens improve the performance of these models across various GLUE tasks and four text summarization datasets.	Translated: 2024-08-05 13:07:59 Published: 2024-08-02
# PsybORG+: Cognitive Modeling for Triggering and Detection of Cognitive Biases of Advanced Persistent Threats ( http://arxiv.org/abs/2408.01310v1 ) License: see link	Shuo Huang, Quanyan Zhu	Advanced Persistent Threats (APTs) pose a significant challenge to cybersecurity due to their sophisticated and stealthy nature. Traditional cybersecurity measures fail to defend against APTs. Cognitive vulnerabilities can significantly influence attackers' decision-making processes, which presents an opportunity for defenders to exploit these weaknesses. This paper introduces PsybORG, a multi-agent cybersecurity simulation environment designed to model APT behaviors influenced by cognitive vulnerabilities. PsybORG uses a Hidden Markov Model (HMM) to simulate attacker behaviors. We infer cognitive vulnerabilities using Bayesian inference and decision tree analysis of action sequences. In addition, we build a system called PsybORG+ for generating synthetic data. We also design a trigger to stimulate the sunk cost fallacy in attackers. Our contributions include the mathematical modeling of APTs, the development of PsybORG, and the implementation of techniques to infer attackers' cognitive vulnerabilities.	Translated: 2024-08-05 13:07:59 Published: 2024-08-02
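The NumPy sketch below shows how Bayesian inference over an HMM turns an attacker's action sequence into a belief about a hidden state. It uses the normalized forward algorithm, which returns the filtering distribution over hidden states after each action. The two hidden states, three action types, and all probabilities are hypothetical placeholders, not PsybORG's actual model or parameters.

```python
import numpy as np

def forward_filter(pi, A, B, obs):
    """Normalised HMM forward pass: row t is P(hidden state | observations up to t).

    pi: initial state probabilities, shape (S,)
    A:  transitions, A[i, j] = P(next = j | current = i), shape (S, S)
    B:  emissions, B[i, k] = P(action k | state i), shape (S, K)
    """
    alpha = pi * B[:, obs[0]]
    alpha = alpha / alpha.sum()
    history = [alpha]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # predict one step, then weight by the new action
        alpha = alpha / alpha.sum()
        history.append(alpha)
    return np.array(history)

# Hypothetical two-state attacker model with three action types
pi = np.array([0.8, 0.2])
A = np.array([[0.9, 0.1],
              [0.05, 0.95]])
B = np.array([[0.6, 0.3, 0.1],
              [0.1, 0.3, 0.6]])
actions = [0, 1, 2, 2, 2, 2]
print(forward_filter(pi, A, B, actions).round(3))
```

Sampling states from `A` and actions from `B` would turn the same parameters into a simple behavior simulator, which is the other role an HMM plays in the abstract.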
# TopoNAS: Boosting Search Efficiency of Gradient-based NAS via Topological Simplification ( http://arxiv.org/abs/2408.01311v1 ) License: see link	Danpei Zhao, Zhuoran Liu, Bo Yuan	Improving search efficiency is one of the crucial objectives of Neural Architecture Search (NAS). However, many current approaches ignore the universality of the search strategy and fail to reduce computational redundancy during the search process, especially in one-shot NAS architectures. In addition, current NAS methods show invalid reparameterization in non-linear search spaces, leading to poor efficiency in common search spaces such as DARTS. In this paper, we propose TopoNAS, a model-agnostic approach for gradient-based one-shot NAS that significantly reduces search time and memory usage by topologically simplifying searchable paths. First, we model the non-linearity in search spaces to reveal the parameterization difficulties. To improve search efficiency, we present a topological simplification method and iteratively apply module-sharing strategies to simplify the topological structure of searchable paths. In addition, we propose a kernel normalization technique to preserve search accuracy. Experimental results on the NASBench201 benchmark with various search spaces demonstrate the effectiveness of our method.
These results show that TopoNAS improves the search efficiency of various architectures while maintaining a high level of accuracy. The project page is available at https://xdedss.github.io/topo_simplification.	Translated: 2024-08-05 13:07:59 Published: 2024-08-02
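Gradient-based one-shot NAS builds on the DARTS-style continuous relaxation. This relaxation replaces the discrete choice of operation on an edge with a softmax-weighted mixture of all candidates: $\bar{o}(x)=\sum_o \frac{\exp(\alpha_o)}{\sum_{o'}\exp(\alpha_{o'})}\, o(x)$. Because every candidate is evaluated on every forward pass, compute and memory grow with the number of candidate paths, and this is the cost that path simplification targets. The NumPy sketch below illustrates only the relaxation. It uses a hypothetical set of 1-D candidate operations and is not TopoNAS itself.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

# Hypothetical candidate operations on a 1-D feature vector
CANDIDATE_OPS = {
    "zero": lambda x: np.zeros_like(x),
    "identity": lambda x: x,
    "relu": lambda x: np.maximum(x, 0.0),
    "avg3": lambda x: np.convolve(x, np.ones(3) / 3.0, mode="same"),
}

def mixed_op(x, alpha):
    """Softmax-weighted sum of ALL candidate ops (every op is evaluated)."""
    weights = softmax(alpha)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))

def derive(alpha):
    """After the search, keep only the highest-weighted operation."""
    return list(CANDIDATE_OPS)[int(np.argmax(alpha))]

x = np.array([-1.0, 2.0, 3.0, -4.0])
alpha = np.array([0.1, 1.5, 0.3, -0.2])  # architecture parameters, learned by gradient descent in practice
print(mixed_op(x, alpha), derive(alpha))
```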
# Optimal limits of continuously monitored thermometers and their Hamiltonian structure ( http://arxiv.org/abs/2408.01313v1 ) License: see link	Mohammad Mehboudi, Florian Meier, Marcus Huber, Harry J. D. Miller	The temperature of a bosonic/fermionic environment can be measured by coupling a fully characterised $N$-dimensional probe to it. While prepare-measure-reset strategies offer optimal thermometry precision, they overlook the time required for preparation and reset, and they require excessive control of the probe at all times. Continuously monitored probes are more practical in this sense, as they take finite-time limitations into account. We therefore study the ultimate limits and the optimal structure of continuously monitored $N$-dimensional thermometers. In the local estimation scheme, our figure of merit is the Fisher information, which inversely bounds the mean square error. We provide an optimal strategy for both fermionic and bosonic environments. Under reasonable assumptions, the optimal thermometer turns out to be an effective two-level system whose ground-state degeneracy increases with $N$. This contrasts with the optimal thermometers at equilibrium, which have a non-degenerate ground state.
The optimal gap also differs from the equilibrium case, as it depends on the bath type (fermionic/bosonic) and the specific spectral density. For $N\gg 1$, the Fisher information can grow linearly with $N$ regardless of bath type, significantly improving on the well-known $\log^2 N$ scaling of equilibrium thermometry. Another remarkable observation is that the scaling with $N$ does not vanish in the presence of prior ignorance. In a Bayesian setup, even non-adaptive strategies can lead to an estimation error that scales as $1/N$. In comparison, a no-go theorem prohibits the ultimate equilibrium scaling $1/\log^2 N$ without adaptive strategies.	Translated: 2024-08-05 13:07:59 Published: 2024-08-02
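The equilibrium baseline that the abstract contrasts against can be made explicit. For a probe in a Gibbs state, the Fisher information about $T$ is $F_T = \mathrm{Var}(H)/T^4$ (with $k_B = 1$). Consider an effective two-level probe with gap $\varepsilon$, $g_0$ ground states, and $g_1$ excited states. Its Fisher information is $F_T = \varepsilon^2 p(1-p)/T^4$, where $p = g_1 e^{-\varepsilon/T}/(g_0 + g_1 e^{-\varepsilon/T})$. The NumPy sketch below optimizes the gap by grid search for the equilibrium-optimal structure: one ground state and $N-1$ degenerate excited states. The optimum grows only like $\log^2 N$. The sketch does not model continuous monitoring, which is the subject of the paper.

```python
import numpy as np

def equilibrium_fisher(eps, T, g0=1, g1=1):
    """F_T = Var(H) / T**4 for a thermal two-level probe (k_B = 1)."""
    w = g1 * np.exp(-eps / T)
    p = w / (g0 + w)  # total population of the excited level
    return eps**2 * p * (1.0 - p) / T**4

def best_equilibrium_fisher(N, T=1.0):
    """Maximise over the gap for 1 ground state and N-1 degenerate excited states."""
    gaps = np.linspace(1e-3, 40.0, 40001) * T
    return float(np.max(equilibrium_fisher(gaps, T, g0=1, g1=N - 1)))

for N in (2, 10, 100, 10_000):
    F = best_equilibrium_fisher(N)
    print(N, round(F, 3), round(F / np.log(N) ** 2, 3))
```

The last column stays of order one as $N$ grows, consistent with logarithmic-squared growth. By contrast, the paper reports growth linear in $N$ for continuously monitored probes.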
# Synergistic pathways of modulation enable robust task packing within neural dynamics ( http://arxiv.org/abs/2408.01316v1 ) License: see link	Giacomo Vedovati, ShiNung Ching	Understanding how brain networks learn and manage multiple tasks simultaneously is of interest in both neuroscience and artificial intelligence. In this regard, a recent research thread in theoretical neuroscience has focused on how recurrent neural network models and their internal dynamics enact multi-task learning. Managing different tasks requires a mechanism to convey information about task identity or context into the model. From a biological perspective, this may involve mechanisms of neuromodulation. In this study, we use recurrent network models to probe the distinctions between two forms of contextual modulation of neural dynamics: modulation at the level of neuronal excitability and at the level of synaptic strength. We characterize these mechanisms in terms of their functional outcomes, focusing on their robustness to context ambiguity and, relatedly, their efficiency with respect to packing multiple tasks into finite-size networks. We also demonstrate a distinction between these mechanisms at the level of the neuronal dynamics they induce. Together, these characterizations indicate complementarity and synergy in how these mechanisms act, potentially over multiple time-scales, toward enhancing the robustness of multi-task learning.	Translated: 2024-08-05 13:07:59 Published: 2024-08-02
# ストリーミングデータのポイント予測 Point Prediction for Streaming Data ( http://arxiv.org/abs/2408.01318v1 ) ライセンス: Link先を確認	Aleena Chanda, N. V. Vinodchandran, Bertrand Clarke,	(参考訳) 本稿では,ストリーミングデータを用いた2つの新しい点予測手法を提案する。 1つはCount-Minスケッチ(CMS)に基づいており、もう1つはランダムなバイアスを持つガウス過程の先行に基づく。これらの手法は、真のモデルがデータストリームに対して有用に定式化できない、最も一般的な予測問題を対象としている。統計的文脈では、これはしばしば$\mathcal{M}$-open problem classと呼ばれる。固定分布関数$F$のi.dサンプルからなるという仮定の下で、分布関数のCMSに基づく推定が一貫したことを示す。我々は新しい手法を2つの確立された予測器と比較し、累積的な$L^1$誤差の観点から比較する。 1つは、通常の専門家設定におけるシュタルコフ解(しばしば正規化最大可能性と呼ばれる)に基づいており、もう1つはディリクレ過程の先行に基づくものである。これらの比較は2例である。 1つはワンパスであり、CMSがスケッチであるという事実を使って予測器の更新が行われることを意味する。 1パスではなく、ストリーミング$K$-meansを使用して、データ蓄積時に更新可能な固定サイズの代表的なサブセットを提供します。予備的な計算研究は、CMS法の1パス中央値バージョンが、十分複雑なデータのための他の方法よりも優れていることは滅多にないことを示している。また、ランダムなバイアスを持つガウス過程に基づく予測器が良好に動作することも見出した。私たちがここで使用しているシュタルコフ予測器は、おそらく最も単純な例だけを使用していたため、うまく機能しなかった。他の予測器は、主にデータがMオープンデータジェネレータから来たように見えなかったときにうまく機能したように思われた。 We present two new approaches for point prediction with streaming data. One is based on the Count-Min sketch (CMS) and the other is based on Gaussian process priors with a random bias. These methods are intended for the most general predictive problems where no true model can be usefully formulated for the data stream. In statistical contexts, this is often called the $\mathcal{M}$-open problem class. Under the assumption that the data consists of i.i.d samples from a fixed distribution function $F$, we show that the CMS-based estimates of the distribution function are consistent. We compare our new methods with two established predictors in terms of cumulative $L^1$ error. One is based on the Shtarkov solution (often called the normalized maximum likelihood) in the normal experts setting and the other is based on Dirichlet process priors. These comparisons are for two cases. The first is one-pass meaning that the updating of the predictors is done using the fact that the CMS is a sketch. 
For predictors that are not one-pass, we use streaming $K$-means to give a representative subset of fixed size that can be updated as data accumulate. Preliminary computational work suggests that the one-pass median version of the CMS method is rarely outperformed by the other methods for sufficiently complex data. We also find that predictors based on Gaussian process priors with random biases perform well. The Shtarkov predictors we use here did not perform as well probably because we were only using the simplest example. The other predictors seemed to perform well mainly when the data did not look like they came from an M-open data generator.	翻訳日:2024-08-05 13:07:59 公開日:2024-08-02
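The CMS-based idea above can be illustrated with a minimal Python sketch. A Count-Min sketch serves as a one-pass summary of the stream, and an estimated distribution function and a median point prediction are read off it. The sketch assumes the stream has already been discretized to integer bins in a known range `[lo, hi]`. The hashing scheme (salted `hashlib.blake2b`), the width/depth values, and the helper names `cms_cdf` and `cms_median` are illustrative assumptions, not the authors' implementation.

```python
import hashlib
import random


class CountMinSketch:
    """Count-Min sketch over integer keys (illustrative; not the paper's code)."""

    def __init__(self, width: int, depth: int, seed: int = 0) -> None:
        self.width = width
        self.depth = depth
        rng = random.Random(seed)
        # One random salt per row, so each row hashes keys differently.
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]
        self.total = 0

    def _bucket(self, row: int, key: int) -> int:
        data = f"{self.salts[row]}:{key}".encode()
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key: int, count: int = 1) -> None:
        # One pass: each arriving value updates one counter per row.
        for row in range(self.depth):
            self.table[row][self._bucket(row, key)] += count
        self.total += count

    def estimate(self, key: int) -> int:
        # Collisions only inflate counters, so the row minimum never underestimates.
        return min(self.table[row][self._bucket(row, key)] for row in range(self.depth))


def cms_cdf(sketch: CountMinSketch, x: int, lo: int) -> float:
    """Estimated distribution function F(x) over the integer bins lo..x."""
    if sketch.total == 0:
        return 0.0
    mass = sum(sketch.estimate(v) for v in range(lo, x + 1))
    return min(1.0, mass / sketch.total)


def cms_median(sketch: CountMinSketch, lo: int, hi: int) -> int:
    """Point prediction: smallest bin whose estimated cumulative count reaches half the stream."""
    half = sketch.total / 2
    cumulative = 0
    for x in range(lo, hi + 1):
        cumulative += sketch.estimate(x)
        if cumulative >= half:
            return x
    return hi


if __name__ == "__main__":
    sketch = CountMinSketch(width=1000, depth=4, seed=1)
    for value in range(100):  # toy stream: each bin 0..99 seen once
        sketch.add(value)
    print(cms_median(sketch, 0, 99), cms_cdf(sketch, 49, 0))
```

Memory stays fixed at `width * depth` counters however long the stream runs, so the median can be re-read at any time without storing past observations. The streaming $K$-means subset used for the non-one-pass predictors is not shown.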
# マルチモーダル大言語モデルの総合的レビュー:諸課題における性能と課題 A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks ( http://arxiv.org/abs/2408.01319v1 ) ライセンス: Link先を確認	Jiaqi Wang, Hanqi Jiang, Yiheng Liu, Chong Ma, Xu Zhang, Yi Pan, Mengyuan Liu, Peiran Gu, Sichen Xia, Wenjun Li, Yutong Zhang, Zihao Wu, Zhengliang Liu, Tianyang Zhong, Bao Ge, Tuo Zhang, Ning Qiang, Xintao Hu, Xi Jiang, Xin Zhang, Wei Zhang, Dinggang Shen, Tianming Liu, Shu Zhang,	(参考訳) データの爆発的な成長と急速な技術進歩によって定義される時代において、マルチモーダル大言語モデル(MLLM)は人工知能(AI)システムの最前線にある。テキスト、画像、ビデオ、オーディオ、生理的シーケンスを含む多様なデータタイプをシームレスに統合するために設計されたMLLMは、単一のモダリティシステムの能力を超えた現実世界のアプリケーションの複雑さに対処する。本稿では,自然言語,視覚,音声などのマルチモーダルタスクにおけるMLLMの応用を体系的に整理する。また、タスクにおける異なるMLLMの焦点の比較分析を行い、現在のMLLMの欠点についての洞察を提供し、今後の研究の方向性を示唆する。これらの議論を通じて,MLLMのさらなる開発と応用に向けた貴重な知見を提供したいと考えている。 In an era defined by the explosive growth of data and rapid technological advancements, Multimodal Large Language Models (MLLMs) stand at the forefront of artificial intelligence (AI) systems. Designed to seamlessly integrate diverse data types, including text, images, videos, audio, and physiological sequences, MLLMs address the complexities of real-world applications far beyond the capabilities of single-modality systems. In this paper, we systematically sort out the applications of MLLM in multimodal tasks such as natural language, vision, and audio. We also provide a comparative analysis of the focus of different MLLMs in the tasks, and provide insights into the shortcomings of current MLLMs, and suggest potential directions for future research. Through these discussions, this paper hopes to provide valuable insights for the further development and application of MLLM.	翻訳日:2024-08-05 13:07:59 公開日:2024-08-02
# ロボット工学に着想を得たスキャンパスモデルが明らかにする、動的シーンにおける視線誘導のための不確実性と意味的オブジェクト手がかりの重要性 A Robotics-Inspired Scanpath Model Reveals the Importance of Uncertainty and Semantic Object Cues for Gaze Guidance in Dynamic Scenes ( http://arxiv.org/abs/2408.01322v1 ) ライセンス: Link先を確認	Vito Mengers, Nicolas Roth, Oliver Brock, Klaus Obermayer, Martin Rolfs,	(参考訳) 周囲の物体をどう知覚するかは何に能動的に注意を向けるかに依存するが、眼球運動は知覚された物体に依存する。それでも、物体のセグメンテーションと視線行動は通常、2つの独立したプロセスとして扱われる。本研究では、ロボット工学の情報処理パターンに基づき、動的な実世界シーンにおけるこれらの過程をシミュレートする機構論的モデルを提案する。この画像計算可能なモデルは、現在のシーンセグメンテーションをオブジェクトベースのサッカード意思決定に用いると同時に、中心視したオブジェクトを用いてシーンセグメンテーションを再帰的に洗練する。この洗練のモデル化にはベイズフィルタを用い、これはセグメンテーションの不確実性の推定も与え、その推定を能動的なシーン探索の誘導に用いる。本モデルが、スキャンパス統計量で測った観察者の自由視聴行動によく一致することを示す。これにはパラメータフィッティングに用いた中心視時間やサッカード振幅の分布に加え、フィッティングに用いていない高次の統計量も含まれる。後者には、オブジェクトの検出・検査・再訪のバランスや、復帰抑制を明示的に実装していないにもかかわらず生じる再訪サッカードの遅延が含まれる。広範なシミュレーションとアブレーション研究により、不確実性がバランスのとれた探索を促し、意味的オブジェクト手がかりがオブジェクトベースの注意に用いられる知覚単位の形成に不可欠であることが示された。さらに、本モデルのモジュール設計により、サッカードの慣性(saccadic momentum)やサッカード前注意の組み込みといった拡張が可能であり、出力を人間のスキャンパスにさらに近づけられることを示す。 How we perceive objects around us depends on what we actively attend to, yet our eye movements depend on the perceived objects. Still, object segmentation and gaze behavior are typically treated as two independent processes. Drawing on an information processing pattern from robotics, we present a mechanistic model that simulates these processes for dynamic real-world scenes. Our image-computable model uses the current scene segmentation for object-based saccadic decision-making while using the foveated object to refine its scene segmentation recursively. To model this refinement, we use a Bayesian filter, which also provides an uncertainty estimate for the segmentation that we use to guide active scene exploration. 
We demonstrate that this model closely resembles observers' free viewing behavior, measured by scanpath statistics, including foveation duration and saccade amplitude distributions used for parameter fitting and higher-level statistics not used for fitting. These include how object detections, inspections, and returns are balanced and a delay of returning saccades without an explicit implementation of such temporal inhibition of return. Extensive simulations and ablation studies show that uncertainty promotes balanced exploration and that semantic object cues are crucial to form the perceptual units used in object-based attention. Moreover, we show how our model's modular design allows for extensions, such as incorporating saccadic momentum or pre-saccadic attention, to further align its output with human scanpaths.	翻訳日:2024-08-05 13:07:59 公開日:2024-08-02
# FANNO: オープンソース LLM のみによる高品質なインストラクションデータの拡張 FANNO: Augmenting High-Quality Instruction Data with Open-Sourced LLMs Only ( http://arxiv.org/abs/2408.01323v1 ) ライセンス: Link先を確認	He Zhu, Junyou Su, Tianle Lun, Yicheng Tao, Wenjia Zhang, Zipei Fan, Guanhua Chen,	(参考訳) インストラクションの微調整は、大規模言語モデル(LLM)を活用してタスク性能を高める上で重要な進歩である。しかし、命令データセットのアノテーションは従来、高価かつ手間がかかり、しばしば手動のアノテーションやプロプライエタリなLLMの高コストなAPI呼び出しに依存してきた。これらの課題に対処するため、既存のアノテーション付きデータを必要とせずにアノテーションプロセスに革命をもたらす、完全に自律的なオープンソースフレームワークであるFANNOを紹介する。 Mistral-7b-instructモデルを用いて、FANNOは文書事前スクリーニング、命令生成、応答生成を含む構造化プロセスを通じて、多種多様な高品質なデータセットを効率的に生成する。 Open LLM LeaderboardとAlpacaEvalベンチマークの実験によると、FANNOは、Alpaca-GPT4-Cleanedのような人間による注釈付きまたはクリーン化されたデータセットに匹敵する、多様性と複雑さを備えた高品質なデータを無償で生成できる。 Instruction fine-tuning stands as a crucial advancement in leveraging large language models (LLMs) for enhanced task performance. However, the annotation of instruction datasets has traditionally been expensive and laborious, often relying on manual annotations or costly API calls of proprietary LLMs. To address these challenges, we introduce FANNO, a fully autonomous, open-sourced framework that revolutionizes the annotation process without the need for pre-existing annotated data. Utilizing a Mistral-7b-instruct model, FANNO efficiently produces diverse and high-quality datasets through a structured process involving document pre-screening, instruction generation, and response generation. Experiments on Open LLM Leaderboard and AlpacaEval benchmark show that the FANNO can generate high-quality data with diversity and complexity for free, comparable to human-annotated or cleaned datasets like Alpaca-GPT4-Cleaned.	翻訳日:2024-08-05 13:07:59 公開日:2024-08-02
# UnifiedNN: クラウド上での効率的なニューラルネットワークトレーニング UnifiedNN: Efficient Neural Network Training on the Cloud ( http://arxiv.org/abs/2408.01331v1 ) ライセンス: Link先を確認	Sifat Ut Taki, Spyridon Mastorakis, Arthi Padmanabhan,	(参考訳) 今日では、クラウドベースのサービスは、ニューラルネットワーク(NN)モデルをローカルにトレーニングする従来のアプローチよりも、広く支持されています。多くの場合、クラウドサービスは、複数のNNモデルを同時にトレーニングするユーザからの複数のリクエストを処理する。しかし、NNモデルを同時にトレーニングすることは難しいプロセスであり、通常は大量の利用可能なコンピューティングリソースを必要とし、完成までに長い時間がかかる。本稿では,クラウド上で複数のNNモデルを効果的にトレーニングするためのUnifiedNNを提案する。 UnifiedNNは、複数のNNモデルを効果的に"結合"し、トレーニングプロセスの正確性に影響を与えることなく、複数のNNモデルを同時にトレーニングするためのメモリと時間保存機構を備えている。具体的には、UnifiedNNは複数のNNモデルをマージし、全てのモデルを効率的に訓練するために大きな特異統一モデルを生成する。我々はPyTorchでUnifiedNNのプロトタイプを実装し、そのパフォーマンスを関連する最先端フレームワークと比較した。実験の結果,UnifiedNNは,モデルトレーニングやテスト精度に影響を与えることなく,バニラPyTorchと比較して最大53%,トレーニング時間は最大81%削減できることがわかった。最後に、UnifiedNNは、複数のモデルを同時にトレーニングする際の最先端フレームワークと比較して、メモリ消費を最大52%削減し、トレーニング時間を最大41%削減できることを示す。 Nowadays, cloud-based services are widely favored over the traditional approach of locally training a Neural Network (NN) model. Oftentimes, a cloud service processes multiple requests from users--thus training multiple NN models concurrently. However, training NN models concurrently is a challenging process, which typically requires significant amounts of available computing resources and takes a long time to complete. In this paper, we present UnifiedNN to effectively train multiple NN models concurrently on the cloud. UnifiedNN effectively "combines" multiple NN models and features several memory and time conservation mechanisms to train multiple NN models simultaneously without impacting the accuracy of the training process. Specifically, UnifiedNN merges multiple NN models and creates a large singular unified model in order to efficiently train all models at once. We have implemented a prototype of UnifiedNN in PyTorch and we have compared its performance with relevant state-of-the-art frameworks. 
Our experimental results demonstrate that UnifiedNN can reduce memory consumption by up to 53% and training time by up to 81% when compared with vanilla PyTorch without impacting the model training and testing accuracy. Finally, our results indicate that UnifiedNN can reduce memory consumption by up to 52% and training time by up to 41% when compared to state-of-the-art frameworks when training multiple models concurrently.	翻訳日:2024-08-05 12:58:15 公開日:2024-08-02
# HMDN:クリックスルーレート予測のための階層型マルチディストリビューションネットワーク HMDN: Hierarchical Multi-Distribution Network for Click-Through Rate Prediction ( http://arxiv.org/abs/2408.01332v1 ) ライセンス: Link先を確認	Xingyu Lou, Yu Yang, Kuiyao Dong, Heyuan Huang, Wenyi Yu, Ping Wang, Xiu Li, Jun Wang,	(参考訳) 推薦サービスがマルチポピュレーション、マルチシナリオ、マルチターゲット、マルチインタレストといったますます多様な分布に対処する必要があることから、近年ますます多くの研究がマルチディストリビューション・モデリングに焦点を合わせ、大きな進歩を遂げている。しかし、その多くは単一のマルチディストリビューション方式でのモデリングしか考慮しておらず、混合マルチディストリビューションがしばしば共存し階層的な関係を形成することを無視している。これらの課題に対処するため、HMDN (Hierarchical Multi-Distribution Network)という柔軟なモデリングパラダイムを提案する。HMDNはこれらの階層的関係を効率的にモデル化し、Mixture-of-Experts (MoE)やDynamic-Weight (DW)モデルといった既存のマルチディストリビューション手法とシームレスに統合できる。具体的には、まず階層的なマルチディストリビューション表現洗練モジュールを設計し、多レベル残差量子化を用いて細粒度の階層表現を得る。そして、洗練された階層表現を既存の単一マルチディストリビューションモデルに統合し、それらを混合マルチディストリビューションモデルへとシームレスに拡張する。公開データセットと産業データセットの両方での実験により、HMDNの有効性と柔軟性が検証された。 As the recommendation service needs to address increasingly diverse distributions, such as multi-population, multi-scenario, multi-target, and multi-interest, more and more recent works have focused on multi-distribution modeling and achieved great progress. However, most of them only consider modeling in a single multi-distribution manner, ignoring that mixed multi-distributions often coexist and form hierarchical relationships. To address these challenges, we propose a flexible modeling paradigm, named Hierarchical Multi-Distribution Network (HMDN), which efficiently models these hierarchical relationships and can seamlessly integrate with existing multi-distribution methods, such as Mixture-of-Experts (MoE) and Dynamic-Weight (DW) models. Specifically, we first design a hierarchical multi-distribution representation refinement module, employing a multi-level residual quantization to obtain fine-grained hierarchical representation. Then, the refined hierarchical representation is integrated into the existing single multi-distribution models, seamlessly expanding them into mixed multi-distribution models. 
Experimental results on both public and industrial datasets validate the effectiveness and flexibility of HMDN.	翻訳日:2024-08-05 12:58:15 公開日:2024-08-02
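HMDN obtains fine-grained hierarchical representations through multi-level residual quantization. The generic technique can be sketched in plain Python. At each level, the quantizer picks the codeword nearest to whatever residual the coarser levels left, so the resulting code sequence reads from coarse to fine. This is an illustration under stated assumptions: the codebooks are fixed toy values (in practice they would be learned), and `residual_quantize` is a hypothetical helper, not HMDN's refinement module.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _squared_distance(a: Vector, b: Vector) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def residual_quantize(
    x: Vector, codebooks: Sequence[Sequence[Vector]]
) -> Tuple[List[int], List[float]]:
    """Return one code index per level and the summed reconstruction."""
    residual = [float(v) for v in x]
    reconstruction = [0.0] * len(residual)
    codes: List[int] = []
    for codebook in codebooks:
        # Nearest codeword to the residual left by the coarser levels.
        idx = min(range(len(codebook)), key=lambda i: _squared_distance(codebook[i], residual))
        codes.append(idx)
        chosen = codebook[idx]
        reconstruction = [r + c for r, c in zip(reconstruction, chosen)]
        residual = [r - c for r, c in zip(residual, chosen)]
    return codes, reconstruction


# Hypothetical 3-level hierarchy: coarse clusters first, then finer offsets.
CODEBOOKS = [
    [[0.0, 0.0], [10.0, 10.0]],
    [[0.0, 0.0], [1.0, -1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.25]],
]

if __name__ == "__main__":
    codes, recon = residual_quantize([11.2, 9.1], CODEBOOKS)
    print(codes, recon)
```

Keeping only the first code gives the coarsest grouping, and each additional level refines it. This is one way to read "hierarchical" in this setting.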
# 長期ホライズンのロボットタスク理解のためのバックボーン A Backbone for Long-Horizon Robot Task Understanding ( http://arxiv.org/abs/2408.01334v1 ) ライセンス: Link先を確認	Xiaoshuai Chen, Wei Chen, Dongmyoung Lee, Yukun Ge, Nicolas Rojas, Petar Kormushev,	(参考訳) エンド・ツー・エンドのロボット学習、特に長期ホライズンのタスクでは、予測不可能な結果や汎化性能の低さがしばしば生じる。これらの課題に対処するために、ロボットのタスク理解と転移可能性を高める新しいTherblig-based Backbone Framework(TBBF)を提案する。このフレームワークは、therblig(基本動作要素)をバックボーンとして用いて高レベルのロボットタスクを要素的なロボット構成に分解し、それらを現在の基盤モデルと統合してタスク理解を改善する。このアプローチは、オフライン学習とオンラインテストの2つのステージで構成される。オフライン学習の段階では、様々なタスクにわたる正確なtherbligセグメンテーションのためにMeta-RGate SynerFusion (MGSF)ネットワークを開発した。オンラインテストの段階では、新しいタスクのワンショットデモンストレーションを収集した後、MGSFネットワークが高レベルの知識を抽出し、それをAction Registration (ActionREG)を用いて画像にエンコードする。さらに、正確な動作実行を保証するためにLarge Language Model (LLM)-Alignment Policy for Visual Correction (LAP-VC)を採用し、新しいロボットシナリオにおける軌道転移を容易にする。実験結果はこれらの手法の有効性を裏付けており、therbligセグメンテーションで94.37%の再現率を、実環境でのオンラインロボットテストでは単純なシナリオと複雑なシナリオでそれぞれ94.4%と80%の成功率を達成した。補足資料は https://sites.google.com/view/therbligsbasedbackbone/home で公開されている。 End-to-end robot learning, particularly for long-horizon tasks, often results in unpredictable outcomes and poor generalization. To address these challenges, we propose a novel Therblig-based Backbone Framework (TBBF) to enhance robot task understanding and transferability. This framework uses therbligs (basic action elements) as the backbone to decompose high-level robot tasks into elemental robot configurations, which are then integrated with current foundation models to improve task understanding. The approach consists of two stages: offline training and online testing. During the offline training stage, we developed the Meta-RGate SynerFusion (MGSF) network for accurate therblig segmentation across various tasks. In the online testing stage, after a one-shot demonstration of a new task is collected, our MGSF network extracts high-level knowledge, which is then encoded into the image using Action Registration (ActionREG). 
Additionally, the Large Language Model (LLM)-Alignment Policy for Visual Correction (LAP-VC) is employed to ensure precise action execution, facilitating trajectory transfer in novel robot scenarios. Experimental results validate these methods, achieving 94.37% recall in therblig segmentation and success rates of 94.4% and 80% in real-world online robot testing for simple and complex scenarios, respectively. Supplementary material is available at: https://sites.google.com/view/therbligsbasedbackbone/home	翻訳日:2024-08-05 12:58:15 公開日:2024-08-02
# ノイズと共変量が裾の重い分布に従い外れ値で汚染されている場合のスパース線形回帰 Sparse Linear Regression when Noises and Covariates are Heavy-Tailed and Contaminated by Outliers ( http://arxiv.org/abs/2408.01336v1 ) ライセンス: Link先を確認	Takeyuki Sasai, Hironori Fujisawa,	(参考訳) 共変量と雑音が裾の重い分布からサンプリングされる場合に、スパース性の仮定の下で線形回帰の係数を推定する問題を考察する。さらに、共変量と雑音が裾の重い分布からサンプリングされるだけでなく、外れ値によって汚染されている状況も考察する。提案する推定量は効率的に計算でき、鋭い誤差上界を示す。 We investigate the problem of estimating coefficients of linear regression under a sparsity assumption when covariates and noises are sampled from heavy-tailed distributions. Additionally, we consider the situation where not only covariates and noises are sampled from heavy-tailed distributions but also contaminated by outliers. Our estimators can be computed efficiently, and exhibit sharp error bounds.	翻訳日:2024-08-05 12:58:15 公開日:2024-08-02
# MuChoMusic:マルチモーダルオーディオ言語モデルによる音楽理解の評価 MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models ( http://arxiv.org/abs/2408.01337v1 ) ライセンス: Link先を確認	Benno Weck, Ilaria Manco, Emmanouil Benetos, Elio Quinton, George Fazekas, Dmitry Bogdanov,	(参考訳) 音声と言語を共同で処理するマルチモーダルモデルは、音声理解において大きな可能性を秘めており、音楽分野においてますます採用されている。ユーザがテキストで検索し、与えられた音声入力に関する情報を入手できるようにすることで、これらのモデルは言語ベースのインタフェースを通じて様々な音楽理解タスクを可能にする可能性がある。しかし,その評価にはかなりの課題があり,音楽関連入力を現在の手法で正しく解釈する能力をどのように効果的に評価するかは定かではない。そこで本研究では,音声に着目したマルチモーダル言語モデルにおける音楽理解のベンチマークであるMuChoMusicを紹介する。 MuChoMusicは1,187の質問から成り、いずれも人間のアノテータによって検証され、2つのパブリックな音楽データセットから得られた644曲の楽曲に収録され、様々なジャンルをカバーする。このベンチマークの質問は、基本的な音楽概念と文化的・機能的文脈との関係を網羅する知識と推論能力を評価するために作成されている。ベンチマークで得られた全体分析を通じて、5つのオープンソースモデルを評価し、言語モダリティへの過度な依存を含むいくつかの落とし穴を特定し、より優れたマルチモーダル統合の必要性を示している。データとコードはオープンソースである。 Multimodal models that jointly process audio and language hold great promise in audio understanding and are increasingly being adopted in the music domain. By allowing users to query via text and obtain information about a given audio input, these models have the potential to enable a variety of music understanding tasks via language-based interfaces. However, their evaluation poses considerable challenges, and it remains unclear how to effectively assess their ability to correctly interpret music-related inputs with current methods. Motivated by this, we introduce MuChoMusic, a benchmark for evaluating music understanding in multimodal language models focused on audio. MuChoMusic comprises 1,187 multiple-choice questions, all validated by human annotators, on 644 music tracks sourced from two publicly available music datasets, and covering a wide variety of genres. Questions in the benchmark are crafted to assess knowledge and reasoning abilities across several dimensions that cover fundamental musical concepts and their relation to cultural and functional contexts. 
Through the holistic analysis afforded by the benchmark, we evaluate five open-source models and identify several pitfalls, including an over-reliance on the language modality, pointing to a need for better multimodal integration. Data and code are open-sourced.	翻訳日:2024-08-05 12:58:15 公開日:2024-08-02
# 量子LDPC符号の時間効率論理演算 Time-efficient logical operations on quantum LDPC codes ( http://arxiv.org/abs/2408.01339v1 ) ライセンス: Link先を確認	Guo Zhang, Ying Li,	(参考訳) 任意の可換論理パウリ作用素の集合を演算子数に依存しない時間で測定できるスキームを提案する。唯一の条件は可換性であり、量子力学における同時測定の基本的な要件である。量子低密度パリティチェック(LDPC)符号は、フォールトトレラント量子コンピューティングの実現に大いに期待できる。比較的少ない物理量子ビットを用いて多くの論理量子ビットを符号化できるため、初期のフォールトトレラント技術では特に重要である。論理演算子の同時測定により、完全に並列化された量子計算が可能となり、計算時間を最小化できる。提案方式は任意の量子LDPC符号に適用可能であり,複数の論理演算子を同時に測定しながらパリティチェックの低密度を維持する。これらの結果から, 早期耐故障技術の適用の可能性が高まった。 We propose schemes capable of measuring an arbitrary set of commutative logical Pauli operators in time independent of the number of operators. The only condition is commutativity, a fundamental requirement for simultaneous measurements in quantum mechanics. Quantum low-density parity check (LDPC) codes show great promise for realising fault-tolerant quantum computing. They are particularly significant for early fault-tolerant technologies as they can encode many logical qubits using relatively few physical qubits. By achieving simultaneous measurements of logical operators, our approaches enable fully parallelised quantum computing, thus minimising computation time. Our schemes are applicable to any quantum LDPC codes and maintain the low density of parity checks while measuring multiple logical operators simultaneously. These results enhance the feasibility of applying early fault-tolerant technologies to practical problems.	翻訳日:2024-08-05 12:58:15 公開日:2024-08-02
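The only precondition these schemes place on the measured set is that the logical Pauli operators commute. For operators written as Pauli strings, this can be checked with the standard rule: two strings commute exactly when the number of qubit positions where both act non-trivially with different Paulis is even. The Python sketch below checks that precondition only (helper names are hypothetical). It ignores phases and does not implement the measurement schemes themselves.

```python
from itertools import combinations
from typing import Sequence

VALID = set("IXYZ")


def paulis_commute(p: str, q: str) -> bool:
    """True if two equal-length Pauli strings (e.g. 'XZI') commute."""
    if len(p) != len(q):
        raise ValueError("Pauli strings must act on the same number of qubits")
    if not (set(p) <= VALID and set(q) <= VALID):
        raise ValueError("Pauli strings may only contain I, X, Y, Z")
    # Single-qubit Paulis anticommute iff both are non-identity and different.
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0


def mutually_commuting(ops: Sequence[str]) -> bool:
    """True if every pair in the set commutes (the precondition for simultaneous measurement)."""
    return all(paulis_commute(p, q) for p, q in combinations(ops, 2))


if __name__ == "__main__":
    print(mutually_commuting(["XXII", "ZZII", "IIXX", "IIZZ"]))
    print(paulis_commute("XI", "ZI"))
```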
# 効果的な会話レコメンデーションのための知識グラフ埋め込みの活用 Leveraging Knowledge Graph Embedding for Effective Conversational Recommendation ( http://arxiv.org/abs/2408.01342v1 ) ライセンス: Link先を確認	Yunwen Xia, Hui Fang, Jie Zhang, Chong Long,	(参考訳) 近年、対話システムと推薦システムの技術を組み合わせた会話型推薦システム(CRS)が注目を集めている。従来の推薦システムとは対照的に、CRSはインタラクション(すなわち会話)を通じてユーザの好みをよりよく学習し、推薦性能をさらに高める。しかし、CRSに関する既存の研究は属性、ユーザ、アイテム間の関係を効果的に扱えておらず、不適切な質問や不正確な推薦につながる可能性がある。そこで本稿では、知識グラフに基づく会話型推薦システム(KG-CRS)を提案する。具体的には、まずユーザ・アイテムグラフとアイテム・属性グラフを動的グラフ、すなわち対話の過程で否定されたアイテムや属性を除去することで動的に変化するグラフに統合する。次に、グラフ上の近傍ノードを介した伝播も考慮して、ユーザ、アイテム、属性の有益な埋め込みを学習する。3つの実データセットを用いた大規模な実験により、推薦タスクと会話タスクの両方において、提案手法が最先端の手法より優れていることが確認された。 Conversational recommender system (CRS), which combines the techniques of dialogue system and recommender system, has obtained increasing interest recently. In contrast to traditional recommender system, it learns the user preference better through interactions (i.e. conversations), and then further boosts the recommendation performance. However, existing studies on CRS fail to address the relationship among attributes, users, and items effectively, which might lead to inappropriate questions and inaccurate recommendations. In this view, we propose a knowledge graph based conversational recommender system (referred to as KG-CRS). Specifically, we first integrate the user-item graph and item-attribute graph into a dynamic graph, i.e., dynamically changing during the dialogue process by removing negative items or attributes. We then learn informative embedding of users, items, and attributes by also considering propagation through neighbors on the graph. Extensive experiments on three real datasets validate the superiority of our method over the state-of-the-art approaches in terms of both the recommendation and conversation tasks.	翻訳日:2024-08-05 12:58:15 公開日:2024-08-02
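KG-CRS's dynamic-graph step merges a user-item graph with an item-attribute graph and then deletes items or attributes the user rejects during the dialogue. That step can be sketched with plain adjacency sets. This is a hypothetical illustration: `build_graph` and `remove_rejected` are not from the paper, and the embedding and neighbor-propagation stages are omitted.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

Graph = Dict[str, Set[str]]


def build_graph(
    user_items: Dict[str, Iterable[str]], item_attrs: Dict[str, Iterable[str]]
) -> Graph:
    """Merge user-item and item-attribute edges into one undirected graph."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            graph[user].add(item)
            graph[item].add(user)
    for item, attrs in item_attrs.items():
        for attr in attrs:
            graph[item].add(attr)
            graph[attr].add(item)
    return dict(graph)


def remove_rejected(graph: Graph, rejected: Iterable[str]) -> Graph:
    """Drop nodes the user rejected during the dialogue, together with their edges."""
    for node in rejected:
        for neighbor in graph.pop(node, set()):
            graph.get(neighbor, set()).discard(node)
    return graph


if __name__ == "__main__":
    g = build_graph({"u1": ["i1", "i2"]}, {"i1": ["red"], "i2": ["red", "large"]})
    remove_rejected(g, ["red"])  # e.g. the user says "not red"
    print(g)
```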
# StitchFusion: あらゆる視覚モダリティを織り合わせてマルチモーダルセマンティックセグメンテーションを強化する StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation ( http://arxiv.org/abs/2408.01343v1 ) ライセンス: Link先を確認	Bingyu Li, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li,	(参考訳) マルチモーダルセマンティックセグメンテーションは、複雑なシーンにおけるセグメンテーション精度を高める大きな可能性を示している。しかし、現在の手法は特定のモダリティに合わせた専用の特徴融合モジュールを組み込むことが多く、入力の柔軟性を制限し、学習パラメータの数を増やしている。これらの課題に対処するために、大規模な事前学習モデルをエンコーダおよび特徴融合器として直接統合する、単純かつ効果的なモーダル融合フレームワークであるStitchFusionを提案する。このアプローチは、任意の視覚モダリティ入力に対応しつつ、包括的なマルチモーダル・マルチスケールの特徴融合を可能にする。具体的には、本フレームワークはマルチモーダルな視覚情報を共有することで、エンコード時のモーダル統合を実現する。モダリティ間の情報交換を強化するため、多方向アダプタモジュール(MultiAdapter)を導入し、エンコード中のクロスモーダルな情報伝達を可能にする。エンコードの過程でMultiAdapterを用いて事前学習済みエンコーダ間でマルチスケール情報を伝播させることで、StitchFusionはエンコード中にマルチモーダルな視覚情報の統合を実現する。大規模な比較実験により、本モデルは最小限の追加パラメータで、4つのマルチモーダルセグメンテーションデータセットにおいて最先端の性能を達成することが示された。さらに、MultiAdapterと既存のFeature Fusion Modules (FFMs)を組み合わせた実験は、両者の相補的な性質を浮き彫りにしている。コードはStitchFusion_repoで公開されている。 Multimodal semantic segmentation shows significant potential for enhancing segmentation accuracy in complex scenes. However, current methods often incorporate specialized feature fusion modules tailored to specific modalities, thereby restricting input flexibility and increasing the number of training parameters. To address these challenges, we propose StitchFusion, a straightforward yet effective modal fusion framework that integrates large-scale pre-trained models directly as encoders and feature fusers. This approach facilitates comprehensive multi-modal and multi-scale feature fusion, accommodating any visual modal inputs. Specifically, our framework achieves modal integration during encoding by sharing multi-modal visual information. To enhance information exchange across modalities, we introduce a multi-directional adapter module (MultiAdapter) to enable cross-modal information transfer during encoding. 
By leveraging MultiAdapter to propagate multi-scale information across pre-trained encoders during the encoding process, StitchFusion achieves multi-modal visual information integration during encoding. Extensive comparative experiments demonstrate that our model achieves state-of-the-art performance on four multi-modal segmentation datasets with minimal additional parameters. Furthermore, the experimental integration of MultiAdapter with existing Feature Fusion Modules (FFMs) highlights their complementary nature. Our code is available at StitchFusion_repo.	翻訳日:2024-08-05 12:58:15 公開日:2024-08-02
# プロンプト改良か、ファインチューニングか？計算社会科学タスクにおけるLLM活用のベストプラクティス Prompt Refinement or Fine-tuning? Best Practices for using LLMs in Computational Social Science Tasks ( http://arxiv.org/abs/2408.01346v1 ) ライセンス: Link先を確認	Anders Giovanni Møller, Luca Maria Aiello,	(参考訳) 大規模言語モデルは、計算社会科学においてテキスト理解の複雑なタスクを可能にする表現力の高いツールである。その汎用性は有益である一方、この分野で標準化されたベストプラクティスを確立する上での障壁にもなっている。異なる戦略の価値を明確にするために、23の社会的知識タスクからなるベンチマークにおいて、現代のLLMに基づく分類手法の性能を概観する。結果は3つのベストプラクティスを示唆している。すなわち、より大きな語彙と事前学習コーパスを持つモデルを選ぶこと、単純なゼロショットを避けてAIで強化したプロンプトを用いること、そしてタスク固有のデータでファインチューニングを行い、複数のデータセットでのより複雑な形式の命令チューニングは学習データがより豊富な場合にのみ検討することである。 Large Language Models are expressive tools that enable complex tasks of text understanding within Computational Social Science. Their versatility, while beneficial, poses a barrier for establishing standardized best practices within the field. To bring clarity on the values of different strategies, we present an overview of the performance of modern LLM-based classification methods on a benchmark of 23 social knowledge tasks. Our results point to three best practices: select models with larger vocabulary and pre-training corpora; avoid simple zero-shot in favor of AI-enhanced prompting; fine-tune on task-specific data, and consider more complex forms of instruction-tuning on multiple datasets only when training data is more abundant.	翻訳日:2024-08-05 12:58:15 公開日:2024-08-02
# PC$^2$: クロスモーダル検索におけるノイズのある対応関係の学習のための擬似分類に基づく擬似キャプション生成 PC$^2$: Pseudo-Classification Based Pseudo-Captioning for Noisy Correspondence Learning in Cross-Modal Retrieval ( http://arxiv.org/abs/2408.01349v1 ) ライセンス: Link先を確認	Yue Duan, Zhangxuan Gu, Zhenzhe Ying, Lei Qi, Changhua Meng, Yinghuan Shi,	(参考訳) クロスモーダル検索の分野では、特にノイズのある対応関係の学習(NCL)がもたらす複雑さを考えると、マルチメディアにおける多様なモダリティをシームレスに統合することは依然として困難な課題である。このようなノイズは多くの場合、ミスマッチしたデータペアに由来し、従来のノイズラベルとは異なる重大な障害となっている。本稿では、この課題に対処するためにPseudo-Classification based Pseudo-Captioning (PC$^2$)フレームワークを提案する。PC$^2$は3つの戦略を提供する。第一に、キャプションをカテゴリラベルとして解釈する補助的な「擬似分類」タスクを設定し、非対照的(non-contrastive)なメカニズムを通じて画像とテキストの意味的類似性を学習するようモデルを導く。第二に、一般的なマージンベースの手法とは異なり、PC$^2$の擬似分類能力を活かして擬似キャプションを生成し、ミスマッチした各ペアに対してより情報量が多く具体的な教師信号を与える。第三に、擬似分類の振動を対応関係の補正の補助に利用する。技術的貢献に加えて、ノイズが自然に存在する新たな強力なNCLベンチマークとなりうる、Noise of Web (NoW)と呼ぶ現実的なNCLデータセットを構築する。PC$^2$の実証評価では、様々なNCL設定のシミュレーションデータセットと現実のデータセットの両方において、既存の最先端のロバストなクロスモーダル検索手法に比べて顕著な改善が示された。構築したデータセットとソースコードはhttps://github.com/alipay/PC2-NoiseofWebで公開されている。 In the realm of cross-modal retrieval, seamlessly integrating diverse modalities within multimedia remains a formidable challenge, especially given the complexities introduced by noisy correspondence learning (NCL). Such noise often stems from mismatched data pairs, which is a significant obstacle distinct from traditional noisy labels. This paper introduces Pseudo-Classification based Pseudo-Captioning (PC$^2$) framework to address this challenge. PC$^2$ offers a threefold strategy: firstly, it establishes an auxiliary "pseudo-classification" task that interprets captions as categorical labels, steering the model to learn image-text semantic similarity through a non-contrastive mechanism. Secondly, unlike prevailing margin-based techniques, capitalizing on PC$^2$'s pseudo-classification capability, we generate pseudo-captions to provide more informative and tangible supervision for each mismatched pair. 
Thirdly, the oscillation of pseudo-classification is used to assist in correcting the correspondence. In addition to technical contributions, we develop a realistic NCL dataset called Noise of Web (NoW), which could be a new powerful NCL benchmark where noise exists naturally. Empirical evaluations of PC$^2$ showcase marked improvements over existing state-of-the-art robust cross-modal retrieval techniques on both simulated and realistic datasets with various NCL settings. The contributed dataset and source code are released at https://github.com/alipay/PC2-NoiseofWeb.	翻訳日:2024-08-05 12:58:15 公開日:2024-08-02
# MCGMark: LLMが生成した悪意あるコードのための符号化可能でロバストなオンライン透かし MCGMark: An Encodable and Robust Online Watermark for LLM-Generated Malicious Code ( http://arxiv.org/abs/2408.01354v1 ) ライセンス: Link先を確認	Kaiwen Ning, Jiachi Chen, Qingyuan Zhong, Tao Zhang, Yanlin Wang, Wei Li, Yu Zhang, Weizhe Zhang, Zibin Zheng,	(参考訳) 大規模言語モデル(LLM)の出現に伴い、多くのソフトウェアサービスプロバイダ(SSP)がCodeLlamaやCopilotといったコード生成タスク向けにカスタマイズされたLLMの開発に取り組んでいる。しかし、これらのLLMは攻撃者によって悪意のあるソフトウェアの作成に利用される可能性があり、ソフトウェアエコシステムに潜在的な脅威をもたらしうる。例えば、高度なフィッシングマルウェアの作成を自動化することができる。この問題に対処するため、我々はまず実証研究を行い、約400人時の作業を要した、406の悪意あるコード生成タスクからなるプロンプトデータセットMCGTestを設計する。このデータセットを利用して、LLM生成コードを追跡するための、初のロバストでコード構造を考慮した符号化可能な透かし手法であるMCGMarkを提案する。トークンの選択を制御し、確率的外れ値に基づいて出力品質を確保することで、符号化可能な情報を埋め込む。さらに、悪意のあるコードの構造的特徴を考慮し、コメントのような容易に改変される位置への透かしの埋め込みを避けることで、透かしの堅牢性を高める。我々は、DeepSeek-Coder上でMCGMarkの有効性と堅牢性を検証した。MCGMarkは最大出力長400トークン以内で88.9%の埋め込み成功率を達成する。さらに、強い堅牢性を示し、出力コードの品質への影響も最小限である。本手法は、SSPがLLMによって生成された悪意あるコードを追跡し、責任の所在を明らかにするのに役立つ。 With the advent of large language models (LLMs), numerous software service providers (SSPs) are dedicated to developing LLMs customized for code generation tasks, such as CodeLlama and Copilot. However, these LLMs can be leveraged by attackers to create malicious software, which may pose potential threats to the software ecosystem. For example, they can automate the creation of advanced phishing malware. To address this issue, we first conduct an empirical study and design a prompt dataset, MCGTest, which involves approximately 400 person-hours of work and consists of 406 malicious code generation tasks. Utilizing this dataset, we propose MCGMark, the first robust, code structure-aware, and encodable watermarking approach to trace LLM-generated code. We embed encodable information by controlling the token selection and ensuring the output quality based on probabilistic outliers. 
Additionally, we enhance the robustness of the watermark by considering the structural features of malicious code, preventing the embedding of the watermark in easily modified positions, such as comments. We validate the effectiveness and robustness of MCGMark on the DeepSeek-Coder. MCGMark achieves an embedding success rate of 88.9% within a maximum output limit of 400 tokens. Furthermore, it also demonstrates strong robustness and has minimal impact on the quality of the output code. Our approach assists SSPs in tracing and holding responsible parties accountable for malicious code generated by LLMs.	翻訳日:2024-08-05 12:58:15 公開日:2024-08-02
# Hallu-PI: 摂動入力下でのマルチモーダル大規模言語モデルの幻覚評価 Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs ( http://arxiv.org/abs/2408.01355v1 ) ライセンス: Link先を確認	Peng Ding, Jingyu Wu, Jun Kuang, Dan Ma, Xuezhi Cao, Xunliang Cai, Shi Chen, Jiajun Chen, Shujian Huang,	(参考訳) マルチモーダル大規模言語モデル(MLLM)は、様々な視覚言語理解および生成タスクにおいて顕著な性能を示している。しかし、MLLMは時に与えられた画像と矛盾する内容を生成することがあり、これは「幻覚」として知られている。従来の研究は主に、摂動のない標準的なベンチマークを用いて幻覚を評価することに焦点を当てており、画像の切り抜き(クロップ)やぼかしといった、現実のシナリオで頻繁に生じる摂動入力を見落としている。こうした摂動入力は、MLLMの幻覚を包括的に評価する上で重要である。本稿では、このギャップを埋めるために、摂動入力下でのMLLMの幻覚を評価するために設計された最初のベンチマークであるHallu-PIを提案する。具体的には、Hallu-PIは7つの摂動シナリオで構成され、11種類のオブジェクトにわたる1,260枚の摂動画像を含んでいる。各画像には詳細なアノテーションが付与されており、そこには存在、属性、関係といった細粒度の幻覚タイプが含まれる。我々はこれらのアノテーションに豊富な質問セットを付与し、Hallu-PIを識別的タスクと生成的タスクの両方に適したものにしている。GPT-4VやGemini-Pro Visionなど12の主要なMLLMを用いた大規模な実験により、これらのモデルがHallu-PI上で、摂動のないシナリオでは見られない顕著な幻覚を示すことが明らかになった。さらに、本研究により、MLLMが異なる種類の幻覚を扱う能力に深刻な偏りがあることも明らかになった。また、摂動シナリオに特化した2つのベースライン、Perturbed-ReminderとPerturbed-ICLを設計する。本研究が、摂動入力を扱う際のMLLMの限界に研究者の注意を向け、この問題に対処するためのさらなる研究を促すことを願っている。コードとデータセットはhttps://github.com/NJUNLP/Hallu-PIで公開されている。 Multi-modal Large Language Models (MLLMs) have demonstrated remarkable performance on various visual-language understanding and generation tasks. However, MLLMs occasionally generate content inconsistent with the given images, which is known as "hallucination". Prior works primarily center on evaluating hallucination using standard, unperturbed benchmarks, which overlook the prevalent occurrence of perturbed inputs in real-world scenarios (such as image cropping or blurring) that are critical for a comprehensive assessment of MLLMs' hallucination. In this paper, to bridge this gap, we propose Hallu-PI, the first benchmark designed to evaluate Hallucination in MLLMs within Perturbed Inputs. Specifically, Hallu-PI consists of seven perturbed scenarios, containing 1,260 perturbed images from 11 object types. 
Each image is accompanied by detailed annotations, which include fine-grained hallucination types, such as existence, attribute, and relation. We equip these annotations with a rich set of questions, making Hallu-PI suitable for both discriminative and generative tasks. Extensive experiments on 12 mainstream MLLMs, such as GPT-4V and Gemini-Pro Vision, demonstrate that these models exhibit significant hallucinations on Hallu-PI, which is not observed in unperturbed scenarios. Furthermore, our research reveals a severe bias in MLLMs' ability to handle different types of hallucinations. We also design two baselines specifically for perturbed scenarios, namely Perturbed-Reminder and Perturbed-ICL. We hope that our study will bring researchers' attention to the limitations of MLLMs when dealing with perturbed inputs, and spur further investigations to address this issue. Our code and datasets are publicly available at https://github.com/NJUNLP/Hallu-PI.	翻訳日:2024-08-05 12:58:15 公開日:2024-08-02
# 3次元点群のクラスインクリメンタルセマンティックセグメンテーションのためのバランス型残差蒸留学習 Balanced Residual Distillation Learning for 3D Point Cloud Class-Incremental Semantic Segmentation ( http://arxiv.org/abs/2408.01356v1 ) ライセンス: Link先を確認	Yuanzhi Su, Siyuan Chen, Yuan-Gen Wang,	(参考訳) クラスインクリメンタル学習(CIL)は、継続的に追加される新しいクラスから学習しつつ古いクラスの破滅的忘却を防ぐことで、流入する情報の処理に成功しており、盛んに研究されている。CILの性能を飛躍的に向上させるには、ベースモデルから過去の知識を効果的に洗練し、それを新しい学習とバランスさせることが不可欠である。しかし、この問題は現在の研究ではまだ考慮されていない。本研究では、これらの観点からCILの可能性を探究し、CILの性能をさらに高い水準へ押し上げるために、新しいバランス型残差蒸留フレームワーク(BRD-CIL)を提案する。具体的には、BRD-CILは残差蒸留学習戦略を設計し、ネットワーク構造を動的に拡張してベースモデルとターゲットモデルの間の残差を捉えることで、過去の知識を効果的に洗練する。さらに、BRD-CILは、古いクラスへの偏りを減らすガイダンスマスクを生成するバランス型擬似ラベル学習戦略を設計し、新旧クラスからのバランスの取れた学習を保証する。提案したBRD-CILを、データが順序を持たず非構造的である困難な3D点群セマンティックセグメンテーションタスクに適用する。大規模な実験結果から、BRD-CILはクラスに偏りのあるシナリオで優れたバランス能力を示し、新たなベンチマークを打ち立てることが示された。 Class-incremental learning (CIL) thrives due to its success in processing the influx of information by learning from continuously added new classes while preventing catastrophic forgetting about the old ones. It is essential for the performance breakthrough of CIL to effectively refine past knowledge from the base model and balance it with new learning. However, such an issue has not yet been considered in current research. In this work, we explore the potential of CIL from these perspectives and propose a novel balanced residual distillation framework (BRD-CIL) to push the performance bar of CIL to a new higher level. Specifically, BRD-CIL designs a residual distillation learning strategy, which can dynamically expand the network structure to capture the residuals between the base and target models, effectively refining the past knowledge. Furthermore, BRD-CIL designs a balanced pseudo-label learning strategy by generating a guidance mask to reduce the preference for old classes, ensuring balanced learning from new and old classes. 
We apply the proposed BRD-CIL to a challenging 3D point cloud semantic segmentation task where the data are unordered and unstructured. Extensive experimental results demonstrate that BRD-CIL sets a new benchmark with an outstanding balance capability in class-biased scenarios.	翻訳日:2024-08-05 12:58:15 公開日:2024-08-02
# Device-Independent Certification of Multipartite Distillable Entanglement ( http://arxiv.org/abs/2408.01357v1 ) License: see link	Aby Philip, Mark M. Wilde,	Quantum networks consist of various quantum technologies, spread across vast distances, and involve various users at the same time. Certifying the functioning and efficiency of the individual components is a well-studied and widely used task. However, the power of quantum networks can only be realized by integrating all the required quantum technologies and platforms across a large number of users. In this work, we demonstrate how to certify the distillable entanglement available in multipartite states produced by quantum networks, without relying on the physical realization of its constituent components. We do so by using the paradigm of device independence.	Translated: 2024-08-05 12:58:15 Published: 2024-08-02
# Autoencoders in Function Space ( http://arxiv.org/abs/2408.01362v1 ) License: see link	Justin Bunker, Mark Girolami, Hefin Lambley, Andrew M. Stuart, T. J. Sullivan,	Autoencoders have found widespread application, in both their original deterministic form and in their variational formulation (VAEs). In scientific applications it is often of interest to consider data that consist of functions; the same perspective is useful in image processing. In practice, discretisation (of differential equations arising in the sciences) or pixellation (of images) renders problems finite dimensional, but conceiving first of algorithms that operate on functions, and only then discretising or pixellating, leads to better algorithms that smoothly operate between different levels of discretisation or pixellation. In this paper, function-space versions of the autoencoder (FAE) and variational autoencoder (FVAE) are introduced, analysed, and deployed. Well-definedness of the objective function governing VAEs is a subtle issue, even in finite dimension, and more so on function space. The FVAE objective is well defined whenever the data distribution is compatible with the chosen generative model; this happens, for example, when the data arise from a stochastic differential equation. The FAE objective is valid much more broadly, and can be straightforwardly applied to data governed by differential equations.
Pairing these objectives with neural operator architectures, which can thus be evaluated on any mesh, enables new applications of autoencoders to inpainting, superresolution, and generative modelling of scientific data.	Translated: 2024-08-05 12:58:15 Published: 2024-08-02
# Toward Automatic Relevance Judgment using Vision--Language Models for Image--Text Retrieval Evaluation ( http://arxiv.org/abs/2408.01363v1 ) License: see link	Jheng-Hong Yang, Jimmy Lin,	Vision--Language Models (VLMs) have demonstrated success across diverse applications, yet their potential to assist in relevance judgments remains uncertain. This paper assesses the relevance estimation capabilities of VLMs, including CLIP, LLaVA, and GPT-4V, within a large-scale \textit{ad hoc} retrieval task tailored for multimedia content creation in a zero-shot fashion. Preliminary experiments reveal the following: (1) Both LLaVA and GPT-4V, encompassing open-source and closed-source visual-instruction-tuned Large Language Models (LLMs), achieve a notable Kendall's $\tau \sim 0.4$ when compared to human relevance judgments, surpassing the CLIPScore metric. (2) While CLIPScore is strongly preferred, LLMs are less biased towards CLIP-based retrieval systems. (3) GPT-4V's score distribution aligns more closely with human judgments than other models, achieving a Cohen's $\kappa$ value of around 0.08, which outperforms CLIPScore at approximately -0.096. These findings underscore the potential of LLM-powered VLMs in enhancing relevance judgments.	Translated: 2024-08-05 12:48:28 Published: 2024-08-02
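The two agreement statistics reported in the abstract above, Kendall's $\tau$ (rank correlation between model and human relevance scores) and Cohen's $\kappa$ (chance-corrected label agreement), can be sketched in plain Python. This is an illustrative sketch, not the authors' evaluation code. It uses the tau-a variant, which does not correct for ties (scipy.stats.kendalltau defaults to tau-b), and the relevance labels are made up.

```python
from collections import Counter
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs; tied pairs count as neither."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


def cohens_kappa(a, b):
    """Cohen's kappa: agreement between two raters, corrected for chance agreement."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical graded relevance labels (0-2) for five image-query pairs
human = [2, 0, 1, 2, 1]
model = [2, 0, 2, 2, 1]
print(kendall_tau_a(human, model), cohens_kappa(human, model))
```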
# Data Debugging is NP-hard for Classifiers Trained with SGD ( http://arxiv.org/abs/2408.01365v1 ) License: see link	Zizheng Guo, Pengyu Chen, Yanzhang Fu, Dongjing Miao,	Data debugging aims to find a subset of the training data such that the model obtained by retraining on the subset has better accuracy. Many heuristic approaches have been proposed; however, none of them is guaranteed to solve this problem effectively. This leaves open the question of whether there exists an efficient algorithm to find a subset such that the model obtained by retraining on it has better accuracy. To answer this open question and provide a theoretical basis for further study on developing better algorithms for data debugging, we investigate the computational complexity of the problem named Debuggable.
Given a machine learning model $\mathcal{M}$ obtained by training on dataset $D$ and a test instance $(\mathbf{x}_\text{test},y_\text{test})$ where $\mathcal{M}(\mathbf{x}_\text{test})\neq y_\text{test}$, Debuggable asks whether there exists a subset $D^\prime$ of $D$ such that the model $\mathcal{M}^\prime$ obtained by retraining on $D^\prime$ satisfies $\mathcal{M}^\prime(\mathbf{x}_\text{test})=y_\text{test}$. To cover a wide range of commonly used models, we take an SGD-trained linear classifier as the model and derive the following main results. (1) If the loss function and the dimension of the model are not fixed, Debuggable is NP-complete regardless of the training order in which all the training samples are processed during SGD. (2) For hinge-like loss functions, a comprehensive analysis of the computational complexity of Debuggable is provided. (3) If the loss function is a linear function, Debuggable can be solved in linear time; that is, data debugging can be solved easily in this case. These results not only highlight the limitations of current approaches but also offer new insights into data debugging.	Translated: 2024-08-05 12:48:28 Published: 2024-08-02
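The Debuggable decision problem can be made concrete with a brute-force sketch on a toy dataset. The following are assumptions for illustration, not taken from the paper: a bias-free 2-D linear classifier, hinge loss, zero initialization, learning rate 0.1, one SGD pass in the original sample order, and only non-empty subsets considered. Enumerating subsets costs up to $2^{|D|}$ retrainings, so this illustrates what the NP-completeness result is about; it is not a practical algorithm.

```python
from itertools import combinations


def sgd_linear(data, lr=0.1, epochs=1, dim=2):
    """Hinge-loss SGD for a bias-free linear classifier, visiting samples in the given order."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in data:  # labels y are in {-1, +1}
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin < 1:  # subgradient of max(0, 1 - margin) is -y * x here
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


def debuggable(data, x_test, y_test, **train_kw):
    """Brute force over non-empty subsets D' (order preserved): return the first D' whose
    retrained model predicts y_test on x_test, or None. Costs up to 2^|D| retrainings."""
    n = len(data)
    for k in range(n, 0, -1):  # larger subsets first
        for idx in combinations(range(n), k):
            subset = [data[i] for i in idx]
            if predict(sgd_linear(subset, **train_kw), x_test) == y_test:
                return subset
    return None


# Toy data: the model trained on all of D misclassifies x_test = (1.0, 0.5) as -1
D = [((1.0, 0.0), 1), ((0.0, 1.0), -1), ((1.0, 1.0), -1)]
x_test, y_test = (1.0, 0.5), 1
print(predict(sgd_linear(D), x_test))
print(debuggable(D, x_test, y_test))
```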
# Play to the Score: Stage-Guided Dynamic Multi-Sensory Fusion for Robotic Manipulation ( http://arxiv.org/abs/2408.01366v1 ) License: see link	Ruoxuan Feng, Di Hu, Wenke Ma, Xuelong Li,	Humans possess a remarkable talent for flexibly alternating between different senses when interacting with the environment. Picture a chef skillfully gauging the timing of ingredient additions and controlling the heat according to the colors, sounds, and aromas, seamlessly navigating through every stage of the complex cooking process. This ability is founded upon a thorough comprehension of task stages, as achieving the sub-goal within each stage can necessitate the utilization of different senses. In order to endow robots with a similar ability, we incorporate the task stages divided by sub-goals into the imitation learning process to accordingly guide dynamic multi-sensory fusion. We propose MS-Bot, a stage-guided dynamic multi-sensory fusion method with coarse-to-fine stage understanding, which dynamically adjusts the priority of modalities based on the fine-grained state within the predicted current stage. We train a robot system equipped with visual, auditory, and tactile sensors to accomplish challenging robotic manipulation tasks: pouring and peg insertion with keyway. Experimental results indicate that our approach enables more effective and explainable dynamic fusion, aligning more closely with the human fusion process than existing methods.	Translated: 2024-08-05 12:48:28 Published: 2024-08-02
# Transformers are Universal In-context Learners ( http://arxiv.org/abs/2408.01367v1 ) License: see link	Takashi Furuya, Maarten V. de Hoop, Gabriel Peyré,	Transformers are deep architectures that define "in-context mappings" which enable predicting new tokens based on a given set of tokens (such as a prompt in NLP applications or a set of patches for vision transformers). This work studies in particular the ability of these architectures to handle an arbitrarily large number of context tokens. To mathematically and uniformly address the expressivity of these architectures, we consider the case that the mappings are conditioned on a context represented by a probability distribution of tokens (discrete for a finite number of tokens). The related notion of smoothness corresponds to continuity in terms of the Wasserstein distance between these contexts. We demonstrate that deep transformers are universal and can approximate continuous in-context mappings to arbitrary precision, uniformly over compact token domains. A key aspect of our results, compared to existing findings, is that for a fixed precision, a single transformer can operate on an arbitrary (even infinite) number of tokens. Additionally, it operates with a fixed embedding dimension of tokens (this dimension does not increase with precision) and a fixed number of heads (proportional to the dimension).
The use of MLP layers between multi-head attention layers is also explicitly controlled.	Translated: 2024-08-05 12:48:28 Published: 2024-08-02
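The abstract's modeling choice of representing the context as a probability distribution over tokens can be illustrated with a small NumPy sketch. Assuming a single softmax-attention head with no positional encoding, and random untrained projection matrices as placeholders, duplicating every context token leaves the output unchanged, because the empirical distribution of the context is the same. This is one reason a measure-based view can accommodate an arbitrary number of tokens; it is not the paper's construction.

```python
import numpy as np


def attention(query, context, Wq, Wk, Wv):
    """One softmax-attention head: the query token attends over all context tokens."""
    q = query @ Wq                       # shape (d,)
    k = context @ Wk                     # shape (n, d)
    v = context @ Wv                     # shape (n, d)
    scores = k @ q / np.sqrt(q.shape[0])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ v                   # weighted average of value vectors


rng = np.random.default_rng(0)
d = 4
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
context = rng.standard_normal((5, d))
query = rng.standard_normal(d)

out_once = attention(query, context, Wq, Wk, Wv)
# Same tokens, each appearing twice: the empirical token distribution is unchanged
out_twice = attention(query, np.concatenate([context, context]), Wq, Wk, Wv)
print(np.allclose(out_once, out_twice))
```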
# Minimal operational theories: classical theories with quantum features ( http://arxiv.org/abs/2408.01368v1 ) License: see link	Davide Rolino, Marco Erba, Alessandro Tosini, Paolo Perinotti,	We introduce a class of probabilistic theories where the dynamics of each system is limited to the minimal set of operations consistent with the set of states and allowing for conditional tests. We show that almost all minimal theories with conditioning satisfy two quantum no-go theorems: no-information without disturbance and no-broadcasting. As a relevant example, we construct a minimal toy-theory with conditioning where all systems are classical. The theory has neither incompatible measurements nor preparation uncertainty relations, and it is Kochen-Specker noncontextual; at the same time, it exhibits irreversibility of measurement disturbance, no-information without disturbance, and no-broadcasting. This proves that the latter three properties cannot be understood $per se$ as signatures of non-classicality.	Translated: 2024-08-05 12:48:28 Published: 2024-08-02
# Fabrication and characterization of low-loss Al/Si/Al parallel plate capacitors for superconducting quantum information applications ( http://arxiv.org/abs/2408.01369v1 ) License: see link	Anthony McFadden, Aranya Goswami, Tongyu Zhao, Teun van Schijndel, Trevyn F. Q. Larson, Sudhir Sahu, Stephen Gill, Florent Lecocq, Raymond Simmonds, Chris Palmstrøm,	Increasing the density of superconducting circuits requires compact components; however, superconductor-based capacitors typically perform worse as dimensions are reduced due to loss at surfaces and interfaces. Here, parallel plate capacitors composed of aluminum-contacted, crystalline silicon fins are shown to be a promising technology for use in superconducting circuits by evaluating the performance of lumped element resonators and transmon qubits. High aspect ratio Si-fin capacitors having widths below 300 nm with an approximate total height of 3 $\mu$m are fabricated using anisotropic wet etching of Si(110) substrates followed by aluminum metallization. The single-crystal Si capacitors are incorporated in lumped element resonators and transmons by shunting them with lithographically patterned aluminum inductors and conventional Al/AlO$_x$/Al Josephson junctions, respectively.
Microwave characterization of these devices suggests state-of-the-art performance for superconducting parallel plate capacitors, with low-power internal quality factors of lumped element resonators greater than 500k and qubit $T_1$ times greater than 25 $\mu$s. These results suggest that Si-Fins are a promising technology for applications that require low-loss, compact, superconductor-based capacitors with minimal stray capacitance.	Translated: 2024-08-05 12:48:28 Published: 2024-08-02
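For a rough sense of scale, the fin can be treated as an ideal parallel-plate dielectric. This is an assumption that ignores fringing fields: the fin width $d$ is taken as the plate separation, the fin height $h$ as the plate height, and silicon's standard relative permittivity $\varepsilon_r \approx 11.7$ is used. With the abstract's upper width of $d = 300$ nm and $h = 3\,\mu$m, the capacitance per unit fin length $L$ is:

$$\frac{C}{L} = \frac{\varepsilon_0 \varepsilon_r h}{d} \approx \frac{(8.854\times10^{-12}\,\mathrm{F/m})(11.7)(3\times10^{-6}\,\mathrm{m})}{300\times10^{-9}\,\mathrm{m}} \approx 1.04\times10^{-9}\,\mathrm{F/m} \approx 1.04\,\mathrm{fF}/\mu\mathrm{m}$$

Because the reported widths are below 300 nm, this idealized estimate is a lower bound on the capacitance per unit length of such fins.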
# EVIT: Event-based Visual-Inertial Tracking in Semi-Dense Maps Using Windowed Nonlinear Optimization ( http://arxiv.org/abs/2408.01370v1 ) License: see link	Runze Yuan, Tao Liu, Zijia Dai, Yi-Fan Zuo, Laurent Kneip,	Event cameras are an interesting visual exteroceptive sensor that reacts to brightness changes rather than integrating absolute image intensities. Owing to this design, the sensor exhibits strong performance in situations of challenging dynamics and illumination conditions. While event-based simultaneous tracking and mapping remains a challenging problem, a number of recent works have pointed out the sensor's suitability for prior map-based tracking. By making use of cross-modal registration paradigms, the camera's ego-motion can be tracked across a large spectrum of illumination and dynamics conditions on top of accurate maps that have been created a priori by more traditional sensors. The present paper follows up on a recently introduced event-based geometric semi-dense tracking paradigm, and proposes the addition of inertial signals in order to robustify the estimation. More specifically, the added signals provide strong cues for pose initialization as well as regularization during windowed, multi-frame tracking.
As a result, the proposed framework achieves increased performance under challenging illumination conditions as well as a reduction of the rate at which intermediate event representations need to be registered in order to maintain stable tracking across highly dynamic sequences. Our evaluation focuses on a diverse set of real-world sequences and comprises a comparison of our proposed method against a purely event-based alternative running at different rates.	Translated: 2024-08-05 12:48:28 Published: 2024-08-02
# Spatial-Spectral Morphological Mamba for Hyperspectral Image Classification ( http://arxiv.org/abs/2408.01372v1 ) License: see link	Muhammad Ahmad, Muhammad Hassaan Farooq Butt, Muhammad Usama, Adil Mehmood Khan, Manual Mazzara, Salvatore Distenano,	In recent years, Transformers have garnered significant attention for Hyperspectral Image Classification (HSIC) due to their self-attention mechanism, which provides strong classification performance. However, these models face major challenges in computational efficiency, as their complexity increases quadratically with the sequence length. The Mamba architecture, leveraging a State Space Model, offers a more efficient alternative to Transformers. This paper introduces the Spatial-Spectral Morphological Mamba (MorpMamba) model. In the MorpMamba model, a token generation module first converts the Hyperspectral Image (HSI) patch into spatial-spectral tokens. These tokens are then processed by a morphology block, which computes structural and shape information using depthwise separable convolutional operations. The extracted information is enhanced in a feature enhancement module that adjusts the spatial and spectral tokens based on the center region of the HSI sample, allowing for effective information fusion within each block.
Subsequently, the tokens are refined in a multi-head self-attention block to further improve the feature space. Finally, the combined information is fed into the state space block for classification and the creation of the ground truth map. Experiments on widely used Hyperspectral (HS) datasets demonstrate that the MorpMamba model outperforms both CNN and Transformer models (in parametric efficiency).	Translated: 2024-08-05 12:48:28 Published: 2024-08-02
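The efficiency argument behind the depthwise separable convolutions used in the morphology block above can be made concrete by counting weights (biases ignored). A standard $k\times k$ convolution from $C_{in}$ to $C_{out}$ channels needs $k^2 C_{in} C_{out}$ weights. A depthwise $k\times k$ convolution followed by a $1\times1$ pointwise convolution needs $k^2 C_{in} + C_{in} C_{out}$. The channel counts below are illustrative, not MorpMamba's actual configuration:

$$k=3,\ C_{in}=C_{out}=64:\qquad 3^2\cdot 64\cdot 64 = 36{,}864 \quad\text{vs.}\quad 3^2\cdot 64 + 64\cdot 64 = 576 + 4{,}096 = 4{,}672$$

That is roughly a $7.9\times$ reduction in weights for this layer.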
# Hybrid Coordinate Descent for Efficient Neural Network Learning Using Line Search and Gradient Descent ( http://arxiv.org/abs/2408.01374v1 ) License: see link	Yen-Che Hsiao, Abhishek Dutta,	This paper presents a novel coordinate descent algorithm leveraging a combination of one-directional line search and gradient information for parameter updates for a squared error loss function. Each parameter undergoes updates determined by either the line search or gradient method, contingent upon whether the modulus of the gradient of the loss with respect to that parameter surpasses a predefined threshold. Notably, a larger threshold value enhances algorithmic efficiency. Despite the potentially slower nature of the line search method relative to gradient descent, its parallelizability facilitates computational time reduction. Experimental validation conducted on a 2-layer Rectified Linear Unit network with synthetic data elucidates the impact of hyperparameters on convergence rates and computational efficiency.	Translated: 2024-08-05 12:48:28 Published: 2024-08-02
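A minimal sketch of the hybrid per-coordinate rule on a least-squares loss follows. The abstract does not say which branch applies on which side of the threshold. This sketch assumes that a coordinate whose gradient modulus exceeds the threshold gets an exact line search and the others take a plain gradient step, which is consistent with a larger threshold meaning fewer (slower) line searches. For a squared error loss the one-dimensional line search has a closed form, used here in place of a numerical search. The linear model, threshold, learning rate, and data are illustrative assumptions; the paper's experiments use a 2-layer ReLU network.

```python
import numpy as np


def hybrid_coordinate_descent(X, y, threshold=1.0, lr=0.01, sweeps=200):
    """Coordinate descent on L(w) = 0.5 * ||X w - y||^2.

    Assumed rule: if |dL/dw_j| > threshold, minimize exactly along coordinate j
    (closed-form line search); otherwise take a gradient step of size lr.
    """
    w = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)          # ||X_j||^2 for each column j
    for _ in range(sweeps):
        for j in range(X.shape[1]):
            residual = X @ w - y
            g = X[:, j] @ residual         # partial derivative dL/dw_j
            if abs(g) > threshold and col_sq[j] > 0:
                w[j] -= g / col_sq[j]      # exact 1-D minimizer along coordinate j
            else:
                w[j] -= lr * g             # ordinary gradient step
    return w


rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true
w_hat = hybrid_coordinate_descent(X, y)
print(np.round(w_hat, 4))
```

Raising `threshold` shifts more updates to the cheap gradient branch; the closed-form step is what makes the line-search branch easy to evaluate independently per coordinate.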
# Adaptive Recruitment Resource Allocation to Improve Cohort Representativeness in Participatory Biomedical Datasets ( http://arxiv.org/abs/2408.01375v1 ) License: see link	Victor Borza, Andrew Estornell, Ellen Wright Clayton, Chien-Ju Ho, Russell Rothman, Yevgeniy Vorobeychik, Bradley Malin,	Large participatory biomedical studies (studies that recruit individuals to join a dataset) are gaining popularity and investment, especially for analysis by modern AI methods. Because they purposively recruit participants, these studies are uniquely able to address a lack of historical representation, an issue that has affected many biomedical datasets. In this work, we define representativeness as the similarity of the cohort's distribution over a set of attributes to a target population distribution, and our goal is to mirror the U.S. population across distributions of age, gender, race, and ethnicity. Many participatory studies recruit at several institutions, so we introduce a computational approach to adaptively allocate recruitment resources among sites to improve representativeness. In simulated recruitment of 10,000-participant cohorts from medical centers in the STAR Clinical Research Network, we show that our approach yields a more representative cohort than existing baselines. Thus, we highlight the value of computational modeling in guiding recruitment efforts.	Translated: 2024-08-05 12:48:28 Published: 2024-08-02
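The notion of representativeness above can be illustrated with a toy greedy allocator. The sketch assumes the following, none of which are the paper's definitions: representativeness is one minus the total variation distance between the cohort's attribute distribution and the target; each site yields recruits in fixed, known proportions; and each batch of recruitment effort goes to the site that most improves representativeness. Site names, categories, and proportions are made up.

```python
def tv_distance(p, q):
    """Total variation distance between two distributions over categorical keys."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def normalize(counts):
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def greedy_allocate(sites, target, budget, batch=100):
    """Assign `budget` recruits in batches, each to the site maximizing 1 - TV(cohort, target).

    `sites` maps a site name to its assumed expected category proportions per recruit.
    """
    cohort = {k: 0.0 for k in target}
    plan = {s: 0 for s in sites}

    def score(site):
        trial = {k: cohort[k] + batch * sites[site].get(k, 0.0) for k in cohort}
        return 1.0 - tv_distance(normalize(trial), target)

    for _ in range(budget // batch):
        best = max(sites, key=score)
        for k in cohort:
            cohort[k] += batch * sites[best].get(k, 0.0)
        plan[best] += batch
    return plan, 1.0 - tv_distance(normalize(cohort), target)


target = {"A": 0.6, "B": 0.4}                       # hypothetical target population
sites = {"site1": {"A": 0.9, "B": 0.1},             # hypothetical site yields
         "site2": {"A": 0.2, "B": 0.8}}
plan, representativeness = greedy_allocate(sites, target, budget=1000)
print(plan, round(representativeness, 3))
```

Neither site alone matches the target, so the allocator alternates between them to keep the pooled cohort close to the target mix.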
# Polaron formation in insulators and the key role of hole scattering processes: Band insulators, charge density waves and Mott transition ( http://arxiv.org/abs/2408.01377v1 ) License: see link	Ivan Amelio, Giacomo Mazza, Nathan Goldman,	A mobile impurity immersed in a non-interacting Fermi sea is dressed by the gapless particle-hole excitations of the fermionic medium. This conventional Fermi-polaron setting is well described by the so-called ladder approximation, which consists in neglecting impurity-hole scattering processes. In this work, we analyze polaron formation in the context of insulating states of matter, considering increasing levels of correlation in the medium: band insulators originating from external periodic potentials, spontaneously formed charge density waves, and a Fermi-Hubbard system undergoing a metal-Mott insulator transition. The polaron spectral function is shown to exhibit striking signatures of the underlying fermionic background, such as the single-particle band gap, particle-hole symmetry, and the transition to the Mott state. These signatures are identified within the framework of the Chevy ansatz, i.e. upon restricting the Hilbert space to single particle-hole excitations. Interestingly, we find that the ladder approximation is inaccurate in these band systems, because the particle and hole scattering phase spaces are comparable.
Our results provide a step forward in the understanding of polaron formation in correlated many-body media, which are relevant to both cold-atom and semiconductor experiments.	Translated: 2024-08-05 12:48:28 Published: 2024-08-02
# Resampling and averaging coordinates on data ( http://arxiv.org/abs/2408.01379v1 ) License: see link	Andrew J. Blumberg, Mathieu Carriere, Jun Hou Fung, Michael A. Mandell,	We introduce algorithms for robustly computing intrinsic coordinates on point clouds. Our approach relies on generating many candidate coordinates by subsampling the data and varying hyperparameters of the embedding algorithm (e.g., manifold learning). We then identify a subset of representative embeddings by clustering the collection of candidate coordinates and using shape descriptors from topological data analysis. The final output is the embedding obtained as an average of the representative embeddings using generalized Procrustes analysis. We validate our algorithm on both synthetic data and experimental measurements from genomics, demonstrating robustness to noise and outliers.	Translated: 2024-08-05 12:48:28 Published: 2024-08-02
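The final averaging step named in the abstract, generalized Procrustes analysis, can be sketched with NumPy. Each candidate embedding is centered, scaled to unit Frobenius norm, and rotated onto a running mean by orthogonal Procrustes (via an SVD), and the mean is recomputed for a fixed number of iterations. This is a generic GPA sketch under those assumptions. The clustering and topological shape-descriptor selection that precede it in the paper are not shown, and the synthetic "embeddings" are rotated, noisy copies of one point cloud.

```python
import numpy as np


def normalize_shape(Y):
    """Center the point cloud and scale it to unit Frobenius norm."""
    Y = Y - Y.mean(axis=0)
    return Y / np.linalg.norm(Y)


def procrustes_rotate(Y, ref):
    """Apply the orthogonal R minimizing ||Y R - ref||_F (reflections allowed)."""
    U, _, Vt = np.linalg.svd(Y.T @ ref)
    return Y @ (U @ Vt)


def generalized_procrustes(embeddings, iters=20):
    """Align same-size embeddings (n_points x dim) and return their Procrustes mean."""
    shapes = [normalize_shape(Y) for Y in embeddings]
    mean = shapes[0]
    for _ in range(iters):
        aligned = [procrustes_rotate(Y, mean) for Y in shapes]
        mean = normalize_shape(np.mean(aligned, axis=0))
    return mean, aligned


def rotation(theta):
    return np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])


rng = np.random.default_rng(2)
base = rng.standard_normal((30, 2))
embeddings = [base @ rotation(t) + 0.01 * rng.standard_normal((30, 2)) for t in (0.3, 1.2, 2.5)]
mean, aligned = generalized_procrustes(embeddings)
print([round(float(np.linalg.norm(a - mean)), 4) for a in aligned])
```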
# Coalitions of Large Language Models Increase the Robustness of AI Agents ( http://arxiv.org/abs/2408.01380v1 ) License: see link	Prattyush Mangal, Carol Mak, Theo Kanakis, Timothy Donovan, Dave Braines, Edward Pyzer-Knapp,	The emergence of Large Language Models (LLMs) has fundamentally altered the way we interact with digital systems and has led to the pursuit of LLM-powered AI agents to assist in daily workflows. LLMs, whilst powerful and capable of demonstrating some emergent properties, are not logical reasoners and often struggle to perform well at all sub-tasks carried out by an AI agent to plan and execute a workflow. While existing studies tackle this lack of proficiency by generalised pretraining at a huge scale or by specialised fine-tuning for tool use, we assess whether a system comprising a coalition of pretrained LLMs, each exhibiting specialised performance at individual sub-tasks, can match the performance of single-model agents. The coalition-of-models approach showcases its potential for building robustness and reducing the operational costs of these AI agents by leveraging traits exhibited by specific models. Our findings demonstrate that the need for fine-tuning can be mitigated by considering a coalition of pretrained models, and we believe that this approach can be applied to other non-agentic systems that utilise LLMs.	Translated: 2024-08-05 12:48:28 Published: 2024-08-02
# Explaining a probabilistic prediction on the simplex with Shapley compositions ( http://arxiv.org/abs/2408.01382v1 ) License: see link	Paul-Gauthier Noé, Miquel Perelló-Nieto, Jean-François Bonastre, Peter Flach,	Originating in game theory, Shapley values are widely used for explaining a machine learning model's prediction by quantifying the contribution of each feature's value to the prediction. This requires a scalar prediction as in binary classification, whereas a multiclass probabilistic prediction is a discrete probability distribution, living on a multidimensional simplex. In such a multiclass setting, the Shapley values are typically computed separately on each class in a one-vs-rest manner, ignoring the compositional nature of the output distribution. In this paper, we introduce Shapley compositions as a well-founded way to properly explain a multiclass probabilistic prediction, using the Aitchison geometry from compositional data analysis. We prove that the Shapley composition is the unique quantity satisfying linearity, symmetry and efficiency on the Aitchison simplex, extending the corresponding axiomatic properties of the standard Shapley value. We demonstrate this proper multiclass treatment in a range of scenarios.	Translated: 2024-08-05 12:48:28 Published: 2024-08-02
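A rough sketch of the idea follows, under assumptions not taken from the paper. The class-probability vector is mapped to centered log-ratio (clr) coordinates, which turn the Aitchison simplex into an ordinary vector space. Exact Shapley values are then computed in those coordinates with a baseline-replacement value function, and each is mapped back to the simplex as a composition. The toy softmax model, its weights, and the baseline are hypothetical. The final check is the efficiency property in Aitchison geometry: perturbing the baseline prediction by all the feature compositions recovers the prediction.

```python
from itertools import combinations
from math import factorial

import numpy as np


def closure(v):
    """Rescale a positive vector so it sums to one (a point on the simplex)."""
    return v / v.sum()


def clr(p):
    """Centered log-ratio transform: simplex (Aitchison geometry) -> zero-sum vectors."""
    logp = np.log(p)
    return logp - logp.mean()


def clr_inv(c):
    """Inverse clr: map a zero-sum vector back to the simplex."""
    return closure(np.exp(c - c.max()))


def shapley_compositions(model, x, baseline):
    """Exact Shapley values of clr(model(z)), returned as compositions.

    Assumed value function: features outside S are replaced by `baseline`.
    """
    n = len(x)

    def value(S):
        z = baseline.copy()
        idx = list(S)
        z[idx] = x[idx]
        return clr(model(z))

    compositions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = np.zeros(len(model(x)))
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi += weight * (value(S + (i,)) - value(S))
        compositions.append(clr_inv(phi))
    return compositions


# Hypothetical 3-class softmax model over 2 features
W = np.array([[1.0, -0.5], [0.2, 0.8], [-1.0, 0.3]])


def model(z):
    return closure(np.exp(W @ z))


x = np.array([1.0, 2.0])
baseline = np.zeros(2)
phis = shapley_compositions(model, x, baseline)

# Efficiency: perturbation (elementwise product, then closure) of the baseline
# prediction by every feature's composition recovers model(x).
recombined = closure(model(baseline) * np.prod(phis, axis=0))
print(np.allclose(recombined, model(x)))
```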
# NOLO: Navigate Only Look Once ( http://arxiv.org/abs/2408.01384v1 ) License: see link	Bohan Zhou, Jiangxing Wang, Zongqing Lu,	The in-context learning ability of Transformer models has brought new possibilities to visual navigation. In this paper, we focus on the video navigation setting, where an in-context navigation policy needs to be learned purely from videos in an offline manner, without access to the actual environment. For this setting, we propose Navigate Only Look Once (NOLO), a method for learning a navigation policy that possesses the in-context ability and adapts to new scenes by taking corresponding context videos as input without finetuning or re-training. To enable learning from videos, we first propose a pseudo action labeling procedure using optical flow to recover the action label from egocentric videos. Then, offline reinforcement learning is applied to learn the navigation policy. Through extensive experiments on different scenes, we show that our algorithm outperforms baselines by a large margin, which demonstrates the in-context learning ability of the learned policy.	Translated: 2024-08-05 12:48:28 Published: 2024-08-02
# NeuralBeta: Estimating Beta Using Deep Learning ( http://arxiv.org/abs/2408.01387v1 ) License: see link	Yuxin Liu, Jimin Lin, Achintya Gopal	Traditional approaches to estimating beta in finance often involve rigid assumptions and fail to adequately capture beta dynamics, limiting their effectiveness in use cases like hedging. To address these limitations, we have developed a novel method using neural networks called NeuralBeta, which is capable of handling both univariate and multivariate scenarios and tracking the dynamic behavior of beta. To address the issue of interpretability, we introduce a new output layer inspired by regularized weighted linear regression, which provides transparency into the model's decision-making process. We conducted extensive experiments on both synthetic and market data, demonstrating NeuralBeta's superior performance compared to benchmark methods across various scenarios, especially in instances where beta is highly time-varying, e.g., during regime shifts in the market. This model not only represents an advancement in the field of beta estimation, but also shows potential for applications in other financial contexts that assume linear relationships.	Translated: 2024-08-05 12:38:30 Published: 2024-08-02
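The "regularized weighted linear regression" output layer has a simple closed form in the univariate case. The sketch below solves $\min_{a,b} \sum_t w_t (y_t - a - b x_t)^2 + \lambda b^2$ with weights normalised to sum to one and an unpenalised intercept. In a NeuralBeta-style model the weights would come from a network; here they are supplied by hand, which is an assumption, as are the function name and the synthetic regime-shift data.

```python
import numpy as np

def weighted_ridge_beta(x, y, weights, lam=0.0):
    """Closed-form slope of y on x under per-observation weights and an L2 penalty.

    Weights are normalised to sum to one (so lam acts on that scale); the intercept
    is left unpenalised and is absorbed by centring on the weighted means.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    x_bar, y_bar = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - x_bar) * (y - y_bar))
    var = np.sum(w * (x - x_bar) ** 2)
    return cov / (var + lam)

# Synthetic regime shift: true beta jumps from 0.5 to 1.5 halfway through.
rng = np.random.default_rng(0)
t = np.arange(500)
x = rng.normal(0.0, 0.01, size=500)          # market returns
y = np.where(t < 250, 0.5, 1.5) * x          # asset returns
beta_uniform = weighted_ridge_beta(x, y, np.ones_like(x))
beta_recent = weighted_ridge_beta(x, y, 0.97 ** (499 - t))  # recency weighting
print(beta_uniform, beta_recent)
```

Uniform weights blur the two regimes together, while recency weights track the new beta; a learned weighting is one way to choose between these behaviours adaptively.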
# FT K-Means: A High-Performance K-Means on GPU with Fault Tolerance ( http://arxiv.org/abs/2408.01391v1 ) License: see link	Shixun Wu, Yitong Ding, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Huangliang Dai, Sheng Di, Bryan M. Wong, Zizhong Chen, Franck Cappello	K-Means is a widely used clustering algorithm; however, its efficiency is primarily constrained by the computational cost of distance computation. Existing implementations suffer from suboptimal utilization of computational units and lack resilience against soft errors. To address these challenges, we introduce FT K-Means, a high-performance GPU-accelerated implementation of K-Means with online fault tolerance. We first present a stepwise optimization strategy that achieves competitive performance compared to NVIDIA's cuML library. We further improve FT K-Means with a template-based code generation framework that supports different data types and adapts to different input shapes. A novel warp-level tensor-core error correction scheme is proposed to address the failure of existing fault tolerance methods due to memory asynchronization during copy operations. Our experimental evaluations on NVIDIA T4 and A100 GPUs demonstrate that FT K-Means without fault tolerance outperforms cuML's K-Means implementation, showing a performance increase of 10%-300% in scenarios involving irregular data shapes. Moreover, the fault tolerance feature of FT K-Means introduces an overhead of only 11%, maintaining robust performance even with tens of errors injected per second.	Translated: 2024-08-05 12:38:30 Published: 2024-08-02
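The distance step of K-Means is dominated by a matrix product $G = XC^\top$ via $\lVert x - c\rVert^2 = \lVert x\rVert^2 - 2\,x\cdot c + \lVert c\rVert^2$, which is why GEMM-level fault tolerance applies. The sketch below uses classic algorithm-based fault tolerance (row and column checksums) rather than the paper's warp-level tensor-core scheme; the function names and the injected error are illustrative assumptions.

```python
import numpy as np

def gram_with_checksums(X, C):
    """Point-centroid inner products G = X C^T plus ABFT reference checksums."""
    G = X @ C.T
    col_ref = X.sum(axis=0) @ C.T   # expected column sums of G
    row_ref = X @ C.sum(axis=0)     # expected row sums of G
    return G, col_ref, row_ref

def locate_and_fix(G, col_ref, row_ref, tol=1e-6):
    """Detect a corrupted entry of G; correct it if exactly one row and one column disagree."""
    row_diff = G.sum(axis=1) - row_ref
    col_diff = G.sum(axis=0) - col_ref
    bad_rows = np.flatnonzero(np.abs(row_diff) > tol)
    bad_cols = np.flatnonzero(np.abs(col_diff) > tol)
    if bad_rows.size == 0 and bad_cols.size == 0:
        return G, False                      # checksums agree: no error detected
    G = G.copy()
    if bad_rows.size == 1 and bad_cols.size == 1:
        i, j = bad_rows[0], bad_cols[0]
        G[i, j] -= row_diff[i]               # single error: subtract the discrepancy
    return G, True

def assign_clusters(X, C, G):
    """K-Means assignment step using ||x||^2 - 2 x.c + ||c||^2."""
    d2 = (X ** 2).sum(axis=1)[:, None] - 2.0 * G + (C ** 2).sum(axis=1)[None, :]
    return d2.argmin(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
C = rng.normal(size=(5, 8))
G_clean, col_ref, row_ref = gram_with_checksums(X, C)
G_bad = G_clean.copy()
G_bad[3, 2] += 10.0                          # simulated soft error
G_fixed, detected = locate_and_fix(G_bad, col_ref, row_ref)
labels = assign_clusters(X, C, G_fixed)
print(detected)  # True
```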
# Error correction of transversal controlled-NOT gates for scalable surface code computation ( http://arxiv.org/abs/2408.01393v1 ) License: see link	Kaavya Sahay, Yingjia Lin, Shilin Huang, Kenneth R. Brown, Shruti Puri	Recent experimental advances have made it possible to implement logical multi-qubit transversal gates on surface codes in a multitude of platforms. A transversal controlled-NOT (tCNOT) gate on two surface codes introduces correlated errors across the code blocks and thus requires modified decoding strategies compared to established methods of decoding surface code quantum memory (SCQM) or lattice surgery operations. In this work, we examine and benchmark the performance of three different decoding strategies for the tCNOT for scalable, fault-tolerant quantum computation. In particular, we present a low-complexity decoder based on minimum-weight perfect matching (MWPM) that achieves the same threshold as the SCQM MWPM decoder. We extend our analysis with a study of tailored decoding of a transversal teleportation circuit, along with a comparison between the performance of lattice surgery and transversal operations under Pauli and erasure noise models. Our investigation works towards systematic estimation of the cost of implementing large-scale quantum algorithms based on transversal gates in the surface code.	Translated: 2024-08-05 12:38:29 Published: 2024-08-02
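The correlated errors come from the standard CNOT propagation rules: an X error on a control qubit is copied to its target, and a Z error on a target qubit is copied to its control. A minimal sketch in binary symplectic form follows; the block size `n = 9` and the function name are illustrative.

```python
import numpy as np

def transversal_cnot(xc, zc, xt, zt):
    """Propagate Pauli errors through a transversal CNOT (control block -> target block).

    Each argument is a 0/1 vector over the n physical qubits of one code block.
    Qubit-wise rules: X on a control copies to the target; Z on a target copies to the control.
    """
    xt_new = (xt + xc) % 2
    zc_new = (zc + zt) % 2
    return xc.copy(), zc_new, xt_new, zt.copy()

n = 9
zeros = np.zeros(n, dtype=int)
xc = zeros.copy()
xc[4] = 1   # X error on qubit 4 of the control block
zt = zeros.copy()
zt[0] = 1   # Z error on qubit 0 of the target block
xc2, zc2, xt2, zt2 = transversal_cnot(xc, zeros.copy(), zeros.copy(), zt)
print(np.flatnonzero(xt2))  # [4]
print(np.flatnonzero(zc2))  # [0]
```

After the gate, each original fault shows up in both blocks, so decoding the two blocks independently, as in surface code memory, discards this correlation.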
# Improving Multilingual Neural Machine Translation by Utilizing Semantic and Linguistic Features ( http://arxiv.org/abs/2408.01394v1 ) License: see link	Mengyu Bu, Shuhao Gu, Yang Feng	Many-to-many multilingual neural machine translation can be regarded as the process of integrating semantic features from the source sentences and linguistic features from the target sentences. To enhance zero-shot translation, models need to share knowledge across languages, which can be achieved through auxiliary tasks for learning a universal representation or cross-lingual mapping. To this end, we propose to exploit both semantic and linguistic features between multiple languages to enhance multilingual translation. On the encoder side, we introduce a disentangling learning task that aligns encoder representations by disentangling semantic and linguistic features, thus facilitating knowledge transfer while preserving complete information. On the decoder side, we leverage a linguistic encoder to integrate low-level linguistic features to assist in the target language generation. Experimental results on multilingual datasets demonstrate significant improvement in zero-shot translation compared to the baseline system, while maintaining performance in supervised translation. Further analysis validates the effectiveness of our method in leveraging both semantic and linguistic features. The code is available at https://github.com/ictnlp/SemLing-MNMT.	Translated: 2024-08-05 12:38:29 Published: 2024-08-02
# Non-Hermitian and $\mathcal{PT}$-symmetric extensions of the harmonic and isotonic oscillators ( http://arxiv.org/abs/2408.01397v1 ) License: see link	Aritra Ghosh, Akash Sinha	The Swanson oscillator forms a prototypical example of a $\mathcal{PT}$-symmetric and non-Hermitian system with a quadratic Hamiltonian. The system is described by the generic quadratic Hamiltonian $\hat{H}_{\rm Swanson} = \hbar \Omega_0 \big( \hat{a}^\dagger \hat{a} + \frac{1}{2}\big) + \alpha \hat{a}^2 + \beta ({\hat{a}^\dagger})^2$, where $\Omega_0 > 0$, $\alpha, \beta \in \mathbb{R}$, and $\alpha \neq \beta$. We show that such a system may be realized as a harmonic oscillator in the presence of an imaginary-valued gauge field. In the position representation, we solve the problem exactly, i.e., we find the wavefunctions and the spectrum. We then also propose a similar non-Hermitian and $\mathcal{PT}$-symmetric extension of the isotonic oscillator which is exactly solved for a certain range of the parameters.	Translated: 2024-08-05 12:38:29 Published: 2024-08-02
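As a worked check of the quadratic structure (our own short derivation, not quoted from the paper), write $\hat a = (\hat x + i\hat p)/\sqrt{2}$ with dimensionless quadratures obeying $[\hat x, \hat p] = i$. The Swanson Hamiltonian then becomes a Weyl-ordered quadratic form $\tfrac12\big(A\hat p^2 + B\hat x^2 + D(\hat x\hat p + \hat p\hat x)\big)$, whose levels are $(n + \tfrac12)\sqrt{AB - D^2}$:

$$
\hat{H}_{\rm Swanson} = \frac{\hbar\Omega_0 - \alpha - \beta}{2}\,\hat p^2 + \frac{\hbar\Omega_0 + \alpha + \beta}{2}\,\hat x^2 + \frac{i(\alpha - \beta)}{2}\,(\hat x\hat p + \hat p\hat x),
$$

$$
E_n = \left(n + \tfrac{1}{2}\right)\sqrt{(\hbar\Omega_0)^2 - (\alpha+\beta)^2 + (\alpha-\beta)^2} = \left(n + \tfrac{1}{2}\right)\sqrt{(\hbar\Omega_0)^2 - 4\alpha\beta}, \qquad n = 0, 1, 2, \dots
$$

This reproduces the standard Swanson-oscillator spectrum, which is real when $(\hbar\Omega_0)^2 > 4\alpha\beta$. The imaginary cross term $i(\alpha-\beta)$ is what makes the Hamiltonian non-Hermitian for $\alpha \neq \beta$. Normalisability of the eigenfunctions places further conditions on $\alpha$ and $\beta$ that this sketch does not address.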
# Order Parameter Discovery for Quantum Many-Body Systems ( http://arxiv.org/abs/2408.01400v1 ) License: see link	Nicola Mariella, Tara Murphy, Francesco Di Marcantonio, Khadijeh Najafi, Sofia Vallecorsa, Sergiy Zhuk, Enrique Rico	Quantum phase transitions offer profound insights into fundamental quantum phenomena and enhance our understanding of complex materials and systems. However, identifying quantum phase transitions in the absence of conventional order parameters poses a significant challenge. To address this, we utilize the reduced fidelity susceptibility (RFS) vector field to construct phase diagrams of various quantum systems and then demonstrate its efficacy in reproducing the phase diagrams of established models with known order parameters. To this end, we propose a new method for discovering the necessary order parameters for a given quantum model and illustrate its capability by identifying a suitable order parameter for the Axial Next Nearest Neighbour Interaction (ANNNI) Model. Our analysis, which includes decomposing the observable into its eigen-projectors alongside finite-size scaling, confirms that our method can successfully determine order parameters and is thus capable of characterizing quantum phase transitions.	Translated: 2024-08-05 12:38:29 Published: 2024-08-02
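Fidelity susceptibility signals a transition as a peak in how fast the ground state changes with a control parameter. The sketch below computes the simpler global fidelity susceptibility, $\chi_F(h) = -2\ln\lvert\langle\psi(h)\vert\psi(h+\delta h)\rangle\rvert / \delta h^2$, rather than the paper's reduced (subsystem) variant. It uses exact diagonalisation of a small periodic transverse-field Ising chain, a textbook model with its transition at $h = 1$, standing in for ANNNI. The chain length, scan range and function names are assumptions.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def site_op(op, site, n):
    """Embed a single-qubit operator at `site` of an n-qubit chain."""
    out = np.array([[1.0]])
    for k in range(n):
        out = np.kron(out, op if k == site else np.eye(2))
    return out

def tfim_terms(n):
    """Coupling and field parts of H = -sum_i Z_i Z_{i+1} - h sum_i X_i (periodic)."""
    zz = sum(site_op(Z, i, n) @ site_op(Z, (i + 1) % n, n) for i in range(n))
    xf = sum(site_op(X, i, n) for i in range(n))
    return zz, xf

def ground_state(zz, xf, h):
    _, vecs = np.linalg.eigh(-zz - h * xf)
    return vecs[:, 0]

def fidelity_susceptibility(zz, xf, h, dh=1e-3):
    """Global chi_F from the overlap of neighbouring ground states (sign-insensitive)."""
    overlap = abs(ground_state(zz, xf, h) @ ground_state(zz, xf, h + dh))
    return -2.0 * np.log(overlap) / dh ** 2

zz, xf = tfim_terms(8)
hs = np.linspace(0.5, 1.5, 41)
chis = np.array([fidelity_susceptibility(zz, xf, h) for h in hs])
h_peak = hs[np.argmax(chis)]
print(f"chi_F peaks near h = {h_peak:.3f}")
```

For a finite chain the peak sits slightly below $h = 1$ and sharpens as the chain grows, which is why finite-size scaling is part of this kind of analysis.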
# Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer ( http://arxiv.org/abs/2408.01402v1 ) License: see link	Yu Yang, Pan Xu	Decision Transformer (DT) has emerged as a promising class of algorithms in offline reinforcement learning (RL) tasks, leveraging pre-collected datasets and Transformer's capability to model long sequences. Recent works have demonstrated that using parts of trajectories from training tasks as prompts in DT enhances its performance on unseen tasks, giving rise to Prompt-DT methods. However, collecting data from specific environments can be both costly and unsafe in many scenarios, leading to suboptimal performance and limited few-shot prompt abilities due to the data-hungry nature of Transformer-based models. Additionally, the limited datasets used in pre-training make it challenging for Prompt-DT-type methods to distinguish between various RL tasks through prompts alone. To address these challenges, we introduce the Language model-initialized Prompt Decision Transformer (LPDT), which leverages pre-trained language models for meta-RL tasks and fine-tunes the model using Low-rank Adaptation (LoRA). We further incorporate prompt regularization to effectively differentiate between tasks based on prompt feature representations. Our approach integrates pre-trained language models and RL tasks seamlessly. Extensive empirical studies demonstrate that initializing with a pre-trained language model significantly enhances the performance of Prompt-DT on unseen tasks compared to baseline methods.	Translated: 2024-08-05 12:38:29 Published: 2024-08-02
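LoRA, as used for fine-tuning here, keeps a pre-trained weight $W$ frozen and learns only a low-rank update, so the layer computes $Wx + \tfrac{\alpha}{r}BAx$. This is a minimal NumPy sketch of the standard LoRA formulation. Where LPDT inserts such layers, and the values of `r` and `alpha`, are not stated in the abstract and are assumptions here. Because `B` starts at zero, the adapted layer initially matches the frozen one exactly.

```python
import numpy as np

class LoRALinear:
    """Frozen dense layer W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W, r=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                  # frozen pre-trained weight
        self.A = rng.normal(0.0, 0.01, (r, d_in))   # trainable, small random init
        self.B = np.zeros((d_out, r))               # trainable, zero init
        self.scale = alpha / r

    def __call__(self, x):
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        return self.A.size + self.B.size

rng = np.random.default_rng(1)
W = rng.normal(size=(64, 64))
x = rng.normal(size=(2, 64))
layer = LoRALinear(W)
y_init = layer(x)                          # equals the frozen layer's output
print(layer.trainable_params(), W.size)    # 512 4096
layer.B = rng.normal(size=layer.B.shape)   # stand-in for a training update
y_adapted = layer(x)
```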
# Derivation of Back-propagation for Graph Convolutional Networks using Matrix Calculus and its Application to Explainable Artificial Intelligence ( http://arxiv.org/abs/2408.01408v1 ) License: see link	Yen-Che Hsiao, Rongting Yue, Abhishek Dutta	This paper provides a comprehensive and detailed derivation of the backpropagation algorithm for graph convolutional neural networks using matrix calculus. The derivation is extended to include arbitrary element-wise activation functions and an arbitrary number of layers. The study addresses two fundamental problems, namely node classification and link prediction. To validate our method, we compare it with reverse-mode automatic differentiation. The experimental results demonstrate that the median sum of squared errors of the updated weight matrices, when comparing our method to the approach using reverse-mode automatic differentiation, falls within the range of $10^{-18}$ to $10^{-14}$. These outcomes are obtained from conducting experiments on a five-layer graph convolutional network, applied to a node classification problem on Zachary's karate club social network and a link prediction problem on a drug-drug interaction network. Finally, we show how the derived closed-form solution can facilitate the development of explainable AI and sensitivity analysis.	Translated: 2024-08-05 12:38:29 Published: 2024-08-02
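For a single GCN layer $H' = \sigma(\hat A H W)$ with squared loss $L = \tfrac12\lVert H' - Y\rVert_F^2$, matrix calculus gives $\partial L/\partial W = (\hat A H)^\top\big[(H' - Y)\odot\sigma'(\hat A H W)\big]$. The sketch below checks this closed form against central finite differences, a lightweight stand-in for the paper's comparison with reverse-mode automatic differentiation. The 5-node ring graph, `tanh` activation and random features are illustrative assumptions.

```python
import numpy as np

def normalized_adjacency(A):
    """A_hat = D^{-1/2} (A + I) D^{-1/2}, the usual GCN propagation matrix."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def gcn_layer_loss_and_grad(A_hat, H, W, Y):
    """One layer H' = tanh(A_hat H W) with loss 0.5 * ||H' - Y||_F^2 and its gradient in W."""
    P = A_hat @ H                           # aggregated neighbour features
    out = np.tanh(P @ W)
    loss = 0.5 * np.sum((out - Y) ** 2)
    delta = (out - Y) * (1.0 - out ** 2)    # dL/dZ for Z = P W (tanh' = 1 - tanh^2)
    return loss, P.T @ delta

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0, 1], [1, 0, 1, 0, 0], [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1], [1, 0, 0, 1, 0]], dtype=float)
A_hat = normalized_adjacency(A)
H = rng.normal(size=(5, 3))
W = rng.normal(size=(3, 2))
Y = rng.normal(size=(5, 2))
loss, grad = gcn_layer_loss_and_grad(A_hat, H, W, Y)

# Central finite differences, one weight at a time.
eps = 1e-6
num = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        num[i, j] = (gcn_layer_loss_and_grad(A_hat, H, Wp, Y)[0]
                     - gcn_layer_loss_and_grad(A_hat, H, Wm, Y)[0]) / (2 * eps)
print(np.max(np.abs(num - grad)))
```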
# Conditional LoRA Parameter Generation ( http://arxiv.org/abs/2408.01415v1 ) License: see link	Xiaolong Jin, Kai Wang, Dongwen Tang, Wangbo Zhao, Yukun Zhou, Junshu Tang, Yang You	Generative models have achieved remarkable success in image, video, and text domains. Inspired by this, researchers have explored utilizing generative models to generate neural network parameters. However, these efforts have been limited by the parameter size and the practicality of generating high-performance parameters. In this paper, we propose COND P-DIFF, a novel approach that demonstrates the feasibility of controllable high-performance parameter generation, particularly for LoRA (Low-Rank Adaptation) weights, during the fine-tuning process. Specifically, we employ an autoencoder to extract efficient latent representations for parameters. We then train a conditional latent diffusion model to synthesize high-performing model parameters from random noise based on specific task conditions. Experimental results in both computer vision and natural language processing domains consistently demonstrate that COND P-DIFF can generate high-performance parameters conditioned on the given task. Moreover, we observe that the parameter distribution generated by COND P-DIFF differs from the distribution obtained through normal optimization methods, indicating a certain level of generalization capability. Our work paves the way for further exploration of condition-driven parameter generation, offering a promising direction for task-specific adaptation of neural networks.	Translated: 2024-08-05 12:38:29 Published: 2024-08-02
# The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability ( http://arxiv.org/abs/2408.01416v1 ) License: see link	Aaron Mueller, Jannik Brinkmann, Millicent Li, Samuel Marks, Koyena Pal, Nikhil Prakash, Can Rager, Aruna Sankaranarayanan, Arnab Sen Sharma, Jiuding Sun, Eric Todd, David Bau, Yonatan Belinkov	Interpretability provides a toolset for understanding how and why neural networks behave in certain ways. However, there is little unity in the field: most studies employ ad-hoc evaluations and do not share theoretical foundations, making it difficult to measure progress and compare the pros and cons of different techniques. Furthermore, while mechanistic understanding is frequently discussed, the basic causal units underlying these mechanisms are often not explicitly defined. In this paper, we propose a perspective on interpretability research grounded in causal mediation analysis. Specifically, we describe the history and current state of interpretability taxonomized according to the types of causal units (mediators) employed, as well as methods used to search over mediators. We discuss the pros and cons of each mediator, providing insights as to when particular kinds of mediators and search methods are most appropriate depending on the goals of a given study. We argue that this framing yields a more cohesive narrative of the field, as well as actionable insights for future work. Specifically, we recommend a focus on discovering new mediators with better trade-offs between human-interpretability and compute-efficiency, and which can uncover more sophisticated abstractions from neural networks than the primarily linear mediators employed in current work. We also argue for more standardized evaluations that enable principled comparisons across mediator types, such that we can better understand when particular causal units are better suited to particular use cases.	Translated: 2024-08-05 12:38:29 Published: 2024-08-02
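Causal mediation analysis treats an internal activation as a mediator. Its indirect effect is the change in output when only that activation is set to the value it would take under a counterfactual input, a procedure often called activation patching. Below is a minimal sketch on a hypothetical two-layer ReLU network, where each hidden unit is one candidate mediator. Because the readout is linear, the per-unit indirect effects add up to the total effect, which the code checks.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # hypothetical toy network: 3 inputs, 4 hidden units
w2 = rng.normal(size=4)

def forward(x, patch=None):
    """Run the toy network; `patch` maps hidden-unit index -> activation to force."""
    h = np.maximum(W1 @ x, 0.0)
    if patch:
        h = h.copy()
        for k, v in patch.items():
            h[k] = v
    return float(w2 @ h)

def indirect_effects(x, x_cf):
    """Effect on y(x) of setting each hidden unit (mediator) to its counterfactual value."""
    h_cf = np.maximum(W1 @ x_cf, 0.0)
    base = forward(x)
    return np.array([forward(x, {k: h_cf[k]}) - base for k in range(len(h_cf))])

x = np.array([1.0, -0.5, 2.0])
x_cf = np.array([-1.0, 0.5, 0.0])
effects = indirect_effects(x, x_cf)
total = forward(x_cf) - forward(x)
print(np.round(effects, 3), round(total, 3))
```

With a nonlinear readout the per-mediator effects would generally not sum to the total effect, which is one reason the choice of mediator matters.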
# Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs ( http://arxiv.org/abs/2408.01417v1 ) License: see link	Yilun Hua, Yoav Artzi	Humans spontaneously use increasingly efficient language as interactions progress, by adapting and forming ad-hoc conventions. This phenomenon has been studied extensively using reference games, showing properties of human language that go beyond relaying intents. It remains unexplored whether multimodal large language models (MLLMs) similarly increase communication efficiency during interactions, and what mechanisms they may adopt for this purpose. We introduce ICCA, an automated framework to evaluate such conversational adaptation as an in-context behavior in MLLMs. We evaluate several state-of-the-art MLLMs, and observe that while they may understand the increasingly efficient language of their interlocutor, they do not spontaneously make their own language more efficient over time. This latter ability can only be elicited in some models (e.g., GPT-4) with heavy-handed prompting. This shows that this property of linguistic interaction does not arise from current training regimes, even though it is a common hallmark of human language. ICCA is available at https://github.com/lil-lab/ICCA.	Translated: 2024-08-05 12:38:29 Published: 2024-08-02
# DebateQA: Evaluating Question Answering on Debatable Knowledge ( http://arxiv.org/abs/2408.01419v1 ) License: see link	Rongwu Xu, Xuan Qi, Zehan Qi, Wei Xu, Zhijiang Guo	The rise of large language models (LLMs) has enabled us to seek answers to inherently debatable questions on LLM chatbots, necessitating a reliable way to evaluate their ability. However, traditional QA benchmarks, which assume fixed answers, are inadequate for this purpose. To address this, we introduce DebateQA, a dataset of 2,941 debatable questions, each accompanied by multiple human-annotated partial answers that capture a variety of perspectives. We develop two metrics: Perspective Diversity, which evaluates the comprehensiveness of perspectives, and Dispute Awareness, which assesses if the LLM acknowledges the question's debatable nature. Experiments demonstrate that both metrics align with human preferences and are stable across different underlying models. Using DebateQA with two metrics, we assess 12 popular LLMs and retrieval-augmented generation methods. Our findings reveal that while LLMs generally excel at recognizing debatable issues, their ability to provide comprehensive answers encompassing diverse perspectives varies considerably.	Translated: 2024-08-05 12:38:29 Published: 2024-08-02
# Mission Impossible: A Statistical Perspective on Jailbreaking LLMs ( http://arxiv.org/abs/2408.01420v1 ) License: see link	Jingtong Su, Julia Kempe, Karen Ullrich	Large language models (LLMs) are trained on a deluge of text data with limited quality control. As a result, LLMs can exhibit unintended or even harmful behaviours, such as leaking information, fake news or hate speech. Countermeasures, commonly referred to as preference alignment, include fine-tuning the pretrained LLMs with carefully crafted text examples of desired behaviour. Even then, empirical evidence shows preference aligned LLMs can be enticed to harmful behaviour. This so-called jailbreaking of LLMs is typically achieved by adversarially modifying the input prompt to the LLM. Our paper provides theoretical insights into the phenomenon of preference alignment and jailbreaking from a statistical perspective. Under our framework, we first show that pretrained LLMs will mimic harmful behaviour if present in the training corpus. Under that same framework, we then introduce a statistical notion of alignment, and lower-bound the jailbreaking probability, showing that it is unpreventable under reasonable assumptions. Based on our insights, we propose an alteration to the currently prevalent alignment strategy RLHF. Specifically, we introduce a simple modification to the RLHF objective, which we call E-RLHF, that aims to increase the likelihood of safe responses. E-RLHF incurs no additional training cost, and is compatible with other methods. Empirically, we demonstrate that E-RLHF outperforms RLHF on all alignment problems put forward by the AdvBench and HarmBench projects without sacrificing model performance as measured by the MT-Bench project.	Translated: 2024-08-05 12:38:29 Published: 2024-08-02
# Prompt Recursive Search: A Living Framework with Adaptive Growth in LLM Auto-Prompting ( http://arxiv.org/abs/2408.01423v1 ) License: see link	Xiangyu Zhao, Chengqian Ma	Large Language Models (LLMs) exhibit remarkable proficiency in addressing a diverse array of tasks within the Natural Language Processing (NLP) domain, with various prompt design strategies significantly augmenting their capabilities. However, these prompts, while beneficial, each possess inherent limitations. The primary prompt design methodologies are twofold: The first, exemplified by the Chain of Thought (CoT), involves manually crafting prompts specific to individual datasets, hence termed Expert-Designed Prompts (EDPs). Once these prompts are established, they are unalterable, and their effectiveness is capped by the expertise of the human designers. When applied to LLMs, the static nature of EDPs results in a uniform approach to both simple and complex problems within the same dataset, leading to the inefficient use of tokens for straightforward issues. The second method involves prompts autonomously generated by the LLM, known as LLM-Derived Prompts (LDPs), which provide tailored solutions to specific problems, mitigating the limitations of EDPs. However, LDPs may encounter a decline in performance when tackling complex problems due to the potential for error accumulation during the solution planning process. To address these challenges, we have conceived a novel Prompt Recursive Search (PRS) framework that leverages the LLM to generate solutions specific to the problem, thereby conserving tokens. The framework incorporates an assessment of problem complexity and an adjustable structure, ensuring a reduction in the likelihood of errors. We have substantiated the efficacy of the PRS framework through extensive experiments using LLMs with different numbers of parameters across a spectrum of datasets in various domains. Compared to the CoT method, the PRS method has increased the accuracy on the BBH dataset by 8% using the Llama3-7B model, achieving a 22% improvement.	Translated: 2024-08-05 12:38:29 Published: 2024-08-02
# Generalised Circuit Partitioning for Distributed Quantum Computing ( http://arxiv.org/abs/2408.01424v1 ) License: see link	Felix Burt, Kuan-Cheng Chen, Kin Leung	Distributed quantum computing (DQC) is a new paradigm aimed at scaling up quantum computing via the interconnection of smaller quantum processing units (QPUs). Shared entanglement allows teleportation of both states and gates between QPUs. This leads to an attractive horizontal scaling of quantum processing power, which comes at the expense of the additional time and noise introduced by entanglement sharing protocols. Consequently, methods for partitioning quantum circuits across multiple QPUs should aim to minimise the amount of entanglement-based communication required between distributed QPUs. Existing protocols tend to focus primarily on optimising entanglement costs for gate teleportation or state teleportation to cover operations between QPUs, rather than both at the same time. The most general form of the problem should treat gate and state teleportation on the same footing, allowing minimal cost circuit partitions through a combination of the two. This work introduces a graph-based formulation which allows joint optimisation of gate and state teleportation cost, including extensions of gate teleportation which group gates together for distribution using common resources. The formulation permits low e-bit cost for a variety of circuit types. Using a basic genetic algorithm, improved performance over state-of-the-art methods is obtained in terms of both average e-bit cost and time scaling.	Translated: 2024-08-05 12:38:29 Published: 2024-08-02
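The baseline that the paper generalises is easy to state: assign qubits to QPUs and pay one e-bit (Bell pair) per two-qubit gate whose qubits end up on different QPUs, assuming plain one-gate-per-e-bit gate teleportation. The sketch below computes that cost and brute-forces the best balanced bipartition of a tiny circuit. It deliberately ignores state teleportation and the gate grouping the paper adds, and the gate list is made up.

```python
from itertools import combinations

def cut_gate_count(gates, assignment):
    """Number of two-qubit gates whose qubits sit on different QPUs.

    Under plain gate teleportation each such gate consumes one shared e-bit.
    """
    return sum(1 for a, b in gates if assignment[a] != assignment[b])

def best_balanced_bipartition(n_qubits, gates):
    """Exhaustively search equal-size two-QPU assignments (only viable for tiny circuits)."""
    best = None
    for group in combinations(range(n_qubits), n_qubits // 2):
        assignment = [0 if q in group else 1 for q in range(n_qubits)]
        cost = cut_gate_count(gates, assignment)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best

# Hypothetical 4-qubit circuit given as a sequence of two-qubit gates.
gates = [(0, 1), (0, 1), (2, 3), (1, 2)]
best = best_balanced_bipartition(4, gates)
print(best)  # (1, [0, 0, 1, 1])
```

Exhaustive search grows combinatorially with the qubit count, which is why heuristics such as the genetic algorithm mentioned above are used at scale.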
# 『良きボットが常に限界を知る』:機械の自己自信による自律的なシステム決定能力の評価 "A Good Bot Always Knows Its Limitations": Assessing Autonomous System Decision-making Competencies through Factorized Machine Self-confidence ( http://arxiv.org/abs/2407.19631v2 ) ライセンス: Link先を確認	Brett Israelsen, Nisar R. Ahmed, Matthew Aitken, Eric W. Frew, Dale A. Lawrence, Brian M. Argrow,	(参考訳) インテリジェントマシンは、タスク完了時の能力を評価するにはどうすればよいか? この問題は、アルゴリズムで推論し、不確実性の下で決定する自律システムに焦点が当てられている。ここでは、機械の自信 - エージェントの世界の状況とそれ自身に関する知識の自己評価に基づくメタ推論の形式、およびタスクの推論と実行の能力 - が、エージェントに多くの卓越した計算可能で有用な能力指標をもたらすと論じられている。本稿では,この概念を計算フレームワークFaMSeC(Factized Machine Self-confidence)の形で実現し,アルゴリズム決定過程を駆動する因子を包括的に記述した上で,結果評価,解法品質,モデル品質,アライメント品質,過去の経験を述べる。 FaMSeCでは、自己信頼指標はマルコフ決定プロセスのような確率的意思決定アルゴリズムの幅広いクラスに埋め込まれた階層的な「確率問題解決統計」から導かれる。本発明の問題解決統計は、情報提供者(例えば、非熟練者またはエキスパートシステム設計者)によって、各種意思決定能力要因のそれぞれに規定される与えられた、与えられた能力基準に対する確率的超越マージンを評価して評価する。このアプローチは、「適合のアルゴリズム的良さ」の評価を、人間の解釈可能な能力自己評価レポートという形で、多種多様な自律エージェントの設計に容易に組み込むことを可能にする。マルコフ決定プロセスエージェントの詳細な説明と応用例は、メタユーティリティ関数、行動シミュレーション、代理予測モデルを用いて、可能なタスクコンテキストに対して2つのFaMSeC因子(アウトカムアセスメントと解法品質)を計算し、レポートする方法を示している。 How can intelligent machines assess their competencies in completing tasks? This question has come into focus for autonomous systems that algorithmically reason and make decisions under uncertainty. It is argued here that machine self-confidence - a form of meta-reasoning based on self-assessments of an agent's knowledge about the state of the world and itself, as well as its ability to reason about and execute tasks - leads to many eminently computable and useful competency indicators for such agents. This paper presents a culmination of work on this concept in the form of a computational framework called Factorized Machine Self-confidence (FaMSeC), which provides a holistic engineering-focused description of factors driving an algorithmic decision-making process, including: outcome assessment, solver quality, model quality, alignment quality, and past experience. 
In FaMSeC, self-confidence indicators are derived from hierarchical "problem-solving statistics" embedded within broad classes of probabilistic decision-making algorithms such as Markov decision processes. The problem-solving statistics are obtained by evaluating and grading probabilistic exceedance margins with respect to given competency standards, which are specified for each of the various decision-making competency factors by the informee (e.g., a non-expert user or an expert system designer). This approach allows "algorithmic goodness of fit" evaluations to be easily incorporated into the design of many kinds of autonomous agents in the form of human-interpretable competency self-assessment reports. Detailed descriptions and application examples for a Markov decision process agent show how two of the FaMSeC factors (outcome assessment and solver quality) can be computed and reported for a range of possible tasking contexts through novel use of meta-utility functions, behavior simulations, and surrogate prediction models.	翻訳日:2024-08-05 12:28:46 公開日:2024-08-02
# 自己推論による検索拡張言語モデルの改善 Improving Retrieval Augmented Language Model with Self-Reasoning ( http://arxiv.org/abs/2407.19813v2 ) ライセンス: Link先を確認	Yuan Xia, Jingbo Zhou, Zhenhui Shi, Jun Chen, Haifeng Huang,	(参考訳) 検索拡張言語モデル(Retrieval-Augmented Language Model, RALM)は、推論時に外部知識を取り入れることで知識集約型タスクにおいて顕著な性能を示しており、これにより大規模言語モデル(LLM)に内在する事実に関する幻覚が緩和される。こうした進歩にもかかわらず、RALMの実装には、特に信頼性とトレーサビリティに関する課題が残っている。具体的には、無関係な文書の検索は役に立たない応答の生成を招き、LLMの性能を悪化させることさえあり、また生成された出力に適切な引用がないことは、モデルの信頼性を検証する取り組みを困難にする。そこで本研究では、RALMの信頼性とトレーサビリティの向上を目的とした新しい自己推論フレームワークを提案する。その中心的なアイデアは、LLM自身が生成した推論軌跡を活用することである。このフレームワークは、関連性認識プロセス、エビデンス認識選択プロセス、軌跡分析プロセスの3つのプロセスで自己推論軌跡を構築する。我々は4つの公開データセット(短答形式QAデータセット2つ、長文形式QAデータセット1つ、事実検証データセット1つ)でフレームワークを評価し、わずか2,000件の訓練サンプルを用いるだけで既存の最先端モデルを上回り、GPT-4に匹敵する性能を達成できることを示して、本手法の優位性を実証した。 The Retrieval-Augmented Language Model (RALM) has shown remarkable performance on knowledge-intensive tasks by incorporating external knowledge during inference, which mitigates the factual hallucinations inherent in large language models (LLMs). Despite these advancements, challenges persist in the implementation of RALMs, particularly concerning their reliability and traceability. To be specific, irrelevant document retrieval may result in unhelpful response generation or even deteriorate the performance of LLMs, while the lack of proper citations in generated outputs complicates efforts to verify the trustworthiness of the models. To this end, we propose a novel self-reasoning framework aimed at improving the reliability and traceability of RALMs, whose core idea is to leverage reasoning trajectories generated by the LLM itself. The framework involves constructing self-reason trajectories with three processes: a relevance-aware process, an evidence-aware selective process, and a trajectory analysis process. 
We have evaluated our framework across four public datasets (two short-form QA datasets, one long-form QA dataset, and one fact verification dataset) to demonstrate the superiority of our method, which can outperform existing state-of-the-art models and can achieve comparable performance with GPT-4, while only using 2,000 training samples.	翻訳日:2024-08-05 12:28:46 公開日:2024-08-02
# MSMA:マルチソースデータ統合によるコネクテッド・自動運転車両環境におけるマルチエージェント軌道予測 MSMA: Multi-agent Trajectory Prediction in Connected and Autonomous Vehicle Environment with Multi-source Data Integration ( http://arxiv.org/abs/2407.21310v2 ) ライセンス: Link先を確認	Xi Chen, Rahul Bhadani, Zhanbo Sun, Larry Head,	(参考訳) 衝突のない経路計画には周囲の車両軌道の予測が不可欠である。本研究では、コネクテッド・自動運転車両(CAV)が中心エージェントとなり、センサと通信技術の両方を利用して、自動運転車両(AV)、コネクテッド車両(CV)、人間が運転する車両(HDV)からなる周囲の交通を知覚するシナリオに焦点を当てる。我々の軌道予測タスクは、検出された周辺車両すべてを対象としている。センサと通信技術の両方から得られるマルチソースデータを効果的に統合するために、マルチソースデータ融合のためのクロスアテンションモジュールを用いたMSMAと呼ばれる深層学習フレームワークを提案する。ベクトル地図データを用いて文脈情報を提供する。軌道データセットは、合成したデータ誤差を加えたうえでCARLAシミュレータで収集した。数値実験により、混合交通流のシナリオにおいて、異なるソースからのデータの統合が環境の理解を高めることが示された。これにより、特にCVの市場浸透率が高い状況において、軌道予測精度が大きく向上する。コードは https://github.com/xichennn/MSMA で入手できる。 The prediction of surrounding vehicle trajectories is crucial for collision-free path planning. In this study, we focus on a scenario where a connected and autonomous vehicle (CAV) serves as the central agent, utilizing both sensors and communication technologies to perceive its surrounding traffic consisting of autonomous vehicles (AVs), connected vehicles (CVs), and human-driven vehicles (HDVs). Our trajectory prediction task is aimed at all the detected surrounding vehicles. To effectively integrate the multi-source data from both sensor and communication technologies, we propose a deep learning framework called MSMA utilizing a cross-attention module for multi-source data fusion. Vector map data is utilized to provide contextual information. The trajectory dataset is collected in the CARLA simulator, with synthesized data errors introduced. Numerical experiments demonstrate that in a mixed traffic flow scenario, the integration of data from different sources enhances our understanding of the environment. This notably improves trajectory prediction accuracy, particularly in situations with a high CV market penetration rate. The code is available at: https://github.com/xichennn/MSMA.	翻訳日:2024-08-05 12:28:46 公開日:2024-08-02
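The MSMA abstract says multi-source fusion is done with a cross-attention module but gives no details. As a generic illustration only (not the authors' module), the sketch below implements single-head scaled dot-product cross-attention in plain Python: queries come from one source (hypothetically, sensor-derived track features) and keys/values from another (hypothetically, V2X-communicated features). The function names, the 2-D toy features, and the absence of learned projection matrices are all simplifying assumptions.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention where queries come from one source and
    keys/values from another. Inputs are lists of equal-length vectors."""
    d = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output is a convex combination of the value vectors.
        fused.append([sum(w * v[i] for w, v in zip(weights, values))
                      for i in range(len(values[0]))])
    return fused

# Toy example: 2 sensor-track queries attend over 3 communicated tracks.
sensor_feats = [[1.0, 0.0], [0.0, 1.0]]
comm_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
comm_vals = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
fused = cross_attention(sensor_feats, comm_keys, comm_vals)
```

Each fused row weights the communicated tracks whose keys align with the query most heavily; a practical fusion module would add learned query/key/value projections and multiple heads.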
# 分離型3次元シーン表現による新しい視点からの将来映像の予測 Forecasting Future Videos from Novel Views via Disentangled 3D Scene Representation ( http://arxiv.org/abs/2407.21450v2 ) ライセンス: Link先を確認	Sudhir Yarram, Junsong Yuan,	(参考訳) 空間と時間における映像外挿(VEST)により、視聴者は3Dシーンの未来を予測し、新しい視点から見ることができる。近年の手法は、階層化されたシーン形状、動き予測、新規視点合成をまとめてモデル化することを目指して絡み合った表現を学習するが、各シーン層で単純化されたアフィン運動とホモグラフィに基づくワーピングを仮定するため、映像外挿が不正確になる。本手法では、絡み合ったシーン表現とレンダリングの代わりに、2Dシーンを3D点群に持ち上げることでシーンの幾何をシーンの動きから分離し、新しい視点からの将来映像の高品質なレンダリングを可能にする。将来の3Dシーン動作をモデル化するために、まず自己運動(ego-motion)を予測し、その後に動的物体(車、人など)の残差運動を予測する、分離された2段階の手法を提案する。このアプローチは、自己運動と動的物体運動の絡み合いに起因する不正確さを減らすことで、より正確な動き予測を可能にし、自己運動予測の改善は視覚的な結果を大きく向上させうる。2つの都市シーンデータセットでの大規模な実験分析により、強力なベースラインと比較して提案手法が優れた性能を示すことを実証した。 Video extrapolation in space and time (VEST) enables viewers to forecast a 3D scene into the future and view it from novel viewpoints. Recent methods propose to learn an entangled representation, aiming to model layered scene geometry, motion forecasting and novel view synthesis together, while assuming simplified affine motion and homography-based warping at each scene layer, leading to inaccurate video extrapolation. Instead of entangled scene representation and rendering, our approach chooses to disentangle scene geometry from scene motion, via lifting the 2D scene to 3D point clouds, which enables high quality rendering of future videos from novel views. To model future 3D scene motion, we propose a disentangled two-stage approach that initially forecasts ego-motion and subsequently the residual motion of dynamic objects (e.g., cars, people). This approach ensures more precise motion predictions by reducing inaccuracies from entanglement of ego-motion with dynamic object motion, where better ego-motion forecasting could significantly enhance the visual outcomes. Extensive experimental analysis on two urban scene datasets demonstrates superior performance of our proposed method in comparison to strong baselines.	翻訳日:2024-08-05 12:28:46 公開日:2024-08-02
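The VEST abstract describes lifting the 2D scene to a 3D point cloud but does not say how. A common way to do this, assuming a per-pixel depth map and pinhole camera intrinsics (fx, fy, cx, cy), is back-projection; the sketch below shows only that standard step and makes no claim about how depth is estimated in the paper. The helper names are hypothetical.

```python
def unproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth Z into camera coordinates using a
    pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def depth_map_to_points(depth_map, fx, fy, cx, cy):
    """Lift an H x W depth map (a list of rows) to a list of 3D points,
    skipping pixels without valid depth (<= 0)."""
    points = []
    for v, row in enumerate(depth_map):
        for u, d in enumerate(row):
            if d > 0:
                points.append(unproject(u, v, d, fx, fy, cx, cy))
    return points

# Toy 2 x 2 depth map with one missing pixel and unit intrinsics.
points = depth_map_to_points([[2.0, 2.0], [0.0, 4.0]], fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

Once pixels carry 3D coordinates, a forecast camera (ego) motion can be applied as a rigid transform and the points re-rendered from a new viewpoint; that rendering step is outside this sketch.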
# ベストファースト一般化計画のための並列戦略 Parallel Strategies for Best-First Generalized Planning ( http://arxiv.org/abs/2407.21485v2 ) ライセンス: Link先を確認	Alejandro Fernández-Alburquerque, Javier Segovia-Aguas,	(参考訳) 近年、最先端の計画ソルバーと一般化計画(GP: 複数の古典的計画インスタンスを解くことのできるアルゴリズム的な解の自動合成を研究するAIの研究分野)との性能ギャップを埋めることへの関心が再び高まっている。現在の進歩の1つはBest-First Generalized Planning (BFGP) の導入である。BFGPは、現代のプランナーの基礎の一つであるヒューリスティック探索を用いて探索できる新しい解空間に基づくGPアルゴリズムである。本稿では、性能ギャップを埋めるうえでもう一つの重要な要素である並列探索手法をBFGPに適用し、その効果を評価する。まず、BFGPが並列化に適している理由と、古典的プランナーとの相違点について論じる。次に、コア数に対して良好にスケールする2つの単純な共有メモリ並列戦略を提案する。 In recent years, there has been renewed interest in closing the performance gap between state-of-the-art planning solvers and generalized planning (GP), a research area of AI that studies the automated synthesis of algorithmic-like solutions capable of solving multiple classical planning instances. One of the current advancements has been the introduction of Best-First Generalized Planning (BFGP), a GP algorithm based on a novel solution space that can be explored with heuristic search, one of the foundations of modern planners. This paper evaluates the application of parallel search techniques to BFGP, another critical component in closing the performance gap. We first discuss why BFGP is well suited for parallelization and some of its differentiating characteristics from classical planners. Then, we propose two simple shared-memory parallel strategies with good scaling with the number of cores.	翻訳日:2024-08-05 12:28:46 公開日:2024-08-02
# フィボナッチ・エニオンをシミュレートするための最小量子回路 Minimal Quantum Circuits for Simulating Fibonacci Anyons ( http://arxiv.org/abs/2407.21761v2 ) ライセンス: Link先を確認	Sary Bseiso, Joel Pommerening, Richard R. Allen, Steven H. Simon, Layla Hormozi,	(参考訳) フィボナッチ位相秩序は、普遍的なトポロジカル量子計算を実現するための最有力候補である。我々は、レヴィン=ウェン弦網モデルで実現される二重フィボナッチ位相秩序の非アーベル的性質を実証するための最小の量子回路を考案する。我々の回路は、基底状態を効果的に初期化し、励起を生成し、それらをツイストおよびブレイドする操作を、すべて可能な限り最小の格子で行う。さらに、単一の量子ビット測定を行うことで、複数の励起の融合振幅とブレイディング位相を決定する方法も設計する。二重フィボナッチモデルの融合チャネルはわずか3量子ビットで検出でき、ツイスト位相は5量子ビットで測定でき、ブレイディングは9量子ビットで実証できることを示す。これらの設計は、フィボナッチ・エニオンの性質を示すための最も単純な設定を提供し、現代の多くの量子アーキテクチャ上で実装するための現実的な青写真として利用できる。 The Fibonacci topological order is the prime candidate for the realization of universal topological quantum computation. We devise minimal quantum circuits to demonstrate the non-Abelian nature of the doubled Fibonacci topological order, as realized in the Levin-Wen string net model. Our circuits effectively initialize the ground state, create excitations, twist and braid them, all in the smallest lattices possible. We further design methods to determine the fusion amplitudes and braiding phases of multiple excitations by carrying out a single qubit measurement. We show that the fusion channels of the doubled Fibonacci model can be detected using only three qubits, twisting phases can be measured using five, and braiding can be demonstrated using nine qubits. These designs provide the simplest possible settings for demonstrating the properties of Fibonacci anyons and can be used as realistic blueprints for implementation on many modern quantum architectures.	翻訳日:2024-08-05 12:28:46 公開日:2024-08-02
# 変換入力ロバスト強化学習における摂動状態について On the Perturbed States for Transformed Input-robust Reinforcement Learning ( http://arxiv.org/abs/2408.00023v2 ) ライセンス: Link先を確認	Tung M. Luu, Haeyong Kang, Tri Ton, Thanh Nguyen, Chang D. Yoo,	(参考訳) 訓練環境で高い熟練度を示す強化学習(Reinforcement Learning, RL)エージェントも、展開時の入力観測に対する敵対的摂動には脆弱である。これは、実世界に展開する前に堅牢なエージェントを構築することの重要性を浮き彫りにしている。この課題を緩和するために、先行研究は堅牢な訓練ベースの手順の開発に重点を置いており、ディープニューラルネットワーク部分の堅牢性を強化する取り組みや、強力な攻撃に対する敵対的訓練をエージェントに課す取り組みが含まれる。本研究では、入力変換に基づく防御を用いて敵の影響を緩和するという別の方向性を探る、Transformed Input-robust RL (TIRL) と呼ばれる新しい手法を提案する。具体的には、ロバストなRLエージェントの学習に変換ベースの防御を適用するための2つの原則として、(1)元の状態を再構成するオートエンコーダ型のデノイジング、(2)互いに近い変換入力を得るための有界変換(ビット深度削減とベクトル量子化(VQ))を導入する。変換は、状態をポリシーネットワークに入力する前に適用される。複数のMuJoCo環境での大規模な実験により、入力変換に基づく防御、すなわちVQが、状態観測に対するいくつかの敵から防御できることを示した。公式コードは https://github.com/tunglm2203/tirl で入手できる。 Reinforcement Learning (RL) agents demonstrating proficiency in a training environment exhibit vulnerability to adversarial perturbations in input observations during deployment. This underscores the importance of building a robust agent before its real-world deployment. To alleviate the challenging point, prior works focus on developing robust training-based procedures, encompassing efforts to fortify the deep neural network component's robustness or subject the agent to adversarial training against potent attacks. In this work, we propose a novel method referred to as Transformed Input-robust RL (TIRL), which explores another avenue to mitigate the impact of adversaries by employing input transformation-based defenses. Specifically, we introduce two principles for applying transformation-based defenses in learning robust RL agents: (1) autoencoder-styled denoising to reconstruct the original state and (2) bounded transformations (bit-depth reduction and vector quantization (VQ)) to achieve close transformed inputs. The transformations are applied to the state before feeding it into the policy network. 
Extensive experiments on multiple MuJoCo environments demonstrate that input transformation-based defenses, i.e., VQ, defend against several adversaries in the state observations. The official code is available at https://github.com/tunglm2203/tirl	翻訳日:2024-08-05 12:28:46 公開日:2024-08-02
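One of the bounded transformations named in the TIRL abstract is bit-depth reduction. The sketch below shows the generic idea under the assumption that each state component lies in a known range [low, high] (MuJoCo observations are not naturally bounded, so a range or normalization would have to be chosen first); it is not the authors' implementation, and the function name and example values are hypothetical.

```python
def reduce_bit_depth(state, bits, low=0.0, high=1.0):
    """Quantize each state component to 2**bits evenly spaced levels in [low, high].
    Perturbations that do not push a component across a rounding boundary vanish."""
    levels = 2 ** bits - 1
    squeezed = []
    for x in state:
        x = min(max(x, low), high)          # clip into the assumed value range
        t = (x - low) / (high - low)        # normalize to [0, 1]
        q = round(t * levels) / levels      # snap to the nearest level
        squeezed.append(low + q * (high - low))
    return squeezed

clean = [0.20, 0.55, 0.90]
perturbed = [0.22, 0.53, 0.91]  # clean state plus small hypothetical noise
```

With 2 bits (4 levels) both states map to the same input, so the policy sees no difference. Larger perturbations that cross a rounding boundary still change the output, and coarser quantization also discards legitimate state detail, so the bit count is a trade-off.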
# Gemma 2: 実用規模でオープン言語モデルを改善する Gemma 2: Improving Open Language Models at a Practical Size ( http://arxiv.org/abs/2408.00118v2 ) ライセンス: Link先を確認	Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, Johan Ferret, Peter Liu, Pouya Tafti, Abe Friesen, Michelle Casbon, Sabela Ramos, Ravin Kumar, Charline Le Lan, Sammy Jerome, Anton Tsitsulin, Nino Vieillard, Piotr Stanczyk, Sertan Girgin, Nikola Momchev, Matt Hoffman, Shantanu Thakoor, Jean-Bastien Grill, Behnam Neyshabur, Olivier Bachem, Alanna Walton, Aliaksei Severyn, Alicia Parrish, Aliya Ahmad, Allen Hutchison, Alvin Abdagic, Amanda Carl, Amy Shen, Andy Brock, Andy Coenen, Anthony Laforge, Antonia Paterson, Ben Bastian, Bilal Piot, Bo Wu, Brandon Royal, Charlie Chen, Chintu Kumar, Chris Perry, Chris Welty, Christopher A. Choquette-Choo, Danila Sinopalnikov, David Weinberger, Dimple Vijaykumar, Dominika Rogozińska, Dustin Herbison, Elisa Bandy, Emma Wang, Eric Noland, Erica Moreira, Evan Senter, Evgenii Eltyshev, Francesco Visin, Gabriel Rasskin, Gary Wei, Glenn Cameron, Gus Martins, Hadi Hashemi, Hanna Klimczak-Plucińska, Harleen Batra, Harsh Dhand, Ivan Nardini, Jacinda Mein, Jack Zhou, James Svensson, Jeff Stanway, Jetha Chan, Jin Peng Zhou, Joana Carrasqueira, Joana Iljazi, Jocelyn Becker, Joe Fernandez, Joost van Amersfoort, Josh Gordon, Josh Lipschultz, Josh Newlan, Ju-yeong Ji, Kareem Mohamed, Kartikeya Badola, Kat Black, Katie Millican, Keelin McDonell, Kelvin Nguyen, Kiranbir Sodhia, Kish Greene, Lars Lowe Sjoesund, Lauren Usui, Laurent Sifre, Lena Heuermann, Leticia Lago, Lilly McNealus, Livio Baldini Soares, Logan Kilpatrick, Lucas Dixon, Luciano Martins, Machel Reid, Manvinder Singh, Mark Iverson, Martin Görner, Mat Velloso, Mateo Wirth, Matt Davidow, Matt Miller, Matthew Rahtz, Matthew Watson, Meg Risdal, Mehran Kazemi, Michael Moynihan, Ming Zhang, Minsuk Kahng, Minwoo Park, Mofi Rahman, 
Mohit Khatwani, Natalie Dao, Nenshad Bardoliwalla, Nesh Devanathan, Neta Dumai, Nilay Chauhan, Oscar Wahltinez, Pankil Botarda, Parker Barnes, Paul Barham, Paul Michel, Pengchong Jin, Petko Georgiev, Phil Culliton, Pradeep Kuppala, Ramona Comanescu, Ramona Merhej, Reena Jana, Reza Ardeshir Rokni, Rishabh Agarwal, Ryan Mullins, Samaneh Saadat, Sara Mc Carthy, Sarah Perrin, Sébastien M. R. Arnold, Sebastian Krause, Shengyang Dai, Shruti Garg, Shruti Sheth, Sue Ronstrom, Susan Chan, Timothy Jordan, Ting Yu, Tom Eccles, Tom Hennigan, Tomas Kocisky, Tulsee Doshi, Vihan Jain, Vikas Yadav, Vilobh Meshram, Vishal Dharmadhikari, Warren Barkley, Wei Wei, Wenming Ye, Woohyun Han, Woosuk Kwon, Xiang Xu, Zhe Shen, Zhitao Gong, Zichuan Wei, Victor Cotruta, Phoebe Kirk, Anand Rao, Minh Giang, Ludovic Peran, Tris Warkentin, Eli Collins, Joelle Barral, Zoubin Ghahramani, Raia Hadsell, D. Sculley, Jeanine Banks, Anca Dragan, Slav Petrov, Oriol Vinyals, Jeff Dean, Demis Hassabis, Koray Kavukcuoglu, Clement Farabet, Elena Buchatskaya, Sebastian Borgeaud, Noah Fiedel, Armand Joulin, Kathleen Kenealy, Robert Dadashi, Alek Andreev,	(参考訳) 本稿では、軽量かつ最先端のオープンモデルであるGemmaファミリーに新たに加わったGemma 2を紹介する。パラメータ規模は20億から270億である。この新バージョンでは、局所・大域アテンションの交互配置(Beltagy et al., 2020a)やグループクエリアテンション(Ainslie et al., 2023)など、既知のいくつかの技術的変更をTransformerアーキテクチャに適用する。また、2Bおよび9Bモデルは、次トークン予測の代わりに知識蒸留(Hinton et al., 2015)で訓練する。得られたモデルは、そのサイズで最高の性能を示し、2～3倍大きいモデルに対しても競争力のある代替手段となる。すべてのモデルをコミュニティに公開する。 In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. 
The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.	翻訳日:2024-08-05 12:28:46 公開日:2024-08-02
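The Gemma 2 abstract states that the 2B and 9B models are trained with knowledge distillation instead of next-token prediction. A minimal sketch of the difference, assuming a soft-target cross-entropy between a teacher's next-token distribution and the student's (the generic Hinton-style objective without temperature scaling; the exact formulation is not given in this abstract), over a toy 3-token vocabulary:

```python
import math

def log_softmax(logits):
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def next_token_loss(student_logits, target_index):
    """Next-token prediction: negative log-likelihood of the single observed token."""
    return -log_softmax(student_logits)[target_index]

def distillation_loss(student_logits, teacher_probs):
    """Distillation: cross-entropy of the student against the teacher's full
    next-token distribution (soft targets) instead of a one-hot label."""
    log_p = log_softmax(student_logits)
    return -sum(p * lp for p, lp in zip(teacher_probs, log_p))

teacher = [0.7, 0.2, 0.1]                  # toy teacher distribution
matched = [math.log(p) for p in teacher]   # student logits reproducing the teacher
uniform = [0.0, 0.0, 0.0]                  # student that ignores the context
```

With a one-hot teacher distribution the two losses coincide; with a richer teacher distribution the loss is smallest when the student reproduces the teacher, so every vocabulary entry carries training signal rather than only the observed token.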
# 協調運転における多視点データ統合によるコンフォーマル軌道予測 Conformal Trajectory Prediction with Multi-View Data Integration in Cooperative Driving ( http://arxiv.org/abs/2408.00374v2 ) ライセンス: Link先を確認	Xi Chen, Rahul Bhadani, Larry Head,	(参考訳) 軌道予測に関する現在の研究は、主にエゴ車両の搭載センサーによって収集されたデータに依存している。車車間通信(V2V)や路車間通信(V2I)などのコネクテッド技術が急速に進歩し、無線ネットワークを介して別の視点からの貴重な情報にアクセスできるようになっている。別の視点からの情報の統合は、オクルージョンや限られた視野といった単一視点に伴う固有の制限を克服する可能性がある。本稿では、既存のシングルビューモデルを拡張してマルチビューデータをモデル化する新しい軌道予測フレームワークであるV2INetを紹介する。マルチビューデータを手動で融合したり、個別の訓練段階として定式化したりする従来のアプローチとは異なり、本モデルはエンドツーエンドの訓練をサポートし、柔軟性と性能の両方を高める。さらに、予測されたマルチモーダル軌道は事後的なコンフォーマル予測モジュールによって校正され、有効かつ効率的な信頼領域が得られる。実世界のV2IデータセットであるV2X-Seqを用いて、フレームワーク全体を評価した。その結果、単一GPUを用いて、FDE(Final Displacement Error)とMR(Miss Rate)において優れた性能を示した。コードは https://github.com/xichennn/V2I_trajectory_prediction で公開されている。 Current research on trajectory prediction primarily relies on data collected by onboard sensors of an ego vehicle. With the rapid advancement in connected technologies, such as vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication, valuable information from alternate views becomes accessible via wireless networks. The integration of information from alternative views has the potential to overcome the inherent limitations associated with a single viewpoint, such as occlusions and limited field of view. In this work, we introduce V2INet, a novel trajectory prediction framework designed to model multi-view data by extending existing single-view models. Unlike previous approaches where the multi-view data is manually fused or formulated as a separate training stage, our model supports end-to-end training, enhancing both flexibility and performance. Moreover, the predicted multimodal trajectories are calibrated by a post-hoc conformal prediction module to get valid and efficient confidence regions. We evaluated the entire framework using the real-world V2I dataset V2X-Seq. 
Our results demonstrate superior performance in terms of Final Displacement Error (FDE) and Miss Rate (MR) using a single GPU. The code is publicly available at: https://github.com/xichennn/V2I_trajectory_prediction.	翻訳日:2024-08-05 12:28:46 公開日:2024-08-02
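V2INet calibrates its predicted multimodal trajectories with a post-hoc conformal prediction module. The abstract does not say which nonconformity score is used; as a simplified sketch, assuming a single predicted endpoint and the endpoint displacement error as the score, standard split conformal prediction turns held-out errors into a radius with a finite-sample coverage guarantee. The calibration numbers and helper names are made up for illustration.

```python
import math

def conformal_radius(calibration_scores, alpha):
    """Split conformal prediction: return the ceil((n + 1) * (1 - alpha))-th
    smallest calibration score. Under exchangeability, a new score is at most
    this value with probability at least 1 - alpha."""
    n = len(calibration_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration samples for this alpha
    return sorted(calibration_scores)[k - 1]

def covers(pred_endpoint, true_endpoint, radius):
    """Score = Euclidean endpoint error (FDE-style); the confidence region is
    the disk of the calibrated radius around the predicted endpoint."""
    return math.dist(pred_endpoint, true_endpoint) <= radius

# Hypothetical calibration errors (metres) from held-out trajectories.
scores = [0.4, 1.1, 0.7, 2.5, 0.9, 1.6, 0.3, 1.2, 0.8, 2.0]
radius = conformal_radius(scores, alpha=0.2)  # 80% target coverage
```

With 10 calibration samples and alpha = 0.2, the 9th smallest error (2.0 m) becomes the radius; asking for 95% coverage with so few samples yields an infinite radius, which is why the calibration set size matters. Handling several trajectory modes would need a different score, which this sketch does not attempt.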
# MLOpsに関する最初の洞察: 実践者による認識と採用 Initial Insights on MLOps: Perception and Adoption by Practitioners ( http://arxiv.org/abs/2408.00463v2 ) ライセンス: Link先を確認	Sergio Moreschi, David Hästbacka, Andrea Janes, Valentina Lenarduzzi, Davide Taibi,	(参考訳) AIベースのソフトウェアの採用の加速は、信頼性、スケーラビリティ、倫理的コンプライアンスを保証するために、正確な開発ガイドラインを要求する。 MLOps(Machine Learning and Operations)ガイドラインがこの分野で主要な参照として現れ、ハイレベルな自動化ツールやアプリケーションの開発への道を開いた。 MLOpsガイドラインの導入にもかかわらず、その実装を取り巻く懐疑論の程度は依然として存在し、多くの企業で徐々に採用が進んでいる。 MLOpsに対する意識の欠如は、同じアプローチを意図せず、頻繁に採用する組織に、関連するベストプラクティスや原則に関する包括的な理解が欠如している場合もあります。本研究の目的は,さまざまなビジネスコンテキストにおけるMLOps(あるいはそれに匹敵する)ガイドラインの実際の採用に関する洞察を得ることである。この目的のために、MLOpsが企業でどのように採用され、認識されているかを理解するために、さまざまなビジネス環境を代表する実践者を調査しました。この調査の結果は、これらのガイドラインの利点と課題、それらに関連する学習曲線、およびこれらの情報から導出できる今後のトレンドに関連する他の関連する側面にも光を当てた。この研究は、MLOpsとその機械学習におけるイノベーションの次のフェーズへの影響について、より深い洞察を提供することを目的としている。そうすることで、将来的にはより効率的で信頼性があり、クリエイティブなAIアプリケーションの基礎を築くことを目指しています。 The accelerated adoption of AI-based software demands precise development guidelines to guarantee reliability, scalability, and ethical compliance. MLOps (Machine Learning and Operations) guidelines have emerged as the principal reference in this field, paving the way for the development of high-level automated tools and applications. Despite the introduction of MLOps guidelines, there is still a degree of skepticism surrounding their implementation, with a gradual adoption rate across many companies. In certain instances, a lack of awareness about MLOps has resulted in organizations adopting similar approaches unintentionally, frequently without a comprehensive understanding of the associated best practices and principles. The objective of this study is to gain insight into the actual adoption of MLOps (or comparable) guidelines in different business contexts. To this end, we surveyed practitioners representing a range of business environments to understand how MLOps is adopted and perceived in their companies. 
The results of this survey also shed light on other pertinent aspects related to the advantages and challenges of these guidelines, the learning curve associated with them, and the future trends that can be derived from this information. This study aims to provide deeper insight into MLOps and its impact on the next phase of innovation in machine learning. By doing so, we aim to lay the foundation for more efficient, reliable, and creative AI applications in the future.	翻訳日:2024-08-05 12:28:46 公開日:2024-08-02
# オンライン線形計画法のための低頻度再解法アルゴリズム Infrequent Resolving Algorithm for Online Linear Programming ( http://arxiv.org/abs/2408.00465v2 ) ライセンス: Link先を確認	Guokai Li, Zizhuo Wang, Jingwei Zhang,	(参考訳) オンライン線形計画法(OLP)は、オンラインオークション、ネットワーク収益管理、広告などの幅広い応用により、研究者と実務者の双方から大きな注目を集めている。既存のOLPアルゴリズムは、LPベースのアルゴリズムとLPフリーのアルゴリズムの2つに分類される。前者は一般により良い性能を保証し、定数リグレットを達成することさえあるが、多数のLPを解く必要があり、計算コストが高くなりうる。対照的に、LPフリーのアルゴリズムは1次計算しか必要としないが、性能はより悪く、定数リグレット上界を持たない。本研究では、時間ホライズン$T$の間にLPを$O(\log\log T)$回だけ解きながら定数リグレットを達成するアルゴリズムを提案することで、両極端の間のギャップを埋める。さらに、LPを$M$回しか解けない場合には、$O\left(T^{(1/2+\epsilon)^{M-1}}\right)$のリグレットを保証するアルゴリズムを提案する。加えて、到着確率が最初から既知である場合、本アルゴリズムはLPを$O(\log\log T)$回解くことで定数リグレットを、LPを$M$回だけ解くことで$O\left(T^{(1/2+\epsilon)^{M}}\right)$のリグレットを保証できる。提案アルゴリズムの効率性を示すために数値実験を行った。 Online linear programming (OLP) has gained significant attention from both researchers and practitioners due to its extensive applications, such as online auctions, network revenue management and advertising. Existing OLP algorithms fall into two categories: LP-based algorithms and LP-free algorithms. The former typically guarantee better performance, even offering a constant regret, but require solving a large number of LPs, which could be computationally expensive. In contrast, LP-free algorithms only require first-order computations but yield worse performance, lacking a constant regret bound. In this work, we bridge the gap between these two extremes by proposing an algorithm that achieves a constant regret while solving LPs only $O(\log\log T)$ times over the time horizon $T$. Moreover, when we are allowed to solve LPs only $M$ times, we propose an algorithm that can guarantee an $O\left(T^{(1/2+\epsilon)^{M-1}}\right)$ regret. Furthermore, when the arrival probabilities are known at the beginning, our algorithm can guarantee a constant regret by solving LPs $O(\log\log T)$ times, and an $O\left(T^{(1/2+\epsilon)^{M}}\right)$ regret by solving LPs only $M$ times. 
Numerical experiments are conducted to demonstrate the efficiency of the proposed algorithms.	翻訳日:2024-08-05 12:28:46 公開日:2024-08-02
# 1次および2次量子相転移を横切る有限温度の開放量子系の駆動 Driving of an open quantum system at finite temperature across first- and second-order quantum phase transitions ( http://arxiv.org/abs/2408.00635v2 ) ライセンス: Link先を確認	Felipe Matus, Pavel Cejnar,	(参考訳) 非零温度における量子ビットの開放完全結合系を、制御パラメータ空間内の様々な経路に沿って有限の時間区間で駆動する。この駆動は、因子化された基底状態相から絡み合った基底状態相への1次および2次量子相転移の有限サイズ前駆現象を横切るものであり、最終パラメータ点における系の複雑な基底状態を最大の忠実度で準備することを目的としている。駆動中、系は一定温度の熱浴に結合され、そのダイナミクスは階層型運動方程式(Hierarchical Equations of Motion)の手法によって非摂動的に決定される。熱浴の存在と、量子相転移周辺のパラメータ領域で基底状態および励起状態に影響を及ぼす回避交差の特定のパターンとが組み合わさることで、目標とする基底状態の準備の忠実度が大きく向上しうることが示された。 An open fully connected system of qubits at nonzero temperature is driven within a finite time interval along various paths in the space of its control parameters. The driving leads across finite-size precursors of the first- and second-order quantum phase transitions from factorized to entangled ground-state phases, aiming at the preparation of the complex ground state of the system at the final parameter point with maximal fidelity. During the drive, the system is coupled to a heat bath with a constant temperature, the dynamics being determined in a nonperturbative way by the method of Hierarchical Equations of Motion. It is shown that the presence of the heat bath in combination with specific patterns of avoided crossings affecting the ground and excited states in the parameter region around the quantum phase transition may considerably improve the fidelity of preparation of the target ground state.	翻訳日:2024-08-05 12:28:46 公開日:2024-08-02
# SentenceVAE:大規模言語モデルのための次文予測による高速・長文脈・高精度な推論 SentenceVAE: Faster, Longer and More Accurate Inference with Next-sentence Prediction for Large Language Models ( http://arxiv.org/abs/2408.00655v2 ) ライセンス: Link先を確認	Hongjun An, Yifan Chen, Xiaozhen Qiao, Zhe Sun, Xuelong Li,	(参考訳) 現代の大規模言語モデル(LLM)は主に次トークン予測による推論に依存しており、これが処理速度を大きく妨げている。本稿では、LLMの推論効率を向上させることを目的とした、次文予測と呼ばれる新しい推論手法を提案する。我々は、文エンコーダと文デコーダからなる小型モデルである文変分オートエンコーダ(SentenceVAE)を提示する。エンコーダは文内の情報を単一のトークンに効果的に凝縮し、デコーダはこの圧縮されたデータを元の文の形に再構成する。LLMの入力層と出力層にSentenceVAEを組み込むことで、文単位で推論を行い推論速度を大幅に高速化する文レベルLLM(SLLM)を開発する。SentenceVAEはまた、テキストを文に分割することで元の意味内容の完全性を維持し、推論速度を高めつつ精度も向上させる。公開されているLLMと比較して、SLLMは同じコンテキスト長でより少ないトークンを処理するため、自己アテンション計算のメモリ要求が大幅に減り、より長いコンテキストを扱いやすくなる。実験結果から、トークン単位の手法と比較して、同じコンテキスト長において本手法は推論速度を204～365%向上させ、パープレキシティ(PPL)を元の値の46～75%に低減し、メモリオーバーヘッドを86～91%削減できることがわかった。さらに、モデルパラメータが増えるにつれて、このアプローチの利点はより顕著になる。 Contemporary large language models (LLMs) primarily rely on the next-token prediction method for inference, which significantly impedes their processing speed. In this paper, we introduce a novel inference methodology termed next-sentence prediction, aimed at enhancing the inference efficiency of LLMs. We present Sentence Variational Autoencoder (SentenceVAE), a tiny model consisting of a Sentence Encoder and a Sentence Decoder. The encoder effectively condenses the information within a sentence into a singular token, while the decoder reconstructs this compressed data back into its original sentential form. By integrating SentenceVAE into the input and output layers of LLMs, we develop Sentence-level LLMs (SLLMs) that employ a sentence-by-sentence inference approach, markedly accelerating inference speeds. SentenceVAE also maintains the integrity of the original semantic content by segmenting the text into sentences, thereby improving accuracy while boosting inference speeds. 
Compared to published LLMs, SLLMs process fewer tokens over equivalent context lengths, significantly reducing memory demands for self-attention computations and facilitating the handling of longer contexts. Our experimental findings reveal that this method can accelerate inference speeds by 204~365%, reduce perplexity (PPL) to 46~75% of its original metric, and decrease memory overhead by 86~91% for the same context length, compared to the token-by-token method. Moreover, the benefits of this approach become even more pronounced as model parameters increase.	翻訳日:2024-08-05 12:28:46 公開日:2024-08-02
# 対照学習による微調整を用いた小規模言語モデルのテキスト埋め込みの改善 Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning ( http://arxiv.org/abs/2408.00690v2 ) ライセンス: Link先を確認	Trapoom Ukarapol, Zhicheng Lee, Amy Xin,	(参考訳) 大規模言語モデルは自然言語理解において顕著な性能を示すが、そのリソース集約的な性質のために利用しにくい。対照的に、MiniCPMのような小規模な言語モデルは、より持続可能なスケーラビリティを提供するが、特化した最適化なしでは性能が劣ることが多い。本稿では、テキスト埋め込みの改善を通じて小規模言語モデルを強化する方法を検討する。NLIデータセット上で対照学習による微調整を行うために、MiniCPM、Phi-2、Gemmaの3つの言語モデルを選択する。結果から、この微調整手法により、様々なベンチマークで3つのモデルすべてのテキスト埋め込みの質が向上し、特にMiniCPMが平均56.33%の性能向上という最も顕著な改善を示すことがわかった。対照学習による微調整のコードは https://github.com/trapoom555/Language-Model-STS-CFT で公開されている。 While Large Language Models show remarkable performance in natural language understanding, their resource-intensive nature makes them less accessible. In contrast, smaller language models such as MiniCPM offer more sustainable scalability, but often underperform without specialized optimization. In this paper, we explore the enhancement of smaller language models through the improvement of their text embeddings. We select three language models, MiniCPM, Phi-2, and Gemma, to conduct contrastive fine-tuning on the NLI dataset. Our results demonstrate that this fine-tuning method enhances the quality of text embeddings for all three models across various benchmarks, with MiniCPM showing the most significant improvements of an average 56.33% performance gain. The contrastive fine-tuning code is publicly available at https://github.com/trapoom555/Language-Model-STS-CFT.	翻訳日:2024-08-05 12:28:46 公開日:2024-08-02
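The abstract does not spell out the contrastive objective. A common choice for contrastive fine-tuning of sentence embeddings on NLI data is an InfoNCE loss with in-batch negatives (as in SimCSE, where an entailment hypothesis serves as the positive for its premise). The sketch below assumes that setup, omits hard negatives drawn from contradiction pairs, and uses a hypothetical temperature of 0.05 and toy 2-D embeddings.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def info_nce(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss with in-batch negatives: positives[i] is the positive
    for anchors[i], and every positives[j] with j != i serves as a negative."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / len(anchors)

# Toy premise embeddings and their entailment-hypothesis embeddings.
premises = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]  # each positive points the same way as its anchor
swapped = [[0.0, 1.0], [1.0, 0.0]]  # positives mismatched with their anchors
```

The loss is near zero when each premise is closest to its own positive and large when it is closer to another example's positive; fine-tuning pushes the embedding model toward the first situation.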
# 保険ポートフォリオ追求への強化学習の適用 Reinforcement Learning applied to Insurance Portfolio Pursuit ( http://arxiv.org/abs/2408.00713v2 ) ライセンス: Link先を確認	Edward James Young, Alistair Rogers, Elliott Tong, James Jordon,	(参考訳) 新しい顧客に対して、保険会社がどのようなオファーを提示するかの決定には多くの要因が関わる。保険提供の期待コストに加えて、企業はその顧客に提示されうる他社のオファーや、顧客が価格差にどれほど敏感であるかを考慮しなければならない。さらに企業は、例えば年齢、居住地、職業などに依存しうる特定の顧客ポートフォリオを目標とすることが多い。このような目標ポートフォリオがあるとき、企業はその顧客を自社のポートフォリオに加えたいかどうかに基づいて、個々の顧客へのオファーを調整することがある。望ましい目標ポートフォリオを達成するためにオファーを調整する問題を、我々はポートフォリオ追求問題と呼ぶ。ポートフォリオ追求問題を逐次的意思決定問題として定式化したうえで、その解法のための新しい強化学習アルゴリズムを考案した。本手法を複雑な合成市場環境で検証し、ポートフォリオ追求に対する現在の業界のアプローチを模したベースライン手法を上回ることを示した。 When faced with a new customer, many factors contribute to an insurance firm's decision of what offer to make to that customer. In addition to the expected cost of providing the insurance, the firm must consider the other offers likely to be made to the customer, and how sensitive the customer is to differences in price. Moreover, firms often target a specific portfolio of customers that could depend on, e.g., age, location, and occupation. Given such a target portfolio, firms may choose to modulate an individual customer's offer based on whether the firm desires the customer within their portfolio. We term the problem of modulating offers to achieve a desired target portfolio the portfolio pursuit problem. Having formulated the portfolio pursuit problem as a sequential decision-making problem, we devise a novel reinforcement learning algorithm for its solution. We test our method on a complex synthetic market environment, and demonstrate that it outperforms a baseline method which mimics current industry approaches to portfolio pursuit.	翻訳日:2024-08-05 12:28:46 公開日:2024-08-02

# SHARP-Net:Culvertと下水道管の欠陥分割のための精製ピラミッドネットワーク

SHARP-Net: A Refined Pyramid Network for Deficiency Segmentation in Culverts and Sewer Pipes ( http://arxiv.org/abs/2408.08879v1 )

ライセンス: Link先を確認

Rasha Alshawi, Md Meftahul Ferdaus, Md Tamjidul Hoque, Kendall Niles, Ken Pathak, Steve Sloan, Mahdi Abdelguerfi,

(参考訳) 本稿では、セマンティックセグメンテーションのための新しいアーキテクチャであるSemantic Haar-Adaptive Refined Pyramid Network (SHARP-Net)を紹介する。SHARP-Netは、様々なフィルタサイズ(3×3と5×5)を持つInception風のブロック、並列マックスプーリング、追加の空間検出層を備えたボトムアップ経路を統合している。この設計により、マルチスケールの特徴と細かな構造の詳細を捉える。ネットワーク全体を通して、計算量を減らすために深さ方向分離可能畳み込み(depth-wise separable convolution)を用いる。SHARP-Netのトップダウン経路は、1×1および3×3の深さ方向分離可能畳み込みを用いたアップサンプリングと情報融合によって高解像度の特徴を生成することに重点を置いている。我々が開発した難易度の高いCulvert-Sewer Defectsデータセットと、ベンチマークであるDeepGlobe Land Coverデータセットを用いて、本モデルを評価した。実験評価により、不規則な欠陥形状、遮蔽、クラス不均衡への対処において、ベースモデル(Haarライク特徴を除く)の有効性が示された。ベースモデルはU-Net、CBAM U-Net、ASCU-Net、FPN、SegFormerなどの最先端手法を上回り、Culvert-Sewer DefectsとDeepGlobe Land Coverの各データセットで平均14.4%と12.1%の改善を達成し、IoUスコアはそれぞれ77.2%と70.6%であった。また、訓練時間も短縮された。さらに、慎重に選択・調整されたHaarライク特徴を統合することで、深層学習モデルの性能は少なくとも20%向上した。Haarライク特徴を取り入れたSHARP-Netは94.75%という高いIoUを達成し、ベースモデルより22.74%改善した。これらの特徴は他の深層学習モデルにも適用され、35.0%の改善を示し、その汎用性と有効性が確認された。以上より、SHARP-Netは、困難な実世界のシナリオにおける高精度なセマンティックセグメンテーションのための強力かつ効率的な解決策を提供する。

This paper introduces the Semantic Haar-Adaptive Refined Pyramid Network (SHARP-Net), a novel architecture for semantic segmentation. SHARP-Net integrates a bottom-up pathway featuring Inception-like blocks with varying filter sizes ($3\times3$ and $5\times5$), parallel max-pooling, and additional spatial detection layers. This design captures multi-scale features and fine structural details. Throughout the network, depth-wise separable convolutions are used to reduce complexity. The top-down pathway of SHARP-Net generates high-resolution features through upsampling and information fusion using $1\times1$ and $3\times3$ depth-wise separable convolutions. We evaluated our model on the challenging Culvert-Sewer Defects dataset that we developed and on the benchmark DeepGlobe Land Cover dataset. Our experimental evaluation demonstrated the base model's (excluding Haar-like features) effectiveness in handling irregular defect shapes, occlusions, and class imbalances. It outperformed state-of-the-art methods, including U-Net, CBAM U-Net, ASCU-Net, FPN, and SegFormer, achieving average improvements of 14.4% and 12.1% on the Culvert-Sewer Defects and DeepGlobe Land Cover datasets, respectively, with IoU scores of 77.2% and 70.6%. Training time was also reduced. Furthermore, integrating carefully selected and fine-tuned Haar-like features improved the performance of deep learning models by at least 20%. The proposed SHARP-Net with Haar-like features achieved an IoU of 94.75%, a 22.74% improvement over the base model. These features were also applied to other deep learning models, yielding a 35.0% improvement and demonstrating their versatility and effectiveness. SHARP-Net thus provides a powerful and efficient solution for accurate semantic segmentation in challenging real-world scenarios.

Translated: 2024-08-25 14:30:57 | Published: 2024-08-02
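The abstract credits depth-wise separable convolutions with reducing complexity. As a rough illustration, the sketch compares the weight count of a standard $k\times k$ convolution with that of a depth-wise $k\times k$ convolution followed by a $1\times1$ point-wise convolution. The abstract does not give SHARP-Net's actual layer widths, so the 128-channel width is an arbitrary assumption, and biases are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k depth-wise conv (one filter per input channel)
    followed by a 1 x 1 point-wise conv that mixes channels (biases omitted)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


for k in (3, 5):
    std = standard_conv_params(k, 128, 128)
    sep = depthwise_separable_params(k, 128, 128)
    print(k, std, sep, round(std / sep, 1))
```

At this width the $3\times3$ case drops from 147,456 to 17,536 weights (about 8.4 times fewer). The $5\times5$ case drops from 409,600 to 19,584 (about 20.9 times fewer), so the savings grow with kernel size.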

# ECG Unveiled: Analysis of Client Re-identification Risks in Real-World ECG Datasets

arXiv: http://arxiv.org/abs/2408.10228v1

License: see the arXiv link

Ziyu Wang, Anil Kanduri, Seyed Amir Hossein Aqajari, Salar Jafarlou, Sanaz R. Mousavi, Pasi Liljeberg, Shaista Malik, Amir M. Rahmani

While ECG data is crucial for diagnosing and monitoring heart conditions, it also contains unique biometric information that poses significant privacy risks. Existing ECG re-identification studies rely on exhaustive analysis of numerous deep learning features, which limits them to ad-hoc explainability for clinicians' decision making. In this work, we examine the explainability of ECG re-identification risks using transparent machine learning models. We use SHapley Additive exPlanations (SHAP) analysis to identify and explain the key features contributing to re-identification risks. We conduct an empirical analysis of identity re-identification risks using ECG data from five diverse real-world datasets encompassing 223 participants. Using transparent machine learning models, we show that different ECG features contribute in diverse ways to re-identifying individuals, with accuracies of 0.76 for gender, 0.67 for age group, and 0.82 for participant ID re-identification. Our approach provides valuable insights for clinical experts and guides the development of effective privacy-preserving mechanisms. Further, our findings emphasize the necessity of robust privacy measures in real-world health applications and offer detailed, actionable insights for enhancing data anonymization techniques.

Translated: 2024-08-25 14:21:10 | Published: 2024-08-02
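The study attributes re-identification to individual ECG features using SHAP, which approximates Shapley values. With only a handful of features, Shapley values can be computed exactly by enumerating every coalition of features. Two assumptions apply here. Absent features are replaced by a fixed baseline, which is one of several possible conventions. The three-feature `score` function is a made-up stand-in for a trained classifier, not one of the paper's models.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    Features outside a coalition are set to their baseline value; this is one
    common convention, and SHAP libraries approximate it when n is large.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def score(z):
    """Made-up model: one additive feature and one pairwise interaction."""
    return 2 * z[0] + z[1] * z[2]


print(shapley_values(score, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0]))
```

The attributions come out at approximately [2, 3, 3], summing to the total change in score of 8. The additive feature receives its full effect, and the interaction is split equally between the two features involved.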

# AI Transparency in Academic Search Systems: An Initial Exploration

arXiv: http://arxiv.org/abs/2408.10229v1

License: see the arXiv link

Yifan Liu, Peter Sullivan, Luanne Sinnamon

As AI-enhanced academic search systems become increasingly popular among researchers, investigating their AI transparency is crucial to ensure trust in the search outcomes, as well as the reliability and integrity of scholarly work. This study employs a qualitative content analysis approach to examine the websites of a sample of 10 AI-enhanced academic search systems identified through university library guides. The assessed level of transparency varies across these systems: five provide detailed information about their mechanisms, three offer partial information, and two provide little to no information. These findings indicate that the academic community is recommending and using tools with opaque functionalities, raising concerns about research integrity, including issues of reproducibility and researcher responsibility.

Translated: 2024-08-25 14:21:10 | Published: 2024-08-02

# A General-Purpose Device for Interaction with LLMs

arXiv: http://arxiv.org/abs/2408.10230v1

License: see the arXiv link

Jiajun Xu, Qun Wang, Yuhang Cao, Baitao Zeng, Sicheng Liu

This paper investigates integrating large language models (LLMs) with advanced hardware, focusing on developing a general-purpose device designed for enhanced interaction with LLMs. Initially, we analyze the current landscape, where virtual assistants and LLMs are reshaping human-technology interactions, highlighting pivotal advancements and setting the stage for a new era of intelligent hardware. Despite substantial progress in LLM technology, a significant gap exists in hardware development, particularly concerning scalability, efficiency, affordability, and multimodal capabilities. This disparity presents both challenges and opportunities, underscoring the need for hardware that is not only powerful but also versatile and capable of managing the sophisticated demands of modern computation. Our proposed device addresses these needs by emphasizing scalability, multimodal data processing, enhanced user interaction, and privacy considerations, offering a comprehensive platform for LLM integration in various applications.

Translated: 2024-08-25 14:21:10 | Published: 2024-08-02

# Responsible AI Question Bank: A Comprehensive Tool for AI Risk Assessment

arXiv: http://arxiv.org/abs/2408.11820v1

License: see the arXiv link

Sung Une Lee, Harsha Perera, Yue Liu, Boming Xia, Qinghua Lu, Liming Zhu

The rapid growth of Artificial Intelligence (AI) has underscored the urgent need for responsible AI practices. Despite increasing interest, a comprehensive AI risk assessment toolkit remains lacking. This study introduces our Responsible AI (RAI) Question Bank, a comprehensive framework and tool designed to support diverse AI initiatives. By integrating AI ethics principles such as fairness, transparency, and accountability into a structured question format, the RAI Question Bank aids in identifying potential risks, aligning with emerging regulations like the EU AI Act, and enhancing overall AI governance. A key benefit of the RAI Question Bank is its systematic approach to linking lower-level risk questions to higher-level ones and related themes, preventing siloed assessments and ensuring a cohesive evaluation process. Case studies illustrate the practical application of the RAI Question Bank in assessing AI projects, from evaluating risk factors to informing decision-making processes. The study also demonstrates how the RAI Question Bank can be used to ensure compliance with standards, mitigate risks, and promote the development of trustworthy AI systems. This work advances RAI by providing organizations with a valuable tool to navigate the complexities of ethical AI development and deployment while ensuring comprehensive risk management.

Translated: 2024-08-25 14:11:11 | Published: 2024-08-02
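A key structural idea in the abstract is that lower-level risk questions are linked to higher-level ones, so assessments are not siloed. The sketch below models such a hierarchy as a simple tree in which a higher-level question counts as unresolved whenever any of its sub-questions is. The `Question` class, the roll-up rule, and the example questions are all hypothetical; none of them is taken from the RAI Question Bank itself.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    """A risk question; children are the lower-level questions it depends on."""
    qid: str
    text: str
    children: list["Question"] = field(default_factory=list)


def open_risks(q: Question, answers: dict) -> list:
    """Return ids of unresolved questions, propagating issues up the tree.

    A leaf is unresolved if it was answered False or left unanswered;
    a higher-level question is unresolved if any sub-question is.
    """
    if not q.children:
        return [] if answers.get(q.qid, False) else [q.qid]
    flagged = []
    for child in q.children:
        flagged.extend(open_risks(child, answers))
    if flagged:
        flagged.append(q.qid)
    return flagged


fairness = Question("F", "Are system outcomes fair across user groups?", [
    Question("F1", "Was the training data audited for representation gaps?"),
    Question("F2", "Are outcomes monitored per group after deployment?"),
])
print(open_risks(fairness, {"F1": True}))  # ['F2', 'F']
```

With this rule, an unanswered deployment-monitoring question surfaces both itself and the higher-level fairness question. This mirrors the abstract's goal of a cohesive rather than siloed evaluation.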

# DECO: Liberating Web Data Using Decentralized Oracles for TLS

arXiv: http://arxiv.org/abs/1909.00938v6

License: see the arXiv link

Fan Zhang, Sai Krishna Deepak Maram, Harjasleen Malvai, Steven Goldfeder, Ari Juels

Thanks to the widespread deployment of TLS, users can access private data over channels with end-to-end confidentiality and integrity. What they cannot do, however, is prove to third parties the *provenance* of such data, i.e., that it genuinely came from a particular website. Existing approaches either introduce undesirable trust assumptions or require server-side modifications. As a result, the value of users' private data is locked up in its point of origin. Users cannot export their data with preserved integrity to other applications without help and permission from the current data holder. We propose DECO (short for **dec**entralized **o**racle) to address the above problems. DECO allows users to prove that a piece of data accessed via TLS came from a particular website and optionally prove statements about such data in zero-knowledge, keeping the data itself secret. DECO is the first such system that works without trusted hardware or server-side modifications. DECO can liberate data from centralized web-service silos, making it accessible to a rich spectrum of applications. To demonstrate the power of DECO, we implement three applications that are hard to achieve without it: a private financial instrument using smart contracts, converting legacy credentials to anonymous credentials, and verifiable claims against price discrimination.

Translated: 2024-08-19 05:35:40 | Published: 2024-08-02

# Evaluating the Impact of Advanced LLM Techniques on AI-Lecture Tutors for a Robotics Course

arXiv: http://arxiv.org/abs/2408.04645v1

License: see the arXiv link

Sebastian Kahl, Felix Löffler, Martin Maciol, Fabian Ridder, Marius Schmitz, Jennifer Spanagel, Jens Wienkamp, Christopher Burgahn, Malte Schilling

This study evaluates the performance of Large Language Models (LLMs) as an Artificial Intelligence-based tutor for a university course. In particular, different advanced techniques are used, such as prompt engineering, Retrieval-Augmented Generation (RAG), and fine-tuning. We assessed the different models and techniques using common similarity metrics such as BLEU-4, ROUGE, and BERTScore, complemented by a small human evaluation of helpfulness and trustworthiness. Our findings indicate that RAG combined with prompt engineering significantly enhances model responses and produces better factual answers. In the context of education, RAG appears to be an ideal technique, as it enriches the model's input with additional information and material that is usually already available for a university course. Fine-tuning, on the other hand, can produce quite small yet still strong expert models, but poses the danger of overfitting. Our study further asks how the performance of LLMs should be measured and how well current measurements represent correctness or relevance. We find high correlation among the similarity metrics and a bias of most of these metrics toward shorter responses. Overall, our research points to both the potential and the challenges of integrating LLMs in educational settings, suggesting a need for balanced training approaches and advanced evaluation frameworks.

Translated: 2024-08-19 04:27:34 | Published: 2024-08-02
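The abstract reports that most similarity metrics favor shorter responses. The minimal ROUGE-1 implementation below shows one way this can happen: extra content lowers precision, so a short partial answer can outscore a longer answer that contains the entire reference. It uses lowercased whitespace tokens and no stemming, which simplifies what standard ROUGE packages do. The example sentences are invented.

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict:
    """ROUGE-1 precision, recall and F1 from clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # each word counted at most min(cand, ref) times
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


reference = "rag retrieves course material before generation"
short = "rag retrieves course material"
long_answer = ("rag retrieves course material before generation and it also explains "
               "why this reduces hallucinated facts for students")
print(rouge1(short, reference)["f1"], rouge1(long_answer, reference)["f1"])
```

The short answer scores an F1 of about 0.8. The longer answer scores about 0.52 even though its recall is 1.0, because 11 of its 17 tokens are not in the reference.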

# SumRecom: A Personalized Summarization Approach by Learning from Users' Feedback

arXiv: http://arxiv.org/abs/2408.07294v1

License: see the arXiv link

Samira Ghodratnama, Mehrdad Zakershahrak

Existing multi-document summarization approaches produce a uniform summary for all users without considering individuals' interests, which is highly impractical. Making a user-specific summary is a challenging task, as it requires: i) acquiring relevant information about a user; ii) aggregating and integrating the information into a user model; and iii) using that information to produce the personalized summary. In this paper, we therefore propose a solution to a substantial and challenging problem in summarization, i.e., recommending a summary for a specific user. The proposed approach, called SumRecom, brings the human into the loop and focuses on three aspects: personalization, interaction, and learning the user's interests without the need for reference summaries. SumRecom has two steps: i) a user preference extractor that captures users' inclination in choosing essential concepts, and ii) a summarizer that discovers the user's best-fitted summary based on the given feedback. Various automatic and human evaluations on the benchmark dataset demonstrate the superiority of SumRecom in generating user-specific summaries.

Keywords: Document summarization, Interactive summarization, Personalized summarization, Reinforcement learning.

Translated: 2024-08-19 03:35:49 | Published: 2024-08-02
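SumRecom first extracts a user's preferred concepts from feedback and then searches for the summary that fits them best. The abstract does not describe its learning procedure beyond the reinforcement-learning framing. The sketch below is a simplified stand-in that shows only this two-step structure, not SumRecom's algorithm. It assumes concepts are plain lowercase words, uses a hypothetical additive feedback rule for the preference weights, and selects sentences greedily under a word budget.

```python
def update_preferences(weights, liked, disliked, step=1.0):
    """Hypothetical feedback rule: raise liked concepts, lower disliked ones."""
    updated = dict(weights)
    for concept in liked:
        updated[concept] = updated.get(concept, 0.0) + step
    for concept in disliked:
        updated[concept] = updated.get(concept, 0.0) - step
    return updated


def summarize(sentences, weights, max_words):
    """Greedily add the sentence whose not-yet-covered concepts carry the most
    preference weight, stopping when nothing positive fits the word budget."""
    chosen, covered, used = [], set(), 0
    while True:
        best, best_gain = None, 0.0
        for sentence in sentences:
            length = len(sentence.split())
            if sentence in chosen or used + length > max_words:
                continue
            new_concepts = set(sentence.lower().split()) - covered
            gain = sum(weights.get(c, 0.0) for c in new_concepts)
            if gain > best_gain:
                best, best_gain = sentence, gain
        if best is None:
            return chosen
        chosen.append(best)
        covered |= set(best.lower().split())
        used += len(best.split())


reviews = ["The battery lasts two days", "The camera takes sharp photos", "Shipping was slow"]
prefs = update_preferences({}, liked=["camera", "photos"], disliked=["shipping"])
print(summarize(reviews, prefs, max_words=6))  # ['The camera takes sharp photos']
```

Because the user liked camera-related concepts and disliked shipping, only the camera sentence earns a positive gain within the budget. A different user's feedback would produce a different summary from the same input.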

# Regularized Contrastive Partial Multi-view Outlier Detection

arXiv: http://arxiv.org/abs/2408.07819v1

License: see the arXiv link

Yijia Wang, Qianqian Xu, Yangbangyan Jiang, Siran Dai, Qingming Huang

In recent years, multi-view outlier detection (MVOD) methods have advanced significantly, aiming to identify outliers within multi-view datasets. A key point is to better detect class outliers and class-attribute outliers, which only exist in multi-view data. However, existing methods either cannot reduce the impact of outliers when learning view-consistent information or struggle in cases with varying neighborhood structures. Moreover, most of them do not apply to partial multi-view data in real-world scenarios. To overcome these drawbacks, we propose a novel method named Regularized Contrastive Partial Multi-view Outlier Detection (RCPMOD). In this framework, we use contrastive learning to learn view-consistent information and distinguish outliers by their degree of consistency. Specifically, we propose (1) an outlier-aware contrastive loss with a potential outlier memory bank, motivated by a theoretical analysis, to eliminate the outliers' bias; (2) a neighbor alignment contrastive loss to capture the view-shared local structural correlation; and (3) a spreading regularization loss to prevent the model from overfitting to outliers. With the Cross-view Relation Transfer technique, missing view samples can easily be imputed based on the features of their neighbors. Experimental results on four benchmark datasets demonstrate that our proposed approach outperforms state-of-the-art competitors under different settings.

Translated: 2024-08-19 03:35:49 | Published: 2024-08-02
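RCPMOD learns view-consistent representations contrastively and flags outliers by their degree of cross-view consistency. The sketch below shows only two generic ingredients: a standard InfoNCE loss that pulls the two views of the same sample together, and a cosine-based inconsistency score. RCPMOD's outlier-aware memory bank, neighbor alignment term, spreading regularizer, and Cross-view Relation Transfer imputation are not modeled. The toy embeddings are invented.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(view1, view2, temperature=0.5):
    """Standard InfoNCE: sample i in view1 should match sample i in view2
    rather than any other sample j in view2 (averaged over samples)."""
    n = len(view1)
    total = 0.0
    for i in range(n):
        logits = [cosine(view1[i], view2[j]) / temperature for j in range(n)]
        m = max(logits)  # log-sum-exp with max subtraction for numerical stability
        log_norm = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_norm - logits[i]
    return total / n


def inconsistency_scores(view1, view2):
    """Higher score = the two views of a sample disagree more."""
    return [1.0 - cosine(a, b) for a, b in zip(view1, view2)]


# Toy embeddings: the third sample's two views point in unrelated directions.
v1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v2 = [[1.0, 0.1], [0.1, 1.0], [-1.0, 1.0]]
scores = inconsistency_scores(v1, v2)
print(scores.index(max(scores)))  # 2
```

Training would drive the InfoNCE loss down for consistent samples. Samples whose views still disagree afterwards, like the third one here, receive the highest inconsistency scores.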

# Telecom Foundation Models: Applications, Challenges, and Future Trends

arXiv: http://arxiv.org/abs/2408.03964v1

License: see the arXiv link

Tahar Zanouda, Meysam Masoudi, Fitsum Gaim Gebre, Mischa Dohler

Telecom networks are becoming increasingly complex, with diversified deployment scenarios, multiple standards, and multi-vendor support. The intricate nature of the telecom network ecosystem makes it challenging to manage, operate, and optimize networks effectively. To address these hurdles, Artificial Intelligence (AI) has been widely adopted to solve different tasks in telecom networks. However, these conventional AI models are often designed for specific tasks, rely on extensive and costly-to-collect labeled data, and require specialized telecom expertise for development and maintenance. Such AI models usually fail to generalize to diverse deployment scenarios and applications. In contrast, Foundation Models (FMs) show effective generalization capabilities across various domains, including language, vision, and decision-making tasks. FMs can be trained on multiple data modalities generated from the telecom ecosystem and can leverage specialized domain knowledge. Moreover, FMs can be fine-tuned to solve numerous specialized tasks with minimal task-specific labeled data and, in some instances, are able to leverage context to solve previously unseen problems. At the dawn of 6G, this paper investigates the potential opportunities of using FMs to shape the future of telecom technologies and standards. In particular, the paper outlines a conceptual process for developing Telecom FMs (TFMs) and discusses emerging opportunities for orchestrating specialized TFMs for network configuration, operation, and maintenance. Finally, the paper discusses the limitations and challenges of developing and deploying TFMs.

Translated: 2024-08-09 17:39:48 | Published: 2024-08-02

# Mapping the Provenance Ontology to Basic Formal Ontology

arXiv: http://arxiv.org/abs/2408.03866v1

License: see the arXiv link

Tim Prudhomme, Giacomo De Colle, Austin Liebers, Alec Sculley, Peihong Xie, Sydney Cohen, John Beverley

The Provenance Ontology (PROV-O) is a World Wide Web Consortium (W3C) recommended ontology used to structure data about provenance across a wide variety of domains. Basic Formal Ontology (BFO) is a top-level ontology ISO/IEC standard used to structure a wide variety of ontologies, such as the OBO Foundry ontologies and the Common Core Ontologies (CCO). To enhance interoperability between these two ontologies, their extensions, and the data organized by them, an alignment is presented according to specific mapping criteria and a methodology that prioritizes structural and semantic considerations. The ontology alignment is evaluated by checking its logical consistency with canonical examples of PROV-O instances and by running SPARQL queries, which formalize the mapping criteria, to find terms that do not satisfy them. A variety of semantic web technologies are used in support of the FAIR (Findable, Accessible, Interoperable, Reusable) principles.

Translated: 2024-08-08 12:25:11 | Published: 2024-08-02

# Pre-trained Gaussian Processes for Bayesian Optimization

arXiv: http://arxiv.org/abs/2109.08215v6

License: see the arXiv link

Zi Wang, George E. Dahl, Kevin Swersky, Chansoo Lee, Zachary Nado, Justin Gilmer, Jasper Snoek, Zoubin Ghahramani

Bayesian optimization (BO) has become a popular strategy for global optimization of expensive real-world functions. Contrary to a common expectation that BO is suited to optimizing black-box functions, deploying BO successfully actually requires domain knowledge about those functions. Such domain knowledge often manifests in Gaussian process (GP) priors that specify initial beliefs about functions. However, even with expert knowledge, it is non-trivial to define a prior quantitatively. This is especially true for hyperparameter tuning problems on complex machine learning models, where the landscapes of tuning objectives are often difficult to comprehend. We seek an alternative practice for setting these functional priors. In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori. We detail what pre-training entails for GPs using a KL-divergence-based loss function, and propose a new pre-training-based BO framework named HyperBO. Theoretically, we show bounded posterior predictions and near-zero regret for HyperBO without assuming that the "ground truth" GP prior is known. To verify our approach in realistic setups, we collect a large multi-task hyperparameter tuning dataset by training tens of thousands of configurations of near-state-of-the-art deep learning models on popular image and text datasets, as well as a protein sequence dataset. Our results show that, on average, HyperBO locates good hyperparameters at least 3 times more efficiently than the best competing methods on both our new tuning dataset and existing multi-task BO benchmarks.

Translated: 2024-08-07 20:01:27 | Published: 2024-08-02
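The abstract describes the GP pre-training loss only as KL-divergence based. For reference, the closed-form KL divergence between two $N$-dimensional Gaussians is shown below. It is written under the assumption that the first argument is an empirical Gaussian estimate ($\hat{\mu}$, $\hat{\Sigma}$) of function values observed at $N$ shared inputs across related tasks, and that the second is the GP prior (mean $\mu$, kernel matrix $K$) evaluated at those inputs. The exact estimator and argument order used by HyperBO should be checked in the paper.

```latex
D_{\mathrm{KL}}\!\left(\mathcal{N}(\hat{\mu}, \hat{\Sigma}) \,\middle\|\, \mathcal{N}(\mu, K)\right)
  = \frac{1}{2}\left[\operatorname{tr}\!\left(K^{-1}\hat{\Sigma}\right)
  + (\mu - \hat{\mu})^{\top} K^{-1} (\mu - \hat{\mu})
  - N + \ln\frac{\det K}{\det \hat{\Sigma}}\right]
```

The expression is finite only when $\hat{\Sigma}$ and $K$ are positive definite. Minimizing it over the GP mean and kernel hyperparameters pulls the prior toward the behavior observed on related tasks.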

# Dual-Task Vision Transformer for Rapid and Accurate Intracerebral Hemorrhage CT Image Classification

arXiv: http://arxiv.org/abs/2405.06814v3

License: see the arXiv link

Jialiang Fan, Xinhui Fan, Chengyan Song, Xiaofan Wang, Bingdong Feng, Lucan Li, Guoyu Lu

Intracerebral hemorrhage (ICH) is a severe and sudden medical condition caused by the rupture of blood vessels in the brain, leading to permanent damage to brain tissue and often resulting in functional disabilities or death. Diagnosis and analysis of ICH typically rely on brain CT imaging. Given the urgency of ICH, early treatment is crucial, which requires rapid analysis of CT images to formulate tailored treatment plans. However, the complexity of ICH CT images and the frequent scarcity of specialist radiologists pose significant challenges. We therefore collect a real-world dataset for ICH-versus-normal classification and for classifying ICH images into three types by hemorrhage location, i.e., Deep, Subcortical, and Lobar. In addition, we propose a neural network structure, the dual-task vision transformer (DTViT), for the automated classification and diagnosis of ICH images. DTViT deploys the encoder from the Vision Transformer (ViT), employing attention mechanisms for feature extraction from CT images. The proposed DTViT framework also incorporates two multilayer perceptron (MLP)-based decoders to simultaneously identify the presence of ICH and classify the three types of hemorrhage location. Experimental results demonstrate that DTViT performs well on the real-world test dataset. The code and newly collected dataset for this work are available at: https://github.com/jfan1997/DTViT.

Translated: 2024-08-07 18:52:52 | Published: 2024-08-02
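DTViT shares one ViT encoder between two MLP decoders: one detects ICH presence and the other classifies hemorrhage location. The pure-Python sketch below shows only how such a dual-task objective can be assembled from shared features. Single-layer heads stand in for the MLP decoders. The equal loss weighting and the choice to apply the location loss only to hemorrhage-positive images are assumptions, not details from the paper.

```python
import math

LOCATIONS = ("Deep", "Subcortical", "Lobar")


def linear(weights, bias, x):
    """Dense layer; weights holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def dual_task_loss(features, head_ich, head_loc, has_ich, location=None, loc_weight=1.0):
    """Combine the losses of two heads that read the same shared encoder features.

    head_ich and head_loc are (weights, bias) pairs. Assumption: the location
    cross-entropy is only added for hemorrhage-positive images.
    """
    z = linear(*head_ich, features)[0]
    p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of ICH
    loss = -math.log(p) if has_ich else -math.log(1.0 - p)  # binary cross-entropy
    if has_ich and location is not None:
        logits = linear(*head_loc, features)
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(s - m) for s in logits))
        loss += loc_weight * (log_norm - logits[LOCATIONS.index(location)])  # softmax CE
    return loss


shared = [1.0, 0.0]  # stand-in for the ViT encoder's output
head_ich = ([[0.0, 0.0]], [0.0])
head_loc = ([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]], [0.0, 0.0, 0.0])
print(dual_task_loss(shared, head_ich, head_loc, True, "Lobar"))  # ln 2 + ln 3, about 1.792
```

With untrained (all-zero) heads, the detection head is maximally uncertain (loss $\ln 2$) and the location head is uniform over three classes (loss $\ln 3$). Gradients from both terms would flow into the shared encoder during training.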

# Artificial Neural Networks for Photonic Applications: From Algorithms to Implementation

arXiv: http://arxiv.org/abs/2408.02685v1

License: see the arXiv link

Pedro Freire, Egor Manuylovich, Jaroslaw E. Prilepsky, Sergei K. Turitsyn

This tutorial-review on applications of artificial neural networks in photonics targets a broad audience, ranging from optical research and engineering communities to computer science and applied mathematics. We focus here on the research areas at the interface between these disciplines, attempting to find the right balance between technical details specific to each domain and overall clarity. First, we briefly recall key properties and peculiarities of some core neural network types, which we believe are the most relevant to photonics, also linking the layers' theoretical design to some photonics hardware realizations. After that, we address how to fine-tune the selected model's design to perform the required task with optimized accuracy. Then, in the review part, we discuss recent developments and progress for several selected applications of neural networks in photonics, including multiple aspects relevant to optical communications, imaging, sensing, and the design of new materials and lasers. In the following section, we put a special emphasis on how to accurately evaluate the complexity of neural networks in the context of the transition from algorithms to hardware implementation. The introduced complexity characteristics are used to analyze the applications of neural networks in optical communications, as a specific, albeit highly important, example, comparing them with some benchmark signal processing methods. We combine a description of well-known model compression strategies used in machine learning with some novel techniques introduced recently in optical applications of neural networks. It is important to stress that although our focus in this tutorial-review is on photonics, we believe that the methods and techniques presented here can be useful in a much wider range of scientific and engineering applications.

Translated: 2024-08-07 16:17:55 | Published: 2024-08-02
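The review emphasizes evaluating neural network complexity when moving from algorithms to hardware. A common simplified proxy is the number of real multiplications per recovered symbol. The helper functions below count only the multiplications in dense and 1-D convolutional layers, ignoring activations, biases, and additions. The equalizer dimensions in the example are hypothetical rather than taken from the review.

```python
def dense_mults(n_in: int, n_out: int) -> int:
    """Real multiplications in one forward pass of a dense layer (biases ignored)."""
    return n_in * n_out


def conv1d_mults(c_in: int, c_out: int, kernel: int, out_len: int) -> int:
    """Real multiplications in a 1-D convolution producing out_len outputs per channel."""
    return c_in * c_out * kernel * out_len


def mults_per_symbol(total_mults: int, symbols_out: int) -> float:
    """Normalize by how many symbols one forward pass recovers."""
    return total_mults / symbols_out


# Hypothetical MLP equalizer: 41-symbol window of I/Q samples (82 real inputs)
# -> 100 hidden units -> 1 recovered symbol (2 real outputs).
mlp = dense_mults(82, 100) + dense_mults(100, 2)
print(mults_per_symbol(mlp, 1))  # 8400.0

# Hypothetical convolutional front end: 2 -> 16 channels, kernel 11, 41 output positions.
print(conv1d_mults(2, 16, 11, 41))  # 14432
```

Counting per recovered symbol makes architectures that output several symbols per pass comparable with single-output ones. It also allows comparison with classical equalizers whose cost is usually quoted the same way.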

# A Systematic Review of Intermediate Fusion in Multimodal Deep Learning for Biomedical Applications

arXiv: http://arxiv.org/abs/2408.02686v1

License: see the arXiv link

Valerio Guarrasi, Fatih Aksu, Camillo Maria Caruso, Francesco Di Feola, Aurora Rofena, Filippo Ruffini, Paolo Soda

Deep learning has revolutionized biomedical research by providing sophisticated methods to handle complex, high-dimensional data. Multimodal deep learning (MDL) further enhances this capability by integrating diverse data types such as imaging, textual data, and genetic information, leading to more robust and accurate predictive models. In MDL, unlike early and late fusion methods, intermediate fusion stands out for its ability to effectively combine modality-specific features during the learning process. This systematic review aims to comprehensively analyze and formalize current intermediate fusion methods in biomedical applications. We investigate the techniques employed, the challenges faced, and potential future directions for advancing intermediate fusion methods. Additionally, we introduce a structured notation to enhance the understanding and application of these methods beyond the biomedical domain. Our findings are intended to support researchers, healthcare professionals, and the broader deep learning community in developing more sophisticated and insightful multimodal models. Through this review, we aim to provide a foundational framework for future research and practical applications in the dynamic field of MDL.

Translated: 2024-08-07 16:17:55 Published: 2024-08-02

# Compositional Physical Reasoning of Objects and Events from Videos ( http://arxiv.org/abs/2408.02687v1 )

License: see link

Zhenfang Chen, Shilong Dong, Kexin Yi, Yunzhu Li, Mingyu Ding, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan,

Understanding and reasoning about objects' physical properties in the natural world is a fundamental challenge in artificial intelligence. While some properties like colors and shapes can be directly observed, others, such as mass and electric charge, are hidden from the objects' visual appearance. This paper addresses the unique challenge of inferring these hidden physical properties from objects' motion and interactions and predicting corresponding dynamics based on the inferred physical properties. We first introduce the Compositional Physical Reasoning (ComPhy) dataset. For a given set of objects, ComPhy includes a limited number of videos of them moving and interacting under different initial conditions. The model is evaluated based on its capability to unravel the compositional hidden properties, such as mass and charge, and use this knowledge to answer a set of questions. Besides the synthetic videos from simulators, we also collect a real-world dataset to further test the physical reasoning abilities of different models. We evaluate state-of-the-art video reasoning models on ComPhy and reveal their limited ability to capture these hidden properties, which leads to inferior performance. We also propose a novel neuro-symbolic framework, Physical Concept Reasoner (PCR), that learns and reasons about both visible and hidden physical properties from question answering. After training, PCR demonstrates remarkable capabilities. It can detect and associate objects across frames, ground visible and hidden physical properties, make future and counterfactual predictions, and utilize these extracted representations to answer challenging questions.

Translated: 2024-08-07 16:17:55 Published: 2024-08-02

# A probabilistic framework for learning non-intrusive corrections to long-time climate simulations from short-time training data ( http://arxiv.org/abs/2408.02688v1 )

License: see link

Benedikt Barthel Sorensen, Leonardo Zepeda-Núñez, Ignacio Lopez-Gomez, Zhong Yi Wan, Rob Carver, Fei Sha, Themistoklis Sapsis,

Chaotic systems, such as turbulent flows, are ubiquitous in science and engineering. However, their study remains a challenge due to the large range of scales and the strong interaction with other, often not fully understood, physics. As a consequence, the spatiotemporal resolution required for accurate simulation of these systems is typically computationally infeasible, particularly for applications of long-term risk assessment, such as the quantification of extreme weather risk due to climate change. While data-driven modeling offers some promise of alleviating these obstacles, the scarcity of high-quality simulations results in limited available data to train such models, which is often compounded by the lack of stability for long-horizon simulations. As such, the computational, algorithmic, and data restrictions generally imply that the probability of rare extreme events is not accurately captured. In this work we present a general strategy for training neural network models to non-intrusively correct under-resolved long-time simulations of chaotic systems. The approach is based on training a post-processing correction operator on under-resolved simulations nudged towards a high-fidelity reference. This enables us to learn the dynamics of the underlying system directly, which allows us to use very little training data, even when the statistics thereof are far from converged. Additionally, through the use of probabilistic network architectures we are able to leverage the uncertainty due to the limited training data to further improve extrapolation capabilities. We apply our framework to severely under-resolved simulations of quasi-geostrophic flow and demonstrate its ability to accurately predict the anisotropic statistics over time horizons more than 30 times longer than the data seen in training.

Translated: 2024-08-07 16:17:55 Published: 2024-08-02
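
The training data in this abstract come from under-resolved simulations "nudged" toward a high-fidelity reference. The Python below is a minimal sketch of what nudging means. It is not the authors' solver or correction operator. It integrates a toy coarse model of x(t) = sin(t) that carries a hypothetical constant tendency error `bias`, once without and once with the relaxation term -(x - x_ref)/tau. The function name `simulate` and all parameter values are illustrative assumptions.

```python
import math


def simulate(nudge_tau=None, bias=0.5, dt=0.01, steps=2000):
    """Forward-Euler run of a biased toy model of x(t) = sin(t).

    `bias` is a hypothetical stand-in for unresolved physics. If `nudge_tau`
    is given, the term -(x - x_ref) / nudge_tau relaxes the state toward the
    reference trajectory. Returns the maximum absolute deviation from it.
    """
    x, t, max_err = 0.0, 0.0, 0.0
    for _ in range(steps):
        x_ref = math.sin(t)
        tendency = math.cos(t) + bias             # true tendency plus model error
        if nudge_tau is not None:
            tendency -= (x - x_ref) / nudge_tau   # nudging toward the reference
        x += dt * tendency
        t += dt
        max_err = max(max_err, abs(x - math.sin(t)))
    return max_err


print(f"free run max error:   {simulate():.3f}")
print(f"nudged run max error: {simulate(nudge_tau=0.1):.3f}")
```

With nudging, the error stays near `bias * tau` instead of growing linearly in time. Pairs of nudged coarse states and reference states are the kind of data a correction operator could then be trained on.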

# Spatio-Temporal Partial Sensing Forecast for Long-term Traffic ( http://arxiv.org/abs/2408.02689v1 )

License: see link

Zibo Liu, Zhe Jiang, Zelin Xu, Tingsong Xiao, Zhengkun Xiao, Haibo Wang, Shigang Chen,

Traffic forecasting uses recent measurements by sensors installed at chosen locations to forecast the future road traffic. Existing work either assumes all locations are equipped with sensors or focuses on short-term forecast. This paper studies partial sensing traffic forecast of long-term traffic, assuming sensors only at some locations. The study is important in lowering the infrastructure investment cost in traffic management since deploying sensors at all locations could incur prohibitively high cost. However, the problem is challenging due to the unknown distribution at unsensed locations, the intricate spatio-temporal correlation in long-term forecasting, as well as noise in data and irregularities in traffic patterns (e.g., road closure). We propose a Spatio-Temporal Partial Sensing (STPS) forecast model for long-term traffic prediction, with several novel contributions, including a rank-based embedding technique to capture irregularities and overcome noise, a spatial transfer matrix to overcome the spatial distribution shift from permanently sensed locations to unsensed locations, and a multi-step training process that utilizes all available data to successively refine the model parameters for better accuracy. Extensive experiments on several real-world traffic datasets demonstrate that STPS outperforms the state-of-the-art and achieves superior accuracy in partial sensing long-term forecasting.

Translated: 2024-08-07 16:17:55 Published: 2024-08-02
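
The abstract does not define its rank-based embedding. Suppose it replaces the raw sensor readings in a window with their normalized ranks. That is our interpretation, not the STPS definition. Under that assumption, the sketch below shows why ranks resist noise and irregular spikes: any order-preserving distortion, including an extreme outlier, leaves the features unchanged. `rank_embed` and the sample readings are hypothetical.

```python
def rank_embed(window):
    """Map each reading in `window` to its normalized rank in [0, 1].

    Tied readings share the average of the ranks they span. Rank features do
    not change under any strictly increasing rescaling of the readings and
    ignore how extreme an outlier is; only the ordering matters.
    """
    n = len(window)
    order = sorted(range(n), key=lambda i: window[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j across a run of tied values.
        while j + 1 < n and window[order[j + 1]] == window[order[i]]:
            j += 1
        avg_rank = (i + j) / 2  # average 0-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [r / (n - 1) if n > 1 else 0.0 for r in ranks]


speeds = [52.0, 48.0, 55.0, 50.0, 49.0]   # hypothetical readings
spiky = [52.0, 48.0, 550.0, 50.0, 49.0]   # same ordering, one huge spike
print(rank_embed(speeds))
print(rank_embed(spiky))
```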

# Revisiting dynamics of quantum causal structures -- when can causal order evolve? ( http://arxiv.org/abs/2008.12757v4 )

License: see link

John H. Selby, Ana Belén Sainz, Paweł Horodecki,

Recently, there has been substantial interest in studying the dynamics of quantum theory beyond that of states, in particular, the dynamics of channels, measurements, and higher-order transformations. Ref. [Phys. Rev. X 8(1), 011047 (2018)] pursues this using the process-matrix formalism, together with a definition of the possible dynamics of such process matrices, and focusing especially on the question of evolution of causal structures. One of its major conclusions is a strong theorem saying that within the formalism, under continuous and reversible transformations, the causal order between operations must be preserved. Our result here challenges that of Ref. [Phys. Rev. X 8(1), 011047 (2018)]: if one is to take into account a full picture of the physical evolution of operations within the standard quantum-mechanical formalism, then the conclusion of Ref. [Phys. Rev. X 8(1), 011047 (2018)] does not hold. That is, we show that under certain continuous and reversible dynamics, the causal order between operations is not necessarily preserved. We moreover identify and analyse the root of this apparent contradiction, specifically, that the commonly accepted and widely applied framework of higher-order processes, whilst mathematically sound, is not always appropriate for drawing conclusions on physical dynamics. Finally, we show how to reconcile the elements of the whole picture following the intuition based on entanglement processing by local operations and classical communication.

Translated: 2024-08-07 01:00:27 Published: 2024-08-02

# Amortized Probabilistic Detection of Communities in Graphs ( http://arxiv.org/abs/2010.15727v4 )

License: see link

Yueqi Wang, Yoonho Lee, Pallab Basu, Juho Lee, Yee Whye Teh, Liam Paninski, Ari Pakman,

Learning community structures in graphs has broad applications across scientific domains. While graph neural networks (GNNs) have been successful in encoding graph structures, existing GNN-based methods for community detection are limited by requiring knowledge of the number of communities in advance, in addition to lacking a proper probabilistic formulation to handle uncertainty. We propose a simple framework for amortized community detection, which addresses both of these issues by combining the expressive power of GNNs with recent methods for amortized clustering. Our models consist of a graph representation backbone that extracts structural information and an amortized clustering network that naturally handles variable numbers of clusters. Both components combine into well-defined models of the posterior distribution of graph communities and are jointly optimized given labeled graphs. At inference time, the models yield parallel samples from the posterior of community labels, quantifying uncertainty in a principled way. We evaluate several models from our framework on synthetic and real datasets, and demonstrate improved performance compared to previous methods. As a separate contribution, we extend recent amortized probabilistic clustering architectures by adding attention modules, which yield further improvements on community detection tasks.

Translated: 2024-08-07 01:00:27 Published: 2024-08-02
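
The models return parallel samples of community labels. A common way to summarize the uncertainty in such samples is a co-assignment matrix. The sketch below is a generic post-processing step assumed for illustration, not part of the paper's architecture; the helper name and the samples are hypothetical. It compares labels only within each sample, so the estimate does not depend on how labels are named or on how many communities each sample uses.

```python
def coassignment(samples):
    """Estimate P(nodes i and j share a community) from posterior samples.

    Each sample is a list with one community label per node. Label names are
    arbitrary per sample (label switching), so we only check whether two
    nodes share a label within the same sample.
    """
    n = len(samples[0])
    counts = [[0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(samples)
    return [[c / m for c in row] for row in counts]


# Hypothetical posterior samples for a 4-node graph.
samples = [
    [0, 0, 1, 1],
    [5, 5, 2, 2],   # same partition as above with different label names
    [0, 0, 0, 1],   # node 2 is uncertain in this sample
    [1, 1, 0, 0],
]
P = coassignment(samples)
for row in P:
    print(row)
```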

# Photon number resolving detectors as evidence for the corpuscular nature of light ( http://arxiv.org/abs/2110.04245v2 )

License: see link

Morgan C. Williamson, Gabriel D. Ko, Brian R. La Cour,

We consider the question of whether photon-number-resolving (PNR) detectors provide compelling evidence for the discrete nature of light; i.e., whether they indicate the prior presence of a certain number of discrete photons. To answer this question, we show that existing PNR detectors have an insufficient signal-to-noise ratio (SNR), and we propose an alternative interpretation of PNR detector output that is consistent with a wave picture of light and does not rely on the presumption of light particles. This interpretation is based on the aggregation of correlated or accidentally coincident detections within a given detector coincidence window. Our interpretation accounts for the arbitrary character of detector coincidence windows and includes connections to established treatments of intensity interferometers. To validate our interpretation, we performed an experiment on a multiplexed PNR detector and examined the dependence of photon number on the coincidence window via post-processing. These observations were then compared to a fully classical wave model based on amplitude threshold detection, and the results were found to be in excellent agreement. We find that results from low-SNR PNR detectors, such as those reported in the literature, can be described classically and therefore do not demonstrate evidence for the discrete nature of light.

Translated: 2024-08-07 01:00:27 Published: 2024-08-02
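
This interpretation hinges on how clicks are aggregated within a coincidence window. The sketch below groups time-sorted clicks from all channels of a hypothetical multiplexed detector into events. It then reports how many events contain each number of clicks. The grouping rule and the timestamps are assumptions for illustration, not the authors' post-processing. In particular, each window is assumed to be anchored at the event's first click. Widening the window merges nearby clicks into higher apparent "photon numbers".

```python
def count_photon_numbers(click_times, window):
    """Group time-sorted clicks into coincidence events.

    An event collects every click within `window` of its first click; the
    apparent photon number of the event is its click count. Returns a dict
    {clicks_per_event: number_of_events}.
    """
    histogram = {}
    times = sorted(click_times)
    i = 0
    while i < len(times):
        start = times[i]
        j = i
        while j + 1 < len(times) and times[j + 1] - start <= window:
            j += 1
        k = j - i + 1
        histogram[k] = histogram.get(k, 0) + 1
        i = j + 1
    return histogram


# Hypothetical click timestamps (ns) merged across multiplexed channels.
clicks = [0.0, 0.4, 10.0, 10.9, 11.5, 30.0]
print(count_photon_numbers(clicks, window=0.5))  # narrow window
print(count_photon_numbers(clicks, window=2.0))  # wide window
```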

# An Empirical Study on Bug Severity Estimation using Source Code Metrics and Static Analysis ( http://arxiv.org/abs/2206.12927v2 )

License: see link

Ehsan Mashhadi, Shaiful Chowdhury, Somayeh Modaberi, Hadi Hemmati, Gias Uddin,

In the past couple of decades, significant research efforts have been devoted to the prediction of software bugs (i.e., defects). In general, these works leverage a diverse set of metrics, tools, and techniques to predict which classes, methods, lines, or commits are buggy. However, most existing work in this domain treats all bugs the same, although in practice they are not equally important: the more severe a bug, the greater its consequences. Therefore, it is important for a defect prediction method to estimate the severity of the identified bugs, so that the higher-severity ones get immediate attention. In this paper, we provide a quantitative and qualitative study on two popular datasets (Defects4J and Bugs.jar), using 10 common source code metrics, and two popular static analysis tools (SpotBugs and Infer) for analyzing their capability to predict defects and their severity. We studied 3,358 buggy methods with different severity labels from 19 Java open-source projects. Results show that although code metrics are useful in predicting buggy code (Lines of Code, Maintainability Index, FanOut, and Effort are the best), they cannot estimate the severity level of the bugs. In addition, we observed that static analysis tools have weak performance in both predicting bugs (F1 score range of 3.1%-7.1%) and their severity label (F1 score under 2%). We also manually studied the characteristics of the severe bugs to identify possible reasons behind the weak performance of code metrics and static analysis tools in estimating their severity. Also, our categorization shows that Security bugs have high severity in most cases while Edge/Boundary faults have low severity. Finally, we discuss the practical implications of the results and propose new directions for future research.

Translated: 2024-08-07 00:54:45 Published: 2024-08-02

# Robust Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models ( http://arxiv.org/abs/2209.15224v4 )

License: see link

Ye Tian, Haolei Weng, Lucy Xia, Yang Feng,

Unsupervised learning has been widely used in many real-world applications. One of the simplest and most important unsupervised learning models is the Gaussian mixture model (GMM). In this work, we study the multi-task learning problem on GMMs, which aims to leverage potentially similar GMM parameter structures among tasks to obtain improved learning performance compared to single-task learning. We propose a multi-task GMM learning procedure based on the EM algorithm that effectively utilizes unknown similarities between related tasks and is robust against a fraction of outlier tasks from arbitrary distributions. The proposed procedure is shown to achieve the minimax optimal rate of convergence for both parameter estimation error and the excess mis-clustering error, in a wide range of regimes. Moreover, we generalize our approach to tackle the problem of transfer learning for GMMs, where similar theoretical results are derived. Additionally, iterative unsupervised multi-task and transfer learning methods may suffer from an initialization alignment problem, and two alignment algorithms are proposed to resolve the issue. Finally, we demonstrate the effectiveness of our methods through simulations and real data examples. To the best of our knowledge, this is the first work studying multi-task and transfer learning on GMMs with theoretical guarantees.

Translated: 2024-08-07 00:54:45 Published: 2024-08-02
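
For reference, the single-task building block here is the standard EM algorithm for a Gaussian mixture. The sketch below fits a two-component 1-D mixture to synthetic data. It does not implement the paper's multi-task coupling across tasks, its robustness to outlier tasks, or its alignment algorithms. Note that the component indices come out in an order fixed only by the initialization. This is the kind of label ambiguity that motivates aligning initializations across tasks.

```python
import math
import random


def em_gmm_1d(xs, init_means, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances).
    """
    mu = list(init_means)
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                    / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: responsibility-weighted updates of weights, means, variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, xs)) / nk, 1e-6)
    return w, mu, var


rng = random.Random(0)
data = ([rng.gauss(-2.0, 1.0) for _ in range(200)]
        + [rng.gauss(3.0, 1.0) for _ in range(200)])
weights, means, variances = em_gmm_1d(data, init_means=(-1.0, 1.0))
print("weights", [round(v, 3) for v in weights])
print("means", [round(v, 3) for v in means])
```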

# Emergent Representations of Program Semantics in Language Models Trained on Programs ( http://arxiv.org/abs/2305.11169v3 )

License: see link

Charles Jin, Martin Rinard,

We present evidence that language models (LMs) of code can learn to represent the formal semantics of programs, despite being trained only to perform next-token prediction. Specifically, we train a Transformer model on a synthetic corpus of programs written in a domain-specific language for navigating 2D grid world environments. Each program in the corpus is preceded by a (partial) specification in the form of several input-output grid world states. Despite providing no further inductive biases, we find that a probing classifier is able to extract increasingly accurate representations of the unobserved, intermediate grid world states from the LM hidden states over the course of training, suggesting the LM acquires an emergent ability to interpret programs in the formal sense. We also develop a novel interventional baseline that enables us to disambiguate what is represented by the LM as opposed to learned by the probe. We anticipate that this technique may be generally applicable to a broad range of semantic probing experiments. In summary, this paper does not propose any new techniques for training LMs of code, but develops an experimental framework for and provides insights into the acquisition and representation of formal semantics in statistical models of code. Our code is available at https://github.com/charlesjin/emergent-semantics.

Translated: 2024-08-07 00:45:00 Published: 2024-08-02

# Should We Attend More or Less? Modulating Attention for Fairness ( http://arxiv.org/abs/2305.13088v2 )

License: see link

Abdelrahman Zayed, Goncalo Mordido, Samira Shabanian, Sarath Chandar,

The advances in natural language processing (NLP) pose both opportunities and challenges. While recent progress enables the development of high-performing models for a variety of tasks, it also poses the risk of models learning harmful biases from the data, such as gender stereotypes. In this work, we investigate the role of attention, a widely-used technique in current state-of-the-art NLP models, in the propagation of social biases. Specifically, we study the relationship between the entropy of the attention distribution and the model's performance and fairness. We then propose a novel method for modulating attention weights to improve model fairness after training. Since our method is only applied post-training and pre-inference, it is an intra-processing method and is, therefore, less computationally expensive than existing in-processing and pre-processing approaches. Our results show an increase in fairness and minimal performance loss on different text classification and generation tasks using language models of varying sizes. WARNING: This work uses language that is offensive.

Translated: 2024-08-07 00:45:00 Published: 2024-08-02
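
This abstract links fairness to the entropy of attention distributions. One simple way to change that entropy after training is to rescale the pre-softmax scores by a temperature, as sketched below. This is an assumed illustration of "modulating attention", not necessarily the paper's exact procedure, and the scores are made up. Higher temperatures spread attention across more tokens and push the entropy toward its maximum, log(number of tokens).

```python
import math


def softmax(scores, temperature=1.0):
    """Softmax of attention scores; temperature > 1 flattens the weights."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def entropy(p):
    """Shannon entropy (nats) of a probability vector."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


scores = [4.0, 1.0, 0.5, 0.2]  # hypothetical pre-softmax scores for one query
for t in (0.5, 1.0, 2.0, 4.0):
    weights = softmax(scores, t)
    print(f"T={t}: weights={[round(x, 3) for x in weights]}, "
          f"entropy={entropy(weights):.3f}")
print(f"maximum possible entropy: {math.log(len(scores)):.3f}")
```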

# Single-ancilla ground state preparation via Lindbladians ( http://arxiv.org/abs/2308.15676v4 )

License: see link

Zhiyan Ding, Chi-Fang Chen, Lin Lin,

We design a quantum algorithm for ground state preparation in the early fault-tolerant regime. As a Monte Carlo-style quantum algorithm, our method features a Lindbladian where the target state is stationary. The construction of this Lindbladian is algorithmic and should not be seen as a specific approximation to some weakly coupled system-bath dynamics in nature. Our algorithm can be implemented using just one ancilla qubit and efficiently simulated on a quantum computer. It can prepare the ground state even when the initial state has zero overlap with the ground state, bypassing the most significant limitation of methods like quantum phase estimation. As a variant, we also propose a discrete-time algorithm, demonstrating even better efficiency and providing a near-optimal simulation cost depending on the desired evolution time and precision. Numerical simulation using Ising and Hubbard models demonstrates the efficacy and applicability of our method.

Translated: 2024-08-07 00:25:32 Published: 2024-08-02
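
The sketch below makes "a Lindbladian where the target state is stationary" concrete for a single qubit. It integrates the Lindblad master equation with Hamiltonian H = diag(-1, 1) and the textbook jump operator |0><1| (amplitude damping). This is not the paper's construction, which builds its jump operators algorithmically and uses a single ancilla. It only shows that such dynamics can drive a state with zero ground-state overlap into the ground state. For brevity, the integration uses forward Euler with a small step.

```python
def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def lindblad_rhs(rho, H, L):
    """d(rho)/dt = -i[H, rho] + L rho L^dag - (1/2){L^dag L, rho}."""
    Ld = dagger(L)
    LdL = matmul(Ld, L)
    Hr, rH = matmul(H, rho), matmul(rho, H)
    jump = matmul(matmul(L, rho), Ld)
    Ar, rA = matmul(LdL, rho), matmul(rho, LdL)
    return [[-1j * (Hr[i][j] - rH[i][j]) + jump[i][j] - 0.5 * (Ar[i][j] + rA[i][j])
             for j in range(2)] for i in range(2)]


def evolve(rho, H, L, dt=1e-3, steps=10_000):
    """Forward-Euler integration of the Lindblad equation up to t = dt * steps."""
    for _ in range(steps):
        d = lindblad_rhs(rho, H, L)
        rho = [[rho[i][j] + dt * d[i][j] for j in range(2)] for i in range(2)]
    return rho


H = [[-1.0, 0.0], [0.0, 1.0]]    # ground state |0> has energy -1
L = [[0.0, 1.0], [0.0, 0.0]]     # jump operator |0><1|
rho0 = [[0.0, 0.0], [0.0, 1.0]]  # excited state: zero overlap with |0>
rho_t = evolve(rho0, H, L)
print(f"ground population {rho_t[0][0].real:.5f}, "
      f"excited population {rho_t[1][1].real:.2e}")
```

The ground state |0><0| is a fixed point of these dynamics, and the excited population decays as roughly e^{-t}.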

# Superadditive Communication with the Green Machine: A Practical Demonstration of Nonlocality without Entanglement ( http://arxiv.org/abs/2310.05889v3 )

License: see link

Chaohan Cui, Jack Postlewaite, Babak N. Saif, Linran Fan, Saikat Guha,

Achieving the ultimate Holevo limit of optical communication capacity requires a joint-detection receiver which makes a collective quantum measurement over multiple modulated symbols. Such superadditivity (a higher communication rate than that achievable by symbol-by-symbol optical detection) is a special case of the well-known nonlocality without entanglement and has yet to be demonstrated. In this article, we propose and demonstrate a design of joint-detection receivers, the Green Machine, that can achieve superadditivity. We build this receiver and show that its capacity surpasses that of any symbol-by-symbol receiver in the photon-starved regime with binary phase-shift keying (BPSK). Our Green Machine receiver can also significantly reduce the transmitter peak power requirement compared with pulse-position modulation (the conventional modulation format used for deep-space laser communication). We further show that the self-referenced phase makes it immune to phase noise, e.g., atmospheric turbulence or platform vibrations.

Translated: 2024-08-07 00:15:47 Published: 2024-08-02
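
The gap a joint-detection receiver can exploit shows up in two standard capacity benchmarks for equiprobable BPSK coherent states |±α>, with mean photon number n = |α|^2 and binary entropy h:

- The Holevo capacity, h((1 - e^{-2n})/2).
- The capacity 1 - h(P_e) of the binary symmetric channel induced by the optimal symbol-by-symbol (Helstrom) measurement, where P_e = (1 - sqrt(1 - e^{-4n}))/2.

The sketch below only evaluates these two textbook formulas. It does not model the Green Machine receiver itself.

```python
import math


def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bpsk_holevo(n):
    """Holevo capacity (bits/symbol) of equiprobable BPSK, n = |alpha|^2."""
    return h2((1 - math.exp(-2 * n)) / 2)


def bpsk_helstrom_bsc(n):
    """Capacity (bits/symbol) with Helstrom symbol-by-symbol detection."""
    pe = (1 - math.sqrt(1 - math.exp(-4 * n))) / 2
    return 1 - h2(pe)


for n in (0.01, 0.1, 1.0):
    c_h, c_s = bpsk_holevo(n), bpsk_helstrom_bsc(n)
    print(f"n={n}: Holevo={c_h:.4f}, symbol-by-symbol={c_s:.4f}, "
          f"ratio={c_h / c_s:.2f}")
```

The ratio grows as n shrinks, which is why superadditivity is sought in the photon-starved regime.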

# Where Do People Tell Stories Online? Story Detection Across Online Communities ( http://arxiv.org/abs/2311.09675v3 )

License: see link

Maria Antoniak, Joel Mire, Maarten Sap, Elliott Ash, Andrew Piper,

Story detection in online communities is a challenging task as stories are scattered across communities and interwoven with non-storytelling spans within a single text. We address this challenge by building and releasing the StorySeeker toolkit, including a richly annotated dataset of 502 Reddit posts and comments, a detailed codebook adapted to the social media context, and models to predict storytelling at the document and span levels. Our dataset is sampled from hundreds of popular English-language Reddit communities ranging across 33 topic categories, and it contains fine-grained expert annotations, including binary story labels, story spans, and event spans. We evaluate a range of detection methods using our data, and we identify the distinctive textual features of online storytelling, focusing on storytelling spans. We illuminate distributional characteristics of storytelling on a large community-centric social media platform, and we also conduct a case study on r/ChangeMyView, where storytelling is used as one of many persuasive strategies, illustrating that our data and models can be used for both inter- and intra-community research. Finally, we discuss implications of our tools and analyses for narratology and the study of online communities.

Translated: 2024-08-07 00:06:03 Published: 2024-08-02

# Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning ( http://arxiv.org/abs/2311.10709v2 )

License: see link

Rohit Girdhar, Mannat Singh, Andrew Brown, Quentin Duval, Samaneh Azadi, Sai Saketh Rambhatla, Akbar Shah, Xi Yin, Devi Parikh, Ishan Misra,

We present Emu Video, a text-to-video generation model that factorizes the generation into two steps: first generating an image conditioned on the text, and then generating a video conditioned on the text and the generated image. We identify critical design decisions, namely adjusted noise schedules for diffusion and multi-stage training, that enable us to directly generate high-quality and high-resolution videos without requiring a deep cascade of models as in prior work. In human evaluations, our generated videos are strongly preferred in quality compared to all prior work: 81% vs. Google's Imagen Video, 90% vs. Nvidia's PYOCO, and 96% vs. Meta's Make-A-Video. Our model outperforms commercial solutions such as RunwayML's Gen2 and Pika Labs. Finally, our factorizing approach naturally lends itself to animating images based on a user's text prompt, where our generations are preferred 96% over prior work.

Translated: 2024-08-07 00:06:03 Published: 2024-08-02

# Asynchronous Bioplausible Neuron for SNN for Event Vision ( http://arxiv.org/abs/2311.11853v2 )

License: see link

Sanket Kachole, Hussain Sajwani, Fariborz Baghaei Naeini, Dimitrios Makris, Yahya Zweiri,

Spiking Neural Networks (SNNs) offer a biologically inspired approach to computer vision that can lead to more efficient processing of visual data with reduced energy consumption. However, maintaining homeostasis within these networks is challenging, as it requires continuous adjustment of neural responses to preserve equilibrium and optimal processing efficiency amidst diverse and often unpredictable input signals. In response to these challenges, we propose the Asynchronous Bioplausible Neuron (ABN), a dynamic spike firing mechanism to auto-adjust the variations in the input signal. Comprehensive evaluation across various datasets demonstrates ABN's enhanced performance in image classification and segmentation, maintenance of neural equilibrium, and energy efficiency.

Translated: 2024-08-07 00:06:03 Published: 2024-08-02
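
The abstract does not specify ABN's update rule. As a generic illustration of homeostatic firing control, the sketch below simulates a leaky integrate-and-fire neuron. Its threshold rises after each spike and relaxes back toward a baseline. The model, the function name `lif_spikes`, and all constants are assumptions; this is not the ABN mechanism.

```python
def lif_spikes(inputs, adapt=0.0, tau_m=20.0, tau_th=100.0, v_th0=1.0, dt=1.0):
    """Leaky integrate-and-fire neuron with an adaptive threshold.

    Each spike raises the threshold by `adapt`; between spikes the threshold
    relaxes toward `v_th0` with time constant `tau_th`. Returns spike steps.
    """
    v, v_th, spikes = 0.0, v_th0, []
    for step, current in enumerate(inputs):
        v += dt * (-v / tau_m + current)      # leaky integration of input
        v_th += dt * (v_th0 - v_th) / tau_th  # threshold relaxes to baseline
        if v >= v_th:
            spikes.append(step)
            v = 0.0                           # reset membrane potential
            v_th += adapt                     # homeostatic threshold increase
    return spikes


weak = [0.06] * 500
strong = [0.3] * 500
for name, drive in (("weak", weak), ("strong", strong)):
    fixed = len(lif_spikes(drive))
    adaptive = len(lif_spikes(drive, adapt=0.5))
    print(f"{name} input: fixed-threshold spikes={fixed}, "
          f"adaptive-threshold spikes={adaptive}")
```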

# VAE-IF: Deep feature extraction with averaging for fully unsupervised artifact detection in routinely acquired ICU time-series ( http://arxiv.org/abs/2312.05959v2 )

License: see link

Hollan Haule, Ian Piper, Patricia Jones, Chen Qin, Tsz-Yan Milly Lo, Javier Escudero,

Artifacts are a common problem in physiological time series collected from intensive care units (ICU) and other settings. They affect the quality and reliability of clinical research and patient care. Manual annotation of artifacts is costly and time-consuming, rendering it impractical. Automated methods are desired. Here, we propose a novel fully unsupervised approach to detect artifacts in clinical-standard, minute-by-minute resolution ICU data without any prior labeling or signal-specific knowledge. Our approach combines a variational autoencoder (VAE) and an isolation forest (IF) into a hybrid model to learn features and identify anomalies in different types of vital signs, such as blood pressure, heart rate, and intracranial pressure. We evaluate our approach on a real-world ICU dataset and compare it with supervised benchmark models based on long short-term memory (LSTM) and XGBoost and statistical methods such as ARIMA. We show that our unsupervised approach achieves comparable sensitivity to fully supervised methods and generalizes well to an external dataset. We also visualize the latent space learned by the VAE and demonstrate its ability to disentangle clean and noisy samples. Our approach offers a promising solution for cleaning ICU data in clinical research and practice without the need for any labels whatsoever.
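The isolation-forest half of such a hybrid can be sketched in plain Python. The assumption is that each row of `features` would be a feature vector produced by a trained VAE; the VAE is omitted here, and raw 2-D points stand in for its output. The code is the standard isolation-forest scoring (random axis-aligned splits, path lengths averaged over trees, normalised by c(n)). Function names and defaults are hypothetical, and the averaging over trees shown here may differ from the averaging in the paper's title.

```python
import math
import random


def _avg_path(n):
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    harmonic = sum(1.0 / i for i in range(1, n))  # exact H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def _build(points, depth, max_depth, rng):
    """Grow one isolation tree using random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            _build(left, depth + 1, max_depth, rng),
            _build(right, depth + 1, max_depth, rng))


def _path_length(tree, x, depth=0):
    """Depth at which x reaches a leaf, plus c(size) for leaves left unsplit."""
    if tree[0] == "leaf":
        return depth + _avg_path(tree[1])
    _, dim, split, left, right = tree
    return _path_length(left if x[dim] < split else right, x, depth + 1)


def isolation_scores(features, n_trees=100, sample_size=64, seed=0):
    """Anomaly score in (0, 1] per row of `features`; values near 1 are easy to isolate."""
    rng = random.Random(seed)
    m = min(sample_size, len(features))
    if m < 2:
        raise ValueError("need at least two rows")
    max_depth = max(1, math.ceil(math.log2(m)))
    trees = [_build(rng.sample(features, m), 0, max_depth, rng) for _ in range(n_trees)]
    norm = _avg_path(m)
    scores = []
    for x in features:
        mean_depth = sum(_path_length(t, x) for t in trees) / n_trees  # average over trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores
```

A point far from the bulk of the data, such as a spike in a vital-sign feature, is isolated after only a few splits. It therefore receives a score close to 1, while typical points score lower.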

Translated: 2024-08-06 23:55:54 Published: 2024-08-02

# Beyond the Holographic Entropy Cone via Cycle Flows

Beyond the Holographic Entropy Cone via Cycle Flows ( http://arxiv.org/abs/2312.10137v2 )

License: see link

Temple He, Sergio Hernández-Cuenca, Cynthia Keeler,

Motivated by bit threads, we introduce a new prescription for computing entropy vectors outside the holographic entropy cone. By utilizing cycle flows on directed graphs, we show that the maximum cycle flow associated to any subset of vertices, which corresponds to a subsystem, manifestly obeys purification symmetry. Furthermore, by restricting ourselves to a subclass of directed graphs, we prove that the maximum cycle flow obeys both subadditivity and strong subadditivity, thereby establishing it as a viable candidate for the entropy associated to the subsystem. Finally, we demonstrate how our model generalizes the entropy vectors obtainable via conventional flows in undirected graphs, and we conjecture that our model similarly generalizes the entropy vectors arising from hypergraphs.

Translated: 2024-08-06 23:55:54 Published: 2024-08-02

# DEM: A Method for Certifying Deep Neural Network Classifier Outputs in Aerospace

DEM: A Method for Certifying Deep Neural Network Classifier Outputs in Aerospace ( http://arxiv.org/abs/2401.02283v4 )

License: see link

Guy Katz, Natan Levy, Idan Refaeli, Raz Yerushalmi,

Software development in the aerospace domain requires adhering to strict, high-quality standards. While there exist regulatory guidelines for commercial software in this domain (e.g., ARP-4754 and DO-178), these do not apply to software with deep neural network (DNN) components. Consequently, it is unclear how to allow aerospace systems to benefit from the deep learning revolution. Our work here seeks to address this challenge with a novel, output-centric approach for DNN certification. Our method employs statistical verification techniques, and has the key advantage of being able to flag specific inputs for which the DNN's output may be unreliable, so that they can later be inspected by a human expert. To achieve this, our method conducts a statistical analysis of the DNN's predictions for other, nearby inputs, in order to detect inconsistencies. This is in contrast to existing techniques, which typically attempt to certify the entire DNN, as opposed to individual outputs. Our method uses the DNN as a black box and makes no assumptions about its topology. We hope that this work constitutes another step towards integrating DNNs in safety-critical applications, especially in the aerospace domain, where high standards of quality and reliability are crucial.
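The abstract describes checking whether a black-box classifier's prediction at an input agrees with its predictions on nearby inputs. It does not specify DEM's sampling scheme or statistical test. The sketch below is a generic version of the idea under stated assumptions: uniform sampling from an L-infinity ball and a fixed agreement threshold. The function name `flag_unreliable`, the radius `epsilon`, and the 95% threshold are all hypothetical.

```python
import random


def flag_unreliable(classify, x, epsilon, n_samples=200, min_agreement=0.95, seed=0):
    """Return (flag, agreement) for a black-box classifier evaluated at input x.

    Hypothetical sketch: neighbours are drawn uniformly from the L-infinity ball of
    radius epsilon around x. `agreement` is the fraction of neighbours that receive
    the same label as x. The output is flagged for human review if agreement falls
    below min_agreement.
    """
    rng = random.Random(seed)
    label = classify(x)
    agree = 0
    for _ in range(n_samples):
        neighbor = [xi + rng.uniform(-epsilon, epsilon) for xi in x]
        if classify(neighbor) == label:
            agree += 1
    agreement = agree / n_samples
    return agreement < min_agreement, agreement


# Toy classifier with a linear decision boundary at x0 + x1 = 1.
clf = lambda v: int(v[0] + v[1] > 1.0)
print(flag_unreliable(clf, [0.1, 0.1], 0.05))  # far from the boundary: not flagged
print(flag_unreliable(clf, [0.5, 0.5], 0.05))  # on the boundary: flagged
```

Only `classify` is called, so the check treats the network as a black box, in line with the abstract.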

Translated: 2024-08-06 23:55:54 Published: 2024-08-02

# Mission: Impossible Language Models

Mission: Impossible Language Models ( http://arxiv.org/abs/2401.06416v2 )

License: see link

Julie Kallini, Isabel Papadimitriou, Richard Futrell, Kyle Mahowald, Christopher Potts,

Chomsky and others have very directly claimed that large language models (LLMs) are equally capable of learning languages that are possible and impossible for humans to learn. However, there is very little published experimental evidence to support such a claim. Here, we develop a set of synthetic impossible languages of differing complexity, each designed by systematically altering English data with unnatural word orders and grammar rules. These languages lie on an impossibility continuum: at one end are languages that are inherently impossible, such as random and irreversible shuffles of English words, and on the other, languages that may not be intuitively impossible but are often considered so in linguistics, particularly those with rules based on counting word positions. We report on a wide range of evaluations to assess the capacity of GPT-2 small models to learn these uncontroversially impossible languages, and crucially, we perform these assessments at various stages throughout training to compare the learning process for each language. Our core finding is that GPT-2 struggles to learn impossible languages when compared to English as a control, challenging the core claim. More importantly, we hope our approach opens up a productive line of inquiry in which different LLM architectures are tested on a variety of impossible languages in an effort to learn more about how LLMs can be used as tools for these cognitive and typological investigations.
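The two ends of the "impossibility continuum" can be made concrete with simple token-level transformations of English sentences. These functions are hypothetical illustrations and not the exact perturbations used in the paper:

- a seeded random shuffle destroys word order;
- a full reversal is deterministic and reversible;
- a counting rule inserts a marker token after a fixed word position, regardless of syntactic structure.

```python
import random


def random_shuffle(tokens, seed):
    """Shuffle word order with a per-sentence seed (order cannot be recovered without the seed)."""
    rng = random.Random(seed)
    out = list(tokens)
    rng.shuffle(out)
    return out


def reverse(tokens):
    """Reverse the full word order: a deterministic, reversible transformation."""
    return tokens[::-1]


def count_based_marker(tokens, position=4, marker="<M>"):
    """Counting rule: insert `marker` right after the `position`-th word, ignoring syntax."""
    out = list(tokens)
    if len(out) >= position:
        out.insert(position, marker)
    return out


sentence = "the cat sat on the mat".split()
print(reverse(sentence))
print(count_based_marker(sentence))
```

A corpus perturbed this way keeps the English vocabulary but changes the structural rules. This is what allows learning curves to be compared against unmodified English as a control.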

Translated: 2024-08-06 23:46:09 Published: 2024-08-02

# AONeuS: A Neural Rendering Framework for Acoustic-Optical Sensor Fusion

AONeuS: A Neural Rendering Framework for Acoustic-Optical Sensor Fusion ( http://arxiv.org/abs/2402.03309v3 )

License: see link

Mohamad Qadri, Kevin Zhang, Akshay Hinduja, Michael Kaess, Adithya Pediredla, Christopher A. Metzler,

Underwater perception and 3D surface reconstruction are challenging problems with broad applications in construction, security, marine archaeology, and environmental monitoring. Treacherous operating conditions, fragile surroundings, and limited navigation control often dictate that submersibles restrict their range of motion and, thus, the baseline over which they can capture measurements. In the context of 3D scene reconstruction, it is well-known that smaller baselines make reconstruction more challenging. Our work develops a physics-based multimodal acoustic-optical neural surface reconstruction framework (AONeuS) capable of effectively integrating high-resolution RGB measurements with low-resolution depth-resolved imaging sonar measurements. By fusing these complementary modalities, our framework can reconstruct accurate high-resolution 3D surfaces from measurements captured over heavily-restricted baselines. Through extensive simulations and in-lab experiments, we demonstrate that AONeuS dramatically outperforms recent RGB-only and sonar-only inverse-differentiable-rendering-based surface reconstruction methods. A website visualizing the results of our paper is located at this address: https://aoneus.github.io/

Translated: 2024-08-06 23:46:09 Published: 2024-08-02

# Beyond unital noise in variational quantum algorithms: noise-induced barren plateaus and limit sets

Beyond unital noise in variational quantum algorithms: noise-induced barren plateaus and limit sets ( http://arxiv.org/abs/2402.08721v5 )

License: see link

P. Singkanipa, D. A. Lidar,

Variational quantum algorithms (VQAs) hold much promise but face the challenge of exponentially small gradients. Unmitigated, this barren plateau (BP) phenomenon leads to an exponential training overhead for VQAs. Perhaps the most pernicious are noise-induced barren plateaus (NIBPs), a type of unavoidable BP arising from open system effects, which have so far been shown to exist for unital noise maps. Here, we generalize the study of NIBPs to more general completely positive, trace-preserving maps, establishing the existence of NIBPs in the unital case and a class of non-unital maps we call Hilbert-Schmidt (HS)-contractive. The latter includes amplitude damping. We identify the associated phenomenon of noise-induced limit sets (NILS) of the VQA cost function and prove its existence for both unital and HS-contractive non-unital noise maps. Along the way, we extend the parameter shift rule of VQAs to the noisy setting. We provide rigorous bounds in terms of the relevant variables that give rise to NIBPs and NILSs, along with numerical simulations of the depolarizing and amplitude-damping maps that illustrate our analytical results.
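A single-qubit toy model shows both effects. It is not the paper's general construction or bounds.

Setup:
- The state RY(θ)|0⟩ is followed by `layers` noise channels, and the cost is ⟨Z⟩.
- Depolarizing noise (unital) maps ρ → (1-p)ρ + pI/2, so the cost becomes (1-p)^L cos θ.
- Amplitude damping (non-unital) maps ⟨Z⟩ → (1-γ)⟨Z⟩ + γ at each layer.

Effects:
- In both cases the two-term parameter-shift rule, [C(θ+π/2) − C(θ−π/2)]/2, still returns the exact gradient.
- That gradient shrinks like (1-p)^L or (1-γ)^L, an exponentially small gradient in the number of noisy layers.
- Under amplitude damping, the cost itself approaches a θ-independent value, the channel's fixed point ⟨Z⟩ = 1.

The function names and noise parameters are assumptions for illustration.

```python
import math


def cost_depolarizing(theta, p, layers):
    """<Z> after RY(theta)|0> and `layers` depolarizing channels.

    Each channel rho -> (1-p) rho + p I/2 shrinks the Bloch vector by a factor (1-p).
    """
    return (1 - p) ** layers * math.cos(theta)


def cost_amplitude_damping(theta, gamma, layers):
    """<Z> after RY(theta)|0> and `layers` amplitude-damping channels.

    One channel maps <Z> -> (1-gamma) <Z> + gamma (non-unital, fixed point <Z> = 1).
    """
    z = math.cos(theta)
    for _ in range(layers):
        z = (1 - gamma) * z + gamma
    return z


def parameter_shift(cost, theta):
    """Two-term parameter-shift gradient for a rotation generated by a Pauli operator."""
    return (cost(theta + math.pi / 2) - cost(theta - math.pi / 2)) / 2


for L in (1, 10, 50):
    g = parameter_shift(lambda t: cost_depolarizing(t, 0.1, L), 0.7)
    print(L, g)  # equals -(0.9**L) * sin(0.7): exponentially small in L
```

The shift rule stays exact here because each noise channel acts after the parameterized gate and maps ⟨Z⟩ affinely. The paper's noisy extension of the rule covers more general settings than this toy.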

Translated: 2024-08-06 23:36:13 Published: 2024-08-02

# ReViT: Enhancing Vision Transformers Feature Diversity with Attention Residual Connections

ReViT: Enhancing Vision Transformers Feature Diversity with Attention Residual Connections ( http://arxiv.org/abs/2402.11301v2 )

License: see link

Anxhelo Diko, Danilo Avola, Marco Cascio, Luigi Cinque,

The Vision Transformer (ViT) self-attention mechanism is characterized by feature collapse in deeper layers, resulting in the vanishing of low-level visual features. However, such features can help to accurately represent and identify elements within an image and to increase the accuracy and robustness of vision-based recognition systems. Following this rationale, we propose a novel residual attention learning method for improving ViT-based architectures, increasing their visual feature diversity and model robustness. In this way, the proposed network can capture and preserve significant low-level features, providing more details about the elements within the scene being analyzed. The effectiveness and robustness of the presented method are evaluated on five image classification benchmarks, including ImageNet1k, CIFAR10, CIFAR100, Oxford Flowers-102, and Oxford-IIIT Pet, achieving improved performances. Additionally, experiments on the COCO2017 dataset show that the devised approach discovers and incorporates semantic and spatial relationships for object detection and instance segmentation when implemented into spatial-aware transformer models.

Translated: 2024-08-06 23:36:13 Published: 2024-08-02

# Modeling phonon-mediated quasiparticle poisoning in superconducting qubit arrays

Modeling phonon-mediated quasiparticle poisoning in superconducting qubit arrays ( http://arxiv.org/abs/2402.15471v2 )

License: see link

Eric Yelton, Clayton P. Larson, Vito Iaia, Kenneth Dodge, Guglielmo La Magna, Paul G. Baity, Ivan V. Pechenezhskiy, Robert McDermott, Noah Kurinsky, Gianluigi Catelani, Britton L. T. Plourde,

Correlated errors caused by ionizing radiation impacting superconducting qubit chips are problematic for quantum error correction. Such impacts generate quasiparticle (QP) excitations in the qubit electrodes, which temporarily reduce qubit coherence significantly. The many energetic phonons produced by a particle impact travel efficiently throughout the device substrate and generate quasiparticles with high probability, thus causing errors on a large fraction of the qubits in an array simultaneously. We describe a comprehensive strategy for the numerical simulation of the phonon and quasiparticle dynamics in the aftermath of an impact. We compare the simulations with experimental measurements of phonon-mediated QP poisoning and demonstrate that our modeling captures the spatial and temporal footprint of the QP poisoning for various configurations of phonon downconversion structures. We thus present a path forward for the operation of superconducting quantum processors in the presence of ionizing radiation.

Translated: 2024-08-06 23:36:13 Published: 2024-08-02

# PCR-99: A Practical Method for Point Cloud Registration with 99 Percent Outliers

PCR-99: A Practical Method for Point Cloud Registration with 99 Percent Outliers ( http://arxiv.org/abs/2402.16598v5 )

License: see link

Seong Hun Lee, Javier Civera, Patrick Vandewalle,

We propose a robust method for point cloud registration that can handle both unknown scales and extreme outlier ratios. Our method, dubbed PCR-99, uses a deterministic 3-point sampling approach with two novel mechanisms that significantly boost the speed: (1) an improved ordering of the samples based on pairwise scale consistency, prioritizing the point correspondences that are more likely to be inliers, and (2) an efficient outlier rejection scheme based on triplet scale consistency, prescreening bad samples and reducing the number of hypotheses to be tested. Our evaluation shows that, up to a 98% outlier ratio, the proposed method achieves performance comparable to the state of the art. At a 99% outlier ratio, however, it outperforms the state of the art for both known-scale and unknown-scale problems. Especially for the latter, we observe a clear superiority in terms of robustness and speed.
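Pairwise scale consistency rests on one fact. Under a similarity transform dst = s·R·src + t, every pair of inlier correspondences (i, j) satisfies |dst_i − dst_j| / |src_i − src_j| = s. The sketch below turns that into an ordering heuristic: each correspondence is scored by the size of the largest cluster of its pairwise log-scale ratios that agree within a tolerance. This is an assumed scoring rule for illustration and may differ from PCR-99's actual ordering. The function name and `tol` are hypothetical, and the triplet-based rejection step is not shown.

```python
import math


def order_by_scale_consistency(src, dst, tol=0.05):
    """Rank putative correspondences src[i] <-> dst[i], most likely inliers first.

    Hypothetical scoring: for each i, compute log(|dst_i - dst_j| / |src_i - src_j|)
    over all j != i, then count the largest group of these values whose spread is at
    most `tol`. Inliers share the common log-scale, so they collect large groups.
    """
    n = len(src)
    scores = []
    for i in range(n):
        logs = []
        for j in range(n):
            if j == i:
                continue
            d_src = math.dist(src[i], src[j])
            d_dst = math.dist(dst[i], dst[j])
            if d_src > 0 and d_dst > 0:
                logs.append(math.log(d_dst / d_src))
        logs.sort()
        best, lo = 0, 0
        for hi in range(len(logs)):  # sliding window over the sorted log-ratios
            while logs[hi] - logs[lo] > tol:
                lo += 1
            best = max(best, hi - lo + 1)
        scores.append(best)
    order = sorted(range(n), key=lambda i: -scores[i])
    return order, scores
```

In a 3-point sampling loop, samples would then be drawn from the front of `order` first. The scale s does not need to be known in advance, because only agreement between ratios is measured.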

Translated: 2024-08-06 23:26:29 Published: 2024-08-02

# $ζ$-QVAE: A Quantum Variational Autoencoder utilizing Regularized Mixed-state Latent Representations

$ζ$-QVAE: A Quantum Variational Autoencoder utilizing Regularized Mixed-state Latent Representations ( http://arxiv.org/abs/2402.17749v2 )

License: see link

Gaoyuan Wang, Jonathan Warrell, Prashant S. Emani, Mark Gerstein,

A major challenge in near-term quantum computing is its application to large real-world datasets due to scarce quantum hardware resources. One approach to enabling tractable quantum models for such datasets involves compressing the original data to manageable dimensions while still representing essential information for downstream analysis. In classical machine learning, variational autoencoders (VAEs) facilitate efficient data compression, representation learning for subsequent tasks, and novel data generation. However, no model has been proposed that exactly captures all of these features for direct application to quantum data on quantum computers. Some existing quantum models for data compression lack regularization of latent representations, thus preventing direct use for generation and control of generalization. Others are hybrid models with only some internal quantum components, impeding direct training on quantum data. To bridge this gap, we present a fully quantum framework, $\zeta$-QVAE, which encompasses all the capabilities of classical VAEs and can be directly applied for both classical and quantum data compression. Our model utilizes regularized mixed states to attain optimal latent representations. It accommodates various divergences for reconstruction and regularization. Furthermore, by accommodating mixed states at every stage, it can utilize the full-data density matrix and allow for a "global" training objective. Doing so, in turn, makes efficient optimization possible and has potential implications for private and federated learning. In addition to exploring the theoretical properties of $\zeta$-QVAE, we demonstrate its performance on representative genomics and synthetic data. Our results consistently indicate that $\zeta$-QVAE exhibits similar or better performance compared to matched classical models.

Translated: 2024-08-06 23:26:29 Published: 2024-08-02

# Locating and Editing Factual Associations in Mamba

Locating and Editing Factual Associations in Mamba ( http://arxiv.org/abs/2404.03646v2 )

License: see link

Arnab Sen Sharma, David Atkinson, David Bau,

We investigate the mechanisms of factual recall in the Mamba state space model. Our work is inspired by previous findings in autoregressive transformer language models suggesting that their knowledge recall is localized to particular modules at specific token locations; we therefore ask whether factual recall in Mamba can be similarly localized. To investigate this, we conduct four lines of experiments on Mamba. First, we apply causal tracing or interchange interventions to localize key components inside Mamba that are responsible for recalling facts, revealing that specific components within middle layers show strong causal effects at the last token of the subject, while the causal effect of intervening on later layers is most pronounced at the last token of the prompt, matching previous findings on autoregressive transformers. Second, we show that rank-one model editing methods can successfully insert facts at specific locations, again resembling findings on transformer LMs. Third, we examine the linearity of Mamba's representations of factual relations. Finally, we adapt attention-knockout techniques to Mamba in order to dissect information flow during factual recall. We compare Mamba directly to a similar-sized autoregressive transformer LM and conclude that despite significant differences in architectural approach, when it comes to factual recall, the two architectures share many similarities.
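The linear-algebra core of rank-one model editing is a single outer-product update. Given a weight matrix W, a key vector k (the representation at the chosen location), and a desired value v, the update W' = W + (v − Wk)kᵀ / (kᵀk) guarantees W'k = v. It leaves W unchanged on any key orthogonal to k. This is a simplified form: ROME-style methods additionally weight the update by key covariance statistics, which is omitted here. Which Mamba matrix is edited is also not specified in this sketch, and the function name is hypothetical.

```python
def rank_one_edit(W, k, v):
    """Return W' = W + (v - W k) k^T / (k^T k), so that W' k == v.

    Simplified rank-one edit with identity key covariance (hypothetical helper).
    W is a list of rows; k and v are lists of matching dimensions.
    """
    Wk = [sum(w * kj for w, kj in zip(row, k)) for row in W]  # current output for key k
    residual = [vi - wki for vi, wki in zip(v, Wk)]            # what the edit must add
    kk = sum(kj * kj for kj in k)
    return [[w + r * kj / kk for w, kj in zip(row, k)] for row, r in zip(W, residual)]


W = [[1, 2], [3, 4]]
print(rank_one_edit(W, [1, 0], [5, 6]))  # [[5.0, 2.0], [6.0, 4.0]]
```

Because the change is confined to the direction of k, inputs whose keys are orthogonal to k produce the same outputs as before. This is why a rank-one edit can insert a fact at a specific location.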

Translated: 2024-08-06 23:16:45 Published: 2024-08-02

# Variational Quantum Algorithm Landscape Reconstruction by Low-Rank Tensor Completion

Variational Quantum Algorithm Landscape Reconstruction by Low-Rank Tensor Completion ( http://arxiv.org/abs/2405.10941v2 )

License: see link

Tianyi Hao, Zichang He, Ruslan Shaydulin, Marco Pistoia, Swamit Tannu,

Variational quantum algorithms (VQAs) are a broad class of algorithms with many applications in science and industry. Applying a VQA to a problem involves optimizing a parameterized quantum circuit by maximizing or minimizing a cost function. A particular challenge associated with VQAs is understanding the properties of associated cost functions. Having the landscapes of VQA cost functions can greatly assist in developing and testing new variational quantum algorithms, but they are extremely expensive to compute. Reconstructing the landscape of a VQA using existing techniques requires a large number of cost function evaluations, especially when the dimension or the resolution of the landscape is high. To address this challenge, we propose a low-rank tensor-completion-based approach for local landscape reconstruction. By leveraging compact low-rank representations of tensors, our technique can overcome the curse of dimensionality and handle high-resolution landscapes. We demonstrate the power of landscapes in VQA development by showcasing practical applications of analyzing penalty terms for constrained optimization problems and examining the probability landscapes of certain basis states.

Translated: 2024-08-06 23:16:45 Published: 2024-08-02

# Monte Carlo studies of quantum cosmology by the generalized Lefschetz thimble method

Monte Carlo studies of quantum cosmology by the generalized Lefschetz thimble method ( http://arxiv.org/abs/2407.17724v2 )

License: see link

Chien-Yu Chou, Jun Nishimura,

Quantum cosmology aims at elucidating the beginning of our Universe. Back in the early 1980s, Vilenkin and Hartle-Hawking put forward the "tunneling from nothing" and "no boundary" proposals. Recently there has been renewed interest in this subject from the viewpoint of defining the oscillating path integral for Lorentzian quantum gravity using the Picard-Lefschetz theory. Aiming at going beyond the mini-superspace and saddle-point approximations, we perform Monte Carlo calculations using the generalized Lefschetz thimble method to overcome the sign problem. In particular, we confirm that, when the Robin boundary condition is used, either the Vilenkin or the Hartle-Hawking saddle point becomes relevant, depending on the boundary condition's parameter. We also clarify some fundamental issues in quantum cosmology, such as an issue related to the integration domain of the lapse function and an issue related to reading off the real geometry from the complex geometry obtained at the saddle point.

Translated: 2024-08-06 23:07:02 Published: 2024-08-02

# BlenderAlchemy: Editing 3D Graphics with Vision-Language Models

BlenderAlchemy: Editing 3D Graphics with Vision-Language Models ( http://arxiv.org/abs/2404.17672v3 )

License: see link

Ian Huang, Guandao Yang, Leonidas Guibas,

Graphics design is important for various applications, including movie production and game design. To create a high-quality scene, designers usually need to spend hours in software like Blender, in which they might need to interleave and repeat operations, such as connecting material nodes, hundreds of times. Moreover, slightly different design goals may require completely different sequences, making automation difficult. In this paper, we propose a system that leverages Vision-Language Models (VLMs), like GPT-4V, to intelligently search the design action space to arrive at an answer that can satisfy a user's intent. Specifically, we design a vision-based edit generator and state evaluator to work together to find the correct sequence of actions to achieve the goal. Inspired by the role of visual imagination in the human design process, we supplement the visual reasoning capabilities of VLMs with "imagined" reference images from image-generation models, providing visual grounding of abstract language descriptions. In this paper, we provide empirical evidence suggesting our system can produce simple but tedious Blender editing sequences for tasks such as editing procedural materials and geometry from text and/or reference images, as well as adjusting lighting configurations for product renderings in complex scenes.

Translated: 2024-08-06 22:45:03 Published: 2024-08-02

# Expanding the Horizon: Enabling Hybrid Quantum Transfer Learning for Long-Tailed Chest X-Ray Classification

Expanding the Horizon: Enabling Hybrid Quantum Transfer Learning for Long-Tailed Chest X-Ray Classification ( http://arxiv.org/abs/2405.00156v2 )

License: see link

Skylar Chan, Pranav Kulkarni, Paul H. Yi, Vishwa S. Parekh,

Quantum machine learning (QML) has the potential for improving the multi-label classification of rare, albeit critical, diseases in large-scale chest x-ray (CXR) datasets due to theoretical quantum advantages over classical machine learning (CML) in sample efficiency and generalizability. While prior literature has explored QML with CXRs, it has focused on binary classification tasks with small datasets due to limited access to quantum hardware and computationally expensive simulations. To that end, we implemented a Jax-based framework that enables the simulation of medium-sized qubit architectures with significant improvements in wall-clock time over current software offerings. We evaluated the performance of our Jax-based framework in terms of efficiency and performance for hybrid quantum transfer learning for long-tailed classification across 8, 14, and 19 disease labels using large-scale CXR datasets. The Jax-based framework resulted in up to a 58% and 95% speed-up compared to PyTorch and TensorFlow implementations, respectively. However, compared to CML, QML demonstrated slower convergence and an average AUROC of 0.70, 0.73, and 0.74 for the classification of 8, 14, and 19 CXR disease labels. In comparison, the CML models had an average AUROC of 0.77, 0.78, and 0.80 respectively. In conclusion, our work presents an accessible implementation of hybrid quantum transfer learning for long-tailed CXR classification with a computationally efficient Jax-based framework.

Translated: 2024-08-06 22:45:03 Published: 2024-08-02

# When a Relation Tells More Than a Concept: Exploring and Evaluating Classifier Decisions with CoReX

When a Relation Tells More Than a Concept: Exploring and Evaluating Classifier Decisions with CoReX ( http://arxiv.org/abs/2405.01661v2 )

License: see link

Bettina Finzel, Patrick Hilme, Johannes Rabold, Ute Schmid,

Explanations for Convolutional Neural Networks (CNNs) based on the relevance of input pixels might be too unspecific to evaluate which input features impact model decisions and how. Especially in complex real-world domains like biology, the presence of specific concepts and of relations between concepts might discriminate between classes. Pixel relevance is not expressive enough to convey this type of information. Consequently, model evaluation is limited, and relevant aspects that are present in the data and influence the model decisions might be overlooked. This work presents a novel method to explain and evaluate CNN models, which uses a concept- and relation-based explainer (CoReX). It explains the predictive behavior of a model on a set of images by masking (ir-)relevant concepts from the decision-making process and by constraining relations in a learned interpretable surrogate model. We test our approach with several image data sets and CNN architectures. Results show that CoReX explanations are faithful to the CNN model in terms of predictive outcomes. We further demonstrate through a human evaluation that CoReX is a suitable tool for generating combined explanations that help assess the classification quality of CNNs. Finally, we show that CoReX supports the identification and re-classification of incorrect or ambiguous classifications.

Translated: 2024-08-06 22:45:03 Published: 2024-08-02

# Dynamic In-context Learning with Conversational Models for Data Extraction and Materials Property Prediction

Dynamic In-context Learning with Conversational Models for Data Extraction and Materials Property Prediction ( http://arxiv.org/abs/2405.10448v2 )

License: see link

Chinedu Ekuma,

The advent of natural language processing and large language models (LLMs) has revolutionized the extraction of data from unstructured scholarly papers. However, ensuring data trustworthiness remains a significant challenge. In this paper, we introduce PropertyExtractor, an open-source tool that leverages advanced conversational LLMs like Google gemini-pro and OpenAI gpt-4, blends zero-shot with few-shot in-context learning, and employs engineered prompts for the dynamic refinement of structured information hierarchies, enabling autonomous, efficient, scalable, and accurate identification, extraction, and verification of material property data. Our tests on material data demonstrate precision and recall exceeding 95% with an error rate of approximately 9%, highlighting the effectiveness and versatility of the toolkit. Finally, databases for 2D material thicknesses (a critical parameter for device integration) and for energy bandgap values are developed using PropertyExtractor. Specifically for the thickness database, the rapid evolution of the field has outpaced both experimental measurements and computational methods, creating a significant data gap. Our work addresses this gap and showcases the potential of PropertyExtractor as a reliable and efficient tool for the autonomous generation of various material property databases, advancing the field.

Translated: 2024-08-06 22:45:03 Published: 2024-08-02

# Integrating Cognitive AI with Generative Models for Enhanced Question Answering in Skill-based Learning

Integrating Cognitive AI with Generative Models for Enhanced Question Answering in Skill-based Learning ( http://arxiv.org/abs/2407.19393v2 )

License: see link

Rochan H. Madhusudhana, Rahul K. Dass, Jeanette Luu, Ashok K. Goel,

In online learning, the ability to provide quick and accurate feedback to learners is crucial. In skill-based learning, learners need to understand the underlying concepts and mechanisms of a skill to be able to apply it effectively. While videos are a common tool in online learning, they cannot comprehend or assess the skills being taught. Additionally, while Generative AI methods are effective in searching and retrieving answers from a text corpus, it remains unclear whether these methods exhibit any true understanding. This limits their ability to provide explanations of skills or help with problem-solving. This paper proposes a novel approach that merges Cognitive AI and Generative AI to address these challenges. We employ a structured knowledge representation, the TMK (Task-Method-Knowledge) model, to encode skills taught in an online Knowledge-based AI course. Leveraging techniques such as Large Language Models, Chain-of-Thought, and Iterative Refinement, we outline a framework for generating reasoned explanations in response to learners' questions about skills.

Translated: 2024-08-06 19:59:40 Published: 2024-08-02

# Alignment Scores: Robust Metrics for Multiview Pose Accuracy Evaluation

Alignment Scores: Robust Metrics for Multiview Pose Accuracy Evaluation ( http://arxiv.org/abs/2407.20391v2 )

License: see link

Seong Hun Lee, Javier Civera,

We propose three novel metrics for evaluating the accuracy of a set of estimated camera poses given the ground truth: Translation Alignment Score (TAS), Rotation Alignment Score (RAS), and Pose Alignment Score (PAS). The TAS evaluates the translation accuracy independently of the rotations, and the RAS evaluates the rotation accuracy independently of the translations. The PAS is the average of the two scores, evaluating the combined accuracy of both translations and rotations. The TAS is computed in four steps: (1) Find the upper quartile of the closest-pair distances, $d$. (2) Align the estimated trajectory to the ground truth using a robust registration method. (3) Collect all distance errors and obtain the cumulative frequencies for multiple thresholds ranging from $0.01d$ to $d$ in steps of $0.01d$. (4) Add up these cumulative frequencies and normalize them such that the theoretical maximum is 1. The TAS has practical advantages over existing metrics in that (1) it is robust to outliers and collinear motion, and (2) there is no need to adjust parameters for different datasets. The RAS is computed in a similar manner to the TAS and is also shown to be more robust against outliers than existing rotation metrics. We verify our claims through extensive simulations and provide an in-depth discussion of the strengths and weaknesses of the proposed metrics.
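
Steps (1), (3), and (4) of the TAS can be sketched in a few lines of NumPy. This is a minimal illustration under two assumptions of ours: "closest-pair distances" is read as each ground-truth camera's distance to its nearest neighbour, and step (2) (the robust registration) is assumed to have been done already, so `est_aligned` is passed in pre-aligned.

```python
import numpy as np

def upper_quartile_closest_pair(gt):
    # distance from each ground-truth camera centre to its nearest other camera
    diff = gt[:, None, :] - gt[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)
    nearest = dist.min(axis=1)
    return float(np.percentile(nearest, 75))   # step (1): upper quartile, d

def translation_alignment_score(est_aligned, gt):
    """TAS sketch; assumes est_aligned was already registered to gt (step 2)."""
    d = upper_quartile_closest_pair(gt)
    errors = np.linalg.norm(est_aligned - gt, axis=1)
    thresholds = d * np.arange(1, 101) / 100.0  # 0.01d, 0.02d, ..., d
    # step (3): fraction of cameras whose error is within each threshold
    cum_freq = (errors[None, :] <= thresholds[:, None]).mean(axis=1)
    # step (4): sum and normalise so that a perfect estimate scores 1
    return float(cum_freq.mean())
```

Because each camera contributes a bounded amount, a single wildly wrong camera out of four costs at most a quarter of the score rather than dominating it, which illustrates the outlier robustness claimed above.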

Translated: 2024-08-06 19:59:40 Published: 2024-08-02

# SoK: Payment Channel Networks

SoK: Payment Channel Networks ( http://arxiv.org/abs/2407.20968v2 )

License: see link

Kartick Kolachala, Mohammed Ababneh, Roopa Vishwanathan,

Payment Channel Networks (PCNs) have been proposed as a solution to the scalability, throughput, and cost-overhead problems associated with on-chain transactions. By facilitating off-chain execution of transactions, PCNs significantly reduce the burden on the blockchain, leading to faster transaction processing, reduced transaction fees, and enhanced privacy. Despite these advantages, current research on PCNs faces a variety of challenges that require further exploration. In this paper, we survey recent work on several aspects of PCNs, such as pathfinding and routing, virtual channels, state channels, payment channel hubs, and rebalancing. This survey aims to give the reader a detailed understanding of the current state of the art in PCN research, highlighting a few important advancements. Additionally, we highlight various unresolved issues in PCN research. Specifically, this paper seeks to answer the following crucial question: What are the interesting and non-trivial challenges in PCN research that require immediate attention from the academic and research community? By addressing this question, we aim to identify the most pressing problems and future research directions that interested readers can immediately work on. Through this analysis, we hope to inspire researchers and practitioners to tackle these challenges and make PCNs more secure and versatile.
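
To make the pathfinding-and-routing problem concrete, here is a toy sketch, not taken from the survey: each channel direction has its own spendable balance, so a payment can only travel along directions whose balance covers the amount, and forwarding it shifts liquidity from one side of each channel to the other (the imbalance that rebalancing schemes address). Fees, timelocks, and privacy are deliberately ignored, and the function names are hypothetical.

```python
from collections import deque

def find_payment_path(balances, src, dst, amount):
    """Breadth-first search over channel directions whose balance covers `amount`.

    balances: dict mapping (u, v) -> spendable balance in the u -> v direction.
    Returns the list of nodes from src to dst, or None if no such path exists.
    """
    adj = {}
    for (u, v), bal in balances.items():
        if bal >= amount:
            adj.setdefault(u, []).append(v)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None

def apply_payment(balances, path, amount):
    # each hop: the sender's side of the channel shrinks, the receiver's side grows
    for u, v in zip(path, path[1:]):
        balances[(u, v)] -= amount
        balances[(v, u)] = balances.get((v, u), 0) + amount
```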

Translated: 2024-08-06 19:59:40 Published: 2024-08-02

# Using a CNN Model to Assess Visual Artwork's Creativity

Using a CNN Model to Assess Visual Artwork's Creativity ( http://arxiv.org/abs/2408.01481v1 )

License: see link

Zhehan Zhang, Meihua Qian, Li Luo, Ripon Saha, Qianyi Gao, Xinxin Song,

Assessing artistic creativity has long challenged researchers, with traditional methods proving time-consuming. Recent studies have applied machine learning to evaluate creativity in drawings, but not paintings. Our research addresses this gap by developing a CNN model to automatically assess the creativity of students' paintings. Using a dataset of 600 paintings by professionals and children, our model achieved 90% accuracy and faster evaluation times than human raters. This approach demonstrates the potential of machine learning in advancing artistic creativity assessment, offering a more efficient alternative to traditional methods.

Translated: 2024-08-06 19:49:47 Published: 2024-08-02

# NeuralFactors: A Novel Factor Learning Approach to Generative Modeling of Equities

NeuralFactors: A Novel Factor Learning Approach to Generative Modeling of Equities ( http://arxiv.org/abs/2408.01499v1 )

License: see link

Achintya Gopal,

The use of machine learning for statistical modeling (and thus, generative modeling) has grown in popularity with the proliferation of time series models, text-to-image models, and especially large language models. Fundamentally, the goal of classical factor modeling is the statistical modeling of stock returns, and in this work, we explore using deep generative modeling to enhance classical factor models. Prior work has explored the use of deep generative models to model hundreds of stocks, leading to accurate risk forecasting and alpha portfolio construction; however, that specific model does not allow for easy factor-model interpretation, because the factor exposures cannot be deduced. In this work, we introduce NeuralFactors, a novel machine-learning-based approach to factor analysis in which a neural network outputs factor exposures and factor returns, trained using the same methodology as variational autoencoders. We show that this model outperforms prior approaches in terms of both log-likelihood performance and computational efficiency. Further, we show that this method is competitive with prior work in generating realistic synthetic data, covariance estimation, risk analysis (e.g., value at risk, or VaR, of portfolios), and portfolio optimization. Finally, due to the connection to classical factor analysis, we analyze how the factors our model learns cluster together and show that the factor exposures could be used to embed stocks.
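
In a linear factor model, stock returns are written as $r = Bf + \varepsilon$, where $B$ holds the factor exposures, so the implied covariance is $B\Sigma_f B^\top + D$ with $D$ the diagonal idiosyncratic variances. NeuralFactors has a neural network output the exposures and factor returns; the sketch below only shows the classical downstream arithmetic for covariance estimation and a portfolio VaR. The zero-mean Gaussian VaR is our simplifying assumption, not necessarily the distribution used in the paper.

```python
import numpy as np
from statistics import NormalDist

def factor_covariance(exposures, factor_cov, idio_var):
    """Covariance implied by a linear factor model r = B f + eps.

    exposures:  (n_stocks, k) factor exposures B
    factor_cov: (k, k) covariance of the factor returns f
    idio_var:   (n_stocks,) variances of the idiosyncratic terms eps
    """
    return exposures @ factor_cov @ exposures.T + np.diag(idio_var)

def gaussian_var(weights, cov, alpha=0.99):
    # one-period Value at Risk of a zero-mean Gaussian portfolio return, as a positive loss
    sigma = float(np.sqrt(weights @ cov @ weights))
    return NormalDist().inv_cdf(alpha) * sigma
```

For example, two stocks loading on one market factor, `factor_covariance(np.array([[1.0], [0.5]]), np.array([[0.04]]), np.array([0.01, 0.02]))`, share the off-diagonal term $1.0 \cdot 0.5 \cdot 0.04 = 0.02$, and `gaussian_var` of an equal-weight portfolio then follows directly.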

Translated: 2024-08-06 19:49:47 Published: 2024-08-02

# Efficient Graph Coloring with Neural Networks: A Physics-Inspired Approach for Large Graphs

Efficient Graph Coloring with Neural Networks: A Physics-Inspired Approach for Large Graphs ( http://arxiv.org/abs/2408.01503v1 )

License: see link

Lorenzo Colantonio, Andrea Cacioppo, Federico Scarpati, Stefano Giagu,

The graph coloring problem is an optimization problem involving the assignment of one of $q$ colors to each vertex of a graph such that no two adjacent vertices share the same color. This problem is NP-hard and arises in various practical applications. In this work, we present a novel algorithm that leverages graph neural networks to tackle the problem efficiently, particularly for large graphs. We propose a physics-inspired approach that leverages tools used in statistical mechanics to improve the training and performance of the algorithm. The scaling of our method is evaluated for different connectivities and graph sizes. Finally, we demonstrate the effectiveness of our method on a dataset of Erdős-Rényi graphs, showing that it remains applicable even in hard-to-solve connectivity regions where traditional methods struggle.
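
A common physics-inspired relaxation of coloring, which we assume here for illustration, is the antiferromagnetic Potts energy: each node holds a probability vector over the $q$ colors, and the loss is the expected number of monochromatic edges. The sketch below minimises that loss by gradient descent on per-node logits directly; in a GNN-based method the logits would instead be produced by the network, so this is a simplification and not the authors' training procedure.

```python
import numpy as np

def softmax(z):
    # row-wise softmax: each row is one node's distribution over the q colours
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def potts_loss(p, edges):
    # expected number of monochromatic edges if each node samples its colour from p
    u, v = edges[:, 0], edges[:, 1]
    return float(np.sum(p[u] * p[v]))

def potts_grad_logits(z, edges):
    p = softmax(z)
    u, v = edges[:, 0], edges[:, 1]
    g = np.zeros_like(p)              # dL/dp, accumulated over incident edges
    np.add.at(g, u, p[v])
    np.add.at(g, v, p[u])
    # chain rule through the softmax: dL/dz = p * (g - <p, g>)
    return p * (g - np.sum(p * g, axis=1, keepdims=True))

def color_graph(n, edges, q, steps=500, lr=1.0, seed=0):
    # plain gradient descent on per-node logits (a GNN would produce these instead)
    z = np.random.default_rng(seed).normal(scale=0.1, size=(n, q))
    for _ in range(steps):
        z -= lr * potts_grad_logits(z, edges)
    return softmax(z).argmax(axis=1)

def count_conflicts(colors, edges):
    # number of edges whose endpoints received the same colour
    return int(np.sum(colors[edges[:, 0]] == colors[edges[:, 1]]))
```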

Translated: 2024-08-06 19:49:47 Published: 2024-08-02

# MoDE: Effective Multi-task Parameter Efficient Fine-Tuning with a Mixture of Dyadic Experts

MoDE: Effective Multi-task Parameter Efficient Fine-Tuning with a Mixture of Dyadic Experts ( http://arxiv.org/abs/2408.01505v1 )

License: see link

Lin Ning, Harsh Lara, Meiqi Guo, Abhinav Rastogi,

Parameter-efficient fine-tuning techniques like Low-Rank Adaptation (LoRA) have revolutionized the adaptation of large language models (LLMs) to diverse tasks. Recent efforts have explored mixtures of LoRA modules for multi-task settings. However, our analysis reveals redundancy in the down-projection matrices of these architectures. This observation motivates our proposed method, Mixture of Dyadic Experts (MoDE), which introduces a novel design for efficient multi-task adaptation. This is done by sharing the down-projection matrix across tasks and employing atomic rank-one adapters, coupled with routers that allow more sophisticated task-level specialization. Our design allows for more fine-grained mixing, thereby increasing the model's ability to jointly handle multiple tasks. We evaluate MoDE on the Super-NaturalInstructions (SNI) benchmark, consisting of a diverse set of 700+ tasks, and demonstrate that it outperforms state-of-the-art multi-task parameter-efficient fine-tuning (PEFT) methods without introducing additional parameters. Our findings contribute to a deeper understanding of parameter efficiency in multi-task LLM adaptation and provide a practical solution for deploying high-performing, lightweight models.
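
One way to read "a shared down-projection with atomic rank-one adapters and routers" is sketched below: every row of the shared down-projection `A` defines one rank, and a per-rank router mixes `E` candidate up-projection vectors for that rank. The parameter layout (`A`, `B`, `router`) and the softmax gating are our assumptions for illustration, not the paper's exact design.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mode_delta(x, A, B, router):
    """Adapter output added to the frozen layer's output (hypothetical layout).

    x:      (d_in,) input activation
    A:      (r, d_in) down-projection shared across all experts
    B:      (r, E, d_out) rank-one up-projection vectors, per rank and expert
    router: (r, E, d_in) per-rank routing weights
    """
    z = A @ x                             # (r,) shared rank-wise projections
    gates = softmax(router @ x, axis=-1)  # (r, E) mixing weights over experts
    # sum over ranks i and experts e of gates[i, e] * z[i] * B[i, e]
    return np.einsum("re,r,red->d", gates, z, B)
```

A quick sanity check of the design: if every expert holds the same up-projection vector for a given rank, the gates sum to one and the adapter collapses to an ordinary LoRA update `B_shared.T @ (A @ x)`; the layer output would then be `W0 @ x + mode_delta(x, A, B, router)`.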

Translated: 2024-08-06 19:49:47 Published: 2024-08-02

# Quantum noise modeling through Reinforcement Learning

Quantum noise modeling through Reinforcement Learning ( http://arxiv.org/abs/2408.01506v1 )

License: see link

Simone Bordoni, Andrea Papaluca, Piergiorgio Buttarini, Alejandro Sopena, Stefano Giagu, Stefano Carrazza,

In the current era of quantum computing, robust and efficient tools are essential to bridge the gap between simulations and quantum hardware execution. In this work, we introduce a machine learning approach to characterize the noise impacting a quantum chip and emulate it during simulations. Our algorithm leverages reinforcement learning (RL), offering increased flexibility in reproducing various noise models compared to conventional techniques such as randomized benchmarking or heuristic noise models. The effectiveness of the RL agent has been validated through simulations and testing on real superconducting qubits. Additionally, we provide practical use-case examples for the study of well-known quantum algorithms.

Translated: 2024-08-06 19:49:47 Published: 2024-08-02

# Blockchain Economic Denial of Sustainability Attack: Exploiting Latency Optimization in Ethereum Transaction Forwarding

Blockchain Economic Denial of Sustainability Attack: Exploiting Latency Optimization in Ethereum Transaction Forwarding ( http://arxiv.org/abs/2408.01508v1 )

License: see link

Taro Tsuchiya, Liyi Zhou, Kaihua Qin, Arthur Gervais, Nicolas Christin,

Strategies related to the blockchain concept of Extractable Value (MEV/BEV), such as arbitrage, front-running, or back-running, create an economic incentive for network nodes to reduce latency, including by minimizing transaction validation time, a core feature for securing blockchain networks. A modified node that neglects to filter invalid transactions in the Ethereum P2P network introduces novel attack vectors. In this work, we formalize and evaluate a Blockchain Economic Denial of Sustainability (EDoS) attack, which can cause financial losses in traffic costs for operators of modified nodes. We 1) mathematically define the attack model, 2) identify thousands of empirical instances of a similar attack in the wild, 3) empirically measure the model parameters from our two monitoring nodes, and 4) conduct attack simulations on a local network to compare its performance with existing Denial-of-Service attacks. We show that an attacker can amplify network traffic at modified nodes by a factor of 3,600 and cause economic damages 13,800 times greater than the amount needed to carry out the attack. Despite these risks, aggressive latency reduction may still be profitable enough to justify the existence of modified nodes. To assess this trade-off, we 1) simulate the transaction validation process in a local network and 2) empirically measure the latency reduction by deploying our modified node in the Ethereum testnet. We conclude with a cost-benefit analysis of skipping validation and provide mitigation strategies against this attack.

Translated: 2024-08-06 19:49:47 Published: 2024-08-02

# Adaptive Planning with Generative Models under Uncertainty

Adaptive Planning with Generative Models under Uncertainty ( http://arxiv.org/abs/2408.01510v1 )

License: see link

Pascal Jutras-Dubé, Ruqi Zhang, Aniket Bera,

Planning with generative models has emerged as an effective decision-making paradigm across a wide range of domains, including reinforcement learning and autonomous navigation. While continuous replanning at each timestep might seem intuitive because it allows decisions to be made based on the most recent environmental observations, it results in substantial computational challenges, primarily due to the complexity of the generative model's underlying deep learning architecture. Our work addresses this challenge by introducing a simple adaptive planning policy that leverages the generative model's ability to predict long-horizon state trajectories, enabling the execution of multiple actions consecutively without the need for immediate replanning. We propose to use the predictive uncertainty derived from a Deep Ensemble of inverse dynamics models to dynamically adjust the intervals between planning sessions. In our experiments conducted on locomotion tasks within the OpenAI Gym framework, we demonstrate that our adaptive planning policy allows for a reduction in replanning frequency to only about 10% of the steps without compromising the performance. Our results underscore the potential of generative modeling as an efficient and effective tool for decision-making.

Translated: 2024-08-06 19:49:47 Published: 2024-08-02

# Relation of curvature and torsion of weighted graph states with graph properties and its studies on a quantum computer

Relation of curvature and torsion of weighted graph states with graph properties and its studies on a quantum computer ( http://arxiv.org/abs/2408.01511v1 )

License: see link

Kh. P. Gnatenko,

Quantum states of spin systems that can be represented with weighted graphs $G(V, E)$ are studied. The geometrical characteristics of these states are examined. We find that the velocity of quantum evolution is determined by the sum of the weighted degrees of the nodes in the graph constructed by squaring the weights in $G(V, E)$. The curvature depends on the sums of the weighted degrees of nodes in the graphs constructed by raising the weights in $G(V, E)$ to the second and fourth powers. It also depends on the sum of the products of the weights of edges forming squares in graph $G(V, E)$. The torsion is additionally related to the sum of the products of the weights of edges in graph $G(V, E)$ forming triangles $S_3$. The geometric properties of quantum graph states and the sum of the weighted degrees of nodes have been calculated with quantum programming on IBM's quantum computer for the case of a spin chain.
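
The graph quantities named above are straightforward to compute classically from a symmetric weight matrix $W$ with zero diagonal: the weighted degree of node $i$ after raising weights to the power $p$ is $\sum_j w_{ij}^p$, and the square and triangle terms are sums of weight products around 4-cycles and 3-cycles. This sketch computes only those graph quantities; the prefactors that turn them into the velocity, curvature, and torsion are not reproduced here.

```python
import numpy as np
from itertools import combinations

def weighted_degree_sum(W, power):
    # sum over nodes i of sum_j w_ij**power (each edge counted from both ends)
    return float(np.sum(W ** power))

def triangle_weight_sum(W):
    # sum over triangles {i, j, k} of w_ij * w_jk * w_ki
    n = len(W)
    return float(sum(W[i, j] * W[j, k] * W[k, i] for i, j, k in combinations(range(n), 3)))

def square_weight_sum(W):
    # sum over 4-cycles of the product of their four edge weights;
    # each set of 4 nodes supports exactly 3 distinct 4-cycles
    total = 0.0
    for a, b, c, d in combinations(range(len(W)), 4):
        for p, q, r, s in ((a, b, c, d), (a, b, d, c), (a, c, b, d)):
            total += W[p, q] * W[q, r] * W[r, s] * W[s, p]
    return float(total)
```

For an open spin chain (a path graph), both the triangle and the square sums vanish, so only the weighted-degree terms remain.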

Translated: 2024-08-06 19:49:47 Published: 2024-08-02

# Gibbs Sampling gives Quantum Advantage at Constant Temperatures with $O(1)$-Local Hamiltonians

Gibbs Sampling gives Quantum Advantage at Constant Temperatures with $O(1)$-Local Hamiltonians ( http://arxiv.org/abs/2408.01516v1 )

License: see link

Joel Rajakumar, James D. Watson,

Sampling from Gibbs states (states corresponding to a system in thermal equilibrium) has recently been shown to be a task for which quantum computers are expected to achieve a super-polynomial speed-up compared to classical computers, provided the locality of the Hamiltonian increases with the system size (Bergamaschi et al., arXiv: 2404.14639). We extend these results to show that this quantum advantage still occurs for Gibbs states of Hamiltonians with $O(1)$-local interactions at constant temperature, by showing classical hardness of sampling and demonstrating that such Gibbs states can be prepared efficiently using a quantum computer. In particular, we show that hardness of sampling is maintained even for 5-local Hamiltonians on a 3D lattice. We additionally show that the hardness of sampling is robust when we are only able to make imperfect measurements. Beyond these hardness results, we present a lower bound, in terms of the maximum degree of the Hamiltonian's interaction graph, on the temperatures at which Gibbs states become easy to sample from classically.
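
For intuition about the object being sampled, the Gibbs state at inverse temperature $\beta$ is $\rho_\beta = e^{-\beta H}/\operatorname{tr} e^{-\beta H}$, and sampling means measuring it, here in the computational basis. The sketch below builds $\rho_\beta$ by exact diagonalization, which is only feasible for a handful of qubits because the matrix dimension is $2^n$; it illustrates the definition and is not a method from the paper.

```python
import numpy as np

def gibbs_state(H, beta):
    """rho = exp(-beta H) / Z via exact diagonalisation of a Hermitian H."""
    evals, evecs = np.linalg.eigh(H)
    weights = np.exp(-beta * (evals - evals.min()))  # shift by min for numerical stability
    weights /= weights.sum()
    # sum_k w_k |v_k><v_k|
    return (evecs * weights) @ evecs.conj().T

def sample_bitstrings(rho, shots, seed=0):
    # computational-basis measurement: outcome probabilities are the diagonal of rho
    probs = np.clip(np.real(np.diag(rho)), 0.0, None)
    probs /= probs.sum()
    n_qubits = int(np.log2(len(probs)))
    idx = np.random.default_rng(seed).choice(len(probs), size=shots, p=probs)
    return [format(int(i), f"0{n_qubits}b") for i in idx]
```

For example, a two-qubit transverse-field Ising term `np.kron(Z, Z) + 0.5 * (np.kron(X, I) + np.kron(I, X))` can be passed to `gibbs_state` and then sampled with `sample_bitstrings`.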

Translated: 2024-08-06 19:49:47 Published: 2024-08-02

# Gradient flow in parameter space is equivalent to linear interpolation in output space

Gradient flow in parameter space is equivalent to linear interpolation in output space ( http://arxiv.org/abs/2408.01517v1 )

License: see link

Thomas Chen, Patrícia Muñoz Ewald,

We prove that the usual gradient flow in parameter space that underlies many training algorithms for neural networks in deep learning can be continuously deformed into an adapted gradient flow which yields (constrained) Euclidean gradient flow in output space. Moreover, if the Jacobian of the outputs with respect to the parameters is full rank (for fixed training data), then the time variable can be reparametrized so that the resulting flow is simply linear interpolation, and a global minimum can be achieved.
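
Why a time reparametrization turns Euclidean gradient flow into linear interpolation can be seen in the simplest case. Assume, for illustration only, a squared-error loss in output space, $\mathcal{L}(x) = \tfrac12\|x - y\|^2$ with target $y$, and assume the adapted flow realizes unconstrained Euclidean gradient flow on the outputs $x$ (the full-rank Jacobian condition in the abstract):

$$
\dot x(t) = -\nabla_x \mathcal{L}\big(x(t)\big) = -\big(x(t) - y\big)
\quad\Longrightarrow\quad
x(t) = y + e^{-t}\,(x_0 - y).
$$

Substituting the new time variable $s = 1 - e^{-t} \in [0, 1)$ gives

$$
x(s) = y + (1 - s)(x_0 - y) = (1 - s)\,x_0 + s\,y,
$$

which is the straight line from the initial outputs $x_0$ to the target $y$, reaching the global minimum $x = y$ as $s \to 1$ (that is, $t \to \infty$).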

Translated: 2024-08-06 19:49:47 Published: 2024-08-02

# Multi-Unit Floor Plan Recognition and Reconstruction Using Improved Semantic Segmentation of Raster-Wise Floor Plans

Multi-Unit Floor Plan Recognition and Reconstruction Using Improved Semantic Segmentation of Raster-Wise Floor Plans ( http://arxiv.org/abs/2408.01526v1 )

License: see link

Lukas Kratochvila, Gijs de Jong, Monique Arkesteijn, Simon Bilik, Tomas Zemcik, Karel Horak, Jan S. Rellermeyer,

Digital twins have major potential to become a significant part of urban management in emergency planning, as they allow more efficient design of escape routes, better orientation in exceptional situations, and faster rescue intervention. Nevertheless, creating the twins remains a largely manual effort due to a lack of 3D representations, which are available only in limited amounts for some new buildings. Thus, in this paper we aim to synthesize 3D information from commonly available 2D architectural floor plans. We propose two novel pixel-wise segmentation methods based on the MDA-Unet and MACU-Net architectures, with improved skip connections, an attention mechanism, and a training objective, together with a reconstruction stage of the pipeline that vectorizes the segmented plans to create a 3D model. The proposed methods are compared with two other state-of-the-art techniques on several benchmark datasets. On the commonly used CubiCasa benchmark dataset, our methods achieved a mean F1 score of 0.86 over five examined classes, outperforming the other pixel-wise approaches tested. We have also made our code publicly available to support research in the field.
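
The reported metric, a mean F1 score over five classes, can be computed from pixel-wise label maps as below. This is a minimal sketch: whether the paper pools pixels over the whole dataset or averages per image, and how it scores a class absent from both prediction and ground truth (scored 1.0 here), are our assumptions.

```python
import numpy as np

def mean_f1(pred, target, classes):
    """Macro-averaged pixel-wise F1 over the given class ids."""
    scores = []
    for c in classes:
        p, t = pred == c, target == c
        tp = np.sum(p & t)
        fp = np.sum(p & ~t)
        fn = np.sum(~p & t)
        denom = 2 * tp + fp + fn          # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 1.0)
    return float(np.mean(scores))
```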

Translated: 2024-08-06 19:49:47 Published: 2024-08-02

# Analyzing LLMs' Capabilities to Establish Implicit User Sentiment of Software Desirability

Analyzing LLMs' Capabilities to Establish Implicit User Sentiment of Software Desirability ( http://arxiv.org/abs/2408.01527v1 )

License: see link

Sherri Weitl-Harms, John D. Hastings, Jonah Lum,

This study explores the use of several LLMs to provide quantitative zero-shot sentiment analysis of implicit software desirability expressed by users. Unlike other methods that simply classify sentiment as positive, neutral, or negative, the study provides scaled numerical sentiment analysis. Numerical analysis provides deeper insight into the magnitude of sentiment, to drive better decisions regarding product desirability. Data is collected using the Microsoft Product Desirability Toolkit (PDT), a well-known qualitative user experience analysis tool. For initial exploration, the PDT metric was given to users of ZORQ, a gamification system used in undergraduate computer science education. The collected PDT data was fed, for quantitative sentiment analysis, through several LLMs (Claude Sonnet 3 and 3.5, GPT4, and GPT4o), through a leading transfer learning technique, Twitter-Roberta-Base-Sentiment (TRBS), and through Vader, a leading sentiment analysis tool. Each system was asked to evaluate the data in two ways: first, by looking at the sentiment expressed in the PDT word/explanation pairs; and second, by looking at the sentiment expressed by the users in their grouped selection of five words and explanations, as a whole. Each LLM was also asked to provide its confidence (low, medium, high) in its sentiment score, along with an explanation of why it selected the sentiment value. All LLMs tested were able to statistically detect user sentiment from the users' grouped data, whereas TRBS and Vader were not. The confidence and explanation of confidence provided by the LLMs assisted in understanding the user sentiment. This study adds to a deeper understanding of evaluating user experiences, toward the goal of creating a universal tool that quantifies expressed implicit sentiment.

Translated: 2024-08-06 19:49:47 Published: 2024-08-02

# Can multivariate Granger causality detect directed connectivity of a multistable and dynamic biological decision network model?

Can multivariate Granger causality detect directed connectivity of a multistable and dynamic biological decision network model? ( http://arxiv.org/abs/2408.01528v1 )

License: see link

Abdoreza Asadpour, KongFatt Wong-Lin,

Extracting causal connections can advance interpretable AI and machine learning. Granger causality (GC) is a robust statistical method for estimating directed connectivity (DC), i.e., directed influences between signals. While GC has been widely applied to analysing neuronal signals in biological neural networks and other domains, its application to complex, nonlinear, and multistable neural networks is less explored. In this study, we applied time-domain multivariate Granger causality (MVGC) to the time series of neural activity of all nodes in a trained, multistable, biologically based decision neural network model with real-time decision uncertainty monitoring. Our analysis demonstrated that challenging two-choice decisions, where input signals could be closely matched, together with the appropriate application of fine-grained sliding time windows, could readily reveal the original model's DC. Furthermore, the identified DC varied depending on whether the network made correct or erroneous decisions. Integrating the DC identified from different decision outcomes recovered most of the original model's architecture, despite some spurious and missing connectivity. This approach could be used as an initial exploration to enhance the interpretability and transparency of dynamic, multistable, and nonlinear biological or AI systems by revealing causal connections throughout different phases of neural network dynamics and outcomes.
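
Time-domain conditional Granger causality from a source channel $j$ to a target channel $i$ compares two linear autoregressive predictions of $i$: one from the past of all channels ("full") and one from the past of all channels except $j$ ("reduced"); the statistic is $F_{j\to i} = \ln(\sigma^2_{\text{reduced}}/\sigma^2_{\text{full}})$. The sketch below implements this with least squares and applies it over sliding windows. It is a simplified illustration, not the MVGC toolbox: the model order and window sizes are arbitrary, and no significance testing is done.

```python
import numpy as np

def _residual_variance(target, regressors):
    # ordinary least squares with an intercept; returns the residual variance
    X = np.column_stack([np.ones(len(target))] + regressors)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return float((target - X @ coef).var())

def conditional_granger(data, source, target, order=2):
    """data: (T, n) time series. Returns ln(var_reduced / var_full);
    values clearly above 0 suggest `source` helps predict `target`."""
    T, n = data.shape
    y = data[order:, target]
    # lag k of channel j, aligned with y: values at times t - k for t in [order, T)
    lags = {j: [data[order - k:T - k, j] for k in range(1, order + 1)] for j in range(n)}
    full = [col for j in range(n) for col in lags[j]]
    reduced = [col for j in range(n) if j != source for col in lags[j]]
    return float(np.log(_residual_variance(y, reduced) / _residual_variance(y, full)))

def sliding_granger(data, source, target, window, step, order=2):
    # GC estimate per sliding time window, to track how DC changes over a trial
    return [conditional_granger(data[s:s + window], source, target, order)
            for s in range(0, len(data) - window + 1, step)]
```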

Translated: 2024-08-06 19:40:03 Published: 2024-08-02

# A Structured Framework for Predicting Sustainable Aviation Fuel Properties using Liquid-Phase FTIR and Machine Learning

A Structured Framework for Predicting Sustainable Aviation Fuel Properties using Liquid-Phase FTIR and Machine Learning ( http://arxiv.org/abs/2408.01530v1 )

License: see link

Ana E. Comesana, Sharon S. Chen, Kyle E. Niemeyer, Vi H. Rapp,

Sustainable aviation fuels have the potential to reduce emissions and environmental impact. To help identify viable sustainable aviation fuels and accelerate research, several machine learning models have been developed to predict relevant physicochemical properties. However, many of these models have limited applicability, leverage data from complex analytical techniques with confined spectral ranges, or use feature decomposition methods that have limited interpretability. Using liquid-phase Fourier Transform Infrared (FTIR) spectra, this study presents a structured method for creating accurate and interpretable property prediction models for neat molecules, aviation fuels, and blends. Liquid-phase FTIR spectra can be collected quickly and consistently, offering high reliability, sensitivity, and component specificity while using less than 2 mL of sample. The method first decomposes FTIR spectra into fundamental building blocks using Non-negative Matrix Factorization (NMF) to enable scientific analysis of FTIR spectra attributes and fuel properties. The NMF features are then used to create five ensemble models for predicting final boiling point, flash point, freezing point, density at 15 °C, and kinematic viscosity at -20 °C. All models were trained using experimental property data from neat molecules, aviation fuels, and blends. The models accurately predict properties while enabling interpretation of the relationships between a fuel's compositional elements, such as functional groups or chemical classes, and its properties. To support sustainable aviation fuel research and development, the models and data are available on an interactive web tool.
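
The NMF step can be illustrated with a small stdlib-only sketch. It assumes the spectra are stored as rows of a nonnegative matrix (samples x wavenumbers) and uses the classic Lee-Seung multiplicative updates. The abstract does not state the study's solver, number of components, or preprocessing, and the Gaussian-band data below is made up.

```python
import math
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    bt = list(zip(*b))
    return [[sum(u * v for u, v in zip(row, col)) for col in bt] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, k, iters=500, seed=0, eps=1e-12):
    """Factor a nonnegative matrix v (samples x wavenumbers) as w @ h with w, h >= 0.

    Uses Lee-Seung multiplicative updates for the Frobenius loss. Rows of h act as
    'building block' spectra; rows of w are per-sample weights usable as model features.
    """
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return w, h


def relative_error(v, w, h):
    """Frobenius norm of v - w @ h relative to the norm of v."""
    wh = matmul(w, h)
    diff = sum((a - b) ** 2 for ra, rb in zip(v, wh) for a, b in zip(ra, rb))
    return math.sqrt(diff / sum(a * a for row in v for a in row))


def band(center, width, n=40):
    """A Gaussian absorption band sampled at n points."""
    return [math.exp(-((i - center) / width) ** 2) for i in range(n)]


# Synthetic "spectra" (made up): nonnegative mixtures of two absorption bands.
bases = [band(10, 3), band(28, 4)]
rng = random.Random(1)
spectra = [[a * p + b * q for p, q in zip(*bases)]
           for a, b in ((rng.uniform(0.1, 1), rng.uniform(0.1, 1)) for _ in range(6))]
w, h = nmf(spectra, k=2)
print(f"relative reconstruction error: {relative_error(spectra, w, h):.4f}")
```

After fitting, the rows of `h` play the role of the "building block" spectra. Each row of `w` is a compact, nonnegative feature vector that a downstream regressor (in the study, the ensemble models) could consume.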

Translated: 2024-08-06 19:40:03 Published: 2024-08-02

# Contextual Cross-Modal Attention for Audio-Visual Deepfake Detection and Localization

Contextual Cross-Modal Attention for Audio-Visual Deepfake Detection and Localization ( http://arxiv.org/abs/2408.01532v1 )

License: see link

Vinaya Sree Katamneni, Ajita Rattani,

In the digital age, the emergence of deepfakes and synthetic media presents a significant threat to societal and political integrity. Deepfakes based on multi-modal manipulation, such as audio-visual manipulation, are more realistic and pose a greater threat. Current multi-modal deepfake detectors often rely on attention-based fusion of heterogeneous data streams from multiple modalities. However, the heterogeneous nature of the data (such as audio and visual signals) creates a distributional modality gap, which makes effective fusion, and hence multi-modal deepfake detection, significantly more difficult. In this paper, we propose a novel multi-modal attention framework based on recurrent neural networks (RNNs) that leverages contextual information for audio-visual deepfake detection. The proposed approach applies attention to multi-modal multi-sequence representations and learns which features among them contribute to deepfake detection and localization. Thorough experimental validation on the FakeAVCeleb, AV-Deepfake1M, TVIL, and LAV-DF audio-visual deepfake datasets demonstrates the efficacy of our approach. Cross-comparison with published studies shows that our approach improves accuracy by 3.47% in deepfake detection and precision by 2.05% in localization, achieving state-of-the-art performance. To facilitate reproducibility, the code and dataset information are available at https://github.com/vcbsl/audiovisual-deepfake/.

Translated: 2024-08-06 19:40:03 Published: 2024-08-02

# An Adaptive Tensor-Train Decomposition Approach for Efficient Deep Neural Network Compression

An Adaptive Tensor-Train Decomposition Approach for Efficient Deep Neural Network Compression ( http://arxiv.org/abs/2408.01534v1 )

License: see link

Shiyi Luo, Mingshuo Liu, Pu Sun, Yifeng Yu, Shangping Ren, Yu Bai,

In the field of model compression, choosing an appropriate rank for tensor decomposition is pivotal for balancing model compression rate and efficiency. However, this selection, whether done manually or through optimization-based automatic methods, often increases computational complexity. Manual rank selection lacks efficiency and scalability and often requires extensive trial and error, while optimization-based automatic methods significantly increase the computational burden. To address this, we introduce a novel, automatic, and budget-aware rank selection method for efficient model compression that employs Layer-Wise Imprinting Quantitation (LWIQ). LWIQ quantifies each layer's significance within a neural network by integrating a proxy classifier. This classifier assesses the layer's impact on overall model performance, allowing a more informed adjustment of tensor rank. Furthermore, our approach includes a scaling factor to accommodate varying computational budget constraints. This budget awareness eliminates the need to recalculate ranks for each budget scenario. Experimental results on the CIFAR-10 dataset show that, compared to the state-of-the-art proxy-based automatic tensor rank selection method, LWIQ improves rank search efficiency by 63.2%, and on the ResNet-56 model it loses only 0.86% accuracy with a 3.2x smaller model.
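
To show what the rank choice controls, the sketch below counts the parameters of a tensor-train (TT) factorization for given ranks. It assumes the layer's weights are reshaped into a 4-way tensor. The shape and ranks are hypothetical and are not ranks selected by LWIQ. The sketch only shows how ranks trade compression for capacity.

```python
from math import prod


def tt_param_count(shape, ranks):
    """Parameter count of a tensor-train (TT) decomposition.

    shape = [n_1, ..., n_d] are the tensor modes; ranks = [r_0, ..., r_d] with
    r_0 = r_d = 1. Core k has shape (r_{k-1}, n_k, r_k), so the total is
    sum_k r_{k-1} * n_k * r_k.
    """
    d = len(shape)
    if len(ranks) != d + 1 or ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("ranks must have length d + 1 with r_0 = r_d = 1")
    for k in range(1, d):
        # TT ranks are bounded by the rank of the k-th unfolding; larger values only add
        # redundant parameters, so this sketch rejects them.
        bound = min(prod(shape[:k]), prod(shape[k:]))
        if ranks[k] > bound:
            raise ValueError(f"r_{k} = {ranks[k]} exceeds its maximum {bound}")
    return sum(ranks[k] * shape[k] * ranks[k + 1] for k in range(d))


# Hypothetical example: a 3x3 convolution kernel with 64 input and 64 output channels.
shape = [3, 3, 64, 64]
ranks = [1, 3, 9, 16, 1]  # illustrative ranks, not ranks chosen by LWIQ
dense, tt = prod(shape), tt_param_count(shape, ranks)
print(dense, tt, round(dense / tt, 2))  # -> 36864 10330 3.57
```

The count grows linearly in each interior rank. The slope of that growth is set by the neighbouring ranks and mode sizes, which is why a per-layer, budget-aware rank choice has a direct effect on model size.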

Translated: 2024-08-06 19:40:03 Published: 2024-08-02

# Active Learning for Neural PDE Solvers

Active Learning for Neural PDE Solvers ( http://arxiv.org/abs/2408.01536v1 )

License: see link

Daniel Musekamp, Marimuthu Kalimuthu, David Holzmüller, Makoto Takamoto, Mathias Niepert,

Solving partial differential equations (PDEs) is a fundamental problem in engineering and science. While neural PDE solvers can be more efficient than established numerical solvers, they often require large amounts of training data that is costly to obtain. Active Learning (AL) could help surrogate models reach the same accuracy with smaller training sets by querying classical solvers with more informative initial conditions and PDE parameters. While AL is more common in other domains, it has yet to be studied extensively for neural PDE solvers. To bridge this gap, we introduce AL4PDE, a modular and extensible active learning benchmark. It provides multiple parametric PDEs and state-of-the-art surrogate models for the solver-in-the-loop setting, enabling the evaluation of existing and the development of new AL methods for PDE solving. We use the benchmark to evaluate batch active learning algorithms such as uncertainty- and feature-based methods. We show that AL reduces the average error by up to 71% compared to random sampling and significantly reduces worst-case errors. Moreover, AL generates similar datasets across repeated runs, with consistent distributions over the PDE parameters and initial conditions. The acquired datasets are reusable, providing benefits for surrogate models not involved in the data generation.
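
One common uncertainty-based acquisition rule can be sketched as follows. The sketch assumes an ensemble of surrogate models, each of which maps a candidate (a PDE parameter or initial condition) to a predicted solution vector. The candidates with the largest ensemble disagreement are sent to the classical solver. The toy ensemble and the function names are hypothetical and are not part of AL4PDE's API.

```python
from statistics import pvariance


def ensemble_uncertainty(ensemble, candidate):
    """Mean, over output entries, of the variance of predictions across ensemble members."""
    preds = [model(candidate) for model in ensemble]
    n_out = len(preds[0])
    return sum(pvariance([p[i] for p in preds]) for i in range(n_out)) / n_out


def select_batch(ensemble, pool, batch_size):
    """Greedy top-k selection: the candidates on which the ensemble disagrees most."""
    ranked = sorted(pool, key=lambda c: ensemble_uncertainty(ensemble, c), reverse=True)
    return ranked[:batch_size]


# Toy stand-in for trained surrogates (hypothetical): each member maps a scalar PDE
# parameter c to a predicted solution on a 5-point grid, with a member-specific bias.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
ensemble = [lambda c, m=m: [c * (1 + 0.1 * m) * g for g in grid] for m in range(5)]
pool = [0.5, -2.0, 1.0, 3.0, 0.1]
batch = select_batch(ensemble, pool, batch_size=2)
print(batch)  # -> [3.0, -2.0]  (candidates to send to the classical solver next)
```

In a full loop, the selected candidates would be simulated, added to the training set, and the ensemble retrained. Plain top-k selection can pick near-duplicate candidates, which is one reason batch methods often add a diversity or feature-space criterion.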

Translated: 2024-08-06 19:40:03 Published: 2024-08-02

# SceneMotion: From Agent-Centric Embeddings to Scene-Wide Forecasts

SceneMotion: From Agent-Centric Embeddings to Scene-Wide Forecasts ( http://arxiv.org/abs/2408.01537v1 )

License: see link

Royden Wagner, Ömer Sahin Tas, Marlon Steiner, Fabian Konstantinidis, Hendrik Königshof, Marvin Klemp, Carlos Fernandez, Christoph Stiller,

Self-driving vehicles rely on multimodal motion forecasts to effectively interact with their environment and plan safe maneuvers. We introduce SceneMotion, an attention-based model for forecasting scene-wide motion modes of multiple traffic agents. Our model transforms local agent-centric embeddings into scene-wide forecasts using a novel latent context module. This module learns a scene-wide latent space from multiple agent-centric embeddings, enabling joint forecasting and interaction modeling. The competitive performance in the Waymo Open Interaction Prediction Challenge demonstrates the effectiveness of our approach. Moreover, we cluster future waypoints in time and space to quantify the interaction between agents. We merge all modes and analyze each mode independently to determine which clusters are resolved through interaction or result in conflict. Our implementation is available at: https://github.com/kit-mrt/future-motion

Translated: 2024-08-06 19:40:03 Published: 2024-08-02

# 3D $\mathcal{N}=1$ Supergravity from Virasoro TQFT: Gravitational partition function and Out-of-time-order correlator

3D $\mathcal{N}=1$ supergravity from Virasoro TQFT: Gravitational partition function and Out-of-time-order correlator ( http://arxiv.org/abs/2408.01538v1 )

License: see link

Arpan Bhattacharyya, Saptaswa Ghosh, Poulami Nandi, Sounak Pal,

In this paper, we compute the partition functions of $\mathcal{N}=1$ SUGRA for different boundary topologies, namely the sphere and the torus, using super-Virasoro TQFT. We use the fusion and modular kernels of super-Liouville theory to compute the necklace-channel conformal block, and we showcase the formalism by proving that the inner product holds for superconformal blocks defined as states in the Hilbert space. Finally, using the tools of super-Virasoro TQFT, we compute the out-of-time-order correlator for the torus topology with superconformal primary insertions as matter and investigate its early-time behaviour.

Translated: 2024-08-06 19:40:03 Published: 2024-08-02

# Guardians of Image Quality: Benchmarking Defenses Against Adversarial Attacks on Image Quality Metrics

Guardians of Image Quality: Benchmarking Defenses Against Adversarial Attacks on Image Quality Metrics ( http://arxiv.org/abs/2408.01541v1 )

License: see link

Alexander Gushchin, Khaled Abud, Georgii Bychkov, Ekaterina Shumitskaya, Anna Chistyakova, Sergey Lavrushkin, Bader Rasheed, Kirill Malyshev, Dmitriy Vatolin, Anastasia Antsiferova,

In the field of Image Quality Assessment (IQA), the adversarial robustness of the metrics poses a critical concern. This paper presents a comprehensive benchmarking study of various defense mechanisms in response to the rise in adversarial attacks on IQA. We systematically evaluate 25 defense strategies, including adversarial purification, adversarial training, and certified robustness methods. We applied 14 adversarial attack algorithms of various types in both non-adaptive and adaptive settings and tested these defenses against them. We analyze the differences between defenses and their applicability to IQA tasks, considering that they should preserve IQA scores and image quality. The proposed benchmark aims to guide future developments and accepts submissions of new methods, with the latest results available online: https://videoprocessing.ai/benchmarks/iqa-defenses.html.

Translated: 2024-08-06 19:40:03 Published: 2024-08-02

# Non-linear Analysis Based ECG Classification of Cardiovascular Disorders

Non-linear Analysis Based ECG Classification of Cardiovascular Disorders ( http://arxiv.org/abs/2408.01542v1 )

License: see link

Suraj Kumar Behera, Debanjali Bhattacharya, Ninad Aithal, Neelam Sinha,

Multi-channel ECG-based cardiac disorders detection has an impact on cardiac care and treatment. Limitations of existing methods included variation in ECG waveforms due to the location of electrodes, high non-linearity in the signal, and amplitude measurement in millivolts. The present study reports a non-linear analysis-based methodology that utilizes Recurrence plot visualization. The patterned occurrence of well-defined structures, such as the QRS complex, can be exploited effectively using Recurrence plots. This Recurrence-based method is applied to the publicly available Physikalisch-Technische Bundesanstalt (PTB) dataset from PhysioNet database, where we studied four classes of different cardiac disorders (Myocardial infarction, Bundle branch blocks, Cardiomyopathy, and Dysrhythmia) and healthy controls, achieving an impressive classification accuracy of 100%. Additionally, t-SNE plot visualizations of the latent space embeddings derived from Recurrence plots and Recurrence Quantification Analysis features reveal a clear demarcation between the considered cardiac disorders and healthy individuals, demonstrating the potential of this approach.
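
A recurrence plot can be computed in a few lines. The sketch below assumes a single-channel signal, a time-delay embedding with hypothetical parameters `dim` and `delay`, a Euclidean distance, and a fixed threshold `eps`. The abstract does not specify the study's embedding, threshold, or multi-lead handling. The sketch also computes the recurrence rate, the simplest Recurrence Quantification Analysis (RQA) feature.

```python
import math


def delay_embed(signal, dim=2, delay=1):
    """Time-delay embedding: state t is (x_t, x_{t+delay}, ..., x_{t+(dim-1)*delay})."""
    n = len(signal) - (dim - 1) * delay
    return [tuple(signal[t + k * delay] for k in range(dim)) for t in range(n)]


def recurrence_plot(states, eps):
    """Binary recurrence matrix: R[i][j] = 1 if states i and j lie within eps of each other."""
    return [[1 if math.dist(a, b) < eps else 0 for b in states] for a in states]


def recurrence_rate(r):
    """Fraction of recurrent pairs, the simplest RQA measure."""
    n = len(r)
    return sum(map(sum, r)) / (n * n)


# Toy periodic signal (period of 10 samples) standing in for one ECG lead.
signal = [math.sin(2 * math.pi * t / 10) for t in range(50)]
r = recurrence_plot(delay_embed(signal, dim=2, delay=2), eps=0.1)
print(f"recurrence rate: {recurrence_rate(r):.3f}")
```

For a periodic signal, recurrent states form diagonal lines spaced one period apart. Repeated waveforms such as QRS complexes in a real ECG produce similar structure, which is the kind of pattern the abstract says recurrence plots exploit.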

Translated: 2024-08-06 19:40:03 Published: 2024-08-02

# Momentum Capture and Prediction System Based on Wimbledon Open2023 Tournament Data

Momentum Capture and Prediction System Based on Wimbledon Open2023 Tournament Data ( http://arxiv.org/abs/2408.01544v1 )

License: see link

Chang Liu, Tongyuan Yang, Yan Zhao,

There is a hidden energy in tennis that cannot be seen or touched. It is the force that controls the flow of the game, and it is present in all types of matches. This mysterious force is momentum. This study introduces an evaluation model that combines the Entropy Weight Method (EWM) and Grey Relational Analysis (GRA) to quantify momentum's impact on match outcomes. Empirical validation was conducted through Mann-Whitney U and Kolmogorov-Smirnov tests, which yielded p-values of 0.0043 and 0.00128, respectively. These results underscore the non-random association between momentum shifts and match outcomes, highlighting the critical role of momentum in tennis. In addition, our investigation focuses on creating a predictive model that combines the XGBoost machine learning algorithm with the SHAP framework. This model predicts match swings with high accuracy (0.999013 for multiple matches and 0.992738 for the finals). The model's ability to identify the influence of specific factors on match dynamics, such as the distance run by both players during points, demonstrates its strength. The model's generalizability was thoroughly evaluated using datasets from the four Grand Slam tournaments. The results demonstrate its adaptability to different match scenarios, despite minor variations in predictive accuracy. It offers strategic insights that can help players respond effectively to opponents' shifts in momentum, enhancing their competitive edge.
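
The combination of EWM and GRA can be sketched as follows. The sketch assumes a matrix whose rows are match segments and whose columns are benefit-type indicators (larger is better). EWM derives indicator weights from how much each column varies. GRA then scores each row by its weighted closeness to an ideal row. The indicators, the data, and the distinguishing coefficient `rho = 0.5` are assumptions for illustration and do not reproduce the study's momentum model.

```python
import math


def minmax_normalize(matrix):
    """Scale each indicator column to [0, 1]; every indicator is assumed benefit-type."""
    cols = list(zip(*matrix))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(x - l) / (h - l) if h > l else 1.0 for x, l, h in zip(row, lo, hi)]
            for row in matrix]


def entropy_weights(norm):
    """Entropy Weight Method: indicators whose values vary more get larger weights."""
    n = len(norm)
    k = 1.0 / math.log(n)
    divergences = []
    for col in zip(*norm):
        total = sum(col)
        p = [x / total for x in col] if total > 0 else [1.0 / n] * n
        entropy = -k * sum(pi * math.log(pi) for pi in p if pi > 0)
        divergences.append(1.0 - entropy)
    s = sum(divergences)
    if s == 0:
        return [1.0 / len(divergences)] * len(divergences)
    return [d / s for d in divergences]


def grey_relational_scores(norm, weights, rho=0.5):
    """Weighted grey relational grade of each row against the ideal row (all ones)."""
    deltas = [[1.0 - x for x in row] for row in norm]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:
        return [1.0] * len(norm)  # every row already equals the ideal row
    return [sum(w * (d_min + rho * d_max) / (d + rho * d_max) for w, d in zip(weights, row))
            for row in deltas]


# Made-up data: rows are match segments, columns are hypothetical benefit-type indicators
# (e.g. consecutive points won, first-serve win rate, break points saved).
raw = [[3, 0.70, 2], [1, 0.55, 0], [4, 0.80, 3], [2, 0.60, 1]]
norm = minmax_normalize(raw)
weights = entropy_weights(norm)
scores = grey_relational_scores(norm, weights)
print([round(s, 3) for s in scores])
```

Before normalization, cost-type indicators (for example distance run, if more is worse) would first need to be flipped, e.g. replaced by `max - x`.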

Translated: 2024-08-06 19:40:03 Published: 2024-08-02

# Operator space fragmentation in perturbed Floquet-Clifford circuits

Operator space fragmentation in perturbed Floquet-Clifford circuits ( http://arxiv.org/abs/2408.01545v1 )

License: see link

Marcell D. Kovács, Christopher J. Turner, Lluis Masanes, Arijeet Pal,

Floquet quantum circuits are able to realise a wide range of non-equilibrium quantum states, exhibiting quantum chaos, topological order and localisation. In this work, we investigate the stability of operator localisation and emergence of chaos in random Floquet-Clifford circuits subjected to unitary perturbations which drive them away from the Clifford limit. We construct a nearest-neighbour Clifford circuit with a brickwork pattern and study the effect of including disordered non-Clifford gates. The perturbations are uniformly sampled from single-qubit unitaries with probability $p$ on each qubit. We show that the interacting model exhibits strong localisation of operators for $0 \le p < 1$ that is characterised by the fragmentation of operator space into disjoint sectors due to the appearance of wall configurations. Such walls give rise to emergent local integrals of motion for the circuit that we construct exactly. We analytically establish the stability of localisation against generic perturbations and calculate the average length of operator spreading tunable by $p$. Although our circuit is not separable across any bi-partition, we further show that the operator localisation leads to an entanglement bottleneck, where initially unentangled states remain weakly entangled across typical fragment boundaries. Finally, we study the spectral form factor (SFF) to characterise the chaotic properties of the operator fragments and spectral fluctuations as a probe of non-ergodicity. In the $p = 1$ model, the emergence of a fragmentation time scale is found before random matrix theory sets in after which the SFF can be approximated by that of the circular unitary ensemble. Our work provides an explicit description of quantum phases in operator dynamics and circuit ergodicity which can be realised on current NISQ devices.

Translated: 2024-08-06 19:40:03 Published: 2024-08-02

# Trainable Pointwise Decoder Module for Point Cloud Segmentation

Trainable Pointwise Decoder Module for Point Cloud Segmentation ( http://arxiv.org/abs/2408.01548v1 )

License: see link

Bike Chen, Chen Gong, Antti Tikanmäki, Juha Röning,

Point cloud segmentation (PCS) aims to make per-point predictions, enabling robots and autonomous cars to understand their environment. The range image is a dense representation of a large-scale outdoor point cloud, and segmentation models built on it typically run efficiently. However, projecting the point cloud onto the range image inevitably drops points, because only one point is kept at each image coordinate even when multiple points project onto the same location. More importantly, it is challenging to assign correct predictions to dropped points that belong to a class different from that of the kept point. Moreover, existing post-processing methods, such as K-nearest neighbor (KNN) search and kernel point convolution (KPConv), either cannot be trained end-to-end with the models or cannot handle varying-density outdoor point clouds well, which leaves the models with sub-optimal performance. To alleviate this problem, we propose a trainable pointwise decoder module (PDM) as the post-processing approach; it gathers weighted features from the neighbors and then makes the final prediction for the query point. In addition, we introduce a virtual range image-guided copy-rotate-paste (VRCrop) strategy for data augmentation. VRCrop constrains the total number of points and eliminates undesirable artifacts in the augmented point cloud. With PDM and VRCrop, existing range image-based segmentation models consistently outperform their counterparts on the SemanticKITTI, SemanticPOSS, and nuScenes datasets.
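
The point dropping described above can be reproduced with a sketch of the spherical projection that range-image segmentation models commonly use. The sketch assumes a hypothetical 64 x 16 image and a LiDAR vertical field of view from +3 to -25 degrees. It keeps the closest point per pixel, although implementations differ in which point they keep. It illustrates the problem that PDM post-processes and is not PDM itself.

```python
import math


def project_to_range_image(points, width=64, height=16, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Spherical projection of (x, y, z) points (none at the origin) onto a range image.

    Returns (image, kept, dropped): image[v][u] holds the index of the point kept at that
    pixel (the closest one) or -1; dropped lists indices of points that lost their pixel.
    """
    fov_up, fov_down = math.radians(fov_up_deg), math.radians(fov_down_deg)
    fov = fov_up - fov_down
    image = [[-1] * width for _ in range(height)]
    depth = [[math.inf] * width for _ in range(height)]
    dropped = []
    for idx, (x, y, z) in enumerate(points):
        r = math.sqrt(x * x + y * y + z * z)
        yaw, pitch = math.atan2(y, x), math.asin(z / r)
        u = int(0.5 * (1.0 - yaw / math.pi) * width)         # column from azimuth
        v = int((1.0 - (pitch - fov_down) / fov) * height)   # row from elevation
        u, v = min(max(u, 0), width - 1), min(max(v, 0), height - 1)
        if r < depth[v][u]:
            if image[v][u] != -1:
                dropped.append(image[v][u])  # the previously kept, farther point is dropped
            image[v][u], depth[v][u] = idx, r
        else:
            dropped.append(idx)
    kept = sorted(i for row in image for i in row if i != -1)
    return image, kept, dropped


# Two points on the same ray collide in one pixel; the third lands elsewhere.
pts = [(10.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
image, kept, dropped = project_to_range_image(pts)
print("kept:", kept, "dropped:", dropped)  # -> kept: [1, 2] dropped: [0]
```

Every index in `dropped` is a point that a range-image model never sees directly. Its label must be recovered afterwards, for example by a KNN vote over kept neighbours or, in this paper, by the trainable PDM.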

Translated: 2024-08-06 19:40:03 Published: 2024-08-02

# Multi-task SAR Image Processing via GAN-based Unsupervised Manipulation

Multi-task SAR Image Processing via GAN-based Unsupervised Manipulation ( http://arxiv.org/abs/2408.01553v1 )

License: see link

Xuran Hu, Mingzhe Zhu, Ziqiang Xu, Zhenpeng Feng, Ljubisa Stankovic,

Generative Adversarial Networks (GANs) have shown tremendous potential in synthesizing a large number of realistic SAR images by learning patterns in the data distribution. Some GANs can achieve image editing by introducing latent codes, demonstrating significant promise in SAR image processing. Compared to traditional SAR image processing methods, editing based on GAN latent space control is entirely unsupervised, allowing image processing to be conducted without any labeled data. Additionally, the information extracted from the data is more interpretable. This paper proposes a novel SAR image processing framework called GAN-based Unsupervised Editing (GUE), aiming to address the following two issues: (1) disentangling semantic directions in the GAN latent space and finding meaningful directions; (2) establishing a comprehensive SAR image processing framework while achieving multiple image processing functions. In the implementation of GUE, we decompose the entangled semantic directions in the GAN latent space by training a carefully designed network. Moreover, we can accomplish multiple SAR image processing tasks (including despeckling, localization, auxiliary identification, and rotation editing) in a single training process without any form of supervision. Extensive experiments validate the effectiveness of the proposed method.

Translated: 2024-08-06 19:40:03 Published: 2024-08-02

# Robot-Enabled Machine Learning-Based Diagnosis of Gastric Cancer Polyps Using Partial Surface Tactile Imaging

Robot-Enabled Machine Learning-Based Diagnosis of Gastric Cancer Polyps Using Partial Surface Tactile Imaging ( http://arxiv.org/abs/2408.01554v1 )

License: see link

Siddhartha Kapuria, Jeff Bonyun, Yash Kulkarni, Naruhiko Ikoma, Sandeep Chinchali, Farshid Alambeigi,

In this paper, to collectively address the existing limitations of endoscopic diagnosis of Advanced Gastric Cancer (AGC) tumors, we propose for the first time (i) the utilization and evaluation of our recently developed Vision-based Tactile Sensor (VTS), and (ii) a complementary Machine Learning (ML) algorithm for classifying tumors using their textural features. Leveraging a seven-DoF robotic manipulator and unique, custom-designed, additively manufactured realistic AGC tumor phantoms, we demonstrated the advantages of automated data collection using the VTS, which addresses the data scarcity and biases encountered in traditional ML-based approaches. Our synthetic-data-trained ML model was successfully evaluated and compared with traditional ML models using various statistical metrics, even under mixed morphological characteristics and partial sensor contact.

Translated: 2024-08-06 19:40:03 Published: 2024-08-02

# Enhanced Knee Kinematics: Leveraging Deep Learning and Morphing Algorithms for 3D Implant Modeling

Enhanced Knee Kinematics: Leveraging Deep Learning and Morphing Algorithms for 3D Implant Modeling ( http://arxiv.org/abs/2408.01557v1 )

License: see link

Viet-Dung Nguyen, Michael T. LaCour, Richard D. Komistek,

Accurate reconstruction of implanted knee models is crucial in orthopedic surgery and biomedical engineering, enhancing preoperative planning, optimizing implant design, and improving surgical outcomes. Traditional methods rely on labor-intensive and error-prone manual segmentation. This study proposes a novel approach using machine learning (ML) algorithms and morphing techniques for precise 3D reconstruction of implanted knee models. The methodology begins with acquiring preoperative imaging data, such as fluoroscopy or X-ray images of the patient's knee joint. A convolutional neural network (CNN) is then trained to automatically segment the femur contour of the implanted components, significantly reducing manual effort and ensuring high accuracy. Following segmentation, a morphing algorithm generates a personalized 3D model of the implanted knee joint, using the segmented data and biomechanical principles. This algorithm considers implant position, size, and orientation to simulate the knee joint's shape. By integrating morphological data with implant-specific parameters, the reconstructed models accurately reflect the patient's implant anatomy and configuration. The approach's effectiveness is demonstrated through quantitative evaluations, including comparisons with ground truth data and existing techniques. In 19 test cases involving various implant types, the ML-based segmentation method showed superior accuracy and consistency compared to manual segmentation, with an average RMS error of 0.58 +/- 0.14 mm. This research advances orthopedic surgery by providing a robust framework for the automated reconstruction of implanted knee models. Leveraging ML and morphing algorithms, clinicians and researchers gain valuable insights into patient-specific knee anatomy, implant biomechanics, and surgical planning, leading to improved patient outcomes and enhanced quality of care.

Translated: 2024-08-06 19:40:03 Published: 2024-08-02

# Accelerating Domain-Aware Electron Microscopy Analysis Using Deep Learning Models with Synthetic Data and Image-Wide Confidence Scoring

Accelerating Domain-Aware Electron Microscopy Analysis Using Deep Learning Models with Synthetic Data and Image-Wide Confidence Scoring ( http://arxiv.org/abs/2408.01558v1 )

License: see link

Matthew J. Lynch, Ryan Jacobs, Gabriella Bruno, Priyam Patki, Dane Morgan, Kevin G. Field,

The integration of machine learning (ML) models enhances the efficiency, affordability, and reliability of feature detection in microscopy, yet their development and applicability are hindered by the dependency on scarce and often flawed manually labeled datasets and a lack of domain awareness. We addressed these challenges by creating a physics-based synthetic image and data generator, resulting in a machine learning model that achieves comparable precision (0.86), recall (0.63), F1 scores (0.71), and engineering property predictions (R2=0.82) to a model trained on human-labeled data. We enhanced both models by using feature prediction confidence scores to derive an image-wide confidence metric, enabling simple thresholding to eliminate ambiguous and out-of-domain images resulting in performance boosts of 5-30% with a filtering-out rate of 25%. Our study demonstrates that synthetic data can eliminate human reliance in ML and provides a means for domain awareness in cases where many feature detections per image are needed.
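
The abstract does not say how per-feature confidence scores are combined into the image-wide metric. The sketch below therefore simply assumes their mean and a fixed threshold. The aggregation rule, the threshold, and the data are all hypothetical. The sketch only illustrates how an image-wide score enables a simple filter and how the filtering-out rate is measured.

```python
from statistics import mean


def image_confidence(detection_scores):
    """Hypothetical image-wide confidence: the mean of per-feature detection confidences.

    Images with no detections get 0.0, so they are filtered out as ambiguous.
    """
    return mean(detection_scores) if detection_scores else 0.0


def filter_images(images, threshold):
    """Keep images whose image-wide confidence reaches the threshold.

    Returns the kept image names and the filtering-out rate (fraction removed).
    """
    kept = [name for name, scores in images.items() if image_confidence(scores) >= threshold]
    return kept, 1.0 - len(kept) / len(images)


# Made-up per-detection confidence scores for four micrographs.
images = {
    "img_a": [0.90, 0.85, 0.95],
    "img_b": [0.80, 0.70],
    "img_c": [0.30, 0.40, 0.20],  # stands in for an ambiguous or out-of-domain image
    "img_d": [0.75, 0.90],
}
kept, rate = filter_images(images, threshold=0.6)
print(kept, rate)  # -> ['img_a', 'img_b', 'img_d'] 0.25
```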

Translated: 2024-08-06 19:30:18 Published: 2024-08-02

# Welfare, sustainability, and equity evaluation of the New York City Interborough Express using spatially heterogeneous mode choice models

Welfare, sustainability, and equity evaluation of the New York City Interborough Express using spatially heterogeneous mode choice models ( http://arxiv.org/abs/2408.01562v1 )

License: see link

Hai Yang, Hongying Wu, Lauren Whang, Xiyuan Ren, Joseph Y. J. Chow,

The Metropolitan Transportation Authority (MTA) proposed building a new light rail route called the Interborough Express (IBX) to provide a direct, fast transit linkage between Queens and Brooklyn. An open-access synthetic citywide trip agenda dataset and a block-group-level mode choice model are used to assess the potential impact IBX could have on New York City (NYC). IBX could save potential riders across the city 28.1 minutes. For travelers either going to or departing from areas close to IBX, the average time saving is projected to be 29.7 minutes. IBX is projected to have a daily ridership of more than 254 thousand after its completion (69% higher than reported in the official IBX proposal). Among those riders, more than 78 thousand (30.8%) would come from low-income households, while 165 thousand (64.7%) would start or end their trips along the IBX corridor. The addition of IBX would attract more than 50 thousand additional daily trips to transit, of which more than 16 thousand would be switched from private vehicles, reducing potential greenhouse gas (GHG) emissions by 29.28 metric tons per day. IBX can also bring significant consumer surplus benefits to the communities, estimated at US$1.25 per trip, or as high as US$1.64 per trip made by a low-income traveler. While benefits are proportionately higher for lower-income users, the service does not appear to significantly reduce the proportion of travelers whose consumer surplus falls below 10% of the population average (already quite low).
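In logit mode choice models, the per-trip consumer surplus gained from a new option is commonly computed with the log-sum (expected maximum utility) measure. The abstract does not state that the authors use exactly this formula, so treat the sketch below as an assumption. The utilities and the cost coefficient are hypothetical numbers, not values from the study.

```python
import math


def logsum(utilities):
    """Expected maximum utility of a multinomial logit choice set (numerically stable)."""
    m = max(utilities)
    return m + math.log(sum(math.exp(u - m) for u in utilities))


def consumer_surplus_change(v_before, v_after, cost_coef):
    """Change in consumer surplus per trip, in money units.

    cost_coef is the (negative) marginal utility of cost; dividing the
    log-sum difference by its magnitude converts utility into dollars.
    """
    return (logsum(v_after) - logsum(v_before)) / abs(cost_coef)


# Hypothetical example: adding a new transit alternative to an existing choice set.
gain = consumer_surplus_change([0.0, -1.0], [0.0, -1.0, 0.5], cost_coef=-0.5)
```

Because adding an alternative can only raise the log-sum, this measure is never negative for a pure addition of service. Differences across income groups would come from group-specific utilities and cost coefficients.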

Translated: 2024-08-06 19:30:18 Published: 2024-08-02

# Self-Supervised Depth Estimation Based on Camera Models

Self-Supervised Depth Estimation Based on Camera Models ( http://arxiv.org/abs/2408.01565v1 )

License: see link

Jinchang Zhang, Praveen Kumar Reddy, Xue-Iuan Wong, Guoyu Lu,

Depth estimation is a critical topic for robotics and vision-related tasks. In monocular depth estimation, self-supervised methods have great potential because, unlike supervised learning, they do not require expensive ground-truth labels. However, self-supervised learning still lags well behind supervised learning in depth estimation performance. Scale is also a major issue for monocular unsupervised depth estimation, which commonly still needs a ground-truth scale from GPS, LiDAR, or existing maps for correction. In the deep learning era, existing methods mainly rely on exploring relationships between images to train unsupervised neural networks, while the fundamental information provided by the camera itself has generally been ignored. This information can provide extensive supervision for free, without any extra equipment to generate supervision signals. Using the camera's own intrinsics and extrinsics, depth can be calculated from physical principles for ground regions and regions connected to the ground, providing free supervision without any other sensors. The method is easy to implement and can serve as a component to improve unsupervised methods in general.
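The basic geometric idea behind depth from camera parameters for ground pixels is shown below for the simplest case. The sketch assumes a pinhole camera whose optical axis is parallel to a flat ground plane, mounted at a known height, with zero pitch and roll. A ground point at depth Z then projects to image row v = f_y·h/Z + c_y, so Z = f_y·h/(v − c_y). The paper's formulation, which uses the full extrinsics, is not reproduced here, and the function name is hypothetical.

```python
def ground_depth(v, fy, cy, cam_height):
    """Depth (metres along the optical axis) of a ground pixel in image row v.

    Assumes a pinhole camera with its optical axis parallel to a flat ground
    plane, mounted cam_height metres above it, with the image y axis pointing
    down. Rows at or above the horizon (v <= cy) never hit the ground, so
    None is returned for them.
    """
    if v <= cy:
        return None
    return fy * cam_height / (v - cy)


# Hypothetical intrinsics: fy = 700 px, cy = 200 px; camera 1.5 m above the road.
depth_row_300 = ground_depth(300, fy=700.0, cy=200.0, cam_height=1.5)  # 10.5 m
```

Rows nearer the bottom of the image map to nearer ground points. Because the camera height fixes the metric scale, depths obtained this way could also anchor the scale of an otherwise scale-ambiguous monocular network.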

Translated: 2024-08-06 19:30:18 Published: 2024-08-02

# Full-range Head Pose Geometric Data Augmentations

Full-range Head Pose Geometric Data Augmentations ( http://arxiv.org/abs/2408.01566v1 )

License: see link

Huei-Chung Hu, Xuyang Wu, Haowei Liu, Ting-Ruen Wei, Hsin-Tai Wu,

Many head pose estimation (HPE) methods promise the ability to create full-range datasets, theoretically allowing the rotation and position of the head to be estimated from a wide range of angles. However, these methods are accurate only within a limited range of head angles, and exceeding this range leads to significant inaccuracies. This is largely explained by the unclear specification of the coordinate systems and Euler angles used in the underlying rotation matrix calculations. Here, we address these limitations by presenting (1) methods that accurately infer the correct coordinate system and Euler angles in the correct axis sequence, (2) novel formulae for 2D geometric augmentations of the rotation matrices under the specified coordinate system, (3) derivations of the correct drawing routines for rotation matrices and poses, and (4) mathematical experimentation and verification that allow proper pitch-yaw coverage for full-range head pose dataset generation. Applying our augmentation techniques to existing head pose estimation methods significantly improved model performance. Code will be released upon paper acceptance.
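The axis-sequence issue raised in this abstract shows up as soon as rotations are composed. The sketch below builds a rotation matrix from Euler angles for an explicit axis sequence and shows that the same three angles give different matrices under two sequences. It assumes extrinsic rotations applied in the listed order. Which of pitch, yaw, and roll corresponds to the x, y, or z axis is a dataset-specific convention, so the mapping used in the example is illustrative only.

```python
import math


def axis_rotation(axis, angle):
    """3x3 rotation matrix about a single coordinate axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    if axis == "x":
        return [[1, 0, 0], [0, c, -s], [0, s, c]]
    if axis == "y":
        return [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    if axis == "z":
        return [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    raise ValueError(f"unknown axis {axis!r}")


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def euler_to_matrix(angles, sequence):
    """Compose extrinsic rotations applied in the order given by `sequence`.

    angles[i] is the rotation about axis sequence[i]; the first rotation
    applied ends up rightmost, i.e. R = R_n @ ... @ R_1.
    """
    r = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    for axis, angle in zip(sequence, angles):
        r = matmul(axis_rotation(axis, angle), r)
    return r


# Same (illustrative) pitch = 0.3, yaw = 0.5, roll = 0.7 rad, two axis sequences.
m_xyz = euler_to_matrix([0.3, 0.5, 0.7], "xyz")
m_zyx = euler_to_matrix([0.7, 0.5, 0.3], "zyx")  # differs from m_xyz
```

Decoding a matrix back into angles with the wrong sequence produces plausible-looking but incorrect angles. The error grows with the magnitude of the rotation, which is consistent with the abstract's observation that accuracy degrades outside a limited angle range.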

Translated: 2024-08-06 19:30:18 Published: 2024-08-02

# On Validation of Search & Retrieval of Tissue Images in Digital Pathology

On Validation of Search & Retrieval of Tissue Images in Digital Pathology ( http://arxiv.org/abs/2408.01570v1 )

License: see link

H. R. Tizhoosh,

Medical images play a crucial role in modern healthcare by providing vital information for diagnosis, treatment planning, and disease monitoring. Fields such as radiology and pathology rely heavily on accurate image interpretation, with radiologists examining X-rays, CT scans, and MRIs to diagnose conditions from fractures to cancer, while pathologists use microscopy and digital images to detect cellular abnormalities for diagnosing cancers and infections. Technological advancements have exponentially increased the volume and complexity of medical images, necessitating efficient tools for their management and retrieval. Content-Based Image Retrieval (CBIR) systems address this need by searching for and retrieving images based on visual content, enhancing diagnostic accuracy by allowing clinicians to find similar cases and compare pathological patterns. Comprehensive validation of image search engines in medical applications involves evaluating performance metrics such as accuracy, indexing and search times, and storage overhead, to ensure reliable and efficient retrieval of accurate results, as demonstrated by recent validations in histopathology.
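One way to measure the accuracy component of such a validation is leave-one-out retrieval with a top-k majority vote. Each image is used as a query against all the others, and the query counts as a hit if most of its k nearest neighbours share its diagnostic label. The abstract does not specify the protocol, so the cosine-similarity ranking and the majority vote below are assumptions, and the feature vectors are toy values standing in for real image embeddings.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def leave_one_out_topk_accuracy(features, labels, k=3):
    """Fraction of queries whose top-k retrieved neighbours vote for the query's own label."""
    hits = 0
    for q, (fq, lq) in enumerate(zip(features, labels)):
        # Rank every other image by similarity to the query (the query itself is excluded).
        ranked = sorted(
            (i for i in range(len(features)) if i != q),
            key=lambda i: cosine(fq, features[i]),
            reverse=True,
        )
        votes = Counter(labels[i] for i in ranked[:k])
        if votes.most_common(1)[0][0] == lq:
            hits += 1
    return hits / len(features)
```

This exhaustive search costs O(n²) similarity evaluations. That cost is one reason indexing time, search time, and storage overhead are reported alongside accuracy when search engines are validated at archive scale.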

Translated: 2024-08-06 19:30:18 Published: 2024-08-02

# Counterfactual Explanations for Medical Image Classification and Regression using Diffusion Autoencoder

Counterfactual Explanations for Medical Image Classification and Regression using Diffusion Autoencoder ( http://arxiv.org/abs/2408.01571v1 )

License: see link

Matan Atad, David Schinz, Hendrik Moeller, Robert Graf, Benedikt Wiestler, Daniel Rueckert, Nassir Navab, Jan S. Kirschke, Matthias Keicher,

Counterfactual explanations (CEs) aim to enhance the interpretability of machine learning models by illustrating how alterations in input features would affect the resulting predictions. Common CE approaches require an additional model and are typically constrained to binary counterfactuals. In contrast, we propose a novel method that operates directly on the latent space of a generative model, specifically a Diffusion Autoencoder (DAE). This approach offers inherent interpretability by enabling the generation of CEs and the continuous visualization of the model's internal representation across decision boundaries. Our method leverages the DAE's ability to encode images into a semantically rich latent space in an unsupervised manner, eliminating the need for labeled data or separate feature extraction models. We show that these latent representations are helpful for medical condition classification and the ordinal regression of severity pathologies, such as vertebral compression fractures (VCF) and diabetic retinopathy (DR). Beyond binary CEs, our method supports the visualization of ordinal CEs using a linear model, providing deeper insights into the model's decision-making process and enhancing interpretability. Experiments across various medical imaging datasets demonstrate the method's advantages in interpretability and versatility. The linear manifold of the DAE's latent space allows for meaningful interpolation and manipulation, making it a powerful tool for exploring medical image properties. Our code is available at https://github.com/matanat/dae_counterfactual.
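With a linear model on top of a latent space, a counterfactual can be obtained by moving the latent code along the model's weight vector until the score reaches a chosen target. Sweeping that target gives the ordinal series the abstract describes. The sketch below shows only this latent-space step, under the assumption of a linear classifier of the form w·z + b. Encoding an image into z and decoding the edited z back into an image (the diffusion autoencoder's role) are not shown, and the function name is hypothetical.

```python
import math


def counterfactual_latent(z, w, b, target_logit):
    """Move latent z along the normal of a linear model until its score equals target_logit.

    The model scores a latent as dot(w, z) + b. Stepping along w / ||w||
    changes the score fastest, so this is the smallest Euclidean edit that
    reaches the target score.
    """
    logit = sum(wi * zi for wi, zi in zip(w, z)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    t = (target_logit - logit) / norm
    return [zi + t * wi / norm for zi, wi in zip(z, w)]


# Toy 2-D latent: move from score -1 to score +2 (across the boundary at 0).
z_cf = counterfactual_latent([0.0, 0.0], [3.0, 4.0], b=-1.0, target_logit=2.0)
```

Calling the function with a range of `target_logit` values gives a path through the latent space that crosses the decision boundary. Decoding each point along that path would produce the continuous visualization of the model's representation.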

Translated: 2024-08-06 19:30:18 Published: 2024-08-02

# Electrical excitation of color centers in phosphorus-doped diamond Schottky diodes

Electrical excitation of color centers in phosphorus-doped diamond Schottky diodes ( http://arxiv.org/abs/2408.01572v1 )

License: see link

Florian Sledz, Igor A. Khramtsov, Assegid M. Flatae, Stefano Lagomarsino, Silvio Sciortino, Shannon S. Nicley, Rozita Rouzbahani, Paulius Pobedinskas, Tianxiao Guo, Xin Jiang, Paul Kienitz, Peter Haring Bolivar, Ken Haenen, Dmitry Yu. Fedyanin, Mario Agio,

A robust quantum light source operating upon electrical injection at ambient conditions is desirable for the practical implementation of quantum technologies, such as quantum key distribution or metrology. Color centers in diamond are promising candidates because they are photostable emitters at room and elevated temperatures. The possibility of their electrical excitation has already been demonstrated within p-i-n diodes. However, this requires the growth of complex diamond structures. In contrast to these conventional approaches, we demonstrate the emission of color centers under electrical pumping in a novel Schottky diode configuration based on hydrogen-passivated n-type diamond, which holds promise for integrated single-photon-emitting devices based on color centers in diamond.

Translated: 2024-08-06 19:30:18 Published: 2024-08-02

# Deep Learning Framework for History Matching CO2 Storage with 4D Seismic and Monitoring Well Data

Deep Learning Framework for History Matching CO2 Storage with 4D Seismic and Monitoring Well Data ( http://arxiv.org/abs/2408.01575v1 )

License: see link

Nanzhe Wang, Louis J. Durlofsky,

Geological carbon storage entails the injection of megatonnes of supercritical CO2 into subsurface formations. The properties of these formations are usually highly uncertain, which makes design and optimization of large-scale storage operations challenging. In this paper we introduce a history matching strategy that enables the calibration of formation properties based on early-time observations. Early-time assessments are essential to ensure that the operation is performing as planned. Our framework involves two fit-for-purpose deep learning surrogate models that provide predictions for in-situ monitoring well data and interpreted time-lapse (4D) seismic saturation data. These two types of data are at very different scales of resolution, so it is appropriate to construct separate, specialized deep learning networks for their prediction. This approach results in a workflow that is more straightforward to design and more efficient to train than a single surrogate that provides global high-fidelity predictions. The deep learning models are integrated into a hierarchical Markov chain Monte Carlo (MCMC) history matching procedure. History matching is performed on a synthetic case with and without 4D seismic data, which allows us to quantify the impact of 4D seismic on uncertainty reduction. The use of both data types is shown to provide substantial uncertainty reduction in key geomodel parameters and to enable accurate predictions of CO2 plume dynamics. The overall history matching framework developed in this study represents an efficient way to integrate multiple data types and to assess the impact of each on uncertainty reduction and performance predictions.
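The core loop of surrogate-based MCMC history matching evaluates the data misfit with a fast surrogate in place of the flow simulator. The sketch below is a single-level random-walk Metropolis sampler for one scalar parameter, not the paper's hierarchical scheme with two learned surrogates. The toy linear surrogate, the N(0, 1) prior, and the observation noise level are all illustrative assumptions.

```python
import math
import random


def surrogate(theta):
    """Stand-in for a trained deep-learning surrogate (here a toy linear map)."""
    return 2.0 * theta


def log_posterior(theta, y_obs, sigma=0.1):
    """Unnormalized log posterior: N(0, 1) prior plus Gaussian data misfit."""
    log_prior = -0.5 * theta ** 2
    log_like = -0.5 * ((y_obs - surrogate(theta)) / sigma) ** 2
    return log_prior + log_like


def metropolis(y_obs, n_steps=20000, step=0.05, burn_in=2000, seed=0):
    """Random-walk Metropolis; returns post-burn-in samples of theta."""
    rng = random.Random(seed)
    theta = 0.0
    lp = log_posterior(theta, y_obs)
    samples = []
    for i in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)
        lp_new = log_posterior(proposal, y_obs)
        # Accept with probability min(1, exp(lp_new - lp)); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < lp_new - lp:
            theta, lp = proposal, lp_new
        if i >= burn_in:
            samples.append(theta)
    return samples


samples = metropolis(y_obs=1.0)
```

For this conjugate toy problem the exact posterior mean is 200/401 ≈ 0.499, which gives a quick correctness check. Each iteration calls only the surrogate, which is what makes tens of thousands of likelihood evaluations affordable compared with running a full simulator at every step.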

Translated: 2024-08-06 19:30:18 Published: 2024-08-02

# THOR2: Leveraging Topological Soft Clustering of Color Space for Human-Inspired Object Recognition in Unseen Environments

THOR2: Leveraging Topological Soft Clustering of Color Space for Human-Inspired Object Recognition in Unseen Environments ( http://arxiv.org/abs/2408.01579v1 )

License: see link

Ekta U. Samani, Ashis G. Banerjee,

Visual object recognition in unseen and cluttered indoor environments is a challenging problem for mobile robots. This study presents a 3D shape and color-based descriptor, TOPS2, for point clouds generated from RGB-D images, and an accompanying recognition framework, THOR2. The TOPS2 descriptor embodies object unity, a human cognition mechanism, by retaining the slicing-based topological representation of 3D shape from the TOPS descriptor while capturing object color information through slicing-based color embeddings computed using a network of coarse color regions. These color regions, analogous to the MacAdam ellipses identified in human color perception, are obtained using the Mapper algorithm, a topological soft-clustering technique. THOR2, trained using synthetic data, demonstrates markedly improved recognition accuracy compared to THOR, its 3D shape-based predecessor, on two real-world benchmark datasets: the OCID dataset, which captures cluttered scenes from different viewpoints, and the UW-IS Occluded dataset, which reflects different environmental conditions and degrees of object occlusion recorded using commodity hardware. THOR2 also outperforms baseline deep learning networks and a widely used ViT adapted for RGB-D inputs on both datasets. Therefore, THOR2 is a promising step toward achieving robust recognition in low-cost robots.

Translated: 2024-08-06 19:30:18 Published: 2024-08-02

# Huge Ensembles Part II: Properties of a Huge Ensemble of Hindcasts Generated with Spherical Fourier Neural Operators

Huge Ensembles Part II: Properties of a Huge Ensemble of Hindcasts Generated with Spherical Fourier Neural Operators ( http://arxiv.org/abs/2408.01581v1 )

License: see link

Ankur Mahesh, William Collins, Boris Bonev, Noah Brenowitz, Yair Cohen, Peter Harrington, Karthik Kashinath, Thorsten Kurth, Joshua North, Travis OBrien, Michael Pritchard, David Pruitt, Mark Risser, Shashank Subramanian, Jared Willard,

In Part I, we created an ensemble based on Spherical Fourier Neural Operators. As initial condition perturbations, we used bred vectors, and as model perturbations, we used multiple checkpoints trained independently from scratch. Based on diagnostics that assess the ensemble's physical fidelity, our ensemble has comparable performance to operational weather forecasting systems. However, it requires several orders of magnitude fewer computational resources. Here in Part II, we generate a huge ensemble (HENS), with 7,424 members initialized each day of summer 2023. We enumerate the technical requirements for running huge ensembles at this scale. HENS precisely samples the tails of the forecast distribution and presents a detailed sampling of internal variability. For extreme climate statistics, HENS samples events 4$\sigma$ away from the ensemble mean. At each grid cell, HENS improves the skill of the most accurate ensemble member and enhances coverage of possible future trajectories. As a weather forecasting model, HENS issues extreme weather forecasts with better uncertainty quantification. It also reduces the probability of outlier events, in which the verification value lies outside the ensemble forecast distribution.
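The link between ensemble size and outlier events can be made concrete. Suppose the verifying value and the n members are exchangeable draws from the same distribution, as with a perfectly calibrated ensemble. Then the observation's rank among the n + 1 values is uniform, so it falls outside the ensemble range with probability 2/(n + 1). The sketch below checks this by simulation with Gaussian toy data. Real ensembles are not perfectly calibrated, so this is a reference rate rather than a prediction for HENS.

```python
import random


def is_outlier(members, observed):
    """True if the verifying value falls outside the range of the ensemble members."""
    return observed < min(members) or observed > max(members)


def expected_outlier_rate(n_members):
    """Outlier probability when observation and members are exchangeable draws."""
    return 2.0 / (n_members + 1)


def simulated_outlier_rate(n_members, trials=20000, seed=0):
    """Monte Carlo estimate using standard-normal members and observations."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        members = [rng.gauss(0.0, 1.0) for _ in range(n_members)]
        hits += is_outlier(members, rng.gauss(0.0, 1.0))
    return hits / trials
```

For a calibrated 50-member ensemble the reference rate is 2/51 ≈ 3.9%, while for 7,424 members it is 2/7,425 ≈ 0.027%. This is why a huge ensemble can sharply reduce the probability of outlier events.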

Translated: 2024-08-06 19:30:18 Published: 2024-08-02

# Conformal Diffusion Models for Individual Treatment Effect Estimation and Inference

Conformal Diffusion Models for Individual Treatment Effect Estimation and Inference ( http://arxiv.org/abs/2408.01582v1 )

License: see link

Hengrui Cai, Huaqing Jin, Lexin Li,

Estimating treatment effects from observational data is of central interest across numerous application domains. The individual treatment effect offers the most granular measure of treatment effect, at the level of the individual, and is the most useful for facilitating personalized care. However, its estimation and inference remain underdeveloped due to several challenges. In this article, we propose a novel conformal diffusion model-based approach that addresses these intricate challenges. We integrate highly flexible diffusion modeling with the model-free statistical inference paradigm of conformal inference, along with propensity score and covariate local approximation to tackle distributional shifts. We unbiasedly estimate the distributions of potential outcomes for the individual treatment effect, construct an informative confidence interval, and establish rigorous theoretical guarantees. We demonstrate the competitive performance of the proposed method over existing solutions through extensive numerical studies.
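The conformal ingredient can be illustrated with plain split conformal prediction. Absolute residuals on a held-out calibration set give a quantile q, and the interval prediction ± q has at least 1 − α marginal coverage when the data are exchangeable. This is only the unweighted baseline. The paper's method additionally uses diffusion-model outcome distributions and propensity-score and local covariate adjustments to handle the distribution shift between treatment groups, none of which is shown here.

```python
import math


def split_conformal_interval(prediction, calib_residuals, alpha=0.1):
    """Split conformal prediction interval with >= 1 - alpha marginal coverage.

    calib_residuals are |y - y_hat| on a held-out calibration set. Coverage
    holds under exchangeability of calibration and test points.
    """
    n = len(calib_residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to certify this alpha: the interval is unbounded.
        return (-math.inf, math.inf)
    q = sorted(calib_residuals)[k - 1]
    return (prediction - q, prediction + q)
```

For example, with 19 calibration residuals 1, 2, ..., 19 and α = 0.1, the rank is k = ⌈20 × 0.9⌉ = 18. The interval around a prediction of 5.0 is therefore (−13, 23).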

Translated: 2024-08-06 19:30:18 Published: 2024-08-02

# GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS

GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS ( http://arxiv.org/abs/2408.01584v1 )

License: see link

Saman Kazemkhani, Aarav Pandya, Daphne Cornelisse, Brennan Shacklett, Eugene Vinitsky,

Multi-agent learning algorithms have been successful at generating superhuman planning in a wide variety of games but have had little impact on the design of deployed multi-agent planners. A key bottleneck in applying these techniques to multi-agent planning is that they require billions of steps of experience. To enable the study of multi-agent planning at this scale, we present GPUDrive, a GPU-accelerated, multi-agent simulator built on top of the Madrona Game Engine that can generate over a million steps of experience per second. Observation, reward, and dynamics functions are written directly in C++, allowing users to define complex, heterogeneous agent behaviors that are lowered to high-performance CUDA. We show that using GPUDrive we are able to effectively train reinforcement learning agents over many scenes in the Waymo Motion dataset, yielding highly effective goal-reaching agents in minutes for individual scenes and generally capable agents in a few hours. We ship these trained agents as part of the code base at https://github.com/Emerge-Lab/gpudrive.

Translated: 2024-08-06 19:30:18 Published: 2024-08-02

# OpenLogParser: Unsupervised Parsing with Open-Source Large Language Models

OpenLogParser: Unsupervised Parsing with Open-Source Large Language Models ( http://arxiv.org/abs/2408.01585v1 )

License: see link

Zeyang Ma, Dong Jae Kim, Tse-Hsun Chen,

Log parsing is a critical step that transforms unstructured log data into structured formats, facilitating subsequent log-based analysis. Traditional syntax-based log parsers are efficient and effective, but they often lose accuracy when processing logs that deviate from predefined rules. Recently, large language model (LLM)-based log parsers have shown superior parsing accuracy. However, existing LLM-based parsers face three main challenges: 1) time-consuming and labor-intensive manual labeling for fine-tuning or in-context learning, 2) increased parsing costs due to the vast volume of log data and the limited context size of LLMs, and 3) privacy risks from using commercial models such as ChatGPT with sensitive log information. To overcome these limitations, this paper introduces OpenLogParser, an unsupervised log parsing approach that leverages open-source LLMs (i.e., Llama3-8B) to enhance privacy and reduce operational costs while achieving state-of-the-art parsing accuracy. OpenLogParser first groups logs that share similar static text but have varying dynamic variables, using a fixed-depth grouping tree. It then parses logs within these groups using three components: i) similarity scoring-based retrieval augmented generation, which selects diverse logs within each group based on Jaccard similarity, helping the LLM distinguish between static text and dynamic variables; ii) self-reflection, which iteratively queries the LLM to refine log templates and improve parsing accuracy; and iii) log template memory, which stores parsed templates to reduce LLM queries and improve parsing efficiency. Our evaluation on LogHub-2.0 shows that OpenLogParser achieves 25% higher parsing accuracy and processes logs 2.7 times faster than state-of-the-art LLM-based parsers. In short, OpenLogParser addresses the privacy and cost concerns of using commercial LLMs while achieving state-of-the-art parsing efficiency and accuracy.
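The Jaccard-based selection of diverse logs can be sketched as a greedy procedure. Each log is treated as a set of whitespace tokens. Starting from the first log in a group, the procedure repeatedly adds the log whose highest similarity to the already-chosen logs is smallest. The whitespace tokenization, the choice of seed, and the max-similarity rule are assumptions made for illustration; the abstract does not give OpenLogParser's exact selection criterion.

```python
def jaccard(a, b):
    """Jaccard similarity of two log lines, each treated as a set of whitespace tokens."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def select_diverse(logs, k):
    """Greedily pick up to k logs, each time adding the one least similar to those chosen."""
    chosen = [0] if logs else []
    while len(chosen) < min(k, len(logs)):
        rest = [i for i in range(len(logs)) if i not in chosen]
        best = min(rest, key=lambda i: max(jaccard(logs[i], logs[j]) for j in chosen))
        chosen.append(best)
    return [logs[i] for i in chosen]


group = [
    "Connected to 10.0.0.1 port 22",
    "Connected to 10.0.0.2 port 22",
    "Disk /dev/sda1 is full",
    "Connected to 10.0.0.3 port 80",
]
examples = select_diverse(group, 2)  # picks the first log and the "Disk ..." log
```

Showing the LLM examples that differ in their variable tokens, rather than near-duplicates, makes it easier to tell which tokens are static text and which are dynamic variables.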

Translated: 2024-08-06 19:30:18 Published: 2024-08-02

# Deep Learning Approach for Ear Recognition and Longitudinal Evaluation in Children

Deep Learning Approach for Ear Recognition and Longitudinal Evaluation in Children ( http://arxiv.org/abs/2408.01588v1 )

License: see link

Afzal Hossain, Tipu Sultan, Stephanie Schuckers,

Ear recognition as a biometric modality is becoming increasingly popular, with promising broader application areas. While current applications involve adults, one of the challenges in ear recognition for children is the rapid structural changes in the ear as they age. This work introduces a foundational longitudinal dataset collected from children aged 4 to 14 years over a 2.5-year period and evaluates ear recognition performance in this demographic. We present a deep learning based approach for ear recognition, using an ensemble of VGG16 and MobileNet, focusing on both adult and child datasets, with an emphasis on longitudinal evaluation for children.

Translated: 2024-08-06 19:30:18 Published: 2024-08-02

# Interpretations, Representations, and Stereotypes of Caste within Text-to-Image Generators

Interpretations, Representations, and Stereotypes of Caste within Text-to-Image Generators ( http://arxiv.org/abs/2408.01590v1 )

License: see link

Sourojit Ghosh,

The surge in the popularity of text-to-image generators (T2Is) has been matched by extensive research into ensuring fairness and equitable outcomes, with a focus on how they impact society. However, such work has typically focused on globally-experienced identities or centered Western contexts. In this paper, we address interpretations, representations, and stereotypes surrounding a tragically underexplored context in T2I research: caste. We examine how the T2I Stable Diffusion displays people of various castes, and what professions they are depicted as performing. Generating 100 images per prompt, we perform CLIP-cosine similarity comparisons with default depictions of an 'Indian person' by Stable Diffusion, and explore patterns of similarity. Our findings reveal how Stable Diffusion outputs perpetuate systems of 'castelessness', equating Indianness with high-castes and depicting caste-oppressed identities with markers of poverty. In particular, we note the stereotyping and representational harm towards the historically-marginalized Dalits, prominently depicted as living in rural areas and always at protests. Our findings underscore a need for a caste-aware approach towards T2I design, and we conclude with design recommendations.
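The comparison described in this abstract reduces to cosine similarity between image embeddings. A minimal sketch follows. It assumes the embeddings have already been produced by a CLIP image encoder (not shown), and it compares each prompt's images with the default "Indian person" images by averaging over all cross pairs. That averaging rule is an assumption rather than necessarily the paper's exact procedure, and the vectors used in the example are toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_cross_similarity(set_a, set_b):
    """Average cosine similarity over all pairs with one embedding from each set."""
    return sum(cosine(a, b) for a in set_a for b in set_b) / (len(set_a) * len(set_b))


# Toy 2-D "embeddings"; real CLIP image embeddings have hundreds of dimensions.
default_person = [[1.0, 0.0], [0.9, 0.1]]
caste_prompt = [[0.95, 0.05], [0.2, 0.8]]
score = mean_cross_similarity(caste_prompt, default_person)
```

A prompt whose images score consistently higher against the default set is being depicted as closer to the model's "unmarked" Indian person. Patterns of this kind are what the study uses to characterize which castes the default depiction aligns with.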

Translated: 2024-08-06 19:30:18 Published: 2024-08-02

# On the two-dimensional harmonic oscillator with an electric field confined to a circular box

On the two-dimensional harmonic oscillator with an electric field confined to a circular box ( http://arxiv.org/abs/2408.01593v1 )

License: see link

Francisco M. Fernández, Javier Garcia, Norberto Aquino, Antonio Flores-Riveros,

We revisit the quantum-mechanical two-dimensional harmonic oscillator with an electric field confined to a circular box with impenetrable walls. To obtain the energy spectrum we resort to the Rayleigh-Ritz method with polynomial and Gaussian basis sets. We compare the present results with those derived recently by other authors. We discuss the limits of large and small box radius and also carry out some calculations using perturbation theory.
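For orientation, the problem and its two limits can be written out. The sketch assumes a Hamiltonian in which the field exerts a constant force of magnitude $F$ along $x$. The polynomial basis shown is one possible choice that satisfies the boundary condition, not necessarily the basis used by the authors.

$$
\hat H = -\frac{\hbar^2}{2m}\nabla^2 + \frac{m\omega^2}{2}\left(x^2+y^2\right) + F x, \qquad \psi(r=R)=0 .
$$

Rayleigh-Ritz expands $\psi = \sum_j c_j \varphi_j$, with each $\varphi_j$ vanishing on the wall, for example $\varphi_j = (R^2 - x^2 - y^2)\,x^{a_j}y^{b_j}$. This leads to the generalized eigenvalue problem

$$
\mathbf H \mathbf c = E\,\mathbf S \mathbf c, \qquad H_{ij} = \langle \varphi_i|\hat H|\varphi_j\rangle, \quad S_{ij} = \langle \varphi_i|\varphi_j\rangle .
$$

For the large-box limit, completing the square gives

$$
\frac{m\omega^2}{2}x^2 + F x = \frac{m\omega^2}{2}\left(x+\frac{F}{m\omega^2}\right)^2 - \frac{F^2}{2m\omega^2},
$$

so as $R\to\infty$ the levels approach $E_{n_x n_y} = \hbar\omega\,(n_x+n_y+1) - F^2/(2m\omega^2)$. In the small-box limit the kinetic term dominates and $E \approx \hbar^2 j_{|l|,k}^2/(2mR^2)$, where $j_{|l|,k}$ is the $k$-th zero of the Bessel function $J_{|l|}$.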

Translated: 2024-08-06 19:20:31 Published: 2024-08-02

# "I don't see myself represented here at all": User Experiences of Stable Diffusion Outputs Containing Representational Harms across Gender Identities and Nationalities

"I don't see myself represented here at all": User Experiences of Stable Diffusion Outputs Containing Representational Harms across Gender Identities and Nationalities ( http://arxiv.org/abs/2408.01594v1 )

License: see link

Sourojit Ghosh, Nina Lutz, Aylin Caliskan,

Though research into text-to-image generators (T2Is) such as Stable Diffusion has demonstrated that they amplify societal biases and can cause harm, such research has primarily relied on computational methods rather than on information from real users who experience harm, leaving a significant knowledge gap. In this paper, we conduct the largest human subjects study of Stable Diffusion, combining crowdsourced data from 133 crowdworkers with 14 semi-structured interviews across diverse countries and genders. Through a mixed-methods approach of intra-set cosine similarity hierarchies (i.e., comparing multiple Stable Diffusion outputs for the same prompt with each other to examine which result is 'closest' to the prompt) and qualitative thematic analysis, we first demonstrate a large disconnect between user expectations for Stable Diffusion outputs and the outputs actually generated, as evidenced by a set of Stable Diffusion renditions of 'a Person' that are far from those expectations. We then extend this finding of general dissatisfaction to highlight the representational harms that Stable Diffusion causes our subjects, especially those with traditionally marginalized identities, by subjecting them to incorrect and often dehumanizing stereotypes about their identities. We provide recommendations for a harm-aware approach to (re)designing future versions of Stable Diffusion and other T2Is.
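One way to build an intra-set similarity hierarchy is to rank each image by its mean cosine similarity to the other outputs for the same prompt, so that the most "central" image comes first. The sketch assumes precomputed embedding vectors; the abstract does not say which encoder was used. It is an illustration of the idea, not the study's exact procedure, and it needs at least two images per set.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_by_centrality(embeddings):
    """Return image indices ordered by mean cosine similarity to the rest of the set.

    Requires at least two embeddings; the first index returned is the most
    'central' (typical) output for the prompt.
    """
    n = len(embeddings)
    scores = []
    for i in range(n):
        others = [cosine(embeddings[i], embeddings[j]) for j in range(n) if j != i]
        scores.append(sum(others) / len(others))
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


# Toy embeddings for three outputs of one prompt.
order = rank_by_centrality([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
```

The top-ranked image is the one the other outputs agree with most. Comparing it with what participants expected for the same prompt is one way to surface the expectation gap the abstract reports.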

翻訳日:2024-08-06 19:20:31 公開日:2024-08-02
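The intra-set cosine similarity idea in the abstract above (comparing several outputs for the same prompt with one another) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each generated image has already been mapped to an embedding vector by some image encoder, and it ranks outputs by their mean cosine similarity to the other outputs in the set. Both the embedding step and that ranking rule are assumptions.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def intra_set_ranking(embeddings: List[Sequence[float]]) -> List[int]:
    """Rank the outputs of one prompt by mean cosine similarity to the rest of the set.

    The first index returned is the output that is, on average, most similar to
    the other outputs (an assumed proxy for the 'closest' result).
    """
    n = len(embeddings)
    scores = []
    for i in range(n):
        sims = [cosine(embeddings[i], embeddings[j]) for j in range(n) if j != i]
        scores.append(sum(sims) / len(sims))
    return sorted(range(n), key=lambda i: scores[i], reverse=True)
```

With toy 2-D embeddings `[[1, 0], [0.9, 0.1], [0, 1]]`, the middle vector is closest to both of the others on average, so it ranks first.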

# 社会的・敵対的データに基づく信頼できる機械学習

Trustworthy Machine Learning under Social and Adversarial Data Sources ( http://arxiv.org/abs/2408.01596v1 )

ライセンス: Link先を確認

Han Shao,

(参考訳) 機械学習は近年、目覚ましいブレークスルーを遂げてきた。機械学習が日常生活の様々な側面に浸透するにつれ、個人や組織はますますこれらのシステムと相互作用するようになり、幅広い社会的・敵対的な行動を示すようになっている。これらの行動は、機械学習システムの振る舞いと性能に顕著な影響を与えうる。具体的には、こうした相互作用の中で、データは戦略的な個人によって生成され、利己的なデータ収集者によって収集され、場合によっては敵対的な攻撃者によって汚染され、複数の目的を満たす予測器、モデル、方策の構築に用いられる。その結果、機械学習システムの出力が劣化する可能性がある。例としては、敵対的サンプルに対するディープニューラルネットワークの脆弱性(Shafahi et al., 2018; Szegedy et al., 2013)や、戦略的な個人が存在する場合の古典的アルゴリズムの性能低下(Ahmadi et al., 2021)が挙げられる。これらの課題への対処は、社会的環境における機械学習の成功に不可欠である。

Machine learning has witnessed remarkable breakthroughs in recent years. As machine learning permeates various aspects of daily life, individuals and organizations increasingly interact with these systems, exhibiting a wide range of social and adversarial behaviors. These behaviors may have a notable impact on the behavior and performance of machine learning systems. Specifically, during these interactions, data may be generated by strategic individuals, collected by self-interested data collectors, possibly poisoned by adversarial attackers, and used to create predictors, models, and policies satisfying multiple objectives. As a result, the machine learning systems' outputs might degrade, such as the susceptibility of deep neural networks to adversarial examples (Shafahi et al., 2018; Szegedy et al., 2013) and the diminished performance of classic algorithms in the presence of strategic individuals (Ahmadi et al., 2021). Addressing these challenges is imperative for the success of machine learning in societal settings.

翻訳日:2024-08-06 19:20:31 公開日:2024-08-02
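The abstract above cites the susceptibility of neural networks to adversarial examples. A minimal sketch of why small input perturbations matter uses the fast gradient sign method (FGSM) on a plain logistic-regression classifier. This is a standard textbook attack chosen here for illustration. It is not taken from this thesis, and the weights and inputs below are hypothetical.

```python
import math
from typing import List


def logistic_loss(w: List[float], x: List[float], y: int) -> float:
    """Log-loss log(1 + exp(-y * w.x)) of a linear classifier, labels y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return math.log1p(math.exp(-margin))  # may overflow for very negative margins


def _sign(g: float) -> int:
    return (g > 0) - (g < 0)


def fgsm(w: List[float], x: List[float], y: int, eps: float) -> List[float]:
    """Return x + eps * sign(grad_x loss), the FGSM perturbation of x."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    # d/dx log(1 + exp(-y w.x)) = -y * sigmoid(-margin) * w, with sigmoid(-m) = 1 / (1 + e^m)
    coeff = -y / (1.0 + math.exp(margin))
    grad = [coeff * wi for wi in w]
    return [xi + eps * _sign(gi) for xi, gi in zip(x, grad)]
```

For `w = [2, -1]`, `x = [1, 1]`, `y = 1` and `eps = 0.1`, the attack moves `x` to about `[0.9, 1.1]`. This shrinks the margin from 1.0 to 0.7 and so raises the loss.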

# 物理インフォームド幾何対応ニューラル演算子

Physics-Informed Geometry-Aware Neural Operator ( http://arxiv.org/abs/2408.01600v1 )

ライセンス: Link先を確認

Weiheng Zhong, Hadi Meidani,

(参考訳) 工学設計の問題では、PDEパラメータや領域形状が変化する条件下で、パラメトリック偏微分方程式(PDE)を解くことが多い。近年、ニューラル演算子はPDE演算子を学習し、PDEの解を高速に予測する手法として有望視されている。しかし、これらのニューラル演算子の学習には通常、大規模なデータセットが必要であり、その取得には法外なコストがかかりうる。これを克服するため、物理インフォームド学習はニューラル演算子を構築する代替手段を提供し、有限要素法による学習データ生成に伴う高い計算コストを不要にする。とはいえ、現在の物理インフォームドニューラル演算子は、変化する領域形状か変化するPDEパラメータのいずれかを扱う点で限界を抱えている。本研究では、PDEパラメータと領域形状の両方に同時に汎化するよう設計された新しい手法、物理インフォームド幾何対応ニューラル演算子(PI-GANO)を提案する。領域の幾何学的特徴を捉えるために幾何エンコーダを採用し、このコンポーネントを既存のDCONアーキテクチャに統合するための新しいパイプラインを設計する。数値実験の結果は、提案手法の精度と効率を示している。

Engineering design problems often involve solving parametric Partial Differential Equations (PDEs) under variable PDE parameters and domain geometry. Recently, neural operators have shown promise in learning PDE operators and quickly predicting the PDE solutions. However, training these neural operators typically requires large datasets, the acquisition of which can be prohibitively expensive. To overcome this, physics-informed training offers an alternative way of building neural operators, eliminating the high computational costs associated with Finite Element generation of training data. Nevertheless, current physics-informed neural operators struggle with limitations, either in handling varying domain geometries or varying PDE parameters. In this research, we introduce a novel method, the Physics-Informed Geometry-Aware Neural Operator (PI-GANO), designed to simultaneously generalize across both PDE parameters and domain geometries. We adopt a geometry encoder to capture the domain geometry features, and design a novel pipeline to integrate this component within the existing DCON architecture. Numerical results demonstrate the accuracy and efficiency of the proposed method.

翻訳日:2024-08-06 19:20:31 公開日:2024-08-02
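Physics-informed training, as described above, replaces labelled finite-element data with a loss that penalizes the PDE residual at collocation points. The sketch below computes such a residual loss for an assumed example PDE, the 1-D Poisson equation u''(x) = f(x). It uses a central finite difference for u''. Real physics-informed operators typically differentiate the network with automatic differentiation instead. Here `u` is a plain Python function standing in for a network's prediction, which is an assumption made to keep the example self-contained.

```python
import math
from typing import Callable, List


def pde_residual_loss(u: Callable[[float], float],
                      f: Callable[[float], float],
                      xs: List[float],
                      h: float = 1e-3) -> float:
    """Mean squared residual of u''(x) - f(x) at collocation points xs.

    u'' is approximated by the central difference (u(x+h) - 2u(x) + u(x-h)) / h^2.
    """
    total = 0.0
    for x in xs:
        u_xx = (u(x + h) - 2.0 * u(x) + u(x - h)) / (h * h)
        total += (u_xx - f(x)) ** 2
    return total / len(xs)


# Example: u = sin(pi x) solves u'' = -pi^2 sin(pi x), so its residual is near zero.
def source(x: float) -> float:
    return -math.pi ** 2 * math.sin(math.pi * x)


good = pde_residual_loss(lambda x: math.sin(math.pi * x), source, [0.25, 0.5, 0.75])
bad = pde_residual_loss(lambda x: x * (1.0 - x), source, [0.25, 0.5, 0.75])
```

Minimizing this quantity over the parameters of `u` is what "training without solution data" means in this setting. The boundary-condition terms that a full loss would add are omitted here.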

# FIVBランキング:正しい方向へのミスステップ

FIVB ranking: Misstep in the right direction ( http://arxiv.org/abs/2408.01603v1 )

ライセンス: Link先を確認

Salma Tenni, Daniel Gomes de Pinho Zanco, Leszek Szczecinski,

(参考訳) 本研究は統計的枠組みを用いて、国際バレーボール連盟(Fédération Internationale de Volleyball, FIVB)が2020年から使用しているランキングアルゴリズムを提示し、評価する。FIVBランキングの顕著な特徴は確率モデルを用いることであり、これにより今後の試合結果の確率を明示的に計算する。このような明示的なモデル化は公式ランキングの文脈では新しいものであり、本研究ではそのパラメータの最適性に加え、ランキングアルゴリズムそのものとの関係についても検討する。解析は解析的手法と数値的手法の両方を用いて行う。モデル化の観点からは、ホームアドバンテージ(HFA)の導入は有益であり、試合結果の重み付けは逆効果であると結論付ける。アルゴリズム自体については、現在使われている近似の背後にある理論的根拠を説明し、性能を改善する新しいパラメータを求める方法を示す。最後に、得られるアルゴリズムの実装と解釈の両方を大幅に単純化する新しいモデルを提案する。

This work uses a statistical framework to present and evaluate the ranking algorithm that has been used by Fédération Internationale de Volleyball (FIVB) since 2020. The salient feature of the FIVB ranking is the use of the probabilistic model, which explicitly calculates the probabilities of the games to come. This explicit modeling is new in the context of official ranking, and we study the optimality of its parameters as well as its relationship with the ranking algorithm as such. The analysis is carried out using both analytical and numerical methods. We conclude that, from the modeling perspective, the use of the home-field advantage (HFA) would be beneficial and that the weighting of the game results is counterproductive. Regarding the algorithm itself, we explain the rationale beyond the approximations currently used and explain how to find new parameters which improve the performance. Finally, we propose a new model that drastically simplifies both the implementation and interpretation of the resulting algorithm.

翻訳日:2024-08-06 19:20:31 公開日:2024-08-02
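The abstract above describes a rating system that first predicts game probabilities and then updates ratings. It also argues that a home-field advantage (HFA) term would help. A generic Elo-style update with an optional HFA offset illustrates that predict-then-update loop. The FIVB's official model and its parameters differ and are not reproduced here. The logistic form, `scale = 400` and `k = 20` are assumptions borrowed from common Elo practice.

```python
def expected_score(r_home: float, r_away: float,
                   scale: float = 400.0, hfa: float = 0.0) -> float:
    """Logistic probability that the home team wins; hfa shifts the home rating."""
    return 1.0 / (1.0 + 10 ** (-(r_home + hfa - r_away) / scale))


def elo_update(r_home: float, r_away: float, home_won: bool,
               k: float = 20.0, scale: float = 400.0, hfa: float = 0.0):
    """Move both ratings by k * (observed - expected); the update is zero-sum."""
    p = expected_score(r_home, r_away, scale, hfa)
    delta = k * ((1.0 if home_won else 0.0) - p)
    return r_home + delta, r_away - delta
```

With equal ratings and no HFA the predicted probability is 0.5, so a home win moves each team by `k / 2`. A positive `hfa` raises the expected score of the home side, which makes a home win less surprising and therefore shrinks the update.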

# CYBERSECEVAL 3:大規模言語モデルにおけるサイバーセキュリティリスクと能力の評価の改善

CYBERSECEVAL 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models ( http://arxiv.org/abs/2408.01605v1 )

ライセンス: Link先を確認

Shengye Wan, Cyrus Nikolaidis, Daniel Song, David Molnar, James Crnkovich, Jayson Grace, Manish Bhatt, Sahana Chennabasappa, Spencer Whitman, Stephanie Ding, Vlad Ionescu, Yue Li, Joshua Saxe,

(参考訳) LLM向けの新しいセキュリティベンチマーク群であるCYBERSECEVAL 3を公開し、LLMのサイバーセキュリティ上のリスクと能力を実証的に測定する議論を継続する。CYBERSECEVAL 3は、第三者に対するリスクと、アプリケーション開発者およびエンドユーザに対するリスクという2つの大きなカテゴリにわたって、8種類のリスクを評価する。先行研究と比較して、攻撃的セキュリティ能力に焦点を当てた新たな領域として、ソーシャルエンジニアリングの自動化、手動による攻撃的サイバー作戦のスケーリング、自律的な攻撃的サイバー作戦を追加した。本稿では、これらのベンチマークをLlama 3モデルおよび同時期の最先端LLM群に適用し、緩和策がある場合とない場合の両方についてリスクを文脈化する。

We are releasing a new suite of security benchmarks for LLMs, CYBERSECEVAL 3, to continue the conversation on empirically measuring LLM cybersecurity risks and capabilities. CYBERSECEVAL 3 assesses 8 different risks across two broad categories: risk to third parties, and risk to application developers and end users. Compared to previous work, we add new areas focused on offensive security capabilities: automated social engineering, scaling manual offensive cyber operations, and autonomous offensive cyber operations. In this paper we discuss applying these benchmarks to the Llama 3 models and a suite of contemporaneous state-of-the-art LLMs, enabling us to contextualize risks both with and without mitigations in place.

翻訳日:2024-08-06 19:20:31 公開日:2024-08-02

# Deep Learning Meets OBIA: Tasks, Challenges, Strategies, and Perspectives

Deep Learning Meets OBIA: Tasks, Challenges, Strategies, and Perspectives ( http://arxiv.org/abs/2408.01607v1 )

ライセンス: Link先を確認

Lei Ma, Ziyun Yan, Mengmeng Li, Tao Liu, Liqin Tan, Xuan Wang, Weiqiang He, Ruikun Wang, Guangjun He, Heng Lu, Thomas Blaschke,

(参考訳) 深層学習は、特にピクセルレベルのアプリケーションやパッチレベルのアプリケーションにおいて、リモートセンシングにおいて大きな注目を集めている。深層学習をオブジェクトベース画像解析(OBIA)に統合しようという試みは最初はあったが、その潜在能力は未解明のままである。本稿では、OBIAの利用がより広まるにつれて、深層学習の統合の有無にかかわらず、タスクサブドメインの包括的なレビューと拡張を行った。さらに、OBIA内での非構造化オブジェクトデータの直接処理におけるディープラーニングの限界に対処するための5つの一般的な戦略を特定し、要約し、また、いくつかの重要な研究方向性を推奨する。これらの取り組みの目標は、この魅力的だが見落とされがちな領域でのさらなる探索を刺激し、深層学習のOBIA処理ワークフローへの統合を促進することです。

Deep learning has gained significant attention in remote sensing, especially in pixel- or patch-level applications. Despite initial attempts to integrate deep learning into object-based image analysis (OBIA), its full potential remains largely unexplored. In this article, as OBIA usage becomes more widespread, we conducted a comprehensive review and expansion of its task subdomains, with or without the integration of deep learning. Furthermore, we have identified and summarized five prevailing strategies to address the challenge of deep learning's limitations in directly processing unstructured object data within OBIA, and this review also recommends some important future research directions. Our goal with these endeavors is to inspire more exploration in this fascinating yet overlooked area and facilitate the integration of deep learning into OBIA processing workflows.

翻訳日:2024-08-06 19:20:31 公開日:2024-08-02

# ポスト量子暗号(PQC)ネットワーク計測装置:PQC導入率の測定と移行経路の特定

Post-Quantum Cryptography (PQC) Network Instrument: Measuring PQC Adoption Rates and Identifying Migration Pathways ( http://arxiv.org/abs/2408.00054v2 )

ライセンス: Link先を確認

Jakub Sowa, Bach Hoang, Advaith Yeluru, Steven Qie, Anita Nikolich, Ravishankar Iyer, Phuong Cao,

(参考訳) 量子耐性を持つ暗号ネットワークプロトコル、すなわちポスト量子暗号(PQC)をどう導入するかは、量子コンピューティングの民主化にとって極めて重要な問題である。実用的な量子コンピュータは今後数十年のうちに古典的な暗号を破ると見込まれるため、この問題は緊急である。過去に暗号化されたデータはすでに収集されており、近い将来に復号される可能性がある。ポスト量子暗号導入の主な課題は、アルゴリズムの複雑さと、ハードウェア/ソフトウェア/ネットワークの実装にある。既存のサイバーインフラがどのようにポスト量子暗号をサポートするのかという大きな問いは、いまだ答えられていない。本論文では以下を述べる。(i) イリノイ大学アーバナ・シャンペーン校の米国立スーパーコンピュータ応用研究所(NCSA)に設置され、FABRICテストベッドの一部でもある、新しいポスト量子暗号(PQC)ネットワーク計測装置の設計。(ii) 幅広いネットワークプロトコル(Secure Shell(SSH)、Transport Layer Security(TLS)など)におけるPQC導入率に関する最新の結果。(iii) 主要な科学アプリケーション(OpenSSHやSciTokensなど)におけるPQC実装の現状。(iv) 量子耐性を実現する上での課題。(v) 新たな攻撃の可能性に関する議論。これは、国家規模のスーパーコンピュータセンターとFABRICテストベッドにおけるPQC導入の初の大規模測定である。結果として、PQCの実装に成功しているのはOpenSSHとGoogle Chromeのみであった。OARNET、GTT、Google Fiber Webpass(米国)やUppsala Lans Landsting(スウェーデン)といった主要なインターネットサービスプロバイダや自律システム(AS)からNCSAへのOpenSSH接続について、初期導入率は0.029%(20,556,816件中6,044件)であり、2023年から2024年にかけて導入率は前年比で全体的に増加している。解析により、現行のアプリケーションを量子耐性へ移行するための経路を特定する。

The problem of adopting quantum-resistant cryptographic network protocols or post-quantum cryptography (PQC) is critically important to democratizing quantum computing. The problem is urgent because practical quantum computers will break classical encryption in the next few decades. Past encrypted data has already been collected and can be decrypted in the near future. The main challenges of adopting post-quantum cryptography lie in algorithmic complexity and hardware/software/network implementation. The grand question of how existing cyberinfrastructure will support post-quantum cryptography remains unanswered. This paper describes: i) the design of a novel Post-Quantum Cryptography (PQC) network instrument placed at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign and a part of the FABRIC testbed; ii) the latest results on PQC adoption rate across a wide spectrum of network protocols (Secure Shell -- SSH, Transport Layer Security -- TLS, etc.); iii) the current state of PQC implementation in key scientific applications (e.g., OpenSSH or SciTokens); iv) the challenges of being quantum-resistant; and v) discussion of potential novel attacks. This is the first large-scale measurement of PQC adoption at national-scale supercomputing centers and FABRIC testbeds. Our results show that only OpenSSH and Google Chrome have successfully implemented PQC and achieved an initial adoption rate of 0.029% (6,044 out of 20,556,816) for OpenSSH connections at NCSA coming from major Internet Service Providers or Autonomous Systems (ASes) such as OARNET, GTT, Google Fiber Webpass (U.S.) and Uppsala Lans Landsting (Sweden), with an overall increasing adoption rate year-over-year for 2023-2024. Our analyses identify pathways to migrate current applications to be quantum-resistant.

翻訳日:2024-08-06 12:36:51 公開日:2024-08-02
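The headline adoption figure above is a simple ratio: 6,044 of 20,556,816 OpenSSH connections is about 0.0294%, which is reported as 0.029%. The sketch below reproduces that arithmetic. It also shows one plausible way to classify connections by their negotiated SSH key-exchange name. The set of PQC algorithm names is an assumption for illustration: `sntrup761x25519-sha512@openssh.com` is OpenSSH's hybrid post-quantum key exchange, but the paper's actual classification rules are not given here.

```python
from typing import Iterable

# Assumed classification set: OpenSSH's hybrid Streamlined NTRU Prime + X25519 key exchange.
PQC_KEX_ALGORITHMS = {"sntrup761x25519-sha512@openssh.com"}


def adoption_rate_percent(pqc_connections: int, total_connections: int) -> float:
    """Share of connections that used a post-quantum key exchange, in percent."""
    if total_connections <= 0:
        raise ValueError("total_connections must be positive")
    return 100.0 * pqc_connections / total_connections


def pqc_rate_from_kex(negotiated_kex: Iterable[str]) -> float:
    """Adoption rate computed from a list of negotiated key-exchange algorithm names."""
    names = list(negotiated_kex)
    pqc = sum(1 for name in names if name in PQC_KEX_ALGORITHMS)
    return adoption_rate_percent(pqc, len(names))


print(f"{adoption_rate_percent(6_044, 20_556_816):.3f}%")  # 0.029%
```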

# 単体上の重み付きl1:圧縮センシングと局所性の出会い

Weighed l1 on the simplex: Compressive sensing meets locality ( http://arxiv.org/abs/2104.13894v2 )

ライセンス: Link先を確認

Abiy Tasissa, Pranay Tankala, Demba Ba,

(参考訳) スパース多様体学習アルゴリズムは、多様体学習とスパース最適化の手法を組み合わせ、下流タスクに利用可能な特徴を学習する。圧縮センシングの標準的な設定は、この設定にそのまま適用することはできない。データに内在する幾何学的構造のため、辞書アトムは冗長となる場合があり、制限等長性(RIP)やコヒーレンス条件を満たさない。さらに、多様体学習は局所的な幾何構造の学習を重視するが、これは標準的な$\ell_1$最小化問題には反映されない。本稿では、辞書ベースの多様体学習に適した、近傍のアトムによる表現を促す重み付き$\ell_0$および重み付き$\ell_1$の尺度を提案する。データがドロネー三角形分割から生成されると仮定して、重み付き$\ell_0$と重み付き$\ell_1$の等価性を示す。さらに、辞書とスパース係数を学習する最適化プログラムについて論じ、合成データと実データにおいて提案する正則化の有用性を示す。

Sparse manifold learning algorithms combine techniques in manifold learning and sparse optimization to learn features that could be utilized for downstream tasks. The standard setting of compressive sensing cannot be immediately applied to this setup. Due to the intrinsic geometric structure of data, dictionary atoms might be redundant and do not satisfy the restricted isometry property or coherence condition. In addition, manifold learning emphasizes learning local geometry which is not reflected in a standard $\ell_1$ minimization problem. We propose weighted $\ell_0$ and weighted $\ell_1$ metrics that encourage representation via neighborhood atoms suited for dictionary based manifold learning. Assuming that the data is generated from Delaunay triangulation, we show the equivalence of weighted $\ell_0$ and weighted $\ell_1$. We discuss an optimization program that learns the dictionaries and sparse coefficients and demonstrate the utility of our regularization on synthetic and real datasets.

翻訳日:2024-08-05 19:02:21 公開日:2024-08-02
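A short sketch shows why a plain $\ell_1$ penalty gives no guidance on the simplex and how a locality-weighted $\ell_1$ does. If the coefficients are nonnegative and sum to one, then $\|x\|_1 = 1$ for every feasible $x$. A weighted penalty $\sum_j w_j |x_j|$ with $w_j = \|y - a_j\|^2$ instead favours atoms $a_j$ near the data point $y$. The squared-distance weights are an assumption consistent with "representation via neighborhood atoms". They are not necessarily the paper's exact choice.

```python
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def reconstruct(atoms: List[Sequence[float]], coeffs: Sequence[float]) -> List[float]:
    """Convex combination sum_j coeffs[j] * atoms[j]."""
    dim = len(atoms[0])
    return [sum(c * atom[d] for c, atom in zip(coeffs, atoms)) for d in range(dim)]


def plain_l1(coeffs: Sequence[float]) -> float:
    return sum(abs(c) for c in coeffs)


def weighted_l1(y: Sequence[float], atoms: List[Sequence[float]], coeffs: Sequence[float]) -> float:
    """sum_j ||y - a_j||^2 * |x_j| (assumed locality weights)."""
    return sum(sq_dist(y, a) * abs(c) for a, c in zip(atoms, coeffs))


# Two simplex-feasible codes that both reconstruct y = 0.5 exactly.
atoms = [[0.0], [1.0], [0.4], [0.6]]
y = [0.5]
far = [0.5, 0.5, 0.0, 0.0]   # uses the distant atoms 0 and 1
near = [0.0, 0.0, 0.5, 0.5]  # uses the neighbouring atoms 0.4 and 0.6
```

Both codes have plain $\ell_1$ norm 1. Their weighted penalties are 0.25 and 0.01, so only the weighted version prefers the local representation.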

# オンライン学習における最適政策評価のための二重ロバスト区間推定

Doubly Robust Interval Estimation for Optimal Policy Evaluation in Online Learning ( http://arxiv.org/abs/2110.15501v4 )

ライセンス: Link先を確認

Ye Shen, Hengrui Cai, Rui Song,

(参考訳) 実行中の方策の性能を評価することは、オンライン実験の早期停止に関する重要な指針や、環境からのタイムリーなフィードバックを提供するため、医療や経済学など多くの分野で重要な役割を果たす。そのため、最適方策の平均的な結果(すなわち価値)をリアルタイムで推測するオンライン学習における方策評価が注目を集めている。しかし、この問題は、オンライン環境で生成される依存データ、未知の最適方策、適応的実験における探索と活用の複雑なトレードオフのために、特に困難である。本稿では、オンライン学習における方策評価のこれらの困難を克服することを目指す。まず、一般的に用いられるバンディットアルゴリズムの下で、非最適な行動を探索する確率を定量化する「探索確率」を明示的に導出する。この確率を用いて、各行動の下でのオンライン条件付き平均推定量に対して妥当な推測を行い、オンライン学習において推定された最適方策の下での価値を推測する二重頑健区間推定(DREAM)法を開発する。提案する価値推定量は一致性に対する二重の保護を提供し、漸近正規性を持つため、Wald型信頼区間が得られる。提案するDREAM法の実証的妥当性を示すため、大規模なシミュレーション研究と実データへの適用を行う。

Evaluating the performance of an ongoing policy plays a vital role in many areas such as medicine and economics, to provide crucial instructions on the early-stop of the online experiment and timely feedback from the environment. Policy evaluation in online learning thus attracts increasing attention by inferring the mean outcome of the optimal policy (i.e., the value) in real-time. Yet, such a problem is particularly challenging due to the dependent data generated in the online environment, the unknown optimal policy, and the complex exploration and exploitation trade-off in the adaptive experiment. In this paper, we aim to overcome these difficulties in policy evaluation for online learning. We explicitly derive the probability of exploration that quantifies the probability of exploring non-optimal actions under commonly used bandit algorithms. We use this probability to conduct valid inference on the online conditional mean estimator under each action and develop the doubly robust interval estimation (DREAM) method to infer the value under the estimated optimal policy in online learning. The proposed value estimator provides double protection for consistency and is asymptotically normal with a Wald-type confidence interval provided. Extensive simulation studies and real data applications are conducted to demonstrate the empirical validity of the proposed DREAM method.

翻訳日:2024-08-05 19:02:21 公開日:2024-08-02
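The "double protection" mentioned above comes from the augmented inverse-propensity-weighted (AIPW) form. The estimate combines an outcome model with a propensity-weighted correction, and it stays consistent if either of the two is correct. The sketch below is the generic AIPW value estimator with a Wald-type 95% interval. It is not the DREAM procedure itself: DREAM handles online, dependent data and uses the derived probability of exploration, whereas here the propensities are simply passed in. The function and argument names are hypothetical.

```python
import math
import statistics
from typing import Callable, List, Sequence, Tuple


def dr_value_estimate(xs: Sequence[float],
                      actions: Sequence[int],
                      rewards: Sequence[float],
                      propensities: Sequence[float],
                      policy: Callable[[float], int],
                      mu: Callable[[float, int], float]) -> Tuple[float, Tuple[float, float]]:
    """AIPW estimate of the value of `policy` with a Wald-type 95% interval.

    Each term is mu(x, pi(x)) + 1{a == pi(x)} / p(a | x) * (r - mu(x, a)).
    """
    terms: List[float] = []
    for x, a, r, p in zip(xs, actions, rewards, propensities):
        target = policy(x)
        term = mu(x, target)                 # outcome-model prediction
        if a == target:
            term += (r - mu(x, a)) / p       # propensity-weighted correction
        terms.append(term)
    est = statistics.fmean(terms)
    se = statistics.stdev(terms) / math.sqrt(len(terms))
    return est, (est - 1.96 * se, est + 1.96 * se)
```

When the outcome model `mu` is exactly right, every correction term vanishes. When `mu` is badly wrong, the correction uses observed rewards to pull the estimate back.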

# 質問分解による弱教師ありText-to-SQL構文解析

Weakly Supervised Text-to-SQL Parsing through Question Decomposition ( http://arxiv.org/abs/2112.06311v4 )

ライセンス: Link先を確認

Tomer Wolfson, Daniel Deutch, Jonathan Berant,

(参考訳) Text-to-SQLパーサは、非専門家がリレーショナルデータを容易にクエリできるようにする上で不可欠である。一方で、このようなパーサの学習には一般に、自然言語(NL)の発話に対応するSQLクエリを注釈付けする専門知識が必要となる。本研究では、Text-to-SQLパーサを学習するための弱教師あり手法を提案する。我々は、最近提案された質問意味表現であるQDMRを活用する。QDMRはNLと形式的なクエリ言語の中間に位置する表現である。質問、そのQDMR構造(非専門家が注釈付けしたもの、または自動予測されたもの)、および回答が与えられれば、Text-to-SQLモデルの学習に用いるSQLクエリを自動的に合成できる。5つのベンチマークデータセットで実験を行い、本手法を検証する。その結果、弱教師ありモデルは、注釈付きNL-SQLデータで学習したモデルと遜色ない性能を示した。全体として、SQL注釈を一切用いずに、Text-to-SQLパーサを効果的に学習できる。

Text-to-SQL parsers are crucial in enabling non-experts to effortlessly query relational data. Training such parsers, by contrast, generally requires expertise in annotating natural language (NL) utterances with corresponding SQL queries. In this work, we propose a weak supervision approach for training text-to-SQL parsers. We take advantage of the recently proposed question meaning representation called QDMR, an intermediate between NL and formal query languages. Given questions, their QDMR structures (annotated by non-experts or automatically predicted), and the answers, we are able to automatically synthesize SQL queries that are used to train text-to-SQL models. We test our approach by experimenting on five benchmark datasets. Our results show that the weakly supervised models perform competitively with those trained on annotated NL-SQL data. Overall, we effectively train text-to-SQL parsers, while using zero SQL annotations.

翻訳日:2024-08-05 19:02:21 公開日:2024-08-02

# 視覚変換器:セマンティックセグメンテーションから密予測へ

Vision Transformers: From Semantic Segmentation to Dense Prediction ( http://arxiv.org/abs/2207.09339v4 )

ライセンス: Link先を確認

Li Zhang, Jiachen Lu, Sixiao Zheng, Xinxuan Zhao, Xiatian Zhu, Yanwei Fu, Tao Xiang, Jianfeng Feng, Philip H. S. Torr,

(参考訳) 画像分類におけるVision Transformer(ViT)の登場は、視覚表現学習の方法論を一変させた。特に、層を重ねるごとに受容野が広がるCNNや他の代替手段(大きなカーネルやatrous畳み込みなど)と比べて、ViTは各層ですべての画像パッチにわたる完全な受容野で視覚表現を学習する。本研究では、ViTの大域的文脈学習の可能性を、密な視覚予測(セマンティックセグメンテーションなど)に向けて初めて探究する。我々の動機は、完全な受容野で層ごとに大域的文脈を学習することで、ViTが密予測タスクにとって重要な、より強力な長距離依存情報を捉えられる可能性があるという点にある。まず、画像をパッチの系列として符号化すると、局所的な畳み込みや解像度の低下を伴わないバニラViTが、セマンティックセグメンテーションにおいてより強力な視覚表現をもたらすことを示す。例えば、SEgmentation TRansformer(SETR)と名付けた我々のモデルは、ADE20Kで優れた性能(mIoU 50.28%、投稿当日のテストリーダーボードで1位)を達成し、Cityscapesでも競争力のある性能を示す。しかし、基本的なViTアーキテクチャは、ピラミッド構造の欠如、高い計算コスト、局所的文脈の不足のため、物体検出やインスタンスセグメンテーションといった、より広範な密予測の応用では不十分である。一般的な密な視覚予測タスクに低コストで取り組むため、さらに、階層型局所-大域(HLG)Transformerのファミリーを定式化する。これは、ピラミッド型アーキテクチャにおいて、ウィンドウ内の局所的注意とウィンドウ間の大域的注意を特徴とする。広範な実験により、提案手法が画像分類に加え、様々な密予測タスク(物体検出、インスタンスセグメンテーション、セマンティックセグメンテーションなど)で魅力的な性能を達成することを示す。

The emergence of vision transformers (ViTs) in image classification has shifted the methodologies for visual representation learning. In particular, ViTs learn visual representation at full receptive field per layer across all the image patches, in comparison to the increasing receptive fields of CNNs across layers and other alternatives (e.g., large kernels and atrous convolution). In this work, for the first time we explore the global context learning potentials of ViTs for dense visual prediction (e.g., semantic segmentation). Our motivation is that through learning global context at full receptive field layer by layer, ViTs may capture stronger long-range dependency information, critical for dense prediction tasks. We first demonstrate that encoding an image as a sequence of patches, a vanilla ViT without local convolution and resolution reduction can yield stronger visual representation for semantic segmentation. For example, our model, termed as SEgmentation TRansformer (SETR), excels on ADE20K (50.28% mIoU, the first position in the test leaderboard on the day of submission) and performs competitively on Cityscapes. However, the basic ViT architecture falls short in broader dense prediction applications, such as object detection and instance segmentation, due to its lack of a pyramidal structure, high computational demand, and insufficient local context. For tackling general dense visual prediction tasks in a cost-effective manner, we further formulate a family of Hierarchical Local-Global (HLG) Transformers, characterized by local attention within windows and global-attention across windows in a pyramidal architecture. Extensive experiments show that our methods achieve appealing performance on a variety of dense prediction tasks (e.g., object detection and instance segmentation and semantic segmentation) as well as image classification.

翻訳日:2024-08-05 19:02:21 公開日:2024-08-02
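The SETR result above is reported as mean intersection-over-union (mIoU). The metric can be computed from a class confusion matrix: for each class c, IoU = TP / (TP + FP + FN), and mIoU is the average over classes. The sketch below assumes rows are ground truth and columns are predictions, and it skips classes that appear in neither. Benchmark toolkits may also handle "ignore" labels or average per image. Those conventions are not modelled here.

```python
from typing import List


def mean_iou(confusion: List[List[int]]) -> float:
    """Mean IoU from a KxK confusion matrix (rows = ground truth, columns = prediction).

    Classes with TP + FP + FN == 0 (absent from both labels and predictions) are skipped.
    """
    k = len(confusion)
    ious = []
    for c in range(k):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # class-c pixels predicted as something else
        fp = sum(confusion[r][c] for r in range(k)) - tp  # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)
```

For the 2-class matrix `[[3, 1], [0, 4]]` the per-class IoUs are 3/4 and 4/5, giving an mIoU of 0.775.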

# ベイジアンネットワークによる評価ルーブリックのモデル化 : 実践的アプローチ

Modelling Assessment Rubrics through Bayesian Networks: a Pragmatic Approach ( http://arxiv.org/abs/2209.05467v3 )

ライセンス: Link先を確認

Francesca Mangili, Giorgia Adorni, Alberto Piatti, Claudio Bonesana, Alessandro Antonucci,

(参考訳) 学習者の能力の自動評価は、知的チュータリングシステムにおける基本的な課題である。評価ルーブリックは通常、関連する能力と能力レベルを効果的に記述する。本稿では、能力レベルの(部分的な)順序を定義する評価ルーブリックから、学習者モデルを直接導出する手法を提案する。このモデルはベイジアンネットワークに基づいており、不確実性を伴う論理ゲート(しばしばノイジーゲートと呼ばれる)を利用してモデルのパラメータ数を削減する。これにより、専門家によるパラメータの引き出しが簡単になり、知的チュータリングシステムにおけるリアルタイム推論が可能になる。本稿では、計算論的思考のスキルをテストするために開発された活動について、人間による評価を自動化するためにこの手法をどのように適用できるかを示す。評価ルーブリックからモデルを簡便に引き出せることで、複数のタスクの評価を迅速に自動化できる可能性が開け、適応型評価ツールや知的チュータリングシステムの文脈でそれらをより容易に活用できるようになる。

Automatic assessment of learner competencies is a fundamental task in intelligent tutoring systems. An assessment rubric typically and effectively describes relevant competencies and competence levels. This paper presents an approach to deriving a learner model directly from an assessment rubric defining some (partial) ordering of competence levels. The model is based on Bayesian networks and exploits logical gates with uncertainty (often referred to as noisy gates) to reduce the number of parameters of the model, so to simplify their elicitation by experts and allow real-time inference in intelligent tutoring systems. We illustrate how the approach can be applied to automatize the human assessment of an activity developed for testing computational thinking skills. The simple elicitation of the model starting from the assessment rubric opens up the possibility of quickly automating the assessment of several tasks, making them more easily exploitable in the context of adaptive assessment tools and intelligent tutoring systems.

翻訳日:2024-08-05 19:02:21 公開日:2024-08-02
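The abstract above reduces parameters with "noisy gates". A noisy-OR gate is the most common example. Each active parent i independently fails to switch the child on with probability 1 - p_i, and an optional leak term covers unmodelled causes. A child with n binary parents then needs n (+1) parameters instead of a 2^n-row conditional probability table. The sketch below implements noisy-OR only, which is an assumption. The paper's gates for ordered competence levels may be different noisy gates.

```python
from typing import Sequence


def noisy_or(parent_states: Sequence[int],
             link_probs: Sequence[float],
             leak: float = 0.0) -> float:
    """P(child = 1 | parents) under a noisy-OR gate.

    P = 1 - (1 - leak) * prod_{i: parent i active} (1 - link_probs[i]).
    """
    q = 1.0 - leak                 # probability that the child stays off
    for state, p in zip(parent_states, link_probs):
        if state:
            q *= 1.0 - p           # each active parent independently fails to fire it
    return 1.0 - q
```

With link probabilities 0.8 and 0.5, activating only the first parent gives 0.8. Activating both gives 1 - 0.2 * 0.5 = 0.9.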

# 時空間データに対する混合移動平均場誘導学習

Mixed moving average field guided learning for spatio-temporal data ( http://arxiv.org/abs/2301.00736v4 )

ライセンス: Link先を確認

Imma Valentina Curato, Orkun Furat, Lorenzo Proietti, Bennet Stroeh,

(参考訳) 影響付き混合移動平均場(influenced mixed moving average fields)は、時空間データのための汎用性の高いモデルクラスである。しかし、その予測分布は一般には知られていない。このモデル化の仮定の下で、我々は新しい時空間埋め込みと、一般化ベイズアルゴリズムを用いてアンサンブル予測を行う理論誘導型の機械学習手法を定義する。リプシッツ予測器を用い、バッチ学習の設定において固定時間およびany-timeのPACベイズ境界を導出する。因果的な予測を行えることは本手法の特長であり、空間的・時間的に短距離および長距離の依存を持つデータへの応用が期待される。次に、線形予測器と、時空間Ornstein-Uhlenbeck過程からシミュレーションしたデータセットを用いて、本学習手法の性能を検証する。

Influenced mixed moving average fields are a versatile modeling class for spatio-temporal data. However, their predictive distribution is not generally known. Under this modeling assumption, we define a novel spatio-temporal embedding and a theory-guided machine learning approach that employs a generalized Bayesian algorithm to make ensemble forecasts. We use Lipschitz predictors and determine fixed-time and any-time PAC Bayesian bounds in the batch learning setting. Performing causal forecast is a highlight of our methodology as its potential application to data with spatial and temporal short and long-range dependence. We then test the performance of our learning methodology by using linear predictors and data sets simulated from a spatio-temporal Ornstein-Uhlenbeck process.

翻訳日:2024-08-05 19:02:21 公開日:2024-08-02

# テキスト生成のための再パラメータ化離散拡散モデル

A Reparameterized Discrete Diffusion Model for Text Generation ( http://arxiv.org/abs/2302.05737v3 )

ライセンス: Link先を確認

Lin Zheng, Jianbo Yuan, Lei Yu, Lingpeng Kong,

(参考訳) 本研究では、離散拡散確率モデルとその自然言語生成への応用を研究する。離散拡散過程からのサンプリングについて、別の、しかし等価な定式化を導出し、この知見を活用して再パラメータ化離散拡散モデルのファミリーを開発する。導出された汎用的な枠組みは非常に柔軟であり、離散拡散モデルにおける生成過程に新たな視点を提供するとともに、より効果的な学習および復号の手法を備えている。提案モデルのテキスト生成能力を評価するために広範な実験を行い、既存の拡散モデルに対する大幅な改善を示す。

This work studies discrete diffusion probabilistic models with applications to natural language generation. We derive an alternative yet equivalent formulation of the sampling from discrete diffusion processes and leverage this insight to develop a family of reparameterized discrete diffusion models. The derived generic framework is highly flexible, offers a fresh perspective of the generation process in discrete diffusion models, and features more effective training and decoding techniques. We conduct extensive experiments to evaluate the text generation capability of our model, demonstrating significant improvements over existing diffusion models.

翻訳日:2024-08-05 19:02:21 公開日:2024-08-02
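Discrete diffusion models for text are built on a forward corruption process over tokens. A common choice is the absorbing-state ("mask") process. At noise level t, each token is independently replaced by a special mask token with probability t, and the model learns to reverse this. The sketch below shows only that standard forward step. It does not reproduce the paper's reparameterized sampling. The `[MASK]` string and the linear use of `t` as the masking probability are assumptions.

```python
import random
from typing import List

MASK = "[MASK]"  # hypothetical absorbing-state token


def absorbing_forward(tokens: List[str], t: float, rng: random.Random) -> List[str]:
    """Corrupt a token sequence at noise level t in [0, 1].

    Each token is independently replaced by MASK with probability t, so t = 0
    leaves the text unchanged and t = 1 masks every position.
    """
    return [MASK if rng.random() < t else tok for tok in tokens]


rng = random.Random(0)
noisy = absorbing_forward("the cat sat on the mat".split(), 0.5, rng)
```

Because masked positions never revert in the forward direction, the reverse model only has to decide which masked positions to fill and with what. This "absorbing" property is why the process is popular for text.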

# 自己教師ありハイブリッド深層学習によるロバストなミリ波ビームフォーミング

Robust Millimeter Beamforming via Self-Supervised Hybrid Deep Learning ( http://arxiv.org/abs/2303.12653v3 )

ライセンス: Link先を確認

Fenghao Zhu, Bohao Wang, Zhaohui Yang, Chongwen Huang, Zhaoyang Zhang, George C. Alexandropoulos, Chau Yuen, Merouane Debbah,

(参考訳) 大規模アンテナアレイを用いたビームフォーミングは近年広く用いられており、5Gおよび今後の6Gの重要な要素として認識されている。そのため、深層学習や高度な最適化アルゴリズムなど、その性能を向上させるために様々な技術が活用されている。深層学習を用いた多くの先行研究のシナリオでは性能は非常に魅力的であるが、通常、環境やデータセットが変わると性能は急速に低下する。したがって、強いロバスト性を持つ効果的なビームフォーミングネットワークの設計は、インテリジェントな無線通信にとって未解決の問題である。本稿では、ロバストなビームフォーミングのための自己教師ありネットワークを提案し、様々なシナリオを含む2種類の異なるデータセットで検証する。シミュレーションの結果、ハイブリッド学習を用いた提案の自己教師ありネットワークは、従来のDeepMIMOと新しいWAIR-Dデータセットの両方で良好な性能を示し、様々な環境下で強いロバスト性を持つことがわかった。また、この種のハイブリッド学習の合理性を説明する原理を提示する。これは、より多くの種類のデータセットに適用する上で示唆に富むものである。

Beamforming with large-scale antenna arrays has been widely used in recent years, which is acknowledged as an important part in 5G and incoming 6G. Thus, various techniques are leveraged to improve its performance, e.g., deep learning, advanced optimization algorithms, etc. Although its performance in many previous research scenarios with deep learning is quite attractive, usually it drops rapidly when the environment or dataset is changed. Therefore, designing effective beamforming network with strong robustness is an open issue for the intelligent wireless communications. In this paper, we propose a robust beamforming self-supervised network, and verify it in two kinds of different datasets with various scenarios. Simulation results show that the proposed self-supervised network with hybrid learning performs well in both classic DeepMIMO and new WAIR-D dataset with the strong robustness under the various environments. Also, we present the principle to explain the rationality of this kind of hybrid learning, which is instructive to apply with more kinds of datasets.

翻訳日:2024-08-05 19:02:21 公開日:2024-08-02
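A learned beamformer is usually judged by its beamforming gain |h^H w|^2 for a channel h and a unit-norm weight vector w. The sketch below assumes a single line-of-sight path and a half-wavelength-spaced uniform linear array. It uses the matched beamformer w = h / ||h|| as the reference, which attains the Cauchy-Schwarz upper bound ||h||^2 = N. The channel model and names are assumptions. The paper's self-supervised network is not modelled, and a learned w would be scored against this bound.

```python
import cmath
import math
from typing import List


def steering_vector(n_antennas: int, theta: float) -> List[complex]:
    """Array response of a half-wavelength-spaced ULA toward angle theta (radians)."""
    return [cmath.exp(1j * math.pi * n * math.sin(theta)) for n in range(n_antennas)]


def matched_beamformer(h: List[complex]) -> List[complex]:
    """Unit-norm weights w = h / ||h|| (maximum-ratio transmission)."""
    norm = math.sqrt(sum(abs(x) ** 2 for x in h))
    return [x / norm for x in h]


def beamforming_gain(h: List[complex], w: List[complex]) -> float:
    """|h^H w|^2 for channel h and beamformer w."""
    return abs(sum(hi.conjugate() * wi for hi, wi in zip(h, w))) ** 2


h = steering_vector(16, 0.3)
w = matched_beamformer(h)
```

For 16 antennas the matched gain is 16. Pointing the same weights at a user at a different angle gives a strictly smaller gain, which is the mismatch a robust beamformer must avoid when the environment changes.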

# Adaptive Spiking Encoder-Decoder Network を用いた高精度かつ効率的なイベントベースセマンティックセグメンテーション

Accurate and Efficient Event-based Semantic Segmentation Using Adaptive Spiking Encoder-Decoder Network ( http://arxiv.org/abs/2304.11857v3 )

ライセンス: Link先を確認

Rui Zhang, Luziwei Leng, Kaiwei Che, Hu Zhang, Jie Cheng, Qinghai Guo, Jiangxing Liao, Ran Cheng,

(参考訳) 低消費電力でイベント駆動型の計算と、本質的な時間ダイナミクスで知られるスパイキングニューラルネットワーク(SNN)は、イベントベースセンサからの動的で非同期な信号を処理するための有望な解決策として台頭している。その可能性にもかかわらず、SNNは学習とアーキテクチャ設計に課題を抱えており、困難なイベントベースの密予測タスクでは、人工ニューラルネットワーク(ANN)に比べて性能が限られている。本研究では、大規模なイベントベースのセマンティックセグメンテーションタスクのための効率的なスパイキングエンコーダ・デコーダネットワーク(SpikingEDN)を開発する。動的なイベントストリームからの学習効率を高めるため、ストリーミング推論においてネットワークの精度、スパース性、ロバスト性を向上させる適応しきい値を活用する。さらに、スパースなイベントとマルチモーダル入力の表現を強化するよう特化したデュアルパスのSpiking Spatially-Adaptive Modulationモジュールを開発し、ネットワーク性能を大幅に向上させる。SpikingEDNは、DDD17データセットで72.57%、より大規模なDSEC-Semanticデータセットで58.32%の平均IoU(MIoU)を達成し、必要な計算資源を大幅に削減しつつ、最先端のANNと競合する結果を示す。この結果は、イベントベースのビジョン応用におけるSNNの未開拓の可能性に光を当てるものである。ソースコードは公開予定である。

Spiking neural networks (SNNs), known for their low-power, event-driven computation and intrinsic temporal dynamics, are emerging as promising solutions for processing dynamic, asynchronous signals from event-based sensors. Despite their potential, SNNs face challenges in training and architectural design, resulting in limited performance in challenging event-based dense prediction tasks compared to artificial neural networks (ANNs). In this work, we develop an efficient spiking encoder-decoder network (SpikingEDN) for large-scale event-based semantic segmentation tasks. To enhance the learning efficiency from dynamic event streams, we harness the adaptive threshold which improves network accuracy, sparsity and robustness in streaming inference. Moreover, we develop a dual-path Spiking Spatially-Adaptive Modulation module, which is specifically tailored to enhance the representation of sparse events and multi-modal inputs, thereby considerably improving network performance. Our SpikingEDN attains a mean intersection over union (MIoU) of 72.57% on the DDD17 dataset and 58.32% on the larger DSEC-Semantic dataset, showing competitive results to the state-of-the-art ANNs while requiring substantially fewer computational resources. Our results shed light on the untapped potential of SNNs in event-based vision applications. The source code will be made publicly available.

翻訳日:2024-08-05 19:02:21 公開日:2024-08-02
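The adaptive threshold mentioned above can be illustrated with a generic leaky integrate-and-fire (LIF) neuron. Its firing threshold jumps by `beta` after every spike and then relaxes back toward `theta0`, so a neuron under sustained input fires more sparsely. This is a minimal sketch of the general mechanism, not SpikingEDN's exact neuron model. The decay, reset and relaxation rules and all parameter values are assumptions.

```python
from typing import List


def adaptive_lif(inputs: List[float], decay: float = 0.9, theta0: float = 1.0,
                 beta: float = 0.5, rho: float = 0.9) -> List[int]:
    """Spike train of a LIF neuron with an adaptive threshold.

    v <- decay * v + input; spike if v >= theta, then hard-reset v and raise theta by beta;
    theta relaxes toward theta0 by the factor rho at every step.
    """
    v, theta = 0.0, theta0
    spikes = []
    for current in inputs:
        v = decay * v + current                  # leaky integration of the input
        spike = 1 if v >= theta else 0
        if spike:
            v = 0.0                              # hard reset after firing
            theta += beta                        # adaptation: harder to fire again soon
        theta = theta0 + rho * (theta - theta0)  # threshold decays back toward theta0
        spikes.append(spike)
    return spikes
```

Under a constant input of 0.5 for nine steps, the non-adaptive neuron (`beta=0`) fires three times. The adaptive one fires only twice, which is the sparsity effect the abstract attributes to adaptive thresholds.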

# 電気ネットワークの幾何学的記述とFaddeev-Jackiw量子化

Geometrical description and Faddeev-Jackiw quantization of electrical networks ( http://arxiv.org/abs/2304.12252v3 )

ライセンス: Link先を確認

A. Parra-Rodriguez, I. L. Egusquiza,

(参考訳) 集中定数電気回路理論では、媒質の存在下でマクスウェル方程式を解く問題は、2組の方程式に帰着される。すなわち、局所的な幾何と閉じ込められたエネルギー密度の力学を表す構成方程式と、より大きな位相的スケールで電荷とエネルギーの保存を課すキルヒホッフ方程式である。我々は、一般の集中定数電気回路の力学を、ラグランジアンとレイリー散逸関数から導出可能な1階微分方程式として表す、新しい幾何学的かつ体系的な記述を開発する。Faddeev-Jackiw法を用いて、一般のネットワークのハミルトン的記述を求める際に生じる特異点を特定し、分類する。我々の解法の核心は、回路の状態を表現できる縮約多様体を正しく特定することにある。例えば、それはコンパクトな自由度を含む、磁束と電荷の自由度の混合でありうる。完全にプログラム可能な本手法を適用して、非線形・非相反回路の(正準量子化可能な)ハミルトン的記述を得る。これらの回路は、純粋なノード磁束変数またはループ電荷変数を出発点の配位空間として用いると、扱いにくいか特異になる。また、エネルギー素子の枝変数に対する特定のトポロジーの割り当てを提案する。これを手続きへの入力として用いると、古典的な記述とも、より複雑な量子回路のスペクトルとも整合する結果が得られる。本研究は、電気ネットワーク理論における既存の多様な幾何学的描像を統一するものであり、例えば超伝導量子チップの厳密なハミルトン的記述の計算を自動化するのに役立つと期待される。

In lumped-element electrical circuit theory, the problem of solving Maxwell's equations in the presence of media is reduced to two sets of equations, the constitutive equations encapsulating local geometry and dynamics of a confined energy density, and the Kirchhoff equations enforcing conservation of charge and energy in a larger, topological, scale. We develop a new geometric and systematic description of the dynamics of general lumped-element electrical circuits as first order differential equations, derivable from a Lagrangian and a Rayleigh dissipation function. Through the Faddeev-Jackiw method we identify and classify the singularities that arise in the search for Hamiltonian descriptions of general networks. The core of our solution relies on the correct identification of the reduced manifold in which the circuit state is expressible, e.g., a mix of flux and charge degrees of freedom, including the presence of compact ones. We apply our fully programmable method to obtain (canonically quantizable) Hamiltonian descriptions of nonlinear and nonreciprocal circuits which would be cumbersome/singular if pure node-flux or loop-charge variables were used as a starting configuration space. We also propose a specific assignment of topology for the branch variables of energetic elements, that when used as input to the procedure gives results consistent with classical descriptions as well as with spectra of more involved quantum circuits. This work unifies diverse existent geometrical pictures of electrical network theory, and will prove useful, for instance, to automatize the computation of exact Hamiltonian descriptions of superconducting quantum chips.

翻訳日:2024-08-05 19:02:21 公開日:2024-08-02

# 思考の複数の連鎖に対するメタ推論による質問への回答

Answering Questions by Meta-Reasoning over Multiple Chains of Thought ( http://arxiv.org/abs/2304.13007v4 )

ライセンス: Link先を確認

Ori Yoran, Tomer Wolfson, Ben Bogin, Uri Katz, Daniel Deutch, Jonathan Berant,

(参考訳) マルチホップ質問応答(QA)のための現代のシステムは、最終回答に到達する前に、質問を一連の推論ステップ、すなわちチェーン・オブ・シント(CoT)に分割する。多くの場合、複数の連鎖が最終回答の投票機構を通じてサンプリングされ集約されるが、中間ステップ自体は破棄される。このような手法は性能を向上させるが、チェーン間の中間ステップの関係を考慮せず、予測された解に対する統一的な説明を提供しない。 MCR(Multi-Chain Reasoning)は,大規模言語モデルに対して,回答を集約するのではなく,複数の思考チェーン上でメタ推論を行うアプローチである。 MCRは、異なる推論連鎖を調べ、それら間で情報を混合し、説明を生成し、答えを予測する際に最も関係のある事実を選択する。 MCRは7つのマルチホップQAデータセットで強いベースラインを上回ります。さらに,本分析の結果から,MCRの説明は高品質であり,人間が回答を検証できることが判明した。

Modern systems for multi-hop question answering (QA) typically break questions into a sequence of reasoning steps, termed chain-of-thought (CoT), before arriving at a final answer. Often, multiple chains are sampled and aggregated through a voting mechanism over the final answers, but the intermediate steps themselves are discarded. While such approaches improve performance, they do not consider the relations between intermediate steps across chains and do not provide a unified explanation for the predicted answer. We introduce Multi-Chain Reasoning (MCR), an approach which prompts large language models to meta-reason over multiple chains of thought, rather than aggregating their answers. MCR examines different reasoning chains, mixes information between them and selects the most relevant facts in generating an explanation and predicting the answer. MCR outperforms strong baselines on 7 multi-hop QA datasets. Moreover, our analysis reveals that MCR explanations exhibit high quality, enabling humans to verify its answers.

Translated: 2024-08-05 19:02:21 Published: 2024-08-02

# Conditional quantum thermometry -- enhancing precision by measuring less ( http://arxiv.org/abs/2304.13595v2 )

License: see link

Akira Sone, Diogo O. Soares-Pinto, Sebastian Deffner,

Taking accurate measurements of the temperature of quantum systems is a challenging task. The mathematical peculiarities of quantum information make it virtually impossible to measure with infinite precision. In the present paper, we introduce a generalized thermal state, which is conditioned on the pointer states of the available measurement apparatus. We show that this conditional thermal state outperforms the Gibbs state in quantum thermometry. The origin of the enhanced precision can be traced to its asymmetry, quantified by the Wigner-Yanase-Dyson skew information. This additional resource is further clarified in a fully resource-theoretic analysis, and we show that there is a Gibbs-preserving map that converts a target state into the conditional thermal state. We relate the quantum J-divergence between the conditional thermal state and the same target state to quantum heat.
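The Gibbs-state benchmark can be made concrete. For a Gibbs state $\rho = e^{-H/T}/Z$ with $k_B = 1$, the quantum Fisher information with respect to temperature is $F_Q(T) = \mathrm{Var}(H)/T^4$. An energy measurement saturates this value. Through the quantum Cramér-Rao bound $\delta T \ge 1/\sqrt{\nu F_Q}$ for $\nu$ repetitions, it sets the best achievable precision. The Python sketch below evaluates this quantity for a spectrum given as a list of energies. It covers only the standard Gibbs state, not the conditional thermal state introduced in the paper.

```python
import numpy as np

def gibbs_temperature_qfi(energies, T):
    """Quantum Fisher information F_Q(T) = Var(H) / T^4 of a Gibbs state (k_B = 1)."""
    E = np.asarray(energies, dtype=float)
    w = np.exp(-(E - E.min()) / T)           # shifted Boltzmann weights (avoids overflow)
    p = w / w.sum()                          # thermal populations
    mean = p @ E
    var = p @ (E - mean) ** 2                # energy variance in the Gibbs state
    return var / T**4

# Two-level system with gap 1: Var(H) = p_exc * (1 - p_exc)
for T in (0.2, 0.5, 1.0, 2.0):
    print(T, gibbs_temperature_qfi([0.0, 1.0], T))
```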

Translated: 2024-08-05 18:53:04 Published: 2024-08-02

# Critical sensing with a single bosonic mode without boson-boson interactions ( http://arxiv.org/abs/2305.17656v3 )

License: see link

Ken Chen, Jia-Hao Lü, Xin Zhu, Hao-Long Zhang, Wen Ning, Zhen-Biao Yang, Shi-Biao Zheng,

Critical phenomena of quantum systems are useful for enhancing quantum sensing. However, experimental realizations of criticality enhancement have been confined to very few systems. This is because of stringent requirements, including the thermodynamic or scaling limit and fine control of interacting quantum subsystems or particles. We here propose a simple critical quantum sensing scheme that requires neither of these conditions. The critical system is realized with a single parametrically driven bosonic mode involving many non-interacting bosons. We calculate the quantum Fisher information and perform a simulation that confirms the criticality-enabled enhancement. We further detail the response of one of the quadratures to variations of the control parameter. The numerical results reveal that its inverted variance diverges at the critical point. Based on presently available control techniques for parametric driving, we expect that our scheme can be realized in different systems, e.g., ion traps and superconducting circuits.

Translated: 2024-08-05 18:53:04 Published: 2024-08-02

# Insights from the Design Space Exploration of Flow-Guided Nanoscale Localization ( http://arxiv.org/abs/2305.18493v3 )

License: see link

Filip Lemic, Gerard Calvo Bartra, Arnau Brosa López, Jorge Torres Gómez, Jakob Struye, Falko Dressler, Sergi Abadal, Xavier Costa Perez,

Nanodevices with Terahertz (THz)-based wireless communication capabilities are providing a primer for flow-guided localization within the human bloodstream. Such localization allows the locations of sensed events to be assigned to the events themselves. This provides benefits such as early and precise diagnostics, as well as reduced costs and invasiveness. Flow-guided localization is still in a rudimentary phase, with only a handful of works targeting the problem. Nonetheless, the performance assessments of the proposed solutions are already carried out in a non-standardized way. They usually rely on a single performance metric and ignore various aspects that are relevant at such a scale (e.g., nanodevices' limited energy) and in such a challenging environment (e.g., extreme attenuation of in-body THz propagation). As such, these assessments feature low levels of realism and cannot be compared objectively. To address this issue, we account for the environmental and scale-related peculiarities of the scenario. We then assess the performance of two state-of-the-art flow-guided localization approaches along a set of heterogeneous performance metrics, such as the accuracy and reliability of localization.

Translated: 2024-08-05 18:53:04 Published: 2024-08-02

# Exponential speedups for quantum walks in random hierarchical graphs ( http://arxiv.org/abs/2307.15062v2 )

License: see link

Shankar Balasubramanian, Tongyang Li, Aram Harrow,

There are few known exponential speedups for quantum algorithms, and these tend to fall into even fewer families. One speedup that has mostly resisted generalization is the use of quantum walks to traverse the welded-tree graph, due to Childs, Cleve, Deotto, Farhi, Gutmann, and Spielman. We show how to generalize this to a large class of hierarchical graphs in which the vertices are grouped into "supervertices" that are arranged according to a $d$-dimensional lattice. Supervertices can have different sizes, and edges between supervertices correspond to random connections between their constituent vertices. The hitting times of quantum walks on these graphs are related to the localization properties of zero modes in certain disordered tight-binding Hamiltonians. The speedups range from superpolynomial to exponential, depending on the underlying dimension and the random graph model. We also provide concrete realizations of these hierarchical graphs, and introduce a general method for constructing graphs with efficient quantum traversal times using graph sparsification.
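The original welded-tree construction that the paper generalizes can be simulated classically for small depths. The Python sketch below assumes NumPy. It builds two binary trees of depth $n$ and joins their leaves by a random cycle that alternates between the two trees. It then evolves a continuous-time quantum walk $|\psi(t)\rangle = e^{-iAt}|\text{entrance}\rangle$, using the adjacency matrix $A$ as the Hamiltonian. The sketch illustrates only this welded-tree special case, not the hierarchical supervertex graphs of the paper. The function names are hypothetical.

```python
import numpy as np

def welded_tree_adjacency(n, rng):
    """Adjacency matrix of a random welded-tree graph of depth n >= 1.

    Two complete binary trees of depth n are stored in heap order (children of
    node i are 2i+1 and 2i+2). Their leaves are joined by a random cycle that
    alternates between left and right leaves, so every leaf has degree 3.
    Returns (A, entrance, exit), where the two roots are the entrance and exit.
    """
    m = 2 ** (n + 1) - 1                     # nodes per tree
    A = np.zeros((2 * m, 2 * m))
    for offset in (0, m):                    # left tree, then right tree
        for i in range(2 ** n - 1):          # internal nodes
            for c in (2 * i + 1, 2 * i + 2):
                A[offset + i, offset + c] = A[offset + c, offset + i] = 1.0
    leaves = np.arange(2 ** n - 1, m)
    left = rng.permutation(leaves)
    right = rng.permutation(leaves) + m
    k = len(leaves)
    for j in range(k):                       # cycle L0-R0-L1-R1-...-L0
        for a, b in ((left[j], right[j]), (right[j], left[(j + 1) % k])):
            A[a, b] = A[b, a] = 1.0
    return A, 0, m

def exit_probability(A, entrance, exit_node, times):
    """|<exit| exp(-iAt) |entrance>|^2 for each t, using A as the Hamiltonian."""
    w, V = np.linalg.eigh(A)
    coeffs = V[entrance, :]                  # components of |entrance> in the eigenbasis
    return np.array([abs(V[exit_node, :] @ (np.exp(-1j * w * t) * coeffs)) ** 2
                     for t in times])

rng = np.random.default_rng(0)
A, entrance, exit_node = welded_tree_adjacency(6, rng)
p = exit_probability(A, entrance, exit_node, np.linspace(0.0, 30.0, 301))
print(f"max exit probability over t in [0, 30]: {p.max():.3f}")
```

Exact diagonalization is used here only because the graph is small. The point of the quantum algorithm is that a quantum computer runs this walk without ever writing $A$ down.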

Translated: 2024-08-05 18:53:04 Published: 2024-08-02

# SemiSFL: Split Federated Learning on Unlabeled and Non-IID Data ( http://arxiv.org/abs/2307.15870v5 )

License: see link

Yang Xu, Yunming Liao, Hongli Xu, Zhipeng Sun, Liusheng Huang, Chunming Qiao,

Federated Learning (FL) has emerged to allow multiple clients to collaboratively train machine learning models on their private data at the network edge. However, training and deploying large-scale models on resource-constrained devices is challenging. Fortunately, Split Federated Learning (SFL) offers a feasible solution by alleviating the computation and/or communication burden on clients. However, existing SFL works often assume sufficient labeled data on clients, which is usually impractical. Besides, data non-IIDness poses another challenge to efficient model training. To the best of our knowledge, the above two issues have not been simultaneously addressed in SFL. Herein, we propose a novel Semi-supervised SFL system, termed SemiSFL, which incorporates clustering regularization to perform SFL with unlabeled and non-IID client data. Moreover, our theoretical and experimental investigations into model convergence reveal that inconsistent training processes on labeled and unlabeled data affect the effectiveness of clustering regularization. To mitigate this training inconsistency, we develop an algorithm that dynamically adjusts the global updating frequency to improve training performance. Extensive experiments on benchmark models and datasets show that, compared to the state-of-the-art baselines, our system provides a 3.8x speed-up in training time and reduces the communication cost by about 70.3% while reaching the target accuracy. It also achieves up to 5.8% improvement in accuracy under non-IID scenarios.
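The following NumPy sketch shows two generic ingredients of split federated learning. First, each client runs only the layers up to a cut layer and sends the resulting activations to the server. Second, the client-side weights are periodically averaged across clients, FedAvg-style, weighted by sample count. The layer shapes and function names are hypothetical. The sketch omits SemiSFL's clustering regularization and its adaptive global updating frequency.

```python
import numpy as np

def client_forward(x, w_client):
    # Client-side layers up to the cut layer. The output ("smashed data")
    # is what the client sends to the server instead of its raw data.
    return np.maximum(x @ w_client, 0.0)

def server_forward(h, w_server):
    # Server-side layers after the cut layer, here a single linear head.
    return h @ w_server

def fedavg(client_params, num_samples):
    """Sample-weighted average of client-side parameter dicts."""
    total = float(sum(num_samples))
    return {
        name: sum(p[name] * (n / total) for p, n in zip(client_params, num_samples))
        for name in client_params[0]
    }

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                  # one mini-batch on one client
w_client = rng.normal(size=(8, 16))
w_server = rng.normal(size=(16, 3))
logits = server_forward(client_forward(x, w_client), w_server)
print(logits.shape)  # (4, 3)

avg = fedavg([{"w": np.ones(2)}, {"w": 3 * np.ones(2)}], num_samples=[1, 3])
print(avg["w"])  # [2.5 2.5]
```

In this split, the client holds only `w_client`, and the server holds the larger `w_server`. This division is what relieves resource-constrained clients.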

Translated: 2024-08-05 18:53:04 Published: 2024-08-02

# Arithmetic with Language Models: from Memorization to Computation ( http://arxiv.org/abs/2308.01154v4 )

License: see link

Davide Maltoni, Matteo Ferrara,

A better understanding of the emergent computation and problem-solving capabilities of recent large language models is of paramount importance to further improve them and broaden their applicability. This work investigates how a language model, trained to predict the next token, can perform arithmetic computations generalizing beyond training data. Binary addition and multiplication constitute a good testbed for this purpose, since they require a very small vocabulary and exhibit relevant input/output discontinuities making smooth input interpolation ineffective for novel data. We successfully trained a light language model to learn these tasks and ran a number of experiments to investigate the extrapolation capabilities and internal information processing. Our findings support the hypothesis that the language model works as an Encoding-Regression-Decoding machine where the computation takes place in the value space once the input token representation is mapped to an appropriate internal representation.
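The binary addition and multiplication tasks can be turned into next-token-prediction data with a five-symbol vocabulary. The Python sketch below assumes one possible format: plain binary digits, most significant bit first, with `+`, `*` and `=` as separators. The paper's exact input/output encoding may differ, for example in padding or digit order. `VOCAB` and the helper names are hypothetical.

```python
import random

VOCAB = ["0", "1", "+", "*", "="]          # hypothetical minimal vocabulary
TOKEN_ID = {tok: i for i, tok in enumerate(VOCAB)}

def make_example(a, b, op):
    """Format one problem as a string, e.g. 5 + 3 -> '101+11=1000'."""
    result = a + b if op == "+" else a * b
    return f"{a:b}{op}{b:b}={result:b}"

def make_dataset(n, max_bits, op, seed=0):
    """n random problems with operands of at most max_bits bits."""
    rng = random.Random(seed)
    hi = 2 ** max_bits - 1
    return [make_example(rng.randint(0, hi), rng.randint(0, hi), op) for _ in range(n)]

def encode(text):
    """Map a problem string to token ids for a next-token predictor."""
    return [TOKEN_ID[ch] for ch in text]

print(make_example(5, 3, "+"))   # 101+11=1000
print(make_example(5, 3, "*"))   # 101*11=1111
print(encode("101+11=1000"))     # [1, 0, 1, 2, 1, 1, 4, 1, 0, 0, 0]
```

Flipping one low input bit can change many output bits, for example through carry propagation. This is the kind of input/output discontinuity the abstract says makes smooth interpolation ineffective.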

Translated: 2024-08-05 18:53:04 Published: 2024-08-02

# Semi-Supervised Dual-Stream Self-Attentive Adversarial Graph Contrastive Learning for Cross-Subject EEG-based Emotion Recognition ( http://arxiv.org/abs/2308.11635v2 )

License: see link

Weishan Ye, Zhiguo Zhang, Fei Teng, Min Zhang, Jianhong Wang, Dong Ni, Fali Li, Peng Xu, Zhen Liang,

Electroencephalography (EEG) is an objective tool for emotion recognition with promising applications. However, the scarcity of labeled data remains a major challenge in this field, limiting the widespread use of EEG-based emotion recognition. In this paper, a semi-supervised Dual-stream Self-Attentive Adversarial Graph Contrastive learning framework (termed DS-AGC) is proposed to tackle the challenge of limited labeled data in cross-subject EEG-based emotion recognition. The DS-AGC framework includes two parallel streams for extracting non-structural and structural EEG features. The non-structural stream incorporates a semi-supervised multi-domain adaptation method to alleviate the distribution discrepancy among the labeled source domain, the unlabeled source domain, and the unknown target domain. The structural stream develops a graph contrastive learning method to extract effective graph-based feature representations from multiple EEG channels in a semi-supervised manner. Further, a self-attentive fusion module is developed for feature fusion, sample selection, and emotion recognition. It highlights the EEG features that are more relevant to emotions, as well as the data samples in the labeled source domain that are closer to the target domain. Extensive experiments were conducted on two benchmark databases (SEED and SEED-IV) using a semi-supervised cross-subject leave-one-subject-out cross-validation evaluation scheme. They show that the proposed model outperforms existing methods under different incomplete label conditions, with an average improvement of 5.83% on SEED and 6.99% on SEED-IV. This demonstrates its effectiveness in addressing the label scarcity problem in cross-subject EEG-based emotion recognition.
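Graph contrastive learning methods commonly train the encoder with an InfoNCE-style objective. This objective pulls together two embeddings (views) of the same sample and pushes apart embeddings of different samples. The NumPy sketch below shows that generic loss on precomputed embeddings. The temperature value is arbitrary, graph augmentation and the encoder are omitted, and DS-AGC's exact contrastive loss may differ.

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    """InfoNCE loss: row i of z1 and row i of z2 are two views of the same sample."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / tau                       # cosine similarities; positives on the diagonal
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

z = np.eye(4)                                 # toy embeddings for 4 samples
print(info_nce(z, z) < info_nce(z, z[::-1]))  # True: matched views give the lower loss
```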

Translated: 2024-08-05 18:53:04 Published: 2024-08-02

# Rule-Based Error Detection and Correction to Operationalize Movement Trajectory Classification ( http://arxiv.org/abs/2308.14250v3 )

License: see link

Bowen Xi, Kevin Scaria, Divyagna Bavikadi, Paulo Shakarian,

Classification of movement trajectories has many applications in transportation. It is a key component of large-scale movement trajectory generation and anomaly detection, which have key safety applications in the aftermath of a disaster or other external shock. However, current state-of-the-art (SOTA) methods are based on supervised deep learning. This leads to challenges when the distribution of trajectories changes due to such a shock. We provide a neuro-symbolic rule-based framework for error detection and correction in these models, which we integrate into our movement trajectory platform. We present a suite of experiments on several recent SOTA models, together with a suite of theoretical properties that informed algorithm development. The experiments show highly accurate error detection, the ability to improve accuracy under a changing test distribution, and accuracy improvement for the base use case. Specifically, we show F1 scores of up to 0.984 for predicting errors and a significant performance increase in out-of-distribution accuracy (an 8.51% improvement over SOTA for zero-shot accuracy). We also show accuracy improvement over the SOTA model.
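The Python sketch below is a toy illustration of rule-based error detection and of the F1 metric used to score it. A predicted trajectory class is flagged as a likely error when it violates a physical constraint. The speed limits and class names are hypothetical and are not the rules used in the paper.

```python
# Hypothetical per-class speed limits in m/s (illustration only)
MAX_SPEED_MPS = {"walk": 3.0, "bike": 12.0}

def flag_error(pred_class, max_speed_mps):
    """Rule: flag the prediction if the observed top speed is implausible for the class."""
    limit = MAX_SPEED_MPS.get(pred_class)
    return limit is not None and max_speed_mps > limit

def f1_score(flagged, is_error):
    """F1 of the error detector, with 'is an error' as the positive class."""
    pairs = list(zip(flagged, is_error))
    tp = sum(1 for f, e in pairs if f and e)
    fp = sum(1 for f, e in pairs if f and not e)
    fn = sum(1 for f, e in pairs if e and not f)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# (predicted class, observed top speed, was the base model actually wrong?)
samples = [("walk", 5.0, True), ("walk", 1.0, False), ("bike", 20.0, False), ("car", 30.0, True)]
flagged = [flag_error(c, v) for c, v, _ in samples]
print(flagged)                                             # [True, False, True, False]
print(f1_score(flagged, [err for _, _, err in samples]))   # 0.5
```

In this toy data, the rule catches one real error, raises one false alarm, and misses the unlisted "car" class. Precision and recall are therefore both 0.5.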

Translated: 2024-08-05 18:53:04 Published: 2024-08-02

# Black Hole from Entropy Maximization ( http://arxiv.org/abs/2309.00602v4 )

License: see link

Yuki Yokokura,

One quantum characterization of a black hole, motivated by (local) holography and thermodynamics, is that it maximizes thermodynamic entropy for a given surface area. In the context of quantum gravity, this could be more fundamental than the classical characterization by a horizon. As a step, we explore this possibility by solving the 4D semi-classical Einstein equation with many matter fields, and we find a picture of a black hole. For spherical, static, highly excited configurations, we apply local typicality and estimate the entropy, including self-gravity, to derive its upper bound. The saturation condition uniquely determines the entropy-maximized configuration. Self-gravitating quanta condense into a radially uniform dense configuration with no horizon. In this configuration, the self-gravity and a large quantum pressure induced by the curvatures are balanced, and no singularity appears. The interior metric is a self-consistent and non-perturbative solution for Planck's constant. The maximum entropy, given by the volume integral of the entropy density, agrees with the Bekenstein-Hawking formula through self-gravity, deriving the Bousso bound for thermodynamic entropy. Finally, 10 future prospects are discussed, leading to the speculative view that the configuration semi-classically represents a quantum-gravitational condensate with holographic bulk dynamics.
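For reference, the Bekenstein-Hawking entropy of a horizon of area $A$ is the standard expression below, with $\ell_P^2 = G\hbar/c^3$. For a Schwarzschild black hole of mass $M$, $A = 4\pi r_s^2$ with $r_s = 2GM/c^2$. This is textbook background, not the paper's derivation. The paper's claim is that its horizonless configuration reproduces this value.

```latex
S_{\mathrm{BH}} = \frac{k_B A}{4\ell_P^2} = \frac{k_B c^3 A}{4 G\hbar},
\qquad
A = 4\pi r_s^2 = \frac{16\pi G^2 M^2}{c^4}
\;\Longrightarrow\;
S_{\mathrm{BH}} = \frac{4\pi k_B G M^2}{\hbar c}.
```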

Translated: 2024-08-05 18:53:04 Published: 2024-08-02

# Multimodal Guidance Network for Missing-Modality Inference in Content Moderation ( http://arxiv.org/abs/2309.03452v2 )

License: see link

Zhuokai Zhao, Harish Palani, Tianyi Liu, Lena Evans, Ruth Toner,

Multimodal deep learning, especially vision-language modeling, has gained significant traction in recent years, greatly improving performance on many downstream tasks, including content moderation and violence detection. However, standard multimodal approaches often assume consistent modalities between training and inference. Because some modalities may not be available during inference, this limits their application in many real-world use cases. Existing research mitigates this problem by reconstructing the missing modalities, but this unavoidably increases computational cost. That cost can be just as critical, especially for large infrastructures deployed in industry. To this end, we propose a novel guidance network that promotes knowledge sharing during training and takes advantage of the multimodal representations to train better single-modality models for use at inference. Real-world experiments in violence detection show that our proposed framework trains single-modality models that significantly outperform traditionally trained counterparts, while avoiding increases in computational cost for inference.

Translated: 2024-08-05 18:53:04 Published: 2024-08-02

# A Family of Pretrained Transformer Language Models for Russian ( http://arxiv.org/abs/2309.10931v4 )

License: see link

Dmitry Zmitrovich, Alexander Abramov, Andrey Kalmykov, Maria Tikhonova, Ekaterina Taktasheva, Danil Astafurov, Mark Baushenko, Artem Snegirev, Vitalii Kadulin, Sergey Markov, Tatiana Shavrina, Vladislav Mikhailov, Alena Fenogenova,

Transformer language models (LMs) are fundamental to NLP research methodologies and applications in various languages. However, developing such models specifically for the Russian language has received little attention. This paper introduces a collection of 13 Russian Transformer LMs, which spans encoder (ruBERT, ruRoBERTa, ruELECTRA), decoder (ruGPT-3), and encoder-decoder (ruT5, FRED-T5) architectures. We provide a report on the model architecture design and pretraining, and the results of evaluating their generalization abilities on Russian language understanding and generation datasets and benchmarks. By pretraining and releasing these specialized Transformer LMs, we aim to broaden the scope of the NLP research directions and enable the development of industrial solutions for the Russian language.

Translated: 2024-08-05 18:53:04 Published: 2024-08-02

# Flux-pulse-assisted Readout of a Fluxonium Qubit ( http://arxiv.org/abs/2309.17286v2 )

License: see link

Taryn V. Stefanski, Christian Kraglund Andersen,

Much attention has focused on the transmon architecture for large-scale superconducting quantum devices; however, the fluxonium qubit has emerged as a possible successor. With a shunting inductor in parallel to a Josephson junction, the fluxonium offers larger anharmonicity and stronger protection against dielectric loss, leading to higher coherence times than conventional transmon qubits. The interplay between the inductive and Josephson energy potentials of the fluxonium qubit leads to a rich dispersive-shift landscape when tuning the external flux. Here we propose to exploit the features in the dispersive shift to improve qubit readout. Specifically, we report on theoretical simulations showing improved readout times and error rates when the readout is performed at a flux bias point with a large dispersive shift. We expand the scheme to include different error channels. We show that, with an integration time of 155 ns, flux-pulse-assisted readout offers about a 5-fold improvement in the signal-to-noise ratio. Moreover, we show that the performance improvement persists in the presence of finite measurement efficiency combined with quasi-static flux noise. It also persists when the increased Purcell rate at the flux-pulse-assisted readout point is taken into account. We suggest a set of reasonable energy parameters for the fluxonium architecture that will allow for the implementation of our proposed flux-pulse-assisted readout scheme.
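The flux dependence that this readout scheme exploits starts from the standard fluxonium Hamiltonian $H = 4E_C n^2 + \tfrac{1}{2}E_L\varphi^2 - E_J\cos(\varphi - \varphi_{\mathrm{ext}})$. The NumPy sketch below diagonalizes this Hamiltonian on a finite-difference phase grid to obtain the qubit frequency as a function of external flux. The energy values (in GHz) are illustrative assumptions, not the parameters proposed in the paper. The readout resonator is not modeled, so the sketch does not compute the dispersive shift itself.

```python
import numpy as np

def fluxonium_levels(EJ, EC, EL, phi_ext, n_levels=3, n_grid=1001, phi_max=5 * np.pi):
    """Lowest eigenenergies of H = 4 EC n^2 + EL phi^2 / 2 - EJ cos(phi - phi_ext).

    Energies are in the same units as EJ, EC and EL (here GHz). The phase is
    discretized on [-phi_max, phi_max], and n = -i d/dphi is replaced by a
    three-point finite-difference second derivative.
    """
    phi = np.linspace(-phi_max, phi_max, n_grid)
    d = phi[1] - phi[0]
    potential = 0.5 * EL * phi**2 - EJ * np.cos(phi - phi_ext)
    t = 4.0 * EC / d**2                      # kinetic coupling between neighboring grid points
    H = (np.diag(potential + 2.0 * t)
         - t * np.eye(n_grid, k=1)
         - t * np.eye(n_grid, k=-1))
    return np.linalg.eigvalsh(H)[:n_levels]  # eigenvalues in ascending order

EJ, EC, EL = 4.0, 1.0, 0.9                   # hypothetical parameters in GHz
for phi_ext in (0.0, np.pi / 2, np.pi):      # phi_ext = pi is half a flux quantum
    E = fluxonium_levels(EJ, EC, EL, phi_ext)
    print(f"phi_ext = {phi_ext:.2f}: f01 = {E[1] - E[0]:.3f} GHz")
```

The same level structure, combined with a resonator coupling, sets the dispersive shift at each flux bias. This is why a fast flux pulse can move the qubit to a point that is more favorable for readout.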

Translated: 2024-08-05 18:43:16 Published: 2024-08-02

# DragD3D: Realistic Mesh Editing with Rigidity Control Driven by 2D Diffusion Priors ( http://arxiv.org/abs/2310.04561v2 )

License: see link

Tianhao Xie, Eugene Belilovsky, Sudhir Mudur, Tiberiu Popa,

Direct mesh editing and deformation are key components in the geometric modeling and animation pipeline. Mesh editing methods are typically framed as optimization problems that combine user-specified vertex constraints with a regularizer that determines the positions of the remaining vertices. The choice of the regularizer is key to the realism and authenticity of the final result. Physics- and geometry-based regularizers are not aware of the global context and semantics of the object, and the more recent deep learning priors are limited to a specific class of 3D object deformations. Our main contribution is a vertex-based mesh editing method called DragD3D. It is based on (1) a novel optimization formulation that decouples the rotation and stretch components of the deformation and combines a 3D geometric regularizer with (2) the recently introduced DDS loss. The DDS loss scores the faithfulness of the rendered 2D image to one from a diffusion model. Thus, our deformation method achieves globally realistic shape deformation that is not restricted to any class of objects. Our new formulation directly optimizes the transformation of the neural Jacobian field, explicitly separating the rotational and stretching components. The objective function of the optimization combines the approximate gradients of DDS with the gradients from the geometric loss to satisfy the vertex constraints. The method gives the user additional control over the desired global shape deformation by allowing explicit per-triangle deformation control and explicit separation of the rotational and stretching components of the deformation. We show that our deformations can be controlled to yield realistic shape deformations that are aware of the global context of the objects, and that they provide better results than geometric regularizers alone.
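A deformation is classically separated into rotation and stretch with the polar decomposition $J = RS$, where $R$ is a rotation and $S$ is symmetric. The NumPy sketch below computes this decomposition for a single 3x3 Jacobian via the SVD. It illustrates the decoupling idea only and is not claimed to be DragD3D's exact parameterization of the neural Jacobian field. The function name is hypothetical.

```python
import numpy as np

def polar_decompose(J):
    """Split a square Jacobian J into a rotation R and a symmetric stretch S, with J = R @ S."""
    U, s, Vt = np.linalg.svd(J)
    if np.linalg.det(U @ Vt) < 0:
        # Flip the direction of the smallest singular value so that R is a proper
        # rotation (det +1) rather than a reflection. S then has one negative
        # eigenvalue, which only happens for inverted (det J < 0) elements.
        U[:, -1] *= -1
        s[-1] *= -1
    R = U @ Vt
    S = Vt.T @ np.diag(s) @ Vt
    return R, S

rng = np.random.default_rng(0)
J = rng.normal(size=(3, 3))                  # stand-in for one per-triangle Jacobian
R, S = polar_decompose(J)
print(np.allclose(R @ S, J), np.allclose(R.T @ R, np.eye(3)), np.linalg.det(R) > 0)  # True True True
```

With the two factors separated, a regularizer can penalize $S$ (stretch) and $R$ (rotation) with different weights. This is one way to expose a rigidity control of the kind the abstract describes.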

Translated: 2024-08-05 18:43:16 Published: 2024-08-02

# Detector-based measurements for QFT: two issues and an AQFT proposal ( http://arxiv.org/abs/2310.06596v2 )

License: see link

Nicola Pranzini, Esko Keski-Vakkuri,

We present and investigate two issues within the measurement scheme for QFT presented by J. Polo-Gómez, L. J. Garay and E. Martín-Martínez in "A detector-based measurement theory for quantum field theory". We point out some discrepancies that arise when the measurement scheme is applied to contextual field states and show that $n$-point function assignments based on local processing regions sometimes lead to inconsistencies. To solve these issues, we modify the measurement scheme to use non-relativistic detectors to induce an update rule for algebraic states in the Haag-Kastler formulation of quantum field theory. In this way, $n$-point functions can be consistently evaluated across any region having a definite causal relation with measurements.

Translated: 2024-08-05 18:43:16 Published: 2024-08-02

# A Hyperparameter Study for Quantum Kernel Methods ( http://arxiv.org/abs/2310.11891v3 )

License: see link

Sebastian Egginger, Alona Sakhnenko, Jeanette Miriam Lorenz,

Quantum kernel methods are a promising method in quantum machine learning thanks to the guarantees connected to them. Their accessibility to analytic study also opens up the possibility of prescreening datasets based on their potential for a quantum advantage. To do so, earlier works developed the geometric difference. It can be understood as a closeness measure between two kernel-based machine learning approaches, most importantly between a quantum kernel and a classical kernel. This metric links the quantum and classical model complexities, and it was developed to bound the generalization error. This raises the question of how the metric behaves in an empirical setting. In this work, we investigate the effects of hyperparameter choice on the model performance and on the generalization gap between classical and quantum kernels. The importance of hyperparameters is also well known in classical machine learning. Of special interest are the hyperparameters associated with the quantum Hamiltonian evolution feature map, as well as the number of qubits to trace out before computing a projected quantum kernel. We conduct a thorough investigation of the hyperparameters across 11 datasets and identify certain aspects that can be exploited. We analyze how certain hyperparameter settings affect the empirical performance, measured by cross-validation accuracy, and the generalization ability, measured by the geometric difference described above. This analysis brings us one step closer to understanding the potential of quantum kernel methods on classical datasets.

Translated: 2024-08-05 18:43:16 Published: 2024-08-02

# RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models ( http://arxiv.org/abs/2310.16340v3 )

License: see link

Zefan Wang, Zichuan Liu, Yingying Zhang, Aoxiao Zhong, Jihong Wang, Fengbin Yin, Lunting Fan, Lingfei Wu, Qingsong Wen,

Large language model (LLM) applications in cloud root cause analysis (RCA) have been actively explored recently. However, current methods are still reliant on manual workflow settings and do not unleash LLMs' decision-making and environment interaction capabilities. We present RCAgent, a tool-augmented LLM autonomous agent framework for practical and privacy-aware industrial RCA usage. Running on an internally deployed model rather than GPT families, RCAgent is capable of free-form data collection and comprehensive analysis with tools. Our framework combines a variety of enhancements, including a unique Self-Consistency for action trajectories, and a suite of methods for context management, stabilization, and importing domain knowledge. Our experiments show RCAgent's evident and consistent superiority over ReAct across all aspects of RCA -- predicting root causes, solutions, evidence, and responsibilities -- and tasks covered or uncovered by current rules, as validated by both automated metrics and human evaluations. Furthermore, RCAgent has already been integrated into the diagnosis and issue discovery workflow of the Real-time Compute Platform for Apache Flink of Alibaba Cloud.

Translated: 2024-08-05 18:43:16 Published: 2024-08-02

# Semidefinite programming bounds on the size of entanglement-assisted codeword stabilized quantum codes

Semidefinite programming bounds on the size of entanglement-assisted codeword stabilized quantum codes ( http://arxiv.org/abs/2311.07111v2 )

License: see link

Ching-Yi Lai, Pin-Chieh Tseng, Wei-Hsuan Yu,

In this paper, we explore the application of semidefinite programming to the realm of quantum codes, specifically focusing on codeword stabilized (CWS) codes with entanglement assistance. Notably, we utilize the isotropic subgroup of the CWS group and the set of word operators of a CWS-type quantum code to derive an upper bound on the minimum distance. Furthermore, this characterization can be incorporated into the associated distance enumerators, enabling us to construct semidefinite constraints that lead to SDP bounds on the minimum distance or size of CWS-type quantum codes. We illustrate several instances where SDP bounds outperform LP bounds, and there are even cases where LP fails to yield meaningful results, while SDP consistently provides tighter and relevant bounds. Finally, we also provide interpretations of the Shor-Laflamme weight enumerators and shadow enumerators for codeword stabilized codes, enhancing our understanding of quantum codes.

Translated: 2024-08-05 18:43:16 Published: 2024-08-02

# Incremental Object-Based Novelty Detection with Feedback Loop

Incremental Object-Based Novelty Detection with Feedback Loop ( http://arxiv.org/abs/2311.09004v2 )

License: see link

Simone Caldarella, Elisa Ricci, Rahaf Aljundi,

Object-based Novelty Detection (ND) aims to identify unknown objects that do not belong to the classes an object detection model saw during training. The task is particularly crucial in real-world applications because it helps avoid potentially harmful behaviours, e.g., in object detection models deployed in a self-driving car or an autonomous robot. Traditional approaches to ND focus on one-time offline post-processing of the pretrained object detector's output. This leaves no possibility to improve model robustness after training, and it discards the abundant out-of-distribution data encountered during deployment. In this work, we propose a novel framework for object-based ND. It assumes that human feedback can be requested on the predicted output and later incorporated to refine the ND model without negatively affecting the main object detection performance. This refinement operation is repeated whenever new feedback is available. To tackle this new formulation of the problem for object detection, we propose a lightweight ND module attached on top of a pre-trained object detection model, which is incrementally updated through a feedback loop. We also propose a new benchmark to evaluate methods in this new setting, and we extensively test our ND approach against baselines, showing increased robustness and successful incorporation of the received feedback.
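
The abstract does not detail the lightweight ND module, so the following is only a hypothetical stand-in that shows the shape of a feedback loop. The head sits on top of frozen detector embeddings. It flags a detection as novel when the embedding is farther than a threshold from every known-class prototype, and it folds human-confirmed labels into running-mean prototypes. Nothing in the head touches the detector itself, in the spirit of the requirement that feedback must not degrade the main detection performance. The class name, threshold, and 2-D embeddings are made up for illustration.

```python
import math
from typing import Dict, List, Sequence


class PrototypeNoveltyHead:
    """Hypothetical novelty head on top of a frozen detector's embeddings."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.prototypes: Dict[str, List[float]] = {}  # class name -> mean embedding
        self.counts: Dict[str, int] = {}

    def score(self, emb: Sequence[float]) -> float:
        """Distance to the closest known-class prototype (inf if none yet)."""
        if not self.prototypes:
            return math.inf
        return min(math.dist(emb, p) for p in self.prototypes.values())

    def is_novel(self, emb: Sequence[float]) -> bool:
        return self.score(emb) > self.threshold

    def add_feedback(self, emb: Sequence[float], label: str) -> None:
        """Fold one human-confirmed (embedding, class) pair into a running-mean prototype."""
        if label not in self.prototypes:
            self.prototypes[label] = list(emb)
            self.counts[label] = 1
            return
        self.counts[label] += 1
        n = self.counts[label]
        self.prototypes[label] = [p + (e - p) / n for p, e in zip(self.prototypes[label], emb)]


head = PrototypeNoveltyHead(threshold=1.0)
print(head.is_novel([1.0, 0.5]))  # True: nothing is known yet
head.add_feedback([0.0, 0.0], "car")
head.add_feedback([2.0, 0.0], "car")  # prototype for "car" becomes [1.0, 0.0]
print(head.is_novel([1.0, 0.5]), head.is_novel([5.0, 5.0]))  # False True
```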

Translated: 2024-08-05 18:43:16 Published: 2024-08-02

# Applications of Spiking Neural Networks in Visual Place Recognition

Applications of Spiking Neural Networks in Visual Place Recognition ( http://arxiv.org/abs/2311.13186v2 )

License: see link

Somayeh Hussaini, Michael Milford, Tobias Fischer,

In robotics, Spiking Neural Networks (SNNs) are increasingly recognized for their largely unrealized potential for energy efficiency and low latency, particularly when implemented on neuromorphic hardware. Our paper highlights three advancements for SNNs in Visual Place Recognition (VPR). First, we propose Modular SNNs, where each SNN represents a set of non-overlapping, geographically distinct places, enabling scalable networks for large environments. Second, we present Ensembles of Modular SNNs, where multiple networks represent the same place, significantly enhancing accuracy compared to single-network models. Each of our Modular SNN modules is compact, comprising only 1500 neurons and 474k synapses, and this small size makes them ideally suited for ensembling. Lastly, we investigate the role of sequence matching in SNN-based VPR, a technique where consecutive images are used to refine place recognition. We analyze the responsiveness of SNNs to ensembling and sequence matching compared to other VPR techniques. Our contributions highlight the viability of SNNs for VPR, offering scalable and robust solutions and paving the way for their application in various energy-sensitive robotic tasks.
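
Sequence matching, as described above, aggregates single-image evidence over consecutive frames. The sketch below is a generic SeqSLAM-style matcher, not the paper's SNN pipeline. It makes two assumptions:

- a precomputed query-by-reference distance matrix `dist` is available (it could come from any place-recognition model, an SNN included);
- the query and reference traverses move at the same speed.

All names are hypothetical.

```python
def sequence_match(dist, query_idx, seq_len):
    """Return the reference index whose aligned sequence best matches the query.

    dist[q][r] is the single-image distance between query q and reference r.
    Scores sum distances along a slope-1 diagonal ending at (query_idx, r).
    """
    if query_idx < seq_len - 1:
        raise ValueError("not enough past queries for this sequence length")
    n_ref = len(dist[0])
    best_ref, best_score = None, float("inf")
    for r in range(seq_len - 1, n_ref):
        score = sum(dist[query_idx - k][r - k] for k in range(seq_len))
        if score < best_score:
            best_ref, best_score = r, score
    return best_ref


# Toy 5x5 distance matrix: query q truly matches reference q
n = 5
dist = [[0.0 if q == r else 1.0 for r in range(n)] for q in range(n)]
dist[4][4] = 1.0  # the true match looks bad in this single frame...
dist[4][1] = 0.0  # ...while a wrong place looks perfect

single_frame = min(range(n), key=lambda r: dist[4][r])
print(single_frame, sequence_match(dist, query_idx=4, seq_len=3))  # 1 4
```

The single-frame argmin is fooled by the corrupted frame, while the three-frame sequence still recovers the true place.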

Translated: 2024-08-05 18:43:16 Published: 2024-08-02

# Quantum Coding with Finite Thermodynamic Resources

Quantum Coding with Finite Thermodynamic Resources ( http://arxiv.org/abs/2311.14561v2 )

License: see link

Jake Xuereb, Tiago Debarba, Marcus Huber, Paul Erker,

Quantum direct coding or Schumacher compression generalised the ideas of Shannon theory, gave an operational meaning to the von Neumann entropy and established the term qubit. But remembering that information processing is carried out by physical processes prompts one to wonder what thermodynamic resources are required to compress quantum information and how they constrain one's ability to perform this task. That is, if Alice and Bob only have access to thermal quantum states and clocks with finite accuracy, how well can they measure, encode and decode pure quantum state messages? In this work we examine these questions by modelling Alice's typical measurement as a unitary involving a measurement probe, investigating imperfect timekeeping on encoding and decoding and considering the role of temperature in Bob's appended qubits. In doing so, we derive fidelity bounds for this protocol involving the correlations Alice can form with their measurement probe, the variance of the clock's ticks and the temperature of Bob's qubits. Finally, we give an insight into the entropy produced by these two agents throughout the compression protocol by relating the resources they use to a quantum thermodynamic cooling protocol.
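
Schumacher's theorem says that a memoryless source emitting qubit states with average density matrix ρ can be compressed to about S(ρ) qubits per emitted state, with vanishing error in the limit of long blocks. Here S is the von Neumann entropy. The helper below computes S(ρ) for a single-qubit ρ from its two eigenvalues. It models only the ideal, resource-unconstrained rate, not the finite-temperature or finite-clock effects studied in the paper. The example ensemble (|0⟩ or |+⟩ with equal probability) is an illustrative choice.

```python
import math


def qubit_entropy(a: float, b: complex) -> float:
    """von Neumann entropy, in bits, of the qubit density matrix [[a, b], [b*, 1 - a]]."""
    # Eigenvalues of a 2x2 unit-trace Hermitian matrix are (1 +/- r) / 2,
    # where r is the length of the Bloch vector.
    r = math.sqrt((2 * a - 1) ** 2 + 4 * abs(b) ** 2)
    entropy = 0.0
    for lam in ((1 + r) / 2, (1 - r) / 2):
        if lam > 1e-12:  # 0 * log 0 is taken as 0
            entropy -= lam * math.log2(lam)
    return entropy


# Average state of the ensemble {|0>, |+>} with probability 1/2 each:
# rho = 1/2 |0><0| + 1/2 |+><+| = [[3/4, 1/4], [1/4, 1/4]]
print(round(qubit_entropy(0.75, 0.25), 3))  # 0.601
```

So roughly 0.6 qubits per symbol suffice for this source, instead of 1. A maximally mixed qubit (`a = 0.5`, `b = 0`) gives exactly 1 bit and admits no compression.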

Translated: 2024-08-05 18:43:16 Published: 2024-08-02

# The Importance of Downstream Networks in Digital Pathology Foundation Models

The Importance of Downstream Networks in Digital Pathology Foundation Models ( http://arxiv.org/abs/2311.17804v3 )

License: see link

Gustav Bredell, Marcel Fischer, Przemyslaw Szostak, Samaneh Abbasi-Sureshjani, Alvaro Gomariz,

Digital pathology has significantly advanced disease detection and pathologist efficiency through the analysis of gigapixel whole-slide images (WSI). In this process, WSIs are first divided into patches, for which a feature extractor model is applied to obtain feature vectors, which are subsequently processed by an aggregation model to predict the respective WSI label. With the rapid evolution of representation learning, numerous new feature extractor models, often termed foundational models, have emerged. Traditional evaluation methods rely on a static downstream aggregation model setup, encompassing a fixed architecture and hyperparameters, a practice we identify as potentially biasing the results. Our study uncovers a sensitivity of feature extractor models towards aggregation model configurations, indicating that performance comparability can be skewed based on the chosen configurations. By accounting for this sensitivity, we find that the performance of many current feature extractor models is notably similar. We support this insight by evaluating seven feature extractor models across three different datasets with 162 different aggregation model configurations. This comprehensive approach provides a more nuanced understanding of the feature extractors' sensitivity to various aggregation model configurations, leading to a fairer and more accurate assessment of new foundation models in digital pathology.
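
The pipeline above runs from patch features through an aggregation model to a slide label, and the aggregation step is what the study varies. The sketch below implements attention pooling in plain Python as one hypothetical aggregation configuration:

1. each patch feature vector h_i gets a score w·h_i;
2. a softmax turns the scores into weights;
3. the slide embedding is the weighted sum.

In a real aggregator, w is learned, usually together with a small network around it and a classifier head. Here w is fixed by hand, and setting it to zero reduces the operation to mean pooling.

```python
import math
from typing import List, Sequence


def attention_pool(features: Sequence[Sequence[float]], w: Sequence[float]) -> List[float]:
    """Aggregate patch feature vectors into one slide-level vector by attention pooling."""
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in features]
    m = max(scores)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax: non-negative, sums to 1
    dim = len(features[0])
    return [sum(a * h[d] for a, h in zip(weights, features)) for d in range(dim)]


patches = [[1.0, 2.0], [3.0, 4.0]]  # toy 2-D features for two patches
print(attention_pool(patches, [0.0, 0.0]))  # [2.0, 3.0], i.e. mean pooling
print(attention_pool(patches, [1.0, 0.0]))  # weighted toward the second patch
```

Even in this toy, the slide embedding changes with the aggregation settings while the patch features stay fixed. This is the kind of sensitivity the study measures across 162 configurations.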

Translated: 2024-08-05 18:43:16 Published: 2024-08-02

# Data Management For Training Large Language Models: A Survey

Data Management For Training Large Language Models: A Survey ( http://arxiv.org/abs/2312.01700v3 )

License: see link

Zige Wang, Wanjun Zhong, Yufei Wang, Qi Zhu, Fei Mi, Baojun Wang, Lifeng Shang, Xin Jiang, Qun Liu,

Data plays a fundamental role in training Large Language Models (LLMs). Efficient data management, particularly in formulating a well-suited training dataset, is important for enhancing model performance and improving training efficiency during the pretraining and supervised fine-tuning stages. Despite the considerable importance of data management, the underlying mechanisms of current prominent practices are still unknown. Consequently, the exploration of data management has attracted increasing attention in the research community. This survey aims to provide a comprehensive overview of current research in data management within both the pretraining and supervised fine-tuning stages of LLMs, covering various aspects of data management strategy design. Looking to the future, we identify existing challenges and outline promising directions for development in this field. This survey thus serves as a guiding resource for practitioners aspiring to construct powerful LLMs through efficient data management practices. The collection of the latest papers is available at https://github.com/ZigeW/data_management_LLM.

Translated: 2024-08-05 18:43:16 Published: 2024-08-02

# SAM-guided Graph Cut for 3D Instance Segmentation

SAM-guided Graph Cut for 3D Instance Segmentation ( http://arxiv.org/abs/2312.08372v3 )

License: see link

Haoyu Guo, He Zhu, Sida Peng, Yuang Wang, Yujun Shen, Ruizhen Hu, Xiaowei Zhou,

This paper addresses the challenge of 3D instance segmentation by simultaneously leveraging 3D geometric and multi-view image information. Many previous works have applied deep learning techniques to 3D point clouds for instance segmentation. However, these methods often fail to generalize to various types of scenes due to the scarcity and low diversity of labeled 3D point cloud data. Some recent works have attempted to lift 2D instance segmentations to 3D within a bottom-up framework. However, inconsistencies in 2D instance segmentation across views can substantially degrade the performance of 3D segmentation. In this work, we introduce a novel 3D-to-2D query framework to effectively exploit 2D segmentation models for 3D instance segmentation. Specifically, we pre-segment the scene into several superpoints in 3D and formulate the task as a graph cut problem. The superpoint graph is constructed based on 2D segmentation models: node features are obtained from multi-view image features, and edge weights are computed from multi-view segmentation results, which enables better generalization. To process the graph, we train a graph neural network using pseudo 3D labels from 2D segmentation models. Experimental results on the ScanNet, ScanNet++ and KITTI-360 datasets demonstrate that our method achieves robust segmentation performance and can generalize across different types of scenes. Our project page is available at https://zju3dv.github.io/sam_graph.
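
The abstract says edge weights are computed from multi-view segmentation results but does not give the formula. The sketch below shows one plausible, hypothetical choice. For a pair of superpoints, count the views in which both are visible, then report the fraction of those views in which the 2D segmentation model puts them in the same mask. The data layout (one dict per view, mapping superpoint id to mask id) is an assumption for illustration.

```python
from typing import Dict, Hashable, List, Optional


def edge_weight(i: int, j: int, views: List[Dict[int, Hashable]]) -> Optional[float]:
    """Fraction of co-visible views in which superpoints i and j share a 2D mask.

    views[v] maps superpoint id -> 2D mask id in view v; a superpoint that is
    not visible in a view is simply absent. Returns None if i and j are never
    visible together.
    """
    together = same = 0
    for labels in views:
        if i in labels and j in labels:
            together += 1
            same += labels[i] == labels[j]  # True counts as 1
    return same / together if together else None


views = [
    {0: "a", 1: "a", 2: "b"},   # view 0
    {0: "c", 1: "c"},           # view 1: superpoint 2 occluded
    {0: "d", 1: "e", 2: "e"},   # view 2
]
print(round(edge_weight(0, 1, views), 3), edge_weight(1, 2, views), edge_weight(0, 3, views))
# 0.667 0.5 None
```

Averaging over views is what makes such a weight tolerant of the cross-view inconsistencies mentioned above: one view that splits two superpoints lowers the weight rather than cutting the edge outright.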

Translated: 2024-08-05 18:43:16 Published: 2024-08-02

# Making the zeroth-order process fidelity independent of state preparation and measurement errors

Making the zeroth-order process fidelity independent of state preparation and measurement errors ( http://arxiv.org/abs/2312.08590v2 )

License: see link

Yu-Hao Chen, Renata Wong, Hsi-Sheng Goan,

In this work, we demonstrate that the zero-fidelity, an approximation to the process fidelity, when combined with randomized benchmarking, becomes robust to state preparation and measurement (SPAM) errors. However, as randomized benchmarking requires randomly choosing an increasingly large number of Clifford elements from the Clifford group when the qubit number increases, this combination is also limited to quantum systems with up to three qubits. To make the zero-fidelity independent of SPAM errors and, at the same time, applicable to multi-qubit systems, we employ a channel noise scaling method similar to the method of global unitary folding, or identity scaling, used for quantum error mitigation.
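
Randomized benchmarking is robust to SPAM errors because the survival probability is fitted to F(m) = A p^m + B, and state-preparation and measurement errors only affect A and B. The sketch below illustrates this with noiseless synthetic data. The ratio of successive differences at three consecutive sequence lengths recovers p exactly, whatever A and B are. The code then converts p to a process fidelity under a depolarizing-noise assumption, F_pro = p + (1 - p)/d² with d = 2ⁿ.

This is standard randomized benchmarking, not the paper's zeroth-order fidelity construction or its noise-scaling extension. Real data would need a least-squares fit over many lengths.

```python
def rb_decay(f_m: float, f_m1: float, f_m2: float) -> float:
    """Decay p of F(m) = A * p**m + B from survival probabilities at m, m+1, m+2.

    The ratio of successive differences cancels A and B, which is where
    state-preparation-and-measurement (SPAM) errors enter.
    """
    return (f_m2 - f_m1) / (f_m1 - f_m)


def process_fidelity(p: float, n_qubits: int) -> float:
    """Process fidelity of an n-qubit depolarizing channel with parameter p."""
    d = 2 ** n_qubits
    return p + (1 - p) / d ** 2


def survival(m: int, a: float = 0.45, b: float = 0.52, p: float = 0.98) -> float:
    """Synthetic, noiseless RB curve; a and b stand in for SPAM-distorted constants."""
    return a * p ** m + b


p_est = rb_decay(survival(10), survival(11), survival(12))
print(round(p_est, 6), round(process_fidelity(p_est, 1), 6))  # 0.98 0.985
```

Changing `a` and `b` (different SPAM) leaves `p_est` unchanged. That is the property the paper carries over to its fidelity estimate.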

Translated: 2024-08-05 18:43:16 Published: 2024-08-02

# On the instance optimality of detecting collisions and subgraphs

On the instance optimality of detecting collisions and subgraphs ( http://arxiv.org/abs/2312.10196v2 )

License: see link

Omri Ben-Eliezer, Tomer Grossman, Moni Naor,

Suppose you are given a function $f\colon [n] \to [n]$ via (black-box) query access to the function. You are looking to find something local, like a collision (a pair $x \neq y$ s.t. $f(x)=f(y)$). The question is whether knowing the "shape" of the function helps you or not (by shape we mean that some permutation of the function is known). Formally, we investigate the unlabeled instance optimality of substructure detection problems in graphs and functions. A problem is $g(n)$-instance optimal if it admits an algorithm $A$ satisfying that for any possible input, the (randomized) query complexity of $A$ is at most $g(n)$ times larger than the query complexity of any algorithm $A'$ which solves the same problem while holding an unlabeled copy of the input (i.e., any $A'$ that "knows the structure of the input"). Our results point to a trichotomy of unlabeled instance optimality among substructure detection problems in graphs and functions: 1. A few very simple properties have an $O(1)$-instance optimal algorithm. 2. Most properties of graphs and functions, with examples such as containing a fixed point or a $3$-collision in functions, or a triangle in graphs, are $n^{\Omega(1)}$-far from instance optimality. 3. The problems of collision detection in functions and finding a claw in a graph serve as a middle ground between the two regimes. We show that these two properties are $\Omega(\log n)$-far from instance optimality, and conjecture that this bound is tight. We provide evidence towards this conjecture, by proving that finding a claw in a graph is $O(\log(n))$-instance optimal among all input graphs for which the query complexity of an algorithm holding an unlabeled certificate is $O\left(\sqrt{\frac{n}{\log n}}\right)$.
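
For intuition on the collision problem, the sketch below implements the simple, label-oblivious strategy. It queries f at uniformly random distinct points of [n] (here {0, ..., n-1}) and stops at the first repeated image. For a 2-to-1 function this typically needs on the order of √n queries, by a birthday-paradox argument. The question studied in the paper is how much better an algorithm can do when it also holds an unlabeled copy of f. The function name and return convention are hypothetical.

```python
import random
from typing import Callable, Dict, Optional, Tuple


def find_collision(f: Callable[[int], int], n: int, rng: random.Random
                   ) -> Tuple[Optional[Tuple[int, int]], int]:
    """Query f at random distinct points of {0, ..., n-1} until two images repeat.

    Returns ((x, y), queries) with x != y and f(x) == f(y), or (None, n) if all
    n queries return distinct images (i.e. f is a permutation).
    """
    order = list(range(n))
    rng.shuffle(order)
    seen: Dict[int, int] = {}  # image -> first preimage found
    for queries, x in enumerate(order, start=1):
        y = f(x)
        if y in seen:
            return (seen[y], x), queries
        seen[y] = x
    return None, n


n = 10_000
pair, used = find_collision(lambda x: x // 2, n, random.Random(0))  # 2-to-1 toy function
print(pair, used)
```

The query count is what instance optimality compares. An algorithm that knows the function's shape still does not know where the collisions sit, because the labels are hidden.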

Translated: 2024-08-05 18:33:20 Published: 2024-08-02

# Unsupervised Graph-based Learning Method for Sub-band Allocation in 6G Subnetworks

Unsupervised Graph-based Learning Method for Sub-band Allocation in 6G Subnetworks ( http://arxiv.org/abs/2401.00950v2 )

License: see link

Daniel Abode, Ramoni Adeogun, Lou Salaün, Renato Abreu, Thomas Jacobsen, Gilberto Berardinelli,

In this paper, we present an unsupervised approach for frequency sub-band allocation in wireless networks using graph-based learning. We consider a dense deployment of subnetworks in the factory environment with a limited number of sub-bands which must be optimally allocated to coordinate inter-subnetwork interference. We model the subnetwork deployment as a conflict graph and propose an unsupervised learning approach inspired by the graph colouring heuristic and the Potts model to optimize the sub-band allocation using graph neural networks. The numerical evaluation shows that the proposed method achieves close performance to the centralized greedy colouring sub-band allocation heuristic with lower computational time complexity. In addition, it incurs reduced signalling overhead compared to iterative optimization heuristics that require all the mutual interfering channel information. We further demonstrate that the method is robust to different network settings.
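
The centralized greedy colouring baseline mentioned above can be sketched as follows, under three stated assumptions:

- the conflict graph is given as an adjacency list, where an edge means two subnetworks would interfere strongly on a shared sub-band;
- nodes are visited in descending-degree order;
- when all sub-bands are taken by neighbours, the node picks the least-conflicted one.

This is a generic heuristic for illustration, not the paper's exact baseline and not its GNN-based method.

```python
from typing import Dict, List


def greedy_subband_allocation(adj: Dict[int, List[int]], n_bands: int) -> Dict[int, int]:
    """Greedy colouring of a conflict graph with a fixed number of sub-bands.

    Nodes are visited in descending-degree order. Each node takes the lowest
    sub-band not used by an already-assigned neighbour or, if every sub-band is
    taken, the one shared with the fewest neighbours.
    """
    alloc: Dict[int, int] = {}
    for node in sorted(adj, key=lambda v: len(adj[v]), reverse=True):
        clashes = [0] * n_bands
        for nb in adj[node]:
            if nb in alloc:
                clashes[alloc[nb]] += 1
        alloc[node] = min(range(n_bands), key=lambda b: (clashes[b], b))
    return alloc


# Four subnetworks interfering in a ring, two sub-bands available
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(greedy_subband_allocation(ring, 2))  # {0: 0, 1: 1, 2: 0, 3: 1}
```

The heuristic needs the whole conflict graph at one place. That centralization, and the signalling it implies, is what the paper's learned approach aims to reduce.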

Translated: 2024-08-05 18:33:20 Published: 2024-08-02

# Towards Model-Free LQR Control over Rate-Limited Channels

Towards Model-Free LQR Control over Rate-Limited Channels ( http://arxiv.org/abs/2401.01258v2 )

License: see link

Aritra Mitra, Lintao Ye, Vijay Gupta,

Given the success of model-free methods for control design in many problem settings, it is natural to ask how things will change if realistic communication channels are utilized for the transmission of gradients or policies. While the resulting problem has analogies with the formulations studied under the rubric of networked control systems, the rich literature in that area has typically assumed that the model of the system is known. As a step towards bridging the fields of model-free control design and networked control systems, we ask: *Is it possible to solve basic control problems, such as the linear quadratic regulator (LQR) problem, in a model-free manner over a rate-limited channel?* Toward answering this question, we study a setting where a worker agent transmits quantized policy gradients (of the LQR cost) to a server over a noiseless channel with a finite bit-rate. We propose a new algorithm titled Adaptively Quantized Gradient Descent (`AQGD`), and prove that above a certain finite threshold bit-rate, `AQGD` guarantees exponentially fast convergence to the globally optimal policy, with *no deterioration of the exponent relative to the unquantized setting*. More generally, our approach reveals the benefits of adaptive quantization in preserving fast linear convergence rates, and, as such, may be of independent interest to the literature on compressed optimization.
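
The toy sketch below illustrates why adaptive quantization can preserve a linear rate. It is not the paper's `AQGD`, and it makes these assumptions:

- the LQR cost is replaced by a separable quadratic f(x) = ½ Σ cᵢ xᵢ²;
- each gradient coordinate is sent as a `bits`-bit index on a uniform grid over [-R_k, R_k];
- both worker and server shrink R_k by a fixed factor each round.

The gradient itself contracts geometrically, so the shrinking grid keeps covering it while the quantization error shrinks as well, and the iterates still converge linearly. The step size, radius schedule, and curvatures are hand-picked. For the values below, the unquantized contraction factor is 0.8 and the radius shrinks by 0.81.

```python
from typing import List, Sequence


def quantize(g: Sequence[float], radius: float, bits: int) -> List[int]:
    """Map each coordinate to the nearest of 2**bits levels on [-radius, radius]."""
    levels = 2 ** bits
    step = 2 * radius / (levels - 1)
    return [min(levels - 1, max(0, round((gi + radius) / step))) for gi in g]


def dequantize(idx: Sequence[int], radius: float, bits: int) -> List[float]:
    step = 2 * radius / (2 ** bits - 1)
    return [-radius + i * step for i in idx]


def quantized_gd_toy(curv: Sequence[float], x0: Sequence[float], lr: float = 0.2,
                     bits: int = 8, r0: float = 4.0, shrink: float = 0.81,
                     iters: int = 200) -> List[float]:
    """Toy adaptively quantized GD on f(x) = 0.5 * sum(c_i * x_i**2)."""
    x = list(x0)
    radius = r0  # worker and server follow the same radius schedule
    for _ in range(iters):
        grad = [c * xi for c, xi in zip(curv, x)]   # worker: exact gradient
        msg = quantize(grad, radius, bits)          # bits * len(x) bits on the channel
        q = dequantize(msg, radius, bits)           # server: reconstructed gradient
        x = [xi - lr * qi for xi, qi in zip(x, q)]
        radius *= shrink                            # shrink the grid as the gradient contracts
    return x


x_final = quantized_gd_toy(curv=[1.0, 4.0], x0=[1.0, 1.0])
print(max(abs(v) for v in x_final))  # far below 1e-12
```

With a fixed radius, the quantization error would instead stall the iterates at a floor set by the grid spacing.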

Translated: 2024-08-05 18:33:20 Published: 2024-08-02

# Efficient Test Data Generation for MC/DC with OCL and Search

Efficient Test Data Generation for MC/DC with OCL and Search ( http://arxiv.org/abs/2401.03469v3 )

License: see link

Hassan Sartaj, Muhammad Zohaib Iqbal, Atif Aftab Ahmed Jilani, Muhammad Uzair Khan,

System-level testing of avionics software systems requires compliance with different international safety standards such as DO-178C. An important consideration of the avionics industry is automated test data generation according to the criteria suggested by safety standards. One of the criteria recommended by DO-178C is the modified condition/decision coverage (MC/DC) criterion. Current model-based test data generation approaches use constraints written in Object Constraint Language (OCL) and apply search techniques to generate test data. These approaches either do not support the MC/DC criterion or suffer from performance issues while generating test data for large-scale avionics systems. In this paper, we propose an effective way to automate MC/DC test data generation during model-based testing. We develop a strategy that utilizes case-based reasoning (CBR) and range reduction heuristics designed to solve MC/DC-tailored OCL constraints. We performed an empirical study comparing three variants of our strategy for MC/DC test data generation (CBR, range reduction, and both CBR and range reduction) with the original search algorithm and with random search. We also empirically compared our strategy with existing constraint-solving approaches. The results show that both CBR and range reduction for MC/DC test data generation outperform the baseline approach. Moreover, combining CBR and range reduction for MC/DC test data generation is effective compared to existing constraint solvers.
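
MC/DC requires an independence pair for every condition in a decision: two tests that differ only in that condition and produce different decision outcomes. For decisions with a handful of conditions, such pairs can simply be enumerated, as in the sketch below. This brute force is a baseline illustration only. The paper instead solves MC/DC-tailored OCL constraints with search, CBR, and range reduction, which matters for large systems where enumeration is infeasible. The decision `a and (b or c)` and all names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Assignment = Tuple[bool, ...]


def mcdc_pairs(decision: Callable[..., bool], n_conditions: int
               ) -> Dict[int, Tuple[Assignment, Assignment]]:
    """Find one independence pair per condition by exhaustive enumeration.

    A pair differs only in condition i and flips the decision outcome
    (unique-cause MC/DC). Conditions without such a pair are left out.
    """
    pairs: Dict[int, Tuple[Assignment, Assignment]] = {}
    for a in product([False, True], repeat=n_conditions):
        for i in range(n_conditions):
            if i in pairs or a[i]:
                continue  # only flip False -> True, and keep the first pair found
            b = a[:i] + (True,) + a[i + 1:]
            if decision(*a) != decision(*b):
                pairs[i] = (a, b)
    return pairs


def mcdc_test_set(pairs: Dict[int, Tuple[Assignment, Assignment]]) -> List[Assignment]:
    """Union of all pair members: an MC/DC-adequate (not necessarily minimal) test set."""
    return sorted({t for pair in pairs.values() for t in pair})


def decision(a: bool, b: bool, c: bool) -> bool:
    return a and (b or c)


pairs = mcdc_pairs(decision, 3)
print(mcdc_test_set(pairs))
# [(False, False, True), (True, False, False), (True, False, True), (True, True, False)]
```

Here three conditions need only four tests (n + 1), compared with eight for exhaustive testing. Enumeration itself, however, grows as 2ⁿ, which is why a search-based solver is needed at scale.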

Translated: 2024-08-05 18:33:20 Published: 2024-08-02

# MERA: A Comprehensive LLM Evaluation in Russian

MERA: A Comprehensive LLM Evaluation in Russian ( http://arxiv.org/abs/2401.04531v3 )

License: see link

Alena Fenogenova, Artem Chervyakov, Nikita Martynov, Anastasia Kozlova, Maria Tikhonova, Albina Akhmetgareeva, Anton Emelyanov, Denis Shevelev, Pavel Lebedev, Leonid Sinev, Ulyana Isaeva, Katerina Kolomeytseva, Daniil Moskovskiy, Elizaveta Goncharova, Nikita Savushkin, Polina Mikhailova, Denis Dimitrov, Alexander Panchenko, Sergei Markov,

Over the past few years, one of the most notable advancements in AI research has been in foundation models (FMs), headlined by the rise of language models (LMs). As the models' size increases, LMs demonstrate enhancements in measurable aspects and the development of new qualitative features. However, despite researchers' attention and the rapid growth in LM application, the capabilities, limitations, and associated risks still need to be better understood. To address these issues, we introduce an open Multimodal Evaluation of Russian-language Architectures (MERA), a new instruction benchmark for evaluating foundation models oriented towards the Russian language. The benchmark encompasses 21 evaluation tasks for generative models in 11 skill domains and is designed as a black-box test to ensure the exclusion of data leakage. The paper introduces a methodology to evaluate FMs and LMs in zero- and few-shot fixed instruction settings that can be extended to other modalities. We propose an evaluation methodology, an open-source code base for the MERA assessment, and a leaderboard with a submission system. We evaluate open LMs as baselines and find that they are still far behind the human level. We publicly release MERA to guide forthcoming research, anticipate groundbreaking model features, standardize the evaluation procedure, and address potential societal drawbacks.

Translated: 2024-08-05 18:33:20 Published: 2024-08-02

# Secure Targeted Message Dissemination in IoT Using Blockchain Enabled Edge Computing

Secure Targeted Message Dissemination in IoT Using Blockchain Enabled Edge Computing ( http://arxiv.org/abs/2401.06384v2 )

License: see link

Muhammad Baqer Mollah, Md Abul Kalam Azad, Yinghui Zhang,

Smart devices are considered an integral part of the Internet of Things (IoT), which aims to form dynamic networks that exchange information, collect and analyze data, and make optimal decisions autonomously, achieving more efficient, automatic, and economical services. Message dissemination among these smart devices makes it possible to add new features; send updated instructions, alerts, or safety messages; communicate pricing information, billing amounts, or incentives; and install security patches. On one hand, such message dissemination directly benefits all parties involved in the IoT system. On the other hand, because the procedure is carried out remotely, smart devices, vendors, and other involved authorities may have to address a number of security-, privacy-, and performance-related concerns while disseminating messages among targeted devices. To this end, in this paper, we design STarEdgeChain, a security- and privacy-aware targeted message dissemination scheme for IoT. STarEdgeChain shows how blockchain, together with advanced cryptographic techniques, can address such concerns. In fact, it employs permissioned-blockchain-assisted edge computing to expedite the dissemination of a single signcrypted message among targeted groups of devices, while avoiding dependence on multiple unicast transmissions. Finally, we develop a software prototype of STarEdgeChain and show its practicability for smart devices. The code is publicly available at https://github.com/mbaqer/Blockchain-IoT

Translated: 2024-08-05 18:33:20 Published: 2024-08-02

# Robustness-Aware 3D Object Detection in Autonomous Driving: A Review and Outlook

Robustness-Aware 3D Object Detection in Autonomous Driving: A Review and Outlook ( http://arxiv.org/abs/2401.06542v2 )

License: see link

Ziying Song, Lin Liu, Feiyang Jia, Yadan Luo, Guoxin Zhang, Lei Yang, Li Wang, Caiyan Jia,

In the realm of modern autonomous driving, the perception system is indispensable for accurately assessing the state of the surrounding environment, thereby enabling informed prediction and planning. The key step to this system is related to 3D object detection that utilizes vehicle-mounted sensors such as LiDAR and cameras to identify the size, the category, and the location of nearby objects. Despite the surge in 3D object detection methods aimed at enhancing detection precision and efficiency, there is a gap in the literature that systematically examines their resilience against environmental variations, noise, and weather changes. This study emphasizes the importance of robustness, alongside accuracy and latency, in evaluating perception systems under practical scenarios. Our work presents an extensive survey of camera-only, LiDAR-only, and multi-modal 3D object detection algorithms, thoroughly evaluating their trade-off between accuracy, latency, and robustness, particularly on datasets like KITTI-C and nuScenes-C to ensure fair comparisons. Among these, multi-modal 3D detection approaches exhibit superior robustness, and a novel taxonomy is introduced to reorganize the literature for enhanced clarity. This survey aims to offer a more practical perspective on the current capabilities and the constraints of 3D object detection algorithms in real-world applications, thus steering future research towards robustness-centric advancements.

翻訳日:2024-08-05 18:33:20 公開日:2024-08-02

# PT対称量子系における量子カオス

Quantum chaos in PT symmetric quantum systems ( http://arxiv.org/abs/2401.07215v2 )

ライセンス: Link先を確認

Kshitij Sharma, Himanshu Sahu, Subroto Mukerjee,

(参考訳) 本研究では、非エルミート力学系における$\mathcal{PT}$対称性と量子カオスの相互作用を調べる。量子カオスの標準的な診断法である複素レベル間隔比と時間外順序相関関数（OTOC）を拡張し、$\mathcal{PT}$対称な量子キックローターモデルを研究する。キックローターは、古典カオスと量子カオスを研究するための典型的な力学系として長く用いられてきた。量子キックローターに非エルミート性を導入することで、エルミート系には存在しない新しい相と転移を明らかにする。複素レベル間隔比の研究から、3つの領域を特定する。すなわち、積分可能で$\mathcal{PT}$対称な領域、カオス的で$\mathcal{PT}$対称な領域、カオス的だが$\mathcal{PT}$対称性が破れた領域である。複素レベル間隔比は、これら3つの相すべてを区別できることがわかった。OTOCの計算は半古典極限において古典的リャプノフ指数の計算と関係づけられるため、これらの領域と相境界におけるOTOCの性質を調べる。$\mathcal{PT}$対称な相では、OTOCは積分可能領域とカオス領域の双方で、エルミート系で観察されるものと同様の振る舞いを示す。さらに、$\mathcal{PT}$対称性の破れた相では、OTOCは後期に、固有値スペクトルの複素性に由来する追加の指数的成長を示す。我々はOTOCの後期挙動の解析形を導出する。$\mathcal{PT}$対称性の破れによる影響を抑えるために正規化OTOCを定義する。これにより、OTOCが$\mathcal{PT}$対称なカオス相から$\mathcal{PT}$対称性の破れたカオス相への転移において特異な振る舞いを示すことを示す。

In this study, we explore the interplay between $\mathcal{PT}$-symmetry and quantum chaos in a non-Hermitian dynamical system. We consider an extension of the standard diagnostics of quantum chaos, namely the complex level spacing ratio and out-of-time-ordered correlators (OTOCs), to study the $\mathcal{PT}$-symmetric quantum kicked rotor model. The kicked rotor has long been regarded as a paradigmatic dynamical system for studying classical and quantum chaos. By introducing non-Hermiticity in the quantum kicked rotor, we uncover new phases and transitions that are absent in the Hermitian system. From the study of the complex level spacing ratio, we locate three regimes: one which is integrable and $\mathcal{PT}$-symmetric, another which is chaotic and $\mathcal{PT}$-symmetric, and a third which is chaotic but with broken $\mathcal{PT}$-symmetry. We find that the complex level spacing ratio can distinguish between all three phases. Since calculations of the OTOC can be related to those of the classical Lyapunov exponent in the semi-classical limit, we investigate its nature in these regimes and at the phase boundaries. In the phases with $\mathcal{PT}$-symmetry, the OTOC exhibits behaviour akin to what is observed in the Hermitian system in both the integrable and chaotic regimes. Moreover, in the $\mathcal{PT}$-symmetry broken phase, the OTOC demonstrates additional exponential growth at later times, stemming from the complex nature of the eigenvalue spectrum. We derive the analytical form of the late-time behaviour of the OTOC. By defining a normalized OTOC to mitigate the effects caused by $\mathcal{PT}$-symmetry breaking, we show that the OTOC exhibits singular behaviour at the transition from the $\mathcal{PT}$-symmetric chaotic phase to the $\mathcal{PT}$-symmetry broken, chaotic phase.
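The following is a minimal Python sketch of complex spacing ratios, which makes the first diagnostic concrete. It assumes the definition commonly used for non-Hermitian spectra, $z_k = (\lambda_k^{\mathrm{NN}} - \lambda_k)/(\lambda_k^{\mathrm{NNN}} - \lambda_k)$, where $\lambda_k^{\mathrm{NN}}$ and $\lambda_k^{\mathrm{NNN}}$ are the nearest and next-nearest neighbours of $\lambda_k$ in the complex plane. The function name and the random test matrix are illustrative assumptions. The paper's exact conventions (for example, how symmetry sectors are handled) may differ.

```python
import numpy as np


def complex_spacing_ratios(eigs):
    """Return z_k = (nn_k - lam_k) / (nnn_k - lam_k) for every eigenvalue lam_k.

    nn_k and nnn_k are the nearest and next-nearest neighbours of lam_k in the
    complex plane. Assumes a non-degenerate spectrum with at least 3 values.
    """
    eigs = np.asarray(eigs, dtype=complex)
    if eigs.size < 3:
        raise ValueError("need at least three eigenvalues")
    # Pairwise distances; exclude each eigenvalue from its own neighbour search.
    dist = np.abs(eigs[:, None] - eigs[None, :])
    np.fill_diagonal(dist, np.inf)
    order = np.argsort(dist, axis=1)
    nn = eigs[order[:, 0]]
    nnn = eigs[order[:, 1]]
    return (nn - eigs) / (nnn - eigs)


# Illustrative use on a random non-Hermitian (Ginibre-like) matrix.
rng = np.random.default_rng(0)
h = rng.normal(size=(300, 300)) + 1j * rng.normal(size=(300, 300))
z = complex_spacing_ratios(np.linalg.eigvals(h))
print("mean |z|       :", np.abs(z).mean())
print("mean cos(arg z):", np.cos(np.angle(z)).mean())
```

The nearest neighbour is never farther away than the next-nearest one, so every $z_k$ lies in the unit disk. The radial and angular distributions of $z$ are what separate integrable spectra from chaotic ones.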

翻訳日:2024-08-05 18:33:20 公開日:2024-08-02

# ロジスティックマップを用いたリザバーコンピューティング

Reservoir computing with logistic map ( http://arxiv.org/abs/2401.09501v2 )

ライセンス: Link先を確認

R. Arun, M. Sathish Aravindh, A. Venkatesan, M. Lakshmanan,

(参考訳) リザバーコンピューティングに関する最近の研究では、本質的に高次元力学系をリザバーとして用い、入力をより高次元の状態へ変換・保持することで、時間的および非時間的データを処理する。ここでは、非線形写像（ロジスティック写像）と単純な有限三角級数を用いてリザバーを構成する仮想ノードを構築し、時間的および非時間的タスクを予測する手法を示す。時間的タスクとしてLorenz、Rossler、Hindmarsh-Roseの3つの非線形系を、非時間的タスクとして7次多項式を、高い精度で予測する。また、ノイズの存在下でも予測を行い、目標値とよく一致することを確認した。注目すべきことに、ロジスティック写像は良好に機能し、実際の値（目標値）に近い予測を与える。二乗平均平方根誤差が小さいことから、本手法の精度が確認された。本手法により、リザバーコンピューティングにおいてリザバーを構築するために連続力学系を用いる必要がなくなる。さらに、3つの異なる非線形系を正確に予測できたことは、本手法が汎用的であり、多くの系の予測に適用できることを示唆している。最後に、本手法がRossler系の3変数すべての将来の時系列も正確に予測できること（自己予測）を示す。

Recent studies on reservoir computing essentially involve a high-dimensional dynamical system as the reservoir, which transforms and stores the input as a higher-dimensional state for temporal and nontemporal data processing. We demonstrate here a method to predict temporal and nontemporal tasks by constructing the virtual nodes that constitute the reservoir from a nonlinear map, namely the logistic map, and a simple finite trigonometric series. We predict three nonlinear systems, namely Lorenz, Rossler, and Hindmarsh-Rose, for temporal tasks and a seventh-order polynomial for nontemporal tasks with great accuracy. The prediction is also made in the presence of noise and is found to agree closely with the target. Remarkably, the logistic map performs well and predicts values close to the actual (target) values. The low root-mean-square errors confirm the accuracy of this method. Our approach removes the need for continuous dynamical systems when constructing the reservoir in reservoir computing. Moreover, the accurate prediction for three different nonlinear systems suggests that this method is general and can be applied to predict many systems. Finally, we show that the method also accurately anticipates the future time series of all three variables of the Rossler system (self-prediction).
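The abstract does not give the exact mapping from inputs to nodes, so the code below is only a hypothetical Python sketch of the idea:

- each scalar input sets an initial condition through a short trigonometric series;
- successive logistic-map iterates $x_{k+1} = r x_k (1 - x_k)$ serve as virtual-node states;
- a linear ridge-regression readout is trained on those states.

The series coefficients, $r = 3.7$, the node count, and the seventh-order target polynomial are all assumptions. Only the nontemporal (polynomial) task is shown.

```python
import numpy as np


def logistic_reservoir(u, n_nodes=30, r=3.7):
    """Virtual-node states: successive logistic-map iterates per input."""
    u = np.asarray(u, dtype=float)
    # Hypothetical finite trigonometric series mapping each input into (0, 1).
    x = 0.5 + 0.2 * np.sin(np.pi * u) + 0.1 * np.cos(2 * np.pi * u)
    states = np.empty((u.size, n_nodes))
    for k in range(n_nodes):
        x = r * x * (1.0 - x)  # logistic map x_{k+1} = r x_k (1 - x_k)
        states[:, k] = x
    return states


def fit_readout(states, y, ridge=1e-6):
    """Ridge-regression readout with an unpenalized intercept."""
    mu_s, mu_y = states.mean(axis=0), y.mean()
    s = states - mu_s
    w = np.linalg.solve(s.T @ s + ridge * np.eye(s.shape[1]), s.T @ (y - mu_y))
    return w, mu_y - mu_s @ w


def predict(states, w, b):
    return states @ w + b


# Nontemporal task: fit a (hypothetical) seventh-order polynomial.
u_train = np.linspace(-1.0, 1.0, 200)
y_train = u_train**7 - 2 * u_train**5 + u_train**3 - 0.5 * u_train
states = logistic_reservoir(u_train)
w, b = fit_readout(states, y_train)
rmse = np.sqrt(np.mean((predict(states, w, b) - y_train) ** 2))
print(f"training RMSE: {rmse:.2e}")
```

Only the readout weights are trained. The reservoir itself is a fixed map, which keeps training cheap compared with simulating a continuous dynamical system.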

翻訳日:2024-08-05 18:33:20 公開日:2024-08-02

# 細粒度シーングラフ生成のための適応的自己学習フレームワーク

Adaptive Self-training Framework for Fine-grained Scene Graph Generation ( http://arxiv.org/abs/2401.09786v5 )

ライセンス: Link先を確認

Kibum Kim, Kanghoon Yoon, Yeonjun In, Jinyoung Moon, Donghyun Kim, Chanyoung Park,

(参考訳) シーングラフ生成（SGG）モデルは、述語分布のロングテール性やアノテーションの欠落といった、ベンチマークデータセットに固有の問題に悩まされてきた。本研究では、注釈のない三つ組を活用することで、SGGにおけるロングテール問題を緩和することを目指す。そのために、SGGのための自己学習フレームワーク（ST-SGG）を提案する。ST-SGGは注釈のない三つ組に擬似ラベルを割り当て、それに基づいてSGGモデルを学習する。画像認識のための自己学習は大きく進歩してきた。しかしSGGタスクには意味的な曖昧さや述語クラスのロングテール分布といった固有の性質があるため、SGG向けの自己学習フレームワークの設計はより困難である。そこで、SGGのための新しい擬似ラベル付け手法として、Class-specific Adaptive Thresholding with Momentum (CATM)を提案する。CATMは、既存のあらゆるSGGモデルに適用可能なモデル非依存のフレームワークである。さらに、提案する自己学習フレームワークを最先端のメッセージパッシングニューラルネットワーク（MPNN）ベースのSGGモデルに導入する際に有用な、グラフ構造学習器（GSL）を考案する。広範な実験により、様々なSGGモデルにおけるST-SGGの有効性、特に細粒度の述語クラスにおける性能向上を検証した。

Scene graph generation (SGG) models suffer from problems inherent in the benchmark datasets, such as the long-tailed predicate distribution and missing annotations. In this work, we aim to alleviate the long-tailed problem of SGG by utilizing unannotated triplets. To this end, we introduce a Self-Training framework for SGG (ST-SGG) that assigns pseudo-labels to unannotated triplets, on which the SGG models are then trained. While there has been significant progress in self-training for image recognition, designing a self-training framework for the SGG task is more challenging due to its inherent properties, such as the semantic ambiguity and the long-tailed distribution of predicate classes. Hence, we propose a novel pseudo-labeling technique for SGG, called Class-specific Adaptive Thresholding with Momentum (CATM), which is a model-agnostic framework that can be applied to any existing SGG model. Furthermore, we devise a graph structure learner (GSL) that is beneficial when adopting our proposed self-training framework in state-of-the-art message-passing neural network (MPNN)-based SGG models. Our extensive experiments verify the effectiveness of ST-SGG on various SGG models, particularly in enhancing the performance on fine-grained predicate classes.
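As a rough illustration of class-specific thresholding with momentum, the Python sketch below keeps one confidence threshold per predicate class. In each batch it moves that threshold toward the mean confidence of the class's predictions, using an exponential moving average. This is a simplified, hypothetical rule for exposition: the actual CATM update in the paper may differ, and the class name, `init`, and `momentum` values are assumptions.

```python
import numpy as np


class ClassAdaptiveThresholds:
    """Per-class confidence thresholds updated with momentum (EMA)."""

    def __init__(self, n_classes, init=0.5, momentum=0.9):
        self.tau = np.full(n_classes, init, dtype=float)
        self.momentum = momentum

    def update(self, probs):
        """probs: (n_triplets, n_classes) predicate probabilities for a batch."""
        pred, conf = probs.argmax(axis=1), probs.max(axis=1)
        for c in np.unique(pred):
            batch_mean = conf[pred == c].mean()
            self.tau[c] = (self.momentum * self.tau[c]
                           + (1.0 - self.momentum) * batch_mean)

    def pseudo_label(self, probs):
        """Return the predicted class where confident enough, else -1."""
        pred, conf = probs.argmax(axis=1), probs.max(axis=1)
        return np.where(conf >= self.tau[pred], pred, -1)


probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
thr = ClassAdaptiveThresholds(n_classes=2, init=0.7, momentum=0.5)
thr.update(probs)
print(thr.tau, thr.pseudo_label(probs))
```

Separate thresholds mean that rare predicates, whose predictions tend to be less confident, are not judged against the same bar as head classes. This is plausibly one motivation for making the threshold class-specific.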

翻訳日:2024-08-05 18:33:20 公開日:2024-08-02

# 非中心対称量子井戸におけるクーパー対再結合による絡み合った光子の生成

Entangled Photon Generation through Cooper Pair Recombination in a Noncentrosymmetric Quantum Well ( http://arxiv.org/abs/2401.11577v2 )

ライセンス: Link先を確認

Mehdi Biderang, Erfan Hosseini, Alireza Akbari,

(参考訳) 我々は、非中心対称な[001]量子井戸超伝導体において、クーパー対の再結合による絡み合った2光子対の生成を理論的に調べる。系は、RashbaとDresselhausのスピン軌道結合が混在する超伝導層を備えた、順方向バイアスのp-n接合によって駆動される。絡み合った光子対の純度が最も高くなるのは、純粋な一重項クーパー対、特に従来の$s$波ギャップ関数を含む場合であることを示す。高い純度の絡み合い状態を得るには、電荷キャリア濃度を最小化し、RashbaとDresselhausのスピン軌道結合の大きさを釣り合わせることが重要であり、これは反対称スピン軌道結合の振幅を小さくすることで実現できる。我々の結果はこの点を明らかにしている。純度に加えて、2光子状態の分布を調べるため、考えられる超伝導対形成について、絡み合った光子対の間でそれらの占有数を比較する。

We theoretically explore the generation of entangled two-photon pairs by Cooper pair recombination in a noncentrosymmetric [001]-quantum well superconductor, driven by a forward-biased p-n junction with a superconducting layer that exhibits an admixture of Rashba and Dresselhaus spin-orbit couplings. We show that the highest achievable purity of entangled photon pairs emerges in scenarios involving pure singlet Cooper pairs, specifically the conventional $s$-wave gap function. Our results highlight the importance of minimizing the charge-carrier concentration and balancing the magnitudes of the Rashba and Dresselhaus spin-orbit couplings to achieve entangled states with enhanced purity, which can be realized by reducing the amplitudes of the antisymmetric spin-orbit couplings. Beyond purity, to explore the distribution of two-photon states, we compare their populations across entangled pairs for potential superconducting pairings.

翻訳日:2024-08-05 18:33:20 公開日:2024-08-02

# 微分可能木探索ネットワーク

Differentiable Tree Search Network ( http://arxiv.org/abs/2401.11660v2 )

ライセンス: Link先を確認

Dixant Mittal, Wee Sun Lee,

(参考訳) 訓練データが限られた意思決定問題では、ディープニューラルネットワークで近似した方策関数はしばしば準最適な性能を示す。別のアプローチとして、限られたデータから世界モデルを学習し、オンライン探索によって行動を決定する方法がある。しかし、学習した世界モデルの不正確さから生じる誤差の累積により、性能が悪化する。TreeQNのような手法は、ニューラルネットワークアーキテクチャにアルゴリズム的な帰納バイアスを組み込むことで、この不正確さに対処しようとしてきた。しかし導入されるバイアスはしばしば弱く、複雑な意思決定タスクには不十分である。本研究では、新しいニューラルネットワークアーキテクチャである微分可能木探索ネットワーク（D-TSN）を提案する。D-TSNは、最良優先（best-first）オンライン探索アルゴリズムのアルゴリズム構造を組み込むことで、帰納バイアスを大幅に強化する。D-TSNは学習した世界モデルを用いて、完全に微分可能なオンライン探索を行う。世界モデルは探索アルゴリズムと同時に最適化されるため、頑健な世界モデルの学習が可能になり、予測の不正確さの影響が緩和される。さらに、最良優先探索を素朴に組み込むと、パラメータ空間において不連続な損失関数が生じうることを指摘する。我々はこの問題に対し、確率的な木展開方策を採用し、探索木の展開を別の意思決定タスクとして定式化し、勾配計算のための効果的な分散低減手法を導入することで対処する。Procgenのゲームとグリッドナビゲーションタスクにおいて、限られた訓練データのもとでオフライン強化学習の設定でD-TSNを評価した。その結果、D-TSNが一般的なモデルフリーおよびモデルベースのベースラインを上回ることを示す。

In decision-making problems with limited training data, policy functions approximated using deep neural networks often exhibit suboptimal performance. An alternative approach involves learning a world model from the limited data and determining actions through online search. However, the performance is adversely affected by compounding errors arising from inaccuracies in the learned world model. While methods like TreeQN have attempted to address these inaccuracies by incorporating algorithmic inductive biases into neural network architectures, the biases they introduce are often weak and insufficient for complex decision-making tasks. In this work, we introduce Differentiable Tree Search Network (D-TSN), a novel neural network architecture that significantly strengthens the inductive bias by embedding the algorithmic structure of a best-first online search algorithm. D-TSN employs a learned world model to conduct a fully differentiable online search. The world model is jointly optimized with the search algorithm, enabling the learning of a robust world model and mitigating the effect of prediction inaccuracies. Further, we note that a naive incorporation of best-first search could lead to a discontinuous loss function in the parameter space. We address this issue by adopting a stochastic tree expansion policy, formulating search tree expansion as another decision-making task, and introducing an effective variance reduction technique for the gradient computation. We evaluate D-TSN in an offline-RL setting with limited training data on Procgen games and a grid navigation task, and demonstrate that D-TSN outperforms popular model-free and model-based baselines.
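The discontinuity mentioned above comes from the hard argmax in best-first expansion. An infinitesimal change in two leaf scores can switch which node is expanded. The minimal Python sketch below shows the stochastic alternative, assuming a softmax expansion policy over leaf scores. The returned log-probability is what a score-function (REINFORCE-style) gradient would use. The names and the temperature parameter are hypothetical, and D-TSN's actual policy and variance-reduction technique are not reproduced here.

```python
import numpy as np


def expansion_probs(leaf_scores, temperature=1.0):
    """Softmax over leaf scores: a stochastic stand-in for best-first argmax."""
    z = np.asarray(leaf_scores, dtype=float) / temperature
    z = z - z.max()  # numerical stability
    p = np.exp(z)
    return p / p.sum()


def sample_leaf(leaf_scores, rng, temperature=1.0):
    """Sample which leaf to expand; return its index and log-probability."""
    p = expansion_probs(leaf_scores, temperature)
    idx = int(rng.choice(len(p), p=p))
    return idx, float(np.log(p[idx]))


# argmax flips abruptly when two scores cross; softmax probabilities move smoothly.
for eps in (-1e-3, 0.0, 1e-3):
    scores = [1.0, 1.0 + eps]
    print(eps, int(np.argmax(scores)), expansion_probs(scores).round(4))
```

In a score-function estimator the gradient term is roughly $(L - b)\,\nabla \log p(\text{leaf})$. Subtracting a baseline $b$ is a standard way to reduce variance; the abstract does not say whether D-TSN uses this particular baseline.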

翻訳日:2024-08-05 18:33:20 公開日:2024-08-02

# Routoo: 大規模言語モデルへの効果的なルーティングの学習

Routoo: Learning to Route to Large Language Models Effectively ( http://arxiv.org/abs/2401.13979v2 )

ライセンス: Link先を確認

Alireza Mohammadshahi, Arshad Rafiq Shaikh, Majid Yazdani,

(参考訳) 基盤となる大規模言語モデル（LLM）の開発は、ますますコストがかかり非効率になりつつある。また、クローズドソースのモデルや大規模なオープンソースモデルは、一般に応答品質が高い一方で、小規模なモデルよりも推論コストが高い。本稿では、性能、コスト、効率に基づいて、個々のプロンプトに対するLLMの選択を最適化するアーキテクチャであるRoutooを紹介する。Routooは、性能予測器とコストを意識したデコーディングという2つの主要コンポーネントで構成される。性能予測器は軽量なLLMであり、基盤となる様々なLLMを実行・評価することなく、それらの性能を推定する。続いてコストを意識したデコーディングが、これらの予測と、コストやレイテンシなどの制約に基づいて、最も適切なモデルを選択する。オープンソースモデルを用い、57ドメインにわたるMMLUベンチマークでRoutooを評価した。その結果、RoutooはMixtral 8x7bモデルの性能に匹敵しつつ、推論コストを3分の1削減できることがわかった。さらに、コストの増加を許容すると、Routooは同等のコストでMixtralの精度を5%以上上回り、75.9%の精度を達成する。モデルプールにGPT4を組み込むと、Routooは半分のコストでGPT4の性能にほぼ匹敵し、25%のコスト削減でそれを上回る。これらの結果は、複数のLLMの集合知を活用することで、低コストで新たなSOTAを達成できるRoutooの可能性を示している。

Developing foundational large language models (LLMs) is becoming increasingly costly and inefficient. Also, closed-source and larger open-source models generally offer better response quality but come with higher inference costs than smaller models. In this paper, we introduce Routoo, an architecture designed to optimize the selection of LLMs for specific prompts based on performance, cost, and efficiency. Routoo consists of two key components: a performance predictor and a cost-aware decoding step. The performance predictor is a lightweight LLM that estimates the performance of various underlying LLMs without needing to execute and evaluate them. The cost-aware decoding then selects the most suitable model based on these predictions and other constraints such as cost and latency. We evaluated Routoo on the MMLU benchmark across 57 domains using open-source models. Our results show that Routoo matches the performance of the Mixtral 8x7b model while reducing inference costs by one-third. Additionally, by allowing increased costs, Routoo surpasses Mixtral's accuracy by over 5% at equivalent costs, achieving an accuracy of 75.9%. When GPT4 is integrated into the model pool, Routoo nearly matches GPT4's performance at half the cost and exceeds it with a 25% cost reduction. These outcomes highlight Routoo's potential to set a new state of the art (SOTA) in a cost-effective manner by leveraging the collective knowledge of multiple LLMs.
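The selection step can be pictured with the simplified Python sketch below. It assumes the performance predictor has already produced a score for each model on the current prompt and that per-prompt costs are known. The model names, scores, costs, and budget rule are hypothetical stand-ins for Routoo's cost-aware decoding, which may weigh cost and latency differently.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    predicted_score: float  # predictor's estimate that this model answers well
    cost: float             # estimated cost of answering this prompt


def route(candidates, budget):
    """Pick the highest-scoring model within budget (cheapest on ties);
    fall back to the cheapest model if nothing fits."""
    affordable = [c for c in candidates if c.cost <= budget]
    if not affordable:
        return min(candidates, key=lambda c: c.cost)
    return max(affordable, key=lambda c: (c.predicted_score, -c.cost))


pool = [Candidate("small-7b", 0.55, 0.1),
        Candidate("mid-47b", 0.70, 0.6),
        Candidate("large-api", 0.82, 3.0)]
for budget in (0.05, 1.0, 5.0):
    print(budget, route(pool, budget).name)
```

Raising the budget lets the router choose stronger, more expensive models, which mirrors the cost/accuracy trade-off reported above.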

翻訳日:2024-08-05 18:23:26 公開日:2024-08-02

# トラップされた超流動体に対する格子誘起の波動関数効果

Lattice-induced wavefunction effects on trapped superfluids ( http://arxiv.org/abs/2401.14004v3 )

ライセンス: Link先を確認

Yeyang Zhang,

(参考訳) 非相関系における波動関数効果は、ベリー曲率と量子計量によって特徴づけられる。これらに加えて、相関した粒子間の局所相互作用に対するブロッホ波動関数効果を記述する、ゲージに依存しないテンソルを提案する。光格子中の超低温ボース粒子に対する有効流体力学理論を導出する。高対称な格子について、等方的な調和トラップ中の超流動体の基底状態と集団モードを解く。動的過程において、波動関数効果は励起された呼吸モードの固有振動数、振幅、位相シフトに現れ、実験で観測可能である。また、非自明な波動関数効果をもつ二部正方格子のタイトバインディング模型を示し、典型的な実験パラメータで結果を見積もる。我々の発見は、現代のバンド理論と量子多体物理学のつながりを前進させる。

Wavefunction effects in uncorrelated systems are characterized by the Berry curvature and quantum metric. Beyond those, we propose gauge-independent tensors describing Bloch wavefunction effects on local interaction between correlated particles. We derive an effective hydrodynamic theory for ultracold bosons in optical lattices. Ground states and collective modes of superfluids in isotropic harmonic traps are solved for highly symmetric lattices. In a dynamic process, the wavefunction effects are featured by the eigenfrequency, amplitude, and phase shift of an excited breathing mode and can be observed in experiments. We also give a tight-binding model of a bipartite square lattice with nontrivial wavefunction effects, where results are estimated with typical experimental parameters. Our discovery advances the connections between the modern band theory and quantum many-body physics.

翻訳日:2024-08-05 18:23:26 公開日:2024-08-02

# 遺伝的アルゴリズムとスライディングウィンドウ法を用いた、空間・時間変動係数をもつCOVID-19 SEIRモデルの新しい校正フレームワーク

A new framework for calibrating COVID-19 SEIR models with spatial-/time-varying coefficients using genetic and sliding window algorithms ( http://arxiv.org/abs/2402.08524v2 )

ライセンス: Link先を確認

Huan Zhou, Ralf Schneider,

(参考訳) 感受性-曝露-感染-除去（SEIR）モデルでは、空間的・時間的に変化する係数を仮定して、COVID-19流行の地域的・時間的分布に対する非薬物的介入（NPI）の影響をモデル化する。このようなモデルを使う上での大きな課題は、地理参照付きの入院データに対して高速かつ正確に校正すること、すなわち空間的・時間的に変化するパラメータを効率的に推定することである。本研究では、SEIRモデルの空間的・時間的に変化するパラメータを最適化するための新しい校正フレームワークを提案する。また、重なりのあるスライディングウィンドウ手法（OSW）と遺伝的アルゴリズム（GA）による校正ルーチンを組み合わせ、分割されたパラメータ空間を自動的に探索する手法を考案する。計算負荷を軽減するために、並列化したGAを用いる。本フレームワークは、手法の実装の複雑さをユーザーから隠蔽する。そして、カスタマイズした校正システムを構築し、最適化されたパラメータ値を利用するための高レベルAPIを提供する。本手法を、空間的な年齢構造をもつマイクロシミュレーションモデルの校正に適用して評価した。目的関数には、COVID-19関連のICU需要の観測値からなる単一の関数を用いた。その結果は、変化する環境下でパラメータを推定する上での提案手法の有効性を示している。

A susceptible-exposed-infected-removed (SEIR) model assumes spatial-/time-varying coefficients to model the effect of non-pharmaceutical interventions (NPIs) on the regional and temporal distribution of COVID-19 disease epidemics. A significant challenge in using such models is their fast and accurate calibration to observed data from geo-referenced hospitalizations, i.e., the efficient estimation of the spatial-/time-varying parameters. In this work, a new calibration framework is proposed for optimizing the spatial-/time-varying parameters of the SEIR model. We also devise a method for combining the overlapping sliding window technique (OSW) with a genetic algorithm (GA) calibration routine to automatically search the segmented parameter space. A parallelized GA is used to reduce the computational burden. Our framework abstracts the implementation complexity of the method away from the user. It provides high-level APIs for setting up a customized calibration system and consuming the optimized parameter values. We evaluated our method on the calibration of a spatial age-structured microsimulation model using a single objective function based on observed COVID-19-related ICU demand. The results reflect the effectiveness of the proposed method in estimating the parameters in a changing environment.
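The overlapping-sliding-window idea combined with a GA can be sketched in Python on a toy, non-spatial SEIR model. Everything here is an assumption for illustration:

- explicit Euler integration with one-day steps;
- fixed $\sigma = 1/5$ and $\gamma = 1/7$;
- a single transmission rate $\beta$ per window;
- a tiny elitist GA;
- synthetic observations;
- a window of 14 days and a stride of 7 days.

The paper instead calibrates a spatial age-structured microsimulation against ICU demand with a parallelized GA.

```python
import numpy as np


def seir_step(state, beta, sigma, gamma, n, dt=1.0):
    """One explicit-Euler step of the SEIR equations (state = [S, E, I, R])."""
    s, e, i, r = state
    infections = beta * s * i / n * dt
    onsets = sigma * e * dt      # E -> I
    removals = gamma * i * dt    # I -> R
    return np.array([s - infections, e + infections - onsets,
                     i + onsets - removals, r + removals])


def simulate(state0, betas, sigma=1 / 5, gamma=1 / 7):
    """Run one step per entry of `betas` (a time-varying transmission rate)."""
    n = float(np.sum(state0))
    traj = [np.asarray(state0, dtype=float)]
    for beta in betas:
        traj.append(seir_step(traj[-1], beta, sigma, gamma, n))
    return np.array(traj)


def ga_fit_beta(state0, observed_i, rng, pop=30, gens=40, lo=0.05, hi=1.5):
    """Fit a constant beta for one window with a tiny elitist GA."""
    def loss(b):
        traj = simulate(state0, np.full(len(observed_i), b))
        return np.mean((traj[1:, 2] - observed_i) ** 2)

    population = rng.uniform(lo, hi, pop)
    n_elite = pop // 5
    history = []
    for _ in range(gens):
        fitness = np.array([loss(b) for b in population])
        order = np.argsort(fitness)
        history.append(fitness[order[0]])
        elites = population[order[:n_elite]]
        # Children: average two random elites, then apply Gaussian mutation.
        parents = rng.choice(elites, size=(pop - n_elite, 2))
        children = parents.mean(axis=1) + rng.normal(0.0, 0.05, pop - n_elite)
        population = np.clip(np.concatenate([elites, children]), lo, hi)
    fitness = np.array([loss(b) for b in population])
    return population[np.argmin(fitness)], history


def calibrate_sliding(state0, observed_i, window=14, stride=7, seed=0):
    """Overlapping windows: fit on `window` days, commit only `stride` days."""
    rng = np.random.default_rng(seed)
    betas = np.empty(len(observed_i))
    state = np.asarray(state0, dtype=float)
    t = 0
    while t < len(observed_i):
        segment = observed_i[t:t + window]
        beta, _ = ga_fit_beta(state, segment, rng)
        step = min(stride, len(segment))
        betas[t:t + step] = beta
        state = simulate(state, np.full(step, beta))[-1]
        t += step
    return betas


# Synthetic example: the transmission rate halves on day 21 (e.g. an NPI).
state0 = np.array([9990.0, 0.0, 10.0, 0.0])
true_betas = np.r_[np.full(21, 0.5), np.full(21, 0.25)]
observed = simulate(state0, true_betas)[1:, 2]  # "observed" infectious counts
fitted = calibrate_sliding(state0, observed)
print(np.round(fitted, 2))
```

Each window is fitted on 14 days, but only its first 7 days are committed. Consecutive fits therefore overlap, and the next window starts from the state implied by the already-fitted rates.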

翻訳日:2024-08-05 18:23:26 公開日:2024-08-02

# 非局在量子源からのニュートンポテンシャルを超えた重力における量子効果

Quantum effects in gravity beyond the Newton potential from a delocalised quantum source ( http://arxiv.org/abs/2402.10288v2 )

ライセンス: Link先を確認

Lin-Qing Chen, Flaminia Giacomini,

(参考訳) テーブルトップ実験の最近の進歩により、重力が古典的な記述と両立しないことを初めて示す機会が生まれている。現在のすべての実験提案では、重力効果はニュートンポテンシャルで説明できる。これは2つの重力の量子源の間の重力誘起もつれの生成のような提案にも当てはまる。すなわち、これらの提案は、一般相対論の弱場極限と整合し、重力の場としての性質を探らない領域にある。したがって、効果がニュートン的な起源をもつことが、これらの実験から重力の性質について引き出せる結論を制限している。ここでは、この限界を克服する2つの効果を特定する。これらはニュートンポテンシャルでは再現できず、重力子の放出とも無関係である。第一に、2つの一般的な重力の量子源（例えば幅の広いガウス状態にあるもの）の間の相互作用は、ニュートンポテンシャルでも既知の重力の古典理論でも再現できないことを示す。したがって、この相互作用の形を観測するには、古典重力の修正か、その量子的記述のいずれかが必要となる。第二に、重力場とその正準共役運動量の間の量子交換子が、試験粒子と相互作用する一般的な量子源の相対位相に、追加項として現れることを示す。位相中のこの項を観測することは、量子的媒介者としての重力場の検証となる。ニュートンポテンシャルで再現できるものより強い重力の量子的側面を特定することは、極めて重要である。それは、重力場の非古典性を証明し、これまで提案されてきたよりも広い意味で重力の量子的側面を検証する新世代の実験を計画するために必要となる。

Recent progress in table-top experiments offers the opportunity to show for the first time that gravity is not compatible with a classical description. In all current experimental proposals, such as the generation of gravitationally induced entanglement between two quantum sources of gravity, gravitational effects can be explained with the Newton potential, namely in a regime that is consistent with the weak-field limit of general relativity and does not probe the field nature of gravity. Hence, the Newtonian origin of the effects limits the conclusions about the nature of gravity that can be drawn from these experiments. Here, we identify two effects that overcome this limitation: they cannot be reproduced using the Newton potential and are independent of graviton emission. First, we show that the interaction between two generic quantum sources of gravity, e.g. in wide Gaussian states, cannot be reproduced with the Newton potential nor with a known classical theory of gravity. Hence, observing the form of this interaction would require either a modification to classical gravity or its quantum description. Second, we show that the quantum commutator between the gravitational field and its canonically conjugate momentum appears as an additional term in the relative phase of a generic quantum source interacting with a test particle. Observing this term in the phase would be a test of the gravitational field as a quantum mediator. Identifying stronger quantum aspects of gravity than those reproducible with the Newton potential is crucial to prove the nonclassicality of the gravitational field and to plan a new generation of experiments testing quantum aspects of gravity in a broader sense than what has been proposed so far.

翻訳日:2024-08-05 18:23:26 公開日:2024-08-02

# AIM: メタモルフィックセキュリティテストのための入力セットの最小化を自動化する

AIM: Automated Input Set Minimization for Metamorphic Security Testing ( http://arxiv.org/abs/2402.10773v3 )

ライセンス: Link先を確認

Nazanin Bayati Chaleshtari, Yoann Marquer, Fabrizio Pastore, Lionel C. Briand,

(参考訳) Webシステムのセキュリティテストは、細工された入力を生成することで自動化できる。しかし、テストオラクル、すなわち正しい出力と誤った出力を区別する作業を自動化する解決策は、まだ初期段階にある。具体的には、先行研究によりメタモルフィックテストの可能性が示されている。実際、セキュリティ障害は、正当な入力を悪意のある入力に変換するメタモルフィック関係によって判定できる。しかし、さらなる指針がなければ、メタモルフィック関係は通常大量の入力に対して実行される。そのため時間がかかり、メタモルフィックテストは実用的でなくなる。本稿では、脆弱性検出能力を保ちながらテストコストを削減するよう入力を自動的に選択する手法AIMを提案する。AIMには、セキュリティ特性に基づいて類似した入力を識別する、クラスタリングベースのブラックボックス手法が含まれる。また、総コストを最小化しながら多様な入力を効率的に選択できる新しい遺伝的アルゴリズムにも依拠している。さらに、探索空間を縮小して最小化処理を高速化する問題縮小コンポーネントを含む。脆弱性が文書化されているよく知られた2つのWebシステム、JenkinsとJoomlaでAIMの有効性を評価し、AIMの結果を4つのベースラインと比較した。全体として、AIMは脆弱性検出を維持しつつ、メタモルフィックテストの時間をJenkinsで84%、Joomlaで82%削減した。さらに、AIMは脆弱性カバレッジに関して、検討したすべてのベースラインを上回った。

Although the security testing of Web systems can be automated by generating crafted inputs, solutions to automate the test oracle, i.e., distinguishing correct from incorrect outputs, remain preliminary. Specifically, previous work has demonstrated the potential of metamorphic testing; indeed, security failures can be determined by metamorphic relations that turn valid inputs into malicious inputs. However, without further guidance, metamorphic relations are typically executed on a large set of inputs, which is time-consuming and thus makes metamorphic testing impractical. We propose AIM, an approach that automatically selects inputs to reduce testing costs while preserving vulnerability detection capabilities. AIM includes a clustering-based black box approach, to identify similar inputs based on their security properties. It also relies on a novel genetic algorithm able to efficiently select diverse inputs while minimizing their total cost. Further, it contains a problem-reduction component to reduce the search space and speed up the minimization process. We evaluated the effectiveness of AIM on two well-known Web systems, Jenkins and Joomla, with documented vulnerabilities. We compared AIM's results with four baselines. Overall, AIM reduced metamorphic testing time by 84% for Jenkins and 82% for Joomla, while preserving vulnerability detection. Furthermore, AIM outperformed all the considered baselines regarding vulnerability coverage.
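AIM's selection step uses a genetic algorithm, which is not reproduced here. The Python sketch below instead swaps in the classic greedy weighted set-cover heuristic to illustrate the underlying objective: keep inputs that exercise every cluster of security-relevant behaviour while minimizing total execution cost. The inputs, cluster assignments, and costs are hypothetical.

```python
def greedy_min_cost_cover(inputs, cost, clusters_of):
    """Pick inputs until every cluster is covered, greedily minimizing
    cost per newly covered cluster (a classic set-cover heuristic)."""
    uncovered = set().union(*(clusters_of[x] for x in inputs))
    selected = []
    while uncovered:
        candidates = [x for x in inputs if clusters_of[x] & uncovered]
        best = min(candidates,
                   key=lambda x: cost[x] / len(clusters_of[x] & uncovered))
        selected.append(best)
        uncovered -= clusters_of[best]
    return selected


# Hypothetical inputs (e.g. recorded Web requests), the clusters of
# security-relevant behaviour each one exercises, and their execution costs.
inputs = ["a", "b", "c", "d"]
clusters_of = {"a": {1, 2}, "b": {2, 3}, "c": {3}, "d": {1, 2, 3}}
cost = {"a": 2.0, "b": 2.0, "c": 1.0, "d": 5.0}
chosen = greedy_min_cost_cover(inputs, cost, clusters_of)
print(chosen, sum(cost[x] for x in chosen))  # ['a', 'c'] 3.0
```

Greedy set cover is fast but only approximate, which is one reason to prefer a search-based method on larger input sets.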

翻訳日:2024-08-05 18:23:26 公開日:2024-08-02

# SymBa: 構造化自然言語推論のためのシンボリック・バックワード・チェイン

SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning ( http://arxiv.org/abs/2402.12806v2 )

ライセンス: Link先を確認

Jinu Lee, Wonseok Hwang,

(参考訳) 大規模言語モデル（LLM）は目覚ましい推論能力を示している。しかし、説明可能性を確保するために構造化された説明可能な証明を与えること、すなわち構造化推論は依然として困難である。構造化推論の2つの方向性のうち、本研究では特に後ろ向き連鎖に着目する。後ろ向き連鎖では、推論規則を適用することでクエリを再帰的にサブゴールへ分解する。現在広く使われている後ろ向き連鎖の実装（Least-to-most promptingとLAMBADA）は、任意深さの再帰や束縛の伝播といった、後ろ向き連鎖に必要な機能を実装できていないことを指摘する。そこで、新しい後ろ向き連鎖フレームワークSymBa（Symbolic Backward Chaining）を提案する。SymBaでは、記号ソルバが証明プロセス全体を制御し、LLMは関連する自然言語の前提を検索して、ソルバのための記号形式に変換する。このLLMとソルバの統合により、SymBaは記号的に検証された完全に構造化された証明を生成する。同時に、様々な構造化推論ベンチマークにおいて、ベースラインと比べて性能、証明精度、効率を大幅に向上させる。

While Large Language Models (LLMs) have demonstrated remarkable reasoning ability, providing a structured, explainable proof to ensure explainability, i.e. structured reasoning, remains challenging. Of the two directions of structured reasoning, we specifically focus on backward chaining, where the query is recursively decomposed into subgoals by applying inference rules. We point out that current popular backward chaining implementations (Least-to-most prompting and LAMBADA) fail to implement necessary features of backward chaining, such as arbitrary-depth recursion and binding propagation. To this end, we propose a novel backward chaining framework, SymBa (Symbolic Backward Chaining). In SymBa, a symbolic solver controls the whole proof process, and an LLM searches for the relevant natural language premises and translates them into a symbolic form for the solver. Through this LLM-solver integration, SymBa produces a completely structured, symbolically verified proof while achieving significant improvements in performance, proof accuracy, and efficiency over baselines on diverse structured reasoning benchmarks.
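The abstract highlights two features: arbitrary-depth recursion and binding propagation. Both are exactly what a classic symbolic backward chainer provides. The minimal Python sketch below shows both on a toy ancestor query, using Prolog-style conventions in which strings starting with an uppercase letter are variables. In SymBa, an LLM additionally supplies and translates natural-language premises when the solver needs them. Here the facts are fixed, and all names are illustrative.

```python
import itertools


def is_var(term):
    """Variables are strings starting with an uppercase letter (Prolog style)."""
    return isinstance(term, str) and term[:1].isupper()


def walk(term, subst):
    """Follow variable bindings until reaching a constant or an unbound var."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    """Unify two atoms (tuples); return an extended substitution or None."""
    if len(a) != len(b):
        return None
    subst = dict(subst)
    for x, y in zip(a, b):
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            subst[x] = y
        elif is_var(y):
            subst[y] = x
        else:
            return None
    return subst


_fresh = itertools.count()


def rename(rule):
    """Give a rule fresh variable names so recursive calls do not clash."""
    k = next(_fresh)
    head, body = rule

    def r(atom):
        return tuple(f"{t}_{k}" if is_var(t) else t for t in atom)
    return r(head), [r(atom) for atom in body]


def prove(goals, facts, rules, subst, depth=0, max_depth=25):
    """Backward chaining: yield every substitution that proves all goals."""
    if not goals:
        yield subst
        return
    if depth > max_depth:
        return
    first, rest = goals[0], goals[1:]
    for fact in facts:
        s = unify(first, fact, subst)
        if s is not None:
            yield from prove(rest, facts, rules, s, depth, max_depth)
    for rule in rules:
        head, body = rename(rule)
        s = unify(first, head, subst)
        if s is not None:
            # Subgoals are proved recursively; bindings in `s` propagate to them.
            yield from prove(body + rest, facts, rules, s, depth + 1, max_depth)


facts = [("parent", "alice", "bob"), ("parent", "bob", "carol")]
rules = [
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Y"), [("parent", "X", "Z"), ("ancestor", "Z", "Y")]),
]
answers = [walk("W", s) for s in prove([("ancestor", "alice", "W")], facts, rules, {})]
print(answers)  # ['bob', 'carol']
```

The second answer needs two levels of rule application, and the binding of `W` is threaded through the renamed rule variables. This is the recursion and binding propagation that the abstract says prompt-based chaining lacks.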

翻訳日:2024-08-05 18:23:26 公開日:2024-08-02

# 少数事例を用いた学習データの自動生成による文埋め込みの改善

Improving Sentence Embeddings with Automatic Generation of Training Data Using Few-shot Examples ( http://arxiv.org/abs/2402.15132v2 )

ライセンス: Link先を確認

Soma Sato, Hayato Tsukagoshi, Ryohei Sasano, Koichi Takeda,

(参考訳) デコーダベースの大規模言語モデル（LLM）は、自然言語処理の多くのタスクで高い性能を示している。これは文埋め込み学習にも当てはまり、デコーダベースのモデルであるPromptEOLは、意味的テキスト類似度（STS）タスクで最高の性能を達成した。しかし、PromptEOLはファインチューニングのために、人手で注釈付けされた自然言語推論（NLI）データセットを必要とする。我々は、LLMを用いてNLIデータセットを自動生成し、それをPromptEOLのファインチューニングに用いる。これにより、大規模な人手注釈データセットを用いずに文埋め込みを改善することを目指す。そのために本研究では、文埋め込み学習に適したデータ生成手法を探る。具体的には、few-shot学習による自動データセット生成に焦点を当て、少数事例を活用するための適切な方法を探る。STSタスクでの実験結果から、大規模な人手注釈データセットを用いない設定において、提案手法が既存のモデルを上回ることが示された。

Decoder-based large language models (LLMs) have shown high performance on many tasks in natural language processing. This is also true for sentence embedding learning, where a decoder-based model, PromptEOL, has achieved the best performance on semantic textual similarity (STS) tasks. However, PromptEOL requires a manually annotated natural language inference (NLI) dataset for fine-tuning. We aim to improve sentence embeddings without using large manually annotated datasets by automatically generating an NLI dataset with an LLM and using it to fine-tune PromptEOL. To achieve this, we explore data generation methods suitable for sentence embedding learning. Specifically, we focus on automatic dataset generation through few-shot learning and explore appropriate ways to leverage few-shot examples. Experimental results on the STS tasks demonstrate that our approach outperforms existing models in settings without large manually annotated datasets.

翻訳日:2024-08-05 18:23:26 公開日:2024-08-02

# 言語モデルのデータ選択に関する調査

A Survey on Data Selection for Language Models ( http://arxiv.org/abs/2402.16827v3 )

ライセンス: Link先を確認

Alon Albalak, Yanai Elazar, Sang Michael Xie, Shayne Longpre, Nathan Lambert, Xinyi Wang, Niklas Muennighoff, Bairu Hou, Liangming Pan, Haewon Jeong, Colin Raffel, Shiyu Chang, Tatsunori Hashimoto, William Yang Wang,

(参考訳) 近年の大規模言語モデルの成功の大きな要因は、教師なし事前学習に巨大で増え続けるテキストデータセットを用いていることである。しかし、利用可能なテキストデータの質はまちまちである。そのため、利用可能なすべてのデータで素朴にモデルを学習することは、最適ではない（あるいは実行可能ではない）可能性がある。データをフィルタリングすれば必要な学習量が減り、モデル学習のカーボンフットプリントと金銭的コストも削減できる。データ選択手法は、どの候補データ点を学習データセットに含めるか、そして選択したデータ点からどのように適切にサンプリングするかを決定することを目的とする。データ選択手法の改善への期待から、この分野の研究量は急速に拡大している。しかし、深層学習は主に経験的な証拠によって駆動され、大規模データでの実験には費用がかかる。そのため、大規模なデータ選択研究を行うリソースを持つ組織はほとんどない。その結果、効果的なデータ選択の実践に関する知識は少数の組織に集中しており、その多くは知見や手法を公開していない。この知識のギャップを狭めるため、データ選択手法と関連する研究分野に関する既存文献を包括的にレビューし、既存手法の分類体系を提示する。本研究は、現在の研究状況を概説することで、新規参入の研究者と既存の研究者の双方に入口を提供し、データ選択の進展を加速することを目指す。さらに、レビュー全体を通じて文献上の顕著な空白に注意を促し、有望な今後の研究の方向性を提案して締めくくる。

A major factor in the recent success of large language models is the use of enormous and ever-growing text datasets for unsupervised pre-training. However, naively training a model on all available data may not be optimal (or feasible), as the quality of available text data can vary. Filtering out data can also decrease the carbon footprint and financial costs of training models by reducing the amount of training required. Data selection methods aim to determine which candidate data points to include in the training dataset and how to appropriately sample from the selected data points. The promise of improved data selection methods has caused the volume of research in the area to rapidly expand. However, because deep learning is mostly driven by empirical evidence and experimentation on large-scale data is expensive, few organizations have the resources for extensive data selection research. Consequently, knowledge of effective data selection practices has become concentrated within a few organizations, many of which do not openly share their findings and methodologies. To narrow this gap in knowledge, we present a comprehensive review of existing literature on data selection methods and related research areas, providing a taxonomy of existing approaches. By describing the current landscape of research, this work aims to accelerate progress in data selection by establishing an entry point for new and established researchers. Additionally, throughout this review we draw attention to noticeable holes in the literature and conclude the paper by proposing promising avenues for future research.
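The "how to sample" half of data selection can be illustrated with temperature-based mixing across data sources, a heuristic commonly used in multilingual pre-training. It is shown here as an example only, not as a method from this survey. Source $i$ with $n_i$ documents is sampled with probability proportional to $n_i^{1/T}$. The minimal Python sketch below uses hypothetical source sizes.

```python
import numpy as np


def temperature_mixture(domain_sizes, temperature=1.0):
    """Sampling probability per data source: p_i proportional to n_i ** (1 / T).

    T = 1 samples in proportion to size; larger T flattens toward uniform.
    """
    sizes = np.asarray(domain_sizes, dtype=float)
    weights = sizes ** (1.0 / temperature)
    return weights / weights.sum()


def sample_sources(domain_sizes, n_samples, temperature=1.0, seed=0):
    """Draw the source index for each training example."""
    rng = np.random.default_rng(seed)
    p = temperature_mixture(domain_sizes, temperature)
    return rng.choice(len(p), size=n_samples, p=p)


print(temperature_mixture([100, 900], temperature=1.0))  # [0.1 0.9]
print(temperature_mixture([100, 900], temperature=2.0))  # [0.25 0.75]
```

With $T = 1$ the natural proportions are reproduced, while larger $T$ gives small sources more weight.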

翻訳日:2024-08-05 18:23:26 公開日:2024-08-02

# インコンテキスト学習におけるデュアルオペレーティングモード

Dual Operating Modes of In-Context Learning ( http://arxiv.org/abs/2402.18819v2 )

ライセンス: Link先を確認

Ziqian Lin, Kangwook Lee,

(参考訳) インコンテキスト学習（ICL）は2つの動作モードを示す。1つはタスク学習、すなわち文脈内のサンプルから新しいスキルを獲得するモードである。もう1つはタスク検索、すなわち関連する事前学習済みのスキルを特定して活性化するモードである。最近の理論研究ではICLを解析するための様々な数理モデルが検討されているが、既存のモデルは一度に1つの動作モードしか説明できない。本稿では、ICLの2つの動作モードを同時に説明できる確率モデルを導入する。線形関数のインコンテキスト学習に焦点を当て、複数のタスク群とタスク依存の入力分布を導入することで、事前学習データに関する既存のモデルを拡張する。次に、二乗損失のもとで最適に事前学習されたモデル、すなわち文脈内の例が与えられたときのラベルのMMSE推定量の振る舞いを解析する。事前学習のタスク分布を事前分布、文脈内の例を観測とみなして、タスクの事後分布の閉形式表現を導出する。この閉形式表現により、ICLの2つの動作モードを定量的に理解できる。さらに、実際に観察されているが説明されていなかった現象に光を当てる。それは、特定の設定下では文脈内の例が増えるにつれて、ICLのリスクが最初は増加し、その後減少するという現象である。我々のモデルは、この「早期上昇（early ascent）」現象にもっともらしい説明を与える。限られた数の文脈内サンプルでは誤ったスキルが検索される可能性があり、それによってリスクが増加する。しかし文脈内サンプルが増えてタスク学習が効果を発揮するにつれて、最終的にリスクは減少する。また、偏ったラベルを伴うICLも理論的に解析する。例えば、文脈内の例にランダムなラベルが割り当てられるゼロショットICLである。最後に、Transformerと大規模言語モデルを用いた実験により、我々の知見と予測を検証する。

In-context learning (ICL) exhibits dual operating modes: task learning, i.e., acquiring a new skill from in-context samples, and task retrieval, i.e., locating and activating a relevant pretrained skill. Recent theoretical work investigates various mathematical models to analyze ICL, but existing models explain only one operating mode at a time. We introduce a probabilistic model, with which one can explain the dual operating modes of ICL simultaneously. Focusing on in-context learning of linear functions, we extend existing models for pretraining data by introducing multiple task groups and task-dependent input distributions. We then analyze the behavior of the optimally pretrained model under the squared loss, i.e., the MMSE estimator of the label given in-context examples. Regarding pretraining task distribution as prior and in-context examples as the observation, we derive the closed-form expression of the task posterior distribution. With the closed-form expression, we obtain a quantitative understanding of the two operating modes of ICL. Furthermore, we shed light on an unexplained phenomenon observed in practice: under certain settings, the ICL risk initially increases and then decreases with more in-context examples. Our model offers a plausible explanation for this "early ascent" phenomenon: a limited number of in-context samples may lead to the retrieval of an incorrect skill, thereby increasing the risk, which will eventually diminish as task learning takes effect with more in-context samples. We also theoretically analyze ICL with biased labels, e.g., zero-shot ICL, where in-context examples are assigned random labels. Lastly, we validate our findings and predictions via experiments involving Transformers and large language models.
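The closed-form task posterior can be illustrated in a simplified setting. The Python sketch below makes three assumptions:

- each task group $k$ has prior weight $\pi_k$;
- each group has a Gaussian prior $w \sim \mathcal{N}(\mu_k, \tau_k^2 I)$;
- labels are noisy and linear, $y = Xw + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$.

Unlike the paper's model, it ignores task-dependent input distributions, so only the labels inform the group posterior. All names and numbers are illustrative.

```python
import numpy as np


def group_posterior_and_prediction(X, y, x_query, means, tau2, sigma2, weights):
    """Closed-form posterior over task groups and MMSE prediction.

    Assumed prior: group k with probability weights[k]; given k,
    w ~ N(means[k], tau2[k] * I). Observations: y = X w + N(0, sigma2 * I).
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    log_post, preds = [], []
    for mu, t2, pi in zip(means, tau2, weights):
        mu = np.asarray(mu, dtype=float)
        cov_y = t2 * X @ X.T + sigma2 * np.eye(n)   # marginal covariance of y
        resid = y - X @ mu
        alpha = np.linalg.solve(cov_y, resid)
        _, logdet = np.linalg.slogdet(cov_y)
        log_lik = -0.5 * (resid @ alpha + logdet + n * np.log(2 * np.pi))
        log_post.append(np.log(pi) + log_lik)
        w_mean = mu + t2 * X.T @ alpha               # E[w | y, group k]
        preds.append(float(np.asarray(x_query, dtype=float) @ w_mean))
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())         # stable normalization
    post /= post.sum()
    return post, float(post @ np.array(preds))


post, pred = group_posterior_and_prediction(
    X=[[1.0]], y=[2.0], x_query=[1.0],
    means=[[2.0], [-2.0]], tau2=[0.01, 0.01], sigma2=0.01, weights=[0.5, 0.5])
print(post.round(3), round(pred, 3))
```

Here the posterior concentrates on the matching group after a single example, because the groups are far apart and the priors are tight. With overlapping groups and few examples, the posterior can favour the wrong group. That retrieval failure is the mechanism behind the early-ascent explanation above.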

翻訳日:2024-08-05 18:23:26 公開日:2024-08-02

# ソフトウェアエンジニアリングの公平さを理解する - Stack Exchangeからの洞察

Understanding Fairness in Software Engineering: Insights from Stack Exchange ( http://arxiv.org/abs/2402.19038v3 )

ライセンス: Link先を確認

Emeralda Sesari, Federica Sarro, Ayushi Rastogi,

(参考訳) ソフトウェア実践者は、職場での問題について、対面やオンラインで同僚と議論する。こうした議論は技術的なもの（例：バグをどう修正するか）の場合もあれば、社会的なもの（例：仕事をどう公平に割り当てるか）の場合もある。ソフトウェア工学の人的・社会的要因における公平性の問題と解決策を探る知見は増えつつあるが、その多くは特定の問題に焦点を当ててきた。本研究は、Stack Exchangeサイト上のソフトウェア実践者による公平性に関する議論を調査する。本稿では、ソフトウェア実践者の公平性に関する経験と、ソフトウェアチームにおける公平性への期待を明らかにする探索的研究を示す。また、ソフトウェア実践者が最もよく話題にする公平性の側面を特定することも目指す。例えば、彼らは収入の公平性と職場での扱われ方のどちらをより気にしているのだろうか。8つのStack Exchangeサイトにおける公平性に関する議論を調査し、4,178件の候補投稿から手作業で選別した136件の投稿（質問28件、回答108件）のリストを得た。調査の結果、公平性に関する議論で最も多い話題（24件）は収入であった。これは、多くのソフトウェア実践者が給与とその公平な分配に強い関心を持っていることを示唆している。また、頻度は高くないものの、採用における公平性に関する議論は、最も多くの閲覧数とスコアを集める傾向があることにも気づいた。興味深いことに、不公平な経験は保護属性の範囲を超えて広がっていることがわかった。本研究では、136件の投稿のうち保護属性に言及したものは25件のみであり、主に性別が議論されていた。

Software practitioners discuss problems at work with peers, in person and online. These discussions can be technical (e.g., how to fix a bug?) or social (e.g., how to assign work fairly?). While there is a growing body of knowledge exploring fairness problems and solutions in the human and social factors of software engineering, most of it has focused on specific problems. This study examines fairness discussions by software practitioners on Stack Exchange sites. We present an exploratory study of software practitioners' fairness experiences and their fairness expectations in software teams. We also aim to identify the fairness aspects software practitioners talk about most. For example, do they care more about fairness in income or about how they are treated in the workplace? Our investigation of fairness discussions on eight Stack Exchange sites resulted in a list of 136 posts (28 questions and 108 answers) manually curated from 4,178 candidate posts. The study reveals that the largest share of fairness discussions (24 posts) revolves around income, suggesting that many software practitioners are highly interested in their pay and in whether it is fairly distributed. Further, we noted that while discussed less often, posts on fairness in recruitment tend to receive the highest number of views and scores. Interestingly, the study shows that experiences of unfairness extend beyond protected attributes: only 25 of the 136 posts mention protected attributes, mostly gender.

Translation date: 2024-08-05 18:23:26 Publication date: 2024-08-02

# Anisotropy-Induced Spin Parity Effects

Anisotropy-Induced Spin Parity Effects ( http://arxiv.org/abs/2402.19311v4 )

License: see the linked page

Shuntaro Sumita, Akihiro Tanaka, Yusuke Kato,

Spin parity effects refer to special situations in which the physical behavior of a system splits into two cases depending solely on whether the relevant spin quantum number is an integer or a half-odd integer. As with the Haldane conjecture in antiferromagnetic spin chains, their pursuit often yields deep insights and spurs new developments in quantum condensed matter physics. Here we put forth a simple and general scheme for generating such effects in any spatial dimension through anisotropic interactions, together with a setup within reasonable reach of state-of-the-art cold-atom implementations. We demonstrate its utility through a detailed analysis of the magnetization behavior of a specific one-dimensional spin chain model, an anisotropic antiferromagnet in a transverse magnetic field, and along the way unravel the quantum origin of finite-size effects in the magnetization curve that had previously been noted but not clearly understood.

Translation date: 2024-08-05 18:23:26 Publication date: 2024-08-02

# DISORF: A Distributed Online 3D Reconstruction Framework for Mobile Robots

DISORF: A Distributed Online 3D Reconstruction Framework for Mobile Robots ( http://arxiv.org/abs/2403.00228v3 )

License: see the linked page

Chunlin Li, Hanrui Fan, Xiaorui Huang, Ruofan Liang, Sankeerth Durvasula, Nandita Vijaykumar,

We present a framework, DISORF, to enable online 3D reconstruction and visualization of scenes captured by resource-constrained mobile robots and edge devices. To address the limited computing capabilities of edge devices and potentially limited network availability, we design a framework that efficiently distributes computation between the edge device and the remote server. We leverage on-device SLAM systems to generate posed keyframes and transmit them to remote servers that can perform high-quality 3D reconstruction and visualization at runtime by leveraging recent advances in neural 3D methods. We identify a key challenge with online training where naive image sampling strategies can lead to significant degradation in rendering quality. We propose a novel shifted exponential frame sampling method that addresses this challenge for online training. We demonstrate the effectiveness of our framework in enabling high-quality real-time reconstruction and visualization of unknown scenes as they are captured and streamed from cameras in mobile robots and edge devices.
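
The abstract does not give the form of the shifted exponential frame sampling. The following is therefore only a hypothetical illustration of recency-biased keyframe sampling for online training. It assumes that the newest `shift + 1` keyframes share the highest sampling weight and that older keyframes decay exponentially with age at an assumed `rate`. All names and parameters are illustrative, not taken from DISORF.

```python
import numpy as np


def recency_weights(num_frames, rate=0.1, shift=5):
    """Hypothetical sampling distribution over keyframes 0..num_frames-1.

    Keyframes whose age is at most `shift` get weight 1; older ones decay as
    exp(-rate * (age - shift)). Age 0 is the most recently received keyframe.
    """
    ages = (num_frames - 1) - np.arange(num_frames)
    weights = np.exp(-rate * np.maximum(0, ages - shift))
    return weights / weights.sum()


def sample_keyframes(num_frames, batch_size, rng, rate=0.1, shift=5):
    """Draw keyframe indices (with replacement) for one training step."""
    probs = recency_weights(num_frames, rate, shift)
    return rng.choice(num_frames, size=batch_size, p=probs)
```

Because every keyframe keeps a nonzero weight, older views are still revisited, only less often than recent ones.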

Translation date: 2024-08-05 18:23:26 Publication date: 2024-08-02

# A Tutorial on the Pretrain-Finetune Paradigm for Natural Language Processing

A Tutorial on the Pretrain-Finetune Paradigm for Natural Language Processing ( http://arxiv.org/abs/2403.02504v3 )

License: see the linked page

Yu Wang, Wen Qu,

Given that natural language serves as the primary conduit for expressing thoughts and emotions, text analysis has become a key technique in psychological research. It enables the extraction of valuable insights from natural language, facilitating endeavors like personality trait assessment, mental health monitoring, and sentiment analysis in interpersonal communications. In text analysis, existing studies often resort to human coding, which is time-consuming; pre-built dictionaries, which often fail to cover all possible scenarios; or models trained from scratch, which require large amounts of labeled data. In this tutorial, we introduce the pretrain-finetune paradigm, a transformative approach in text analysis and natural language processing. The paradigm relies on large pretrained language models, which can be finetuned remarkably efficiently even with limited training data. This efficiency is especially beneficial for research in the social sciences, where the number of annotated samples is often quite limited. Our tutorial offers a comprehensive introduction to the pretrain-finetune paradigm. We first delve into the fundamental concepts of pretraining and finetuning, followed by practical exercises using real-world applications. We demonstrate the application of the paradigm across various tasks, including multi-class classification and regression. Emphasizing its efficacy and user-friendliness, the tutorial aims to encourage broader adoption of this paradigm. To this end, we have provided open access to all our code and datasets. The tutorial is highly beneficial across various psychology disciplines, providing a comprehensive guide to employing text analysis in diverse research settings.

Translation date: 2024-08-05 18:23:26 Publication date: 2024-08-02

# Adiabatic Bottlenecks in Quantum Annealing and Nonequilibrium Dynamics of Paramagnons

Adiabatic Bottlenecks in Quantum Annealing and Nonequilibrium Dynamics of Paramagnons ( http://arxiv.org/abs/2403.11548v2 )

License: see the linked page

Tim Bode, Frank K. Wilhelm,

The correspondence between long-range interacting quantum spin glasses and combinatorial optimization problems underpins the physical motivation for adiabatic quantum computing. On one hand, in disordered (quantum) spin systems, the focus is on exact methods such as the replica trick that allow the calculation of system quantities in the limit of infinite system and ensemble size. On the other hand, when solving a given instance of an optimization problem, disorder-averaged quantities are of no relevance, as one is solely interested in instance-specific, finite-size properties, in particular the true solution. Here, we apply the nonequilibrium Green-function formalism to the spin coherent-state path integral to obtain the statistical fluctuations and the collective-excitation spectrum along the annealing path. For the example of the quantum Sherrington-Kirkpatrick spin glass, by comparing to extensive numerically exact results, we show that this method provides access to the instance-specific bottlenecks of the annealing protocol.

Translation date: 2024-08-05 18:13:29 Publication date: 2024-08-02

# A multi-criteria approach for selecting an explanation from the set of counterfactuals produced by an ensemble of explainers

A multi-criteria approach for selecting an explanation from the set of counterfactuals produced by an ensemble of explainers ( http://arxiv.org/abs/2403.13940v2 )

License: see the linked page

Ignacy Stępka, Mateusz Lango, Jerzy Stefanowski,

Counterfactuals are widely used to explain ML model predictions by providing alternative scenarios that would yield a more desired prediction. They can be generated by a variety of methods that optimize different, sometimes conflicting, quality measures and produce quite different solutions. However, choosing the most appropriate explanation method, and one of its generated counterfactuals, is not an easy task. Instead of forcing the user to test many different explanation methods and analyse conflicting solutions, in this paper we propose a multi-stage ensemble approach that selects a single counterfactual based on multiple-criteria analysis. It offers a compromise solution that scores well on several popular quality measures. The approach exploits the dominance relation and the ideal point decision aid method, which selects one counterfactual from the Pareto front. The experiments demonstrate that the proposed approach generates fully actionable counterfactuals with attractive compromise values of the considered quality measures.
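
A minimal sketch of the selection step follows. It assumes that every quality measure is expressed so that lower is better (for example, distance to the original instance or number of changed features). It also assumes that candidates are compared to the ideal point under min-max normalization with Euclidean distance. The paper's exact measures, normalization, and distance may differ, and the function names are hypothetical.

```python
import numpy as np


def pareto_front(scores):
    """Indices of non-dominated rows; every criterion is to be minimized."""
    n = len(scores)
    front = []
    for i in range(n):
        # Row j dominates row i if it is no worse everywhere and better somewhere.
        dominated = any(
            np.all(scores[j] <= scores[i]) and np.any(scores[j] < scores[i])
            for j in range(n) if j != i
        )
        if not dominated:
            front.append(i)
    return front


def select_by_ideal_point(scores):
    """Pick the Pareto-optimal candidate closest to the ideal point."""
    front = pareto_front(scores)
    F = scores[front]
    lo, hi = F.min(axis=0), F.max(axis=0)   # lo is the ideal point
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    normalized = (F - lo) / span            # ideal point maps to the origin
    distances = np.linalg.norm(normalized, axis=1)
    return front[int(np.argmin(distances))]
```

For the scores `[[1, 5], [2, 2], [5, 1], [3, 3]]`, the candidate `[3, 3]` is dominated by `[2, 2]`, and `[2, 2]` is the front member closest to the ideal point `[1, 1]`.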

Translation date: 2024-08-05 18:13:29 Publication date: 2024-08-02

# An Optimization Framework to Enforce Multi-View Consistency for Texturing 3D Meshes

An Optimization Framework to Enforce Multi-View Consistency for Texturing 3D Meshes ( http://arxiv.org/abs/2403.15559v2 )

License: see the linked page

Zhengyi Zhao, Chen Song, Xiaodong Gu, Yuan Dong, Qi Zuo, Weihao Yuan, Liefeng Bo, Zilong Dong, Qixing Huang,

A fundamental problem in the texturing of 3D meshes using pre-trained text-to-image models is to ensure multi-view consistency. State-of-the-art approaches typically use diffusion models to aggregate multi-view inputs, where common issues are the blurriness caused by the averaging operation in the aggregation step or inconsistencies in local features. This paper introduces an optimization framework that proceeds in four stages to achieve multi-view consistency. Specifically, the first stage generates an over-complete set of 2D textures from a predefined set of viewpoints using an MV-consistent diffusion process. The second stage selects a subset of views that are mutually consistent while covering the underlying 3D model. We show how to achieve this goal by solving semi-definite programs. The third stage performs non-rigid alignment to align the selected views across overlapping regions. The fourth stage solves an MRF problem to associate each mesh face with a selected view. In particular, the third and fourth stages are iterated, with the cuts obtained in the fourth stage encouraging non-rigid alignment in the third stage to focus on regions close to the cuts. Experimental results show that our approach significantly outperforms baseline approaches both qualitatively and quantitatively. Project page: https://aigc3d.github.io/ConsistenTex.

Translation date: 2024-08-05 18:13:29 Publication date: 2024-08-02

# Automated System-level Testing of Unmanned Aerial Systems

Automated System-level Testing of Unmanned Aerial Systems ( http://arxiv.org/abs/2403.15857v2 )

License: see the linked page

Hassan Sartaj, Asmar Muqeet, Muhammad Zohaib Iqbal, Muhammad Uzair Khan,

Unmanned aerial systems (UAS) rely on various avionics systems that are safety-critical and mission-critical. A major requirement of international safety standards is to perform rigorous system-level testing of avionics software systems. The current industrial practice is to manually create test scenarios, manually/automatically execute these scenarios using simulators, and manually evaluate outcomes. The test scenarios typically consist of setting certain flight or environment conditions and testing the system under test in these settings. The state-of-the-art approaches for this purpose also require manual test scenario development and evaluation. In this paper, we propose a novel approach to automate the system-level testing of the UAS. The proposed approach (AITester) utilizes model-based testing and artificial intelligence (AI) techniques to automatically generate, execute, and evaluate various test scenarios. The test scenarios are generated on the fly, i.e., during test execution based on the environmental context at runtime. The approach is supported by a toolset. We empirically evaluate the proposed approach on two core components of UAS, an autopilot system of an unmanned aerial vehicle (UAV) and cockpit display systems (CDS) of the ground control station (GCS). The results show that the AITester effectively generates test scenarios causing deviations from the expected behavior of the UAV autopilot and reveals potential flaws in the GCS-CDS.

Translation date: 2024-08-05 18:13:29 Publication date: 2024-08-02

# DASA: Delay-Adaptive Multi-Agent Stochastic Approximation

DASA: Delay-Adaptive Multi-Agent Stochastic Approximation ( http://arxiv.org/abs/2403.17247v3 )

License: see the linked page

Nicolò Dal Fabbro, Arman Adibi, H. Vincent Poor, Sanjeev R. Kulkarni, Aritra Mitra, George J. Pappas,

We consider a setting in which $N$ agents aim to speed up a common Stochastic Approximation (SA) problem by acting in parallel and communicating with a central server. We assume that the uplink transmissions to the server are subject to asynchronous and potentially unbounded time-varying delays. To mitigate the effect of delays and stragglers while reaping the benefits of distributed computation, we propose \texttt{DASA}, a Delay-Adaptive algorithm for multi-agent Stochastic Approximation. We provide a finite-time analysis of \texttt{DASA} assuming that the agents' stochastic observation processes are independent Markov chains. Significantly advancing existing results, \texttt{DASA} is the first algorithm whose convergence rate depends only on the mixing time $\tau_{mix}$ and on the average delay $\tau_{avg}$ while jointly achieving an $N$-fold convergence speedup under Markovian sampling. Our work is relevant for various SA applications, including multi-agent and distributed temporal difference (TD) learning, Q-learning, and stochastic optimization with correlated data.

Translation date: 2024-08-05 18:13:29 Publication date: 2024-08-02

# Learning Visual Quadrupedal Loco-Manipulation from Demonstrations

Learning Visual Quadrupedal Loco-Manipulation from Demonstrations ( http://arxiv.org/abs/2403.20328v2 )

License: see the linked page

Zhengmao He, Kun Lei, Yanjie Ze, Koushil Sreenath, Zhongyu Li, Huazhe Xu,

Quadruped robots are progressively being integrated into human environments. Despite the growing locomotion capabilities of quadrupedal robots, their interaction with objects in realistic scenes is still limited. While additional robotic arms on quadrupedal robots enable manipulating objects, they are sometimes redundant given that a quadruped robot is essentially a mobile unit equipped with four limbs, each possessing 3 degrees of freedom (DoFs). Hence, we aim to empower a quadruped robot to execute real-world manipulation tasks using only its legs. We decompose the loco-manipulation process into a low-level reinforcement learning (RL)-based controller and a high-level Behavior Cloning (BC)-based planner. By parameterizing the manipulation trajectory, we synchronize the efforts of the upper and lower layers, thereby leveraging the advantages of both RL and BC. Our approach is validated through simulations and real-world experiments, demonstrating the robot's ability to perform tasks that demand mobility and high precision, such as lifting a basket from the ground while moving, closing a dishwasher, pressing a button, and pushing a door. Project website: https://zhengmaohe.github.io/leg-manip

Translation date: 2024-08-05 18:13:29 Publication date: 2024-08-02

# Deep Reinforcement Learning for Traveling Purchaser Problems

Deep Reinforcement Learning for Traveling Purchaser Problems ( http://arxiv.org/abs/2404.02476v3 )

License: see the linked page

Haofeng Yuan, Rongping Zhu, Wanlu Yang, Shiji Song, Keyou You, Yuli Zhang, C. L. Philip Chen,

The traveling purchaser problem (TPP) is an important combinatorial optimization problem with broad applications. Due to the coupling between routing and purchasing, existing works on TPPs commonly address route construction and purchase planning simultaneously, which leads to exact methods with high computational cost and to heuristics with sophisticated design but limited performance. In sharp contrast, we propose a novel approach based on deep reinforcement learning (DRL), which addresses route construction and purchase planning separately, while evaluating and optimizing the solution from a global perspective. The key components of our approach include a bipartite graph representation for TPPs to capture the market-product relations, and a policy network that extracts information from the bipartite graph and uses it to sequentially construct the route. One significant benefit of our framework is that the route can be constructed efficiently using the policy network; once the route is determined, the associated purchasing plan can be easily derived through linear programming, and, by leveraging DRL, the policy network can be trained to optimize the global solution objective. Furthermore, by introducing a meta-learning strategy, the policy network can be trained stably on large-sized TPP instances and generalizes well across instances of varying sizes and distributions, even to much larger instances that are never seen during training. Experiments on various synthetic TPP instances and the TPPLIB benchmark demonstrate that our DRL-based approach can significantly outperform well-established TPP heuristics, reducing the optimality gap by 40%-90%, and also shows an advantage in runtime, especially on large-sized instances.
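
The following sketch illustrates why the purchasing plan is easy to derive once the route is fixed. With the route given, the purchasing problem separates by product. Each per-product linear program minimizes total price subject to meeting the demand without exceeding any market's supply, and buying from the cheapest markets on the route first solves it exactly. The sketch uses this greedy rule in place of a general LP solver. The data layout (market ids, offer dictionaries) is hypothetical, and the code assumes each market appears at most once in the route.

```python
def purchase_plan(route, demand, offers):
    """Cheapest purchase plan for a fixed route.

    route:  list of market ids visited, each at most once
    demand: {product: quantity required}
    offers: {(market, product): (unit_price, available_qty)}
    Returns (plan, cost), or None if the route cannot satisfy the demand.
    """
    plan, cost = {}, 0.0
    for product, need in demand.items():
        # Offers for this product from markets on the route, cheapest first.
        options = sorted(
            (offers[(m, product)][0], offers[(m, product)][1], m)
            for m in route if (m, product) in offers
        )
        remaining = need
        for price, qty, market in options:
            if remaining <= 0:
                break
            take = min(qty, remaining)
            plan[(market, product)] = take
            cost += take * price
            remaining -= take
        if remaining > 0:
            return None  # infeasible: supply on this route is insufficient
    return plan, cost
```

Markets that are not on the route are ignored even if they are cheaper, which is how the route choice feeds into the purchasing cost.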

Translation date: 2024-08-05 18:13:29 Publication date: 2024-08-02

# QDsim: A user-friendly toolbox for simulating large-scale quantum dot devices

QDsim: A user-friendly toolbox for simulating large-scale quantum dot devices ( http://arxiv.org/abs/2404.02712v2 )

License: see the linked page

Valentina Gualtieri, Charles Renshaw-Whitman, Vinicius Hernandes, Eliska Greplova,

We introduce QDsim, a Python package tailored for the rapid generation of charge stability diagrams in large-scale quantum dot devices, extending beyond traditional double or triple dots. QDsim is founded on the constant interaction model, from which we rephrase the task of finding the lowest-energy charge configuration as a convex optimization problem. We can therefore leverage the existing package CVXPY, in combination with an appropriate powerful solver, for the convex optimization, which streamlines the creation of stability diagrams and polytopes. Through multiple examples, we demonstrate how QDsim enables the generation of large-scale datasets that can serve as a basis for training machine-learning models for automated tuning algorithms. While the package currently does not support quantum effects beyond the constant interaction model, QDsim directly addresses the critical need for cost-effective and expeditious data acquisition for better tuning algorithms, in order to accelerate the development of semiconductor quantum devices.
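
For intuition about the constant interaction model, the sketch below finds the lowest-energy charge configuration of a small device by brute-force enumeration. It uses one common form of the electrostatic energy, in units where the elementary charge is 1 and up to sign conventions: $E(\mathbf N) = \tfrac12 (\mathbf N - C_g \mathbf V)^\top C^{-1} (\mathbf N - C_g \mathbf V)$. Here $C$ is the dot capacitance matrix and $C_g$ the dot-gate capacitance matrix. Enumeration only scales to a few dots; QDsim instead poses the search as a convex optimization solved with CVXPY. None of the names below are QDsim's API.

```python
import itertools

import numpy as np


def ground_state(C, Cg, V, n_max=3):
    """Lowest-energy electron numbers (N_1, ..., N_d) for gate voltages V.

    Energy (units with e = 1): E(N) = 0.5 * (N - Cg V)^T C^{-1} (N - Cg V).
    Enumerates every N with 0 <= N_i <= n_max, so it is only viable for a few dots.
    """
    C_inv = np.linalg.inv(C)
    induced = Cg @ V  # gate-induced charge on each dot
    best, best_energy = None, np.inf
    for N in itertools.product(range(n_max + 1), repeat=C.shape[0]):
        diff = np.array(N, dtype=float) - induced
        energy = 0.5 * diff @ C_inv @ diff
        if energy < best_energy:
            best, best_energy = N, energy
    return best


def stability_diagram(C, Cg, v1_values, v2_values, n_max=3):
    """Total charge on a grid of two gate voltages (a crude double-dot diagram)."""
    return np.array([[sum(ground_state(C, Cg, np.array([v1, v2]), n_max))
                      for v1 in v1_values] for v2 in v2_values])
```

For a double dot, `C = [[1.0, -0.1], [-0.1, 1.0]]` (negative off-diagonal mutual capacitance) and `Cg = I` give the configuration `(2, 1)` at `V = (2, 1)`.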

Translation date: 2024-08-05 18:13:29 Publication date: 2024-08-02

# Generalization Gap in Data Augmentation: Insights from Illumination

Generalization Gap in Data Augmentation: Insights from Illumination ( http://arxiv.org/abs/2404.07514v2 )

License: see the linked page

Jianqiang Xiao, Weiwen Guo, Junfeng Liu, Mengze Li,

In the field of computer vision, data augmentation is widely used to enrich the feature complexity of training datasets for deep learning. However, regarding the generalization capabilities of models, the difference between the artificial features generated by data augmentation and natural visual features has not been fully revealed. This study introduces the concept of "visual representation variables" to define the possible visual variations in a task as a joint distribution of these variables. We focus on the visual representation variable "illumination", simulating its distribution degradation and examining how data augmentation techniques enhance model performance on a classification task. Our goal is to investigate the differences in generalization between models trained with augmented data and those trained under real-world illumination conditions. Results indicate that applying various data augmentation methods significantly improves model performance. Yet a noticeable generalization gap remains even with these methods, emphasizing the critical role of feature diversity in the training set for enhancing model generalization.
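
The sketch below is a concrete example of the kind of synthetic illumination change that augmentation introduces. It applies a random global brightness gain and gamma curve to an image with values in [0, 1]. The ranges are arbitrary assumptions, and this is a generic photometric augmentation, not any specific method evaluated in the paper.

```python
import numpy as np


def illumination_jitter(img, rng, gain_range=(0.5, 1.5), gamma_range=(0.7, 1.4)):
    """Random global brightness gain and gamma for an image with values in [0, 1]."""
    gain = rng.uniform(*gain_range)
    gamma = rng.uniform(*gamma_range)
    # Gamma reshapes the tone curve; gain scales overall brightness.
    out = np.clip(img, 0.0, 1.0) ** gamma * gain
    return np.clip(out, 0.0, 1.0)  # keep the result a valid image
```

Setting both ranges to `(1.0, 1.0)` returns the input unchanged, which is a quick sanity check for the transform.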

Translation date: 2024-08-05 18:13:29 Publication date: 2024-08-02

# Comprehensive Library of Variational LSE Solvers

Comprehensive Library of Variational LSE Solvers ( http://arxiv.org/abs/2404.09916v2 )

License: see the linked page

Nico Meyer, Martin Röhn, Jakob Murauer, Axel Plinge, Christopher Mutschler, Daniel D. Scherer,

Linear systems of equations can be found in various mathematical domains, as well as in the field of machine learning. By employing noisy intermediate-scale quantum devices, variational solvers promise to accelerate finding solutions for large systems. Although there is a wealth of theoretical research on these algorithms, only fragmentary implementations exist. To fill this gap, we have developed the variational-lse-solver framework, which realizes existing approaches in literature, and introduces several enhancements. The user-friendly interface is designed for researchers that work at the abstraction level of identifying and developing end-to-end applications.
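
To make the variational idea concrete, the sketch below simulates it classically for a 2 x 2 system. A one-parameter ansatz prepares a normalized state $|x(\theta)\rangle$, and $\theta$ is chosen to minimize the global cost $C(\theta) = 1 - |\langle b|A|x(\theta)\rangle|^2 / (\langle x(\theta)|A^\dagger A|x(\theta)\rangle \langle b|b\rangle)$, which is zero exactly when $A|x\rangle$ is parallel to $|b\rangle$. On quantum hardware these overlaps would be estimated with circuits. The ansatz, the grid-search optimizer, and all names are illustrative assumptions, not the variational-lse-solver API.

```python
import numpy as np


def global_cost(x, A, b):
    """1 - |<b|A x>|^2 / (||A x||^2 ||b||^2); zero iff A x is parallel to b."""
    Ax = A @ x
    overlap = np.abs(np.vdot(b, Ax)) ** 2
    return 1.0 - overlap / (np.vdot(Ax, Ax).real * np.vdot(b, b).real)


def ansatz(theta):
    """Single-parameter real 'ansatz' for a normalized 2-dimensional state."""
    return np.array([np.cos(theta), np.sin(theta)])


def solve_variational(A, b, num_points=20001):
    """Grid search over theta in [0, pi], which covers all directions up to sign."""
    thetas = np.linspace(0.0, np.pi, num_points)
    costs = [global_cost(ansatz(t), A, b) for t in thetas]
    return ansatz(thetas[int(np.argmin(costs))])
```

The result matches the normalized classical solution `np.linalg.solve(A, b)` up to a global sign, which is all a normalized quantum state can encode.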

Translation date: 2024-08-05 18:13:29 Publication date: 2024-08-02

# SPIdepth: Strengthened Pose Information for Self-supervised Monocular Depth Estimation

SPIdepth: Strengthened Pose Information for Self-supervised Monocular Depth Estimation ( http://arxiv.org/abs/2404.12501v2 )

License: see the linked page

Mykola Lavreniuk,

Self-supervised monocular depth estimation has garnered considerable attention for its applications in autonomous driving and robotics. While recent methods have made strides in leveraging techniques like the Self Query Layer (SQL) to infer depth from motion, they often overlook the potential of strengthening pose information. In this paper, we introduce SPIdepth, a novel approach that prioritizes enhancing the pose network for improved depth estimation. Building upon the foundation laid by SQL, SPIdepth emphasizes the importance of pose information in capturing fine-grained scene structures. By enhancing the pose network's capabilities, SPIdepth achieves remarkable advancements in scene understanding and depth estimation. Experimental results on benchmark datasets such as KITTI, Cityscapes, and Make3D showcase SPIdepth's state-of-the-art performance, surpassing previous methods by significant margins. Specifically, SPIdepth tops the self-supervised KITTI benchmark. Additionally, SPIdepth achieves the lowest AbsRel (0.029), SqRel (0.069), and RMSE (1.394) on KITTI, establishing new state-of-the-art results. On Cityscapes, SPIdepth improves over SQLdepth by 21.7% in AbsRel, 36.8% in SqRel, and 16.5% in RMSE, even without using motion masks. On Make3D, SPIdepth outperforms all other models in the zero-shot setting. Remarkably, SPIdepth achieves these results using only a single image for inference, surpassing even methods that utilize video sequences for inference, which demonstrates its efficacy and efficiency for real-world applications. Our approach represents a significant leap forward in self-supervised monocular depth estimation, underscoring the importance of strengthening pose information for advancing scene understanding in real-world applications.
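
For reference, the error metrics quoted above are commonly defined over pixels with valid ground truth $d^*$ and prediction $d$ as follows: AbsRel $= \mathrm{mean}(|d - d^*| / d^*)$, SqRel $= \mathrm{mean}((d - d^*)^2 / d^*)$, and RMSE $= \sqrt{\mathrm{mean}((d - d^*)^2)}$. Lower is better for all three. The sketch below computes them. The optional per-image median scaling reflects a convention common in self-supervised evaluation, where predictions are only defined up to scale. Benchmark protocols typically also apply depth capping and cropping, which are left out here. The function name is hypothetical.

```python
import numpy as np


def depth_errors(pred, gt, median_scale=False):
    """AbsRel, SqRel and RMSE over pixels with valid ground truth (gt > 0)."""
    mask = gt > 0
    p, g = pred[mask], gt[mask]
    if median_scale:
        # Resolve the unknown global scale of self-supervised predictions.
        p = p * np.median(g) / np.median(p)
    abs_rel = np.mean(np.abs(p - g) / g)
    sq_rel = np.mean((p - g) ** 2 / g)
    rmse = np.sqrt(np.mean((p - g) ** 2))
    return abs_rel, sq_rel, rmse
```

For example, ground truth `[2, 4]` with predictions `[1, 5]` gives AbsRel = SqRel = 0.375 and RMSE = 1.0.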

Translation date: 2024-08-05 18:13:29 Publication date: 2024-08-02

# Breaching the Bottleneck: Evolutionary Transition from Reward-Driven Learning to Reward-Agnostic Domain-Adapted Learning in Neuromodulated Neural Nets

Breaching the Bottleneck: Evolutionary Transition from Reward-Driven Learning to Reward-Agnostic Domain-Adapted Learning in Neuromodulated Neural Nets ( http://arxiv.org/abs/2404.12631v2 )

License: see the linked page

Solvi Arnold, Reiji Suzuki, Takaya Arita, Kimitoshi Yamazaki,

Advanced biological intelligence learns efficiently from an information-rich stream of stimulus information, even when feedback on behaviour quality is sparse or absent. Such learning exploits implicit assumptions about task domains. We refer to such learning as Domain-Adapted Learning (DAL). In contrast, AI learning algorithms rely on explicit externally provided measures of behaviour quality to acquire fit behaviour. This imposes an information bottleneck that precludes learning from diverse non-reward stimulus information, limiting learning efficiency. We consider the question of how biological evolution circumvents this bottleneck to produce DAL. We propose that species first evolve the ability to learn from reward signals, providing inefficient (bottlenecked) but broad adaptivity. From there, integration of non-reward information into the learning process can proceed via gradual accumulation of biases induced by such information on specific task domains. This scenario provides a biologically plausible pathway towards bottleneck-free, domain-adapted learning. Focusing on the second phase of this scenario, we set up a population of NNs with reward-driven learning modelled as Reinforcement Learning (A2C), and allow evolution to improve learning efficiency by integrating non-reward information into the learning process using a neuromodulatory update mechanism. On a navigation task in continuous 2D space, evolved DAL agents show a 300-fold increase in learning speed compared to pure RL agents. Evolution is found to eliminate reliance on reward information altogether, allowing DAL agents to learn from non-reward information exclusively, using local neuromodulation-based connection weight updates only. Code available at github.com/aislab/dal.
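
The abstract does not specify the evolved update mechanism. As a generic illustration of a local, neuromodulation-based weight update, the sketch implements a standard three-factor Hebbian rule, $\Delta W_{ij} = \eta\, m_i\, \mathrm{post}_i\, \mathrm{pre}_j$. In this rule, a modulatory signal $m$ gates the correlation between pre- and postsynaptic activity. Here the modulatory signal is simply passed in as an argument, and the names are hypothetical.

```python
import numpy as np


def three_factor_update(W, pre, post, modulation, lr=0.01):
    """Return W + lr * outer(modulation * post, pre).

    W:          (n_post, n_pre) connection weights
    pre, post:  presynaptic (n_pre,) and postsynaptic (n_post,) activities
    modulation: scalar or (n_post,) neuromodulatory signal gating plasticity
    """
    return W + lr * np.outer(modulation * post, pre)
```

With zero modulation the weights do not change, so the modulatory signal decides when and where plasticity happens. This is what lets learning proceed without an external reward signal.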

Translated: 2024-08-05 18:13:29, Published: 2024-08-02

# Motion-aware Latent Diffusion Models for Video Frame Interpolation

Motion-aware Latent Diffusion Models for Video Frame Interpolation ( http://arxiv.org/abs/2404.13534v3 )

License: check the link

Zhilin Huang, Yijie Yu, Ling Yang, Chujun Qin, Bing Zheng, Xiawu Zheng, Zikun Zhou, Yaowei Wang, Wenming Yang,

With the advancement of AI-generated content (AIGC), video frame interpolation (VFI) has become a crucial component in existing video generation frameworks, attracting widespread research interest. For the VFI task, motion estimation between neighboring frames plays a crucial role in avoiding motion ambiguity. However, existing VFI methods struggle to accurately predict the motion information between consecutive frames, and this imprecise estimation leads to blurred and visually incoherent interpolated frames. In this paper, we propose a novel diffusion framework, motion-aware latent diffusion models (MADiff), which is specifically designed for the VFI task. By incorporating motion priors between the conditional neighboring frames and the target interpolated frame predicted throughout the diffusion sampling procedure, MADiff progressively refines the intermediate outcomes, ultimately producing results that are both visually smooth and realistic. Extensive experiments conducted on benchmark datasets demonstrate that our method achieves state-of-the-art performance, significantly outperforming existing approaches, especially in challenging scenarios involving dynamic textures with complex motion.

Translated: 2024-08-05 18:13:29, Published: 2024-08-02

# Nonlocal order parameter of pair superfluids

Nonlocal order parameter of pair superfluids ( http://arxiv.org/abs/2404.15972v3 )

License: check the link

Nitya Cuzzuol, Arianna Montorsi, Luca Barbiero,

Order parameters represent a fundamental resource to characterize quantum matter. We show that pair superfluids can be rigorously defined in terms of a nonlocal order parameter, named odd parity, whose derivation is experimentally accessible through local density measurements. As a case study, we first investigate a constrained Bose-Hubbard model at different densities, in both one and two spatial dimensions. Here, our analysis finds pair superfluidity for relatively strong attractive interactions. The odd parity operator acts as the unique order parameter for this phase, irrespective of the density of the system and its dimensionality. To reinforce our finding, we also confirm the generality of our approach on a two-component Bose-Hubbard Hamiltonian, whose experimental realization is a timely topic in ultracold atomic systems. Our results shed new light on the role of correlated density fluctuations in pair superfluids. In addition, they provide a powerful tool for the experimental detection of such exotic phases and the characterization of their transition to the atomic superfluid phase.
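The abstract stresses that the odd-parity order parameter can be obtained from local density measurements. The sketch below shows one way this can work, under stated assumptions. It estimates a generic nonlocal string-parity expectation, <prod_{j in R} (-1)^{n_j}>, from site-resolved occupation snapshots such as those produced by a quantum gas microscope. The abstract does not give the exact definition of the paper's "odd parity" operator, and it may differ from this generic form. The function name and data layout are hypothetical.

```python
import numpy as np


def string_parity(snapshots, region):
    """Estimate <prod_{j in region} (-1)^{n_j}> from occupation-number snapshots.

    snapshots: (n_shots, n_sites) integer array of measured site occupations.
    region: indices of the sites forming the (nonlocal) string.
    Each shot contributes +1 if the region's total occupation is even, -1 if odd.
    """
    snaps = np.asarray(snapshots)
    counts = snaps[:, region].sum(axis=1)
    return float(np.mean(np.where(counts % 2 == 0, 1.0, -1.0)))


paired = [[2, 0, 0, 2], [0, 2, 2, 0]]    # bosons bound in pairs: region occupations even
unpaired = [[1, 0, 1, 0], [0, 1, 0, 0]]  # single bosons: region occupation odd here
print(string_parity(paired, [0, 1, 2]))  # 1.0
print(string_parity(unpaired, [0, 1]))   # -1.0
```

Each snapshot only requires measuring local occupations. The nonlocal character comes entirely from multiplying their parities over an extended region.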

Translated: 2024-08-05 18:03:40, Published: 2024-08-02

# LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation

LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation ( http://arxiv.org/abs/2404.16054v2 )

License: check the link

Li Zhang, Shihe Wang, Xianqing Jia, Zhihan Zheng, Yunhe Yan, Longxi Gao, Yuanchun Li, Mengwei Xu,

The emergent large language/multimodal models facilitate the evolution of mobile agents, especially in mobile UI task automation. However, existing evaluation approaches, which rely on human validation or established datasets to compare agent-predicted actions with predefined action sequences, are unscalable and unfaithful. To overcome these limitations, this paper presents LlamaTouch, a testbed for on-device mobile UI task execution and faithful, scalable task evaluation. By observing that the task execution process only transfers UI states, LlamaTouch employs a novel evaluation approach that only assesses whether an agent traverses all manually annotated, essential application/system states. LlamaTouch comprises three key techniques: (1) On-device task execution that enables mobile agents to interact with realistic mobile environments for task execution. (2) Fine-grained UI component annotation that merges pixel-level screenshots and textual screen hierarchies to explicitly identify and precisely annotate essential UI components with a rich set of designed annotation primitives. (3) A multi-level application state matching algorithm that utilizes exact and fuzzy matching to accurately detect critical information in each screen, even with unpredictable UI layout/content dynamics. LlamaTouch currently incorporates four mobile agents and 496 tasks, encompassing both tasks in the widely-used datasets and our self-constructed ones to cover more diverse mobile applications. Evaluation results demonstrate LlamaTouch's high faithfulness of evaluation in real-world mobile environments and its better scalability than human validation. LlamaTouch also enables easy task annotation and integration of new mobile agents. Code and dataset are publicly available at https://github.com/LlamaTouch/LlamaTouch.
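Below is a minimal sketch of evaluation by essential-state matching. Each annotated essential state is a list of (component id, expected text, match mode) checks. A task counts as completed if the agent's screens satisfy every essential state in order. Several details are assumptions for illustration only: screens as flat dicts, the standard-library `difflib` similarity ratio for fuzzy matching, the 0.8 threshold and the in-order requirement. LlamaTouch's multi-level matching operates on real screenshots and UI hierarchies and is richer than this.

```python
from difflib import SequenceMatcher


def fuzzy_equal(actual, expected, threshold=0.8):
    """Tolerant text comparison for dynamic UI content (threshold is an assumption)."""
    return SequenceMatcher(None, actual, expected).ratio() >= threshold


def screen_matches(screen, checks):
    """screen: component id -> text; checks: list of (component_id, expected, mode)."""
    for comp_id, expected, mode in checks:
        actual = screen.get(comp_id)
        if actual is None:
            return False
        if mode == "exact" and actual != expected:
            return False
        if mode == "fuzzy" and not fuzzy_equal(actual, expected):
            return False
    return True


def task_completed(trajectory, essential_states):
    """True if the trajectory visits every annotated essential state, in order."""
    matched = 0
    for screen in trajectory:
        if matched < len(essential_states) and screen_matches(screen, essential_states[matched]):
            matched += 1
    return matched == len(essential_states)


essential = [
    [("title", "Search", "exact")],
    [("title", "Search results for pizza", "fuzzy")],
]
trajectory = [
    {"title": "Home"},
    {"title": "Search", "query": "pizza"},
    {"title": "Search results for pizzas"},  # small content drift, still matched fuzzily
]
print(task_completed(trajectory, essential))  # True
```

Only the visited UI states are compared, not the exact action sequence. An agent that reaches the same essential states by a different route is therefore still judged successful, which is the motivation the abstract gives for the approach.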

Translated: 2024-08-05 18:03:40, Published: 2024-08-02

# Canonical Decision Diagrams Modulo Theories

Canonical Decision Diagrams Modulo Theories ( http://arxiv.org/abs/2404.16455v3 )

License: check the link

Massimo Michelutti, Gabriele Masina, Giuseppe Spallitta, Roberto Sebastiani,

Decision diagrams (DDs) are powerful tools for effectively representing propositional formulas, and they are widely used in many domains, in particular in formal verification and in knowledge compilation. Some forms of DDs (e.g., OBDDs, SDDs) are canonical, that is, (under given conditions on the atom list) they univocally represent equivalence classes of formulas. Given the limited expressiveness of propositional logic, a few attempts to lift DDs to the SMT level have been presented in the literature. Unfortunately, these techniques still suffer from some limitations: most procedures are theory-specific; some produce theory DDs (T-DDs) which do not univocally represent T-valid formulas or T-inconsistent formulas; none of these techniques provably produces theory-canonical T-DDs, which (under given conditions on the T-atom list) univocally represent T-equivalence classes of formulas. Also, these procedures are not easy to implement, and very few implementations are actually available. In this paper, we present a novel, very general technique to lift DDs to the SMT level, which has several advantages: it is very easy to implement on top of an AllSMT solver and a DD package, which are used as black boxes; it works for every form of DDs and every theory, or combination thereof, supported by the AllSMT solver; and it produces theory-canonical T-DDs if the propositional DD is canonical. We have implemented a prototype tool for both T-OBDDs and T-SDDs on top of OBDD and SDD packages and the MathSAT SMT solver. Some preliminary empirical evaluation supports the effectiveness of the approach.
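The abstract describes the core idea as follows: an AllSMT solver enumerates the T-consistent truth assignments, and the result goes to a canonical propositional DD package. The toy sketch below shows why this yields theory-canonicity. Brute-force enumeration over a small integer range stands in for the AllSMT solver, and a hand-rolled reduced ordered BDD stands in for a real DD package. Neither is the authors' tool, which uses MathSAT and OBDD/SDD packages. The names `ATOMS`, `WITNESS_DOMAIN`, `ROBDD` and `theory_bdd` are hypothetical. The brute-force range is only adequate because every consistent assignment of this toy theory has a witness in it.

```python
# Toy theory T: one integer variable x with atoms A1 := (x < 1) and A2 := (x < 2).
ATOMS = [lambda x: x < 1, lambda x: x < 2]
WITNESS_DOMAIN = range(-3, 4)  # brute-force stand-in for an AllSMT solver (assumption)


def t_consistent_assignments():
    """Every truth assignment to ATOMS that some integer x in the domain realises."""
    return {tuple(atom(x) for atom in ATOMS) for x in WITNESS_DOMAIN}


class ROBDD:
    """Minimal reduced ordered BDD with a unique table (fixed variable order 0..n-1)."""

    def __init__(self, n_vars):
        self.n_vars = n_vars
        self.unique = {}  # (var, low, high) -> node id; ids 0 and 1 are the terminals
        self.next_id = 2

    def mk(self, var, low, high):
        if low == high:  # reduction: the test on `var` is redundant
            return low
        key = (var, low, high)
        if key not in self.unique:  # sharing: one node per distinct (var, low, high)
            self.unique[key] = self.next_id
            self.next_id += 1
        return self.unique[key]

    def build(self, f, prefix=()):
        """Shannon expansion of a Boolean function f over full assignment tuples."""
        if len(prefix) == self.n_vars:
            return 1 if f(prefix) else 0
        low = self.build(f, prefix + (False,))
        high = self.build(f, prefix + (True,))
        return self.mk(len(prefix), low, high)


def theory_bdd(bdd, formula):
    """Conjoin the formula with the T-consistent assignments before building the DD."""
    consistent = t_consistent_assignments()
    return bdd.build(lambda a: a in consistent and formula(*a))


def f1(a1, a2):
    return a1 or a2  # (x < 1) or (x < 2)


def f2(a1, a2):
    return a2  # (x < 2)


bdd = ROBDD(len(ATOMS))
print(bdd.build(lambda a: f1(*a)) == bdd.build(lambda a: f2(*a)))  # False: differ propositionally
print(theory_bdd(bdd, f1) == theory_bdd(bdd, f2))                  # True: T-equivalent
```

The two formulas are not propositionally equivalent, because the assignment A1 = true, A2 = false separates them. That assignment is T-inconsistent, however (x < 1 implies x < 2). Once T-inconsistent assignments are filtered out, both formulas map to the same node in the shared unique table, which is the theory-canonicity property the abstract claims for canonical DDs.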

Translated: 2024-08-05 18:03:40, Published: 2024-08-02

# A Comprehensive Evaluation on Event Reasoning of Large Language Models

A Comprehensive Evaluation on Event Reasoning of Large Language Models ( http://arxiv.org/abs/2404.17513v2 )

License: check the link

Zhengwei Tao, Zhi Jin, Yifan Zhang, Xiancai Chen, Haiyan Zhao, Jia Li, Bing Liang, Chongyang Tao, Qun Liu, Kam-Fai Wong,

Event reasoning is a fundamental ability that underlies many applications. It requires event schema knowledge to perform global reasoning and needs to deal with the diversity of inter-event relations and reasoning paradigms. How well LLMs accomplish event reasoning across various relations and reasoning paradigms remains unknown. To fill this gap, we comprehensively evaluate the event reasoning abilities of LLMs. We introduce a novel benchmark, EV2, for EValuation of EVent reasoning. EV2 consists of two levels of evaluation, schema and instance, and is comprehensive in relations and reasoning paradigms. We conduct extensive experiments on EV2. We find that LLMs have abilities to accomplish event reasoning, but their performance is far from satisfactory. We also notice an imbalance in the event reasoning abilities of LLMs. Besides, LLMs have event schema knowledge; however, they are not aligned with humans on how to utilize that knowledge. Based on these findings, we guide the LLMs to utilize the event schema knowledge as memory, leading to improvements in event reasoning.

Translated: 2024-08-05 18:03:40, Published: 2024-08-02

# MiPa: Mixed Patch Infrared-Visible Modality Agnostic Object Detection

MiPa: Mixed Patch Infrared-Visible Modality Agnostic Object Detection ( http://arxiv.org/abs/2404.18849v2 )

License: check the link

Heitor R. Medeiros, David Latortue, Eric Granger, Marco Pedersoli,

In real-world scenarios, using multiple modalities like visible (RGB) and infrared (IR) can greatly improve the performance of a predictive task such as object detection (OD). Multimodal learning is a common way to leverage these modalities, where multiple modality-specific encoders and a fusion module are used to improve performance. In this paper, we tackle a different way to employ RGB and IR modalities, where only one modality or the other is observed by a single shared vision encoder. This realistic setting requires a lower memory footprint and is more suitable for applications such as autonomous driving and surveillance, which commonly rely on RGB and IR data. However, when learning a single encoder on multiple modalities, one modality can dominate the other, producing uneven recognition results. This work investigates how to efficiently leverage RGB and IR modalities to train a common transformer-based OD vision encoder, while countering the effects of modality imbalance. For this, we introduce a novel training technique to Mix Patches (MiPa) from the two modalities, in conjunction with a patch-wise modality agnostic module, for learning a common representation of both modalities. Our experiments show that MiPa can learn a representation to reach competitive results on traditional RGB/IR benchmarks while only requiring a single modality during inference. Our code is available at: https://github.com/heitorrapela/MiPa.
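The Mix Patches idea can be pictured as building one training image whose patches come partly from the RGB frame and partly from the co-registered IR frame. The NumPy sketch below rests on three illustrative assumptions: aligned images of equal shape (IR replicated to three channels), a square patch grid, and a per-patch Bernoulli choice with probability `rho`. It does not reproduce the paper's actual sampling schedule or its patch-wise modality agnostic module.

```python
import numpy as np


def mix_patches(rgb, ir, patch=16, rho=0.5, rng=None):
    """Compose an image whose non-overlapping patch x patch tiles come from rgb or ir.

    rgb, ir: aligned (H, W, C) arrays with H and W divisible by `patch`.
    rho: probability that a given tile is taken from the IR image.
    Returns the mixed image and the (H/patch, W/patch) boolean tile mask (True = IR).
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = rgb.shape
    use_ir = rng.random((h // patch, w // patch)) < rho                 # one draw per tile
    mask = np.repeat(np.repeat(use_ir, patch, axis=0), patch, axis=1)   # upsample to pixels
    return np.where(mask[..., None], ir, rgb), use_ir


rgb = np.zeros((8, 8, 3))
ir = np.ones((8, 8, 3))
mixed, tiles = mix_patches(rgb, ir, patch=4, rho=0.5, rng=np.random.default_rng(0))
print(tiles)  # which of the 2 x 2 tiles were taken from IR
```

Because each training image mixes both modalities, a single shared encoder sees RGB and IR tokens side by side. This is one plausible way to keep either modality from dominating, the imbalance the abstract describes.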

Translated: 2024-08-05 18:03:40, Published: 2024-08-02

# Detecting single photons is not always necessary to evidence interference of photon probability amplitudes

Detecting single photons is not always necessary to evidence interference of photon probability amplitudes ( http://arxiv.org/abs/2405.01050v4 )

License: check the link

Eric Lantz, Fabrice Devaux, Serge Massar,

Subtracting accidental coincidences is a common practice in quantum optics experiments. For zero-mean Gaussian states, such as squeezed vacuum, we show that if one removes accidental coincidences, the measurement results are quantitatively the same, both for photon coincidences at very low flux and for intensity covariances. Consequently, pure quantum effects at the photon level, like interference of photon wave functions or photon bunching, are reproduced in the correlation of fluctuations of macroscopic beams produced by spontaneous down-conversion. This is true both in experiments, if the detection resolution is smaller than the coherence cell (the size of the mode), and in stochastic simulations based on sampling the Wigner function. We discuss the limitations of this correspondence, such as Bell inequalities (for which one cannot subtract accidental coincidences), highly multimode situations such as quantum imaging, and higher-order correlations.
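Subtracting accidentals amounts to replacing the raw coincidence rate <n1 n2> by <n1 n2> - <n1><n2>. That quantity is exactly the (population) covariance of the two detector counts, which is why photon coincidences and intensity covariances can be compared at all. The toy NumPy sketch below checks this identity on simulated counts. Both detectors see the same Poisson-distributed number of pairs plus independent background. This classical pair model is an assumption for illustration. It is not the zero-mean Gaussian (squeezed-vacuum) statistics treated in the paper.

```python
import numpy as np


def accidental_subtracted_coincidences(n1, n2):
    """Mean coincidences per frame minus the accidental level <n1><n2>."""
    n1 = np.asarray(n1, dtype=float)
    n2 = np.asarray(n2, dtype=float)
    return np.mean(n1 * n2) - np.mean(n1) * np.mean(n2)


rng = np.random.default_rng(1)
frames = 1_000_000
pairs = rng.poisson(0.05, frames)        # correlated pairs seen by both detectors
n1 = pairs + rng.poisson(0.02, frames)   # plus independent background on detector 1
n2 = pairs + rng.poisson(0.02, frames)   # plus independent background on detector 2

c = accidental_subtracted_coincidences(n1, n2)
print(np.isclose(c, np.cov(n1, n2, bias=True)[0, 1]))  # True: it is the covariance
print(c)  # close to the mean pair number per frame (0.05 in this toy model)
```

The independent background raises both <n1 n2> and <n1><n2> by the same amount, so it cancels in the difference. Only the correlated pair contribution survives.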

Translated: 2024-08-05 18:03:40, Published: 2024-08-02

# Don't Waste Your Time: Early Stopping Cross-Validation

Don't Waste Your Time: Early Stopping Cross-Validation ( http://arxiv.org/abs/2405.03389v2 )

License: check the link

Edward Bergman, Lennart Purucker, Frank Hutter,

State-of-the-art automated machine learning systems for tabular data often employ cross-validation, ensuring that measured performances generalize to unseen data, or that subsequent ensembling does not overfit. However, using k-fold cross-validation instead of holdout validation drastically increases the computational cost of validating a single configuration. While cross-validation ensures better generalization and, by extension, better performance, its additional cost is often prohibitive for effective model selection within a time budget. We aim to make model selection with cross-validation more effective. Therefore, we study stopping the cross-validation process early during model selection. We investigate the impact of early stopping on random search for two algorithms, MLP and random forest, across 36 classification datasets. We further analyze the impact of the number of folds by considering 3-, 5-, and 10-folds. In addition, we investigate the impact of early stopping with Bayesian optimization instead of random search, and also with repeated cross-validation. Our exploratory study shows that even a simple-to-understand and easy-to-implement method consistently allows model selection to converge faster: in ~94% of all datasets, on average by ~214%. Moreover, stopping cross-validation early enables model selection to explore the search space more exhaustively, considering 167% more configurations on average within one hour, while also obtaining better overall performance.
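Here is a minimal sketch of early-stopped k-fold cross-validation inside a model-selection loop. The stopping rule is an assumption chosen for simplicity, not necessarily the exact criterion studied in the paper. The rule: abandon a configuration as soon as the running mean of its fold scores drops below the full-CV mean of the best configuration evaluated so far. The score table and the function name are hypothetical.

```python
from statistics import mean


def select_with_early_stopping(configs, k, score_fold):
    """Model selection with k-fold CV that abandons unpromising configurations early.

    score_fold(cfg, fold) -> validation score (higher is better).
    Returns (best_config, best_mean_score, total_folds_evaluated).
    """
    best_cfg, best_score, folds_run = None, None, 0
    for cfg in configs:
        scores = []
        for fold in range(k):
            scores.append(score_fold(cfg, fold))
            folds_run += 1
            if best_score is not None and mean(scores) < best_score:
                break  # remaining folds are skipped: this is where time is saved
        else:  # all k folds were evaluated without triggering the stop
            if best_score is None or mean(scores) > best_score:
                best_cfg, best_score = cfg, mean(scores)
    return best_cfg, best_score, folds_run


# Hypothetical per-fold validation scores for three configurations.
table = {"a": [0.80, 0.80, 0.80], "b": [0.60, 0.90, 0.90], "c": [0.85, 0.90, 0.88]}
best, score, folds = select_with_early_stopping(table, 3, lambda cfg, f: table[cfg][f])
print(best, round(score, 4), folds)  # c 0.8767 7
```

Configuration "b" is dropped after one fold, so 7 folds are evaluated instead of 9. The trade-off is that a configuration with a poor first fold but a better overall mean could be discarded, which is why the paper evaluates such rules empirically.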

Translated: 2024-08-05 18:03:40, Published: 2024-08-02

# The MovieLens Beliefs Dataset: Collecting Pre-Choice Data for Online Recommender Systems

The MovieLens Beliefs Dataset: Collecting Pre-Choice Data for Online Recommender Systems ( http://arxiv.org/abs/2405.11053v3 )

License: check the link

Guy Aridor, Duarte Goncalves, Ruoyan Kong, Daniel Kluver, Joseph Konstan,

An increasingly important aspect of designing recommender systems involves considering how recommendations will influence consumer choices. This paper addresses this issue by introducing a method for collecting user beliefs about un-experienced items - a critical predictor of choice behavior. We implemented this method on the MovieLens platform, resulting in a rich dataset that combines user ratings, beliefs, and observed recommendations. We document challenges to such data collection, including selection bias in response and limited coverage of the product space. This unique resource empowers researchers to delve deeper into user behavior and analyze user choices absent recommendations, measure the effectiveness of recommendations, and prototype algorithms that leverage user belief data, ultimately leading to more impactful recommender systems. The dataset can be found at https://grouplens.org/datasets/movielens/ml_belief_2024/.

Translated: 2024-08-05 18:03:40, Published: 2024-08-02

# Identifying Bottlenecks of NISQ-friendly HHL algorithms

Identifying Bottlenecks of NISQ-friendly HHL algorithms ( http://arxiv.org/abs/2406.06288v2 )

License: check the link

Marc Andreu Marfany, Alona Sakhnenko, Jeanette Miriam Lorenz,

Quantum computing promises to enable solving large problem instances, e.g. large linear equation systems with the HHL algorithm, once the hardware stack matures. For the foreseeable future, quantum computing will remain in the so-called NISQ era, in which algorithms need to account for hardware flaws such as noise. In this work, we perform an empirical study to test the scaling properties and directly related noise resilience of the most resource-intensive component of the HHL algorithm, namely QPE and its NISQ adaptation, Iterative QPE. We explore the effectiveness of noise mitigation techniques for these algorithms and investigate whether we can keep the gate count low by enforcing sparsity constraints on the input or by using circuit optimization techniques provided by the Qiskit package. Our results indicate that currently available noise mitigation techniques, such as the Qiskit readout and Mthree readout packages, are insufficient for recovering correct results even in the small instances tested here. Moreover, our results indicate that the scaling of these algorithms with increasing precision seems to be the most substantial obstacle. These insights allowed us to deduce an approximate bottleneck for algorithms that consider a similar time evolution as QPE. Such observations provide evidence of weaknesses of such algorithms on NISQ devices and help us formulate meaningful future research directions.
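The structure of iterative QPE shows why precision scaling can dominate. Estimating m bits of an eigenphase needs controlled-U^(2^(k-1)) for k = 1..m, measured from the least significant bit upward with classical feedback. The sketch below is a noiseless classical simulation in plain Python, with no Qiskit, of textbook IQPE for an eigenstate whose phase is exactly representable in m bits. It is not the authors' code and ignores noise entirely. The gate count also assumes U^(2^(k-1)) is implemented by repeating U.

```python
import math


def iqpe_bits(phi, m):
    """Noiseless simulation of iterative QPE for an eigenphase phi in [0, 1).

    Returns [phi_1, ..., phi_m] with phi = 0.phi_1 phi_2 ... phi_m in binary,
    measured from the least significant bit (round k = m) to the most significant (k = 1).
    """
    bits = [0] * m
    for k in range(m, 0, -1):
        # Controlled-U^(2^(k-1)) kicks back the phase 2*pi*2^(k-1)*phi onto the ancilla.
        theta = 2 * math.pi * (2 ** (k - 1)) * phi
        # Feedback rotation cancels the already-measured bits phi_{k+1..m}.
        omega = -2 * math.pi * sum(bits[j - 1] / 2 ** (j - k + 1) for j in range(k + 1, m + 1))
        # Ancilla (|0> + e^{i(theta+omega)}|1>)/sqrt(2), then Hadamard: P(0) = cos^2(./2).
        p0 = math.cos((theta + omega) / 2) ** 2
        bits[k - 1] = 0 if p0 >= 0.5 else 1  # exact m-bit phases give p0 of 0 or 1
    return bits


def controlled_u_applications(m):
    """Total controlled-U applications over all rounds: sum of 2^(k-1) for k = 1..m."""
    return 2 ** m - 1


print(iqpe_bits(0.625, 3))                                 # [1, 0, 1]  (0.625 = 0.101b)
print([controlled_u_applications(m) for m in (4, 8, 12)])  # [15, 255, 4095]
```

Under the repeated-U assumption, the total controlled-U count grows as 2^m - 1, exponentially in the number of precision bits. On noisy hardware, each extra bit therefore roughly doubles the circuit depth that errors can accumulate over. This is consistent with the abstract's observation that precision scaling is the most substantial obstacle.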

Translated: 2024-08-05 18:03:40, Published: 2024-08-02

# Dissipationless topological quantum computation for Majorana objects in sparse-dense mixed encoding process

Dissipationless topological quantum computation for Majorana objects in sparse-dense mixed encoding process ( http://arxiv.org/abs/2407.11544v2 )

License: check the link

Ye-Min Zhan, Guan-Dong Mao, Yu-Ge Chen, Yue Yu, Xi Luo,

Topological quantum computation based on Majorana objects faces a significant challenge because at least some of the two-qubit quantum gates rely on the fermion (either charge or spin) parity of the qubits. This dependency renders the quantum operations involving these gates probabilistic when attempting to advance quantum processes within the quantum circuit model. Such an approach leads to significant information loss whenever measurements yield the undesired fermion parity. To avoid this waste of information, we devise topological operations that allow for the non-dissipative correction of information from the undesired fermion parity to the desired one. We use the sparse-dense mixed encoding process for the controlled-NOT gate as an example to explain how corrections can be implemented without affecting the quantum information carried by the computational qubits. This correction process can be applied to either the undesired input qubits or the fermion parity-dependent quantum gates, and it works for both Majorana-zero-mode-based and Majorana-edge-mode-based topological quantum computation.

Translated: 2024-08-05 18:03:40, Published: 2024-08-02

# A Sufficient Criterion for Divisibility of Quantum Channels

A Sufficient Criterion for Divisibility of Quantum Channels ( http://arxiv.org/abs/2407.17103v2 )

License: check the link

Frederik vom Ende,

We present a simple, dimension-independent criterion which guarantees that some quantum channel $\Phi$ is divisible, i.e. that there exists a non-trivial factorization $\Phi=\Phi_1\Phi_2$. The idea is to first define an "elementary" channel $\Phi_2$ and then to analyze when $\Phi\Phi_2^{-1}$ is completely positive. The sufficient criterion obtained this way -- which even yields an explicit factorization of $\Phi$ -- is that one has to find orthogonal unit vectors $x,x^\perp$ such that $\langle x^\perp|\mathcal K_\Phi\mathcal K_\Phi^\perp|x\rangle=\langle x|\mathcal K_\Phi\mathcal K_\Phi^\perp|x\rangle=\{0\}$ where $\mathcal K_\Phi$ is the Kraus subspace of $\Phi$ and $\mathcal K_\Phi^\perp$ is its orthogonal complement. Of course, using linearity this criterion can be reduced to finitely many equalities. Generically, this division even lowers the Kraus rank which is why repeated application -- if possible -- results in a factorization of $\Phi$ into in some sense "simple" channels. Finally, be aware that our techniques are not limited to the particular elementary channel we chose.

Translated: 2024-08-05 18:03:40, Published: 2024-08-02

# The Sociotechnical Stack: Opportunities for Social Computing Research in Non-consensual Intimate Media

The Sociotechnical Stack: Opportunities for Social Computing Research in Non-consensual Intimate Media ( http://arxiv.org/abs/2405.03585v2 )

License: check the link

Li Qiwei, Allison McDonald, Oliver L. Haimson, Sarita Schoenebeck, Eric Gilbert,

Non-consensual intimate media (NCIM) involves sharing intimate content without the depicted person's consent, including "revenge porn" and sexually explicit deepfakes. While NCIM has received attention in legal, psychological, and communication fields over the past decade, it is not sufficiently addressed in computing scholarship. This paper addresses this gap by linking NCIM harms to the specific technological components that facilitate them. We introduce the sociotechnical stack, a conceptual framework designed to map the technical stack to its corresponding social impacts. The sociotechnical stack allows us to analyze sociotechnical problems like NCIM, and points toward opportunities for computing research. We propose a research roadmap for computing and social computing communities to deter NCIM perpetration and support victim-survivors through building and rebuilding technologies.

Translated: 2024-08-05 17:53:28, Published: 2024-08-02

# SemiCD-VL: Visual-Language Model Guidance Makes Better Semi-supervised Change Detector

SemiCD-VL: Visual-Language Model Guidance Makes Better Semi-supervised Change Detector ( http://arxiv.org/abs/2405.04788v3 )

License: check the link

Kaiyu Li, Xiangyong Cao, Yupeng Deng, Junmin Liu, Deyu Meng, Zhi Wang,

Change Detection (CD) aims to identify pixels with semantic changes between images. However, annotating massive numbers of pixel-level images is labor-intensive and costly, especially for multi-temporal images, which require pixel-wise comparisons by human experts. Considering the excellent performance of visual language models (VLMs) in zero-shot, open-vocabulary and similar settings with prompt-based reasoning, it is promising to utilize VLMs to improve CD under limited labeled data. In this paper, we propose a VLM guidance-based semi-supervised CD method, namely SemiCD-VL. The insight of SemiCD-VL is to synthesize free change labels using VLMs to provide additional supervision signals for unlabeled data. However, almost all current VLMs are designed for single-temporal images and cannot be directly applied to bi- or multi-temporal images. Motivated by this, we first propose a VLM-based mixed change event generation (CEG) strategy to yield pseudo labels for unlabeled CD data. Since the additional supervised signals provided by these VLM-driven pseudo labels may conflict with the pseudo labels from the consistency regularization paradigm (e.g. FixMatch), we propose a dual projection head for disentangling the different signal sources. Further, we explicitly decouple the semantic representations of the bi-temporal images through two auxiliary segmentation decoders, which are also guided by the VLM. Finally, to help the model capture change representations more adequately, we introduce metric-aware supervision via a feature-level contrastive loss in the auxiliary branches. Extensive experiments show the advantage of SemiCD-VL. For instance, SemiCD-VL improves the FixMatch baseline by +5.3 IoU on WHU-CD and by +2.4 IoU on LEVIR-CD with 5% labels. In addition, our CEG strategy, in an unsupervised manner, can achieve performance far superior to state-of-the-art unsupervised CD methods.
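Here is a simplified picture of how single-temporal VLM outputs can be turned into a bi-temporal change pseudo label. Segment each date independently. Then mark pixels where the predicted class differs and at least one date shows a class of interest. The function below assumes per-pixel class maps already produced by some VLM-based segmenter (hypothetical here) and an ignore value for uncertain pixels. It is only an illustration, not the paper's mixed change event generation (CEG) strategy.

```python
import numpy as np


def change_pseudo_label(seg_t1, seg_t2, classes_of_interest, ignore_index=255):
    """Binary change pseudo label from two per-pixel class maps of the same scene.

    A pixel is 'changed' (1) if its class differs between dates and at least one
    date shows a class of interest; pixels unlabeled at either date get ignore_index.
    """
    seg_t1 = np.asarray(seg_t1)
    seg_t2 = np.asarray(seg_t2)
    relevant = np.isin(seg_t1, classes_of_interest) | np.isin(seg_t2, classes_of_interest)
    label = (relevant & (seg_t1 != seg_t2)).astype(np.uint8)
    unknown = (seg_t1 == ignore_index) | (seg_t2 == ignore_index)
    label[unknown] = ignore_index  # excluded from the unsupervised loss
    return label


seg_t1 = [[0, 1], [1, 2]]    # e.g. 0 = background, 1 = building, 2 = road
seg_t2 = [[0, 0], [1, 255]]  # building demolished at top-right; bottom-right uncertain
print(change_pseudo_label(seg_t1, seg_t2, classes_of_interest=[1]))
```

Pseudo labels like these are noisy. That is the reason the abstract routes them through a separate projection head rather than mixing them directly with FixMatch-style consistency pseudo labels.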

Translated: 2024-08-05 17:53:28, Published: 2024-08-02

# A Novel Method for News Article Event-Based Embedding

A Novel Method for News Article Event-Based Embedding ( http://arxiv.org/abs/2405.13071v2 )

License: check the link

Koren Ishlach, Itzhak Ben-David, Michael Fire, Lior Rokach,

Embedding news articles is a crucial tool for multiple fields, such as media bias detection, identifying fake news, and making news recommendations. However, existing news embedding methods are not optimized to capture the latent context of news events. Most embedding methods rely on full-text information and neglect time-relevant embedding generation. In this paper, we propose a novel lightweight method that optimizes news embedding generation by focusing on entities and themes mentioned in articles and their historical connections to specific events. We suggest a method composed of three stages. First, we process and extract events, entities, and themes from the given news articles. Second, we generate periodic time embeddings for themes and entities by training time-separated GloVe models on current and historical data. Lastly, we concatenate the news embeddings generated by two distinct approaches: Smooth Inverse Frequency (SIF) for article-level vectors and Siamese Neural Networks for embeddings with nuanced event-related information. We leveraged over 850,000 news articles and 1,000,000 events from the GDELT project to test and evaluate our method. We conducted a comparative analysis of different news embedding generation methods for validation. Our experiments demonstrate that our approach can both improve and outperform state-of-the-art methods on shared event detection tasks.
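The article-level vectors in this abstract use Smooth Inverse Frequency (SIF), the scheme of Arora et al. (2017). SIF takes a weighted average of word vectors with weights a / (a + p(w)) and then removes each vector's projection on the first singular vector of the embedding matrix. Below is a minimal NumPy sketch. The toy word vectors, the word probabilities and a = 1e-3 are assumptions. The time-separated GloVe training, the Siamese network and the GDELT preprocessing are not shown. In the paper's pipeline, a SIF vector would then be concatenated (e.g. with `np.concatenate`) with the Siamese event embedding.

```python
import numpy as np


def sif_weighted_average(documents, word_vectors, word_prob, a=1e-3):
    """Average word vectors with SIF weights a / (a + p(w)); unknown tokens are skipped.

    documents: list of token lists; word_vectors: token -> 1-D array;
    word_prob: token -> estimated unigram probability p(w).
    """
    dim = len(next(iter(word_vectors.values())))
    emb = np.zeros((len(documents), dim))
    for i, tokens in enumerate(documents):
        known = [t for t in tokens if t in word_vectors]
        if known:
            weights = np.array([a / (a + word_prob.get(t, 0.0)) for t in known])
            vectors = np.stack([word_vectors[t] for t in known])
            emb[i] = weights @ vectors / len(known)
    return emb


def remove_first_component(emb):
    """Subtract each row's projection on the first right singular vector (common direction)."""
    u = np.linalg.svd(emb, full_matrices=False)[2][0]
    return emb - np.outer(emb @ u, u)


vecs = {w: v for w, v in zip("abcd", np.eye(4))}   # toy 4-d "word vectors"
probs = {"a": 0.5, "b": 0.1, "c": 0.01, "d": 0.001}
docs = [["a", "b"], ["b", "c", "d"], ["a", "d", "unknown"]]
raw = sif_weighted_average(docs, vecs, probs)
sif = remove_first_component(raw)
print(sif.shape)  # (3, 4)
```

Frequent words such as "a" (p = 0.5) receive a weight near 0.002, while rare words such as "d" (p = 0.001) receive 0.5. The SIF vector is therefore dominated by the distinctive entities and themes in a text, which matches the abstract's focus on entities and themes.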

翻訳日:2024-08-05 17:53:28 公開日:2024-08-02

# 多変量時系列に向けたマルチフィジックス情報ニューラルネットワークのためのボンドグラフ

Bond Graphs for multi-physics informed Neural Networks for multi-variate time series ( http://arxiv.org/abs/2405.13586v2 )

ライセンス: Link先を確認

Alexis-Raja Brachet, Pierre-Yves Richard, Céline Hudelot,

(参考訳) ハイブリッド人工知能技術の潮流の中で、物理情報機械学習(Physics-Informed Machine Learning)への関心が高まっている。これは主に、シミュレーションデータ、偏微分方程式、あるいは同変性・不変性によって、データ、学習、アーキテクチャにバイアスを課すことで機能する。流体力学のように単一の物理領域に関わるタスクでは大きな成功を収めてきたが、既存の手法は複雑なマルチフィジックス・マルチドメイン現象を伴うタスクには適していない。また、主にエンドツーエンドの学習方式として定式化されている。これらの課題に対処するため、マルチフィジックスのモデリング手法であるボンドグラフを、メッセージパッシング型グラフニューラルネットワークとともに活用することを提案する。任意のタスク固有モデルに入力できる、マルチフィジックス情報を取り込んだ表現を生成するニューラルボンドグラフエンコーダ(NBgE)を提案する。これは、データバイアスとアーキテクチャバイアスの両方を深層学習に統合する統一的な方法を提供する。直流モータと呼吸器系という2つの難しいマルチドメイン物理システムでの実験により、多変量時系列予測タスクにおける本手法の有効性を実証した。

In the trend of hybrid Artificial Intelligence techniques, Physical-Informed Machine Learning has seen a growing interest. It operates mainly by imposing data, learning, or architecture bias with simulation data, Partial Differential Equations, or equivariance and invariance properties. While it has shown great success on tasks involving one physical domain, such as fluid dynamics, existing methods are not adapted to tasks with complex multi-physical and multi-domain phenomena. In addition, it is mainly formulated as an end-to-end learning scheme. To address these challenges, we propose to leverage Bond Graphs, a multi-physics modeling approach, together with Message Passing Graph Neural Networks. We propose a Neural Bond graph Encoder (NBgE) producing multi-physics-informed representations that can be fed into any task-specific model. It provides a unified way to integrate both data and architecture biases in deep learning. Our experiments on two challenging multi-domain physical systems - a Direct Current Motor and the Respiratory System - demonstrate the effectiveness of our approach on a multivariate time-series forecasting task.
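The DC motor is a standard multi-domain system. In bond-graph terms, the motor constant K acts as a gyrator that links the electrical domain (current) to the mechanical domain (angular velocity). The sketch below produces the kind of multivariate time series such a system generates. It uses the textbook armature-controlled motor model with made-up parameter values and forward-Euler integration. It does not reproduce the authors' dataset, their bond-graph encoder, or the message-passing network.

```python
def simulate_dc_motor(V=12.0, R=1.0, L=0.5, K=0.01, J=0.01, b=0.1,
                      dt=1e-3, steps=5000):
    """Forward-Euler simulation of a textbook DC motor (illustrative parameters).

    Electrical domain:  L di/dt     = V - R*i - K*omega   (K*omega is the back-EMF)
    Mechanical domain:  J domega/dt = K*i - b*omega       (K*i is the motor torque)
    Returns a list of (t, current, angular_velocity) samples.
    """
    i, omega = 0.0, 0.0
    samples = []
    for k in range(steps):
        di = (V - R * i - K * omega) / L
        domega = (K * i - b * omega) / J
        i += dt * di
        omega += dt * domega
        samples.append(((k + 1) * dt, i, omega))
    return samples
```

After a long enough run, the angular velocity settles at V*K/(R*b + K**2) and the current settles at b*omega/K. This follows from setting both derivatives to zero, and it gives a quick sanity check for any surrogate model trained on such series.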

翻訳日:2024-08-05 17:53:28 公開日:2024-08-02

# 大規模言語モデルによる有益なテキスト評価の引き出し

Eliciting Informative Text Evaluations with Large Language Models ( http://arxiv.org/abs/2405.15077v3 )

ライセンス: Link先を確認

Yuxuan Lu, Shengwei Xu, Yichi Zhang, Yuqing Kong, Grant Schoenebeck,

(参考訳) ピア予測メカニズムは、証明可能な保証のもとで高品質なフィードバックを動機付ける。しかし、現在の手法は、多肢選択やスカラー値のような比較的単純なレポートにしか適用できない。我々は、近年の大規模言語モデルの発展を踏まえ、これらの手法をテキストベースのレポートというより広い領域へ拡張することを目指す。ピアレビュー、eコマースの顧客レビュー、ソーシャルメディアへのコメントなど、多種多様なフィードバックチャネルではテキストによるフィードバックが標準であるため、これはピア予測メカニズムの適用範囲を大幅に広げる。本稿では、GPPM(Generative Peer Prediction Mechanism)とGSPPM(Generative Synopsis Peer Prediction Mechanism)の2つのメカニズムを導入する。これらのメカニズムはLLMを予測器として利用し、あるエージェントのレポートから、そのピアのレポートの予測へと写像する。理論的には、LLMの予測が十分に正確であれば、我々のメカニズムは(近似)ベイズナッシュ均衡として高い努力と真実の報告を動機付けられることを示す。実証的には、YelpレビューデータセットとICLR OpenReviewデータセットという2つの実データセットでの実験を通じて、メカニズムの有効性を確認した。特に、ICLRデータセットにおいて、我々のメカニズムが人間が書いたレビュー、GPT-4生成レビュー、GPT-3.5生成レビューという3つの品質レベルを期待スコアの観点で区別できるという結果を強調する。さらに、GSPPMはGPPMよりも効果的にLLM生成レビューにペナルティを与える。

Peer prediction mechanisms motivate high-quality feedback with provable guarantees. However, current methods only apply to rather simple reports, like multiple-choice or scalar numbers. We aim to broaden these techniques to the larger domain of text-based reports, drawing on the recent developments in large language models. This vastly increases the applicability of peer prediction mechanisms as textual feedback is the norm in a large variety of feedback channels: peer reviews, e-commerce customer reviews, and comments on social media. We introduce two mechanisms, the Generative Peer Prediction Mechanism (GPPM) and the Generative Synopsis Peer Prediction Mechanism (GSPPM). These mechanisms utilize LLMs as predictors, mapping from one agent's report to a prediction of her peer's report. Theoretically, we show that when the LLM prediction is sufficiently accurate, our mechanisms can incentivize high effort and truth-telling as an (approximate) Bayesian Nash equilibrium. Empirically, we confirm the efficacy of our mechanisms through experiments conducted on two real datasets: the Yelp review dataset and the ICLR OpenReview dataset. We highlight the results that on the ICLR dataset, our mechanisms can differentiate three quality levels -- human-written reviews, GPT-4-generated reviews, and GPT-3.5-generated reviews in terms of expected scores. Additionally, GSPPM penalizes LLM-generated reviews more effectively than GPPM.
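A common way to turn a prediction of the peer's report into a payment is a log scoring rule: the agent is paid the log-probability that the predictor assigns to the peer's actual report. The sketch below shows only that scoring step. The abstract does not state the exact payment formula used by GPPM or GSPPM. There, the predictor is an LLM conditioned on free text, whereas `toy_predict` here is a hypothetical two-label stand-in.

```python
import math

def log_score(predict, report, peer_report, floor=1e-12):
    """Reward = log Pr[peer_report | report] under the predictor.

    `predict` maps an agent's report to a dict of probabilities over
    possible peer reports; `floor` avoids log(0) for unseen reports.
    """
    dist = predict(report)
    return math.log(max(dist.get(peer_report, 0.0), floor))

def toy_predict(report):
    # Hypothetical stand-in for an LLM predictor over two coarse labels:
    # the peer is assumed to agree with probability 0.8.
    other = "negative" if report == "positive" else "positive"
    return {report: 0.8, other: 0.2}

print(log_score(toy_predict, "positive", "positive"))  # equals math.log(0.8)
print(log_score(toy_predict, "positive", "negative"))  # equals math.log(0.2)
```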

翻訳日:2024-08-05 17:53:28 公開日:2024-08-02

# モデルフリー強化学習のための多状態TDターゲット

Multi-State TD Target for Model-Free Reinforcement Learning ( http://arxiv.org/abs/2405.16522v4 )

ライセンス: Link先を確認

Wuhao Wang, Zhiyong Chen, Lepeng Zhang,

(参考訳) 時間差分学習(TD学習)は、TDターゲットを用いて状態または状態-行動対の価値推定を更新する、強化学習の基本的な手法である。このターゲットは、即時報酬と後続状態の推定価値の両方を取り入れることで、真の価値のより良い推定を表す。従来、TD学習は後続する1つの状態の価値に依存している。本稿では、後続する複数の状態の推定価値を利用する、拡張された多状態TD(MSTD)ターゲットを提案する。この新しいMSTDの概念に基づき、2つのモードでのリプレイバッファ管理を含む完全なアクター・クリティックアルゴリズムを開発し、深層決定論的方策勾配(DDPG)およびソフトアクタークリティック(SAC)と統合する。実験の結果、MSTDターゲットを用いたアルゴリズムは、従来手法と比べて学習性能を大幅に向上させることが示された。コードはGitHubで公開されている。

Temporal difference (TD) learning is a fundamental technique in reinforcement learning that updates value estimates for states or state-action pairs using a TD target. This target represents an improved estimate of the true value by incorporating both immediate rewards and the estimated value of subsequent states. Traditionally, TD learning relies on the value of a single subsequent state. We propose an enhanced multi-state TD (MSTD) target that utilizes the estimated values of multiple subsequent states. Building on this new MSTD concept, we develop complete actor-critic algorithms that include management of replay buffers in two modes, and integrate with deep deterministic policy gradient (DDPG) and soft actor-critic (SAC). Experimental results demonstrate that algorithms employing the MSTD target significantly improve learning performance compared to traditional methods. The code is provided on GitHub.
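The abstract does not say how the values of several subsequent states are combined. One plausible reading, sketched below as an assumption, averages the 1-step through K-step TD targets so that V(s_{t+1}), ..., V(s_{t+K}) all enter the target. The paper's actual weighting, its handling of terminal states, and its two replay-buffer modes may differ.

```python
def n_step_target(rewards, values, n, gamma):
    """n-step TD target: sum_{k<n} gamma^k r_{t+k} + gamma^n V(s_{t+n}).

    rewards[k] = r_{t+k}; values[k] = V(s_{t+k}), with values[0] the current state.
    Terminal states are ignored in this sketch.
    """
    g = sum((gamma ** k) * rewards[k] for k in range(n))
    return g + (gamma ** n) * values[n]

def multi_state_target(rewards, values, K, gamma):
    """Assumed multi-state target: mean of the 1..K-step targets."""
    return sum(n_step_target(rewards, values, n, gamma) for n in range(1, K + 1)) / K
```

With K = 1 this reduces to the ordinary one-step target r_t + gamma * V(s_{t+1}). Larger K mixes in values of states further ahead.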

翻訳日:2024-08-05 17:53:28 公開日:2024-08-02

# 動的トンネル法による変分量子アルゴリズムの大域的最適化

Global optimization in variational quantum algorithms via dynamic tunneling method ( http://arxiv.org/abs/2405.18783v2 )

ライセンス: Link先を確認

Seung Park, Kyunghyun Baek, Seungjin Lee, Mahn-Soo Choi,

(参考訳) 動的トンネル流を利用した、変分量子アルゴリズムのための大域的最適化ルーチンを提案する。動的トンネル流は本来、局所最小値の周辺で勾配ベースの最適化器が収集した情報を活用するために設計されたものであるが、我々はこれを量子状態間の距離尺度を利用するように適応させ、量子状態のパラメータ化から生じる外因的な縮退の問題を解消する。本大域的最適化アルゴリズムを横磁場イジングモデルに対する変分量子固有値ソルバーに適用し、パラメータ空間上のユークリッド距離に基づく従来の動的トンネル法と比較することで、その性能を実証する。

We present a global optimization routine for the variational quantum algorithms, which utilizes the dynamic tunneling flow. Originally designed to leverage information gathered by a gradient-based optimizer around local minima, we adapt the conventional dynamic tunneling flow to exploit the distance measure of quantum states, resolving issues of extrinsic degeneracy arising from the parametrization of quantum states. Our global optimization algorithm is applied to the variational quantum eigensolver for the transverse-field Ising model to demonstrate the performance of our routine while comparing it with the conventional dynamic tunneling method, which is based on the Euclidean distance measure on the parameter space.
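A state-space distance helps because different parameter values can prepare the same physical state. A tunneling flow that measures distance in parameter space can therefore treat copies of one minimum as distinct points. The single-qubit sketch below shows this with RY(theta) and RY(theta + 2*pi), which differ only by a global phase. It illustrates the two distance measures only. Infidelity is our assumed choice of state distance, and the authors' tunneling flow is not reproduced.

```python
import math

def ry_state(theta):
    """RY(theta)|0> = (cos(theta/2), sin(theta/2)); amplitudes are real here."""
    return (math.cos(theta / 2), math.sin(theta / 2))

def state_distance(theta_a, theta_b):
    """Infidelity 1 - |<psi_a|psi_b>|^2, which ignores global phases."""
    a, b = ry_state(theta_a), ry_state(theta_b)
    overlap = a[0] * b[0] + a[1] * b[1]
    return 1.0 - overlap ** 2

def parameter_distance(theta_a, theta_b):
    """Euclidean distance in parameter space."""
    return abs(theta_a - theta_b)

theta = 0.7
print(parameter_distance(theta, theta + 2 * math.pi))  # about 2*pi: looks far apart
print(state_distance(theta, theta + 2 * math.pi))      # about 0: same physical state
```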

翻訳日:2024-08-05 17:53:28 公開日:2024-08-02

# トランスフォーマーを用いた量子回路アンザッツの表現可能性の学習

Learning the expressibility of quantum circuit ansatz using transformer ( http://arxiv.org/abs/2405.18837v2 )

ライセンス: Link先を確認

Fei Zhang, Jie Li, Zhimin He, Haozhen Situ,

(参考訳) 特定の問題に対して指数関数的に高速な計算が可能であることから、量子コンピューティングは近年大きな注目を集めている。変分量子アルゴリズム(VQA)は量子コンピューティングを実装するための重要な手法であり、適切なタスク固有の量子回路アンザッツはVQAの量子優位性を効果的に高めることができる。しかし、探索空間が膨大であるため、最適なタスク固有アンザッツを見つけることは難しい。量子回路アンザッツが生成する状態の多様性、すなわちヒルベルト空間をどれだけ効果的に探索できるかを定量化する表現可能性は、あるアンザッツが別のアンザッツより優れているかを評価するのに用いることができる。本研究では、トランスフォーマーモデルを用いて量子回路アンザッツの表現可能性を予測することを提案する。ゲート単位のパイプラインで生成した、量子ビット数とゲート数の異なるランダムなPQCを含むデータセットを構築する。回路の表現可能性は、KLダイバージェンス、相対KLダイバージェンス、最大平均差異(MMD)の3つの尺度で計算する。このデータセットでトランスフォーマーモデルを学習させ、回路の特性と表現可能性の間の複雑な関係を捉える。トランスフォーマーの性能評価には4つの評価指標を用いる。数値結果は、学習済みモデルがさまざまな表現可能性尺度にわたって高い性能と頑健性を達成することを示している。本研究は、量子回路アンザッツの表現可能性の理解を深め、量子アーキテクチャ探索アルゴリズムの発展に寄与しうる。

With the exponentially faster computation for certain problems, quantum computing has garnered significant attention in recent years. Variational quantum algorithms are crucial methods to implement quantum computing, and an appropriate task-specific quantum circuit ansatz can effectively enhance the quantum advantage of VQAs. However, the vast search space makes it challenging to find the optimal task-specific ansatz. Expressibility, quantifying the diversity of quantum circuit ansatz states to explore the Hilbert space effectively, can be used to evaluate whether one ansatz is superior to another. In this work, we propose using a transformer model to predict the expressibility of quantum circuit ansatze. We construct a dataset containing random PQCs generated by the gatewise pipeline, with varying numbers of qubits and gates. The expressibility of the circuits is calculated using three measures: KL divergence, relative KL divergence, and maximum mean discrepancy. A transformer model is trained on the dataset to capture the intricate relationships between circuit characteristics and expressibility. Four evaluation metrics are employed to assess the performance of the transformer. Numerical results demonstrate that the trained model achieves high performance and robustness across various expressibility measures. This research can enhance the understanding of the expressibility of quantum circuit ansatze and advance quantum architecture search algorithms.
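The KL-divergence measure of expressibility is commonly estimated from a histogram. Pairs of states are sampled from the ansatz, their fidelities F are binned, and the histogram is compared with the Haar-random fidelity distribution P_Haar(F) = (N-1)(1-F)^(N-2), where N = 2^n. The pure-Python sketch below does this for three toy single-qubit "ansatze". The helper names, bin count, and sample sizes are our own choices. The relative-KL and MMD measures and the transformer predictor are not shown.

```python
import cmath
import math
import random

TWO_PI = 2 * math.pi

def ry_rz_state(theta, phi):
    """RZ(phi) RY(theta)|0> as a two-component complex vector."""
    return (math.cos(theta / 2) * cmath.exp(-0.5j * phi),
            math.sin(theta / 2) * cmath.exp(0.5j * phi))

def fidelity(a, b):
    """|<a|b>|^2 for states given as tuples of complex amplitudes."""
    inner = sum(x.conjugate() * y for x, y in zip(a, b))
    return abs(inner) ** 2

def expressibility_kl(sample_state, n_qubits, pairs=5000, bins=50, seed=0):
    """Histogram estimate of KL(P_ansatz(F) || P_Haar(F)); lower = more expressive.

    The Haar mass of the bin [lo, hi) is (1-lo)^(N-1) - (1-hi)^(N-1).
    """
    rng = random.Random(seed)
    counts = [0] * bins
    for _ in range(pairs):
        f = fidelity(sample_state(rng), sample_state(rng))
        counts[min(int(f * bins), bins - 1)] += 1  # clamp F = 1 into the last bin
    dim = 2 ** n_qubits
    kl = 0.0
    for k, c in enumerate(counts):
        if c == 0:
            continue  # empty bins contribute nothing to KL(P || Q)
        p = c / pairs
        lo, hi = k / bins, (k + 1) / bins
        q = (1 - lo) ** (dim - 1) - (1 - hi) ** (dim - 1)
        kl += p * math.log(p / q)
    return kl

def rz_only(rng):
    # Only a phase rotation on |0>: every sample is the same physical state.
    return ry_rz_state(0.0, rng.uniform(0, TWO_PI))

def ry_then_rz(rng):
    # Both angles uniform: covers the Bloch sphere, but not uniformly.
    return ry_rz_state(rng.uniform(0, TWO_PI), rng.uniform(0, TWO_PI))

def haar_qubit(rng):
    # cos(theta) uniform in [-1, 1] yields Haar-random single-qubit states.
    return ry_rz_state(math.acos(1 - 2 * rng.random()), rng.uniform(0, TWO_PI))
```

Each circuit can be scored with `expressibility_kl(rz_only, 1)` and the analogous calls. `rz_only` gives exactly ln 50, because every fidelity falls in the top bin. `haar_qubit` should come out close to zero, and `ry_then_rz` should land well below `rz_only`.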

翻訳日:2024-08-05 17:53:28 公開日:2024-08-02

# LightDE: ダングリングポインタを除去する軽量な手法

LightDE: A Lightweight Method for Eliminating Dangling Pointers ( http://arxiv.org/abs/2405.20697v2 )

ライセンス: Link先を確認

Xun An,

(参考訳) UAF(Use-After-Free)脆弱性が広く存在することはソフトウェアセキュリティにとって深刻な脅威であり、ダングリングポインタがこれらの脆弱性の主な原因と考えられている。しかし、ダングリングポインタを除去してUAF脆弱性を防御する既存の手法は、ポインタのメモリアドレスを特定のデータ構造に格納するために、ポインタの代入操作に遭遇するたびにプログラムの実行を中断する必要がある。このため、これらの手法は軽量ではない。この欠点を克服するため、LightDEと呼ぶ新しいアプローチを提案する。この手法では、プログラム実行中にポインタのメモリアドレスを格納する必要がない。LightDEは、我々が提案する構造を考慮した(structure-sensitive)ポインタ解析手法を用いて、各ポインタがどのオブジェクトを指すかをプログラムのコンパイル時に決定し、その指し示し関係をプログラムのデータセグメントに格納する。ダングリングポインタを除去する際、LightDEはポインタ解析によって特定されたポインタが解放済みのオブジェクトを指しているかどうかを検証するだけでよいため、非常に軽量である。実験の結果、LightDEはUAF脆弱性を効果的に防御でき、導入される性能オーバーヘッドも非常に小さいことがわかった。

The widespread presence of Use-After-Free (UAF) vulnerabilities poses a serious threat to software security, with dangling pointers being considered the primary cause of these vulnerabilities. However, existing methods for defending against UAF vulnerabilities by eliminating dangling pointers need to interrupt the program's execution when encountering pointer assignment operations in order to store the memory addresses of the pointers in a specific data structure. This makes these methods not lightweight. To overcome this drawback, we propose a novel approach called LightDE. This method does not require storing the memory addresses of pointers during program execution. LightDE uses our proposed structure-sensitive pointer analysis method to determine which objects pointers point to and stores the pointing relationships in the program's data segment during program compilation. Since LightDE only needs to verify if pointers identified by the pointer analysis point to released objects when eliminating dangling pointers, it is very lightweight. Our experimental results show that LightDE can effectively defend against UAF vulnerabilities and the performance overhead it introduces is very low.
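In this design, the expensive work (deciding which pointers may point to which object) happens at compile time. Freeing an object then only requires checking a short, precomputed list of candidate pointers. The Python toy below models that division of labour. `PointerTable`, the allocation-site names, and the slot names are hypothetical. LightDE itself works on compiled programs with its own structure-sensitive pointer analysis, which is not reproduced here.

```python
class PointerTable:
    """Toy model: pointer slots plus a points-to map fixed before execution."""

    def __init__(self, points_to):
        # allocation site -> pointer slots that MAY point to it
        # (stands in for the result of a compile-time pointer analysis)
        self.points_to = points_to
        self.slots = {}  # pointer slot -> allocation site, or None when nulled

    def store(self, slot, site):
        # A plain store: no runtime recording of pointer addresses.
        self.slots[slot] = site

    def free(self, site):
        # Only the statically identified candidates are checked; the runtime
        # comparison is needed because "may point to" over-approximates.
        for slot in self.points_to.get(site, ()):
            if self.slots.get(slot) == site:
                self.slots[slot] = None

    def deref(self, slot):
        target = self.slots.get(slot)
        if target is None:
            # A nulled pointer fails loudly instead of reading freed memory.
            raise RuntimeError(f"null dereference via {slot}")
        return target
```

For example, after `store("p", "site_a")` followed by `free("site_a")`, slot `p` is nulled. A pointer to a different allocation site is left untouched.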

翻訳日:2024-08-05 17:53:28 公開日:2024-08-02

# L-PR: LiDARフィデューシャルマーカーを活用した順序なし・低オーバーラップの多視点点群レジストレーション

L-PR: Exploiting LiDAR Fiducial Marker for Unordered Low Overlap Multiview Point Cloud Registration ( http://arxiv.org/abs/2406.03298v2 )

ライセンス: Link先を確認

Yibo Liu, Jinjun Shan, Amaldev Haridevan, Shuo Zhang,

(参考訳) 点群レジストレーションは、コンピュータビジョンやロボティクスにおける多くのアプリケーションの前提条件である。既存の手法の多くは、重なりの大きい2つの点群のペアワイズな位置合わせに焦点を当てている。重なりの小さいケースを扱う手法もいくつかあるが、劣化したシナリオでは苦戦する。本稿では、LiDARフィデューシャルマーカーを活用して、順序のない低オーバーラップの多視点点群を位置合わせするよう設計された新しいフレームワークL-PRを紹介する。これをLiDARフィデューシャルマーカーと呼んでいるが、実体は広く使われているAprilTagやArUcoマーカーと同じであり、環境の3次元形状に影響を与えない薄い紙である。まず、点群間で視点が大きく変化する場合にも頑健な検出結果を得るため、改良した適応しきい値マーカー検出法を提案する。次に、順序のない多視点点群レジストレーション問題を最大事後確率(MAP)推定問題として定式化し、これに対処するための2階層のグラフからなるフレームワークを開発する。重み付きグラフとして構築される第1階層のグラフは、順序のない集合からスキャン姿勢の初期値を効率的かつ最適に推定するように設計されている。第2階層のグラフは因子グラフとして構築される。スキャン姿勢、マーカー姿勢、マーカーの角の位置を含むグラフ上の変数を大域的に最適化することで、MAP問題を解く。提案手法が従来の最先端(SOTA)手法を上回ること、そしてL-PRが3Dアセット収集や学習データ収集のための低コストかつ効率的なツールとして機能することを示すため、定性的および定量的な実験を行った。特に、L-PRを用いてLivox-3DMatchという新しいデータセットを収集し、SOTAの学習ベース手法であるSGHRの学習に組み込んだところ、さまざまなベンチマークでSGHRが明確に改善した。

Point cloud registration is a prerequisite for many applications in computer vision and robotics. Most existing methods focus on pairwise registration of two point clouds with high overlap. Although there have been some methods for low overlap cases, they struggle in degraded scenarios. This paper introduces a novel framework dubbed L-PR, designed to register unordered low overlap multiview point clouds leveraging LiDAR fiducial markers. We refer to them as LiDAR fiducial markers, but they are the same as the popular AprilTag and ArUco markers, thin sheets of paper that do not affect the 3D geometry of the environment. We first propose an improved adaptive threshold marker detection method to provide robust detection results when the viewpoints among point clouds change dramatically. Then, we formulate the unordered multiview point cloud registration problem as a maximum a-posteriori (MAP) problem and develop a framework consisting of two levels of graphs to address it. The first-level graph, constructed as a weighted graph, is designed to efficiently and optimally infer initial values of scan poses from the unordered set. The second-level graph is constructed as a factor graph. By globally optimizing the variables on the graph, including scan poses, marker poses, and marker corner positions, we tackle the MAP problem. We conduct both qualitative and quantitative experiments to demonstrate that the proposed method surpasses previous state-of-the-art (SOTA) methods and to showcase that L-PR can serve as a low-cost and efficient tool for 3D asset collection and training data collection. In particular, we collect a new dataset named Livox-3DMatch using L-PR and incorporate it into the training of the SOTA learning-based method, SGHR, which brings evident improvements for SGHR on various benchmarks.

翻訳日:2024-08-05 17:53:28 公開日:2024-08-02

# 文脈化Vendi Scoreガイダンスによる生成画像の地理的多様性の向上

Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance ( http://arxiv.org/abs/2406.04551v2 )

ライセンス: Link先を確認

Reyhane Askari Hemmat, Melissa Hall, Alicia Sun, Candace Ross, Michal Drozdzal, Adriana Romero-Soriano,

(参考訳) テキストから画像への生成モデルの普及に伴い、そのリスクとバイアスの理解に注目が集まっている。最近の研究では、最先端モデルが日常的な物体を現実世界の真の多様性で描写することに苦戦しており、地理的地域の間に顕著な差があることが明らかになっている。本研究では、地域ごとのばらつきが現実世界を代表するものとなるように、一般的な物体の生成画像の多様性を高めることを目指す。我々は推論時の介入手法である文脈化Vendi Scoreガイダンス(c-VSG)を導入する。これは、潜在拡散モデルの逆過程の各ステップを誘導し、以前に生成された画像の「メモリバンク」と比べたサンプルの多様性を高めつつ、その変動量を現実世界の文脈化画像からなる模範集合の変動の範囲内に制約する。地理的に代表性のある2つのデータセットでc-VSGを評価したところ、最も性能の低い地域と平均の両方で生成画像の多様性が大幅に向上し、同時に画質と一貫性も維持または改善されることがわかった。さらに定性的分析から、元のモデルに見られた地域の還元的な描写という観点を含め、生成画像の多様性が大きく改善されることが明らかになった。本研究が、世界の真の地理的多様性を反映したテキストから画像への生成モデルへの一歩となることを願う。

With the growing popularity of text-to-image generative models, there has been increasing focus on understanding their risks and biases. Recent work has found that state-of-the-art models struggle to depict everyday objects with the true diversity of the real world and have notable gaps between geographic regions. In this work, we aim to increase the diversity of generated images of common objects such that per-region variations are representative of the real world. We introduce an inference time intervention, contextualized Vendi Score Guidance (c-VSG), that guides the backwards steps of latent diffusion models to increase the diversity of a sample as compared to a "memory bank" of previously generated images while constraining the amount of variation within that of an exemplar set of real-world contextualizing images. We evaluate c-VSG with two geographically representative datasets and find that it substantially increases the diversity of generated images, both for the worst performing regions and on average, while simultaneously maintaining or improving image quality and consistency. Additionally, qualitative analyses reveal that diversity of generated images is significantly improved, including along the lines of reductive region portrayals present in the original model. We hope that this work is a step towards text-to-image generative models that reflect the true geographic diversity of the world.
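c-VSG builds on the Vendi Score. For a similarity matrix K over n samples (with K[i][i] = 1), the score is the exponential of the Shannon entropy of the eigenvalues of K/n. It behaves like an effective number of distinct samples: n for mutually dissimilar items and 1 for identical ones. The pure-Python sketch below computes only that score. It uses a small Jacobi eigenvalue routine in place of a linear-algebra library, and the helper names are our own. The guidance step, the memory bank, and the contextualizing exemplar set are not shown.

```python
import math

def _matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def _transpose(X):
    return [list(col) for col in zip(*X)]

def symmetric_eigenvalues(A, sweeps=50, tol=1e-20):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(A)
    a = [row[:] for row in A]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle that zeroes a[p][q]: tan(2t) = 2 a_pq / (a_pp - a_qq).
                t = 0.5 * math.atan2(2 * a[p][q], a[p][p] - a[q][q])
                c, s = math.cos(t), math.sin(t)
                J = [[float(i == j) for j in range(n)] for i in range(n)]
                J[p][p], J[p][q], J[q][p], J[q][q] = c, -s, s, c
                a = _matmul(_transpose(J), _matmul(a, J))
    return [a[i][i] for i in range(n)]

def vendi_score(K):
    """exp(-sum(lam * log(lam))) over the eigenvalues lam of K / n."""
    n = len(K)
    eigvals = symmetric_eigenvalues([[x / n for x in row] for row in K])
    entropy = -sum(lam * math.log(lam) for lam in eigvals if lam > 1e-12)
    return math.exp(entropy)
```

An identity similarity matrix over three items scores 3, and an all-ones matrix scores 1. In practice, K would come from an image-embedding similarity rather than being written out by hand.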

翻訳日:2024-08-05 17:43:44 公開日:2024-08-02

# 言語モデルは特定の認知プロファイルをエミュレートする:予測可能性測定と個人差との相互作用に関する研究

Language models emulate certain cognitive profiles: An investigation of how predictability measures interact with individual differences ( http://arxiv.org/abs/2406.04988v2 )

ライセンス: Link先を確認

Patrick Haller, Lena S. Bolliger, Lena A. Jäger,

(参考訳) これまで、読解におけるサプライザルとエントロピーの効果に関する研究の多くは集団レベルで行われ、個人差は考慮されてこなかった。本研究では、言語使用者の認知能力に関する情報を組み込むことで、さまざまな言語モデル(LM)から推定したサプライザルとエントロピーの尺度が、処理負荷の指標としての人間の読解時間データをどの程度予測できるかを再検討する。そのため、幅広い心理測定テストも受けた個人から得られた読解データに対して、生成LMから推定したサプライザルとエントロピーの予測力を評価する。具体的には、認知スコアに応じてサプライザルとエントロピーを調整することで読解時間の予測精度が上がるかどうかを調べるとともに、認知的に高成績または低成績のグループの読解時間予測においてLMが系統的なバイアスを示すかどうかを検証し、あるLMがどのタイプの心理言語学的被験者をエミュレートしているのかを明らかにする。その結果、ほとんどの場合、認知能力を組み込むことで読解時間に対するサプライザルとエントロピーの予測力が向上すること、そして一般に、心理測定テストの成績が高いほど予測可能性効果への感度が低いことがわかった。最後に、分析したLMは言語性知能の低い読者をエミュレートしていることが示唆され、特定の対象グループ(すなわち言語性知能の高い個人)に対しては、これらのLMが提供する予測可能性の推定値の精度が低いことが示唆された。

To date, most investigations on surprisal and entropy effects in reading have been conducted on the group level, disregarding individual differences. In this work, we revisit the predictive power of surprisal and entropy measures estimated from a range of language models (LMs) on data of human reading times as a measure of processing effort by incorporating information of language users' cognitive capacities. To do so, we assess the predictive power of surprisal and entropy estimated from generative LMs on reading data obtained from individuals who also completed a wide range of psychometric tests. Specifically, we investigate if modulating surprisal and entropy relative to cognitive scores increases prediction accuracy of reading times, and we examine whether LMs exhibit systematic biases in the prediction of reading times for cognitively high- or low-performing groups, revealing what type of psycholinguistic subject a given LM emulates. Our study finds that in most cases, incorporating cognitive capacities increases predictive power of surprisal and entropy on reading times, and that generally, high performance in the psychometric tests is associated with lower sensitivity to predictability effects. Finally, our results suggest that the analyzed LMs emulate readers with lower verbal intelligence, suggesting that for a given target group (i.e., individuals with high verbal intelligence), these LMs provide less accurate predictability estimates.
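The two predictability measures are the surprisal of the word actually read and the entropy of the model's next-word distribution (the expected surprisal). The sketch below computes both in bits from a hypothetical distribution. In the study, these quantities come from LM probabilities and are then used as predictors of reading times, optionally modulated by readers' cognitive scores. That modelling step is not reproduced here.

```python
import math

def surprisal(next_word_probs, word):
    """Surprisal in bits: -log2 p(word | context)."""
    return -math.log2(next_word_probs[word])

def entropy(next_word_probs):
    """Entropy in bits of the next-word distribution (expected surprisal)."""
    return -sum(p * math.log2(p) for p in next_word_probs.values() if p > 0)

# Hypothetical LM output for a single context.
probs = {"cat": 0.5, "dog": 0.25, "car": 0.25}
print(surprisal(probs, "dog"))  # 2.0
print(entropy(probs))           # 1.5
```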

翻訳日:2024-08-05 17:43:44 公開日:2024-08-02

# 二部リウェイト・アニーリング法によるエンタングルメント・エントロピーとその導関数の大規模データの高精度抽出

Bipartite reweight-annealing algorithm to extract large-scale data of entanglement entropy and its derivative in high precision ( http://arxiv.org/abs/2406.05324v3 )

ライセンス: Link先を確認

Zhe Wang, Zhiyan Wang, Yi-Ming Ding, Bin-Bin Mao, Zheng Yan,

(参考訳) 本稿では、エンタングルメント・エントロピー(EE)とその導関数の大規模データを、高精度かつ低い技術的障壁で抽出できる量子モンテカルロ(QMC)法を提案する。異なる時空多様体上の2つの分配関数の重なりを直接計算することを避け、代わりにリウェイト・アニーリング法によってそれぞれを別々に求める。この枠組みでは、インクリメンタルな過程を実際の物理パラメータの経路に沿って設計でき、すべての中間状態が対応するパラメータにおけるEEとなるため、アルゴリズムの効率は$10^4$倍以上向上する。EEの計算ははるかに安価かつ簡単になる。これにより、2次元以上の系で広いパラメータ領域にわたってEEを走査することで、新しい相や相転移を数値的に検出する道が開かれる。さらに、EEとその導関数を用いて相転移点を見つけ、新しい相を探索できることを示す。

We propose a quantum Monte Carlo (QMC) scheme able to extract large-scale data of entanglement entropy (EE) and its derivative with high precision and low technical barrier. We avoid directly computing the overlap of two partition functions within different spacetime manifolds and instead obtain them separately via a reweight-annealing scheme. The incremental process can be designed along the path of real physical parameters in this frame, and all intermediates are EEs of corresponding parameters, so the algorithm efficiency is improved by a factor of more than $10^4$. The calculation of EE becomes much cheaper and simpler. It opens a way to numerically detect the novel phases and phase transitions by scanning EE in a wide parameter-region in two and higher dimensional systems. We then show the feasibility of using EE and its derivative to find phase transition points and to probe novel phases.
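The reweighting idea behind such schemes can be written generically. Take the second Rényi entropy as an assumed example, with $Z^{(2)}_A$ the partition function on the two-replica manifold glued along region $A$, $Z$ the single-replica partition function, and $W_\lambda(c)$ the weight of configuration $c$ at parameter $\lambda$. The textbook reweighting identity then gives each ratio of neighbouring partition functions as an average over configurations sampled at $\lambda_k$. This is a sketch of the general identity, not necessarily the paper's exact estimator:

$$
S^{(2)}_A(\lambda) = -\ln \frac{Z^{(2)}_A(\lambda)}{Z(\lambda)^2},
\qquad
\frac{Z(\lambda_{k+1})}{Z(\lambda_k)} = \left\langle \frac{W_{\lambda_{k+1}}(c)}{W_{\lambda_k}(c)} \right\rangle_{\lambda_k}
$$

Chaining these ratios along a path $\lambda_0 \to \lambda_1 \to \dots \to \lambda_M$, separately for $Z^{(2)}_A$ and for $Z$, gives

$$
S^{(2)}_A(\lambda_M) = S^{(2)}_A(\lambda_0) - \ln \prod_{k=0}^{M-1} \frac{Z^{(2)}_A(\lambda_{k+1})}{Z^{(2)}_A(\lambda_k)} + 2 \ln \prod_{k=0}^{M-1} \frac{Z(\lambda_{k+1})}{Z(\lambda_k)}
$$

Every intermediate $\lambda_k$ on a physical parameter path therefore yields an entropy value, once a reference value $S^{(2)}_A(\lambda_0)$ is known. This appears consistent with the abstract's remark that all intermediates are EEs of the corresponding parameters.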

翻訳日:2024-08-05 17:43:44 公開日:2024-08-02

# 連続および離散量子バスを用いた光誘起ダイナミクス

Photo-induced dynamics with continuous and discrete quantum baths ( http://arxiv.org/abs/2406.07047v2 )

ライセンス: Link先を確認

Zhaoxuan Xie, Mattia Moroder, Ulrich Schollwöck, Sebastian Paeckel,

(参考訳) 複雑な分子における光物理過程の超高速量子ダイナミクスは、量子化学や生物学において多様で興味深い応用をもつ、極めて難しい計算問題である。開放量子系における最近の進展に着想を得て、マルコフ埋め込みを用いて、連続的な環境を離散的で有効なボソン自由度の集合によって記述する、純粋状態アンラベリングに基づくハイブリッドバス法を導入する。本手法は、連続的なスペクトル密度と、その中に埋め込まれた鋭いピークの両方を記述できる。これにより、離散的な振動モードの集合のユニタリダイナミクスで長時間の記憶効果を捉えるか、リンドブラッド方程式やレッドフィールドのマスター方程式を用いた記憶のないマルコフ環境を用いるかのいずれかであった従来手法の限界を克服する。量子化学と生物学の2つのパラダイム的な問題で本手法をベンチマークする。ユニタリな記述と比べ、はるかに少ない数のボソンモードで励起子ダイナミクスを正確に記述でき、ほぼ1桁の計算高速化が得られることを示す。さらに、光捕集複合体のスペクトル密度における$\delta$ピークの効果を明示的に考慮し、環境の長時間記憶がダイナミクスに強い影響を与えることを示す。

The ultrafast quantum dynamics of photophysical processes in complex molecules is an extremely challenging computational problem with a wide variety of fascinating applications in quantum chemistry and biology. Inspired by recent developments in open quantum systems, we introduce a pure-state unraveled hybrid-bath method that describes a continuous environment via a set of discrete, effective bosonic degrees of freedom using a Markovian embedding. Our method is capable of describing both a continuous spectral density and sharp peaks embedded into it. Thereby, we overcome the limitations of previous methods, which either capture long-time memory effects using the unitary dynamics of a set of discrete vibrational modes or use memoryless Markovian environments employing a Lindblad or Redfield master equation. We benchmark our method against two paradigmatic problems from quantum chemistry and biology. We demonstrate that compared to unitary descriptions, a significantly smaller number of bosonic modes suffices to describe the excitonic dynamics accurately, yielding a computational speed-up of nearly an order of magnitude. Furthermore, we take into account explicitly the effect of a $\delta$-peak in the spectral density of a light-harvesting complex, demonstrating the strong impact of the long-time memory of the environment on the dynamics.

翻訳日:2024-08-05 17:43:44 公開日:2024-08-02

# Let's Go Real Talk: 対面会話のための音声対話モデル

Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation ( http://arxiv.org/abs/2406.07867v2 )

ライセンス: Link先を確認

Se Jin Park, Chae Won Kim, Hyeongseop Rha, Minsu Kim, Joanna Hong, Jeong Hun Yeo, Yong Man Ro,

(参考訳) 本稿では、新しい対面(Face-to-Face)音声対話モデルを提案する。このモデルはユーザ入力の視聴覚音声を処理し、応答として視聴覚音声を生成するもので、中間テキストに頼らないアバターチャットボットシステムの構築に向けた最初の一歩となる。この目的のために、オープンドメイン対話データセットTopicalChatに基づいて収録した、約9,000対話・340時間からなる初の大規模マルチモーダル(音声と映像)音声対話コーパスMultiDialogを新たに導入する。MultiDialogには、感情アノテーション付きの台本に従って演じる会話相手の音声と映像の同期収録が含まれており、マルチモーダル合成の研究機会を広げることが期待される。我々の対面音声対話モデルは、テキストで事前学習された大規模言語モデルを取り入れ、音声-テキスト共同事前学習によって視聴覚音声対話ドメインに適応させる。広範な実験を通じて、対面会話の実現における本モデルの有効性を検証した。デモとデータはそれぞれhttps://multidialog.github.ioとhttps://huggingface.co/datasets/IVLLab/MultiDialogで公開されている。

In this paper, we introduce a novel Face-to-Face spoken dialogue model. It processes audio-visual speech from user input and generates audio-visual speech as the response, marking the initial step towards creating an avatar chatbot system without relying on intermediate text. To this end, we newly introduce MultiDialog, the first large-scale multimodal (i.e., audio and visual) spoken dialogue corpus containing 340 hours of approximately 9,000 dialogues, recorded based on the open domain dialogue dataset, TopicalChat. The MultiDialog contains parallel audio-visual recordings of conversation partners acting according to the given script with emotion annotations, which we expect to open up research opportunities in multimodal synthesis. Our Face-to-Face spoken dialogue model incorporates a textually pretrained large language model and adapts it into the audio-visual spoken dialogue domain by incorporating speech-text joint pretraining. Through extensive experiments, we validate the effectiveness of our model in facilitating a face-to-face conversation. Demo and data are available at https://multidialog.github.io and https://huggingface.co/datasets/IVLLab/MultiDialog, respectively.

翻訳日:2024-08-05 17:43:44 公開日:2024-08-02

# ハイパースペクトル画像ノイズ除去のためのハイブリッド空間-スペクトルニューラルネットワーク

Hybrid Spatial-spectral Neural Network for Hyperspectral Image Denoising ( http://arxiv.org/abs/2406.08782v2 )

ライセンス: Link先を確認

Hao Liang, Chengjie, Kun Li, Xin Tian,

(参考訳) ハイパースペクトル画像(HSI)のノイズ除去は、HSIの応用に不可欠な処理である。しかし、既存のTransformerベースの手法は主に非局所的なモデリングに焦点を当てており、画像のノイズ除去における局所性の重要性を見落としている。さらに、深層学習手法は複雑なスペクトル学習機構を用いるため、大きな計算コストを伴う。これらの問題に対処するため、ハイブリッド空間-スペクトルノイズ除去ネットワーク(HSSD)を提案する。その中で、CNNとTransformerの特性に着想を得た新しいハイブリッドなデュアルパスネットワークを設計し、ノイズを効率的に抑制しつつ局所的および非局所的な空間的詳細の両方を捉えられるようにする。さらに、計算量を削減するため、空間とスペクトルチャネルの学習を分離する単純だが効果的な分離戦略を採用し、少ないパラメータの多層パーセプトロンを用いてスペクトル間の大域的な相関を学習する。合成データと実データでの実験により、提案手法が空間的およびスペクトル的な再構成において最先端手法を上回ることが示された。コードと詳細はhttps://github.com/HLImg/HSSDで公開されている。

Hyperspectral image (HSI) denoising is an essential procedure for HSI applications. Unfortunately, the existing Transformer-based methods mainly focus on non-local modeling, neglecting the importance of locality in image denoising. Moreover, deep learning methods employ complex spectral learning mechanisms, thus introducing large computation costs. To address these problems, we propose a hybrid spatial-spectral denoising network (HSSD), in which we design a novel hybrid dual-path network inspired by CNN and Transformer characteristics, leading to capturing both local and non-local spatial details while suppressing noise efficiently. Furthermore, to reduce computational complexity, we adopt a simple but effective decoupling strategy that disentangles the learning of space and spectral channels, where a multilayer perceptron with few parameters is utilized to learn the global correlations among spectra. The synthetic and real experiments demonstrate that our proposed method outperforms state-of-the-art methods on spatial and spectral reconstruction. The code and details are available at https://github.com/HLImg/HSSD.

翻訳日:2024-08-05 17:43:44 公開日:2024-08-02

# ポスト量子ステアリングの活性化

Activation of post-quantum steering ( http://arxiv.org/abs/2406.10570v2 )

ライセンス: Link先を確認

Ana Belén Sainz, Paul Skrzypczyk, Matty J. Hoban,

(参考訳) 対応するツィレルソン限界を超えてベルの不等式を破るような物理理論が存在しうる。これは量子後非局所性と呼ばれる。このような理論は特殊相対性理論に反しないが、特定の情報処理タスクにおいて優位性をもたらす可能性がある。エンタングルした量子状態が非古典的な現象を示す別の形態があり、その顕著な例がアインシュタイン=ポドルスキー=ローゼン(EPR)ステアリングである。二部ベル不等式の破れはEPRステアリングを意味するが、その逆は必ずしも成り立たない。量子後EPRステアリングの研究はより複雑であるが、それが従来のベルテストにおける量子後非局所性を必ずしも意味しないことが示されている。本研究では、個々には量子後非局所性を示さないが、ネットワーク全体としてはそのネットワークのツィレルソン限界を破る資源を、より大きなネットワークに分配する方法を示す。すなわち、量子後ステアリングを活性化し、ベルシナリオにおける量子後相関として検証できるようにする方法を示す。独立に興味深いと思われる点として、量子後資源を仮定した場合でも、ネットワーク内で二部量子アセンブラージュを自己検証する方法を示す。

There are possible physical theories that give greater violations of Bell's inequalities than the corresponding Tsirelson bound, termed post-quantum non-locality. Such theories do not violate special relativity, but could give an advantage in certain information processing tasks. There is another way in which entangled quantum states exhibit non-classical phenomena, with one notable example being Einstein-Podolsky-Rosen (EPR) steering; a violation of a bipartite Bell inequality implies EPR steering, but the converse is not necessarily true. The study of post-quantum EPR steering is more intricate, but it has been shown that it does not always imply post-quantum non-locality in a conventional Bell test. In this work we show how to distribute resources in a larger network that individually do not demonstrate post-quantum non-locality but violate a Tsirelson bound for the network. That is, we show how to activate post-quantum steering so that it can now be witnessed as post-quantum correlations in a Bell scenario. One element of our work that may be of independent interest is we show how to self-test a bipartite quantum assemblage in a network, even assuming post-quantum resources.
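The phrase "violating a Tsirelson bound" can be made concrete in the simplest Bell scenario, CHSH. The sketch below evaluates the CHSH expression for three cases: the best local deterministic strategy, the optimal quantum correlations of a maximally entangled pair, and a Popescu-Rohrlich (PR) box, the standard example of correlations that are post-quantum yet still no-signalling. It only illustrates the bounds 2, 2*sqrt(2), and 4. The network construction and steering activation from the abstract are not modelled.

```python
import itertools
import math

def chsh(E):
    """CHSH value S = E(0,0) + E(0,1) + E(1,0) - E(1,1) for correlators E(x, y)."""
    return E(0, 0) + E(0, 1) + E(1, 0) - E(1, 1)

def best_local_deterministic():
    """Max |S| over deterministic local strategies a(x), b(y) in {+1, -1}."""
    best = 0
    for a0, a1, b0, b1 in itertools.product((1, -1), repeat=4):
        a, b = (a0, a1), (b0, b1)
        best = max(best, abs(chsh(lambda x, y: a[x] * b[y])))
    return best

def quantum_optimal(x, y):
    """Maximally entangled qubits with angles that reach Tsirelson's bound."""
    alice = (0.0, math.pi / 2)
    bob = (math.pi / 4, -math.pi / 4)
    return math.cos(alice[x] - bob[y])

def pr_box(x, y):
    """Popescu-Rohrlich box: outputs agree unless x = y = 1."""
    return -1.0 if (x == 1 and y == 1) else 1.0

print(best_local_deterministic())  # classical bound: 2
print(chsh(quantum_optimal))       # Tsirelson bound: 2*sqrt(2), about 2.828
print(chsh(pr_box))                # post-quantum, no-signalling: 4.0
```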

翻訳日:2024-08-05 17:43:44 公開日:2024-08-02

# 臨界フェルミオンは普遍エンベズラーである

Critical Fermions are Universal Embezzlers ( http://arxiv.org/abs/2406.11747v2 )

ライセンス: Link先を確認

Lauritz van Luijk, Alexander Stottmeister, Henrik Wilming,

(参考訳) 普遍エンベズラー(Universal embezzler)とは、系の状態をほとんど乱すことなく、局所操作によって任意のエンタングル状態を任意の精度で取り出すことができる二部量子系である。ここでは、普遍エンベズラーが多体物理においてありふれた存在であることを示す。すなわち、1次元格子上の局所的、並進不変、かつ臨界な自由フェルミオン多体系の基底状態セクターは、2つの半鎖に二分割すると普遍エンベズラーとなる。同じ性質は、ジョルダン=ウィグナー変換を介して、局所相互作用する双対スピン鎖でも成り立つ。普遍的なエンベズルメントは熱力学的極限に限らず、有限の系のサイズですでに現れる。すなわち、任意の有限誤差と任意の目標エンタングル状態に対して、有限の長さの鎖で、その状態を所定の誤差内でエンベズルするのに十分である。技術的には、主結果として、与えられたモデルの基底状態セクターに付随する半鎖の可観測量代数がIII$_1$型因子であることを示す。

Universal embezzlers are bipartite quantum systems from which any entangled state may be extracted to arbitrary precision using local operations while perturbing the state of the system arbitrarily little. Here, we show that universal embezzlers are ubiquitous in many-body physics: The ground state sector of every local, translation-invariant, and critical free-fermionic many-body system on a one-dimensional lattice is a universal embezzler if bi-partitioned into two half-chains. The same property holds in locally-interacting, dual spin chains via the Jordan-Wigner transformation. Universal embezzlement manifests already for finite system sizes, not only in the thermodynamic limit: For any finite error and any targeted entangled state, a finite length of the chain is sufficient to embezzle said state within the given error. On a technical level, our main result establishes that the half-chain observable algebras associated with ground state sectors of the given models are type III$_1$ factors.

翻訳日:2024-08-05 17:43:44 公開日:2024-08-02

# 表形式時系列のための階層型Transformerにおける細粒度アテンション

Fine-grained Attention in Hierarchical Transformers for Tabular Time-series ( http://arxiv.org/abs/2406.15327v2 )

ライセンス: Link先を確認

Raphael Azorin, Zied Ben Houidi, Massimo Gallo, Alessandro Finamore, Pietro Michiardi,

(参考訳) 表形式データは、多くの実世界のシステムに遍在する。特に、行が時系列的に関連付けられている時間依存の表形式データは、金融取引、医療記録、株価履歴など、過去の事象の記録に一般的に用いられる。近年、Transformerアーキテクチャのアテンション機構の階層的な変種が、表形式時系列データのモデル化に用いられている。まず、行(または列)ごとに、そのフィールド間のアテンションを計算して別々に符号化する。次に、符号化された行(または列)同士でアテンションを行い、表形式時系列全体をモデル化する。このアプローチは効率的だが、アテンションの粒度を制約し、異なる行や列をまたいでフィールドレベルでパターンを学習する能力を制限する。このギャップに対処する第一歩として、行と列の両方のレベルでフィールドを文脈化する、細粒度の階層モデルFieldyを提案する。公開されている表形式時系列データセットを用い、回帰タスクと分類タスクで提案手法を最先端モデルと比較した。その結果、行方向と列方向のアテンションを組み合わせることで、モデルサイズを増やすことなく性能が向上することがわかった。コードとデータはhttps://github.com/raphaaal/fieldyで公開されている。

Tabular data is ubiquitous in many real-life systems. In particular, time-dependent tabular data, where rows are chronologically related, is typically used for recording historical events, e.g., financial transactions, healthcare records, or stock history. Recently, hierarchical variants of the attention mechanism of transformer architectures have been used to model tabular time-series data. At first, rows (or columns) are encoded separately by computing attention between their fields. Subsequently, encoded rows (or columns) are attended to one another to model the entire tabular time-series. While efficient, this approach constrains the attention granularity and limits its ability to learn patterns at the field-level across separate rows, or columns. We take a first step to address this gap by proposing Fieldy, a fine-grained hierarchical model that contextualizes fields at both the row and column levels. We compare our proposal against state of the art models on regression and classification tasks using public tabular time-series datasets. Our results show that combining row-wise and column-wise attention improves performance without increasing model size. Code and data are available at https://github.com/raphaaal/fieldy.
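Field-level attention over a table can be sketched in a few lines. Below, every field embedding attends both to the other fields in its row and to the same field across time steps (its column), and the two contextualized views are averaged. This is a single-head, projection-free illustration. Averaging the views is our own assumption about how they are combined, and the code is not the Fieldy architecture itself.

```python
import math

def attend(query, keys, values):
    """Single-head scaled dot-product attention without learned projections."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / z for i in range(d)]

def field_level_attention(table):
    """table[r][c] is the embedding of field c at row (time step) r.

    Each field attends within its row and within its column; the two
    views are averaged (an assumed combination rule).
    """
    rows, cols = len(table), len(table[0])
    out = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            x = table[r][c]
            row_view = attend(x, table[r], table[r])
            column = [table[t][c] for t in range(rows)]
            col_view = attend(x, column, column)
            new_row.append([(a + b) / 2 for a, b in zip(row_view, col_view)])
        out.append(new_row)
    return out
```

The output has the same shape as the input table. Because each view is a convex combination of input embeddings, a table whose fields are all identical comes back unchanged.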

翻訳日:2024-08-05 17:43:44 公開日:2024-08-02

# 計算的生命: 整った自己複製プログラムは単純な相互作用からいかに生まれるか

Computational Life: How Well-formed, Self-replicating Programs Emerge from Simple Interaction ( http://arxiv.org/abs/2406.19108v2 )

ライセンス: Link先を確認

Blaise Agüera y Arcas, Jyrki Alakuijala, James Evans, Ben Laurie, Alexander Mordvintsev, Eyvind Niklasson, Ettore Randazzo, Luca Versari,

(参考訳) 生命の起源と人工生命の分野はいずれも、生命とは何か、そして生命が「前生命」的なダイナミクスの集合からどのように出現するのかを問うている。生命が出現するほとんどの基質に共通する特徴は、自己複製が現れたときにダイナミクスが顕著に変化することである。自己複製子が自然界でどのように出現したかについてはいくつかの仮説があるが、自己複製子が出現するための一般的なダイナミクス、計算原理、必要条件についてはほとんど分かっていない。これは、相互作用が論理的、数学的、またはプログラミング的な規則を含む「計算基質」において特に当てはまる。本稿では、様々な単純なプログラミング言語と機械命令セットに基づく複数の計算基質を研究することによって、自己複製子がどのように生じるかを理解するための一歩を踏み出す。ランダムで非自己複製的なプログラムを明示的な適応度地形を持たない環境に置くと、自己複製子が出現する傾向があることを示す。これがランダムな相互作用と自己修正によって起こること、そして背景的なランダム突然変異の有無にかかわらず起こりうることを実証する。また、自己複製子の出現に続いて、ますます複雑なダイナミクスが出現し続けることを示す。最後に、自己複製子が存在しうるにもかかわらず、これまでのところその自然発生が観察されていない最小主義的プログラミング言語という反例を示す。

The fields of Origin of Life and Artificial Life both question what life is and how it emerges from a distinct set of "pre-life" dynamics. One common feature of most substrates where life emerges is a marked shift in dynamics when self-replication appears. While there are some hypotheses regarding how self-replicators arose in nature, we know very little about the general dynamics, computational principles, and necessary conditions for self-replicators to emerge. This is especially true on "computational substrates" where interactions involve logical, mathematical, or programming rules. In this paper we take a step towards understanding how self-replicators arise by studying several computational substrates based on various simple programming languages and machine instruction sets. We show that when random, non self-replicating programs are placed in an environment lacking any explicit fitness landscape, self-replicators tend to arise. We demonstrate how this occurs due to random interactions and self-modification, and can happen with and without background random mutations. We also show how increasingly complex dynamics continue to emerge following the rise of self-replicators. Finally, we show a counterexample of a minimalistic programming language where self-replicators are possible, but so far have not been observed to arise.

翻訳日:2024-08-05 17:43:44 公開日:2024-08-02

# Chat AI: HPCベースのサービスのためのシームレスなSlurmネイティブソリューション

Chat AI: A Seamless Slurm-Native Solution for HPC-Based Services ( http://arxiv.org/abs/2407.00110v2 )

ライセンス: Link先を確認

Ali Doosthosseini, Jonathan Decker, Hendrik Nolte, Julian M. Kunkel,

(参考訳) 大規模言語モデル(LLM)の普及により、研究者がオープンソースまたは独自に微調整したLLMを実行でき、ユーザーのデータが非公開に保たれ同意なく保存されないことを保証する、効率的でセキュアかつプライベートなサービス基盤が強く求められている。最先端GPUを備えた高性能計算(HPC)システムはLLMのトレーニングには適しているが、そのバッチスケジューリングのパラダイムは、AIアプリケーションのリアルタイム提供をサポートするようには設計されていない。一方、クラウドシステムはWebサービスには適しているが、一般にHPCクラスタの計算能力、特に最適な推論速度に必要な高価で希少なハイエンドGPUにはアクセスできない。本稿では、HPCシステム上で多数のLLMを実行するスケーラブルなバックエンドにセキュアにアクセスできる、クラウドVM上で動作するWebサービスからなるアーキテクチャとその実装を提案する。HPCインフラストラクチャでLLMをホストするWebサービスを提供することで、地域の大学や研究センターという信頼された環境を活用し、商用LLMサービスに代わるプライベートでセキュアな選択肢を提供する。我々のソリューションはHPCバッチスケジューラSlurmとネイティブに統合され、HPCクラスタへのシームレスなデプロイを可能にし、Slurmが生み出すスケジュールの隙間を利用しながら、通常のSlurmワークロードと並行して実行できる。HPCシステムのセキュリティを確保するため、SSHのForceCommandディレクティブを用いて堅牢なサーキットブレーカーを構築し、Web公開サーバーへの攻撃が成功してもクラスタに影響が及ばないようにしている。我々はこのシステムを本番サービスとして運用することに成功し、ソースコードを \url{https://github.com/gwdg/chat-ai} で公開した。

The widespread adoption of large language models (LLMs) has created a pressing need for an efficient, secure and private serving infrastructure, which allows researchers to run open source or custom fine-tuned LLMs and ensures users that their data remains private and is not stored without their consent. While high-performance computing (HPC) systems equipped with state-of-the-art GPUs are well-suited for training LLMs, their batch scheduling paradigm is not designed to support real-time serving of AI applications. Cloud systems, on the other hand, are well suited for web services but commonly lack access to the computational power of HPC clusters, especially expensive and scarce high-end GPUs, which are required for optimal inference speed. We propose an architecture with an implementation consisting of a web service that runs on a cloud VM with secure access to a scalable backend running a multitude of LLMs on HPC systems. By offering a web service using our HPC infrastructure to host LLMs, we leverage the trusted environment of local universities and research centers to offer a private and secure alternative to commercial LLM services. Our solution natively integrates with the HPC batch scheduler Slurm, enabling seamless deployment on HPC clusters, and is able to run side by side with regular Slurm workloads, while utilizing gaps in the schedule created by Slurm. In order to ensure the security of the HPC system, we use the SSH ForceCommand directive to construct a robust circuit breaker, which prevents successful attacks on the web-facing server from affecting the cluster. We have successfully deployed our system as a production service, and made the source code available at \url{https://github.com/gwdg/chat-ai}.

翻訳日:2024-08-05 17:43:44 公開日:2024-08-02
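The Chat AI abstract uses SSH's `ForceCommand` directive as the circuit breaker between the web-facing VM and the HPC cluster. Under `ForceCommand`, sshd runs a fixed program instead of whatever the client requested. It passes the client's original request to that program in the `SSH_ORIGINAL_COMMAND` environment variable, so the program can act as an allowlist gate. Below is a minimal sketch of such a gate, assuming a hypothetical `sshd_config` entry such as `Match User chatai` followed by `ForceCommand /usr/local/bin/chat-ai-gate`. The user name, the path, and the `submit`/`status`/`cancel` actions are illustrative and are not taken from the Chat AI code base.

```python
import os
import shlex
import sys
from typing import Optional

# Hypothetical allowlist: the only actions the web VM may trigger on the cluster.
ALLOWED_ACTIONS = {"submit", "status", "cancel"}


def parse_request(original_command: Optional[str]) -> Optional[list[str]]:
    """Return the argument list to act on, or None if the request must be rejected."""
    if not original_command:
        return None  # no command means an interactive shell: never allowed
    try:
        argv = shlex.split(original_command)
    except ValueError:
        return None  # malformed input such as an unclosed quote
    if not argv or argv[0] not in ALLOWED_ACTIONS:
        return None
    # Arguments may only be alphanumeric apart from '-' and '_' (e.g. job or model names).
    for arg in argv[1:]:
        if not arg.replace("-", "").replace("_", "").isalnum():
            return None
    return argv


def main() -> int:
    # Under ForceCommand, sshd exposes the client's requested command in this variable.
    argv = parse_request(os.environ.get("SSH_ORIGINAL_COMMAND"))
    if argv is None:
        print("rejected", file=sys.stderr)
        return 1
    # A real gate would dispatch to a fixed handler here; this sketch only reports.
    print("accepted:", " ".join(argv))
    return 0
```

Installed as the `ForceCommand` target, the script would end with `sys.exit(main())`. The gate rejects anything that is not a simple, allowlisted request, so a compromised web server can trigger only the few actions the gate exposes.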

# 近似ベイズ計算による量子系パラメータの効率的な推論

Efficient inference of quantum system parameters by Approximate Bayesian Computation ( http://arxiv.org/abs/2407.00724v2 )

ライセンス: Link先を確認

Lewis A. Clark, Jan Kolodynski,

(参考訳) システムパラメータを効率的に推定する能力は、高速な動作を必要とするあらゆる信号処理タスクにおいて不可欠である。量子系を扱う場合、基礎となるヒルベルト空間がシステムサイズとともに大幅に増大するため、深刻な課題が生じる。観測された測定データの統計、すなわち尤度をもはや容易に計算できないため、最尤推定器や粒子フィルタのような一般的な手法は実用的でなくなる。この問題に対処するために、与えられた量子デバイスに対して事前に用意された測定データのライブラリからサンプリングすることで尤度計算を回避する、近似ベイズ計算(ABC)アルゴリズムの利用を提案する。本研究では、2準位原子とオプトメカニカル系をリアルタイムに探査する際に生じる光検出クリックパターンの解釈にABCを適用する。後者については線形領域と非線形領域の両方を考察し、量子測定統計を理解することでABCアルゴリズムをどのように調整できるかを示す。我々の研究は、量子デバイスや関連する測定方式の複雑さにかかわらず、高速なパラメータ推定が可能になりうることを実証している。

The ability to efficiently infer system parameters is essential in any signal-processing task that requires fast operation. Dealing with quantum systems, a serious challenge arises due to substantial growth of the underlying Hilbert space with the system size. As the statistics of the measurement data observed, i.e. the likelihood, can no longer be easily computed, common approaches such as maximum-likelihood estimators or particle filters become impractical. To address this issue, we propose the use of the Approximate Bayesian Computation (ABC) algorithm, which evades likelihood computation by sampling from a library of measurement data -- a priori prepared for a given quantum device. We apply ABC to interpret photodetection click-patterns arising when probing in real time a two-level atom and an optomechanical system. For the latter, we consider both linear and non-linear regimes, in order to show how to tailor the ABC algorithm by understanding the quantum measurement statistics. Our work demonstrates that fast parameter inference may be possible no matter the complexity of a quantum device and the measurement scheme involved.

翻訳日:2024-08-05 17:43:44 公開日:2024-08-02
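The ABC abstract replaces likelihood evaluation with comparisons against a library of measurement records prepared in advance. A minimal rejection-ABC sketch of that idea follows. It assumes a toy detector in which each time bin independently produces a click with probability `p_click`, and it uses the total click count as the summary statistic. Both are stand-ins for the quantum trajectory simulations and click-pattern statistics used in the paper, and all function names are hypothetical.

```python
import random
import statistics
from typing import Callable


def make_click_simulator(n_bins: int, seed: int) -> Callable[[float], int]:
    rng = random.Random(seed)

    def simulate(p_click: float) -> int:
        # Toy detector: each time bin independently records a click with probability p_click.
        return sum(rng.random() < p_click for _ in range(n_bins))

    return simulate


def build_library(
    simulate: Callable[[float], int], thetas: list[float], reps: int
) -> list[tuple[float, int]]:
    """Pre-compute (parameter, simulated summary) pairs once, before any data arrive."""
    return [(theta, simulate(theta)) for theta in thetas for _ in range(reps)]


def abc_rejection(library: list[tuple[float, int]], observed: int, tolerance: int) -> list[float]:
    """Keep parameters whose simulated summary lies within `tolerance` of the observation."""
    return [theta for theta, summary in library if abs(summary - observed) <= tolerance]


def posterior_mean(accepted: list[float]) -> float:
    return statistics.mean(accepted)
```

For example, `abc_rejection(build_library(make_click_simulator(200, seed=0), [i / 100 for i in range(1, 100)], reps=20), observed=60, tolerance=3)` should return accepted `p_click` values clustered around 0.3. The expensive simulation runs once in `build_library`. Each new observation then costs only a scan of the library, and that is where the speed-up comes from.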

# OpenVid-1M:テキスト・ビデオ・ジェネレーションのための大規模高品質データセット

OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation ( http://arxiv.org/abs/2407.02371v2 )

ライセンス: Link先を確認

Kepan Nan, Rui Xie, Penghao Zhou, Tiehan Fan, Zhenheng Yang, Zhijie Chen, Xiang Li, Jian Yang, Ying Tai,

(参考訳) テキスト・ツー・ビデオ(T2V)生成は、大規模なマルチモダリティモデルであるSoraのおかげで、近年大きな注目を集めている。しかし、T2V生成には依然として2つの重要な課題がある。1) 精密でオープンソースの高品質データセットの欠如。これまで広く使われてきたWebVid-10MやPanda-70Mなどのビデオデータセットは、品質が低いか、ほとんどの研究機関にとって大きすぎる。したがって、T2V生成のために精密で高品質なテキスト・ビデオペアを収集することは困難だが極めて重要である。2) テキスト情報を十分に活用していないこと。近年のT2V手法はビジョントランスフォーマーに注目し、ビデオ生成に単純なクロスアテンションモジュールを用いているが、これではテキストプロンプトから意味情報を十分に抽出できない。これらの問題に対処するために、表現力の高いキャプションを備えた精密な高品質データセットOpenVid-1Mを導入する。このオープンシナリオのデータセットには100万以上のテキスト・ビデオペアが含まれており、T2V生成の研究を促進する。さらに、OpenVid-1Mから433Kの1080pビデオを厳選してOpenVidHD-0.4Mを作成し、高精細ビデオ生成を推進する。加えて、視覚トークンから構造情報を、テキストトークンから意味情報を抽出できる新しいマルチモーダルビデオ拡散トランスフォーマー(MVDiT)を提案する。大規模な実験とアブレーション研究により、既存のデータセットに対するOpenVid-1Mの優位性とMVDiTの有効性が検証された。

Text-to-video (T2V) generation has recently garnered significant attention thanks to the large multi-modality model Sora. However, T2V generation still faces two important challenges: 1) Lacking a precise, open-sourced, high-quality dataset. The previous popular video datasets, e.g. WebVid-10M and Panda-70M, are either of low quality or too large for most research institutions. Therefore, it is challenging but crucial to collect precise, high-quality text-video pairs for T2V generation. 2) Failing to fully utilize textual information. Recent T2V methods have focused on vision transformers, using a simple cross-attention module for video generation, which falls short of thoroughly extracting semantic information from text prompts. To address these issues, we introduce OpenVid-1M, a precise high-quality dataset with expressive captions. This open-scenario dataset contains over 1 million text-video pairs, facilitating research on T2V generation. Furthermore, we curate 433K 1080p videos from OpenVid-1M to create OpenVidHD-0.4M, advancing high-definition video generation. Additionally, we propose a novel Multi-modal Video Diffusion Transformer (MVDiT) capable of mining both structure information from visual tokens and semantic information from text tokens. Extensive experiments and ablation studies verify the superiority of OpenVid-1M over previous datasets and the effectiveness of our MVDiT.

翻訳日:2024-08-05 15:50:45 公開日:2024-08-02

# D-Rax: マルチモーダルデータとエキスパート(eXpert)モデルの予測を活用したドメイン固有の放射線診断アシスタント

D-Rax: Domain-specific Radiologic assistant leveraging multi-modal data and eXpert model predictions ( http://arxiv.org/abs/2407.02604v2 )

ライセンス: Link先を確認

Hareem Nisar, Syed Muhammad Anwar, Zhifan Jiang, Abhijeet Parida, Ramon Sanchez-Jacob, Vishwesh Nath, Holger R. Roth, Marius George Linguraru,

(参考訳) 大規模視覚言語モデル(VLM)は、研究段階から汎用的なユースケースへの適用に至るまで、驚くほど進歩している。バイオメディシンのための先駆的な大規模言語・ビジョンアシスタントであるLLaVA-Medは、マルチモーダルなバイオメディカル画像とデータの解析を実行し、放射線科医に自然言語インタフェースを提供できる。汎化性が高くマルチモーダルデータを扱えるものの、現在は大規模言語モデルの分野に存在するよく知られた課題によって制限されている。幻覚や応答の不正確さは誤診につながる可能性があり、これが現在VLMの臨床応用を妨げている。医療において精密でユーザフレンドリなモデルを作成するために、特定の放射線画像についての洞察を得るために使用できる、ドメイン固有の対話型放射線診断支援ツールであるD-Raxを提案する。本研究では、胸部X線(CXR)画像の対話的解析を強化して放射線レポート作成を支援し、医用画像からの包括的な洞察を提供して正確な診断の策定を助ける。D-Raxは、画像、指示、およびMIMIC-CXR画像データから導出した疾患診断と人口統計学的予測、CXR関連の視覚的質問応答(VQA)ペア、複数の専門家AIモデルによる予測結果からなる、我々がキュレートした拡張指示追従データでLLaVA-Medアーキテクチャを微調整することで実現される。オープンエンドとクローズドエンドの両方の会話で評価したところ、応答に統計的に有意な改善が認められた。最先端の診断モデルの能力をVLMと組み合わせることで、D-Raxは臨床医が自然言語を用いて医用画像と対話できるようにし、意思決定プロセスの効率化、診断精度の向上、時間の節約につながる可能性がある。

Large vision language models (VLMs) have progressed incredibly from research to applicability for general-purpose use cases. LLaVA-Med, a pioneering large language and vision assistant for biomedicine, can perform multi-modal biomedical image and data analysis to provide a natural language interface for radiologists. While it is highly generalizable and works with multi-modal data, it is currently limited by well-known challenges that exist in the large language model space. Hallucinations and imprecision in responses can lead to misdiagnosis, which currently hinders the clinical adaptability of VLMs. To create precise, user-friendly models in healthcare, we propose D-Rax, a domain-specific, conversational, radiologic assistance tool that can be used to gain insights about a particular radiologic image. In this study, we enhance the conversational analysis of chest X-ray (CXR) images to support radiological reporting, offering comprehensive insights from medical imaging and aiding in the formulation of accurate diagnoses. D-Rax is achieved by fine-tuning the LLaVA-Med architecture on our curated enhanced instruction-following data, comprising images, instructions, as well as disease diagnosis and demographic predictions derived from MIMIC-CXR imaging data, CXR-related visual question answer (VQA) pairs, and predictive outcomes from multiple expert AI models. We observe statistically significant improvement in responses when evaluated for both open- and closed-ended conversations. Leveraging the power of state-of-the-art diagnostic models combined with VLMs, D-Rax empowers clinicians to interact with medical images using natural language, which could potentially streamline their decision-making process, enhance diagnostic accuracy, and conserve their time.

翻訳日:2024-08-05 15:50:45 公開日:2024-08-02

# LLMs Plagiarize: 知識グラフ比較による大規模言語モデル学習データの責任ある調達の保証

LLMs Plagiarize: Ensuring Responsible Sourcing of Large Language Model Training Data Through Knowledge Graph Comparison ( http://arxiv.org/abs/2407.02659v2 )

ライセンス: Link先を確認

Devam Mondal, Carlo Lipizzi,

(参考訳) 近年、出版社、新聞社、その他の著作権付きコーパスの作成者が、著作物を学習や微調整の目的で利用する大規模言語モデル開発者に対して提起した法的主張を踏まえ、我々は、ある知識ソースが大規模言語モデルの学習または微調整に使用されたかどうかを評価する新しいシステムとして、剽窃検出システムの一変種を提案する。既存の手法とは異なり、我々はResource Description Framework(RDF)トリプルを用いて、ソース文書とLLMによるその文書の続きの両方から知識グラフを作成するアプローチを利用する。これらのグラフは、内容についてはコサイン類似度を用いて、構造については同型性の度合いを示すグラフ編集距離の正規化版を用いて分析される。ソースとターゲットコーパス間の内容の一致やキーワードの特定に重点を置く従来の剽窃検出システムとは異なり、提案手法は、アイデア間の関係とその相互の構成に着目することで、ソース文書とLLMによる続きとの類似性を、より広範かつ正確に評価できる。さらに、クローズドな大規模言語モデルの「ブラックボックス」システムでは利用できない可能性があるパープレキシティなどのLLM指標や、学習コーパスへのアクセスも必要としない。こうして我々は、類似度指標を通じて、LLMがその続きにおいてコーパスを「剽窃」したかどうかを評価する。本システムのプロトタイプは、ハイパーリンク先のGitHubリポジトリで公開される予定である。

In light of recent legal allegations brought by publishers, newspapers, and other creators of copyrighted corpora against large language model developers who use their copyrighted materials for training or fine-tuning purposes, we propose a novel system, a variant of a plagiarism detection system, that assesses whether a knowledge source has been used in the training or fine-tuning of a large language model. Unlike current methods, we utilize an approach that uses Resource Description Framework (RDF) triples to create knowledge graphs from both a source document and an LLM continuation of that document. These graphs are then analyzed with respect to content using cosine similarity and with respect to structure using a normalized version of graph edit distance that shows the degree of isomorphism. Unlike traditional plagiarism systems that focus on content matching and keyword identification between a source and a target corpus, our approach enables a broader and more accurate evaluation of similarity between a source document and an LLM continuation by focusing on relationships between ideas and their organization with regard to one another. Additionally, our approach does not require access to the training corpus or to LLM metrics like perplexity, which may be unavailable in closed large language model "black-box" systems. We thus assess whether an LLM has "plagiarized" a corpus in its continuation through similarity measures. A prototype of our system will be available in a hyperlinked GitHub repository.

翻訳日:2024-08-05 15:50:45 公開日:2024-08-02
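The plagiarism abstract compares knowledge graphs built from RDF triples on content (cosine similarity) and on structure (normalized graph edit distance). The sketch below shows one simple way to compute both scores once the triples have been extracted. It makes two assumptions that the abstract does not state. First, the content vectors are bags of lowercase words taken from the triples. Second, nodes in the two graphs are matched by identical labels. With the node matching fixed by label, edit distance reduces to counting node and edge insertions and deletions. General graph edit distance instead searches over node correspondences and is NP-hard. The function names are hypothetical.

```python
import math
from collections import Counter

Triple = tuple[str, str, str]  # (subject, predicate, object)


def _tokens(triples: list[Triple]) -> Counter:
    return Counter(tok for t in triples for part in t for tok in part.lower().split())


def content_similarity(a: list[Triple], b: list[Triple]) -> float:
    """Cosine similarity between bag-of-words vectors built from the triples."""
    ca, cb = _tokens(a), _tokens(b)
    dot = sum(ca[k] * cb[k] for k in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def normalized_edit_distance(a: list[Triple], b: list[Triple]) -> float:
    """Edit distance between labeled graphs, assuming nodes are matched by label.

    Cost = nodes plus edges that must be inserted or deleted, divided by the cost of
    deleting graph A entirely and building graph B from scratch, so that 0 means
    identical graphs and 1 means nothing in common.
    """
    nodes_a = {n for s, _, o in a for n in (s, o)}
    nodes_b = {n for s, _, o in b for n in (s, o)}
    edges_a, edges_b = set(a), set(b)
    cost = len(nodes_a ^ nodes_b) + len(edges_a ^ edges_b)
    worst = len(nodes_a) + len(nodes_b) + len(edges_a) + len(edges_b)
    return cost / worst if worst else 0.0
```

A structural similarity score can then be taken as `1 - normalized_edit_distance(a, b)` and reported next to `content_similarity(a, b)`.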

# 機械学習を用いた連星中性子星のリアルタイム重力波推論

Real-time gravitational-wave inference for binary neutron stars using machine learning ( http://arxiv.org/abs/2407.09602v2 )

ライセンス: Link先を確認

Maximilian Dax, Stephen R. Green, Jonathan Gair, Nihar Gupte, Michael Pürrer, Vivien Raymond, Jonas Wildberger, Jakob H. Macke, Alessandra Buonanno, Bernhard Schölkopf,

(参考訳) 連星中性子星(BNS)の合体は、重力波(GW)と電磁波(EM)の両方のスペクトルで信号を放出する。よく知られているように、2017年のGW170817のマルチメッセンジャー観測は、宇宙論、原子核物理学、重力にわたる科学的発見につながった。これらの成果の中心にあったのは、GWデータから得られた天球上の位置と距離であり、GW170817の場合、GW信号の11時間後に関連するEM突発天体AT 2017gfoを同定するのに役立った。GWデータの高速解析は、時間に敏感なEM観測を誘導するために重要であるが、信号の長さと複雑さから生じる課題のため、精度を犠牲にする近似を行う必要があることが多い。本稿では、そのような近似を一切行わずに、わずか1秒で完全なBNS推論を行う機械学習フレームワークを提案する。提案手法は、(i) 合体前であっても正確な位置推定、(ii) 近似的な低遅延手法と比較して$\sim30\%$向上した位置推定精度、(iii) 高価な望遠鏡時間の優先付けに利用できる光度距離、傾斜角、質量に関する詳細な情報を提供することで、マルチメッセンジャー観測を強化する。さらに、本手法の柔軟性とコスト削減は、状態方程式研究に新たな機会を開く。最後に、本手法が最大1時間に及ぶ極めて長い信号にまでスケールし、次世代の地上および宇宙重力波検出器のデータ解析の青写真となることを示す。

Mergers of binary neutron stars (BNSs) emit signals in both the gravitational-wave (GW) and electromagnetic (EM) spectra. Famously, the 2017 multi-messenger observation of GW170817 led to scientific discoveries across cosmology, nuclear physics, and gravity. Central to these results were the sky localization and distance obtained from GW data, which, in the case of GW170817, helped to identify the associated EM transient, AT 2017gfo, 11 hours after the GW signal. Fast analysis of GW data is critical for directing time-sensitive EM observations; however, due to challenges arising from the length and complexity of signals, it is often necessary to make approximations that sacrifice accuracy. Here, we present a machine learning framework that performs complete BNS inference in just one second without making any such approximations. Our approach enhances multi-messenger observations by providing (i) accurate localization even before the merger; (ii) improved localization precision by $\sim30\%$ compared to approximate low-latency methods; and (iii) detailed information on luminosity distance, inclination, and masses, which can be used to prioritize expensive telescope time. Additionally, the flexibility and reduced cost of our method open new opportunities for equation-of-state studies. Finally, we demonstrate that our method scales to extremely long signals, up to an hour in length, thus serving as a blueprint for data analysis for next-generation ground- and space-based detectors.

翻訳日:2024-08-05 15:50:45 公開日:2024-08-02

# 大規模言語モデルの LoRA に関する調査

A Survey on LoRA of Large Language Models ( http://arxiv.org/abs/2407.11046v2 )

ライセンス: Link先を確認

Yuren Mao, Yuhang Ge, Yijiang Fan, Wenyi Xu, Yu Mi, Zhonghao Hu, Yunjun Gao,

(参考訳) Low-Rank Adaptation (LoRA)は、密なニューラルネットワーク層をプラグ可能な低ランク行列で更新するもので、最も高性能なパラメータ効率の良い微調整パラダイムの1つである。さらに、タスク横断的な汎化とプライバシ保護において大きな利点がある。そのため、LoRAは近年大きな注目を集めており、関連文献の数は指数関数的に増加している。LoRAの現状を包括的に概観する必要がある。本調査は、(1) 下流タスクにおけるLoRAの性能を向上させる下流適応改善型の変種、(2) 複数のLoRAプラグインを混合してタスク横断的な汎化を実現する手法、(3) LoRAの計算効率を高める効率改善手法、(4) LoRAを連合学習に用いるデータプライバシ保護手法、(5) 応用、の観点から進展を分類しレビューする。また、本調査ではこの分野の今後の方向性についても論じる。最後に、読者が本調査論文の更新を確認し議論を始められるよう、Githubページ(https://github.com/ZJU-LLMs/Awesome-LoRAs.git)を提供する。

Low-Rank Adaptation (LoRA), which updates the dense neural network layers with pluggable low-rank matrices, is one of the best-performing parameter-efficient fine-tuning paradigms. Furthermore, it has significant advantages in cross-task generalization and privacy preservation. Hence, LoRA has gained much attention recently, and the related literature has grown exponentially. It is necessary to conduct a comprehensive overview of the current progress on LoRA. This survey categorizes and reviews the progress from the perspectives of (1) downstream adaptation improving variants that improve LoRA's performance on downstream tasks; (2) cross-task generalization methods that mix multiple LoRA plugins to achieve cross-task generalization; (3) efficiency-improving methods that boost the computational efficiency of LoRA; (4) data privacy-preserving methods that use LoRA in federated learning; (5) application. In addition, this survey discusses future directions in this field. Finally, we provide a Github page (https://github.com/ZJU-LLMs/Awesome-LoRAs.git) for readers to check the updates and initiate discussions on this survey paper.

翻訳日:2024-08-05 15:50:45 公開日:2024-08-02
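The LoRA survey describes updating dense layers with pluggable low-rank matrices. Concretely, LoRA keeps a pretrained weight $W \in \mathbb{R}^{d \times k}$ frozen and learns $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$ with $r \ll \min(d, k)$, so that $h = Wx + \frac{\alpha}{r} BAx$. The pure-Python sketch below shows three things: the forward pass, how the adapter can be merged into $W$ (and subtracted again to unplug it), and the parameter saving. The helper names are hypothetical, and real implementations use tensor libraries.

```python
Matrix = list[list[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: list[float], alpha: float) -> list[float]:
    """h = W x + (alpha / r) * B (A x); W stays frozen, only A (r x k) and B (d x r) train."""
    r = len(a)
    col = [[v] for v in x]  # x as a k x 1 column
    base = matmul(w, col)
    update = matmul(b, matmul(a, col))  # never materializes the d x k product B A
    return [h[0] + (alpha / r) * u[0] for h, u in zip(base, update)]


def merge(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Fold the adapter into W for inference: W' = W + (alpha / r) B A."""
    r = len(a)
    ba = matmul(b, a)
    return [[wij + (alpha / r) * dij for wij, dij in zip(wr, dr)] for wr, dr in zip(w, ba)]


def trainable_params(d: int, k: int, r: int) -> tuple[int, int]:
    """(full fine-tuning, LoRA) trainable parameter counts for one d x k weight matrix."""
    return d * k, r * (d + k)
```

With $B$ initialized to zero, as in the original LoRA formulation, the adapted layer starts out identical to the frozen one. For a 4096 x 4096 projection with $r = 8$, `trainable_params` gives 65,536 trainable parameters instead of about 16.8 million.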

# 学習型画像圧縮の再考: 必要なのはコンテキストだけ

Rethinking Learned Image Compression: Context is All You Need ( http://arxiv.org/abs/2407.11590v3 )

ライセンス: Link先を確認

Jixiang Luo,

(参考訳) 近年、学習型画像圧縮(LIC)は従来の手法と比較して急速に進歩しているため、本稿では「LICの境界はどこにあるのか?」という問いについて論じる。そこで本稿では、この問題を2つのサブ問題に分割する。1) PSNRにおけるレート歪み性能の境界はどこにあるのか? 2) 圧縮利得をさらに改善し、その境界を達成するにはどうすればよいのか? そのために、LICの3つの構成要素であるエンコーダ、デコーダ、コンテキストモデルについて、パラメータのスケーリングの有効性を分析する。その結果、LICのスケーリングとは、LIC内のコンテキストモデルとデコーダをスケーリングすることであると結論付ける。大規模な実験により、オーバーフィッティングが実際に効果的なコンテキストとして機能しうることが示された。コンテキストを最適化することで、本稿はPSNRをさらに改善して最先端の性能を達成し、VVCに対してBD-RATEで14.39%の性能向上を示した。

Since Learned Image Compression (LIC) has made rapid progress recently compared to traditional methods, this paper attempts to discuss the question 'Where is the boundary of LIC?'. Thus this paper splits the above problem into two sub-problems: 1) Where is the boundary of the rate-distortion performance of PSNR? 2) How can the compression gain be further improved to achieve the boundary? Therefore this paper analyzes the effectiveness of scaling parameters for the encoder, decoder and context model, which are the three components of LIC. Then we conclude that scaling for LIC is to scale for the context model and decoder within LIC. Extensive experiments demonstrate that overfitting can actually serve as an effective context. By optimizing the context, this paper further improves PSNR and achieves state-of-the-art performance, showing a performance gain of 14.39% with BD-RATE over VVC.

翻訳日:2024-08-05 15:50:45 公開日:2024-08-02

# CCVA-FL: 医用画像のためのクライアント間変動適応型連合学習

CCVA-FL: Cross-Client Variations Adaptive Federated Learning for Medical Imaging ( http://arxiv.org/abs/2407.11652v3 )

ライセンス: Link先を確認

Sunny Gupta, Amit Sethi,

(参考訳) 連合学習(FL)は、分散データ上でモデルを学習するためのプライバシ保護的なアプローチを提供する。医療におけるその可能性は大きいが、医用画像データのクライアント間のばらつきに起因する課題があり、アノテーションが限られていることでそれが悪化する。本稿では、これらの問題に対処するため、CCVA-FL(Cross-Client Variations Adaptive Federated Learning)を提案する。CCVA-FLは、画像を共通の特徴空間に変換することで、クライアント間のばらつきを最小化することを目的としている。まず各クライアントの画像のサブセットに専門家がアノテーションを付与し、続いてデータの複雑さが最も低いクライアントをターゲットとして選択する。次に、ターゲットクライアントのアノテーション付き画像に基づき、Transformerを用いたスケーラブル拡散モデル(DiT)で合成医用画像を生成する。多様性を捉え元のデータを代表するこれらの合成画像は、他のクライアントと共有される。各クライアントは画像から画像への変換を用いて、自身のローカル画像をターゲットの画像空間に変換する。変換された画像は、その後、連合学習の設定でサーバーモデルの開発に使用される。結果は、CCVA-FLがプライバシを損なうことなくクライアント間のデータ分布の違いに効果的に対処することで、Vanilla Federated Averagingを上回ることを示している。

Federated Learning (FL) offers a privacy-preserving approach to train models on decentralized data. Its potential in healthcare is significant, but challenges arise due to cross-client variations in medical image data, exacerbated by limited annotations. This paper introduces Cross-Client Variations Adaptive Federated Learning (CCVA-FL) to address these issues. CCVA-FL aims to minimize cross-client variations by transforming images into a common feature space. It involves expert annotation of a subset of images from each client, followed by the selection of a client with the least data complexity as the target. Synthetic medical images are then generated using Scalable Diffusion Models with Transformers (DiT) based on the target client's annotated images. These synthetic images, capturing diversity and representing the original data, are shared with other clients. Each client then translates its local images into the target image space using image-to-image translation. The translated images are subsequently used in a federated learning setting to develop a server model. Our results demonstrate that CCVA-FL outperforms Vanilla Federated Averaging by effectively addressing data distribution differences across clients without compromising privacy.

翻訳日:2024-08-05 15:50:45 公開日:2024-08-02
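CCVA-FL is evaluated against Vanilla Federated Averaging. For reference, the server step of standard FedAvg is a sample-size-weighted average of the client models. A minimal sketch over flattened parameter lists is below. It illustrates only the baseline aggregation, not CCVA-FL's synthetic-image generation or image-to-image translation, and the function name is hypothetical.

```python
def fedavg(client_params: list[list[float]], client_sizes: list[int]) -> list[float]:
    """FedAvg server step: average client parameters weighted by local sample counts."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    return [
        sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
        for i in range(len(client_params[0]))
    ]
```

Because clients with very different image distributions are averaged into one model here, cross-client variation degrades the result. CCVA-FL tries to remove that variation before this step by translating all client images into a shared target space.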

# GenRC: スパースな画像コレクションからの生成的3D部屋補完

GenRC: Generative 3D Room Completion from Sparse Image Collections ( http://arxiv.org/abs/2407.12939v3 )

ライセンス: Link先を確認

Ming-Feng Li, Yueh-Feng Ku, Hong-Xuan Yen, Chi Liu, Yu-Lun Liu, Albert Y. C. Chen, Cheng-Hao Kuo, Min Sun,

(参考訳) スパースRGBDシーンの補完は、特にシーン全体を通して一貫したテクスチャやジオメトリを考慮する場合、難しい課題である。人間が設計したテキストプロンプトや事前定義されたカメラ軌道に依存する既存の手法とは異なり、我々は高忠実度のテクスチャを備えた部屋規模の3Dメッシュを補完する、自動化された学習不要のパイプラインGenRCを提案する。これを実現するために、まずスパースなRGBD画像を非常に不完全な3Dメッシュに投影する。空白を埋めるために新しいビューを反復的に生成する代わりに、提案するE-Diffusionを用いて、大域的な幾何と外観の一貫性を保証するビュー一貫性のあるパノラマRGBD画像を生成する。さらに、人間が設計したテキストプロンプトの代わりにテキスト反転(textual inversion)を用いることで、入力と出力のシーン間のスタイルの一貫性を維持する。データセット間のドメインギャップを埋めるために、E-Diffusionは大規模データセットで学習されたモデルを活用して多様な外観を生成する。GenRCは、ScanNetおよびARKitScenesデータセットで学習しておらず、事前定義されたカメラ軌道も使用していないにもかかわらず、これらのデータセットにおいて外観と幾何に関するほとんどの指標で最先端の手法を上回る。プロジェクトページ: https://minfenli.github.io/GenRC

Sparse RGBD scene completion is a challenging task, especially when considering consistent textures and geometries throughout the entire scene. Different from existing solutions that rely on human-designed text prompts or predefined camera trajectories, we propose GenRC, an automated training-free pipeline to complete a room-scale 3D mesh with high-fidelity textures. To achieve this, we first project the sparse RGBD images to a highly incomplete 3D mesh. Instead of iteratively generating novel views to fill in the void, we use our proposed E-Diffusion to generate a view-consistent panoramic RGBD image which ensures global geometry and appearance consistency. Furthermore, we maintain the input-output scene stylistic consistency through textual inversion to replace human-designed text prompts. To bridge the domain gap among datasets, E-Diffusion leverages models trained on large-scale datasets to generate diverse appearances. GenRC outperforms state-of-the-art methods under most appearance and geometric metrics on ScanNet and ARKitScenes datasets, even though GenRC is neither trained on these datasets nor uses predefined camera trajectories. Project page: https://minfenli.github.io/GenRC

翻訳日:2024-08-05 15:50:45 公開日:2024-08-02

# 生成型AIと大規模言語モデルの最近の進歩:現状,課題,展望

Recent Advances in Generative AI and Large Language Models: Current Status, Challenges, and Perspectives ( http://arxiv.org/abs/2407.14962v3 )

ライセンス: Link先を確認

Desta Haileselassie Hagos, Rick Battle, Danda B. Rawat,

(参考訳) 生成人工知能(AI)とLarge Language Models(LLMs)の出現は、さまざまなドメインに革命をもたらす前例のない機能を導入し、自然言語処理(NLP)の新しい時代を象徴している。本稿では,これらの最先端技術の現状を概観し,その顕著な進歩と広範囲な応用を実証する。本稿では,ジェネレーティブAIとLLMの進化途上における技術的基盤,実践的応用,新たな課題に関する総合的な視点の提供に寄与する。我々は、AIシステムの生成能力とLLMの特定のコンテキストを理解することは、研究者、実践者、政策立案者にとって、これらの技術の責任と倫理的統合を様々な領域に協調的に形成することが不可欠であると考えている。さらに、主要な研究ギャップを特定し、対処し、AI研究コミュニティにおける将来の研究成果をガイドするための貴重な洞察を提供する。

The emergence of Generative Artificial Intelligence (AI) and Large Language Models (LLMs) has marked a new era of Natural Language Processing (NLP), introducing unprecedented capabilities that are revolutionizing various domains. This paper explores the current state of these cutting-edge technologies, demonstrating their remarkable advancements and wide-ranging applications. Our paper contributes to providing a holistic perspective on the technical foundations, practical applications, and emerging challenges within the evolving landscape of Generative AI and LLMs. We believe that understanding the generative capabilities of AI systems and the specific context of LLMs is crucial for researchers, practitioners, and policymakers to collaboratively shape the responsible and ethical integration of these technologies into various domains. Furthermore, we identify and address main research gaps, providing valuable insights to guide future research endeavors within the AI research community.

翻訳日:2024-08-05 15:50:45 公開日:2024-08-02

# サーキットブレーカのロバストアライメントの再検討

Revisiting the Robust Alignment of Circuit Breakers ( http://arxiv.org/abs/2407.15902v2 )

ライセンス: Link先を確認

Leo Schwinn, Simon Geisler,

(参考訳) 過去10年間で、敵対的訓練は、敵対的攻撃に対するモデルの頑健性を高める数少ない信頼性の高い手法の1つとして台頭してきた[Szegedy et al., 2014, Madry et al., 2018, Xhonneux et al., 2024]。一方で、多くの代替アプローチはその後の厳密な評価に耐えられなかった。最近、LLMのアライメントに有望な結果を示す別の防御機構として「サーキットブレーカー」[Zou et al., 2024]が提案された。本報告では、入力トークンの埋め込み空間における制約のない連続攻撃に対する「Improving Alignment and Robustness with Circuit Breakers」[Zou et al., 2024]の頑健性の主張が、過大評価されている可能性があることを示す。具体的には、埋め込み空間攻撃[Schwinn et al., 2024a,b]にいくつかの簡単な変更を加えることで、サーキットブレーカーモデルに対して100%の攻撃成功率(ASR)を達成できることを実証する。それ以上のハイパーパラメータ調整を行わなくても、これらの調整により、元の評価と比べてASRが80%以上増加する。コードは https://github.com/SchwinnL/circuit-breakers-eval で公開されている。

Over the past decade, adversarial training has emerged as one of the few reliable methods for enhancing model robustness against adversarial attacks [Szegedy et al., 2014, Madry et al., 2018, Xhonneux et al., 2024], while many alternative approaches have failed to withstand rigorous subsequent evaluations. Recently, an alternative defense mechanism, namely "circuit breakers" [Zou et al., 2024], has shown promising results for aligning LLMs. In this report, we show that the robustness claims of "Improving Alignment and Robustness with Circuit Breakers" against unconstrained continuous attacks in the embedding space of the input tokens may be overestimated [Zou et al., 2024]. Specifically, we demonstrate that by implementing a few simple changes to embedding space attacks [Schwinn et al., 2024a,b], we achieve 100% attack success rate (ASR) against circuit breaker models. Without conducting any further hyperparameter tuning, these adjustments increase the ASR by more than 80% compared to the original evaluation. Code is accessible at: https://github.com/SchwinnL/circuit-breakers-eval

翻訳日:2024-08-05 15:50:45 公開日:2024-08-02

# 適応的勾配正則化法

An Adaptive Gradient Regularization Method ( http://arxiv.org/abs/2407.16944v3 )

ライセンス: Link先を確認

Huixiu Jiang, Ling Yang, Yu Bao, Rutong Si,

(参考訳) オプティマイザは、ニューラルネットワークを高い効率と性能で学習するうえで重要な役割を果たす。勾配に基づく重みの更新は、オプティマイザの中心部分である。重みと勾配に対する正規化や標準化の操作は、学習プロセスを加速し性能を向上させうることが示されており、その例として重み標準化(WS)、重み正規化(WN)、勾配正規化(GN)があり、また勾配中心化(GC)もある。本研究では、勾配ベクトル内の勾配の大きさに基づく新しい最適化手法である適応的勾配正則化(AGR)を提案する。AGRは、勾配ベクトルを全次元にわたって正規化して係数ベクトルとし、勾配とその係数ベクトルの積をバニラ勾配から減算する。これは適応的な勾配クリッピング手法とみなすことができる。AGRが損失関数のリプシッツ性を改善し、より安定した学習プロセスとより良い汎化性能をもたらすことを示す。AGRは、わずか3行のコードでAdanやAdamWといったバニラオプティマイザに組み込むことができる。画像生成、画像分類、言語表現において実験を行い、AGRが学習結果を改善することを示す。

The optimizer plays an important role in training neural networks with high efficiency and performance. Weight updates based on the gradient are the central part of the optimizer. It has been shown that normalization and standardization operations on weights and gradients, such as weight standardization (WS), weight normalization (WN) and gradient normalization (GN), can accelerate the training process and improve performance; there is also gradient centralization (GC). In this work, we introduce a new optimization technique based on the gradient magnitude in a gradient vector, named adaptive gradient regularization (AGR), which normalizes the gradient vector in all dimensions as a coefficient vector and subtracts the product of the gradient and its coefficient vector from the vanilla gradient. It can be viewed as an adaptive gradient clipping method. We show that AGR can improve the Lipschitzness of the loss function, with a more stable training process and better generalization performance. AGR can be embedded into vanilla optimizers such as Adan and AdamW with only three lines of code. Our experiments, conducted in image generation, image classification and language representation, show that AGR improves the training results.

翻訳日:2024-08-05 15:50:45 公開日:2024-08-02
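The AGR abstract describes normalizing the gradient into a coefficient vector and subtracting the gradient-coefficient product from the vanilla gradient. One reading of that description, for a gradient $g$ with components $g_i$, is $\alpha_i = |g_i| / \sum_j |g_j|$ and $\hat{g}_i = g_i - \alpha_i g_i = (1 - \alpha_i)\, g_i$. Under this reading, components that carry a larger share of the total magnitude are shrunk more, which fits the abstract's description of AGR as adaptive gradient clipping. The sketch assumes this reading and applies it to a flat list with plain SGD. The paper embeds the step in optimizers such as Adan and AdamW, and the per-tensor grouping it uses is not reproduced here. `agr` and `sgd_step` are hypothetical names.

```python
def agr(grad: list[float]) -> list[float]:
    """Assumed AGR rule: alpha_i = |g_i| / sum_j |g_j|, then g_hat_i = g_i - alpha_i * g_i."""
    total = sum(abs(g) for g in grad)
    if total == 0.0:
        return list(grad)  # an all-zero gradient has nothing to rescale
    return [g - (abs(g) / total) * g for g in grad]


def sgd_step(params: list[float], grad: list[float], lr: float) -> list[float]:
    # Plain SGD on the regularized gradient; AdamW or Adan would consume agr(grad) the same way.
    return [p - lr * g for p, g in zip(params, agr(grad))]
```

For `grad = [1.0, -3.0]` the coefficients are `[0.25, 0.75]`. The regularized gradient is `[0.75, -0.75]`, so the dominant component is cut from magnitude 3 to 0.75 while the small one barely changes.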

# 表現論的多重性のための量子アルゴリズム

Quantum Algorithms for Representation-Theoretic Multiplicities ( http://arxiv.org/abs/2407.17649v2 )

ライセンス: Link先を確認

Martin Larocca, Vojtech Havlicek,

(参考訳) コストカ係数、リトルウッド・リチャードソン係数、Plethysm係数、クロネッカー係数は、既約表現の制限および積における対称群の既約表現(irrep)の重複度である。これらは表現論において重要な役割を担い、計算が難しいことで知られている。我々は、表現の次元の比が多項式であるときには常に、これらの係数を効率的に計算する量子アルゴリズムを与える。コストカ数が組合せ論的解釈を持つことを用いて、多項式で抑えられるコストカ数に対して効率的な古典アルゴリズムが存在することを示し、リトルウッド・リチャードソン係数に対しても同様のアルゴリズムが存在すると予想する。同じ古典アルゴリズムがPlethysm係数やクロネッカー係数に対してそのままでは機能しない理由を論じ、我々の量子アルゴリズムがそれらの計算におけるいくつかの困難性の障壁をどのように回避しうるかの根拠を示し、この問題が一部の入力において超多項式的な量子高速化につながりうると予想する。最後に、フロベニウスの相互律を用いて、誘導表現によってこれらの係数を推定する、入力に対するコスト依存性が異なる別の量子アルゴリズムを導出する。

Kostka, Littlewood-Richardson, Plethysm and Kronecker coefficients are multiplicities of irreducible representations (irreps) of the symmetric group in restrictions and products of irreps. They play an important role in representation theory and are notoriously hard to compute. We give quantum algorithms that efficiently compute these coefficients whenever the ratio of dimensions of the representations is polynomial. Using that the Kostka numbers admit combinatorial interpretation, we show that there is an efficient classical algorithm for polynomially-bounded Kostka numbers and conjecture the existence of a similar algorithm for the Littlewood-Richardson coefficients. We argue why the same classical algorithm does not straightforwardly work for the Plethysm and Kronecker coefficients, give evidence on how our quantum algorithm may avoid some hardness obstructions in their computation, and conjecture that the problem could lead to superpolynomial quantum speedups on some inputs. We finally use Frobenius reciprocity to derive another quantum algorithm that estimates these coefficients using induction and has a different cost-to-input dependence.

翻訳日:2024-08-05 15:50:45 公開日:2024-08-02
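The Kostka abstract relies on the combinatorial interpretation of Kostka numbers: $K_{\lambda\mu}$ counts semistandard Young tableaux of shape $\lambda$ and content $\mu$. The sketch below computes them with the standard branching recursion. At each step it removes the horizontal strip that holds the largest entry. It is a brute-force illustration of the combinatorial interpretation with exponential worst-case cost. It is not the efficient classical or quantum algorithm from the paper, and the function names are hypothetical.

```python
from functools import lru_cache
from itertools import product


def _strip_removals(lam: tuple[int, ...], k: int):
    """Yield partitions nu such that lam/nu is a horizontal strip of size k.

    lam/nu is a horizontal strip exactly when lam[0] >= nu[0] >= lam[1] >= nu[1] >= ... >= 0.
    """
    bounds = [range(lam[i + 1] if i + 1 < len(lam) else 0, lam[i] + 1) for i in range(len(lam))]
    for nu in product(*bounds):
        if sum(lam) - sum(nu) == k:
            yield tuple(p for p in nu if p > 0)


@lru_cache(maxsize=None)
def kostka(lam: tuple[int, ...], mu: tuple[int, ...]) -> int:
    """Number of semistandard Young tableaux of shape lam and content mu."""
    if sum(lam) != sum(mu):
        return 0
    if not mu:
        return 1  # the empty tableau
    # The cells holding the largest label len(mu) form a horizontal strip of size mu[-1].
    return sum(kostka(nu, mu[:-1]) for nu in _strip_removals(lam, mu[-1]))
```

Setting $\mu = (1, \dots, 1)$ recovers the number of standard Young tableaux of shape $\lambda$, which gives a quick check of the recursion. For example, `kostka((3, 2), (1, 1, 1, 1, 1))` should equal 5.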

# データと知識の組み合わせの力:GPT-4oは肺癌リンパ節転移の予測に機械学習モデルを効果的に解釈する

The Power of Combining Data and Knowledge: GPT-4o is an Effective Interpreter of Machine Learning Models in Predicting Lymph Node Metastasis of Lung Cancer ( http://arxiv.org/abs/2407.17900v3 )

License: see the link

Danqing Hu, Bing Liu, Xiaofeng Zhu, Nan Wu

Lymph node metastasis (LNM) is a crucial factor in determining the initial treatment for patients with lung cancer, yet accurate preoperative diagnosis of LNM remains challenging. Recently, large language models (LLMs) have garnered significant attention due to their remarkable text generation capabilities. Leveraging the extensive medical knowledge learned from vast corpora, LLMs can estimate probabilities for clinical problems, though their performance has historically been inferior to data-driven machine learning models. In this paper, we propose a novel ensemble method that combines the medical knowledge acquired by LLMs with the latent patterns identified by machine learning models to enhance LNM prediction performance. Initially, we developed machine learning models using patient data. We then designed a prompt template to integrate the patient data with the predicted probability from the machine learning model. Subsequently, we instructed GPT-4o, the most advanced LLM developed by OpenAI, to estimate the likelihood of LNM based on patient data and then adjust the estimate using the machine learning output. Finally, we collected three outputs from GPT-4o using the same prompt and ensembled these results as the final prediction. Using the proposed method, our models achieved an AUC value of 0.765 and an AP value of 0.415 for LNM prediction, significantly improving predictive performance compared to baseline machine learning models. The experimental results indicate that GPT-4o can effectively leverage its medical knowledge and the probabilities predicted by machine learning models to achieve more accurate LNM predictions. These findings demonstrate that LLMs can perform well in clinical risk prediction tasks, offering a new paradigm for integrating medical knowledge and patient data in clinical predictions.

Translation date: 2024-08-05 15:50:45, publication date: 2024-08-02
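The pipeline described in this abstract can be sketched in three steps:

1. Prompt the model with the patient data plus the ML model's probability.
2. Query GPT-4o three times with the same prompt.
3. Combine the three answers.

The abstract does not say how the three outputs are combined, so plain averaging here is an assumption. The template wording and helper names are also assumptions. `query_llm` is a placeholder for any client that sends a prompt and returns the text reply; no real API is called.

```python
import re
from statistics import mean

# Hypothetical template; the paper's actual prompt wording is not given in the abstract.
PROMPT_TEMPLATE = (
    "Patient data: {patient}\n"
    "A machine learning model predicts a lymph node metastasis probability of {ml_prob:.3f}.\n"
    "First estimate the probability from the patient data, then adjust it using the "
    "model output. Reply with a single number between 0 and 1."
)


def build_prompt(patient, ml_prob):
    """Render patient features and the ML model's probability into one prompt."""
    patient_text = "; ".join(f"{name}: {value}" for name, value in patient.items())
    return PROMPT_TEMPLATE.format(patient=patient_text, ml_prob=ml_prob)


def parse_probability(reply):
    """Take the first number in the reply and clip it to [0, 1]."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no probability found in reply: {reply!r}")
    return min(max(float(match.group()), 0.0), 1.0)


def ensemble_prediction(query_llm, patient, ml_prob, n_samples=3):
    """Query the LLM n_samples times with the same prompt and average the estimates."""
    prompt = build_prompt(patient, ml_prob)
    return mean(parse_probability(query_llm(prompt)) for _ in range(n_samples))


# Usage with a stand-in for the real model client:
replies = iter(["0.30", "Estimated probability: 0.40", "0.5"])


def fake_llm(prompt):
    return next(replies)


print(round(ensemble_prediction(fake_llm, {"age": 63, "tumor size (cm)": 2.1}, 0.27), 2))  # 0.4
```

Sampling the same prompt several times and averaging smooths out run-to-run variation in the model's answers. Any other aggregation, such as a median, would slot into `ensemble_prediction` the same way.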

# Depth-Wise Convolutions in Vision Transformers for Efficient Training on Small Datasets

Depth-Wise Convolutions in Vision Transformers for Efficient Training on Small Datasets ( http://arxiv.org/abs/2407.19394v3 )

License: see the link

Tianxiao Zhang, Wenju Xu, Bo Luo, Guanghui Wang

The Vision Transformer (ViT) leverages the Transformer's encoder to capture global information by dividing images into patches and achieves superior performance across various computer vision tasks. However, the self-attention mechanism of ViT captures the global context from the outset, overlooking the inherent relationships between neighboring pixels in images or videos. Transformers mainly focus on global information while ignoring fine-grained local details. Consequently, ViT lacks inductive bias during training on image or video datasets. In contrast, convolutional neural networks (CNNs), with their reliance on local filters, possess an inherent inductive bias, making them more efficient and quicker to converge than ViT with less data. In this paper, we present a lightweight Depth-Wise Convolution module as a shortcut in ViT models, bypassing entire Transformer blocks to ensure the models capture both local and global information with minimal overhead. Additionally, we introduce two architecture variants: one applies the Depth-Wise Convolution modules to multiple Transformer blocks for parameter savings, and the other incorporates independent parallel Depth-Wise Convolution modules with different kernels to enhance the acquisition of local information. The proposed approach boosts the performance of ViT models on image classification, object detection and instance segmentation by a large margin, especially on small datasets, as evaluated on CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet for image classification, and COCO for object detection and instance segmentation. The source code can be accessed at https://github.com/ZTX-100/Efficient_ViT_with_DW.

Translation date: 2024-08-05 15:40:20, publication date: 2024-08-02
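A depth-wise convolution applies one small filter per channel. A k x k kernel over C channels therefore costs k*k*C weights instead of k*k*C*C for a full convolution. The NumPy sketch below shows one plausible reading of "a shortcut bypassing entire Transformer blocks":

1. Reshape the ViT patch tokens back onto their H x W grid.
2. Apply a depth-wise convolution.
3. Add the result to the output of the bypassed block.

The function names, the additive combination and the absence of a class token are assumptions, not the authors' code.

```python
import numpy as np


def depthwise_conv2d(x, kernels):
    """Per-channel 2-D convolution (cross-correlation, as in deep learning) with
    zero 'same' padding.

    x: (H, W, C) feature map; kernels: (k, k, C) with odd k, one k x k filter per channel.
    """
    H, W, _ = x.shape
    k = kernels.shape[0]
    p = k // 2
    padded = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(k):
        for j in range(k):
            # Each channel is multiplied only by its own filter weight.
            out += padded[i:i + H, j:j + W, :] * kernels[i, j, :]
    return out


def block_with_dw_shortcut(tokens, grid_hw, block_fn, kernels):
    """Output of a Transformer block plus a depth-wise convolution shortcut around it.

    tokens: (N, C) patch tokens laid out row by row on an H x W grid (no class token).
    block_fn: any (N, C) -> (N, C) function standing in for one or more ViT blocks.
    """
    H, W = grid_hw
    N, C = tokens.shape
    local = depthwise_conv2d(tokens.reshape(H, W, C), kernels).reshape(N, C)
    return block_fn(tokens) + local


C = 8
print(3 * 3 * C, 3 * 3 * C * C)  # 72 576: depth-wise vs. full 3x3 convolution weights
```

You can sanity-check the reshaping with a centre-only "identity" kernel and a block that returns zeros. In that case the function should return the input tokens unchanged.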

# Domain Adaptive Lung Nodule Detection in X-ray Image

Domain Adaptive Lung Nodule Detection in X-ray Image ( http://arxiv.org/abs/2407.19397v2 )

License: see the link

Haifeng Zhao, Lixiang Jiang, Leilei Ma, Dengdi Sun, Yanping Fu

Medical images from different healthcare centers exhibit varied data distributions, posing significant challenges for adapting lung nodule detection due to the domain shift between training and application phases. Traditional unsupervised domain adaptive detection methods often struggle with this shift, leading to suboptimal outcomes. To overcome these challenges, we introduce a novel domain adaptive approach for lung nodule detection that leverages mean teacher self-training and contrastive learning. First, we propose a hierarchical contrastive learning strategy to refine nodule representations and enhance the distinction between nodules and background. Second, we introduce a nodule-level domain-invariant feature learning (NDL) module to capture domain-invariant features through adversarial learning across different domains. Additionally, we propose a new annotated dataset of X-ray images to aid in advancing lung nodule detection research. Extensive experiments conducted on multiple X-ray datasets demonstrate the efficacy of our approach in mitigating domain shift impacts.

Translation date: 2024-08-05 15:40:20, publication date: 2024-08-02
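In mean-teacher self-training, a teacher network labels the unlabeled target-domain images. The teacher's weights track an exponential moving average (EMA) of the student's weights. The sketch below shows only that update and a confidence filter for the teacher's pseudo-labels. For brevity, parameters are plain floats in a dict. The momentum and threshold values are illustrative assumptions. The paper's contrastive and adversarial components are not shown.

```python
def ema_update(teacher, student, momentum=0.99):
    """Return teacher <- momentum * teacher + (1 - momentum) * student, per parameter."""
    return {name: momentum * teacher[name] + (1.0 - momentum) * student[name]
            for name in teacher}


def confident_pseudo_labels(detections, threshold=0.7):
    """Keep teacher detections (box, score) whose score reaches the threshold."""
    return [(box, score) for box, score in detections if score >= threshold]


teacher = {"w": 0.0}
student = {"w": 1.0}
for _ in range(100):
    teacher = ema_update(teacher, student)
print(round(teacher["w"], 3))  # 0.634, i.e. 1 - 0.99**100
```

A high momentum makes the teacher a slowly changing average of recent students. This is what keeps its pseudo-labels stable while the student adapts to the new domain.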

# XLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training

XLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training ( http://arxiv.org/abs/2407.19546v2 )

License: see the link

Biao Wu, Yutong Xie, Zeyu Zhang, Minh Hieu Phan, Qi Chen, Ling Chen, Qi Wu

Vision-and-language pretraining (VLP) in the medical field utilizes contrastive learning on image-text pairs to achieve effective transfer across tasks. Yet, current VLP approaches with the masked modelling strategy face two challenges when applied to the medical domain. First, current models struggle to accurately reconstruct key pathological features due to the scarcity of medical data. Second, most methods only adopt either paired image-text or image-only data, failing to exploit the combination of both paired and unpaired data. To this end, this paper proposes the XLIP (Masked modelling for medical Language-Image Pre-training) framework to enhance pathological learning and feature learning via unpaired data. First, we introduce the attention-masked image modelling (AttMIM) and entity-driven masked language modelling (EntMLM) modules, which learn to reconstruct pathological visual and textual tokens via multi-modal feature interaction, thereby improving the learned medical features. The AttMIM module masks a portion of the image features that are highly responsive to textual features. This allows XLIP to better reconstruct highly similar medical image data. Second, XLIP capitalizes on unpaired data to enhance multimodal learning by introducing disease-kind prompts. The experimental results show that XLIP achieves state-of-the-art (SOTA) zero-shot and fine-tuning classification performance on five datasets. Our code will be available at https://github.com/White65534/XLIP.

Translation date: 2024-08-05 15:40:20, publication date: 2024-08-02
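The AttMIM idea of masking the image features that respond most strongly to the text can be sketched as follows. Two assumptions are made, neither taken from the paper:

- Cosine similarity between each patch feature and a pooled text feature stands in for the cross-modal attention score.
- The mask ratio and mask token are illustrative values.

```python
import numpy as np


def text_guided_mask(patch_feats, text_feat, mask_ratio=0.5):
    """Boolean mask over patches, True for the patches most similar to the text.

    patch_feats: (N, D) image patch features; text_feat: (D,) pooled text feature.
    """
    patches = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    text = text_feat / np.linalg.norm(text_feat)
    scores = patches @ text                      # cosine similarity per patch
    n_mask = int(round(mask_ratio * len(scores)))
    mask = np.zeros(len(scores), dtype=bool)
    if n_mask > 0:                               # argsort(...)[-0:] would select everything
        mask[np.argsort(scores)[-n_mask:]] = True
    return mask


def apply_mask(patch_feats, mask, mask_token):
    """Replace masked patch features with a (learnable) mask token of shape (D,)."""
    return np.where(mask[:, None], mask_token, patch_feats)


patches = np.eye(4)                              # four toy patches with one-hot features
text = np.array([1.0, 0.2, 0.0, 0.0])
mask = text_guided_mask(patches, text, mask_ratio=0.5)
print(mask.tolist())                             # [True, True, False, False]
```

Because the masked patches are the ones most related to the report text, reconstructing them forces the model to use the textual context. That is the motivation the abstract gives for masking text-responsive features.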

# LLMs' Understanding of Natural Language Revealed

LLMs' Understanding of Natural Language Revealed ( http://arxiv.org/abs/2407.19630v2 )

License: see the link

Walid S. Saba

Large language models (LLMs) are the result of a massive experiment in bottom-up, data-driven reverse engineering of language at scale. Despite their utility in a number of downstream NLP tasks, ample research has shown that LLMs are incapable of performing reasoning in tasks that require quantification over and the manipulation of symbolic variables (e.g., planning and problem solving); see for example [25][26]. In this document, however, we will focus on testing LLMs for their language understanding capabilities, their supposed forte. As we will show here, the language understanding capabilities of LLMs have been widely exaggerated. While LLMs have been shown to generate human-like coherent language (since that is how they were designed), their language understanding capabilities have not been properly tested. In particular, we believe that the language understanding capabilities of LLMs should be tested by performing an operation that is the opposite of 'text generation', specifically by giving the LLM snippets of text as input and then querying what the LLM "understood". As we show here, doing so makes it apparent that LLMs do not truly understand language, beyond very superficial inferences that are essentially the byproduct of the memorization of massive amounts of ingested text.

Translation date: 2024-08-05 15:40:20, publication date: 2024-08-02

# CP-Prompt: Composition-Based Cross-modal Prompting for Domain-Incremental Continual Learning

CP-Prompt: Composition-Based Cross-modal Prompting for Domain-Incremental Continual Learning ( http://arxiv.org/abs/2407.21043v2 )

License: see the link

Yu Feng, Zhen Tian, Yifan Zhu, Zongfu Han, Haoran Luo, Guangwei Zhang, Meina Song

The key challenge of cross-modal domain-incremental learning (DIL) is to enable the learning model to continuously learn from novel data with different feature distributions under the same task without forgetting old ones. However, existing top-performing methods still suffer from high forgetting rates because they lack intra-domain knowledge extraction and an inter-domain common prompting strategy. In this paper, we propose a simple yet effective framework, CP-Prompt, which trains a limited number of parameters to instruct a pre-trained model to learn new domains and avoid forgetting existing feature distributions. CP-Prompt captures intra-domain knowledge by compositionally inserting personalized prompts into multi-head self-attention layers and then learns the inter-domain knowledge with a common prompting strategy. CP-Prompt outperforms state-of-the-art baselines on three widely evaluated DIL tasks. The source code is available at https://github.com/dannis97500/CP_Prompt.

Translation date: 2024-08-05 15:40:20, publication date: 2024-08-02
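One common way to insert prompts into a self-attention layer is prefix-style prompting. Trainable prompt vectors are prepended to the keys and values, while the pre-trained projections stay frozen. The single-head NumPy sketch below shows that mechanism, with a shared (common) prompt concatenated to a domain-specific one. Whether CP-Prompt uses exactly this form is an assumption, and all names are illustrative.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_with_prefix(x, w_q, w_k, w_v, prompt_k, prompt_v):
    """Single-head self-attention with prompt vectors prepended to keys and values.

    x: (N, D) tokens; w_q, w_k, w_v: (D, D) frozen projections;
    prompt_k, prompt_v: (P, D) trainable prompts (P may be 0).
    """
    q = x @ w_q
    k = np.concatenate([prompt_k, x @ w_k], axis=0)
    v = np.concatenate([prompt_v, x @ w_v], axis=0)
    weights = softmax(q @ k.T / np.sqrt(q.shape[1]))   # (N, P + N)
    return weights @ v                                 # (N, D)


def compose_prompts(common, domain_specific, domain):
    """Common prompt shared by all domains, followed by the current domain's prompt."""
    return np.concatenate([common, domain_specific[domain]], axis=0)


rng = np.random.default_rng(0)
D, N = 8, 5
x = rng.standard_normal((N, D))
w_q, w_k, w_v = (rng.standard_normal((D, D)) for _ in range(3))
common = rng.standard_normal((2, D))
per_domain = {"photo": rng.standard_normal((3, D)), "sketch": rng.standard_normal((3, D))}
prompts = compose_prompts(common, per_domain, "sketch")   # (5, D) prompt vectors
out = attention_with_prefix(x, w_q, w_k, w_v, prompts, prompts)
print(out.shape)                                          # (5, 8)
```

Only the prompt arrays would be trained. Adding a domain then means adding one small entry to `per_domain` rather than updating the frozen projections, which is how this style of prompting limits forgetting.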

# PERSOMA: PERsonalized SOft ProMpt Adapter Architecture for Personalized Language Prompting

PERSOMA: PERsonalized SOft ProMpt Adapter Architecture for Personalized Language Prompting ( http://arxiv.org/abs/2408.00960v1 )

License: see the link

Liam Hebert, Krishna Sayana, Ambarish Jash, Alexandros Karatzoglou, Sukhdeep Sodhi, Sumanth Doddapaneni, Yanli Cai, Dima Kuzmin

Understanding the nuances of a user's extensive interaction history is key to building accurate and personalized natural language systems that can adapt to evolving user preferences. To address this, we introduce PERSOMA, a Personalized Soft Prompt Adapter architecture. Unlike previous personalized prompting methods for large language models, PERSOMA offers a novel approach to efficiently capture user history. It achieves this by resampling and compressing interactions expressed as free-form text into expressive soft prompt embeddings, building upon recent research utilizing embedding representations as input for LLMs. We rigorously validate our approach by evaluating various adapter architectures, first-stage sampling strategies, parameter-efficient tuning techniques like LoRA, and other personalization methods. Our results demonstrate PERSOMA's superior ability to handle large and complex user histories compared to existing embedding-based and text-prompt-based techniques.

Translation date: 2024-08-05 14:46:34, publication date: 2024-08-02
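A long interaction history can be compressed into a fixed number of soft prompt vectors with a resampler. In a resampler, a few learnable query vectors cross-attend over the history embeddings. The sketch below is a single-head, projection-free version of that idea. It is an assumption about what a "resampling" adapter might look like, not PERSOMA's actual architecture, and the names are hypothetical.

```python
import numpy as np


def resample_history(history_emb, queries):
    """Compress n history embeddings (n, D) into m soft prompts (m, D).

    Each soft prompt is an attention-weighted average of the history embeddings,
    with weights from the dot product between a learnable query and each item.
    """
    scores = queries @ history_emb.T / np.sqrt(queries.shape[1])   # (m, n)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ history_emb


rng = np.random.default_rng(1)
history = rng.standard_normal((200, 16))   # e.g. 200 embedded past interactions
queries = rng.standard_normal((4, 16))     # 4 learnable queries -> 4 soft prompts
soft_prompts = resample_history(history, queries)
token_emb = rng.standard_normal((10, 16))  # embedded task prompt for the LLM
llm_input = np.concatenate([soft_prompts, token_emb], axis=0)
print(llm_input.shape)                     # (14, 16)
```

The LLM input length now depends on the number of queries (4 here), not on how many interactions the user has. This is the property that lets such an adapter handle large histories.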

# MIS-ME: A Multi-modal Framework for Soil Moisture Estimation

MIS-ME: A Multi-modal Framework for Soil Moisture Estimation ( http://arxiv.org/abs/2408.00963v1 )

License: see the link

Mohammed Rakib, Adil Aman Mohammed, Cole Diggins, Sumit Sharma, Jeff Michael Sadler, Tyson Ochsner, Arun Bagavathi

Soil moisture estimation is an important task that enables precision agriculture by supporting optimal plans for irrigation, fertilization, and harvest. It is common to utilize statistical and machine learning models to estimate soil moisture from traditional data sources such as weather forecasts, soil properties, and crop properties. However, there is a growing interest in utilizing aerial and geospatial imagery to estimate soil moisture. Although these images capture high-resolution crop details, they are expensive to curate and challenging to interpret. Imagine an AI-enhanced software tool that predicts soil moisture using visual cues captured by smartphones and statistical data from weather forecasts. This work is a first step towards developing such a multi-modal approach for soil moisture estimation. In particular, we curate a dataset consisting of real-world images taken from ground stations and their corresponding weather data. We also propose MIS-ME (Meteorological & Image based Soil Moisture Estimator), a multi-modal framework for soil moisture estimation. Our extensive analysis shows that MIS-ME achieves a MAPE of 10.79%, outperforming traditional unimodal approaches with a reduction of 2.6% in MAPE for meteorological data and 1.5% in MAPE for image data, highlighting the effectiveness of tailored multi-modal approaches.

Translation date: 2024-08-05 14:46:34, publication date: 2024-08-02
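MAPE (mean absolute percentage error), the metric quoted in this abstract, averages each prediction's absolute error relative to the true value. A minimal NumPy version follows. It assumes no true value is zero, and the toy numbers are made up for illustration.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; y_true must contain no zeros."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * float(np.mean(np.abs((y_true - y_pred) / y_true)))


# Toy volumetric soil-moisture values (made up): each prediction is 10% off.
print(round(mape([0.20, 0.30], [0.22, 0.27]), 2))  # 10.0
```

Because each error is divided by the true value, MAPE weighs errors on dry soil (small true values) more heavily than the same absolute error on wet soil.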

# A Quantal Response Analysis of Defender-Attacker Sequential Security Games

A Quantal Response Analysis of Defender-Attacker Sequential Security Games ( http://arxiv.org/abs/2408.00964v1 )

License: see the link

Md Reya Shad Azim, Mustafa Abdallah

We explore a scenario involving two sites and a sequential game between a defender and an attacker, where the defender is responsible for securing the sites while the attacker aims to attack them. Each site holds a loss value for the defender when compromised, along with a probability of successful attack. The defender can reduce these probabilities through security investments at each site. The attacker's objective is to target the site that maximizes the expected loss for the defender, taking into account the defender's security investments. While previous studies have examined security investments in such scenarios, our work investigates the impact of bounded rationality exhibited by the defender, as identified in behavioral economics. Specifically, we consider quantal behavioral bias, where humans make errors in selecting efficient (pure) strategies. We demonstrate the existence of a quantal response equilibrium in our sequential game and analyze how this bias affects the defender's choice of optimal security investments. Additionally, we quantify the inefficiency of equilibrium investments under quantal decision-making compared to an optimal solution devoid of behavioral biases. We provide numerical simulations to validate our main findings.

Translation date: 2024-08-05 14:46:34, publication date: 2024-08-02
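The setting in this abstract can be made concrete with a small numerical sketch in three parts:

- The defender splits a budget between two sites.
- The attacker, moving second, hits the site with the larger expected loss.
- A boundedly rational defender chooses splits with logit quantal response probabilities, proportional to exp(-rationality * expected loss).

The attack-success function exp(-x), the budget grid and the loss values are illustrative assumptions, not the paper's model.

```python
import numpy as np


def defender_losses(losses, budget, n_grid=101):
    """Defender's expected loss for each way of splitting `budget` over two sites.

    Assumptions for illustration: investing x at a site makes an attack there succeed
    with probability exp(-x), and the attacker, moving second, hits the site with the
    larger expected loss.
    """
    x1 = np.linspace(0.0, budget, n_grid)
    x2 = budget - x1
    per_site = np.stack([losses[0] * np.exp(-x1), losses[1] * np.exp(-x2)])
    return x1, per_site.max(axis=0)


def logit_quantal_response(expected_losses, rationality):
    """Logit choice probabilities over the defender's options.

    rationality = 0 gives uniform play; as it grows, mass moves to the loss-minimizing option.
    """
    z = -rationality * np.asarray(expected_losses, dtype=float)
    z = z - z.max()                    # numerical stability
    weights = np.exp(z)
    return weights / weights.sum()


x1, loss = defender_losses(losses=(10.0, 5.0), budget=2.0)
for rationality in (0.0, 1.0, 100.0):
    probs = logit_quantal_response(loss, rationality)
    print(rationality, round(float(probs @ loss), 3))   # expected loss under quantal play
print("best split x1 =", x1[np.argmin(loss)], "loss =", round(float(loss.min()), 3))
```

With these assumptions the optimal split equalizes the two sites' expected losses, at x1 = 1 + ln(2)/2. The gap between the quantal expected loss and `loss.min()` is one way to measure the inefficiency the abstract refers to.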

# Integrating ESG and AI: A Comprehensive Responsible AI Assessment Framework

Integrating ESG and AI: A Comprehensive Responsible AI Assessment Framework ( http://arxiv.org/abs/2408.00965v1 )

License: see the link

Sung Une Lee, Harsha Perera, Yue Liu, Boming Xia, Qinghua Lu, Liming Zhu

Artificial Intelligence (AI) is a widely developed and adopted technology across entire industry sectors. Integrating environmental, social, and governance (ESG) considerations with AI investments is crucial for ensuring ethical and sustainable technological advancement. Particularly from an investor perspective, this integration not only mitigates risks but also enhances long-term value creation by aligning AI initiatives with broader societal goals. Yet, this area has been less explored in both academia and industry. To bridge the gap, we introduce a novel ESG-AI framework, which is developed based on insights from engagements with 28 companies and comprises three key components. The framework, developed in collaboration with industry practitioners, provides a structured approach to this integration. The ESG-AI framework provides an overview of the environmental and social impacts of AI applications, helping users such as investors assess the materiality of AI use. Moreover, it enables investors to evaluate a company's commitment to responsible AI through structured engagements and thorough assessment of specific risk areas. We publicly released the framework and toolkit in April 2024, and they have received significant attention and positive feedback from the investment community. This paper details each component of the framework, demonstrating its applicability in real-world contexts and its potential to guide ethical AI investments.

Translation date: 2024-08-05 14:46:34, publication date: 2024-08-02

# Automatic Extraction of Relationships among Motivations, Emotions and Actions from Natural Language Texts

Automatic Extraction of Relationships among Motivations, Emotions and Actions from Natural Language Texts ( http://arxiv.org/abs/2408.00966v1 )

License: see the link

Fei Yang

We propose a new graph-based framework to explicitly reveal relationships among motivations, emotions and actions from natural language texts. A directed acyclic graph is designed to describe human nature. Nurture beliefs are incorporated to connect outside events and the human-nature graph. No annotation resources are required, owing to the power of large language models. The Amazon Fine Foods Reviews dataset is used as the corpus, with a focus on food-related motivations. In total, 92,990 relationship graphs are generated, of which 63% make logical sense. We further analyze the error types to guide optimization in future research.

Translation date: 2024-08-05 14:46:34, publication date: 2024-08-02
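A directed acyclic graph of motivations, emotions and actions can be represented and validated with Python's standard `graphlib` module, which rejects graphs containing a cycle. The node names below are invented for illustration and do not come from the paper's generated graphs.

```python
from graphlib import CycleError, TopologicalSorter

# Each node maps to the set of nodes that point to it (its predecessors).
# Illustrative example: an outside event and a motivation lead to an emotion, then an action.
review_graph = {
    "emotion: satisfied": {"motivation: eat healthily", "event: snack arrived fresh"},
    "action: buy again": {"emotion: satisfied"},
}


def topological_order(predecessors):
    """Return nodes with every cause before its effects; raise CycleError if not a DAG."""
    return list(TopologicalSorter(predecessors).static_order())


order = topological_order(review_graph)
print(order.index("motivation: eat healthily") < order.index("action: buy again"))  # True

try:
    topological_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("cycle rejected")
```

A check like this could filter out LLM-generated graphs that violate the acyclic structure before their logical soundness is judged.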

# Extracting Object Heights From LiDAR & Aerial Imagery

Extracting Object Heights From LiDAR & Aerial Imagery ( http://arxiv.org/abs/2408.00967v1 )

License: see the link

Jesus Guerrero

This work shows a procedural method for extracting object heights from LiDAR and aerial imagery. We discuss how to get heights and the future of LiDAR and imagery processing. SOTA object segmentation allows us to get object heights with no deep learning background. Engineers will be keeping track of world data across generations and reprocessing them. They will be using older procedural methods like the one in this paper, as well as the newer ones discussed here. SOTA methods are going beyond analysis and into generative AI. We cover both a procedural methodology and the newer ones performed with language models. These include point cloud, imagery and text encoding, allowing for spatially aware AI.

Translation date: 2024-08-05 14:46:34, publication date: 2024-08-02
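A common procedural way to get object heights has three steps:

1. Subtract a bare-earth terrain model (DTM) from a surface model (DSM) built from the LiDAR returns.
2. Restrict the result to an object mask, for example one produced by a segmentation model.
3. Read a high value from the height-above-ground values inside the mask.

The abstract does not spell out the paper's exact steps. This NumPy sketch is therefore an assumption, including its use of a high percentile instead of the maximum.

```python
import numpy as np


def object_height(dsm, dtm, mask, top_percentile=95.0):
    """Height above ground of the object covered by `mask`.

    dsm: (H, W) surface elevations; dtm: (H, W) ground elevations on the same grid;
    mask: (H, W) boolean object mask. The percentile damps isolated noisy returns.
    """
    ndsm = dsm - dtm                     # normalized DSM: height above ground
    return float(np.percentile(ndsm[mask], top_percentile))


dtm = np.full((6, 6), 100.0)             # flat ground at 100 m elevation
dsm = dtm.copy()
dsm[1:4, 1:4] += 12.0                    # a 12 m tall building footprint
mask = np.zeros((6, 6), dtype=bool)
mask[1:4, 1:4] = True
print(object_height(dsm, dtm, mask))     # 12.0
```

The segmentation model only has to supply `mask`. The height itself comes from plain raster arithmetic, which fits the abstract's point that no deep learning background is needed for this step.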

# DNSSEC+: An Enhanced DNS Scheme Motivated by Benefits and Pitfalls of DNSSEC

DNSSEC+: An Enhanced DNS Scheme Motivated by Benefits and Pitfalls of DNSSEC ( http://arxiv.org/abs/2408.00968v1 )

License: see the link

Ali Sadeghi Jahromi, AbdelRahman Abdou, Paul C. van Oorschot

The absence of security measures between DNS recursive resolvers and authoritative nameservers has been exploited by both inline and off-path attacks. While many security proposals have been made in practice and previous literature, they typically suffer from deployability barriers and/or inadequate security properties. The absence of a broadly adopted security solution between resolvers and nameservers motivates a new scheme that mitigates these issues in previous proposals. We present DNSSEC+, which addresses security and deployability downsides of DNSSEC, while retaining its benefits. DNSSEC+ takes advantage of the existing DNSSEC trust model and authorizes the nameservers within a zone for short intervals to serve the zone data securely, facilitating real-time security properties for DNS responses, without requiring long-term private keys to be duplicated (thus put at risk) on authoritative nameservers. Regarding name resolution latency, DNSSEC+ offers a performance comparable to less secure schemes. We define nine security, privacy, and deployability properties for name resolution, and show how DNSSEC+ fulfills these properties.

Translation date: 2024-08-05 14:46:34, publication date: 2024-08-02
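The abstract does not give DNSSEC+'s record formats. The following is only a hypothetical illustration of the "short intervals" idea: a resolver-side check that a nameserver's authorization:

- names the right zone and server,
- is currently inside its validity window,
- was issued for a short lifetime.

This is similar in spirit to the inception and expiration fields of DNSSEC RRSIG records. Verifying the zone's signature over the authorization is deliberately left out, and every name here is invented.

```python
from dataclasses import dataclass


@dataclass
class NameserverAuthorization:
    """Hypothetical short-lived grant from a zone's DNSSEC key to one nameserver."""
    zone: str
    nameserver: str
    inception: int   # Unix time at which the grant becomes valid
    expiration: int  # Unix time at which the grant stops being valid


MAX_LIFETIME = 3600  # hypothetical cap on a grant's lifetime, in seconds


def is_usable(auth, zone, nameserver, now, max_lifetime=MAX_LIFETIME):
    """Scope and timing checks only; verifying the zone's signature is omitted."""
    return (auth.zone == zone
            and auth.nameserver == nameserver
            and auth.inception <= now < auth.expiration
            and auth.expiration - auth.inception <= max_lifetime)


grant = NameserverAuthorization("example.com.", "ns1.example.com.", 1_000_000, 1_001_800)
print(is_usable(grant, "example.com.", "ns1.example.com.", now=1_000_900))  # True
print(is_usable(grant, "example.com.", "ns1.example.com.", now=1_002_000))  # False
```

Keeping grants short means a key held by a nameserver is only useful for a brief window if it leaks. This matches the abstract's goal of not duplicating long-term private keys on authoritative nameservers.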

# Visible-Thermal Multiple Object Tracking: Large-scale Video Dataset and Progressive Fusion Approach

Visible-Thermal Multiple Object Tracking: Large-scale Video Dataset and Progressive Fusion Approach ( http://arxiv.org/abs/2408.00969v1 )

License: see the link

Yabin Zhu, Qianwu Wang, Chenglong Li, Jin Tang, Zhixiang Huang

The complementary benefits of visible and thermal infrared data are widely utilized in various computer vision tasks, such as visual tracking, semantic segmentation and object detection, but are rarely explored in Multiple Object Tracking (MOT). In this work, we contribute a large-scale Visible-Thermal video benchmark for MOT, called VT-MOT. VT-MOT has the following main advantages. 1) The data is large-scale and highly diverse: VT-MOT includes 582 video sequence pairs and 401k frame pairs from surveillance, drone, and handheld platforms. 2) The cross-modal alignment is highly accurate: several professionals performed both spatial and temporal alignment frame by frame. 3) The annotation is dense and high-quality: VT-MOT has 3.99 million annotation boxes annotated and double-checked by professionals, including heavy occlusion and object re-acquisition (objects disappearing and reappearing) challenges. To provide a strong baseline, we design a simple yet effective tracking framework for robust visible-thermal MOT, which progressively fuses the temporal information and the complementary information of the two modalities. Comprehensive experiments on VT-MOT demonstrate the superiority and effectiveness of the proposed method compared with state-of-the-art methods. From the evaluation results and analysis, we identify several potential future directions for visible-thermal MOT. The project is released at https://github.com/wqw123wqw/PFTrack.

Translation date: 2024-08-05 14:46:34, publication date: 2024-08-02

# META-ANOVA: Screening interactions for interpretable machine learning

META-ANOVA: Screening interactions for interpretable machine learning ( http://arxiv.org/abs/2408.00973v1 )

License: see the link

Yongchan Choi, Seokhun Park, Chanmoo Park, Dongha Kim, Yongdai Kim

There are two things to be considered when we evaluate predictive models. One is prediction accuracy, and the other is interpretability. Over recent decades, many high-performance prediction models, such as ensemble-based models and deep neural networks, have been developed. However, these models are often too complex, making it difficult to intuitively interpret their predictions. This complexity in interpretation limits their use in many real-world fields that require accountability, such as medicine, finance, and college admissions. In this study, we develop a novel method called Meta-ANOVA to provide an interpretable model for any given prediction model. The basic idea of Meta-ANOVA is to transform a given black-box prediction model into a functional ANOVA model. A novel technical contribution of Meta-ANOVA is a procedure for screening out unnecessary interactions before transforming a given black-box model into the functional ANOVA model. This screening procedure allows the inclusion of higher-order interactions in the transformed functional ANOVA model without computational difficulties. We prove that the screening procedure is asymptotically consistent. Through various experiments with synthetic and real-world datasets, we empirically demonstrate the superiority of Meta-ANOVA.

Translation date: 2024-08-05 14:36:49, publication date: 2024-08-02
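A functional ANOVA model splits a prediction into a constant, main effects and interactions:

$f(x) = f_0 + \sum_j f_j(x_j) + \sum_{j<k} f_{jk}(x_j, x_k) + \dots$

Screening therefore means deciding which interaction terms are worth keeping. The sketch below uses a generic test: the mean squared mixed finite difference, which vanishes (up to rounding) when the black-box f has no (j, k) interaction. It illustrates screening in general and is not the consistent procedure proposed in the paper. The step size `h` and the threshold are arbitrary choices.

```python
import numpy as np


def interaction_strength(f, X, j, k, h=1e-2):
    """Mean squared mixed finite difference of f in inputs j and k over the rows of X.

    It is zero (up to rounding) when f has no (j, k) interaction term.
    """
    e_j = np.zeros(X.shape[1])
    e_j[j] = h
    e_k = np.zeros(X.shape[1])
    e_k[k] = h
    mixed = f(X + e_j + e_k) - f(X + e_j) - f(X + e_k) + f(X)
    return float(np.mean((mixed / h**2) ** 2))


def screen_pairs(f, X, threshold=1e-6):
    """Return the input pairs whose interaction strength exceeds the threshold."""
    p = X.shape[1]
    return [(j, k) for j in range(p) for k in range(j + 1, p)
            if interaction_strength(f, X, j, k) > threshold]


def black_box(X):
    # Stand-in for a trained model: additive terms plus one x1 * x2 interaction.
    return X[:, 0] + X[:, 1] * X[:, 2] + np.sin(X[:, 3])


X = np.random.default_rng(0).standard_normal((200, 4))
print(screen_pairs(black_box, X))  # [(1, 2)]
```

Only the surviving pairs would then get an f_jk term in the fitted ANOVA model. Pruning pairs first is what keeps the number of candidate higher-order terms manageable.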

# Long-lived metastable-qubit memory

Long-lived metastable-qubit memory ( http://arxiv.org/abs/2408.00975v1 )

License: see the link

Xiaoyang Shi, Jasmine Sinanan-Singh, Kyle DeBry, Susanna L. Todaro, Isaac L. Chuang, John Chiaverini

Coherent storage of quantum information is crucial to many quantum technologies. Long coherence times have been demonstrated in trapped-ion qubits, typically using the hyperfine levels within the ground state of a single ion. However, recent research suggests qubits encoded in metastable states could provide architectural benefits for quantum information processing, such as the possibility of effective dual-species operation in a single-species system and erasure-error conversion for fault-tolerant quantum computing. Here we demonstrate long-lived encoding of a quantum state in the metastable states of a trapped ion. By sympathetically cooling with another ion of the same species and constantly monitoring for erasure errors, we demonstrate a coherence time of 136(42) seconds with a qubit encoded in the metastable $5D_{5/2}$ state of a single $^{137}$Ba$^+$ ion. In agreement with a model based on empirical results from dynamical-decoupling-based noise spectroscopy, we find that dephasing of the metastable levels is the dominant source of error once erasure errors are removed.

Translated: 2024-08-05 14:36:49, Published: 2024-08-02

# Cross-domain Named Entity Recognition via Graph Matching

Cross-domain Named Entity Recognition via Graph Matching ( http://arxiv.org/abs/2408.00981v1 )

License: see link

Junhao Zheng, Haibin Chen, Qianli Ma,

Cross-domain NER is a practical yet challenging problem due to data scarcity in real-world scenarios. A common practice is to first learn a NER model in a rich-resource general domain and then adapt the model to specific domains. Because of the mismatch between entity types across domains, the broad knowledge in the general domain cannot be effectively transferred to the target-domain NER model. To address this, we model the label relationship as a probability distribution and construct label graphs in both the source and target label spaces. To enhance the contextual representation with label structures, we fuse the label graph into the word embeddings output by BERT. By representing label relationships as graphs, we formulate cross-domain NER as a graph matching problem. Furthermore, the proposed method is readily applicable with pre-training methods and could potentially be applied to other cross-domain prediction tasks. Empirical results on four datasets show that our method outperforms a series of transfer learning, multi-task learning, and few-shot learning methods.
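
One possible reading of "modeling the label relationship as a probability distribution" is sketched below. Each row of the label graph is the average predicted label distribution over tokens that carry a given gold label. This construction is hypothetical and meant for illustration only: the function name and the one-hot fallback for labels with no tokens are assumptions, not the paper's exact procedure.

```python
def label_graph(gold_labels, predicted_dists, labels):
    """Build a row-stochastic label-relationship matrix.

    gold_labels: gold label per token.
    predicted_dists: predicted probability vector over `labels` per token.
    Row i is the mean predicted distribution for tokens whose gold label is
    labels[i]; labels with no tokens get a one-hot row (an assumption).
    """
    k = len(labels)
    sums = {lab: [0.0] * k for lab in labels}
    counts = {lab: 0 for lab in labels}
    for gold, dist in zip(gold_labels, predicted_dists):
        row = sums[gold]
        for idx, p in enumerate(dist):
            row[idx] += p
        counts[gold] += 1
    graph = []
    for pos, lab in enumerate(labels):
        if counts[lab]:
            graph.append([s / counts[lab] for s in sums[lab]])
        else:
            graph.append([1.0 if idx == pos else 0.0 for idx in range(k)])
    return graph
```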

Translated: 2024-08-05 14:36:49, Published: 2024-08-02

# Reconstructing Richtmyer-Meshkov Instabilities from Noisy Radiographs Using Low-Dimensional Features and Attention-Based Neural Networks

Reconstructing Richtmyer-Meshkov instabilities from noisy radiographs using low dimensional features and attention-based neural networks ( http://arxiv.org/abs/2408.00985v1 )

License: see link

Daniel A. Serino, Marc L. Klasky, Balasubramanya T. Nadiga, Xiaojian Xu, Trevor Wilcox,

A trained attention-based transformer network can robustly recover the complex topologies produced by the Richtmyer-Meshkov instability from a sequence of hydrodynamic features derived from radiographic images corrupted by blur, scatter, and noise. This approach is demonstrated on ICF-like double-shell hydrodynamic simulations. The key component of the network is a transformer encoder that acts on a sequence of features extracted from noisy radiographs. This encoder includes numerous self-attention layers that learn temporal dependencies in the input sequences and increase the expressiveness of the model. The approach is shown to accurately recover Richtmyer-Meshkov instability growth rates, even when the gas-metal interface is heavily obscured by radiographic noise.
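
The self-attention layers mentioned above can be illustrated with single-head scaled dot-product attention over a sequence of per-frame feature vectors. This sketch assumes identity query/key/value projections. It omits the learned weights, multiple heads, and feed-forward sublayers of a real transformer encoder.

```python
import math


def self_attention(seq):
    """Single-head scaled dot-product self-attention with identity Q/K/V projections.

    seq: list of equal-length feature vectors, one per time step.
    Returns (outputs, weights): outputs[t] is a weighted mix of all steps, and
    weights[t] is the softmax attention distribution used for step t.
    """
    d = len(seq[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in seq:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in seq]
        m = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        w = [e / z for e in exps]
        out = [sum(w[t] * seq[t][i] for t in range(len(seq))) for i in range(d)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights
```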

Translated: 2024-08-05 14:36:49, Published: 2024-08-02

# A SAT-based Approach to Rigorous Verification of Bayesian Networks

A SAT-based approach to rigorous verification of Bayesian networks ( http://arxiv.org/abs/2408.00986v1 )

License: see link

Ignacy Stępka, Nicholas Gisolfi, Artur Dubrawski,

Recent advancements in machine learning have accelerated its widespread adoption across various real-world applications. However, in safety-critical domains, the deployment of machine learning models is riddled with challenges due to their complexity, lack of interpretability, and absence of formal guarantees regarding their behavior. In this paper, we introduce a verification framework tailored for Bayesian networks, designed to address these drawbacks. Our framework comprises two key components: (1) a two-step compilation and encoding scheme that translates Bayesian networks into Boolean logic literals, and (2) formal verification queries that leverage these literals to verify various properties encoded as constraints. Specifically, we introduce two verification queries: if-then rules (ITR) and feature monotonicity (FMO). We benchmark the efficiency of our verification scheme and demonstrate its practical utility in real-world scenarios.
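
The feature monotonicity (FMO) query can be illustrated on a tiny naive Bayes network by brute force. Flip one binary feature from 0 to 1 under every assignment of the other features and check that the posterior never decreases. This sketch deliberately replaces the paper's SAT encoding with exhaustive enumeration, so it only scales to a handful of features. The naive Bayes structure and the function names are assumptions.

```python
from itertools import product


def posterior(prior, cpts, x):
    """P(Y=1 | x) for a naive Bayes network with a binary class and binary features.

    prior: P(Y=1); cpts[j] = (P(X_j=1 | Y=0), P(X_j=1 | Y=1)).
    """
    like = {0: 1.0 - prior, 1: prior}
    for (p0, p1), xj in zip(cpts, x):
        like[0] *= p0 if xj else 1.0 - p0
        like[1] *= p1 if xj else 1.0 - p1
    return like[1] / (like[0] + like[1])


def is_monotone_increasing(prior, cpts, j):
    """Check FMO for feature j by enumerating every assignment of the other features.

    Returns (True, None) or (False, counterexample) where the counterexample is
    the assignment (with x_j = 0) at which flipping x_j lowers the posterior.
    """
    n = len(cpts)
    for rest in product((0, 1), repeat=n - 1):
        lo = list(rest[:j]) + [0] + list(rest[j:])
        hi = list(rest[:j]) + [1] + list(rest[j:])
        if posterior(prior, cpts, hi) < posterior(prior, cpts, lo):
            return False, lo
    return True, None
```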

Translated: 2024-08-05 14:36:49, Published: 2024-08-02

# On the Resilience of Multi-Agent Systems with Malicious Agents

On the Resilience of Multi-Agent Systems with Malicious Agents ( http://arxiv.org/abs/2408.00989v1 )

License: see link

Jen-tse Huang, Jiaxu Zhou, Tailin Jin, Xuhui Zhou, Zixi Chen, Wenxuan Wang, Youliang Yuan, Maarten Sap, Michael R. Lyu,

Multi-agent systems, powered by large language models, have shown great abilities across various tasks due to the collaboration of expert agents, each focusing on a specific domain. However, when agents are deployed separately, there is a risk that malicious users may introduce malicious agents that generate incorrect or irrelevant results too stealthy to be identified by other, non-specialized agents. Therefore, this paper investigates two essential questions: (1) How resilient are various multi-agent system structures (e.g., A$\rightarrow$B$\rightarrow$C, A$\leftrightarrow$B$\leftrightarrow$C) to malicious agents on different downstream tasks? (2) How can we increase system resilience to defend against malicious agents? To simulate malicious agents, we devise two methods, AutoTransform and AutoInject, to transform any agent into a malicious one while preserving its functional integrity. We run comprehensive experiments on four downstream multi-agent system tasks, namely code generation, math problems, translation, and text evaluation. Results suggest that the "hierarchical" multi-agent structure, i.e., A$\rightarrow$(B$\leftrightarrow$C), exhibits superior resilience with the lowest performance drop of $23.6\%$, compared with $46.4\%$ and $49.8\%$ for the other two structures. Additionally, we show the promise of improving multi-agent system resilience by demonstrating that two defense methods can enhance it: introducing an additional agent that reviews and corrects messages, and adding a mechanism for each agent to challenge the others' outputs. Our code and data are available at https://github.com/CUHK-ARISE/MAS-Resilience.

Translated: 2024-08-05 14:36:49, Published: 2024-08-02

# Fairness in Large Language Models in Three Hours

Fairness in Large Language Models in Three Hour ( http://arxiv.org/abs/2408.00992v1 )

License: see link

Thang Doan Viet, Zichong Wang, Minh Nhat Nguyen, Wenbin Zhang,

Large Language Models (LLMs) have demonstrated remarkable success across various domains but often lack fairness considerations, potentially leading to discriminatory outcomes against marginalized populations. Unlike fairness in traditional machine learning, fairness in LLMs involves unique backgrounds, taxonomies, and fulfillment techniques. This tutorial provides a systematic overview of recent advances in the literature concerning fair LLMs, beginning with real-world case studies to introduce LLMs, followed by an analysis of bias causes therein. The concept of fairness in LLMs is then explored, summarizing the strategies for evaluating bias and the algorithms designed to promote fairness. Additionally, resources for assessing bias in LLMs, including toolkits and datasets, are compiled, and current research challenges and open questions in the field are discussed. The repository is available at \url{https://github.com/LavinWong/Fairness-in-Large-Language-Models}.
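
As one concrete example of a bias-evaluation strategy, the sketch below computes a demographic parity gap: the largest difference in favorable-outcome rate between groups. Each outcome could be a judged property of an LLM response, such as whether it was favorable to the person described. This generic fairness metric is chosen for illustration and is not claimed to be the tutorial's recommended measure.

```python
def demographic_parity_gap(outcomes, groups):
    """Largest difference in positive-outcome rate across groups.

    outcomes: iterable of 0/1 labels (e.g. whether an LLM response was favorable).
    groups: the demographic group of each example, aligned with `outcomes`.
    Returns (gap, per_group_rates).
    """
    totals, positives = {}, {}
    for y, g in zip(outcomes, groups):
        totals[g] = totals.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + y
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates
```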

Translated: 2024-08-05 14:36:49, Published: 2024-08-02

# ArchCode: Incorporating Software Requirements in Code Generation with Large Language Models

ArchCode: Incorporating Software Requirements in Code Generation with Large Language Models ( http://arxiv.org/abs/2408.00994v1 )

License: see link

Hojae Han, Jaejin Kim, Jaeseok Yoo, Youngwon Lee, Seung-won Hwang,

This paper aims to extend the code generation capability of large language models (LLMs) to automatically manage comprehensive software requirements from given textual descriptions. Such requirements include both functional (i.e., achieving expected behavior for inputs) and non-functional (e.g., time/space performance, robustness, maintainability) requirements. However, textual descriptions may express requirements verbosely or even omit some of them. We introduce ARCHCODE, a novel framework that leverages in-context learning to organize the requirements observed in descriptions and to extrapolate unexpressed requirements from them. ARCHCODE generates requirements from given descriptions and conditions on them to produce code snippets and test cases. Each test case is tailored to one of the requirements, which allows code snippets to be ranked by how well their execution results comply with the requirements. Public benchmarks show that ARCHCODE better satisfies functional requirements, significantly improving Pass@k scores. Furthermore, we introduce HumanEval-NFR, the first benchmark evaluating how well LLMs satisfy non-functional requirements in code generation, on which ARCHCODE outperforms baseline methods. The implementation of ARCHCODE and the HumanEval-NFR benchmark are both publicly accessible.
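
The idea of ranking code snippets by how their execution results comply with requirement-tagged test cases can be sketched as follows. The candidate functions, test tuples, and the `rank_candidates` helper are hypothetical. The abstract does not give ARCHCODE's actual generation or scoring details.

```python
def rank_candidates(candidates, tests):
    """Rank candidate implementations by the number of requirement-tagged tests they pass.

    candidates: dict mapping a name to a callable.
    tests: list of (requirement, args_tuple, expected_result).
    Returns a list of (name, passed_count, failed_requirements), best first.
    """
    results = []
    for name, fn in candidates.items():
        passed, failed = 0, []
        for requirement, args, expected in tests:
            try:
                ok = fn(*args) == expected
            except Exception:
                ok = False  # a crash counts as violating the requirement
            if ok:
                passed += 1
            else:
                failed.append(requirement)
        results.append((name, passed, failed))
    results.sort(key=lambda r: r[1], reverse=True)
    return results
```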

Translated: 2024-08-05 14:36:49, Published: 2024-08-02

# IncidentNet: Traffic Incident Detection, Localization and Severity Estimation with Sparse Sensing

IncidentNet: Traffic Incident Detection, Localization and Severity Estimation with Sparse Sensing ( http://arxiv.org/abs/2408.00996v1 )

License: see link

Sai Shashank Peddiraju, Kaustubh Harapanahalli, Edward Andert, Aviral Shrivastava,

Prior work in traffic incident detection relies on high sensor coverage and is primarily based on decision-tree and random-forest models, which have limited representational capacity and, as a result, cannot detect incidents with high accuracy. This paper presents IncidentNet, a novel approach for classifying, localizing, and estimating the severity of traffic incidents using deep learning models trained on data captured from sparsely placed sensors in urban environments. Our model works on microscopic traffic data that can be collected using cameras installed at traffic intersections. Because no available dataset provides microscopic traffic details and traffic incident details simultaneously, we also present a methodology to generate a synthetic microscopic traffic dataset that matches given macroscopic traffic data. IncidentNet achieves a traffic incident detection rate of 98%, with false alarm rates below 7% and an average detection time of 197 seconds, in urban environments where fewer than 20% of traffic intersections have cameras.
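
The three figures quoted above (detection rate, false alarm rate, and mean time to detect) can be computed as shown below. This sketch assumes per-window binary predictions and defines the false alarm rate as false positives over incident-free windows. Definitions vary across the literature, and the abstract does not state which one the paper uses.

```python
def incident_metrics(predicted, actual, detect_delays):
    """Return (detection_rate, false_alarm_rate, mean_time_to_detect).

    predicted / actual: per-window booleans (incident flagged / incident present).
    detect_delays: seconds from incident onset to detection, for detected incidents.
    """
    tp = sum(p and a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    mean_ttd = sum(detect_delays) / len(detect_delays) if detect_delays else float("nan")
    return detection_rate, false_alarm_rate, mean_ttd
```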

Translated: 2024-08-05 14:36:49, Published: 2024-08-02

# A Safe Exploration Strategy for Model-free Task Adaptation in Safety-constrained Grid Environments

A Safe Exploration Strategy for Model-free Task Adaptation in Safety-constrained Grid Environments ( http://arxiv.org/abs/2408.00997v1 )

License: see link

Erfan Entezami, Mahsa Sahebdel, Dhawal Gupta,

Training a model-free reinforcement learning agent requires allowing the agent to sufficiently explore the environment to search for an optimal policy. In safety-constrained environments, unsupervised exploration or a non-optimal policy may lead the agent to undesirable states, resulting in outcomes that are potentially costly or hazardous for both the agent and the environment. In this paper, we introduce a new exploration framework for navigating grid environments that enables model-free agents to interact with the environment while adhering to safety constraints. Our framework includes a pre-training phase, during which the agent learns to identify potentially unsafe states based on both observable features and specified safety constraints in the environment. Subsequently, a binary classification model is trained to predict those unsafe states in new environments that exhibit similar dynamics. This trained classifier empowers model-free agents to determine situations in which employing random exploration or a suboptimal policy may pose safety risks, in which case our framework prompts the agent to follow a predefined safe policy to mitigate the potential for hazardous consequences. We evaluated our framework on three randomly generated grid environments and demonstrated how model-free agents can safely adapt to new tasks and learn optimal policies for new environments. Our results indicate that by defining an appropriate safe policy and utilizing a well-trained model to detect unsafe states, our framework enables a model-free agent to adapt to new tasks and environments with significantly fewer safety violations.
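
The gating logic described above can be sketched as an epsilon-greedy action selector that defers to a predefined safe policy whenever the pre-trained classifier flags the current state as risky. The function signature, the state-level (rather than state-action) classifier, and the epsilon-greedy exploration rule are assumptions for illustration.

```python
import random


def choose_action(state, q_values, is_risky, safe_policy, epsilon, rng=None):
    """Pick an action, deferring to a predefined safe policy in risky states.

    q_values: dict mapping action -> estimated value for `state`.
    is_risky(state) -> bool: stands in for the pre-trained binary classifier.
    safe_policy(state) -> action used whenever the classifier flags risk.
    """
    rng = rng if rng is not None else random.Random()
    if is_risky(state):
        return safe_policy(state)  # no random exploration near predicted hazards
    if rng.random() < epsilon:
        return rng.choice(sorted(q_values))  # explore uniformly among actions
    return max(q_values, key=q_values.get)  # exploit the current estimate
```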

Translated: 2024-08-05 14:36:49, Published: 2024-08-02

# FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation

FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation ( http://arxiv.org/abs/2408.00998v1 )

License: see link

Xiang Gao, Jiaying Liu,

Large-scale text-to-image diffusion models have been a revolutionary milestone in the evolution of generative AI and multimodal technology, allowing extraordinary image generation based on natural-language text prompts. However, the lack of controllability of such models restricts their practical applicability to real-life content creation, for which attention has focused on leveraging a reference image to control text-to-image synthesis. Due to the close correlation between the reference image and the generated image, this problem can also be regarded as the task of manipulating (or editing) the reference image as per the text, namely text-driven image-to-image translation. This paper contributes a novel, concise, and efficient approach that adapts the pre-trained large-scale text-to-image (T2I) diffusion model to the image-to-image (I2I) paradigm in a plug-and-play manner, realizing high-quality and versatile text-driven I2I translation without any model training, model fine-tuning, or online optimization process. To guide T2I generation with a reference image, we propose to model diverse guiding factors with correspondingly different frequency bands of diffusion features in the DCT spectral space, and accordingly devise a novel frequency band substitution layer that dynamically substitutes a certain DCT frequency band of the diffusion features with the corresponding counterpart of the reference image along the reverse sampling process. We demonstrate that our method flexibly enables highly controllable text-driven I2I translation both in the guiding factor and guiding intensity of the reference image, simply by tuning the type and bandwidth of the substituted frequency band, respectively. Extensive qualitative and quantitative experiments verify the superiority of our approach over related methods in I2I translation visual quality, versatility, and controllability.
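
The frequency band substitution step can be illustrated on a single square 2D feature map. Take an orthonormal DCT-II of both the generated and the reference features, copy the reference coefficients inside a chosen band, and invert. This sketch assumes a low-frequency band defined by u + v < bandwidth. It does not reproduce the paper's band types, feature layers, or timestep schedule.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis C, so that X = C x C^T for an n x n block."""
    rows = []
    for k in range(n):
        alpha = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([alpha * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def substitute_low_band(feature, reference, bandwidth):
    """Replace DCT coefficients with u + v < bandwidth by the reference's, then invert."""
    n = len(feature)
    c = dct_matrix(n)
    ct = transpose(c)
    f_hat = matmul(matmul(c, feature), ct)    # forward 2D DCT of the generated feature
    r_hat = matmul(matmul(c, reference), ct)  # forward 2D DCT of the reference feature
    mixed = [[r_hat[u][v] if u + v < bandwidth else f_hat[u][v] for v in range(n)]
             for u in range(n)]
    return matmul(matmul(ct, mixed), c)       # inverse: C is orthonormal, so C^-1 = C^T
```

Setting `bandwidth` to 0 leaves the feature unchanged, while a bandwidth of 2n - 1 replaces it entirely with the reference. This mirrors the idea that the substituted bandwidth controls guidance intensity.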

Translated: 2024-08-05 14:36:49, Published: 2024-08-02

# Community Cellular Networks Coverage Visualizer

Community Cellular Networks Coverage Visualizer ( http://arxiv.org/abs/2408.00999v1 )

License: see link

Chanwut Kittivorawong, Sirapop Theeranantachai, Nussara Tieanklin, Esther Han Beol Jang, Kurtis Heimerl,

Volunteers and researchers working on community cellular networks (CCNs) currently rarely have access to information about the network at each site. This makes it difficult for them to evaluate network performance, identify outages and downtime, or even show current site locations. In this paper, we propose the Community Cellular Networks Coverage Visualizer, a performance dashboard that helps reduce technicians' workload and builds trust by illustrating the reliability of the networks. The map displays overall and in-depth performance for each current and future CCN site with a privacy-focused implementation, while the multi-series line chart shows network performance over time. Not only will it help users identify nearby locations with stronger and more reliable signals, but our application will also be an essential tool for volunteers and engineers to determine optimal locations for new sites and to quickly identify possible network failures.
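
A privacy-conscious aggregation for such a coverage map might snap measurement coordinates to a coarse grid and suppress sparsely populated cells before averaging signal strength. The grid size, the `min_count` threshold, and the RSSI input format below are assumptions, not details taken from the paper.

```python
from collections import defaultdict


def aggregate_coverage(samples, cell_deg=0.01, min_count=1):
    """Average signal strength per coarse grid cell, hiding exact user positions.

    samples: iterable of (lat, lon, rssi_dbm) measurements.
    cell_deg: grid size in degrees; coordinates are snapped to integer cell indices.
    min_count: cells with fewer samples are dropped (a simple k-anonymity-style guard).
    Returns {(cell_lat, cell_lon): (mean_rssi_dbm, sample_count)}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, rssi in samples:
        key = (round(lat / cell_deg), round(lon / cell_deg))
        sums[key] += rssi
        counts[key] += 1
    return {
        (i * cell_deg, j * cell_deg): (sums[(i, j)] / counts[(i, j)], counts[(i, j)])
        for (i, j) in sums
        if counts[(i, j)] >= min_count
    }
```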

Translated: 2024-08-05 14:36:49, Published: 2024-08-02

# Adaptive Two-Stage Cloud Resource Scaling via Hierarchical Multi-Indicator Forecasting and Bayesian Decision-Making

Adaptive Two-Stage Cloud Resource Scaling via Hierarchical Multi-Indicator Forecasting and Bayesian Decision-Making ( http://arxiv.org/abs/2408.01000v1 )

License: see link

Yang Luo, Shiyu Wang, Zhemeng Yu, Wei Lu, Xiaofeng Gao, Lintao Ma, Guihai Chen,

The surging demand for cloud computing resources, driven by the rapid growth of sophisticated large-scale models and data centers, underscores the critical importance of efficient and adaptive resource allocation. As major tech enterprises deploy massive infrastructures with thousands of GPUs, existing cloud platforms still struggle with low resource utilization due to key challenges: capturing hierarchical indicator structures, modeling non-Gaussian distributions, and decision-making under uncertainty. To address these challenges, we propose HARMONY, an adaptive Hierarchical Attention-based Resource Modeling and Decision-Making System. HARMONY combines hierarchical multi-indicator distribution forecasting with uncertainty-aware Bayesian decision-making. It introduces a novel hierarchical attention mechanism that comprehensively models complex inter-indicator dependencies, enabling accurate predictions that adapt to evolving environment states, and it transforms Gaussian projections into adaptive non-Gaussian distributions via Normalizing Flows. Crucially, HARMONY leverages the full predictive distributions in an adaptive Bayesian process, proactively incorporating uncertainties to optimize resource allocation while robustly meeting SLA constraints under varying conditions. Extensive evaluations across four large-scale cloud datasets demonstrate HARMONY's state-of-the-art performance, significantly outperforming nine established methods. A month-long real-world deployment validated HARMONY's substantial practical impact, saving over 35,000 GPU hours and reducing costs by more than USD 100K, which demonstrates the economic value of adaptive, uncertainty-aware scaling. Our code is available at https://github.com/Floating-LY/HARMONY1.
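
The decision step can be illustrated with samples drawn from a possibly non-Gaussian predictive distribution of demand. The rule is to choose the cheapest capacity level whose empirical probability of covering demand meets an SLA target. This chance-constrained rule is a simplified stand-in assumed for illustration, and it assumes that cost increases with capacity. The abstract does not specify HARMONY's actual Bayesian objective.

```python
def choose_capacity(demand_samples, levels, sla=0.99):
    """Smallest capacity level that covers demand with empirical probability >= sla.

    demand_samples: draws from the predictive distribution of next-period demand
    (for example, samples from a normalizing-flow forecaster).
    levels: candidate capacities (e.g. GPU counts); cost is assumed to grow with capacity.
    Falls back to the largest level if none satisfies the SLA.
    """
    n = len(demand_samples)
    for level in sorted(levels):
        coverage = sum(d <= level for d in demand_samples) / n
        if coverage >= sla:
            return level
    return max(levels)
```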

Translated: 2024-08-05 14:36:49, Published: 2024-08-02

# A Logarithmic Depth Quantum Carry-Lookahead Modulo $(2^n-1)$ Adder

A Logarithmic Depth Quantum Carry-Lookahead Modulo $(2^n-1)$ Adder ( http://arxiv.org/abs/2408.01002v1 )

License: see link

Bhaskar Gaur, Edgard Muñoz-Coreas, Himanshu Thapliyal,

Quantum computing is making significant advances toward machines capable of implementing quantum algorithms in various fields, such as quantum cryptography, quantum image processing, and optimization. Developing quantum arithmetic circuits for modulo addition is vital for implementing these quantum algorithms. While it is ideal to use quantum circuits based on fault-tolerant gates to overcome noise and decoherence errors, current Noisy Intermediate-Scale Quantum (NISQ) era computers cannot handle the additional computational cost of fault-tolerant designs. Our research aims to minimize circuit depth, which can reduce noise and facilitate the implementation of quantum modulo addition circuits on NISQ machines. This work presents a quantum carry-lookahead modulo $(2^n - 1)$ adder (QCLMA), which receives two n-bit numbers and adds them with O(log n) depth. Compared with existing work of O(n) depth, the proposed QCLMA reduces depth and helps increase noise fidelity. To increase error resilience, we also use a tree-structured carry path, unlike the chain-based carry path of existing work. We run experiments on the IBM Cairo quantum computer to evaluate the performance of the proposed QCLMA against existing work, and we define the Quantum State Fidelity Ratio (QSFR) to quantify how close the correct output is to the top output. Compared with existing work, the proposed QCLMA achieves a 47.21% increase in QSFR for 4-qubit modulo addition, showcasing its superior noise fidelity.
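
The carry-lookahead structure can be checked classically. The sketch below computes modulo $(2^n - 1)$ addition with a Kogge-Stone parallel prefix over generate/propagate bits and an end-around carry. It uses ceil(log2 n) prefix rounds, which mirrors the O(log n) depth claim. It models only the Boolean logic and is not the QCLMA quantum circuit. The function name is hypothetical.

```python
def mod_mersenne_add(a, b, n):
    """Add a and b (each in [0, 2**n - 2]) modulo 2**n - 1.

    Uses a Kogge-Stone parallel prefix with an end-around carry. Returns
    (sum, prefix_rounds); the all-ones word (one's complement "negative zero")
    is normalized to 0.
    """
    mask = (1 << n) - 1
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]  # generate bits
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # propagate bits
    G, P = g[:], p[:]
    d, rounds = 1, 0
    while d < n:
        # Combine each prefix with the one d positions lower, using last round's values.
        G, P = (
            [G[i] | (P[i] & G[i - d]) if i >= d else G[i] for i in range(n)],
            [P[i] & P[i - d] if i >= d else P[i] for i in range(n)],
        )
        d *= 2
        rounds += 1
    cout = G[n - 1]  # carry out of the top bit is fed back as the carry in
    carries = [cout] + [G[i - 1] | (P[i - 1] & cout) for i in range(1, n)]
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return (0 if s == mask else s), rounds
```

Because the carry out of the top bit is fed back in a single pass, no second carry-propagation round is needed. When that carry is 1, the low n bits of a + b are at most $2^n - 4$, so adding 1 cannot overflow again.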

Translated: 2024-08-05 14:36:49, Published: 2024-08-02

# Piculet: Specialized Models-Guided Hallucination Decrease for MultiModal Large Language Models

Piculet: Specialized Models-Guided Hallucination Decrease for MultiModal Large Language Models ( http://arxiv.org/abs/2408.01003v1 )

License: see link

Kohou Wang, Xiang Liu, Zhaoxiang Liu, Kai Wang, Shiguo Lian,

Multimodal Large Language Models (MLLMs) have made significant progress in bridging the gap between visual and language modalities. However, hallucinations in MLLMs, where the generated text does not align with image content, continue to be a major challenge. Existing methods for addressing hallucinations often rely on instruction tuning, which requires retraining the model with specific data and further increases the cost of using MLLMs. In this paper, we introduce a novel training-free method, named Piculet, for enhancing the input representation of MLLMs. Piculet leverages multiple specialized models to extract descriptions of visual information from the input image and combines these descriptions with the original image and query as input to the MLLM. We evaluate our method both quantitatively and qualitatively, and the results demonstrate that Piculet greatly decreases hallucinations in MLLMs. Our method is universal and can easily be extended to different MLLMs.
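
The input-construction step can be pictured as assembling the specialist models' outputs into a textual preamble that accompanies the image and the user's question. The prompt wording, the helper name, and the example specialist names below are hypothetical. The abstract does not specify Piculet's prompt format.

```python
def build_augmented_query(query, descriptions):
    """Combine specialist-model descriptions with the user's query (text part only).

    descriptions: dict mapping a specialist model name to its textual description
    of the image (e.g. detected objects, OCR text). The image itself would be
    passed to the MLLM separately alongside this text.
    """
    lines = ["Reference information extracted from the image by specialized models:"]
    for name in sorted(descriptions):  # deterministic ordering of specialists
        lines.append(f"- {name}: {descriptions[name]}")
    lines.append("Use the image and the reference information above to answer.")
    lines.append(f"Question: {query}")
    return "\n".join(lines)
```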

Translated: 2024-08-05 14:36:49, Published: 2024-08-02

# Enhancing Financial Market Predictions: Causality-Driven Feature Selection

Enhancing Financial Market Predictions: Causality-Driven Feature Selection ( http://arxiv.org/abs/2408.01005v1 )

License: see link

Wenhao Liang, Zhengyang Li, Weitong Chen,

This paper introduces the FinSen dataset, which revolutionizes financial market analysis by integrating economic and financial news articles from 197 countries with stock market data. The dataset spans 15 years, from 2007 to 2023, with temporal information, offering a rich, global perspective through 160,000 records of financial market news. Our study leverages causally validated sentiment scores and LSTM models to enhance market forecast accuracy and reliability. Using the FinSen dataset, we introduce an innovative Focal Calibration Loss that reduces the Expected Calibration Error (ECE) to 3.34 percent with the DAN 3 model. This not only improves prediction accuracy but also aligns probabilistic forecasts closely with real outcomes, which is crucial in the financial sector, where predicted probabilities are paramount. Our approach demonstrates the effectiveness of combining sentiment analysis with precise calibration techniques for trustworthy financial forecasting when the cost of misinterpretation can be high. The FinSen data can be found at [this GitHub URL](https://github.com/EagleAdelaide/FinSen_Dataset.git).
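
Expected Calibration Error, the metric quoted above, is commonly computed with equal-width confidence bins. It is the bin-size-weighted average of |accuracy - mean confidence|. The sketch below implements that standard definition with 10 bins by default. It does not reproduce the paper's binning choices or its Focal Calibration Loss.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: sum over bins of (|bin| / N) * |accuracy - mean confidence|.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: 1 if the corresponding prediction was right, else 0.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the top bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if b:
            acc = sum(ok for _, ok in b) / len(b)
            avg_conf = sum(c for c, _ in b) / len(b)
            ece += len(b) / n * abs(acc - avg_conf)
    return ece
```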

Translated: 2024-08-05 14:26:49, Published: 2024-08-02

# Tensor Train Low-rank Approximation (TT-LoRA): Democratizing AI with Accelerated LLMs

Tensor Train Low-rank Approximation (TT-LoRA): Democratizing AI with Accelerated LLMs ( http://arxiv.org/abs/2408.01008v1 )

License: see link

Afia Anjum, Maksim E. Eren, Ismael Boureima, Boian Alexandrov, Manish Bhattarai,

In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of natural language processing (NLP) tasks, such as question answering, sentiment analysis, text summarization, and machine translation. However, the ever-growing complexity of LLMs demands immense computational resources, hindering broader research on and application of these models. To address this, various parameter-efficient fine-tuning strategies, such as Low-Rank Adaptation (LoRA) and Adapters, have been developed. Despite their potential, these methods often face limitations in compressibility. Specifically, LoRA struggles to scale effectively with the increasing number of trainable parameters in modern large-scale LLMs. Additionally, Low-Rank Economic Tensor-Train Adaptation (LoRETTA), which utilizes tensor train decomposition, has not yet achieved the level of compression necessary for fine-tuning very large models with limited resources. This paper introduces Tensor Train Low-Rank Approximation (TT-LoRA), a novel parameter-efficient fine-tuning (PEFT) approach that extends LoRETTA with optimized tensor train (TT) decomposition integration. By eliminating Adapters and traditional LoRA-based structures, TT-LoRA achieves greater model compression without compromising downstream task performance, along with reduced inference latency and computational overhead. We conduct an exhaustive parameter search to establish benchmarks that highlight the trade-off between model compression and performance. Our results demonstrate significant compression of LLMs while maintaining performance comparable to larger models, facilitating their deployment on resource-constrained platforms.
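
To see why tensor-train factorizations compress weights, the sketch below counts TT-core parameters and evaluates one tensor entry as a product of core slices. The reshaping of a weight update into tensor modes and the ranks used here are illustrative assumptions, not TT-LoRA's published configuration.

```python
def tt_param_count(mode_sizes, ranks):
    """Parameters in a tensor train whose k-th core has shape (r_{k-1}, n_k, r_k).

    ranks must have len(mode_sizes) + 1 entries with ranks[0] == ranks[-1] == 1.
    """
    assert len(ranks) == len(mode_sizes) + 1 and ranks[0] == ranks[-1] == 1
    return sum(ranks[k] * n * ranks[k + 1] for k, n in enumerate(mode_sizes))


def tt_entry(cores, index):
    """Evaluate a single tensor entry as a product of core slices.

    cores[k][i] is the r_{k-1} x r_k matrix (list of lists) for mode index i.
    """
    vec = [1.0]  # the 1 x r_0 row vector
    for core, i in zip(cores, index):
        mat = core[i]
        vec = [sum(vec[a] * mat[a][b] for a in range(len(vec))) for b in range(len(mat[0]))]
    return vec[0]
```

For example, `tt_param_count([8] * 8, [1] + [8] * 7 + [1])` gives 3,200 parameters for a tensor with 8^8 = 16,777,216 entries, which is the size of a dense 4096 x 4096 matrix.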

Translation date: 2024-08-05 14:26:49, publication date: 2024-08-02

# EIUP: A Training-Free Approach to Erase Non-Compliant Concepts Conditioned on Implicit Unsafe Prompts

EIUP: A Training-Free Approach to Erase Non-Compliant Concepts Conditioned on Implicit Unsafe Prompts ( http://arxiv.org/abs/2408.01014v1 )

License: see link

Die Chen, Zhiwen Li, Mingyuan Fan, Cen Chen, Wenmeng Zhou, Yaliang Li,

Text-to-image diffusion models have shown the ability to learn a diverse range of concepts. However, they may also generate undesirable outputs, giving rise to significant security concerns. Specifically, they may produce Not Safe for Work (NSFW) content or outputs that potentially violate style copyright. Since image generation is conditioned on text, prompt purification is a straightforward solution for content safety. Similar to the approaches taken for LLMs, some efforts have sought to ensure safe outputs by purifying prompts. However, even with these efforts, non-toxic text still carries a risk of generating non-compliant images; such prompts are referred to as implicit unsafe prompts. Furthermore, some existing works fine-tune the models to erase undesired concepts from model weights. This type of method requires multiple training iterations whenever the concept is updated, which can be time-consuming and may lead to catastrophic forgetting. To address these challenges, we propose a simple yet effective approach that incorporates non-compliant concepts into an erasure prompt. This erasure prompt proactively participates in the fusion of image spatial features and text embeddings. Through attention mechanisms, our method identifies feature representations of non-compliant concepts in the image space. We re-weight these features to suppress the generation of unsafe images conditioned on the original implicit unsafe prompts. Our method exhibits superior erasure effectiveness while achieving high image-fidelity scores compared with state-of-the-art baselines. WARNING: This paper contains model outputs that may be offensive.

Translation date: 2024-08-05 14:26:49, publication date: 2024-08-02

# IBB Traffic Graph Data: Benchmarking and Road Traffic Prediction Model

IBB Traffic Graph Data: Benchmarking and Road Traffic Prediction Model ( http://arxiv.org/abs/2408.01016v1 )

License: see link

Eren Olug, Kiymet Kaya, Resul Tugay, Sule Gunduz Oguducu,

Road traffic congestion prediction is a crucial component of intelligent transportation systems, since it enables proactive traffic management, enhances the suburban experience, reduces environmental impact, and improves overall safety and efficiency. Although several public datasets exist, especially for metropolitan areas, they may not be applicable to practical scenarios because of insufficient data scale (i.e., the number of sensors and road links) and external factors such as the characteristics of the target area (e.g., urban areas or highways) and the data collection location. To address this, this paper introduces the IBB Traffic graph dataset as an alternative benchmark that mitigates these limitations and enriches the literature with new geographical characteristics. The IBB Traffic graph dataset covers sensor data collected at 2451 distinct locations. Moreover, we propose a novel Road Traffic Prediction Model that strengthens temporal links through feature engineering, uses node embeddings from GLEE to represent inter-related relationships within the traffic network, and predicts traffic with ExtraTrees. The results indicate that the proposed model consistently outperforms the baseline models, with an average accuracy improvement of 4%.
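The abstract does not spell out the feature engineering. One hedged illustration of "strengthening temporal links" is to turn each sensor's series into lagged-value features, which is the kind of tabular input a tree ensemble such as ExtraTrees consumes. The lag choices, the readings, and the `make_lag_rows` helper below are hypothetical. GLEE node embeddings would be appended as extra feature columns; they are not modeled here.

```python
def make_lag_rows(series, lags):
    """Turn one sensor's time series into (features, target) rows for a tabular regressor.

    The target is the value at time t; the features are the values `lag` steps earlier.
    """
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        features = [series[t - lag] for lag in lags]
        rows.append((features, series[t]))
    return rows

speeds = [52.0, 48.0, 45.0, 47.0, 50.0]  # hypothetical readings from a single sensor
for features, target in make_lag_rows(speeds, lags=[1, 2]):
    print(features, target)
```

Each row could then be concatenated with the sensor's graph embedding before fitting the regressor. That combination is an assumption about the pipeline, not a detail confirmed by the abstract.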

Translation date: 2024-08-05 14:26:49, publication date: 2024-08-02

# GNN-MolKAN: Harnessing the Power of KAN to Advance Molecular Representation Learning with GNNs

GNN-MolKAN: Harnessing the Power of KAN to Advance Molecular Representation Learning with GNNs ( http://arxiv.org/abs/2408.01018v1 )

License: see link

Ruifeng Li,

Effective molecular representation learning is crucial for molecular property prediction and drug design. However, existing approaches are limited by insufficient annotations and suboptimal architecture design. For instance, Graph Neural Networks (GNNs) suffer from over-squashing, which causes the loss of important structural details in molecules and thus impairs molecular representations. In this work, we propose a new class of GNNs, GNN-MolKAN, and its augmented variant, GNN-MolKAN+, which integrate the Kolmogorov-Arnold Networks (KAN) architecture from AI + Science into GNNs to address these challenges. Additionally, we introduce Adaptive FastKAN (AdFastKAN), an advanced KAN that offers increased stability and speed, further enhancing the performance of standard GNNs. Notably, our approach offers three key benefits: 1) Superior Performance: GNN-MolKAN and GNN-MolKAN+ demonstrate superior prediction ability, robust generalization to unseen scaffolds, and versatile transferability across different GNN architectures. 2) Efficiency: These models require less computational time and fewer parameters while matching or surpassing state-of-the-art (SOTA) self-supervised methods. 3) Few-shot Learning Ability: GNN-MolKAN demonstrates great potential in few-shot learning scenarios, achieving an average improvement of 6.97% across few-shot benchmarks. Overall, we validate our architecture on 6 classification datasets, 6 regression datasets, and 4 few-shot learning datasets, consistently achieving highly competitive results across all of them.
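The core idea of a KAN layer is that each input-to-output edge carries its own learnable univariate function, rather than a single scalar weight. FastKAN-style layers parameterize that function with Gaussian radial basis functions. The sketch below shows only this RBF edge-function part. AdFastKAN's adaptive components and the way GNN-MolKAN wires such layers into message passing are not described in the abstract and are not modeled. The centers, width, and weights are arbitrary assumptions.

```python
import math

def rbf_basis(x, centers, width):
    """Gaussian radial basis values exp(-((x - c) / width)^2), one per center c."""
    return [math.exp(-((x - c) / width) ** 2) for c in centers]

def kan_layer(x, weights, centers, width):
    """One KAN-style layer: output j is a sum of learnable univariate functions of each input.

    weights[j][i][k] is the coefficient of basis function k on the edge from input i to output j.
    """
    basis = [rbf_basis(xi, centers, width) for xi in x]
    return [
        sum(w_ik * b_k for i, w_i in enumerate(w_j) for w_ik, b_k in zip(w_i, basis[i]))
        for w_j in weights
    ]

x = [0.2, -0.5]             # two input features (e.g. aggregated node values)
centers = [-1.0, 0.0, 1.0]  # fixed RBF grid
weights = [                 # 1 output, 2 inputs, 3 basis coefficients per edge
    [[0.1, 0.5, -0.2], [0.3, 0.0, 0.4]],
]
print(kan_layer(x, weights, centers, width=0.5))
```

In a GNN, a layer like this would plausibly replace the MLP in a node-update step. That placement is an assumption for illustration.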

Translation date: 2024-08-05 14:26:49, publication date: 2024-08-02

# A Family of Distributions of Random Subsets for Controlling Positive and Negative Dependence

A Family of Distributions of Random Subsets for Controlling Positive and Negative Dependence ( http://arxiv.org/abs/2408.01022v1 )

License: see link

Takahiro Kawashima, Hideitsu Hino,

Positive and negative dependence are fundamental concepts that characterize the attractive and repulsive behavior of random subsets. Although some probabilistic models are known to exhibit positive or negative dependence, it is challenging to bridge the two seamlessly within a single practicable probabilistic model. In this study, we introduce a new family of distributions, named the discrete kernel point process (DKPP), which includes determinantal point processes and parts of Boltzmann machines. We also develop computational methods for probabilistic operations and inference with DKPPs, such as calculating marginal and conditional probabilities and learning the parameters. Our numerical experiments demonstrate the controllability of positive and negative dependence and the effectiveness of the computational methods for DKPPs.
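The abstract does not give the DKPP's functional form, but it states that the family includes determinantal point processes (DPPs). The sketch below illustrates the negative-dependence end of that spectrum. It enumerates an L-ensemble DPP, $P(Y = S) = \det(L_S)/\det(L + I)$, on a two-item ground set, then checks that the two items appear together less often than independence would predict. The kernel values are arbitrary assumptions. Brute-force enumeration only suits tiny ground sets.

```python
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion along the first row (adequate for tiny matrices)."""
    n = len(m)
    if n == 0:
        return 1.0  # the empty matrix has determinant 1
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def dpp_probabilities(L):
    """P(Y = S) = det(L_S) / det(L + I) for every subset S of {0, ..., n-1}."""
    n = len(L)
    norm = det([[L[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)])
    probs = {}
    for size in range(n + 1):
        for S in combinations(range(n), size):
            L_S = [[L[i][j] for j in S] for i in S]
            probs[S] = det(L_S) / norm
    return probs

L = [[1.0, 0.5], [0.5, 1.0]]      # assumed similarity kernel
probs = dpp_probabilities(L)
p0 = probs[(0,)] + probs[(0, 1)]  # marginal P(item 0 in Y)
p1 = probs[(1,)] + probs[(0, 1)]  # marginal P(item 1 in Y)
print(probs[(0, 1)], p0 * p1)     # joint is below the product: repulsion
```

A positive off-diagonal entry in $L$ makes items 0 and 1 "similar," and the DPP then discourages selecting both. According to the abstract, DKPPs also cover the opposite, attractive regime.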

Translation date: 2024-08-05 14:26:49, publication date: 2024-08-02

# Distilling interpretable causal trees from causal forests

Distilling interpretable causal trees from causal forests ( http://arxiv.org/abs/2408.01023v1 )

License: see link

Patrick Rehill,

Machine learning methods for estimating treatment effect heterogeneity promise greater flexibility than existing methods that test a few pre-specified hypotheses. However, it can be challenging to extract insights from such complicated machine learning models. A high-dimensional distribution of conditional average treatment effects may give accurate, individual-level estimates, but it can be hard to understand the underlying patterns and to know what the implications of the analysis are. This paper proposes the Distilled Causal Tree, a method for distilling a single, interpretable causal tree from a causal forest. It compares well with existing methods for extracting a single tree, particularly in noisy or high-dimensional data with many correlated features. In most simulations, it even outperforms the underlying causal forest. Its estimates are doubly robust and asymptotically normal, just as those of the causal forest are.
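The following is a minimal sketch of the general distillation idea. It assumes a shallow regression tree is fit to the causal forest's per-unit effect predictions. Here the tree is a single split, found by minimizing squared error. The actual Distilled Causal Tree procedure, including how it attains double robustness, may differ. `fit_stump` and the toy data are hypothetical.

```python
def fit_stump(X, tau_hat):
    """Fit a depth-1 regression tree (one split) to per-unit treatment-effect estimates.

    X: list of feature vectors; tau_hat: CATE predictions, e.g. from a fitted causal forest.
    Returns (feature index, threshold, left-leaf mean, right-leaf mean).
    """
    best = None
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, tau_hat) if row[f] <= threshold]
            right = [t for row, t in zip(X, tau_hat) if row[f] > threshold]
            if not left or not right:
                continue  # a split must leave units on both sides
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, ml, mr)
    return best[1:]

# Hypothetical forest output: effects are low when feature 0 is small, high otherwise.
X = [[0.1, 5.0], [0.2, 3.0], [0.8, 4.0], [0.9, 1.0]]
tau_hat = [1.0, 1.2, 3.0, 3.2]
print(fit_stump(X, tau_hat))
```

The single split gives an interpretable summary ("effects are about 1.1 below the threshold and 3.1 above it"). Deeper trees would recurse on each side.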

Translation date: 2024-08-05 14:26:49, publication date: 2024-08-02

# Semantic Skill Grounding for Embodied Instruction-Following in Cross-Domain Environments

Semantic Skill Grounding for Embodied Instruction-Following in Cross-Domain Environments ( http://arxiv.org/abs/2408.01024v1 )

License: see link

Sangwoo Shin, Seunghyun Kim, Youngsoo Jang, Moontae Lee, Honguk Woo,

In embodied instruction-following (EIF), the integration of pretrained language models (LMs) as task planners emerges as a significant branch, where tasks are planned at the skill level by prompting LMs with pretrained skills and user instructions. However, grounding these pretrained skills in different domains remains challenging due to their intricate entanglement with the domain-specific knowledge. To address this challenge, we present a semantic skill grounding (SemGro) framework that leverages the hierarchical nature of semantic skills. SemGro recognizes the broad spectrum of these skills, ranging from short-horizon low-semantic skills that are universally applicable across domains to long-horizon rich-semantic skills that are highly specialized and tailored for particular domains. The framework employs an iterative skill decomposition approach, starting from the higher levels of semantic skill hierarchy and then moving downwards, so as to ground each planned skill to an executable level within the target domain. To do so, we use the reasoning capabilities of LMs for composing and decomposing semantic skills, as well as their multi-modal extension for assessing the skill feasibility in the target domain. Our experiments in the VirtualHome benchmark show the efficacy of SemGro in 300 cross-domain EIF scenarios.
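The top-down, iterative decomposition described above can be sketched as a recursion. A skill is kept once it is executable in the target domain; otherwise it is decomposed further. The sketch makes two loud assumptions. First, a lookup table stands in for the language model that composes and decomposes skills. Second, set membership stands in for SemGro's multimodal feasibility check. `ground_skill` and the example skills are hypothetical.

```python
def ground_skill(skill, decompose, is_executable, max_depth=5):
    """Decompose a skill top-down until every leaf is executable in the target domain."""
    if is_executable(skill):
        return [skill]
    subskills = decompose(skill)
    if max_depth == 0 or not subskills:
        raise ValueError(f"cannot ground skill: {skill!r}")
    plan = []
    for sub in subskills:
        plan.extend(ground_skill(sub, decompose, is_executable, max_depth - 1))
    return plan

# Hypothetical stand-ins: a table plays the LM planner, a set plays the feasibility check.
DECOMPOSITIONS = {
    "make coffee": ["go to kitchen", "brew coffee"],
    "brew coffee": ["grab mug", "press brew button"],
}
EXECUTABLE = {"go to kitchen", "grab mug", "press brew button"}

plan = ground_skill("make coffee", lambda s: DECOMPOSITIONS.get(s, []), EXECUTABLE.__contains__)
print(plan)  # ['go to kitchen', 'grab mug', 'press brew button']
```

Raising an error when no decomposition exists, instead of silently dropping the skill, keeps infeasible plans visible to the caller.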

Translation date: 2024-08-05 14:26:49, publication date: 2024-08-02

# CALA-$n$: A Quantum Library for Realizing Cost-Effective 2-, 3-, 4-, and 5-bit Gates on IBM Quantum Computers using Bloch Sphere Approach, Clifford+T Gates, and Layouts

CALA-$n$: A Quantum Library for Realizing Cost-Effective 2-, 3-, 4-, and 5-bit Gates on IBM Quantum Computers using Bloch Sphere Approach, Clifford+T Gates, and Layouts ( http://arxiv.org/abs/2408.01025v1 )

License: see link

Ali Al-Bayaty, Xiaoyu Song, Marek Perkowski,

We introduce a new quantum layout-aware approach to realize cost-effective $n$-bit gates using the Bloch sphere, for $2 \le n \le 5$ qubits. These $n$-bit gates are constructed entirely from Clifford+T gates by selecting sequences of rotations visualized on the Bloch sphere. This Bloch sphere approach ensures that the gates match the quantum layout when they are synthesized (transpiled) onto an IBM quantum computer. Various standard $n$-bit gates (Toffoli, Fredkin, etc.) and their operational equivalents among our proposed $n$-bit gates are examined and evaluated in terms of their final quantum costs, measured as the final counts of generated IBM native gates. In this paper, we demonstrate that all our $n$-bit gates have lower quantum costs than the standard $n$-bit gates after transpilation. Hence, our Bloch sphere approach can be used to build a quantum library of various cost-effective $n$-bit gates for different layouts of IBM quantum computers.
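The Bloch-sphere view can be illustrated on a single qubit. Clifford+T gates act as rotations, so a gate sequence traces a path on the sphere. The sketch applies H and then T to $|0\rangle$ and reads off the Bloch vector. This is only the single-qubit picture. The paper's multi-qubit constructions and IBM layout handling are not reproduced, and the helper names are hypothetical.

```python
import cmath
import math

# Single-qubit gates from the Clifford+T set as 2x2 complex matrices.
H = [[1 / math.sqrt(2), 1 / math.sqrt(2)], [1 / math.sqrt(2), -1 / math.sqrt(2)]]
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]

def apply(gate, state):
    """Multiply a 2x2 gate into a state vector (alpha, beta)."""
    return (gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1])

def bloch(state):
    """Bloch vector (x, y, z) of a normalized single-qubit state."""
    a, b = state
    ab = a.conjugate() * b
    return (2 * ab.real, 2 * ab.imag, abs(a) ** 2 - abs(b) ** 2)

state = (1 + 0j, 0j)  # |0>, the north pole of the Bloch sphere
for gate in (H, T):   # H moves the state to +x; T then rotates it by pi/4 about z
    state = apply(gate, state)
print(bloch(state))   # approximately (0.7071, 0.7071, 0.0)
```

Up to a global phase, T is $R_z(\pi/4)$, so two T gates give the Clifford S gate. Exact relations like this are what allow a sequence of rotations to be chosen and checked on the sphere.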

Translation date: 2024-08-05 14:26:49, publication date: 2024-08-02

# PINNs for Medical Image Analysis: A Survey

PINNs for Medical Image Analysis: A Survey ( http://arxiv.org/abs/2408.01026v1 )

License: see link

Chayan Banerjee, Kien Nguyen, Olivier Salvado, Truyen Tran, Clinton Fookes,

The incorporation of physical information in machine learning frameworks is transforming medical image analysis (MIA). By integrating fundamental knowledge and governing physical laws, these models achieve enhanced robustness and interpretability. In this work, we explore the utility of physics-informed approaches for MIA (PIMIA) tasks such as registration, generation, classification, and reconstruction. We present a systematic literature review of over 80 papers on physics-informed methods dedicated to MIA. We propose a unified taxonomy to investigate what physics knowledge and processes are modelled, how they are represented, and the strategies used to incorporate them into MIA models. We examine a wide range of image analysis tasks, including imaging, generation, prediction, inverse imaging (super-resolution and reconstruction), registration, and image analysis (segmentation and classification). For each task, we examine in detail, and present in tabular form, the central physics-guided operation, the region of interest (with respect to human anatomy), the corresponding imaging modality, the dataset used for model training, the deep network architecture employed, and the primary physical process, equation, or principle utilized. We also introduce a novel metric to compare the performance of PIMIA methods across different tasks and datasets. Based on this review, we summarize and distil our perspectives on the challenges, open research questions, and directions for future research. We highlight key open challenges in PIMIA, including selecting suitable physics priors and establishing a standardized benchmarking platform.
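The common thread of the surveyed methods is a loss that combines fitting the data with satisfying a governing equation. The survey does not prescribe one specific loss. The sketch below uses an assumed toy law, exponential decay $du/dt + k u = 0$. It estimates the derivative with central differences instead of the automatic differentiation a real PINN would use, so that the example stays dependency-free.

```python
import math

def physics_informed_loss(u, data, collocation, k, lam=1.0, h=1e-4):
    """Data-fit MSE plus lam times the mean squared residual of du/dt + k*u = 0."""
    data_loss = sum((u(t) - y) ** 2 for t, y in data) / len(data)
    residuals = [(u(t + h) - u(t - h)) / (2 * h) + k * u(t) for t in collocation]
    physics_loss = sum(r ** 2 for r in residuals) / len(residuals)
    return data_loss + lam * physics_loss

k = 0.5
data = [(t, math.exp(-k * t)) for t in (0.0, 1.0, 2.0)]  # observations of the true decay
collocation = [0.25 * i for i in range(1, 12)]          # points where the law is enforced

def exact(t):
    return math.exp(-k * t)

def linear(t):
    return 1.0 - k * t  # matches u(0) and du/dt(0) but violates the ODE elsewhere

print(physics_informed_loss(exact, data, collocation, k))   # close to zero
print(physics_informed_loss(linear, data, collocation, k))  # clearly larger
```

The weight `lam` trades off data fidelity against physical consistency. Choosing it, and choosing the physics prior itself, are among the open issues the survey highlights.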

Translation date: 2024-08-05 14:26:49, publication date: 2024-08-02

# POA: Pre-training Once for Models of All Sizes

POA: Pre-training Once for Models of All Sizes ( http://arxiv.org/abs/2408.01031v1 )

License: see link

Yingying Zhang, Xin Guo, Jiangwei Lao, Lei Yu, Lixiang Ru, Jian Wang, Guo Ye, Huimei He, Jingdong Chen, Ming Yang,

Large-scale self-supervised pre-training has paved the way for one foundation model to handle many different vision tasks. Most pre-training methodologies train a single model of a certain size at one time. Nevertheless, various computation or storage constraints in real-world scenarios require substantial effort to develop a series of models of different sizes for deployment. Thus, in this study, we propose a novel tri-branch self-supervised training framework, termed POA (Pre-training Once for All), to tackle this issue. Our approach introduces an innovative elastic student branch into a modern self-distillation paradigm. At each pre-training step, we randomly sample a sub-network from the original student to form the elastic student and train all branches in a self-distilling fashion. Once pre-trained, POA allows the extraction of pre-trained models of diverse sizes for downstream tasks. Remarkably, the elastic student facilitates the simultaneous pre-training of multiple models of different sizes, and it also acts as an additional ensemble of models of various sizes that enhances representation learning. Extensive experiments, including k-nearest neighbors, linear probing evaluation, and assessments on multiple downstream tasks, demonstrate the effectiveness and advantages of our POA. It achieves state-of-the-art performance using ViT, Swin Transformer, and ResNet backbones, producing around a hundred models of different sizes through a single pre-training session. The code is available at: https://github.com/Qichuzyy/POA.
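Sampling an elastic student from the full student can be sketched as two steps: pick a size, then slice shared weights. The sketch keeps the leading rows and columns of a weight matrix. This is a common weight-sharing convention in supernetwork training, but it is an assumption here, since the abstract does not say how POA slices its sub-networks. The width/depth menus and the helper names are hypothetical.

```python
import random

def sample_elastic_config(rng, widths, depths):
    """Choose the width and depth of the elastic student for one pre-training step."""
    return rng.choice(widths), rng.choice(depths)

def slice_linear(weight, out_dim, in_dim):
    """Share weights by keeping the leading out_dim x in_dim block of the full matrix."""
    return [row[:in_dim] for row in weight[:out_dim]]

rng = random.Random(0)
full = [[float(4 * r + c) for c in range(4)] for r in range(4)]  # toy 4x4 full-size layer
width, depth = sample_elastic_config(rng, widths=[2, 3, 4], depths=[6, 9, 12])
sub = slice_linear(full, width, width)
print(width, depth, sub)
```

Because the sub-network reuses a slice of the same parameters, gradients from the elastic branch update the shared weights. This is what would let one pre-training run yield models of many sizes.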

Translation date: 2024-08-05 14:26:49, publication date: 2024-08-02

# Structure from Motion-based Motion Estimation and 3D Reconstruction of Unknown Shaped Space Debris

Structure from Motion-based Motion Estimation and 3D Reconstruction of Unknown Shaped Space Debris ( http://arxiv.org/abs/2408.01035v1 )

License: see link

Kentaro Uno, Takehiro Matsuoka, Akiyoshi Uchida, Kazuya Yoshida,

With the growing number of spacecraft launches in recent decades, the space debris problem is becoming increasingly critical. For sustainable space utilization, the continuous removal of space debris is one of the most serious problems facing humanity. To maximize the reliability of in-orbit debris capture missions, accurate motion estimation of the target is essential. Space debris has lost its attitude and orbit control capabilities, and its shape is unknown because of breakup. This paper proposes a Structure from Motion-based algorithm that estimates the motion of unknown-shaped space debris with limited resources, requiring only 2D images as input. The method simultaneously outputs the reconstructed shape of the unknown object and the relative pose trajectory between the target and the camera, which are then used to estimate the target's motion. The method is quantitatively validated with a realistic image dataset generated by a microgravity experiment on a 2D air-floating testbed and by 3D kinematic simulation.
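Turning a relative pose trajectory into a motion estimate can be illustrated under a strong simplifying assumption: the target tumbles about a single fixed axis at a constant rate. Under that assumption, the rotation angle grows linearly in time, and a least-squares slope gives the angular velocity. Angles recovered per frame are typically wrapped to $[-\pi, \pi)$, so they are unwrapped first. Real debris motion (precession, nutation) needs a full 3D rotational model. The helpers and the 0.8 rad/s example are hypothetical.

```python
import math

def unwrap(angles):
    """Remove 2*pi jumps so a steadily spinning target yields a monotone angle sequence."""
    out = [angles[0]]
    for a in angles[1:]:
        d = a - out[-1]
        d = (d + math.pi) % (2 * math.pi) - math.pi  # wrap the step into [-pi, pi)
        out.append(out[-1] + d)
    return out

def estimate_spin_rate(times, angles):
    """Least-squares slope of rotation angle vs. time, i.e. a constant angular velocity."""
    n = len(times)
    t_mean = sum(times) / n
    a_mean = sum(angles) / n
    cov = sum((t - t_mean) * (a - a_mean) for t, a in zip(times, angles))
    var = sum((t - t_mean) ** 2 for t in times)
    return cov / var

# Hypothetical target spinning at 0.8 rad/s, observed as angles wrapped to [-pi, pi).
times = [0.5 * i for i in range(12)]
wrapped = [((0.8 * t + math.pi) % (2 * math.pi)) - math.pi for t in times]
print(estimate_spin_rate(times, unwrap(wrapped)))  # recovers about 0.8
```

Unwrapping assumes that consecutive frames differ by less than half a turn. If the target spins faster relative to the frame rate, the recovered rate aliases.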

Translation date: 2024-08-05 14:26:49, publication date: 2024-08-02

# Analysis of Parameterized Quantum Circuits: on The Connection Between Expressibility and Types of Quantum Gates

Analysis of Parameterized Quantum Circuits: on The Connection Between Expressibility and Types of Quantum Gates ( http://arxiv.org/abs/2408.01036v1 )

License: see link

Yu Liu, Kentaro Baba, Kazuya Kaneko, Naoyuki Takeda, Junpei Koyama, Koyiti Kimura,

Expressibility is a crucial property of a Parameterized Quantum Circuit (PQC). In the context of Quantum Machine Learning (QML) based on Variational Quantum Algorithms (VQA), a QML model composed of a highly expressible PQC and a sufficient number of qubits is theoretically capable of approximating any continuous function. While much research has explored how expressibility relates to learning performance and to the number of layers in PQCs, the connection between expressibility and PQC structure has received comparatively little attention. In this paper, we analyze the connection between expressibility and the types of quantum gates within PQCs using a Gradient Boosting Tree model and SHapley Additive exPlanations (SHAP) values. Our analysis is performed on 1,615 PQC instances derived from 19 PQC topologies, each with 2-18 qubits and 1-5 layers. The findings provide guidance for designing highly expressible PQCs, suggesting the integration of more RX or RY gates while maintaining a careful balance with the number of CNOT gates. Furthermore, our evaluation offers additional evidence of expressibility saturation, as observed in previous studies.
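The abstract does not restate which expressibility measure is used. A widely used definition compares two fidelity distributions using the Kullback-Leibler divergence. The first is the distribution of fidelities between pairs of states sampled from the circuit. The second is the Haar-random fidelity distribution, with density $(N-1)(1-F)^{N-2}$ for Hilbert-space dimension $N$. The sketch below assumes that definition and uses two toy single-qubit circuits: RY alone, and RY followed by RZ. The bin count, sample size, and helper names are assumptions.

```python
import cmath
import math
import random

def ry_state(theta):
    """RY(theta)|0> = (cos(theta/2), sin(theta/2))."""
    return (complex(math.cos(theta / 2)), complex(math.sin(theta / 2)))

def ry_rz_state(theta, phi):
    """RZ(phi) RY(theta)|0>, using RZ(phi) = diag(exp(-i*phi/2), exp(i*phi/2))."""
    a, b = ry_state(theta)
    return (cmath.exp(-1j * phi / 2) * a, cmath.exp(1j * phi / 2) * b)

def sample_ry(rng):
    return ry_state(rng.uniform(0, 2 * math.pi))

def sample_ry_rz(rng):
    return ry_rz_state(rng.uniform(0, 2 * math.pi), rng.uniform(0, 2 * math.pi))

def fidelity(s1, s2):
    """|<s1|s2>|^2 for two state vectors of equal length."""
    return abs(sum(x.conjugate() * y for x, y in zip(s1, s2))) ** 2

def expressibility(sample_state, n_qubits=1, n_pairs=20000, bins=20, seed=0):
    """Histogram estimate of KL(P_circuit(F) || P_Haar(F)); lower means more expressible."""
    rng = random.Random(seed)
    counts = [0] * bins
    for _ in range(n_pairs):
        f = fidelity(sample_state(rng), sample_state(rng))
        counts[min(int(f * bins), bins - 1)] += 1
    dim = 2 ** n_qubits
    kl = 0.0
    for i, c in enumerate(counts):
        if c == 0:
            continue  # empty bins contribute nothing to KL(P || Q)
        lo, hi = i / bins, (i + 1) / bins
        # Integrate the Haar density (dim - 1) * (1 - F)**(dim - 2) over this bin.
        q = (1 - lo) ** (dim - 1) - (1 - hi) ** (dim - 1)
        p = c / n_pairs
        kl += p * math.log(p / q)
    return kl

print(expressibility(sample_ry))     # RY only: states confined to one great circle
print(expressibility(sample_ry_rz))  # adding RZ spreads states over the Bloch sphere
```

With this definition, the RY-only circuit scores noticeably worse than RY followed by RZ. This illustrates how gate types, not just layer counts, shape expressibility.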

Translation date: 2024-08-05 14:26:49, publication date: 2024-08-02

# MambaST: A Plug-and-Play Cross-Spectral Spatial-Temporal Fuser for Efficient Pedestrian Detection

MambaST: A Plug-and-Play Cross-Spectral Spatial-Temporal Fuser for Efficient Pedestrian Detection ( http://arxiv.org/abs/2408.01037v1 )

License: see link

Xiangbo Gao, Asiegbu Miracle Kanu-Asiegbu, Xiaoxiao Du,

This paper proposes MambaST, a plug-and-play cross-spectral spatial-temporal fusion pipeline for efficient pedestrian detection. Several challenges exist for pedestrian detection in autonomous driving applications. First, it is difficult to perform accurate detection using RGB cameras under dark or low-light conditions. Cross-spectral systems must be developed to integrate complementary information from multiple sensor modalities, such as thermal and visible cameras, to improve the robustness of the detections. Second, pedestrian detection models are latency-sensitive. Efficient and easy-to-scale detection models with fewer parameters are highly desirable for real-time applications such as autonomous driving. Third, pedestrian video data provides spatial-temporal correlations of pedestrian movement. It is beneficial to incorporate temporal as well as spatial information to enhance pedestrian detection. This work leverages recent advances in the state space model (Mamba) and proposes a novel Multi-head Hierarchical Patching and Aggregation (MHHPA) structure to extract both fine-grained and coarse-grained information from both RGB and thermal imagery. Experimental results show that the proposed MHHPA is an effective and efficient alternative to a Transformer model for cross-spectral pedestrian detection. Our proposed model also achieves superior performance on small-scale pedestrian detection. The code is available at https://github.com/XiangboGaoBarry/MambaST.
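The abstract does not detail the MHHPA structure. The sketch below shows only the recurrence that state space models such as Mamba are built on. A hidden state is updated once per sequence element, so cost grows linearly with sequence length. In Mamba's "selective" variant, the recurrence parameters are recomputed from each input. Real Mamba layers use vector states, a learned discretization step, and a parallel scan. None of that is modeled in this scalar sketch, and the function names are hypothetical.

```python
def ssm_scan(xs, a, b, c):
    """Discrete linear state space model: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t (scalar state)."""
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys

def selective_scan(xs, a_fn, b_fn, c_fn):
    """Input-dependent ("selective") variant: the parameters are recomputed from each x_t."""
    h = 0.0
    ys = []
    for x in xs:
        h = a_fn(x) * h + b_fn(x) * x
        ys.append(c_fn(x) * h)
    return ys

print(ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=1.0))  # an impulse decays geometrically
# Hypothetical gating: retain more memory (larger a) when the input is small.
print(selective_scan([1.0, 0.0, 0.0], lambda x: 0.9 if x == 0.0 else 0.5,
                     lambda x: 1.0, lambda x: 1.0))
```

Because each step touches only the previous state, the scan runs in linear time, unlike the quadratic pairwise attention of a Transformer. This is the efficiency argument behind using an SSM as a Transformer alternative.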

Translation date: 2024-08-05 14:26:49, publication date: 2024-08-02

# UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents

UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents ( http://arxiv.org/abs/2408.01038v1 )

License: see link

Yi Tu, Chong Zhang, Ya Guo, Huan Chen, Jinyang Tang, Huijia Zhu, Qi Zhang,

The recognition of named entities in visually-rich documents (VrD-NER) plays a critical role in various real-world scenarios and applications. However, research on VrD-NER faces three major challenges: complex document layouts, incorrect reading orders, and unsuitable task formulations. To address these challenges, we propose a query-aware entity extraction head, UNER, that works with existing multi-modal document transformers to develop more robust VrD-NER models. The UNER head treats the VrD-NER task as a combination of sequence labeling and reading order prediction, effectively addressing the issue of discontinuous entities in documents. Experimental evaluations on diverse datasets demonstrate the effectiveness of UNER in improving entity extraction performance. Moreover, the UNER head enables a supervised pre-training stage on various VrD-NER datasets that enhances the document transformer backbones, and it exhibits substantial knowledge transfer from the pre-training stage to the fine-tuning stage. By incorporating universal layout understanding, a pre-trained UNER-based model demonstrates significant advantages in few-shot and cross-lingual scenarios and exhibits zero-shot entity extraction abilities.
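Combining sequence labeling with reading-order prediction can recover entities that are split in the OCR token order. The idea is to label entity-start tokens, then follow predicted "next token" links from each start. The sketch below is a hypothetical decoder under that assumption. UNER's actual query-aware head and its output format are not described in the abstract, and `decode_entities` and the toy document are illustrative only.

```python
def decode_entities(tokens, start_labels, next_index):
    """Decode entities by following predicted reading-order links from each start token.

    start_labels[i]: entity type if token i begins an entity, else None.
    next_index[i]: index of the token that continues the same entity, or -1 at its end.
    """
    entities = []
    for i, label in enumerate(start_labels):
        if label is None:
            continue
        span, j, seen = [], i, set()
        while j != -1 and j not in seen:  # `seen` guards against predicted cycles
            seen.add(j)
            span.append(tokens[j])
            j = next_index[j]
        entities.append((label, " ".join(span)))
    return entities

# OCR order interleaves two layout regions, so "John" and "Smith" are not adjacent.
tokens = ["Name:", "John", "Total:", "Smith", "$5"]
start_labels = [None, "PER", None, None, "AMT"]
next_index = [-1, 3, -1, -1, -1]
print(decode_entities(tokens, start_labels, next_index))  # [('PER', 'John Smith'), ('AMT', '$5')]
```

A plain BIO tagger would have to treat "John" and "Smith" as separate entities here. The reading-order links are what reconnect them.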

Translation date: 2024-08-05 14:26:49, publication date: 2024-08-02

# Solving Caldeira-Leggett Model by Inchworm Method with Frozen Gaussian Approximation

Solving Caldeira-Leggett Model by Inchworm Method with Frozen Gaussian Approximation ( http://arxiv.org/abs/2408.01039v1 )

License: see link

Geshuo Wang, Siyao Yang, Zhenning Cai,

We propose an algorithm that combines the inchworm method and the frozen Gaussian approximation to simulate the Caldeira-Leggett model in which a quantum particle is coupled with thermal harmonic baths. In particular, we are interested in the real-time dynamics of the reduced density operator. In our algorithm, we use frozen Gaussian approximation to approximate the wave function as a wave packet in integral form. The desired reduced density operator is then written as a Dyson series, which is the series expression of path integrals in quantum mechanics of interacting systems. To compute the Dyson series, we further approximate each term in the series using Gaussian wave packets, and then employ the idea of the inchworm method to accelerate the convergence of the series. The inchworm method formulates the series as an integro-differential equation of "full propagators", and rewrites the infinite series on the right-hand side using these full propagators, so that the number of terms in the sum can be significantly reduced, and faster convergence can be achieved. The performance of our algorithm is verified numerically by various experiments.
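For reference, the generic textbook form of the Dyson series for the interaction-picture propagator with Hamiltonian $H_I(s)$ (taking $\hbar = 1$) is given below. It is not the paper's specific series for the reduced density operator, which additionally involves bath correlation functions.

$$
U_I(t) = \mathcal{T}\exp\left(-i\int_0^t H_I(s)\,\mathrm{d}s\right) = \sum_{n=0}^{\infty} (-i)^n \int_0^t \mathrm{d}s_n \int_0^{s_n} \mathrm{d}s_{n-1} \cdots \int_0^{s_2} \mathrm{d}s_1 \, H_I(s_n) H_I(s_{n-1}) \cdots H_I(s_1)
$$

The time-ordering operator $\mathcal{T}$ places later times to the left, which is why the nested limits enforce $0 \le s_1 \le \cdots \le s_n \le t$. The $n = 0$ term is the identity. The inchworm method reorganizes sums of this kind so that fewer explicit terms are needed for convergence.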

Translation date: 2024-08-05 14:17:04, publication date: 2024-08-02

# Privacy-Preserving Split Learning with Vision Transformers using Patch-Wise Random and Noisy CutMix

Privacy-Preserving Split Learning with Vision Transformers using Patch-Wise Random and Noisy CutMix ( http://arxiv.org/abs/2408.01040v1 )

License: see link

Seungeun Oh, Sihun Baek, Jihong Park, Hyelin Nam, Praneeth Vepakomma, Ramesh Raskar, Mehdi Bennis, Seong-Lyun Kim,

In computer vision, the vision transformer (ViT) has increasingly superseded the convolutional neural network (CNN) to achieve improved accuracy and robustness. However, ViT's large model size and high sample complexity make it difficult to train on resource-constrained edge devices. Split learning (SL) emerges as a viable solution, leveraging server-side resources to train ViTs while utilizing private data from distributed devices. However, SL requires additional information exchange between the device and the server for weight updates, which can be exposed to various attacks on private training data. To mitigate the risk of data breaches in classification tasks, and inspired by CutMix regularization, we propose a novel privacy-preserving SL framework, coined DP-CutMixSL, that injects Gaussian noise into smashed data and mixes randomly chosen patches of smashed data across clients. Our analysis demonstrates that DP-CutMixSL is a differentially private (DP) mechanism that strengthens privacy protection against membership inference attacks during forward propagation. Through simulations, we show that DP-CutMixSL improves privacy protection against membership inference attacks, reconstruction attacks, and label inference attacks, while also improving accuracy compared with DP-SL and DP-MixSL.
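The two ingredients named in the abstract are Gaussian noise on smashed data and patch-wise mixing across clients. The sketch below combines them for two clients. A fixed fraction of patch positions is chosen at random and filled from client B, and the rest come from client A. Every value then receives independent Gaussian noise. The function also returns the fraction from B, which CutMix-style training uses to mix the labels. The mixing ratio, the noise scale `sigma`, and the helper name are assumptions. Calibrating `sigma` to a formal DP guarantee is not shown, and this is not the authors' implementation.

```python
import random

def dp_cutmix(smashed_a, smashed_b, mix_ratio, sigma, rng):
    """Mix patches from two clients' smashed data and add Gaussian noise to every value.

    smashed_a, smashed_b: lists of patch vectors with matching shapes.
    Returns the mixed patches and the fraction taken from client B (for label mixing).
    """
    n = len(smashed_a)
    n_from_b = round(mix_ratio * n)
    from_b = set(rng.sample(range(n), n_from_b))  # randomly chosen patch positions
    mixed = []
    for i, (patch_a, patch_b) in enumerate(zip(smashed_a, smashed_b)):
        source = patch_b if i in from_b else patch_a
        mixed.append([v + rng.gauss(0.0, sigma) for v in source])
    return mixed, n_from_b / n

rng = random.Random(0)
client_a = [[0.0, 0.0] for _ in range(4)]  # toy smashed data: 4 patches, 2 dims each
client_b = [[1.0, 1.0] for _ in range(4)]
mixed, lam_b = dp_cutmix(client_a, client_b, mix_ratio=0.5, sigma=0.1, rng=rng)
print(lam_b)  # mixed label = (1 - lam_b) * y_a + lam_b * y_b
```

Because the noise is independent per value, adding it before or after mixing yields the same distribution, so the order of the two steps does not matter in this sketch.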

Translated: 2024-08-05 14:17:04 Published: 2024-08-02

# Encoded-Fusion-Based Quantum Computation for High Thresholds with Linear Optics

Encoded-Fusion-Based Quantum Computation for High Thresholds with Linear Optics ( http://arxiv.org/abs/2408.01041v1 )

License: see link

Wooyeong Song, Nuri Kang, Yong-Su Kim, Seung-Woo Lee,

We propose a measurement-based fault-tolerant quantum computation scheme that uses finite-sized entangled resource states and an encoded-fusion scheme with linear optics. Encoded fusion is an entangled measurement, based on a quantum error-correcting code, devised to enhance the fusion success probability in the presence of losses and errors. We apply an encoded-fusion scheme, which can be performed with linear optics and active feedforward to implement the generalized Shor code, to construct a fault-tolerant network configuration in a three-dimensional Raussendorf-Harrington-Goyal lattice based on the surface code. Numerical simulations show that our scheme achieves loss thresholds up to 10 times higher than nonencoded fusion approaches, with limited numbers of photons used in fusion. Our scheme paves an efficient route toward fault-tolerant quantum computing with finite-sized entangled resource states and linear optics.

Translated: 2024-08-05 14:17:04 Published: 2024-08-02

# Locality and entanglement harvesting in covariantly bandlimited scalar fields

Locality and entanglement harvesting in covariantly bandlimited scalar fields ( http://arxiv.org/abs/2408.01043v1 )

License: see link

Nicholas Funai, Nicolas C. Menicucci,

(参考訳) 滑らかな多様体上の量子論における高エネルギーの考察は、一般化された不確実性原理と量子重力シナリオにおける物理的最小長の可能性をもたらした。これらのモデルでは、最小長は単なる数学的ツールではなく物理的な極限であり、ローレンツ不変である。本稿では,同変帯域(最小長)を受けるフィールドにおける2量子通信と絡み合いの収穫について検討し,この帯域幅によって引き起こされる変化について述べる。バンドリミットは、非共変バンドリミットや他の量子光学近似とは異なり、非局所性や因果通信を導入している。また、この共変バンドリミットは、共変カットオフによって修正される仮想粒子の挙動に起因する異常な挙動と時間的・時間的整合性を導入することも観察した。

Considerations of high energies in quantum field theories on smooth manifolds have led to generalized uncertainty principles and the possibility of a physical minimal length in quantum gravitational scenarios. In these models, the minimal length would be a physical limit, not just a mathematical tool, and should be Lorentz invariant. In this paper, we study two-qubit communication and entanglement harvesting in a field subject to a covariant bandlimit (minimum length) and present the changes induced by this bandlimit. We find that the bandlimit introduces nonlocality and acausal communication in a manner unlike non-covariant bandlimits or other quantum optical approximations. We also observe that this covariant bandlimit introduces uncertainties in time and temporal ordering, an unusual behavior we attribute to the covariant cutoff modifying the behavior of virtual particles.

Translated: 2024-08-05 14:17:04 Published: 2024-08-02

# Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model

Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model ( http://arxiv.org/abs/2408.01044v1 )

License: see link

Yang Jin, Lei Zhang, Shi Yan, Bin Fan, Binglu Wang,

(参考訳) 迷路オブジェクト予測(GOP)は、人間が見ている物体のカテゴリと位置を予測することを目的としている。従来はボックスレベルの監視手法を使用して、人が見ているオブジェクトを特定するが、意味的曖昧さに悩まされていたため、オブジェクトが近接しているため、単一のボックスにはいくつかのアイテムが含まれる可能性がある。ビジョンファウンデーションモデル(VFM)は、ボックスプロンプトを用いてオブジェクトのセグメンテーションを改善し、より正確にオブジェクトを配置することで混乱を低減する。本稿では,人間の視線行動によって捉えた被写体に対応する画素レベルのマスクを推定する,より困難な視線オブジェクトセグメンテーション(GOS)タスクを提案する。特に,VFMによる画素レベルの監視を視線オブジェクトの予測に統合し,意味的曖昧さを軽減することを提案する。これにより、正確なピクセルレベルの予測が可能な視線オブジェクトの検出とセグメンテーションフレームワークが実現される。付加的な頭部入力や頭部特徴の無視を必要とする従来の手法とは異なり,シーン特徴から頭部特徴を自動的に取得し,実世界におけるモデルの推論効率と柔軟性を確保することを提案する。さらに,物体の空間的位置や微妙な細部を見失うような既存の手法のように視線熱マップを予測するための特徴を直接融合させるのではなく,人と物との視線相互作用を容易にする空間対物視線回帰法を開発した。具体的には、まず最初の人間と対象の空間接続を構築し、次にセグメンテーションブランチで意味的に明確な特徴と相互作用し、最終的に正確な位置付けのための視線熱マップを予測することによって、この接続を洗練する。 GOO-SynthおよびGOO-Realデータセットの大規模な実験により,本手法の有効性が示された。

Gaze object prediction (GOP) aims to predict the category and location of the object that a human is looking at. Previous methods used box-level supervision to identify the object that a person is looking at, but struggled with semantic ambiguity, i.e., a single box may contain several items when objects are close together. Vision foundation models (VFMs) have improved object segmentation using box prompts, which can reduce confusion by locating objects more precisely, offering advantages for fine-grained prediction of gaze objects. This paper presents a more challenging gaze object segmentation (GOS) task, which involves inferring the pixel-level mask corresponding to the object captured by human gaze behavior. In particular, we propose integrating the pixel-level supervision provided by VFMs into gaze object prediction to mitigate semantic ambiguity. This leads to our gaze object detection and segmentation framework, which enables accurate pixel-level predictions. Unlike previous methods that require additional head input or ignore head features, we propose to automatically obtain head features from scene features to ensure the model's inference efficiency and flexibility in the real world. Moreover, rather than directly fusing features to predict a gaze heatmap as in existing methods, which may overlook the spatial location and subtle details of the object, we develop a space-to-object gaze regression method to facilitate human-object gaze interaction. Specifically, it first constructs an initial human-object spatial connection, then refines this connection by interacting with semantically clear features in the segmentation branch, and finally predicts a gaze heatmap for precise localization. Extensive experiments on the GOO-Synth and GOO-Real datasets demonstrate the effectiveness of our method.
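To make the box-versus-mask ambiguity concrete, here is a toy sketch, not the authors' model: a gaze heatmap is scored against candidate regions, once as bounding boxes and once as pixel masks. The grid, object names, heatmap values, and helper names are invented for illustration.

```python
from typing import Dict, Set, Tuple

Pixel = Tuple[int, int]


def box_pixels(box: Tuple[int, int, int, int]) -> Set[Pixel]:
    """All pixels inside an inclusive (row0, col0, row1, col1) box."""
    r0, c0, r1, c1 = box
    return {(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)}


def region_scores(heatmap: Dict[Pixel, float], regions: Dict[str, Set[Pixel]]) -> Dict[str, float]:
    """Sum of gaze-heatmap mass that falls inside each candidate region."""
    return {name: sum(heatmap.get(p, 0.0) for p in pixels) for name, pixels in regions.items()}


def pick_gaze_object(heatmap: Dict[Pixel, float], regions: Dict[str, Set[Pixel]]) -> str:
    """Return the region that captures the most gaze mass."""
    scores = region_scores(heatmap, regions)
    return max(scores, key=scores.get)
```

For example, if an L-shaped shelf's bounding box fully encloses a cup and the gaze mass sits on the cup, both boxes collect the same mass (an ambiguous tie), whereas the pixel masks assign all of it to the cup.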

Translated: 2024-08-05 14:17:04 Published: 2024-08-02

# QUDSELECT: Selective Decoding for Questions Under Discussion Parsing

QUDSELECT: Selective Decoding for Questions Under Discussion Parsing ( http://arxiv.org/abs/2408.01046v1 )

License: see link

Ashima Suvarna, Xiao Liu, Tanmay Parekh, Kai-Wei Chang, Nanyun Peng,

(参考訳) Question Under Examination (QUD) は、暗黙の質問を用いて文間の会話関係を明らかにするための談話フレームワークである。 QUD解析では、各文は、前の文脈でアンカー文によって引き起こされる質問に対する答えと見なされる。結果のQUD構造は、応答整合性(質問がどの程度答えられるか)のようないくつかの理論的基準に適合することが要求され、QUD解析は難しい課題となる。以前の作業はパイプライン方式でQUDパーサを構築する(つまり、トリガー文をコンテキストで検出し、質問を生成する)。しかしながら、これらのパーサーはタスクの全体像を欠き、全ての基準を満たすことはほとんどできない。本稿では,QUD基準を考慮したQUD依存構造を選択的に復号する共同学習フレームワークであるQUDSELECTを紹介する。命令チューニングを用いて、アンカー文を同時に予測し、関連する質問を生成するモデルを訓練する。基準を明示的に組み込むために、推論中に複数のQUD候補をサンプリングし、その後、基準スコアの最良の候補を選択する選択復号戦略を採用する。提案手法は, 最先端のベースラインモデルに対して, 人的評価で9%, 自動評価で4%向上し, フレームワークの有効性を実証する。

Question Under Discussion (QUD) is a discourse framework that uses implicit questions to reveal discourse relationships between sentences. In QUD parsing, each sentence is viewed as an answer to a question triggered by an anchor sentence in prior context. The resulting QUD structure is required to conform to several theoretical criteria, such as answer compatibility (how well the question is answered), which makes QUD parsing a challenging task. Previous works construct QUD parsers in a pipelined manner (i.e., first detect the trigger sentence in context and then generate the question). However, these parsers lack a holistic view of the task and can hardly satisfy all the criteria. In this work, we introduce QUDSELECT, a joint-training framework that selectively decodes QUD dependency structures while considering the QUD criteria. Using instruction tuning, we train models to simultaneously predict the anchor sentence and generate the associated question. To explicitly incorporate the criteria, we adopt a selective decoding strategy: we sample multiple QUD candidates during inference and then select the best one with criteria scorers. Our method outperforms the state-of-the-art baseline models by 9% in human evaluation and 4% in automatic evaluation, demonstrating the effectiveness of our framework.
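The selective decoding strategy (sample several candidates, then keep the one the criteria scorers rate highest) can be sketched as follows. The sampler and scorers here are hypothetical stand-ins for the instruction-tuned model and the paper's criteria scorers, and summing the scores with equal weights is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[int, str]  # (anchor sentence index, generated question)


def selective_decode(
    sample: Callable[[int], Candidate],
    scorers: Sequence[Callable[[Candidate], float]],
    num_samples: int,
) -> Candidate:
    """Sample several QUD candidates and keep the one with the highest total criteria score."""
    candidates: List[Candidate] = [sample(i) for i in range(num_samples)]
    return max(candidates, key=lambda cand: sum(score(cand) for score in scorers))
```

In practice `sample` would wrap stochastic decoding from the model (for example, a different seed per call), and each scorer would implement one criterion such as answer compatibility.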

Translated: 2024-08-05 14:17:04 Published: 2024-08-02

# The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines

The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines ( http://arxiv.org/abs/2408.01050v1 )

License: see link

Matias Martinez,

(参考訳) 最近のオープンソースの大規模言語モデル(LLMs)の急増により、開発者はプライバシやコンプライアンスといった側面のコントロールを維持しながら、AIベースのソリューションを作成し、モデルデプロイメントプロセスのガバナンスとオーナシップを提供することができる。これらのLLMを利用するには、推論エンジンが必要である。これらのエンジンはGPUなどの利用可能なリソースにモデルの重みをロードし、クエリを処理してレスポンスを生成する。 LLMの推論速度や性能は、推論毎に数百万から数十億の浮動小数点演算を計算しているため、リアルタイムアプリケーションには不可欠である。近年、vLLMのような高度な推論エンジンが登場し、効率的なメモリ管理などの新しいメカニズムを取り入れて最先端の性能を実現している。本稿では,2つの推論ライブラリ,vLLMとHugingFaceのパイプラインを用いて,性能,特にスループット(時間単位当たりのトークン)を解析する。開発者が設定しなければならない様々なハイパーパラメータが、推論性能にどのように影響するかを検討する。その結果,スループットのランドスケープは不規則であり,最大性能を実現するためのハイパーパラメータ最適化の重要性が浮き彫りになった。また、推論に使用するGPUモデルをアップグレードまたはダウングレードする際のハイパーパラメータ最適化を適用することで、HuggingFaceパイプラインのスループットを平均9.16%、13.7%向上できることを示す。

The recent surge of open-source large language models (LLMs) enables developers to create AI-based solutions while maintaining control over aspects such as privacy and compliance, thereby providing governance and ownership of the model deployment process. To utilize these LLMs, inference engines are needed. These engines load the model's weights onto available resources, such as GPUs, and process queries to generate responses. The inference speed, or performance, of an LLM is critical for real-time applications, since each inference requires millions or billions of floating-point operations. Recently, advanced inference engines such as vLLM have emerged, incorporating novel mechanisms such as efficient memory management to achieve state-of-the-art performance. In this paper, we analyze the performance, particularly the throughput (tokens generated per unit of time), of 20 LLMs using two inference libraries: vLLM and HuggingFace's pipelines. We investigate how various hyperparameters, which developers must configure, influence inference performance. Our results reveal that throughput landscapes are irregular, with distinct peaks, highlighting the importance of hyperparameter optimization to achieve maximum performance. We also show that applying hyperparameter optimization when upgrading or downgrading the GPU model used for inference can improve throughput from HuggingFace pipelines by an average of 9.16% and 13.7%, respectively.
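Because the throughput landscape is irregular, a simple way to find its peak is exhaustive grid search over the configurable hyperparameters. The sketch below assumes a user-supplied `measure` callback that runs a benchmark and returns tokens per second. The hyperparameter names and the measurements in the usage are illustrative only and are not vLLM or HuggingFace options or results from the paper.

```python
import itertools
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]


def throughput(tokens_generated: int, seconds: float) -> float:
    """Tokens generated per second."""
    return tokens_generated / seconds


def grid_search(measure: Callable[[Config], float], grid: Dict[str, List[int]]) -> Tuple[Config, float]:
    """Evaluate every configuration in the grid and return the one with the highest throughput."""
    keys = list(grid)
    best_cfg: Config = {}
    best_tp = float("-inf")
    for values in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        tp = measure(cfg)
        if tp > best_tp:
            best_cfg, best_tp = cfg, tp
    return best_cfg, best_tp


def improvement_pct(baseline: float, tuned: float) -> float:
    """Relative throughput gain of the tuned configuration over a baseline, in percent."""
    return 100.0 * (tuned - baseline) / baseline
```

Exhaustive search is only practical for small grids; for larger spaces, a sampling-based optimizer can replace the loop while keeping the same `measure` interface.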

Translated: 2024-08-05 14:17:04 Published: 2024-08-02

# From Stem to Stern: Contestability Along AI Value Chains

From Stem to Stern: Contestability Along AI Value Chains ( http://arxiv.org/abs/2408.01051v1 )

License: see link

Agathe Balayn, Yulu Pi, David Gray Widder, Kars Alfrink, Mireia Yurrita, Sohini Upadhyay, Naveena Karusala, Henrietta Lyons, Cagatay Turkay, Christelle Tessono, Blair Attard-Frost, Ujwal Gadiraju,

(参考訳) このワークショップは、競争可能なAIのトピックに焦点を当てた学際的なCSCW研究者のコミュニティを成長させ、統合する。ワークショップの成果として、AIバリューチェーンに沿った競争可能性に関する最も急進的な機会と課題を、研究ロードマップの形でまとめます。このロードマップは、この分野における差し迫った仕事を形作り、刺激するのに役立ちます。 AIバリューチェーンの長さと深さを考慮すると、このようなチェーンのさまざまな場所に沿ってAIシステムの競争性について、特に議論が引き起こされるだろう。このワークショップは、さまざまな状況において、競争可能なAIを設計、展開するための要件、障害、機会を特定するために、(すべきであろうが)争われた、具体的な、成功した、失敗したAIシステムの対話とデモンストレーションのためのプラットフォームとして機能する。これは主に、ハイブリッドな宿泊施設を備えた個人ワークショップとして開催される。この日は、個々のプレゼンテーションとグループ活動から成り、アイデアを刺激し、競争可能なAIの分野に対する広範なリフレクションを刺激する。我々の目標は、研究者、実践者、利害関係者を集結させ、競争可能なAIの設計と展開を促進することで学際対話を促進することである。

This workshop will grow and consolidate a community of interdisciplinary CSCW researchers focusing on the topic of contestable AI. As an outcome of the workshop, we will synthesize the most pressing opportunities and challenges for contestability along AI value chains in the form of a research roadmap. This roadmap will help shape and inspire upcoming work in this field. Given the length and depth of AI value chains, the workshop will especially spur discussions about the contestability of AI systems at various sites along such chains. The workshop will serve as a platform for dialogue and demonstrations of concrete, successful, and unsuccessful examples of AI systems that (could or should) have been contested, to identify requirements, obstacles, and opportunities for designing and deploying contestable AI in various contexts. It will be held primarily in person, with some hybrid accommodation. The day will consist of individual presentations and group activities to stimulate ideation and inspire broad reflections on the field of contestable AI. Our aim is to facilitate interdisciplinary dialogue by bringing together researchers, practitioners, and stakeholders to foster the design and deployment of contestable AI.

Translated: 2024-08-05 14:17:04 Published: 2024-08-02

# Enhancing the MILP/MIQCP-based Automatic Search for Differential-Linear Distinguishers of Simon-Like Ciphers

Enhancing the MILP/MIQCP-based Automatic Search for Differential-Linear Distinguishers of Simon-Like Ciphers ( http://arxiv.org/abs/2408.01052v1 )

License: see link

Siwei Chen, Zejun Xiang, Xiangyong Zeng, Guangxue Qin,

(参考訳) 本稿では,Simon および Simeck ブロック暗号系の全メンバに対して,より優れた差分線形(DL)判別器を自動で見つけるために,MILP/MIQCP(MILP/MIQCP)に基づく改良手法を提案する。具体的には、まず、線形部分を記述するための完全に正確なMILPモデルを与え、また、 \textsf{Gurobi} ソルバの一般式を用いて、中間部分に対する連続差の伝播を非常に簡単な方法でモデル化する方法を説明する。第二に、MILP/MIQCPモデルを妥当な時間で解くために、探索過程を高速化する分割・畳み込みのアイデアに基づく2つのヒューリスティック戦略を提案する。第3に,DL軌跡のクラスタリング効果を利用した変換手法を導入し,DL近似の相関性を推定する。本手法をSimonおよびSimeckブロック暗号系に適用する。その結果,Simon32/48/64/96の1ラウンド,Simon64は2ラウンドで,14/17/21/26ラウンドのSim32/48/64/96のDL差分器が得られた。 Simeck氏にとって、現在最高の結果よりも長い区別器を探索するわけではないが、Zhou et al(MILP/MIQCPを用いたSimonライクな暗号のDL区別器の発見を自動化する最初の作業)の結果をすべて更新する。さらに,シモン32/シメック32とシモン48/シメック48で,これらの判別器の正当性を検証する実験を行った。その結果, 相関関係の理論的推定は実験値に非常に近いことが示され, 提案手法の有効性を裏付ける具体的な支援とみなすことができる。

In this paper, we propose an improved method based on Mixed-Integer Linear Programming/Mixed-Integer Quadratic Constraint Programming (MILP/MIQCP) to automatically find better differential-linear (DL) distinguishers for all members of the Simon and Simeck block cipher families. Specifically, we first give a fully precise MILP model to describe the linear part, and explain how to use the general expressions of the Gurobi solver to model the propagation of continuous differences in the middle part in a simple way. Second, to solve the MILP/MIQCP model in a reasonable time, we propose two heuristic strategies based on the divide-and-conquer idea to speed up the search process. Third, we introduce a transforming technique, which exploits the clustering effect on DL trails, to improve the estimated correlation of the DL approximation. We apply our method to the Simon and Simeck block cipher families. As a result, we find 14/17/21/26-round theoretical DL distinguishers for Simon32/48/64/96, which extend the previous longest ones by one round for Simon32/48/96 and by two rounds for Simon64. For Simeck, we do not obtain distinguishers longer than the current best results, but we update all the results of Zhou et al. (the first work to automate finding DL distinguishers for Simon-like ciphers using MILP/MIQCP). In addition, to validate the correctness of these distinguishers, we conduct experimental verifications on Simon32/Simeck32 and Simon48/Simeck48. The results show that our theoretical estimates of the correlations are very close to the experimental ones, which provides concrete support for the effectiveness of our method.
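The experimental verification step amounts to estimating a DL distinguisher's correlation by sampling, that is, the average of (-1)^(γ·(E(p) ⊕ E(p ⊕ Δ))) over random plaintexts. Below is a hedged sketch for the Simon-like family with 16-bit words (as in Simon32/Simeck32), using round function f(x) = (S^a x & S^b x) ⊕ S^c x with (a, b, c) = (1, 8, 2) for Simon and (5, 0, 1) for Simeck. Round keys are drawn independently at random instead of from the real key schedule, which is an assumption. The differences and masks in the usage are illustrative and are not distinguishers from the paper.

```python
import random
from typing import Sequence, Tuple

WORD_BITS = 16  # Simon32 / Simeck32 word size


def rotl(x: int, r: int, n: int = WORD_BITS) -> int:
    """Rotate an n-bit word left by r positions."""
    r %= n
    return ((x << r) | (x >> (n - r))) & ((1 << n) - 1)


def f(x: int, a: int, b: int, c: int, n: int = WORD_BITS) -> int:
    """Simon-like round function (S^a x & S^b x) ^ S^c x."""
    return (rotl(x, a, n) & rotl(x, b, n)) ^ rotl(x, c, n)


def encrypt(left: int, right: int, round_keys: Sequence[int],
            rot: Tuple[int, int, int] = (1, 8, 2), n: int = WORD_BITS) -> Tuple[int, int]:
    """Feistel rounds (x, y) -> (y ^ f(x) ^ k, x)."""
    a, b, c = rot
    for k in round_keys:
        left, right = right ^ f(left, a, b, c, n) ^ k, left
    return left, right


def parity(x: int) -> int:
    return bin(x).count("1") & 1


def dl_correlation(delta: Tuple[int, int], gamma: int, rounds: int, samples: int,
                   rot: Tuple[int, int, int] = (1, 8, 2), n: int = WORD_BITS,
                   seed: int = 0) -> float:
    """Estimate the DL correlation over random plaintexts and random independent round keys."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        keys = [rng.getrandbits(n) for _ in range(rounds)]
        p = (rng.getrandbits(n), rng.getrandbits(n))
        q = (p[0] ^ delta[0], p[1] ^ delta[1])
        c1 = encrypt(p[0], p[1], keys, rot, n)
        c2 = encrypt(q[0], q[1], keys, rot, n)
        out_diff = ((c1[0] ^ c2[0]) << n) | (c1[1] ^ c2[1])  # left || right
        total += -1 if parity(out_diff & gamma) else 1
    return total / samples
```

As a sanity check, a one-round input difference in the right half passes deterministically to the left half of the output, so a mask on that bit gives correlation -1 and a mask on an untouched bit gives +1. Real verification would use the actual key schedule and many more samples.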

Translated: 2024-08-05 14:17:04 Published: 2024-08-02

# LLM as Runtime Error Handler: A Promising Pathway to Adaptive Self-Healing of Software Systems

LLM as Runtime Error Handler: A Promising Pathway to Adaptive Self-Healing of Software Systems ( http://arxiv.org/abs/2408.01055v1 )

License: see link

Zhensu Sun, Haotian Zhu, Bowen Xu, Xiaoning Du, Li Li, David Lo,

(参考訳) 事前に定義されたハンドラが欠如している予期しない実行時エラーは、突然実行を終了させ、データ損失やシステムクラッシュなどの重大な結果をもたらす可能性がある。開発段階で潜在的なエラーを特定するための大規模な努力にもかかわらず、そのような予期せぬエラーを完全に排除することは依然として困難であり、ランタイムの緩和測定は影響を最小限に抑えるために依然として不可欠である。既存のハンドラを再利用するなど自動自己修復技術は,実行終了に伴う損失を軽減するために研究されている。しかし、既存のメソッドのユーザビリティは、事前に定義されたヒューリスティックなルールによって維持され、様々なランタイムエラーを適応的に処理することができない。近年,Large Language Models (LLMs) の出現により,この問題に対処するための新たな道が開かれた。コードの理解と生成において顕著な能力に着想を得て,LLMを用いてリアルタイムに実行時のエラーに対処することを提案する。具体的には、ランタイムエラーを処理するための最初のLCM支援セルフヒーリングフレームワークであるHealerを提案する。未処理のランタイムエラーが発生した場合、Healerは内部LCMの助けを借りてエラー処理コードを生成するためにアクティベートされ、フレームワークが所有するランタイム環境内でコードが実行され、プログラムの実行を継続する修正プログラム状態を取得する。我々は,4つの異なるコードベンチマークと3つの最先端LCM,GPT-3.5,GPT-4,CodeQwen-7Bを用いて,Healerの性能を評価する。その結果、微調整の必要なく、GPT-4は72.8%のランタイムエラーからプログラムをリカバリするのに役立ち、実行時エラーを処理するLCMの可能性を強調している。

Unanticipated runtime errors, lacking predefined handlers, can abruptly terminate execution and lead to severe consequences, such as data loss or system crashes. Despite extensive efforts to identify potential errors during the development phase, such unanticipated errors remain difficult to eliminate entirely, so runtime mitigation measures are still indispensable to minimize their impact. Automated self-healing techniques, such as reusing existing handlers, have been investigated to reduce the losses caused by execution termination. However, the usability of existing methods is restricted by their predefined heuristic rules, and they fail to handle diverse runtime errors adaptively. Recently, the advent of Large Language Models (LLMs) has opened new avenues for addressing this problem. Inspired by their remarkable capabilities in understanding and generating code, we propose to handle runtime errors in real time using LLMs. Specifically, we propose Healer, the first LLM-assisted self-healing framework for handling runtime errors. When an unhandled runtime error occurs, Healer is activated to generate a piece of error-handling code with the help of its internal LLM, and the code is executed inside the runtime environment owned by the framework to obtain a rectified program state from which the program should continue its execution. Our exploratory study evaluates the performance of Healer using four different code benchmarks and three state-of-the-art LLMs: GPT-3.5, GPT-4, and CodeQwen-7B. Results show that, without any fine-tuning, GPT-4 can successfully help programs recover from 72.8% of runtime errors, highlighting the potential of LLMs in handling runtime errors.
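The control flow described above (catch an unhandled error, ask a model for handling code, execute it to obtain a rectified state, then continue) can be sketched in Python. This is a simplified illustration, not Healer itself: `generate_handler` is a hypothetical stand-in for the LLM call, and program state is modeled as a plain dictionary.

```python
import traceback
from typing import Any, Callable, Dict

State = Dict[str, Any]


def run_with_healer(
    step: Callable[[State], State],
    state: State,
    generate_handler: Callable[[str, State, str], str],
) -> State:
    """Run one program step; on an unhandled exception, obtain and execute generated handling code.

    `generate_handler` (hypothetical LLM stand-in) receives the step name, a copy of the
    state, and the traceback, and returns Python source that assigns or mutates `state`.
    """
    try:
        return step(state)
    except Exception as exc:
        tb = traceback.format_exc()
        handler_source = generate_handler(step.__name__, dict(state), tb)
        sandbox: Dict[str, Any] = {"state": dict(state), "error": exc}
        exec(handler_source, sandbox)  # NOTE: not a real sandbox; illustration only
        return sandbox["state"]  # rectified state from which execution continues
```

Executing model-generated code with `exec` provides no isolation. A real deployment would need a restricted runtime environment, which is the role the framework-owned environment plays in the description above.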

Translated: 2024-08-05 14:17:04 Published: 2024-08-02

# Bosonic Holes in Quadratic Bosonic Systems

Bosonic Holes in Quadratic Bosonic Systems ( http://arxiv.org/abs/2408.01059v1 )

License: see link

Jia-Ming Hu, Bo Wang, Ze-Liang Xiang,

(参考訳) 電子孔の概念は凝縮物質物理学において重要な役割を果たす。ここでは、二次ボソニック系において負の粒子励起を示すボソニックホールの概念を開発する。電子孔とは異なり、ボゾン孔のフォック状態は生物直交であり、それらの励起は平均場背景を持つ平均粒子数から粒子を除去するものとして解釈することができる。さらに、非ユニタリおよび局所粒子孔変換に関連する二次ボソニックハミルトニアンは、異なる空間における同じ局所構造とスペクトル特性を持ち、PH双対性を反映している。そこで本研究では,2つのモードの場合におけるPHエンタングルメントの発生と3つのモードの場合におけるPHアハロノフ・ボーム効果について検討した。本研究は粒子非保存系および非エルミート系における異常な物理現象の理解と探索を行う新しい方法を提供する。

The concept of electron holes plays a significant role in condensed matter physics. Here we develop the concept of bosonic holes, which exhibit negative particle excitations, in quadratic bosonic systems. Unlike electron holes, the Fock states of bosonic holes are biorthogonal, and their excitation can be interpreted as removing particles from a mean particle number on a mean-field background. Furthermore, we find that quadratic bosonic Hamiltonians related by a non-unitary and local particle-hole (PH) transformation possess the same locality structure and spectral properties in different spaces, reflecting a PH duality. Based on this, we study the generation of PH entanglement in the two-mode case and the PH Aharonov-Bohm effect in the three-mode case, which results in a PH chiral flow with time-reversal symmetry breaking. Our findings provide a new way to understand and explore unusual physical phenomena in particle-non-conserving and non-Hermitian systems.

Translated: 2024-08-05 14:17:04 Published: 2024-08-02

# Universality of kernel random matrices and kernel regression in the quadratic regime

Universality of kernel random matrices and kernel regression in the quadratic regime ( http://arxiv.org/abs/2408.01062v1 )

License: see link

Parthe Pandit, Zhichao Wang, Yizhe Zhu,

(参考訳) カーネルリッジ回帰(KRR)は機械学習モデルの一般的なクラスであり、ディープラーニングを理解するための重要なツールとなっている。ここでは、$n$はトレーニングサンプルの数、$d$はデータセットの次元である。この状態において、データ分布の一定の条件下では、KRRに関わるカーネルランダム行列は、線形カーネルと同様の振舞いを示す。本研究では、カーネル回帰の研究を2次漸近状態に拡張し、$n \asymp d^2$とする。本研究では,内積核の幅広いクラスが二次核と同様の挙動を示すことを示す。具体的には、元のカーネル乱数行列と二次カーネル乱数行列との差に対する作用素ノルム近似を、カーネル関数のテイラー展開と比較して追加の補正項で確立する。この近似は、ガウスモーメントマッチング仮定の下での一般データ分布と共分散構造に作用する。この新たな近似を用いて、元のカーネル行列のスペクトル分布を制限し、$n/d^2$が非ゼロ定数に収束した場合の二次状態におけるKRRの正確な漸近的トレーニングと一般化誤差を特徴づける。一般化誤差は、決定論的およびランダムな教師モデルの両方に対して得られる。我々の証明手法はモーメント法, ウィックの公式, 直交多項式, および相関成分を持つランダム行列の分解能解析を組み合わせている。

Kernel ridge regression (KRR) is a popular class of machine learning models that has become an important tool for understanding deep learning. Much of the focus has been on studying the proportional asymptotic regime, $n \asymp d$, where $n$ is the number of training samples and $d$ is the dimension of the dataset. In this regime, under certain conditions on the data distribution, the kernel random matrix involved in KRR exhibits behavior akin to that of a linear kernel. In this work, we extend the study of kernel regression to the quadratic asymptotic regime, where $n \asymp d^2$. In this regime, we demonstrate that a broad class of inner-product kernels exhibit behavior similar to a quadratic kernel. Specifically, we establish an operator norm approximation bound for the difference between the original kernel random matrix and a quadratic kernel random matrix with additional correction terms compared to the Taylor expansion of the kernel functions. The approximation works for general data distributions under a Gaussian-moment-matching assumption with a covariance structure. This new approximation is utilized to obtain a limiting spectral distribution of the original kernel matrix and characterize the precise asymptotic training and generalization errors for KRR in the quadratic regime when $n/d^2$ converges to a non-zero constant. The generalization errors are obtained for both deterministic and random teacher models. Our proof techniques combine moment methods, Wick's formula, orthogonal polynomials, and resolvent analysis of random matrices with correlated entries.
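As a rough numerical illustration of why higher-order Taylor terms of an inner-product kernel matter: for standard Gaussian data, the off-diagonal arguments t = ⟨x_i, x_j⟩/d are of order d^(-1/2). A linear approximation of the kernel function therefore leaves entrywise errors of order 1/d, and a quadratic one leaves errors of order d^(-3/2). This sketch uses f(t) = e^t (our choice) and compares entrywise errors only. It does not reproduce the paper's operator-norm bound, its correction terms, or the n ~ d^2 scaling; small entrywise errors can still add up to a large operator norm in an n × n matrix.

```python
import math
import random
from typing import Tuple


def taylor_errors(n: int, d: int, seed: int = 0) -> Tuple[float, float]:
    """Max off-diagonal error of linear vs quadratic Taylor approximations of exp(<x_i, x_j>/d)."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    max_linear = max_quadratic = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            t = sum(a * b for a, b in zip(xs[i], xs[j])) / d  # typically of order d ** -0.5
            exact = math.exp(t)
            linear = 1.0 + t                  # f(0) + f'(0) t
            quadratic = linear + 0.5 * t * t  # + f''(0) t^2 / 2
            max_linear = max(max_linear, abs(exact - linear))
            max_quadratic = max(max_quadratic, abs(exact - quadratic))
    return max_linear, max_quadratic
```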

Translated: 2024-08-05 14:17:04 Published: 2024-08-02

# Leveraging Large Language Models for Mobile App Review Feature Extraction

Leveraging Large Language Models for Mobile App Review Feature Extraction ( http://arxiv.org/abs/2408.01063v1 )

License: see link

Quim Motger, Alessio Miaschi, Felice Dell'Orletta, Xavier Franch, Jordi Marco,

(参考訳) モバイルアプリレビュー分析では,ユーザ生成ドキュメントの低品質,主観的バイアス,ノイズのある内容など,ユニークな課題が提示される。これらのレビューから特徴を抽出することは、機能の優先順位付けや感情分析といったタスクには不可欠ですが、それでも難しい作業です。一方、Transformerアーキテクチャに基づくエンコーダのみのモデルでは、複数のソフトウェアエンジニアリングプロセスの分類と情報抽出タスクに有望な結果が示されている。本研究では,エンコーダのみの大規模言語モデルがモバイルアプリレビューから特徴抽出を促進できるという仮説を考察する。クラウドソーシングされたアノテーションを産業的文脈から活用することにより、特徴抽出を教師付きトークン分類タスクとして再定義する。我々のアプローチは、コンテキスト理解を改善するためにユーザーレビューの膨大なコーパスでこれらのモデルの事前学習を拡張し、モデル微調整を最適化するためにインスタンス選択技術を採用することである。実験により,抽出した特徴の精度とリコールが向上し,性能効率が向上することが確認された。主なコントリビューションには、特徴抽出に対する新しいアプローチ、注釈付きデータセット、拡張事前訓練されたモデル、コスト効率の良い微調整のためのインスタンス選択メカニズムなどがある。本研究は,モバイルアプリレビューにおける自然言語処理タスクに大規模言語モデルを適用するための実践的手法と実証的エビデンスを提供し,特徴抽出の性能向上を提供する。

Mobile app review analysis presents unique challenges due to the low quality, subjective bias, and noisy content of user-generated documents. Extracting features from these reviews is essential for tasks such as feature prioritization and sentiment analysis, but it remains a challenging task. Meanwhile, encoder-only models based on the Transformer architecture have shown promising results for classification and information extraction tasks for multiple software engineering processes. This study explores the hypothesis that encoder-only large language models can enhance feature extraction from mobile app reviews. By leveraging crowdsourced annotations from an industrial context, we redefine feature extraction as a supervised token classification task. Our approach includes extending the pre-training of these models with a large corpus of user reviews to improve contextual understanding and employing instance selection techniques to optimize model fine-tuning. Empirical evaluations demonstrate that this method improves the precision and recall of extracted features and enhances performance efficiency. Key contributions include a novel approach to feature extraction, annotated datasets, extended pre-trained models, and an instance selection mechanism for cost-effective fine-tuning. This research provides practical methods and empirical evidence in applying large language models to natural language processing tasks within mobile app reviews, offering improved performance in feature extraction.
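Framing feature extraction as token classification means the model tags each review token, and features are then read off as contiguous tagged spans. A minimal decoder for a BIO scheme is sketched below. The tag names `B-FEAT`/`I-FEAT`/`O` and the example review are hypothetical, because the abstract does not specify the tagging scheme.

```python
from typing import List


def extract_features(tokens: List[str], tags: List[str]) -> List[str]:
    """Turn per-token BIO tags into feature strings (contiguous B-FEAT/I-FEAT spans)."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must align one-to-one")
    features: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-FEAT" or (tag == "I-FEAT" and not current):
            if current:
                features.append(" ".join(current))
            current = [token]  # a stray I-FEAT is treated as the start of a new span
        elif tag == "I-FEAT":
            current.append(token)
        else:  # "O" closes any open span
            if current:
                features.append(" ".join(current))
            current = []
    if current:
        features.append(" ".join(current))
    return features
```

Treating a stray `I-FEAT` as a span start is a lenient design choice; a strict decoder would discard it instead.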

Translated: 2024-08-05 14:17:04 Published: 2024-08-02

# Amodal Segmentation for Laparoscopic Surgery Video Instruments

Amodal Segmentation for Laparoscopic Surgery Video Instruments ( http://arxiv.org/abs/2408.01067v1 )

License: see link

Ruohua Shi, Zhaochen Liu, Lingyu Duan, Tingting Jiang,

(参考訳) 手術器具の分節化は, 外科的パフォーマンスの向上と患者の安全確保に不可欠である。バイナリ、セマンティック、インスタンスセグメンテーションといった従来のテクニックは共通の欠点を共有している。これらの閉塞楽器の完全な範囲を正確に予測することは、手術中に重要なガイダンスを提供し、潜在的な外科的誤りの分析を支援し、教育目的に役立てることによって、腹腔鏡下手術を著しく改善することができる。本稿では,医療分野における外科器具の領域にアモーダルセグメンテーションを導入する。このテクニックは、オブジェクトの可視部と隠蔽部の両方を識別する。これを実現するために、2017 MICCAI EndoVis Robotic Instrument Segmentation Challengeデータセットを用いて、各機器に完全なマスクを付加することで、新しいAmoal Instruments Segmentation(AIS)データセットを導入する。さらに、この新たなデータセットのベンチマークを確立するために、いくつかの主要なアモーダルセグメンテーション手法を評価する。

Segmentation of surgical instruments is crucial for enhancing surgeon performance and ensuring patient safety. Conventional techniques such as binary, semantic, and instance segmentation share a common drawback: they do not account for the parts of instruments obscured by tissues or other instruments. Precisely predicting the full extent of these occluded instruments can significantly improve laparoscopic surgery by providing critical guidance during operations, assisting in the analysis of potential surgical errors, and serving educational purposes. In this paper, we introduce amodal segmentation to surgical instrument segmentation in the medical field. This technique identifies both the visible and occluded parts of an object. To this end, we introduce a new Amodal Instruments Segmentation (AIS) dataset, developed by reannotating each instrument in the 2017 MICCAI EndoVis Robotic Instrument Segmentation Challenge dataset with its complete mask. Additionally, we evaluate several leading amodal segmentation methods to establish a benchmark for this new dataset.
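The difference between visible and amodal segmentation can be shown with set arithmetic on pixel masks. The visible mask is a subset of the amodal (full-extent) mask, and the occluded part is their difference. Here IoU stands in as a generic metric; the abstract does not say which metrics the benchmark uses. The masks below are toy rectangles, not data from the AIS dataset.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]


def rect(r0: int, c0: int, r1: int, c1: int) -> Set[Pixel]:
    """Pixels of an inclusive rectangle, used to build toy masks."""
    return {(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)}


def occluded_part(amodal: Set[Pixel], visible: Set[Pixel]) -> Set[Pixel]:
    """Hidden pixels: inside the instrument's full extent but not visible."""
    return amodal - visible


def iou(pred: Set[Pixel], target: Set[Pixel]) -> float:
    """Intersection over union of two pixel masks."""
    union = pred | target
    return len(pred & target) / len(union) if union else 1.0
```

If an instrument spans a 4 × 6 region but tissue covers its right half, a model that outputs only the visible half scores an IoU of 0.5 against the amodal ground truth, even though its visible segmentation is perfect.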

Translated: 2024-08-05 14:17:04 Published: 2024-08-02

# Single photon emitters in monolayer semiconductors coupled to transition metal dichalcogenide nanoantennas on silica and gold substrates

Single photon emitters in monolayer semiconductors coupled to transition metal dichalcogenide nanoantennas on silica and gold substrates ( http://arxiv.org/abs/2408.01070v1 )

License: see link

Panaiot G. Zotev, Sam A. Randerson, Xuerong Hu, Yue Wang, Alexander I. Tartakovskii,

(参考訳) 遷移金属ジアルコゲナイド(TMD)単光子エミッタ(SPE)は、高い単一光子純度や決定論的位置決めなどの量子情報応用に多くの利点をもたらす。誘電体Mie共振器によって誘導されるホスト単分子膜のひずみは、その形成を光学特性をより制御するために、近接場フォトニックホットスポットと共配置された位置に局在させることが知られている。しかし、シリコンやガリウムホスプヒド(GaP)のようなナノ共振器の製造に用いられる伝統的な材料は、しばしば高屈折率の基板を必要とするため、発光光の損失と光強度の制限が生じる。そこで我々は,多層TMDから作製したナノアンテナ(NA)を用いて,接着ファンデルワールス力による基板選択による完全な柔軟性を実現し,高い屈折率コントラストや高反射率金属表面の利用を可能にした。 SiO$_2$およびAu基板上のWS$_2$NAsに移動したWSe$_2$単分子膜におけるSPEの局在化を実証し,光機能強化と単一光子コレクションの増大を可能にした。 SiO$_2$(Au)基板上で、WS$_2$NAs上でSPEの平均値が43%(7%)に達する量子効率(QE)が向上する証拠を提供する。さらに、誘電体基板と金属基板の両方で得られる利点を組み合わせて、最大WSe$2$単一光子励起、放出、収集のために最適化されたNA幾何を数値的にシミュレートする。したがって、蛍光は真空に比べて4桁以上、平坦なSiO$_2$/Si表面に比べて5桁以上増強される。本研究は, 種々の基板上にTMD材料ナノ共振器を用いることにより, SPE形成とフォトニック増強に有効であることを示す。

Transition metal dichalcogenide (TMD) single photon emitters (SPEs) offer numerous advantages for quantum information applications, such as high single photon purity and deterministic positioning. Strain in the host monolayer, induced by underlying dielectric Mie resonators, is known to localize their formation to positions co-located with near-field photonic hotspots, providing further control over their optical properties. However, traditional materials used for the fabrication of nanoresonators, such as silicon or gallium phosphide (GaP), often require a high refractive index substrate, resulting in losses of the emitted light and limited photonic enhancement. Here, we use nanoantennas (NAs) fabricated from multilayer TMDs, which allow complete flexibility in the choice of substrate due to the adhesive van der Waals forces, enabling high refractive index contrast or the use of highly reflective metallic surfaces. We demonstrate the localized formation of SPEs in WSe$_2$ monolayers transferred onto WS$_2$ NAs on both SiO$_2$ and Au substrates, enabling strong photonic enhancements and increased single photon collection. We provide evidence for enhanced quantum efficiencies (QE) reaching an average value of 43% (7%) for SPEs on WS$_2$ NAs on a SiO$_2$ (Au) substrate. We further combine the advantages offered by both dielectric and metallic substrates to numerically simulate an optimized NA geometry for maximum WSe$_2$ single photon excitation, emission, and collection. In this simulated geometry, the fluorescence is enhanced by more than four orders of magnitude compared to vacuum, and by five orders of magnitude compared to a flat SiO$_2$/Si surface. Our work showcases the advantages offered by employing TMD material nanoresonators on various substrates for SPE formation and photonic enhancement.

Translated: 2024-08-05 14:17:04 Published: 2024-08-02

# A Survey on Self-play Methods in Reinforcement Learning

A Survey on Self-play Methods in Reinforcement Learning ( http://arxiv.org/abs/2408.01072v1 )

License: see link

Ruize Zhang, Zelai Xu, Chengdong Ma, Chao Yu, Wei-Wei Tu, Shiyu Huang, Deheng Ye, Wenbo Ding, Yaodong Yang, Yu Wang,

(参考訳) エージェントとコピーや過去のバージョンとの相互作用を特徴とするセルフプレイは、近年、強化学習において注目されている。本稿ではまず,マルチエージェント強化学習フレームワークやゲーム理論の基本概念を含む,セルフプレイの予備的概念を明らかにする。そして、統合されたフレームワークを提供し、このフレームワーク内の既存のセルフプレイアルゴリズムを分類する。さらに,本論文は,異なるシナリオにおける自己表現の役割を具現化することによって,アルゴリズムと実践的意味のギャップを埋めるものである。最後に、この調査はオープンな課題と、セルフプレイにおける今後の研究方向性を強調している。本稿は,RLにおける自己表現の多面的景観を理解するためのガイドマップである。

Self-play, characterized by an agent's interactions with copies or past versions of itself, has recently gained prominence in reinforcement learning (RL). This paper first clarifies the preliminaries of self-play, including the multi-agent reinforcement learning framework and basic game theory concepts. It then provides a unified framework and classifies existing self-play algorithms within this framework. Moreover, the paper bridges the gap between the algorithms and their practical implications by illustrating the role of self-play in different scenarios. Finally, the survey highlights open challenges and future research directions in self-play. This paper serves as an essential guide to the multifaceted landscape of self-play in RL.
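As one small, concrete instance of self-play, the sketch below has a single regret-matching agent play rock-paper-scissors against a copy of its own current strategy. This is our own choice of example, not an algorithm taken from the survey or its taxonomy. Regret matching keeps average regret low against any opponent sequence, so when the opponent is the agent itself, the average strategy approaches the uniform Nash equilibrium. The biased initial regrets are an arbitrary starting point that prevents the run from trivially beginning at equilibrium.

```python
from typing import List, Sequence

# Row player's payoff for rock (0), paper (1), scissors (2) against the column action.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def strategy_from_regrets(regrets: Sequence[float]) -> List[float]:
    """Regret matching: play actions in proportion to their positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


def self_play(iterations: int, initial_regrets: Sequence[float] = (1.0, 0.0, 0.0)) -> List[float]:
    """Train one agent against a copy of its current strategy; return its average strategy."""
    regrets = list(initial_regrets)
    strategy_sum = [0.0] * 3
    for _ in range(iterations):
        sigma = strategy_from_regrets(regrets)  # the opponent is this same strategy
        action_values = [sum(PAYOFF[a][b] * sigma[b] for b in range(3)) for a in range(3)]
        expected = sum(sigma[a] * action_values[a] for a in range(3))
        for a in range(3):
            regrets[a] += action_values[a] - expected
            strategy_sum[a] += sigma[a]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]


def exploitability(strategy: Sequence[float]) -> float:
    """Best-response value against `strategy`; 0 at the uniform Nash equilibrium."""
    return max(sum(PAYOFF[a][b] * strategy[b] for b in range(3)) for a in range(3))
```

Playing against past versions instead of the current copy would mean sampling `sigma` from a stored pool of earlier strategies; the update rule stays the same.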

Translated: 2024-08-05 14:07:18 Published: 2024-08-02

# The EAP-AIAS: Adapting the AI Assessment Scale for English for Academic Purposes

The EAP-AIAS: Adapting the AI Assessment Scale for English for Academic Purposes ( http://arxiv.org/abs/2408.01075v1 )

License: see link

Jasper Roe, Mike Perkins, Yulia Tregubova,

(参考訳) ジェネレーティブ・人工知能(GenAI)の急速な進歩は、学術目的のための英語教育の機会と課題の両方を提示する。本稿では,AIA-AIAS(AIA-AIAS)と呼ばれる,EAPコンテキストに適したAIアセスメント尺度(AIAS)の適応を提案する。このフレームワークは,学術的完全性を維持しつつ,言語開発を支援するとともに,GenAIツールをEAP評価プラクティスに統合するための構造化されたアプローチを提供することを目的としている。 EAP-AIASは、"No AI"から"Full AI"までの5つのレベルで構成されている。言語学習者の独特なニーズと、EAPが言語習熟度とアカデミック・アカデミック・アカデミック・アカルチュレーションに注力していることを考えると、この適応の背景にある理論的根拠について論じる。本稿では,タスクやプレゼンテーション,研究プロジェクトなど,さまざまなEAP評価タイプにまたがるEAP-AIASの適用可能性について検討する。柔軟なフレームワークを提供することにより、EAP-AIASは、教育におけるGenAI統合の複雑さに対処し、AIに強化された学術的および専門的な未来のために学生を準備するEAP実践者に力を与えようとしている。この適応は、言語教育における倫理的かつ教育的に健全なAI統合の必要性に対処するためのステップである。

The rapid advancement of Generative Artificial Intelligence (GenAI) presents both opportunities and challenges for English for Academic Purposes (EAP) instruction. This paper proposes an adaptation of the AI Assessment Scale (AIAS) specifically tailored for EAP contexts, termed the EAP-AIAS. This framework aims to provide a structured approach for integrating GenAI tools into EAP assessment practices while maintaining academic integrity and supporting language development. The EAP-AIAS consists of five levels, ranging from "No AI" to "Full AI", each delineating appropriate GenAI usage in EAP tasks. We discuss the rationale behind this adaptation, considering the unique needs of language learners and the dual focus of EAP on language proficiency and academic acculturation. This paper explores potential applications of the EAP-AIAS across various EAP assessment types, including writing tasks, presentations, and research projects. By offering a flexible framework, the EAP-AIAS seeks to empower EAP practitioners seeking to deal with the complexities of GenAI integration in education and prepare students for an AI-enhanced academic and professional future. This adaptation represents a step towards addressing the pressing need for ethical and pedagogically sound AI integration in language education.

翻訳日:2024-08-05 14:07:18 公開日:2024-08-02

# 継続学習のための事前学習型テキストエンコーダのセマンティック知識の活用

Exploiting the Semantic Knowledge of Pre-trained Text-Encoders for Continual Learning ( http://arxiv.org/abs/2408.01076v1 )

ライセンス: Link先を確認

Lu Yu, Zhe Tao, Hantao Yao, Joost Van de Weijer, Changsheng Xu,

(参考訳) ディープニューラルネットワーク(DNN)は固定されたデータセットでは優れた性能を発揮するが、現実のシナリオにおける逐次的かつ変化し続けるデータへの対応には苦労する。継続学習は、過去に学習した知識を保持しつつ新しいデータから学習できるようにすることで、この課題に対処する。既存の手法は主に視覚的特徴に依存しており、テキストに符号化された豊富な意味情報を見落としがちである。画像のラベル情報に含まれる意味的知識は、以前に獲得した意味クラスの知識と関連づけることのできる重要な意味情報を提供する。したがって、継続学習全体を通じてこの情報を効果的に活用することは有益であると期待される。そこで本研究では、テキスト埋め込みを用いて意味的類似性を捉えることにより、タスク内およびタスク間で意味的ガイダンスを統合することを提案する。事前学習済みのCLIPモデルを出発点とし、現在のタスクの全クラスに対するソフトアサインメントを行うために\emph{Semantically-guided Representation Learning (SG-RL)}モジュールを用い、知識転移を強化するためにSemantically-guided Knowledge Distillation (SG-KD)モジュールを用いる。実験結果は、一般的なデータセットおよび細粒度データセットにおける本手法の優位性を示している。コードは https://github.com/aprilsveryown/semantically-guided-continual-learning で公開されている。

Deep neural networks (DNNs) excel on fixed datasets but struggle with incremental and shifting data in real-world scenarios. Continual learning addresses this challenge by allowing models to learn from new data while retaining previously learned knowledge. Existing methods mainly rely on visual features, often neglecting the rich semantic information encoded in text. The semantic knowledge available in the label information of the images offers important semantic information that can be related to previously acquired knowledge of semantic classes. Consequently, effectively leveraging this information throughout continual learning is expected to be beneficial. To address this, we propose integrating semantic guidance within and across tasks by capturing semantic similarity using text embeddings. We start from a pre-trained CLIP model, employ the \emph{Semantically-guided Representation Learning (SG-RL)} module for a soft-assignment towards all current task classes, and use the Semantically-guided Knowledge Distillation (SG-KD) module for enhanced knowledge transfer. Experimental results demonstrate the superiority of our method on general and fine-grained datasets. Our code is available at https://github.com/aprilsveryown/semantically-guided-continual-learning.

翻訳日:2024-08-05 14:07:18 公開日:2024-08-02
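
One way to picture the SG-RL soft assignment is the minimal sketch below. It assumes that soft targets are a temperature-scaled softmax over cosine similarities between class-name text embeddings. The helper names, the temperature of 0.1 and the toy 3-d vectors are illustrative assumptions, not the authors' code. Real embeddings would come from CLIP's text encoder.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def semantic_soft_targets(label_idx, class_text_embs, temperature=0.1):
    """Soft target over the current task's classes: a softmax of the cosine
    similarities between the true class's text embedding and every class's."""
    anchor = class_text_embs[label_idx]
    logits = [cosine(anchor, e) / temperature for e in class_text_embs]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy "text embeddings" (hypothetical values standing in for CLIP text features).
class_embs = [[1.0, 0.1, 0.0],   # "cat"
              [0.9, 0.3, 0.0],   # "tiger"
              [0.0, 0.0, 1.0]]   # "truck"
targets = semantic_soft_targets(0, class_embs)
```

In this toy case a "cat" image puts the largest share of its target mass on "cat" and much more on "tiger" than on "truck". Soft targets like these could stand in for one-hot labels in a cross-entropy loss.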

# PhysMamba: リモート生理計測のためのデュアルストリームクロスアテンションSSDの活用

PhysMamba: Leveraging Dual-Stream Cross-Attention SSD for Remote Physiological Measurement ( http://arxiv.org/abs/2408.01077v1 )

ライセンス: Link先を確認

Zhixin Yan, Yan Zhong, Wenjun Zhang, Lin Shu, Hongbin Xu, Wenxiong Kang,

(参考訳) リモート光電容積脈波記録法(Remote Photoplethysmography, rPPG)は、顔の動画から生理信号を抽出する非接触の技術であり、感情モニタリング、医療支援、顔なりすまし対策などの用途に用いられている。管理された実験室環境とは異なり、実環境には動きによるアーチファクトやノイズが含まれることが多く、既存手法の性能に影響を与える。そこで本研究では、Mambaに基づくデュアルストリームの時間-周波数相互作用モデルであるPhysMambaを提案する。PhysMambaは最先端のMamba-2モデルを統合し、デュアルストリームアーキテクチャを用いて多様なrPPG特徴を学習することで、ノイズの多い条件下での頑健性を高める。さらに、2つのストリーム間の情報交換と特徴の相補性を改善するため、Cross-Attention State Space Duality (CASSD)モジュールを設計した。PURE、UBFC-rPPG、MMPDを用いてPhysMambaを検証した。実験の結果、PhysMambaはさまざまなシナリオ、特に複雑な環境において最先端の性能を達成し、実用的な遠隔心拍モニタリングへの応用可能性を示した。

Remote Photoplethysmography (rPPG) is a non-contact technique for extracting physiological signals from facial videos, used in applications like emotion monitoring, medical assistance, and anti-face spoofing. Unlike controlled laboratory settings, real-world environments often contain motion artifacts and noise, affecting the performance of existing methods. To address this, we propose PhysMamba, a dual-stream time-frequency interactive model based on Mamba. PhysMamba integrates the state-of-the-art Mamba-2 model and employs a dual-stream architecture to learn diverse rPPG features, enhancing robustness in noisy conditions. Additionally, we designed the Cross-Attention State Space Duality (CASSD) module to improve information exchange and feature complementarity between the two streams. We validated PhysMamba using PURE, UBFC-rPPG and MMPD. Experimental results show that PhysMamba achieves state-of-the-art performance across various scenarios, particularly in complex environments, demonstrating its potential in practical remote heart rate monitoring applications.

翻訳日:2024-08-05 14:07:18 公開日:2024-08-02
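
The sketch below illustrates what exchanging information between two streams means. It applies plain scaled dot-product cross-attention in both directions. This is a generic stand-in that assumes softmax attention. The paper's CASSD module is built on Mamba-2's state space duality instead, and the feature values here are made up.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention in which the queries come from one stream and
    the keys/values from the other; every argument is a list of equal-length vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output vector is a convex combination of the other stream's values.
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out

# Hypothetical per-frame features from a temporal stream and a spectral stream.
time_stream = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
freq_stream = [[0.5, 0.5], [1.0, -1.0]]
fused_t = cross_attention(time_stream, freq_stream, freq_stream)  # time queries frequency
fused_f = cross_attention(freq_stream, time_stream, time_stream)  # frequency queries time
```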

# FCDFusion: 可視・赤外画像対を融合するための高速・低色偏差手法

FCDFusion: a Fast, Low Color Deviation Method for Fusing Visible and Infrared Image Pairs ( http://arxiv.org/abs/2408.01080v1 )

ライセンス: Link先を確認

Hesong Li, Ying Fu,

(参考訳) 可視・赤外画像融合(VIF)は、可視画像と赤外画像の情報を1枚の融合画像に統合することを目的としている。従来のVIF手法では、元の可視画像の色相と彩度を保持するために色空間変換を用いることが多い。しかし、高速なVIF手法では、この変換が計算の大部分を占め、処理の高速化を妨げるボトルネックとなっている。本稿では、色偏差の少ない高速な融合手法FCDFusionを提案する。本手法はRGB色空間で直接処理を行うことで、色空間変換を用いずに色情報を保持する。また、わずかな追加コストでガンマ補正を組み込み、色とコントラストを迅速に改善できる。融合処理を3次元カラーベクトルに対するスケーリング操作とみなすことで、計算を大幅に簡略化した。理論的解析と実験により、本手法が1ピクセルあたりわずか7 FLOPsで満足のいく結果を得られることを示す。HSV色空間を用いる最先端の高速な色保存手法と比較して、本手法は半分の計算コストでより高いコントラストを実現する。さらに、VIF手法の色保存能力を測定するための新たな指標として色偏差(color deviation)を提案する。この指標はカラーの可視光画像を用いるVIFタスク向けに特別に設計されており、この目的で用いられてきた既存のVIF指標の欠点を克服する。コードは https://github.com/HeasonLee/FCDFusion で公開されている。

Visible and infrared image fusion (VIF) aims to combine information from visible and infrared images into a single fused image. Previous VIF methods usually employ a color space transformation to keep the hue and saturation from the original visible image. However, for fast VIF methods, this operation accounts for the majority of the calculation and is the bottleneck preventing faster processing. In this paper, we propose a fast fusion method, FCDFusion, with little color deviation. It preserves color information without color space transformations, by directly operating in RGB color space. It incorporates gamma correction at little extra cost, allowing color and contrast to be rapidly improved. We regard the fusion process as a scaling operation on 3D color vectors, greatly simplifying the calculations. A theoretical analysis and experiments show that our method can achieve satisfactory results in only 7 FLOPs per pixel. Compared to state-of-the-art fast, color-preserving methods using HSV color space, our method provides higher contrast at only half of the computational cost. We further propose a new metric, color deviation, to measure the ability of a VIF method to preserve color. It is specifically designed for VIF tasks with color visible-light images, and overcomes deficiencies of existing VIF metrics used for this purpose. Our code is available at https://github.com/HeasonLee/FCDFusion.

翻訳日:2024-08-05 14:07:18 公開日:2024-08-02
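
The sketch below shows the idea of treating fusion as a scaling of 3D color vectors. Two parts are assumptions chosen for illustration, not FCDFusion's published formula: the brightness target (the maximum of the visible mean brightness and the IR intensity) and the saturation cap. The angle check is only a proxy for the paper's color-deviation metric and does not reproduce it. The sketch makes a geometric point: multiplying an RGB vector by a positive scalar leaves its direction unchanged, and so its hue and saturation as well.

```python
import math

def fuse_pixel(rgb, ir):
    """Hypothetical fusion rule on one pixel (channels in [0, 1]).

    The visible RGB vector is multiplied by a single positive factor so that its mean
    brightness becomes max(visible brightness, IR intensity); the factor is capped so
    that no channel exceeds 1, because clipping one channel would change the hue."""
    peak = max(rgb)
    if peak <= 0.0:
        # A black pixel has no chromaticity to preserve: fall back to gray at the IR level.
        return [ir, ir, ir]
    brightness = sum(rgb) / 3.0
    scale = max(brightness, ir) / brightness
    scale = min(scale, 1.0 / peak)
    return [scale * c for c in rgb]

def angle_deviation(u, v):
    """Angle (radians) between two RGB vectors; 0 means identical chromaticity.
    An illustrative proxy only, not the paper's color-deviation metric."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    cos = sum(x * y for x, y in zip(u, v)) / (nu * nv)
    return math.acos(max(-1.0, min(1.0, cos)))

vis = [0.2, 0.1, 0.05]           # a dim, reddish visible pixel
fused = fuse_pixel(vis, ir=0.6)  # brighter IR response at the same location
```

This rule is not tuned to any FLOP budget. It only demonstrates why scaling avoids a round trip through HSV: the pixel gets brighter while its angle to the original color stays essentially zero.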

# ヘリカルコンパクト次元を持つモデルにおける位相カシミール効果

Topological Casimir effect in models with helical compact dimensions ( http://arxiv.org/abs/2408.01082v1 )

ライセンス: Link先を確認

R. M. Avagyan, A. A. Saharian, D. H. Simonyan, G. H. Harutyunyan,

(参考訳) 一般的な曲率結合パラメータを持つ荷電スカラー場について、空間次元のヘリカルなコンパクト化が真空状態の局所的性質に与える影響を調べる。背景幾何としては、ヘリカルな周期性条件に現れる座標で張られる部分空間において回転対称性を持つ一般的な幾何を考える。座標変換によって、この問題は同じ局所幾何における標準的な準周期性条件の問題に帰着され、その有効コンパクト化半径はコンパクト次元の長さとヘリシティパラメータによって決まることを示す。一般的な手続きの応用として、ヘリカルなコンパクト次元を持つ局所的なド・ジッター時空を考える。バンチ-デイヴィス真空状態のアダマール関数を用いて、場の二乗、電流密度、エネルギー運動量テンソルの真空期待値を調べる。トポロジカルな寄与を明示的に分離し、宇宙膨張の初期段階と後期段階におけるその漸近的振る舞いを記述する。準周期的条件の問題と比較した重要な違いは、エネルギー運動量テンソルの非ゼロの非対角成分と、非コンパクト次元に沿った電流密度の成分が現れることである。

We investigate the influence of the helical compactification of a spatial dimension on the local properties of the vacuum state for a charged scalar field with a general curvature coupling parameter. A general background geometry is considered with rotational symmetry in the subspace with the coordinates appearing in the helical periodicity condition. It is shown that by a coordinate transformation the problem is reduced to the problem with a standard quasiperiodicity condition in the same local geometry and with the effective compactification radius determined by the length of the compact dimension and the helicity parameter. As an application of the general procedure we consider a locally de Sitter spacetime with a helical compact dimension. By using the Hadamard function for the Bunch-Davies vacuum state, the vacuum expectation values of the field squared, current density, and energy-momentum tensor are studied. The topological contributions are explicitly separated and their asymptotics are described at early and late stages of cosmological expansion. An important difference, compared to the problem with quasiperiodic conditions, is the appearance of the nonzero off-diagonal component of the energy-momentum tensor and of the component of the current density along the noncompact dimension.

翻訳日:2024-08-05 14:07:18 公開日:2024-08-02

# 雑音文脈処理のための検索拡張生成における適応的コントラスト復号法

Adaptive Contrastive Decoding in Retrieval-Augmented Generation for Handling Noisy Contexts ( http://arxiv.org/abs/2408.01084v1 )

ライセンス: Link先を確認

Youna Kim, Hyuhng Joon Kim, Cheonbok Park, Choonghyun Park, Hyunsoo Cho, Junyeob Kim, Kang Min Yoo, Sang-goo Lee, Taeuk Kim,

(参考訳) オープンドメイン質問応答のような知識集約型タスクで大規模言語モデル(LLM)を用いる場合、外部コンテキストは外部知識とLLMのパラメトリック知識との間のギャップを埋めることができる。近年、対照的デコーディング(contrastive decoding)の手法を用いて、LLMのパラメトリック知識よりも文脈知識を増幅する研究が進められている。これらの手法は、関連するコンテキストが与えられた場合には真実に即した応答を生成しうるが、ノイズの多いコンテキストに直面すると脆弱になりやすい。本研究では、先行研究の範囲をノイズの多いコンテキストにまで広げ、コンテキストの影響を効果的に活用するための適応的対照デコーディング(Adaptive Contrastive Decoding, ACD)を提案する。ACDは、オープンドメイン質問応答タスクにおいてベースラインを上回る改善を示し、特に検索拡張生成においてノイズの多いコンテキストに惑わされない頑健性の点で優れている。

When using large language models (LLMs) in knowledge-intensive tasks, such as open-domain question answering, external context can bridge the gap between external knowledge and the LLM's parametric knowledge. Recent research has sought to amplify contextual knowledge over the LLM's parametric knowledge with contrastive decoding approaches. While these approaches can yield truthful responses when relevant context is provided, they are prone to vulnerabilities when faced with noisy contexts. We extend the scope of previous studies to encompass noisy contexts and propose adaptive contrastive decoding (ACD) to leverage contextual influence effectively. ACD demonstrates improvements over baselines in open-domain question answering tasks, especially in robustness, remaining undistracted by noisy contexts in retrieval-augmented generation.

翻訳日:2024-08-05 14:07:18 公開日:2024-08-02
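
A single step of context-aware contrastive decoding can be sketched as below. Logits computed with the retrieved context are contrasted against logits computed without it, using (1 + α)·z_ctx − α·z_no_ctx. The entropy-based rule for α is an assumption made for illustration: it shrinks the weight when the context-conditioned distribution is itself uncertain. The paper's exact adaptive weighting may differ.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def adaptive_contrastive_step(logits_with_ctx, logits_without_ctx):
    """One decoding step. alpha = H(no ctx) / (H(no ctx) + H(ctx)), so alpha shrinks
    when the context-conditioned distribution is uncertain (e.g. a noisy passage)."""
    h_ctx = entropy(softmax(logits_with_ctx))
    h_no = entropy(softmax(logits_without_ctx))
    alpha = h_no / (h_no + h_ctx) if (h_no + h_ctx) > 0 else 0.0
    adjusted = [(1 + alpha) * a - alpha * b
                for a, b in zip(logits_with_ctx, logits_without_ctx)]
    return adjusted, alpha

vocab = ["Paris", "Lyon", "Rome"]
with_ctx = [4.0, 0.5, 0.1]      # the retrieved context points strongly to "Paris"
without_ctx = [1.0, 1.0, 1.0]   # the model alone is unsure
adjusted, alpha = adaptive_contrastive_step(with_ctx, without_ctx)
best = vocab[adjusted.index(max(adjusted))]
```

Here the confident context and the uncertain prior give a large α, so decoding leans on the context. A flat (noisy) context distribution would lower α and fall back toward the model's own logits.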

# 悪天候下での3次元物体検出に対する霧粒子径分布の影響

Effect of Fog Particle Size Distribution on 3D Object Detection Under Adverse Weather Conditions ( http://arxiv.org/abs/2408.01085v1 )

ライセンス: Link先を確認

Ajinkya Shinde, Gaurav Sharma, Manisha Pattanaik, Sri Niwas Singh,

(参考訳) 光スペクトル信号を用いるLiDARベースのセンサーは、自動運転車のシステムにおいて対象物体に関する重要な情報を提供する上で不可欠な役割を果たしている。しかし、大気中に霧が存在すると、システム全体の性能は大きく低下する。本稿では、悪天候下の3次元物体検出における霧粒子径分布の役割を解析する。ミー理論と気象光学距離(MOR)を用いて点群生成のための減衰係数と後方散乱係数を計算し、検出難易度がeasy、medium、hardの各条件下で、Car、Cyclist、Pedestrianの各ケースにおけるシステム全体の精度を解析する。強い移流霧と中程度の移流霧の環境における霧粒子径分布を数学的にモデル化するために、ガンマ分布とユンゲ(べき乗則)分布を用いる。続いて、後方散乱係数の値に基づいてKITTIデータセットを修正し、異なる検出難易度の下でCar、Cyclist、PedestrianのケースについてPV-RCNN++深層ニューラルネットワークモデルを学習させた。結果の解析から、対象物体の寸法の変化、霧環境の性質、検出難易度の上昇に応じてシステムの精度が大きく変動することが示され、Carが約99%と最も高い精度を示し、Pedestrianが約73%と最も低い精度を示した。

LiDAR-based sensors employing optical spectrum signals play a vital role in providing significant information about the target objects in autonomous driving vehicle systems. However, the presence of fog in the atmosphere severely degrades the overall system's performance. This manuscript analyzes the role of fog particle size distributions in 3D object detection under adverse weather conditions. We utilise Mie theory and meteorological optical range (MOR) to calculate the attenuation and backscattering coefficient values for point cloud generation and analyze the overall system's accuracy in Car, Cyclist, and Pedestrian case scenarios under easy, medium and hard detection difficulties. Gamma and Junge (Power-Law) distributions are employed to mathematically model the fog particle size distribution under strong and moderate advection fog environments. Subsequently, we modified the KITTI dataset based on the backscattering coefficient values and used it to train the PV-RCNN++ deep neural network model for the Car, Cyclist, and Pedestrian cases under different detection difficulties. The result analysis shows a significant variation in the system's accuracy with changes in target object dimensionality, the nature of the fog environment, and increasing detection difficulty, with the Car case exhibiting the highest accuracy of around 99% and the Pedestrian case the lowest, around 73%.

翻訳日:2024-08-05 14:07:18 公開日:2024-08-02
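
A few standard formulas relate the quantities named in the abstract. The sketch below uses them under stated assumptions:

- the Koschmieder-type relation σ = ln(1/0.05)/MOR, which follows the WMO definition of MOR at a 5% luminous-flux threshold;
- two-way Beer-Lambert attenuation of a LiDAR return;
- textbook forms of the modified gamma and power-law (Junge-type) size distributions.

All parameter values are illustrative. A Mie-theory computation, as in the paper, would instead integrate the extinction cross-section over the size distribution.

```python
import math

def extinction_from_mor(mor_m):
    """Koschmieder-type relation with the WMO 5 % threshold:
    sigma = ln(1 / 0.05) / MOR (about 3.0 / MOR), in 1/m."""
    return math.log(1 / 0.05) / mor_m

def two_way_transmission(sigma, range_m):
    """Beer-Lambert attenuation of a LiDAR return over the out-and-back path."""
    return math.exp(-2.0 * sigma * range_m)

def modified_gamma(r, a, alpha, b, gamma):
    """Modified gamma size distribution n(r) = a * r**alpha * exp(-b * r**gamma)."""
    return a * r ** alpha * math.exp(-b * r ** gamma)

def junge(r, c, p):
    """Power-law (Junge-type) size distribution n(r) = c * r**(-p)."""
    return c * r ** (-p)

sigma = extinction_from_mor(100.0)       # dense fog, MOR = 100 m (illustrative)
t30 = two_way_transmission(sigma, 30.0)  # return from a target 30 m away
```

With a 100 m MOR, only about 17% of the signal survives the round trip to a target 30 m away. This helps explain why small targets such as pedestrians suffer first.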

# 知識グラフに基づくグラウンディングされたやり取りによる対話の情報ギャップの橋渡し

Bridging Information Gaps in Dialogues With Grounded Exchanges Using Knowledge Graphs ( http://arxiv.org/abs/2408.01088v1 )

ライセンス: Link先を確認

Phillip Schneider, Nektarios Machner, Kristiina Jokinen, Florian Matthes,

(参考訳) 知識モデルは、ドメイン固有の知識を扱う必要のある会話型インタラクションを実現するうえで、対話システムの基盤となる。情報提供型の会話で効果的なコミュニケーションを確保するには、ユーザーの理解とシステムが利用できる知識とを整合させる必要がある。しかし、対話システムは、自然言語での情報の表現のされ方とシステム内部の知識における表現のされ方との間の意味的な不一致から生じる課題に直面することが多い。この問題に対処するために、本研究では、対話参加者間で共有知識を確立することで情報ギャップを埋める仕組みである会話のグラウンディングに対して、大規模言語モデルが持つ可能性を検討する。本アプローチでは、5つの知識ドメインにわたる人間同士の会話にアノテーションを付与し、BridgeKGと呼ぶ新しい対話コーパスを作成する。このデータセットを用いた一連の実験を通じて、グラウンディング行為の分類と、知識グラフ構造内でグラウンディングされた情報項目の特定における大規模言語モデルの能力を実証的に評価する。本研究の知見は、これらのモデルが会話のグラウンディングタスクにおいて文脈内学習をどのように利用するか、また典型的な予測誤りにはどのようなものがあるかについての洞察を与え、後者は難しい対話からの例を用いて示す。最後に、モデルが非構造化の対話発話と構造化された情報項目との間の意味的レイヤーとして知識グラフをどのように扱うかについて論じる。

Knowledge models are fundamental to dialogue systems for enabling conversational interactions, which require handling domain-specific knowledge. Ensuring effective communication in information-providing conversations entails aligning user understanding with the knowledge available to the system. However, dialogue systems often face challenges arising from semantic inconsistencies in how information is expressed in natural language compared to how it is represented within the system's internal knowledge. To address this problem, we study the potential of large language models for conversational grounding, a mechanism to bridge information gaps by establishing shared knowledge between dialogue participants. Our approach involves annotating human conversations across five knowledge domains to create a new dialogue corpus called BridgeKG. Through a series of experiments on this dataset, we empirically evaluate the capabilities of large language models in classifying grounding acts and identifying grounded information items within a knowledge graph structure. Our findings offer insights into how these models use in-context learning for conversational grounding tasks and common prediction errors, which we illustrate with examples from challenging dialogues. We discuss how the models handle knowledge graphs as a semantic layer between unstructured dialogue utterances and structured information items.

翻訳日:2024-08-05 14:07:18 公開日:2024-08-02

# ユニバーサルドメイン適応のためのプロトタイプ部分最適輸送

Prototypical Partial Optimal Transport for Universal Domain Adaptation ( http://arxiv.org/abs/2408.01089v1 )

ライセンス: Link先を確認

Yucheng Yang, Xiang Gu, Jian Sun,

(参考訳) ユニバーサルドメイン適応(UniDA)は、両ドメインに同じラベル集合を要求することなく、ラベル付きのソースドメインからラベルなしのターゲットドメインへ知識を転移することを目的としている。ドメインシフトとカテゴリシフトが存在するためタスクは難しく、ドメインギャップを縮小する前に、両ドメインにおいて「既知」サンプル(ラベルが両方のドメインに存在するサンプル)と「未知」サンプル(ラベルが一方のドメインにのみ存在するサンプル)を区別する必要がある。本稿では、2つの分布を部分的に整合させるだけでよいという分布マッチングの観点からこの問題を考察する。UniDAのための部分的な分布整合を行う手法として、ミニバッチ・プロトタイプ部分最適輸送(m-PPOT)と呼ぶ新しい手法を提案する。学習フェーズでは、m-PPOTを最小化することに加えて、m-PPOTの輸送計画を利用してソースのプロトタイプとターゲットのサンプルに再重み付けを行い、「既知」サンプルと「未知」サンプルを区別するために再重み付きエントロピー損失と再重み付き交差エントロピー損失を設計する。4つのベンチマークでの実験により、提案手法が従来の最先端のUniDA手法を上回ることを示す。

Universal domain adaptation (UniDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain without requiring the same label sets of both domains. The existence of domain and category shift makes the task challenging and requires us to distinguish "known" samples (i.e., samples whose labels exist in both domains) and "unknown" samples (i.e., samples whose labels exist in only one domain) in both domains before reducing the domain gap. In this paper, we consider the problem from the point of view of distribution matching, in which we only need to partially align the two distributions. A novel approach, dubbed mini-batch Prototypical Partial Optimal Transport (m-PPOT), is proposed to conduct partial distribution alignment for UniDA. In the training phase, besides minimizing m-PPOT, we also leverage the transport plan of m-PPOT to reweight source prototypes and target samples, and design a reweighted entropy loss and a reweighted cross-entropy loss to distinguish "known" and "unknown" samples. Experiments on four benchmarks show that our method outperforms the previous state-of-the-art UniDA methods.

翻訳日:2024-08-05 14:07:18 公開日:2024-08-02
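
Partial optimal transport moves only a fraction of the total mass. The sketch below uses the entropic formulation, solved by iterative KL projections with Dykstra corrections, similar to the scheme used in POT's `ot.partial.entropic_partial_wasserstein`. This is a generic solver, not m-PPOT itself. The two prototypes, three target samples, costs and transported mass of 0.8 are made-up values. The reweighting shown (received mass divided by sample mass) only illustrates how a transport plan can flag likely "unknown" samples.

```python
import math

def entropic_partial_ot(a, b, cost, mass, reg=0.1, iters=1000):
    """Entropic partial OT via iterative KL projections with Dykstra corrections.
    Returns a plan P whose row sums are <= a, column sums are <= b, and whose
    total mass equals `mass` (the last projection enforces it exactly)."""
    n, m = len(a), len(b)
    P = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    total = sum(map(sum, P))
    P = [[p * mass / total for p in row] for row in P]
    q = [[[1.0] * m for _ in range(n)] for _ in range(3)]  # Dykstra correction terms

    def row_cap(M):  # KL projection onto {P 1 <= a}
        s = [sum(r) for r in M]
        return [[M[i][j] * min(a[i] / s[i], 1.0) for j in range(m)] for i in range(n)]

    def col_cap(M):  # KL projection onto {P^T 1 <= b}
        s = [sum(M[i][j] for i in range(n)) for j in range(m)]
        return [[M[i][j] * min(b[j] / s[j], 1.0) for j in range(m)] for i in range(n)]

    def fix_mass(M):  # KL projection onto {sum(P) = mass}
        s = sum(map(sum, M))
        return [[x * mass / s for x in r] for r in M]

    for _ in range(iters):
        for k, project in enumerate((row_cap, col_cap, fix_mass)):
            prev = P
            corrected = [[P[i][j] * q[k][i][j] for j in range(m)] for i in range(n)]
            P = project(corrected)
            q[k] = [[q[k][i][j] * prev[i][j] / P[i][j] for j in range(m)] for i in range(n)]
    return P

prototypes_mass = [0.5, 0.5]       # two source-class prototypes
targets_mass = [0.4, 0.4, 0.2]     # three target samples
cost = [[0.0, 1.0, 1.0],
        [1.0, 0.0, 1.0]]           # target sample 2 is far from both prototypes
P = entropic_partial_ot(prototypes_mass, targets_mass, cost, mass=0.8)
target_weights = [sum(P[i][j] for i in range(2)) / targets_mass[j] for j in range(3)]
```

The distant third sample receives almost no mass, so its weight is near zero, while the two matched samples get weights close to 1.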

# ニューロモルフィックプリミティブを用いた汎用データフローモデル

General-purpose Dataflow Model with Neuromorphic Primitives ( http://arxiv.org/abs/2408.01090v1 )

ライセンス: Link先を確認

Weihao Zhang, Yu Du, Hongyi Li, Songchen Ma, Rong Zhao,

(参考訳) ニューロモルフィックコンピューティングは、ニューラルネットワーク以外のさまざまな応用においても高い性能上の利点をもたらす大きな可能性を示している。しかし、プログラムの汎用性とニューロモルフィックハードウェアの効率性との間のギャップを埋めるには、ニューロモルフィックコンピューティングの特徴に適合した汎用プログラム実行モデルが必要である。データフローモデルは有力な解決策となりうるが、制御フローを含むプログラムを扱う際にはグラフの複雑さが増し、ニューロモルフィックハードウェアとの互換性も失われるため、プログラマビリティと性能が低下する。本稿では、ニューロモルフィックハードウェアに適したデータフローモデルとして、制御ロジックに対してコンパクトかつ簡潔で、ニューロモルフィックハードウェアと互換性のあるプログラム表現を提供するニューロモルフィック・データフローを提案する。ニューロモルフィック・データフローは、制御の捉え方を再構成する「when」と「where」のプリミティブを導入し、スパイキングアルゴリズムから受け継いだ可塑性とともに、これらのプリミティブをデータフロースキーマに埋め込む。本手法により、プログラマビリティと可塑性の両方を備えた汎用プログラムをニューロモルフィックハードウェア上に展開でき、ハードウェアの潜在能力を最大限に活用できる。

Neuromorphic computing exhibits great potential to provide high-performance benefits in various applications beyond neural networks. However, a general-purpose program execution model that aligns with the features of neuromorphic computing is required to bridge the gap between program versatility and neuromorphic hardware efficiency. The dataflow model offers a potential solution, but it faces high graph complexity and incompatibility with neuromorphic hardware when dealing with control flow programs, which decreases the programmability and performance. Here, we present a dataflow model tailored for neuromorphic hardware, called neuromorphic dataflow, which provides a compact, concise, and neuromorphic-compatible program representation for control logic. The neuromorphic dataflow introduces "when" and "where" primitives, which restructure the view of control. The neuromorphic dataflow embeds these primitives in the dataflow schema with the plasticity inherited from the spiking algorithms. Our method enables the deployment of general-purpose programs on neuromorphic hardware with both programmability and plasticity, while fully utilizing the hardware's potential.

翻訳日:2024-08-05 14:07:18 公開日:2024-08-02

# 不協和を解剖する: 自己矛盾する指示に対する大規模マルチモーダルモデルのベンチマーク

Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions ( http://arxiv.org/abs/2408.01091v1 )

ライセンス: Link先を確認

Jin Gao, Lei Gan, Yuankai Li, Yixin Ye, Dequan Wang,

(参考訳) 大規模マルチモーダルモデル(LMM)は、人間の指示に従うことに優れている。しかし、マルチモーダルなインタラクションと文脈長の増加という傾向により、自己矛盾する指示が生じうる。これは言語の初学者や社会的に弱い立場にある人々にとって特に難しい問題である。本研究では、矛盾する命令を認識するLMMの能力を評価するために、Self-Contradictory Instructionsベンチマークを導入する。このベンチマークは、言語とビジョンのパラダイムに均等に分布する20,000件の矛盾から構成される。ベンチマークは新しい自動データセット作成フレームワークによって構築されており、このフレームワークは作成プロセスを迅速化し、幅広い形式の指示を網羅することを可能にする。包括的な評価の結果、現在のLMMは自己認識の欠如により、マルチモーダルな指示の不一致を一貫して見抜けないことが明らかになった。そこで、外部から認知を注入するCognitive Awakening Promptingを提案し、不協和の検出を大きく向上させる。データセットとコードは https://selfcontradiction.github.io/ で公開されている。

Large multimodal models (LMMs) excel in adhering to human instructions. However, self-contradictory instructions may arise due to the increasing trend of multimodal interaction and context length, which is challenging for language beginners and vulnerable populations. We introduce the Self-Contradictory Instructions benchmark to evaluate the capability of LMMs in recognizing conflicting commands. It comprises 20,000 conflicts, evenly distributed between language and vision paradigms. It is constructed by a novel automatic dataset creation framework, which expedites the process and enables us to encompass a wide range of instruction forms. Our comprehensive evaluation reveals that current LMMs consistently struggle to identify multimodal instruction discordance due to a lack of self-awareness. Hence, we propose Cognitive Awakening Prompting to inject cognition from external sources, greatly enhancing dissonance detection. The dataset and code are available at https://selfcontradiction.github.io/.

翻訳日:2024-08-05 14:07:18 公開日:2024-08-02

# 例外点における創発的非エルミート保存法

Emergent non-Hermitian conservation laws at exceptional points ( http://arxiv.org/abs/2408.01092v1 )

ライセンス: Link先を確認

Zuo Wang, Liang He,

(参考訳) 非エルミート系は、その例外点(EP)において豊かな静的・動的性質を示しうる。本研究では、EPに依存するさらに別の種類の特徴的な現象、すなわち一連の非エルミート保存則の出現を見いだす。これらの現象を非エルミートなハイゼンベルク鎖において具体的に示し、EPに現れる非エルミート保存則を特定するための一般理論を定式化する。EPにおける運動の定数と、対応する補助的なエルミート系における運動の定数との間に一対一の対応を確立することで、その物理的起源を補助系における創発的対称性の存在にまでさかのぼる。量子回路上での具体的なシミュレーションにより、これらの創発的な保存ダイナミクスが現在のデジタル量子コンピューティングシステムで容易に観測できることを示す。

Non-Hermitian systems can manifest rich static and dynamical properties at their exceptional points (EPs). Here, we identify yet another class of distinct phenomena that is hinged on EPs, namely, the emergence of a series of non-Hermitian conservation laws. We demonstrate these distinct phenomena concretely in the non-Hermitian Heisenberg chain and formulate a general theory for identifying these emergent non-Hermitian conservation laws at EPs. By establishing a one-to-one correspondence between the constants of motion at EPs and those in corresponding auxiliary Hermitian systems, we trace their physical origin back to the presence of emergent symmetries in the auxiliary systems. Concrete simulations on quantum circuits show that these emergent conserved dynamics can be readily observed in current digital quantum computing systems.

翻訳日:2024-08-05 14:07:18 公開日:2024-08-02
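
The Heisenberg-chain result cannot be reproduced in a few lines. What an exceptional point is, however, can be checked on the textbook two-site PT-symmetric dimer H = [[iγ, κ], [κ, −iγ]], whose eigenvalues are ±√(κ² − γ²). This example is not taken from the paper. It only shows that at γ = κ the two eigenvalues coalesce and H becomes nilpotent (H² = 0), i.e. non-diagonalizable.

```python
import cmath

def pt_dimer(kappa, gamma):
    """Two-site PT-symmetric Hamiltonian with gain/loss rate gamma and coupling kappa."""
    return [[1j * gamma, kappa], [kappa, -1j * gamma]]

def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def eigenvalues2(H):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = H[0][0] + H[1][1]
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    disc = cmath.sqrt(tr * tr / 4 - det)
    return tr / 2 + disc, tr / 2 - disc

H_ep = pt_dimer(1.0, 1.0)                    # gamma == kappa: the exceptional point
square = matmul2(H_ep, H_ep)                 # nilpotent: every entry is zero
unbroken = eigenvalues2(pt_dimer(1.0, 0.5))  # real pair +-sqrt(0.75)
broken = eigenvalues2(pt_dimer(1.0, 2.0))    # purely imaginary pair +-i*sqrt(3)
```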

# バイエンコーダニューラルサーチにおける符号化-探索分離視点

An Encoding--Searching Separation Perspective on Bi-Encoder Neural Search ( http://arxiv.org/abs/2408.01094v1 )

ライセンス: Link先を確認

Hung-Nghiep Tran, Akiko Aizawa, Atsuhiro Takasu,

(参考訳) 本稿では、ニューラル検索のためのバイエンコーダ・アーキテクチャを概観・分析し、その新しい視点を提案する。バイエンコーダ・アーキテクチャは、テスト時の単純さとスケーラビリティから広く用いられているが、学習時に見たデータセットでの性能の低さや、新しいデータセットでのゼロショット性能の弱さといった顕著な問題がある。本稿ではこれらの問題を分析し、符号化における情報ボトルネックの問題と、埋め込み検索の基本的仮定の限界という2つの主要な批判にまとめる。次に、符号化と検索の操作を論理的に分析する思考実験を構築し、埋め込み検索の基本的仮定に疑問を投げかける。これらの観察に基づき、符号化と検索の操作を概念的にも実践的にも分離する、\textit{encoding--searching separation}(符号化-検索分離)と呼ぶバイエンコーダ・アーキテクチャの新しい視点を提案する。この新しい視点を適用して、特定された問題の根本原因を説明し、問題を緩和する方法を議論する。最後に、新しい視点の根底にある考え方の含意、それが明らかにする設計空間、そこから生じる潜在的な研究の方向性について論じる。

This paper reviews, analyzes, and proposes a new perspective on the bi-encoder architecture for neural search. While the bi-encoder architecture is widely used due to its simplicity and scalability at test time, it has some notable issues such as low performance on seen datasets and weak zero-shot performance on new datasets. In this paper, we analyze these issues and summarize two main critiques: the encoding information bottleneck problem and limitations of the basic assumption of embedding search. We then construct a thought experiment to logically analyze the encoding and searching operations and challenge the basic assumption of embedding search. Building on these observations, we propose a new perspective on the bi-encoder architecture called the \textit{encoding--searching separation} perspective, which conceptually and practically separates the encoding and searching operations. This new perspective is applied to explain the root cause of the identified issues and discuss ways to mitigate the problems. Finally, we discuss the implications of the ideas underlying the new perspective, the design surface that it exposes and the potential research directions arising from it.

翻訳日:2024-08-05 14:07:18 公開日:2024-08-02
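
A minimal bi-encoder sketch makes the encoding/searching split discussed above concrete. Here `encode` is a hypothetical stand-in for a trained neural encoder: an L2-normalized bag-of-words vector. What matters is the structure. Documents are encoded once without seeing the query, and search reduces to inner products between independently computed vectors.

```python
import math
from collections import Counter

def encode(text):
    """Hypothetical stand-in encoder: L2-normalised bag-of-words vector (as a dict).
    A trained bi-encoder would map text to a dense vector instead."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()}

def dot(u, v):
    return sum(w * v.get(tok, 0.0) for tok, w in u.items())

def build_index(docs):
    # Encoding: each document is embedded once, independently of any query.
    return [encode(d) for d in docs]

def search(query, index, k=1):
    # Searching: the query is embedded separately, then compared by inner product.
    q = encode(query)
    scores = [dot(q, d) for d in index]
    return sorted(range(len(index)), key=lambda i: scores[i], reverse=True)[:k]

docs = ["fog degrades lidar point clouds",
        "court music of the fifteenth century",
        "partial optimal transport for domain adaptation"]
index = build_index(docs)
top = search("lidar in fog", index)
```

Because documents never interact with the query during encoding, everything a query might need has to survive in one fixed vector per document. This is the information-bottleneck concern the abstract raises.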

# 6つのドラゴンが再び飛ぶ: トランスフォーマーと新しいエンコーディングで15世紀の韓国の宮廷音楽が復活

Six Dragons Fly Again: Reviving 15th-Century Korean Court Music with Transformers and Novel Encoding ( http://arxiv.org/abs/2408.01096v1 )

ライセンス: Link先を確認

Danbinaerin Han, Mark Gotham, Dongmin Kim, Hannah Park, Sihun Lee, Dasaem Jeong,

(参考訳) 本稿では、詩『龍飛御天歌』に基づいて作曲された15世紀の朝鮮の宮廷音楽「致和平(Chihwapyeong)」と「醉豊亨(Chwipunghyeong)」を復活させるプロジェクトを紹介する。これらは韓国の記譜法である井間譜(Jeongganbo)の最も初期の例の一つであるが、現存する版は簡素な旋律のみからなる。国立国楽院(National Gugak Center)の委託を受けた我々の研究チームは、この古い旋律を6パートのアンサンブルで演奏可能な編曲へと変換することを目指した。独自に開発した光学楽譜認識によって取得した井間譜データを用いて、BERTのようなマスク言語モデルとエンコーダ・デコーダ型のTransformerモデルを学習させた。また、井間譜の構造に厳密に従い、音の長さを位置として表す符号化方式を提案する。機械によって変換された致和平と醉豊亨は、専門家による評価を受け、国立国楽院の正楽団によって演奏された。本研究は、慎重な設計と組み合わせれば、限られた学習データでも生成モデルを伝統音楽に適用できることを示している。

We introduce a project that revives a piece of 15th-century Korean court music, Chihwapyeong and Chwipunghyeong, composed upon the poem Songs of the Dragon Flying to Heaven. The surviving version of this piece, one of the earliest examples of Jeongganbo, a Korean musical notation system, consists only of a rudimentary melody. Our research team, commissioned by the National Gugak (Korean Traditional Music) Center, aimed to transform this old melody into a performable arrangement for a six-part ensemble. Using Jeongganbo data acquired through bespoke optical music recognition, we trained a BERT-like masked language model and an encoder-decoder transformer model. We also propose an encoding scheme that strictly follows the structure of Jeongganbo and denotes note durations as positions. The resulting machine-transformed versions of Chihwapyeong and Chwipunghyeong were evaluated by experts and performed by the Court Music Orchestra of the National Gugak Center. Our work demonstrates that generative models can successfully be applied to traditional music with limited training data if combined with careful design.

翻訳日:2024-08-05 14:07:18 公開日:2024-08-02
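
The sketch below illustrates denoting durations as positions under assumed conventions:

- each jeonggan (box) counts as one beat;
- a note token records its jeonggan index and its slot within a box divided into equal slots;
- each note lasts until the next onset.

This token layout is hypothetical and is not the authors' actual encoding. The pitch names (黃, 太, 仲, 林) are only illustrative.

```python
from fractions import Fraction

def positions_to_durations(tokens, total_beats):
    """tokens: list of (pitch, jeonggan_index, slot, slots_in_jeonggan), in time order.
    Onset in beats = jeonggan_index + slot / slots_in_jeonggan (hypothetical layout).
    Each note lasts until the next onset, or until the end of the passage."""
    onsets = [Fraction(j) + Fraction(s, n) for _, j, s, n in tokens]
    ends = onsets[1:] + [Fraction(total_beats)]
    return [(tok[0], on, end - on) for tok, on, end in zip(tokens, onsets, ends)]

tokens = [("黃", 0, 0, 1),   # fills jeonggan 0
          ("太", 1, 0, 2),   # first half of jeonggan 1
          ("仲", 1, 1, 2),   # second half of jeonggan 1
          ("林", 2, 0, 1)]   # starts at jeonggan 2 and runs to the end
notes = positions_to_durations(tokens, total_beats=4)
```

Storing positions rather than explicit durations mirrors how the notation itself works: a note's length is implied by where the next symbol sits in the grid.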

# 実画像復元のための事前学習モデルによるコントリビューションに基づく低ランク適応

Contribution-based Low-Rank Adaptation with Pre-training Model for Real Image Restoration ( http://arxiv.org/abs/2408.01099v1 )

ライセンス: Link先を確認

Donwon Park, Hayeon Kim, Se Young Chun,

(参考訳) 近年、事前学習モデルと効率的なパラメータチューニングは、マスクドモデリングとプロンプトチューニングの助けを借りて、自然言語処理や高レベルのコンピュータビジョンで顕著な成功を収めている。しかし、低レベルのコンピュータビジョンでは、事前学習モデルに関する調査は限られており、AIエッジデバイスに新しいタスクを統合する際のメモリ肥大化問題の緩和など、さまざまな実世界のタスクにおける重要性と利点があるにもかかわらず、効率的なファインチューニング戦略すらまだ検討されていない。本稿では、複数の画像復元のための新しい効率的なパラメータチューニング手法であるコントリビューションベース低ランク適応(CoLoRA)と、ランダム順序劣化による効果的な事前学習手法(PROD)を提案する。すべてのネットワークパラメータをチューニングする従来手法とは異なり、CoLoRAは新しいビジョンタスクごとにLoRA(低ランク適応)を活用し、コントリビューションに基づく手法によってそのタスクに対する各層の容量を適応的に決定することで、少量のパラメータを効果的にファインチューニングし、フルチューニングに匹敵する性能を実現する。さらに、PRODにより、事前学習モデルの性能を向上させるとともに、合成データによる事前学習と実世界でのファインチューニングとの橋渡しをする頑健性を高めることができる。PRODを併用したCoLoRAは、既知のタスクと新規タスクについて、合成データセットと実世界データセットの両方で、多様な劣化タイプにわたるさまざまな画像復元タスクにおいて優れた性能を示した。

Recently, pre-trained models and efficient parameter tuning have achieved remarkable success in natural language processing and high-level computer vision with the aid of masked modeling and prompt tuning. In low-level computer vision, however, there have been limited investigations on pre-trained models, and even an efficient fine-tuning strategy has not yet been explored despite its importance and benefit in various real-world tasks such as alleviating the memory inflation issue when integrating new tasks on AI edge devices. Here, we propose a novel efficient parameter tuning approach dubbed contribution-based low-rank adaptation (CoLoRA) for multiple image restorations, along with an effective pre-training method with random order degradations (PROD). Unlike prior methods that tune all network parameters, our CoLoRA effectively fine-tunes a small number of parameters by leveraging LoRA (low-rank adaptation) for each new vision task, with our contribution-based method adaptively determining layer-by-layer capacity for that task, to yield performance comparable to full tuning. Furthermore, our PROD strategy allows extending the capability of pre-trained models with improved performance as well as robustness to bridge synthetic pre-training and real-world fine-tuning. Our CoLoRA with PROD has demonstrated its superior performance in various image restoration tasks across diverse degradation types on both synthetic and real-world datasets for known and novel tasks.

翻訳日:2024-08-05 13:57:23 公開日:2024-08-02
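
CoLoRA builds on the LoRA update h = Wx + (α/r)·B(Ax), where the pretrained W stays frozen and only the low-rank factors A and B are trained. The sketch below implements that forward pass. As a hypothetical stand-in for the contribution-based capacity rule, it splits a rank budget across layers in proportion to per-layer contribution scores. How CoLoRA actually measures contribution is not reproduced here.

```python
def matvec(M, x):
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha):
    """h = W x + (alpha / r) * B (A x); W (d_out x d_in) stays frozen,
    A (r x d_in) and B (d_out x r) are the trainable low-rank factors."""
    r = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [h + (alpha / r) * u for h, u in zip(base, update)]

def allocate_ranks(contributions, total_rank):
    """Hypothetical rule: split a LoRA rank budget across layers in proportion to
    per-layer contribution scores, using largest-remainder rounding."""
    s = sum(contributions)
    raw = [c * total_rank / s for c in contributions]
    ranks = [int(x) for x in raw]
    leftover = total_rank - sum(ranks)
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - ranks[i], reverse=True)
    for i in by_remainder[:leftover]:
        ranks[i] += 1
    return ranks

W = [[1.0, 2.0], [3.0, 4.0]]   # frozen pretrained weight
A = [[0.5, 0.5]]               # rank-1 down-projection
B_init = [[0.0], [0.0]]        # B starts at zero, so the adapter is initially a no-op
h0 = lora_forward(W, A, B_init, [1.0, 1.0], alpha=2.0)
ranks = allocate_ranks([5, 3, 2], total_rank=16)
```

Initializing B to zero is the usual LoRA choice. It makes the adapted network start exactly at the pretrained one, so only layers that receive rank (and gradient signal) move away from it.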

# ハイブリッド量子ソフトウェアにおける解析可能性モデルの検証

Validation of an Analysability Model in Hybrid Quantum Software ( http://arxiv.org/abs/2408.01105v1 )

ライセンス: Link先を確認

Díaz-Muñoz Ana, Cruz-Lemus José A., Rodríguez Moisés, Piattini Mario, Baldassarre Maria Teresa,

(参考訳) 量子古典ハイブリッドコンピューティングの文脈では、ソフトウェアの理解と修正の容易さである解析可能性を評価することは、量子アルゴリズムの複雑さと新規性のために大きな課題となる。量子ソフトウェア開発は進歩してきたものの、標準的なソフトウェア品質評価手法は量子コンポーネントの特性に十分に対応しておらず、ハイブリッドソフトウェア製品の品質を確保し維持する能力にギャップが生じている。本登録報告(registered report)の提案では、イタリアとスペインの学術機関が参加する国際共同アプローチのもと、統制実験を通じて、ハイブリッドソフトウェアの解析可能性に焦点を当てた品質モデルを検証することを目指す。このアプローチにより、より詳細な分析と検証の方法論が可能となり、量子コンピューティングにおけるソフトウェア品質評価の将来の研究開発のための枠組みが確立される。

In the context of quantum-classical hybrid computing, evaluating analysability, which is the ease of understanding and modifying software, presents significant challenges due to the complexity and novelty of quantum algorithms. Although advances have been made in quantum software development, standard software quality evaluation methods do not fully address the specifics of quantum components, resulting in a gap in the ability to ensure and maintain the quality of hybrid software products. In this registered report proposal, we intend to validate a quality model focused on the analysability of hybrid software through an international collaborative approach involving academic institutions from Italy and Spain through a controlled experiment. This approach allows for a more detailed analysis and validation methodology and establishes a framework for future research and developments in software quality assessment in quantum computing.

翻訳日:2024-08-05 13:57:23 公開日:2024-08-02

# BioRAG: 生物学的質問応答のためのRAG-LLMフレームワーク

BioRAG: A RAG-LLM Framework for Biological Question Reasoning ( http://arxiv.org/abs/2408.01107v1 )

ライセンス: Link先を確認

Chengrui Wang, Qingqing Long, Xiao Meng, Xunxin Cai, Chengjun Wu, Zhen Meng, Xuezhi Wang, Yuanchun Zhou,

(参考訳) 生命科学研究のための質問答えシステムは、発見の急激なペース、洞察の進化、知識エンティティ間の複雑な相互作用を特徴とし、総合的な知識倉庫と正確な情報検索を維持する上で、ユニークな課題を提示する。このような問題に対処するために,我々は,Large Language Models (LLMs) フレームワークを備えた新しいレトリーバル拡張生成(RAG)であるBioRAGを紹介した。このアプローチは、基本的な知識として2200万の科学論文を解析、索引付け、セグメント化することから始まり、続いて、このドメインに適した特別な埋め込みモデルをトレーニングします。さらに、各クエリとコンテキスト間の複雑な相互関係のモデル化を支援するドメイン固有の知識階層を組み込むことで、ベクトル検索プロセスを強化する。最新の情報を必要とするクエリに対して、BioRAGは質問を分解し、検索エンジンに組み込まれた反復的な検索プロセスを用いてステップバイステップの推論を行う。厳密な実験により、我々のモデルは、複数のライフサイエンス質問応答タスクにおいて、微調整 LLM や LLM 、検索エンジン、その他の科学的RAG フレームワークよりも優れていることが示された。

Question answering for life science research, a field characterized by the rapid pace of discovery, evolving insights, and complex interactions among knowledge entities, presents unique challenges in maintaining a comprehensive knowledge warehouse and accurate information retrieval. To address these issues, we introduce BioRAG, a novel Retrieval-Augmented Generation (RAG) framework built on Large Language Models (LLMs). Our approach starts with parsing, indexing, and segmenting an extensive collection of 22 million scientific papers as the basic knowledge, followed by training a specialized embedding model tailored to this domain. Additionally, we enhance the vector retrieval process by incorporating a domain-specific knowledge hierarchy, which aids in modeling the intricate interrelationships among each query and context. For queries requiring the most current information, BioRAG deconstructs the question and employs an iterative retrieval process incorporated with the search engine for step-by-step reasoning. Rigorous experiments have demonstrated that our model outperforms fine-tuned LLM, LLM with search engines, and other scientific RAG frameworks across multiple life science question-answering tasks.

翻訳日:2024-08-05 13:57:23 公開日:2024-08-02

# セマンティック・シンボリック環境におけるエピステミック・アンサンブル(証明付き拡張版)

Epistemic Ensembles in Semantic and Symbolic Environments (Extended Version with Proofs) ( http://arxiv.org/abs/2408.01115v1 )

ライセンス: Link先を確認

Rolf Hennicker, Alexander Knapp, Martin Wirsing,

(参考訳) エピステミック・アンサンブル(epistemic ensemble)は、自分自身や仲間に関する知識や信念を検索し共有することのできる知識ベースのエージェントから構成される。これらのエージェントはグローバルな知識状態にアクセスし、行動を用いて通信・協調し、集合的な知識状態を変化させる。本研究では、共通の構文的な操作的アンサンブル意味論に基づく、エピステミック・アンサンブルのための2種類の数学的意味論を考察する。すなわち、大域的な認識論的状態のクラスによって定義される意味的環境と、認識論的論理式の集合からなる記号的環境である。これらの環境を関連づけるために、Φ同値性の概念を用いる。認識論的状態のクラスと知識ベースがΦ同値であるとは、Φの任意の論理式について、それが認識論的状態のクラスにおいて成り立つことと、それが知識ベースの要素であることが同値であることをいう。主定理は、Φ同値な構成が互いをシミュレートし、同じ動的エピステミック・アンサンブル論理式を満たすことを示す。

An epistemic ensemble is composed of knowledge-based agents capable of retrieving and sharing knowledge and beliefs about themselves and their peers. These agents access a global knowledge state and use actions to communicate and cooperate, altering the collective knowledge state. We study two types of mathematical semantics for epistemic ensembles based on a common syntactic operational ensemble semantics: a semantic environment defined by a class of global epistemic states, and a symbolic environment consisting of a set of epistemic formulae. For relating these environments, we use the concept of Φ-equivalence, where a class of epistemic states and a knowledge base are Φ-equivalent if any formula of Φ holds in the class of epistemic states if, and only if, it is an element of the knowledge base. Our main theorem shows that Φ-equivalent configurations simulate each other and satisfy the same dynamic epistemic ensemble formulae.

Translation date: 2024-08-05 13:57:23, Publication date: 2024-08-02

# IAI Group at CheckThat! 2024: Transformer Models and Data Augmentation for Checkworthy Claim Detection

IAI Group at CheckThat! 2024: Transformer Models and Data Augmentation for Checkworthy Claim Detection ( http://arxiv.org/abs/2408.01118v1 )

License: see link

Peter Røysland Aarnes, Vinay Setty, Petra Galuščáková,


This paper describes the IAI group's participation in automated check-worthiness estimation of claims, within the framework of the 2024 CheckThat! Lab "Task 1: Check-Worthiness Estimation". The task involves the automated detection of check-worthy claims in English, Dutch, and Arabic political debates and Twitter data. We utilized various pre-trained generative decoder and encoder transformer models, employing methods such as few-shot chain-of-thought reasoning, fine-tuning, data augmentation, and transfer learning from one language to another. Although performance varied, our models achieved notable placements on the organizer's leaderboard: ninth-best in English, third-best in Dutch, and the top placement in Arabic, using multilingual datasets to enhance the generalizability of check-worthiness detection. Despite a significant drop in performance on the unlabeled test dataset compared to the development test dataset, our findings contribute to ongoing efforts in claim detection research, highlighting the challenges and potential of language-specific adaptations in claim verification systems.

Translation date: 2024-08-05 13:57:23, Publication date: 2024-08-02

# Task Prompt Vectors: Effective Initialization through Multi-Task Soft-Prompt Transfer

Task Prompt Vectors: Effective Initialization through Multi-Task Soft-Prompt Transfer ( http://arxiv.org/abs/2408.01119v1 )

License: see link

Robert Belanec, Simon Ostermann, Ivan Srba, Maria Bielikova,


Prompt tuning is a modular and efficient solution for training large language models (LLMs). One of its main advantages is task modularity, which makes it suitable for multi-task problems. However, current soft-prompt-based methods often sacrifice multi-task modularity, requiring the training process to be fully or partially repeated for each newly added task. While recent work on task vectors applied arithmetic operations to full model weights to achieve the desired multi-task performance, a similar approach for soft prompts is still missing. To this end, we introduce Task Prompt Vectors, computed as the element-wise difference between the weights of tuned soft prompts and their random initialization. Experimental results on 12 NLU datasets show that task prompt vectors can be used in low-resource settings to effectively initialize prompt tuning on similar tasks. In addition, we show that task prompt vectors are independent of the random initialization of prompt tuning. This enables prompt arithmetic with pre-trained vectors from different tasks. In this way, by arithmetically adding task prompt vectors from multiple tasks, we are able to outperform a state-of-the-art baseline in some cases.
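Here is a minimal sketch of the vector arithmetic described above, assuming a soft prompt is a plain `num_tokens x dim` list of floats. It uses simple offsets as stand-ins for real prompt tuning. Adding the combined vector to a random initialization is an assumption about how the vectors would be consumed, and all function names are hypothetical.

```python
import random
from typing import List

Prompt = List[List[float]]  # soft prompt: num_tokens x embedding_dim


def random_prompt(num_tokens: int, dim: int, seed: int) -> Prompt:
    """Gaussian random initialization of a soft prompt (illustrative only)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_tokens)]


def task_prompt_vector(tuned: Prompt, init: Prompt) -> Prompt:
    """Element-wise difference between tuned soft-prompt weights and their initialization."""
    return [[t - i for t, i in zip(t_row, i_row)] for t_row, i_row in zip(tuned, init)]


def add_vectors(*vectors: Prompt) -> Prompt:
    """Prompt arithmetic: element-wise sum of several task prompt vectors."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*vectors)]


def apply_vector(init: Prompt, vector: Prompt) -> Prompt:
    """Initialize a prompt for a new task by adding a (combined) task vector."""
    return [[i + v for i, v in zip(i_row, v_row)] for i_row, v_row in zip(init, vector)]


init = random_prompt(num_tokens=2, dim=3, seed=0)
tuned_a = [[w + 0.5 for w in row] for row in init]  # stand-in for a prompt tuned on task A
tuned_b = [[w - 0.2 for w in row] for row in init]  # stand-in for a prompt tuned on task B
combined = add_vectors(task_prompt_vector(tuned_a, init), task_prompt_vector(tuned_b, init))
new_init = apply_vector(random_prompt(2, 3, seed=1), combined)  # starting point for a new task
```

Because each vector is a difference from its own initialization, vectors from different tasks can be summed without access to the original prompts.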

Translation date: 2024-08-05 13:57:23, Publication date: 2024-08-02

# An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding

An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding ( http://arxiv.org/abs/2408.01120v1 )

License: see link

Wei Chen, Long Chen, Yu Wu,


Most advanced visual grounding methods rely on Transformers for visual-linguistic feature fusion. However, these Transformer-based approaches share a significant drawback: computational costs grow quadratically because of the self-attention mechanism in the Transformer Encoder, particularly when dealing with high-resolution images or long context sentences. This quadratic increase in computational burden restricts the applicability of visual grounding to more intricate scenes, such as conversation-based reasoning segmentation, which involves lengthy language expressions. In this paper, we propose an efficient and effective multi-task visual grounding (EEVG) framework based on the Transformer Decoder to address this issue, reducing cost on both the language and the visual side. On the language side, we employ the Transformer Decoder to fuse visual and linguistic features, with linguistic features as memory and visual features as queries. This allows fusion to scale linearly with language expression length. On the visual side, we introduce a parameter-free approach that reduces computation by eliminating background visual tokens based on attention scores. We then design a light mask head to directly predict segmentation masks from the remaining sparse feature maps. Extensive results and ablation studies on benchmarks demonstrate the efficiency and effectiveness of our approach. Code is available at https://github.com/chenwei746/EEVG.
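Two ideas in this abstract can be made concrete with a small sketch:

- cross-attention cost that grows linearly with the number of language tokens when those tokens serve only as memory;
- parameter-free pruning of low-scoring visual tokens.

The top-k keep-ratio rule and all function names are assumptions for illustration. The abstract does not specify how attention scores are thresholded.

```python
from typing import List, Sequence, Tuple


def prune_background_tokens(tokens: Sequence[List[float]], scores: Sequence[float],
                            keep_ratio: float) -> Tuple[List[int], List[List[float]]]:
    """Keep the highest-scoring visual tokens and drop the rest as background.

    Returns the kept indices (in original spatial order) and the kept tokens.
    No learnable parameters are involved.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, round(len(tokens) * keep_ratio))
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:k])  # restore spatial order for the sparse feature map
    return kept, [tokens[i] for i in kept]


def cross_attention_cost(num_visual_queries: int, num_language_tokens: int, dim: int) -> int:
    """Multiply-adds for the query-key score matrix when language tokens are the memory.

    Linear in num_language_tokens (and in the number of visual queries that survive pruning).
    """
    return num_visual_queries * num_language_tokens * dim


tokens = [[float(i), 0.0] for i in range(6)]
scores = [0.9, 0.1, 0.8, 0.05, 0.7, 0.2]
kept, kept_tokens = prune_background_tokens(tokens, scores, keep_ratio=0.5)
print(kept)  # [0, 2, 4]
```

Pruning reduces the number of visual queries. Using language as memory keeps each query's cost proportional to the expression length.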

Translation date: 2024-08-05 13:57:23, Publication date: 2024-08-02

# Being Accountable is Smart: Navigating the Technical and Regulatory Landscape of AI-based Services for Power Grid

Being Accountable is Smart: Navigating the Technical and Regulatory Landscape of AI-based Services for Power Grid ( http://arxiv.org/abs/2408.01121v1 )

License: see link

Anna Volkova, Mahdieh Hatamian, Alina Anapyanova, Hermann de Meer,


The emergence of artificial intelligence and the digitization of the power grid have introduced numerous effective application scenarios for AI-based smart grid services. Nevertheless, adopting AI in critical infrastructure presents challenges due to unclear regulations and a lack of risk quantification techniques. Regulated and accountable approaches for integrating AI-based services into the smart grid could accelerate the adoption of innovative methods in daily practice and address society's general safety concerns. This paper contributes to this objective by defining accountability and highlighting its importance for AI-based services in the energy sector. It underlines the current shortcomings of the AI Act and proposes an approach to address these issues in a potential delegated act. The proposed technical approach for developing and operating accountable AI-based smart grid services makes it possible to assess different phases of the service life cycle and identify related accountability risks.

Translation date: 2024-08-05 13:57:23, Publication date: 2024-08-02

# CFBench: A Comprehensive Constraints-Following Benchmark for LLMs

CFBench: A Comprehensive Constraints-Following Benchmark for LLMs ( http://arxiv.org/abs/2408.01122v1 )

License: see link

Tao Zhang, Yanjun Shen, Wenjing Luo, Yan Zhang, Hao Liang, Tao Zhang, Fan Yang, Mingan Lin, Yujing Qiao, Weipeng Chen, Bin Cui, Wentao Zhang, Zenan Zhou,


The ability of Large Language Models (LLMs) to comprehend and follow natural language instructions is critical for their deployment in sophisticated real-world applications. Existing evaluations mainly focus on fragmented constraints or narrow scenarios, overlooking the comprehensiveness and authenticity of constraints from the user's perspective. To bridge this gap, we propose CFBench, a large-scale Comprehensive Constraints-Following Benchmark for LLMs, featuring 1,000 curated samples that cover more than 200 real-life scenarios and over 50 NLP tasks. CFBench meticulously compiles constraints from real-world instructions and constructs an innovative systematic framework for constraint types, which includes 10 primary categories and over 25 subcategories, and ensures that each constraint is seamlessly integrated within the instructions. To ensure that the evaluation of LLM outputs aligns with user perceptions, we propose an advanced methodology that integrates multi-dimensional assessment criteria with requirement prioritization, covering various perspectives of constraints, instructions, and requirement fulfillment. Evaluating current leading LLMs on CFBench reveals substantial room for improvement in constraint following, and we further investigate influencing factors and enhancement strategies. The data and code are publicly available at https://github.com/PKU-Baichuan-MLSystemLab/CFBench.

Translation date: 2024-08-05 13:57:23, Publication date: 2024-08-02

# Molecular influence on nuclear-quadrupole-coupling effects in laser induced alignment

Molecular influence on nuclear-quadrupole-coupling effects in laser induced alignment ( http://arxiv.org/abs/2408.01125v1 )

License: see link

Linda V. Thesing, Andrey Yachmenev, Rosario González-Férez, Jochen Küpper,


We studied the effect of nuclear-quadrupole interactions on the field-free impulsive alignment of different asymmetric-top molecules. Our analysis is focused on the influence of the hyperfine- and rotational-energy-level structures. These depend on the number of nuclear spins, the rotational constants, and the symmetry of the tensors involved in the nuclear spin and external field interactions. Comparing the prototypical large-nuclear-spin molecules iodobenzene, 1,2-diiodobenzene, 1,3-diiodobenzene, and 2,5-diiodobenzonitrile, we demonstrate that the magnitude of the hyperfine splittings compared to the rotational-energy splittings plays a crucial role in the spin-rotational dynamics after the laser pulse. Moreover, we point out that the impact of the quadrupole coupling on the rotational dynamics decreases when highly excited rotational states dominate the dynamics.

Translation date: 2024-08-05 13:57:23, Publication date: 2024-08-02

# IG-SLAM: Instant Gaussian SLAM

IG-SLAM: Instant Gaussian SLAM ( http://arxiv.org/abs/2408.01126v1 )

License: see link

Furkan Aykut Sarikamis, Abdullah Aydin Alatan,


3D Gaussian Splatting has recently shown promising results as an alternative to neural implicit representations for scene representation in SLAM systems. However, current methods lack either dense depth maps to supervise the mapping process or detailed training designs that account for the scale of the environment. To address these drawbacks, we present IG-SLAM, a dense RGB-only SLAM system that employs robust Dense-SLAM methods for tracking and combines them with Gaussian Splatting. A 3D map of the environment is constructed using the accurate poses and dense depth provided by tracking. Additionally, we use depth uncertainty in map optimization to improve 3D reconstruction. Our decay strategy in map optimization enhances convergence and allows the system to run at 10 fps in a single process. We demonstrate performance competitive with state-of-the-art RGB-only SLAM systems while achieving faster operation. We present experiments on the Replica, TUM-RGBD, ScanNet, and EuRoC datasets. The system achieves photo-realistic 3D reconstruction on large-scale sequences, particularly in the EuRoC dataset.

Translation date: 2024-08-05 13:57:23, Publication date: 2024-08-02

# A Survey of Mamba

A Survey of Mamba ( http://arxiv.org/abs/2408.01129v1 )

License: see link

Haohao Qu, Liangbo Ning, Rui An, Wenqi Fan, Tyler Derr, Xin Xu, Qing Li,


Deep learning, as a vital technique, has sparked a notable revolution in artificial intelligence. As the most representative architecture, Transformers have empowered numerous advanced models, especially large language models comprising billions of parameters, and have become a cornerstone of deep learning. Despite these impressive achievements, Transformers still face inherent limitations, particularly time-consuming inference resulting from the quadratic computational complexity of attention. Recently, a novel architecture named Mamba, drawing inspiration from classical state space models, has emerged as a promising alternative for building foundation models, delivering modeling abilities comparable to Transformers while preserving near-linear scalability with respect to sequence length. This has sparked an increasing number of studies actively exploring Mamba's potential to achieve impressive performance across diverse domains. Given such rapid evolution, there is a critical need for a systematic review that consolidates existing Mamba-empowered models, offering a comprehensive understanding of this emerging model architecture. In this survey, we therefore conduct an in-depth investigation of recent Mamba-associated studies, covering three main aspects: the advancements of Mamba-based models, the techniques for adapting Mamba to diverse data, and the applications where Mamba can excel. Specifically, we first recall the foundational knowledge of various representative deep learning models and the details of Mamba as preliminaries. Then, to showcase the significance of Mamba, we comprehensively review related studies focusing on Mamba models' architecture design, data adaptability, and applications. Finally, we discuss current limitations and explore various promising research directions to provide deeper insights for future investigations.
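The near-linear scaling in sequence length comes from evaluating a state space model as a recurrence. The sketch below is a deliberately simplified, single-channel version with these assumptions:

- the state matrix `a` is diagonal and discretized by zero-order hold;
- the input term uses the simplified update `delta * b`;
- only the step size `delta` varies per token.

In Mamba, `B` and `C` are also input-dependent, and the scan is computed with a hardware-aware parallel algorithm. The function name is illustrative.

```python
import math
from typing import List, Sequence


def selective_scan(x: Sequence[float], a: Sequence[float], b: Sequence[float],
                   c: Sequence[float], deltas: Sequence[float]) -> List[float]:
    """Single-channel diagonal SSM evaluated as a recurrence.

    h_t = exp(delta_t * a) * h_{t-1} + delta_t * b * x_t   (element-wise over state dims)
    y_t = sum(c * h_t)
    Cost is O(L * N) for sequence length L and state size N: linear in L,
    unlike the O(L^2) score matrix of full self-attention.
    """
    h = [0.0] * len(a)
    ys: List[float] = []
    for x_t, d_t in zip(x, deltas):
        # Input-dependent step size: a larger delta forgets old state faster
        # (for negative a) and lets more of the current input in.
        h = [math.exp(d_t * a_i) * h_i + d_t * b_i * x_t
             for a_i, h_i, b_i in zip(a, h, b)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys


# An impulse decays geometrically through a single stable state (a = -1).
ys = selective_scan([1.0, 0.0, 0.0], a=[-1.0], b=[1.0], c=[1.0], deltas=[1.0, 1.0, 1.0])
```

Each step touches only the fixed-size state `h`, which is why inference memory does not grow with the sequence.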

Translation date: 2024-08-05 13:57:23, Publication date: 2024-08-02

# The Impact of Program Reduction on Automated Program Repair

The Impact of Program Reduction on Automated Program Repair ( http://arxiv.org/abs/2408.01134v1 )

License: see link

Linas Vidziunas, David Binkley, Leon Moonen,


Correcting bugs using modern Automated Program Repair (APR) can be both time-consuming and resource-intensive. We describe a program repair approach that aims to improve the scalability of modern APR tools. The approach leverages program reduction in the form of program slicing to eliminate code irrelevant to fixing the bug, which improves the APR tool's overall performance. We investigate slicing's impact on all three phases of the repair process: fault localization, patch generation, and patch validation. Our empirical exploration finds that, on average, the proposed approach enhances the repair ability of the TBar APR tool, but we also found a few cases where it was less successful. Specifically, on examples from the widely used Defects4J dataset, we obtain a substantial reduction in median repair time, which falls from 80 minutes to just under 18 minutes. We conclude that program reduction can improve the performance of APR without degrading repair quality, but this improvement is not universal. A replication package is available via Zenodo at https://doi.org/10.5281/zenodo.13074333. Keywords: automated program repair, dynamic program slicing, fault localization, test-suite reduction, hybrid techniques.
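The core of slicing-based reduction can be sketched as a backward reachability query over a dependence graph. This is a simplified, hypothetical illustration:

- statements are integer ids;
- `deps` lists each statement's data and control dependences.

For a dynamic slice, as named in the keywords, those edges would come from an execution trace of the failing test rather than from the source text alone.

```python
from collections import deque
from typing import Dict, Iterable, Set


def backward_slice(deps: Dict[int, Iterable[int]], criterion: Iterable[int]) -> Set[int]:
    """Return every statement the slicing criterion (transitively) depends on.

    deps maps a statement id to the ids it directly depends on.
    Statements outside the result are irrelevant to the criterion and can be
    removed before fault localization and patch generation.
    """
    sliced: Set[int] = set()
    work = deque(criterion)
    while work:
        stmt = work.popleft()
        if stmt in sliced:
            continue
        sliced.add(stmt)
        work.extend(deps.get(stmt, ()))
    return sliced


# Hypothetical 5-statement program:
# 1: a = read()   2: b = 0   3: c = a * 2   4: log(b)   5: assert c == 4
deps = {3: [1], 4: [2], 5: [3]}
print(sorted(backward_slice(deps, [5])))  # [1, 3, 5]
```

Statements 2 and 4 cannot affect the failing assertion, so a repair tool no longer needs to consider them as fault locations.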

Translation date: 2024-08-05 13:57:23, Publication date: 2024-08-02

# PGNeXt: High-Resolution Salient Object Detection via Pyramid Grafting Network

PGNeXt: High-Resolution Salient Object Detection via Pyramid Grafting Network ( http://arxiv.org/abs/2408.01137v1 )

License: see link

Changqun Xia, Chenxi Xie, Zhentao He, Tianshu Yu, Jia Li,


We present an advanced study of the more challenging task of high-resolution salient object detection (HRSOD) from both the dataset and the network-framework perspectives. To compensate for the lack of HRSOD datasets, we carefully collected a large-scale high-resolution salient object detection dataset, called UHRSD, containing 5,920 images of complex real-world scenarios at 4K-8K resolutions. All images are finely annotated at the pixel level, far exceeding previous low-resolution SOD datasets. Aiming to overcome the contradiction between sampling depth and receptive field size in previous methods, we propose a novel one-stage framework for the HRSOD task using a pyramid grafting mechanism. In general, transformer-based and CNN-based backbones are adopted to extract features from images of different resolutions independently, and these features are then grafted from the transformer branch to the CNN branch. An attention-based Cross-Model Grafting Module (CMGM) is proposed to enable the CNN branch to combine broken detailed information more holistically, guided by features from different sources during decoding. Moreover, we design an Attention Guided Loss (AGL) to explicitly supervise the attention matrix generated by CMGM, helping the network better interact with the attention from different branches. Comprehensive experiments on UHRSD and widely used SOD datasets demonstrate that our method can simultaneously locate salient objects and preserve rich details, outperforming state-of-the-art methods. To verify the generalization ability of the proposed framework, we apply it to the camouflaged object detection (COD) task. Notably, our method outperforms most state-of-the-art COD methods without bells and whistles.

Translation date: 2024-08-05 13:57:23, Publication date: 2024-08-02

# Interpreting Global Perturbation Robustness of Image Models using Axiomatic Spectral Importance Decomposition

Interpreting Global Perturbation Robustness of Image Models using Axiomatic Spectral Importance Decomposition ( http://arxiv.org/abs/2408.01139v1 )

License: see link

Róisín Luo, James McDermott, Colm O'Riordan,


Perturbation robustness evaluates the vulnerabilities of models arising from a variety of perturbations, such as data corruptions and adversarial attacks. Understanding the mechanisms of perturbation robustness is critical for global interpretability. We present a model-agnostic, global mechanistic interpretability method to interpret the perturbation robustness of image models. This research is motivated by two key aspects. First, previous global interpretability works, in tandem with robustness benchmarks such as mean corruption error (mCE), are not designed to directly interpret the mechanisms of perturbation robustness within image models. Second, we notice that the spectral signal-to-noise ratio (SNR) of perturbed natural images decays exponentially over frequency. This power-law-like decay implies that low-frequency signals are generally more robust than high-frequency signals, yet high classification accuracy cannot be achieved by low-frequency signals alone. By applying Shapley value theory, our method axiomatically quantifies the predictive power of robust and non-robust features within an information-theoretic framework. Our method, dubbed **I-ASIDE** (**I**mage **A**xiomatic **S**pectral **I**mportance **D**ecomposition **E**xplanation), provides unique insight into model robustness mechanisms. We conduct extensive experiments on a variety of vision models pre-trained on ImageNet to show that **I-ASIDE** can not only **measure** perturbation robustness but also **provide interpretations** of its mechanisms.
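The Shapley-value step can be illustrated by treating frequency bands as players. In this sketch the value function is a made-up lookup table standing in for "predictive power when only these bands are kept". I-ASIDE defines that quantity information-theoretically, which is not reproduced here. Exact enumeration needs exponentially many value evaluations, so it is practical only for a handful of bands. All names and numbers are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(players: Sequence[str],
                   value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values: weighted average marginal contribution over all coalitions."""
    n = len(players)
    result: Dict[str, float] = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


# Made-up toy numbers (not results): accuracy when only the listed bands are kept.
BANDS = ["low", "mid", "high"]
TOY_ACCURACY = {
    frozenset(): 0.10,
    frozenset({"low"}): 0.50, frozenset({"mid"}): 0.30, frozenset({"high"}): 0.15,
    frozenset({"low", "mid"}): 0.70, frozenset({"low", "high"}): 0.55,
    frozenset({"mid", "high"}): 0.35,
    frozenset({"low", "mid", "high"}): 0.80,
}
attributions = shapley_values(BANDS, lambda s: TOY_ACCURACY[s])
```

By the efficiency axiom, the attributions sum to the gap between keeping all bands and keeping none. This is what lets the decomposition account for the full predictive power.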

Translation date: 2024-08-05 13:57:23, Publication date: 2024-08-02

# Machine learning topological energy braiding of non-Bloch bands

Machine learning topological energy braiding of non-Bloch bands ( http://arxiv.org/abs/2408.01141v1 )

License: see link

Shuwei Shi, Shibing Chu, Yuee Xie, Yuanping Chen,


Machine learning has been used to identify phase transitions in a variety of physical systems. However, research on non-Bloch energy braiding in non-Hermitian systems is still lacking. In this work, we study non-Bloch energy braiding in one-dimensional non-Hermitian systems using unsupervised and supervised methods. In unsupervised learning, we use diffusion maps to identify non-Bloch energy braiding without any prior knowledge, and combine them with k-means to group different topological elements, such as the unlink and the Hopf link, into clusters. In supervised learning, we train a Convolutional Neural Network (CNN) on Bloch energy data to predict not only Bloch energy braiding but also non-Bloch energy braiding, with an accuracy approaching 100%. By analysing the CNN, we ascertain that the network has successfully learned to recognise the braiding topology of the energy bands. The present study demonstrates the considerable potential of machine learning for identifying non-Hermitian topological phases and energy braiding.

Translation date: 2024-08-05 13:47:29, Publication date: 2024-08-02

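# Enhanced Prediction of Ventilator-Associated Pneumonia in Patients with Traumatic Brain Injury Using Advanced Machine Learning Techniques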

Enhanced Prediction of Ventilator-Associated Pneumonia in Patients with Traumatic Brain Injury Using Advanced Machine Learning Techniques ( http://arxiv.org/abs/2408.01144v1 )

License: see link

Negin Ashrafi, Armin Abdollahi, Maryam Pishgar,


Background: Ventilator-associated pneumonia (VAP) in traumatic brain injury (TBI) patients poses a significant mortality risk and imposes a considerable financial burden on patients and healthcare systems. Timely detection and prognostication of VAP in TBI patients are crucial for improving patient outcomes and alleviating the strain on healthcare resources. Methods: We implemented six machine learning models using the MIMIC-III database. Our methodology included preprocessing steps such as feature selection with CatBoost and expert opinion, addressing class imbalance with the Synthetic Minority Oversampling Technique (SMOTE), and rigorous model tuning through 5-fold cross-validation to optimize hyperparameters. The models evaluated were SVM, Logistic Regression, Random Forest, XGBoost, ANN, and AdaBoost. Additionally, we conducted SHAP analysis to determine feature importance and performed an ablation study to assess feature impacts on model performance. Results: XGBoost outperformed the baseline models and the best existing literature. We used metrics including AUC, Accuracy, Specificity, Sensitivity, F1 Score, PPV, and NPV. XGBoost achieved the highest performance, with an AUC of 0.940 and an Accuracy of 0.875. These are 23.4 and 23.5 percentage points higher than the best results in the existing literature (an AUC of 0.706 and an Accuracy of 0.640, respectively). This enhanced performance underscores the model's effectiveness in clinical settings. Conclusions: This study enhances the predictive modeling of VAP in TBI patients, improving the potential for early detection and intervention. Refined feature selection and advanced ensemble techniques significantly boosted model accuracy and reliability, offering promising directions for future clinical applications and medical diagnostics research.
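SMOTE, named in the Methods, creates synthetic minority-class samples by interpolation. Below is a minimal sketch of the textbook algorithm, not the study's configuration. Points are plain lists of floats, and the function name is illustrative. In practice a maintained implementation such as imbalanced-learn's `SMOTE` would normally be used.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def smote(minority: Sequence[Point], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Create n_new synthetic minority-class samples (textbook SMOTE).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform position in [0, 1) along the segment
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority, n_new=10, k=2, seed=1)
```

In general, oversampling should be applied only to the training portion of each cross-validation fold. Otherwise synthetic points derived from validation patients leak into training and inflate the reported metrics.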

Translation date: 2024-08-05 13:47:29, Publication date: 2024-08-02

# Accurate Analytic Model for the Energy Spectrum of the Anharmonic Oscillator

Accurate Analytic Model for the Energy Spectrum of the Anharmonic Oscillator ( http://arxiv.org/abs/2408.01146v1 )

License: see link

Michel Caffarel,


In recent work, we derived an analytic expression for the partition function of the quartic oscillator using a path-integral formalism. Quite remarkably, the free energy was found to be accurate to within a few percent over the entire range of temperatures and quartic coupling constants. In addition, the key features of the exact partition function were successfully reproduced. Accurate analytic expressions for the ground- and first-excited-state energies as functions of the quartic coupling $g$ were derived. In this work, we extend our results to the calculation of the full energy spectrum. We also generalize our study of the quartic oscillator to anharmonic oscillators with sextic and octic couplings. The energy levels obtained are accurate for all couplings and principal quantum numbers considered here (up to $n=8$), confirming that this model partition function is a good and faithful approximation of the exact one.
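As a reference point, consider a standard textbook result rather than the paper's path-integral model. Take $H = \frac{p^2}{2} + \frac{x^2}{2} + g\,x^4$ in units $\hbar = m = \omega = 1$; the paper's normalization of $g$ may differ. Writing $x = (a + a^\dagger)/\sqrt{2}$ gives $\langle n|x^4|n\rangle = \tfrac{1}{4}(6n^2 + 6n + 3)$, so first-order perturbation theory yields:

```latex
E_n = \left(n + \tfrac{1}{2}\right) + \frac{3g}{4}\left(2n^2 + 2n + 1\right) + O(g^2),
\qquad
E_0 = \tfrac{1}{2} + \tfrac{3}{4}\,g + O(g^2).
```

The correction grows quadratically in $n$, so higher levels are more sensitive to the coupling. Moreover, the perturbation series in $g$ is only asymptotic. An expression accurate over the whole coupling range, as claimed above, must therefore go beyond this expansion.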

Translation date: 2024-08-05 13:47:29, Publication date: 2024-08-02

# Information transfer by entangled photons without auxiliary non-quantum channel

Information transfer by entangled photons without auxiliary non-quantum channel ( http://arxiv.org/abs/2408.01150v1 )

License: see link

Levente Szabó, Pál Maák,


In this paper we present a theoretical analysis of the possibility of faster-than-light communication based on entangled photons. We analyze designs that may be capable of solving the problem of direct information transfer between members of entangled photon pairs. Experimental verification could confirm or even refute this. Our hypothesis was that most proofs of the no-communication theorem are based on a certain set of conditions, and that it is possible to provide a broader set of conditions that allows entangled states to be established as quantum information channels without using a classical channel. One basic unit of the proposed design transforms the polarization state of one member of an entangled photon pair into a spatial superposition state. Thus, after the polarization measurement performed on one member, which eliminates the entanglement, the quantum information is maintained in the spatial superposition state of the other member. This information can be recovered by a particular measurement based on spatial interference. We have shown that solutions with so-called symmetric functions lead to average results that correspond to the no-communication theorem. However, with asymmetric functions, the averaged measurement results calculated in a prescribed time window can distinguish the types of measurements performed on the other member of the pair. This can establish a communication code that enables faster-than-light information sharing under specific conditions. There may also be further theoretical consequences, namely a significant extension of the quantum mechanical nonlocality principle.

翻訳日:2024-08-05 13:47:29 公開日:2024-08-02

# DERA: 知識グラフにおけるエンティティアライメントのための密なエンティティ検索

DERA: Dense Entity Retrieval for Entity Alignment in Knowledge Graphs ( http://arxiv.org/abs/2408.01154v1 )

ライセンス: Link先を確認

Zhichun Wang, Xuan Chen,

(参考訳) エンティティアライメント(EA)は、異なる知識グラフ(KG)間で等価なエンティティを対応付けることを目的とし、知識の融合と統合に不可欠である。近年,埋め込み型EAが注目され,多くのアプローチが提案されている。初期のアプローチは主に、関係三重項によって定義されるKGの構造的特徴からエンティティの埋め込みを学ぶことに焦点を当てていた。その後の手法では、EAの埋め込みを強化する補助情報としてエンティティの名前と属性が組み込まれた。しかし、これらの手法は構造情報と属性情報をエンコードするためにしばしば異なる手法を使用しており、相互の相互作用と相互強化を制限している。本研究では、言語モデルを用いてエンティティの様々な特徴を一様にエンコードし、KG間での最近傍エンティティ検索を容易にする、EAのための密なエンティティ検索フレームワークを提案する。アライメント候補はまずエンティティ検索によって生成され、次に最終的なアライメントを決定するために再ランク付けされる。言語横断および単一言語のEAデータセットで包括的な実験を行い、本手法が既存のEA手法と比較して最先端の性能を達成することを示す。

Entity Alignment (EA) aims to match equivalent entities in different Knowledge Graphs (KGs), which is essential for knowledge fusion and integration. Recently, embedding-based EA has attracted significant attention and many approaches have been proposed. Early approaches primarily focused on learning entity embeddings from the structural features of KGs, defined by relation triples. Later methods incorporated entities' names and attributes as auxiliary information to enhance embeddings for EA. However, these approaches often used different techniques to encode structural and attribute information, limiting their interaction and mutual enhancement. In this work, we propose a dense entity retrieval framework for EA, leveraging language models to uniformly encode various features of entities and facilitate nearest entity search across KGs. Alignment candidates are first generated through entity retrieval and are then reranked to determine the final alignments. We conduct comprehensive experiments on both cross-lingual and monolingual EA datasets, demonstrating that our approach achieves state-of-the-art performance compared to existing EA methods.
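
The retrieve-then-rerank pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes entity embeddings have already been produced by one shared language-model encoder (replaced here by toy vectors), uses brute-force cosine similarity for the nearest-entity search, and takes a caller-supplied `score_fn` as a stand-in for the reranker. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve_candidates(source_emb, target_emb, k=2):
    """Return the k target entities most similar to each source entity.

    source_emb / target_emb map entity IDs to vectors that are assumed to
    come from one shared encoder (the 'uniform encoding' step).
    """
    candidates = {}
    for s_id, s_vec in source_emb.items():
        scored = sorted(
            ((cosine(s_vec, t_vec), t_id) for t_id, t_vec in target_emb.items()),
            reverse=True,
        )
        candidates[s_id] = [t_id for _, t_id in scored[:k]]
    return candidates


def rerank(candidates, score_fn):
    """Choose one target per source with a hypothetical reranker score_fn(s, t)."""
    return {s: max(cands, key=lambda t: score_fn(s, t)) for s, cands in candidates.items()}
```

Retrieval narrows each source entity to a short candidate list, so the (typically more expensive) reranker only has to score `k` pairs per entity instead of every entity in the other KG.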

翻訳日:2024-08-05 13:47:29 公開日:2024-08-02

# フェルミオンガウス状態から行列積状態への効率的な変換

Efficient conversion from fermionic Gaussian states to matrix product states ( http://arxiv.org/abs/2408.01155v1 )

ライセンス: Link先を確認

Tong Liu, Ying-Hai Wu, Hong-Hao Tu, Tao Xiang,

(参考訳) フェルミオンガウス状態は二次ハミルトニアンの固有状態であり、量子多体問題で広く用いられている。我々は、フェルミオンガウス状態を行列積状態に変換する高効率なアルゴリズムを提案する。この手法は並進不変性を持たない有限サイズ系に対しても定式化できるが、並進不変性を持つ無限系に適用した場合に特に有用となる。無限シリンダー上のトポロジカル秩序系の基底状態を行列積状態として表すと、転送行列の固定点を利用して、極小絡み合い状態(minimally entangled states)としても知られるエニオン固有基底を抽出できる。これにより、絡み合いスペクトルやモジュラー行列といった普遍的性質を効率的に計算できる。本手法の有効性を、それぞれボソン的Laughlin状態およびMoore-Read状態と同じトポロジカル秩序を持つ2つのカイラルスピン液体の数値計算によって示す。前者のエニオン固有基底は先行研究で既に求められており、有用なベンチマークとなる。一方、後者のエニオン固有基底は自明ではなく、その構築に成功したことは本手法の非自明な裏付けとなる。

Fermionic Gaussian states are eigenstates of quadratic Hamiltonians and are widely used in quantum many-body problems. We propose a highly efficient algorithm that converts fermionic Gaussian states to matrix product states. It can be formulated for finite-size systems without translation invariance, but becomes particularly appealing when applied to infinite systems with translation invariance. If the ground states of a topologically ordered system on infinite cylinders are expressed as matrix product states, then the fixed points of the transfer matrix can be harnessed to filter out the anyon eigenbasis, also known as minimally entangled states. This allows for efficient computation of universal properties such as the entanglement spectrum and modular matrices. The potential of our method is demonstrated by numerical calculations in two chiral spin liquids that have the same topological orders as the bosonic Laughlin and Moore-Read states, respectively. The anyon eigenbasis for the first one has been worked out before and serves as a useful benchmark. The anyon eigenbasis of the second one, however, is not transparent, and its successful construction provides a nontrivial corroboration of our method.
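
The method above starts from a fermionic Gaussian state, which is fully specified by its two-point correlation matrix $C_{ij} = \langle c_i^\dagger c_j \rangle$; the eigenvalues $\nu$ of $C$ restricted to a block give that block's entanglement spectrum and entropy $S = -\sum_\nu [\nu \ln \nu + (1-\nu)\ln(1-\nu)]$. The sketch below is not the authors' Gaussian-to-MPS conversion. It only illustrates this correlation-matrix starting point for an assumed example, the half-filled ground state of an open nearest-neighbour hopping chain, and restricts itself to blocks of one or two sites so the eigenvalues have a closed form.

```python
import math


def chain_correlation_matrix(n_sites):
    """C[i][j] = <c_i^dagger c_j> for the half-filled ground state of the
    assumed open chain H = -sum_j (c_j^dagger c_{j+1} + h.c.).

    Single-particle modes: phi_k(j) = sqrt(2/(N+1)) sin(pi k j/(N+1)) with
    energy -2 cos(pi k/(N+1)); the N/2 negative-energy modes are filled.
    """
    n = n_sites
    norm = math.sqrt(2.0 / (n + 1))
    phi = [[norm * math.sin(math.pi * k * j / (n + 1)) for j in range(1, n + 1)]
           for k in range(1, n + 1)]
    occupied = range(n // 2)  # indices of k = 1..N/2 (negative energies)
    return [[sum(phi[k][i] * phi[k][j] for k in occupied) for j in range(n)]
            for i in range(n)]


def mode_entropy(nu):
    """Entanglement entropy contributed by one correlation-matrix eigenvalue."""
    if nu <= 0.0 or nu >= 1.0:
        return 0.0
    return -nu * math.log(nu) - (1 - nu) * math.log(1 - nu)


def block_entropy_2sites(C, i):
    """Entropy of sites (i, i+1) from the eigenvalues of the 2x2 block of C."""
    a, b, d = C[i][i], C[i][i + 1], C[i + 1][i + 1]
    mean = (a + d) / 2
    rad = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mode_entropy(mean + rad) + mode_entropy(mean - rad)
```

By particle-hole symmetry every diagonal entry equals 1/2 at half filling, so a single site carries entropy $\ln 2$; larger blocks need a general eigenvalue routine, which is where dedicated linear-algebra libraries come in.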

翻訳日:2024-08-05 13:47:29 公開日:2024-08-02

# TCR-GPT:T細胞受容体レパートリー生成のための自己回帰モデルと強化学習の統合

TCR-GPT: Integrating Autoregressive Model and Reinforcement Learning for T-Cell Receptor Repertoires Generation ( http://arxiv.org/abs/2408.01156v1 )

ライセンス: Link先を確認

Yicheng Lin, Dandan Zhang, Yun Liu,

(参考訳) T細胞受容体(TCR)は、感染細胞やがん細胞が提示する特定の抗原を認識して結合することにより、免疫系において重要な役割を担っている。TCRの配列パターンを理解することは、標的免疫療法の開発や効果的なワクチンの設計に不可欠である。自己回帰Transformerのような言語モデルは、TCRレパートリーの確率分布を学習し、レパートリーの根底にあるパターンを受け継いだ新しいTCR配列を生成することで、この問題に対する強力な解決策を提供する。本稿では、TCRレパートリーの配列パターンを明らかにし再現するよう設計された、デコーダのみのTransformerアーキテクチャに基づく確率モデルTCR-GPTを紹介する。TCR-GPTは、ピアソン相関係数で測定した配列確率分布の推定において0.953の精度を示す。さらに、強化学習(Reinforcement Learning, RL)を活用してTCR配列の分布を適応させ、特定のペプチドを認識できるTCRを生成することで、標的免疫療法やワクチン開発を前進させる大きな可能性を示した。RLによる微調整の結果、事前学習済みTCR-GPTモデルは特定のペプチドに結合する可能性の高いTCRレパートリーを生成できるようになり、生物学的に関連するTCR配列の確率分布へのモデルの適応性を高めるうえでのRLの効率性が示された。

T-cell receptors (TCRs) play a crucial role in the immune system by recognizing and binding to specific antigens presented by infected or cancerous cells. Understanding the sequence patterns of TCRs is essential for developing targeted immune therapies and designing effective vaccines. Language models, such as auto-regressive transformers, offer a powerful solution to this problem by learning the probability distributions of TCR repertoires, enabling the generation of new TCR sequences that inherit the underlying patterns of the repertoire. We introduce TCR-GPT, a probabilistic model built on a decoder-only transformer architecture, designed to uncover and replicate sequence patterns in TCR repertoires. TCR-GPT achieves an accuracy of 0.953 in inferring sequence probability distributions, as measured by the Pearson correlation coefficient. Furthermore, by leveraging Reinforcement Learning (RL), we adapted the distribution of TCR sequences to generate TCRs capable of recognizing specific peptides, offering significant potential for advancing targeted immune therapies and vaccine development. After RL fine-tuning, pretrained TCR-GPT models produced TCR repertoires likely to bind specific peptides, illustrating RL's efficiency in adapting the model to the probability distributions of biologically relevant TCR sequences.
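
Two quantities in this abstract can be made concrete. An autoregressive (decoder-only) model assigns a sequence the probability $\log p(x) = \sum_t \log p(x_t \mid x_{<t})$, and the reported 0.953 is a Pearson correlation coefficient. The sketch below assumes a hypothetical `cond_prob(prefix, token)` callable standing in for the transformer and an explicit end-of-sequence token. Which exact probabilities the authors correlate (for example, model versus reference generation probabilities) is not specified here.

```python
import math


def sequence_log_prob(seq, cond_prob, eos="<eos>"):
    """log p(seq) = sum_t log p(x_t | x_<t), including a final end token.

    cond_prob(prefix, token) is a hypothetical stand-in for the next-token
    probability of a decoder-only transformer.
    """
    logp = 0.0
    for t, token in enumerate(list(seq) + [eos]):
        logp += math.log(cond_prob(seq[:t], token))
    return logp


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

For a uniform toy model over 20 amino acids plus the end token, a four-residue CDR3 fragment such as "CASS" gets $5 \log(1/21)$.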

翻訳日:2024-08-05 13:47:29 公開日:2024-08-02

# 引力場(Attraction Field)によるボリューム医用画像のロバストな曲線検出

Robust Curve Detection in Volumetric Medical Imaging via Attraction Field ( http://arxiv.org/abs/2408.01159v1 )

ライセンス: Link先を確認

Farukh Yaushev, Daria Nogina, Valentin Samokhin, Mariya Dugova, Ekaterina Petrash, Dmitry Sevryukov, Mikhail Belyaev, Maxim Pisov,

(参考訳) 身体部位の幾何学的形状を理解することは、正確な医療診断に不可欠である。曲線は解剖学的構造を効果的に記述し、心血管疾患、呼吸器疾患、骨格疾患に関連する医用画像の応用で広く用いられている。従来の曲線検出手法はタスク固有であることが多く、ドメイン固有の特徴に大きく依存するため、適用範囲が限られている。本稿では、対象の向き、形状、位置に関する事前知識を必要としない、分岐のない曲線の新しい検出手法を提案する。提案手法では、ニューラルネットワークにより (1) サブピクセル精度を与える引力場 (attraction field) と、(2) 関心領域を限定し、所望の曲線から遠く離れた外れ値を実質的に排除する近接度マップ (closeness map) を予測する。多様な形態を持つ臨床的に重要な複数のタスクで曲線検出器を評価し、既存手法を上回る優れたサブピクセルレベルの精度を達成して、その汎用性と頑健性を示した。さらに、この分野のさらなる進展を支援するため、将来の研究のベンチマークとなりうる大動脈中心線とマスクの独自アノテーションを公開する。データセットはhttps://github.com/neuro-ml/curve-detectionで入手できる。

Understanding body part geometry is crucial for precise medical diagnostics. Curves effectively describe anatomical structures and are widely used in medical imaging applications related to cardiovascular, respiratory, and skeletal diseases. Traditional curve detection methods are often task-specific, relying heavily on domain-specific features, limiting their broader applicability. This paper introduces a novel approach for detecting non-branching curves, which does not require prior knowledge of the object's orientation, shape, or position. Our method uses neural networks to predict (1) an attraction field, which offers subpixel accuracy, and (2) a closeness map, which limits the region of interest and essentially eliminates outliers far from the desired curve. We tested our curve detector on several clinically relevant tasks with diverse morphologies and achieved impressive subpixel-level accuracy results that surpass existing methods, highlighting its versatility and robustness. Additionally, to support further advancements in this field, we provide our private annotations of aortic centerlines and masks, which can serve as a benchmark for future research. The dataset can be found at https://github.com/neuro-ml/curve-detection.
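
To make the two prediction targets concrete, the sketch below computes them analytically for a known, non-branching polyline. The attraction vector at a position points to the nearest curve point, so adding it recovers a curve point off the pixel grid, and the closeness flag marks positions within an assumed `radius` of the curve. In the paper a neural network predicts these maps from the image; the polyline representation, the binary form of the closeness map and the function names are assumptions for illustration. The code works for 2-D or 3-D coordinates.

```python
import math


def closest_point_on_segment(p, a, b):
    """Closest point to p on segment ab (any dimension)."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    t = 0.0 if denom == 0 else sum(u * v for u, v in zip(ap, ab)) / denom
    t = max(0.0, min(1.0, t))  # clamp to the segment
    return [ai + t * c for ai, c in zip(a, ab)]


def attraction_and_closeness(p, polyline, radius):
    """Return (attraction vector, closeness flag) for position p.

    p + vector is the nearest curve point, which need not lie on the grid
    (hence sub-pixel precision); the flag is True within `radius` of the curve.
    """
    best = None
    for a, b in zip(polyline, polyline[1:]):
        q = closest_point_on_segment(p, a, b)
        d = math.dist(p, q)
        if best is None or d < best[0]:
            best = (d, q)
    dist, q = best
    vector = [qi - pi for pi, qi in zip(p, q)]
    return vector, dist <= radius
```

At inference time, points far from the curve would be discarded by the closeness map before their attraction vectors are used, which is how such a map suppresses outliers.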

翻訳日:2024-08-05 13:47:29 公開日:2024-08-02

# PreMix: バッチ内スライドミキシングによる事前トレーニングによるデジタル病理学における複数インスタンス学習の促進

PreMix: Boosting Multiple Instance Learning in Digital Histopathology through Pre-training with Intra-Batch Slide Mixing ( http://arxiv.org/abs/2408.01162v1 )

ライセンス: Link先を確認

Bryan Wong, Mun Yong Yi,

(参考訳) 高分解能スキャナーで取得された組織スライドのデジタル表現であるギガピクセルサイズの全スライド画像(WSI)の分類は、きめ細かなラベリングが緻密で時間のかかる作業であることに起因する重大な課題に直面している。弱教師付き多重インスタンス学習(MIL)が有望なアプローチとして登場したが、現在のMIL手法は、ラベルのないWSIに含まれる豊富な情報を活用する能力が限られている。この制限のため、特徴抽出の後にMIL特徴アグリゲータをスクラッチから学習する必要が生じることが多く、効率と精度を妨げている。PreMixは、バッチ内スライド混合アプローチでMILアグリゲータを事前学習することで、一般的なMILフレームワークを拡張する。具体的には、PreMixは事前学習中にBarlow Twins Slide Mixingを導入し、多様なWSIサイズを扱う能力を高め、ラベルなしWSIの有用性を最大化する。微調整時にMixupおよびManifold Mixupと組み合わせることで、PreMixはCamelyon16データセットにおいて、ベースラインのMILフレームワークである階層画像ピラミッドTransformer(HIPT)に対して平均4.7%の性能向上を達成した。さまざまな能動学習の獲得関数とWSIラベル付き学習予算にわたって改善が見られたことは、本フレームワークが多様なデータセットや異なるリソース制約に適応できることを示している。最終的にPreMixは、WSIラベル付きデータセットが限られた状況で、より効率的かつ正確なWSI分類への道を開き、病理学研究におけるラベルなしWSIデータのより広い活用を促す。コードはhttps://anonymous.4open.science/r/PreMixで公開されている。

The classification of gigapixel-sized whole slide images (WSIs), digital representations of histological slides obtained via a high-resolution scanner, faces significant challenges associated with the meticulous and time-consuming nature of fine-grained labeling. While weakly-supervised multiple instance learning (MIL) has emerged as a promising approach, current MIL methods are constrained by their limited ability to leverage the wealth of information embedded within unlabeled WSIs. This limitation often necessitates training MIL feature aggregators from scratch after the feature extraction process, hindering efficiency and accuracy. PreMix extends the general MIL framework by pre-training the MIL aggregator with an intra-batch slide mixing approach. Specifically, PreMix incorporates Barlow Twins Slide Mixing during pre-training, enhancing its ability to handle diverse WSI sizes and maximizing the utility of unlabeled WSIs. Combined with Mixup and Manifold Mixup during fine-tuning, PreMix achieves a mean performance improvement of 4.7% over the baseline MIL framework, the hierarchical image pyramid transformer (HIPT), on the Camelyon16 dataset. The improvement observed across a range of active learning acquisition functions and WSI-labeled training budgets highlights the framework's adaptability to diverse datasets and varying resource constraints. Ultimately, PreMix paves the way for more efficient and accurate WSI classification with limited WSI-labeled datasets, encouraging the broader adoption of unlabeled WSI data in histopathological research. The code is available at https://anonymous.4open.science/r/PreMix
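
For reference, the standard Mixup operation mentioned above forms convex combinations of two examples and their labels, with $\lambda \sim \mathrm{Beta}(\alpha, \alpha)$; Manifold Mixup applies the same combination to hidden representations instead of inputs. This is a generic sketch, not PreMix's Barlow Twins Slide Mixing. Applying it to fixed-length slide-level embeddings and the default `alpha=0.2` are assumptions.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, lam=None, rng=random):
    """Standard Mixup of two examples (x, one-hot y).

    lam ~ Beta(alpha, alpha) unless given explicitly; for Manifold Mixup,
    x1 and x2 would be hidden-layer representations instead of inputs.
    """
    if lam is None:
        lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

A small `alpha` concentrates $\lambda$ near 0 or 1, so most mixed examples stay close to one of the originals; passing a seeded `random.Random` makes the draw reproducible.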

翻訳日:2024-08-05 13:47:29 公開日:2024-08-02

# ドメイン適応強化サーチライト: 視覚知覚から心的イメージへの脳デコーディングの実現

Domain Adaptation-Enhanced Searchlight: Enabling brain decoding from visual perception to mental imagery ( http://arxiv.org/abs/2408.01163v1 )

ライセンス: Link先を確認

Alexander Olza, David Soto, Roberto Santana,

(参考訳) 認知神経科学と脳-コンピュータインターフェースの研究において、想像された刺激を正確に予測することは重要である。本研究では、18人の被験者のfMRIスキャンから得た主に視覚データを用いて、心的イメージの予測を向上させるうえでの領域適応(DA)の有効性を検討する。まず、14の脳領域のデータを用い、視覚刺激でベースラインモデルを学習して想像された刺激を予測する。次に、さまざまなDA手法を比較しながら、イメージ予測を改善する複数のモデルを構築する。結果から、DA、特にRegular Transferアプローチがイメージ予測を大きく向上させることが示された。続いて、Regular Transferを用いたDA強化サーチライト解析を行い、その後に置換に基づく統計検定を実施して、イメージのデコーディングが被験者全体で一貫してチャンスレベルを上回る脳領域を特定する。我々のDA強化サーチライトは、視覚野や前頭頭頂皮質を含む広く分散した脳領域でイメージ内容を予測し、標準的なクロスドメイン分類手法を上回る。本論文の完全なコードとデータは、科学コミュニティが利用できるよう公開されている。

In cognitive neuroscience and brain-computer interface research, accurately predicting imagined stimuli is crucial. This study investigates the effectiveness of Domain Adaptation (DA) in enhancing imagery prediction using primarily visual data from fMRI scans of 18 subjects. Initially, we train a baseline model on visual stimuli to predict imagined stimuli, utilizing data from 14 brain regions. We then develop several models to improve imagery prediction, comparing different DA methods. Our results demonstrate that DA significantly enhances imagery prediction, especially with the Regular Transfer approach. We then conduct a DA-enhanced searchlight analysis using Regular Transfer, followed by permutation-based statistical tests to identify brain regions where imagery decoding is consistently above chance across subjects. Our DA-enhanced searchlight predicts imagery contents in a highly distributed set of brain regions, including the visual cortex and the frontoparietal cortex, thereby outperforming standard cross-domain classification methods. The complete code and data for this paper have been made openly available for the use of the scientific community.
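
The permutation-based test described above can be sketched for a single decoder, for example one searchlight sphere in one subject: shuffle the labels many times, recompute accuracy, and report how often the shuffled accuracy reaches the observed one. This is an assumed, simplified scheme; how the authors aggregate across subjects and correct for multiple comparisons is not shown.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of matching labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_p_value(y_true, y_pred, n_perm=999, seed=0):
    """One-sided p-value for 'decoding accuracy is above chance'.

    Shuffled labels build a null distribution of accuracies; the +1 terms
    count the observed labelling as one of the permutations.
    """
    rng = random.Random(seed)
    observed = accuracy(y_true, y_pred)
    labels = list(y_true)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if accuracy(labels, y_pred) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)
```

With 999 permutations the smallest attainable p-value is 0.001, which bounds how strict a threshold such a test can support.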

翻訳日:2024-08-05 13:47:29 公開日:2024-08-02

# 複雑な生体系における物理的に実現された分類器による多用途ナノダイヤモンドの識別的アドレッシング

Discriminative Addressing of Versatile Nanodiamonds via Physically-Enabled Classifier in Complex Bio-Systems ( http://arxiv.org/abs/2408.01164v1 )

ライセンス: Link先を確認

Yayin Tan, Xiaolu Wang, Feng Xu, Xinhao Hu, Yuan Lin, Bo Gao, Zhiqin Chu,

(参考訳) 窒素空孔(NV)センターは、ナノスケールのバイオセンシングとバイオイメージングに大きな可能性を秘めている。それにもかかわらず、彼らの想定する生体応用は、避けられない光散乱と細胞や組織における自己蛍光による固有の背景ノイズに悩まされる。そこで本研究では,背景雑音を効果的に除去しつつ,画素解像度でNV蛍光にオンデマンドかつ直接アクセスするための,物理機能付き分類器による新しい全光変調撮像法を開発した。具体的には、NV蛍光を光学的に変調して正弦波様の変化を示し、分類の基礎を与えることができる。本手法は, 細胞から生物まで, 蛍光干渉を伴う複雑な生物学的シナリオで検証する。特に,我々の分類に基づくアプローチは,神経タンパク質イメージングにおける蛍光ナノダイアモンド(FND)の信号-背景比(SBR)を約10^6倍に向上させる。また、染色細胞中のFNDの光検出磁気共鳴測定(ODMR)において、4倍のコントラスト改善を示す。提案手法は, 現実的な高忠実度イメージングや, 難聴シナリオのセンシングに応用可能な汎用的, 説明可能な, 堅牢なソリューションを提供する。

Nitrogen-vacancy (NV) centers show great potential for nanoscale bio-sensing and bio-imaging. Nevertheless, their envisioned bio-applications suffer from intrinsic background noise due to unavoidable light scattering and autofluorescence in cells and tissues. Herein, we develop a novel all-optical modulated imaging method via a physically-enabled classifier, providing on-demand and direct access to NV fluorescence at pixel resolution while effectively filtering out background noise. Specifically, NV fluorescence can be modulated optically to exhibit sinusoid-like variations, which provides a basis for classification. We validate our method in various complex biological scenarios with fluorescence interference, ranging from cells to organisms. Notably, our classification-based approach achieves an almost 10^6-fold enhancement of the signal-to-background ratio (SBR) for fluorescent nanodiamonds (FNDs) in neural protein imaging. We also demonstrate a 4-fold contrast improvement in optically detected magnetic resonance (ODMR) measurements of FNDs inside stained cells. Our technique offers a generic, explainable and robust solution, applicable to realistic high-fidelity imaging and sensing in challenging, noise-laden scenarios.
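
Because the NV fluorescence is modulated to vary sinusoidally, one standard way to separate it from unmodulated background (scattering, autofluorescence) is lock-in style demodulation of each pixel's time trace at the modulation frequency. The sketch below shows that generic technique, not the authors' physically-enabled classifier. The threshold rule, the sampling parameters and the assumption that the trace spans an integer number of modulation periods are illustrative.

```python
import math


def lockin_amplitude(trace, freq, dt):
    """Amplitude of the component of `trace` at modulation frequency `freq`.

    Multiplying by in-phase and quadrature references and averaging cancels
    the constant background (exactly, if the trace covers whole periods).
    """
    n = len(trace)
    i_sum = q_sum = 0.0
    for k, s in enumerate(trace):
        phase = 2 * math.pi * freq * k * dt
        i_sum += s * math.cos(phase)
        q_sum += s * math.sin(phase)
    return 2 * math.hypot(i_sum, q_sum) / n


def classify_pixels(traces, freq, dt, threshold):
    """Label each pixel trace NV-like (True) if its modulated amplitude exceeds threshold."""
    return [lockin_amplitude(t, freq, dt) > threshold for t in traces]
```

A bright but unmodulated pixel (large constant background) thus scores near zero, while a dim pixel that follows the modulation scores its modulation amplitude.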

翻訳日:2024-08-05 13:47:29 公開日:2024-08-02

# 連続時間ニューラルネットワークは、ランダムスパイク列車を安定的に記憶できる

Continuous-Time Neural Networks Can Stably Memorize Random Spike Trains ( http://arxiv.org/abs/2408.01166v1 )

ライセンス: Link先を確認

Hugo Aguettaz, Hans-Andrea Loeliger,

(参考訳) 本稿では、連続時間リカレントニューラルネットワークが、精密なタイミングを持つスパイクパターンを記憶・想起する能力を探る。数値実験により、これが実際に可能であることを示す。すなわち、あるパラメータ範囲において、(ネットワーク内の全ニューロンに対する)スパイク列のランダムなスコアを頑健に記憶し、全スパイクの相対タイミングを安定かつ正確に保ったまま、1に近い確率で自律的に再生できる。また、ノイズのある条件下での連想想起も示す。これらの実験では、必要なシナプス重みは、時間的安定性を促すテンプレートを満たすようにオフラインで計算される。

The paper explores the capability of continuous-time recurrent neural networks to store and recall precisely timed spike patterns. We show (by numerical experiments) that this is indeed possible: within some range of parameters, any random score of spike trains (for all neurons in the network) can be robustly memorized and autonomously reproduced with stable accurate relative timing of all spikes, with probability close to one. We also demonstrate associative recall under noisy conditions. In these experiments, the required synaptic weights are computed offline, to satisfy a template that encourages temporal stability.

翻訳日:2024-08-05 13:47:29 公開日:2024-08-02

# 全スライド画像分類のための複数インスタンス学習における事前訓練された特徴抽出器選択の再考

Rethinking Pre-trained Feature Extractor Selection in Multiple Instance Learning for Whole Slide Image Classification ( http://arxiv.org/abs/2408.01167v1 )

ライセンス: Link先を確認

Bryan Wong, Mun Yong Yi,

(参考訳) 多重インスタンス学習(MIL)は、パッチラベルアノテーションを必要とせず、ギガピクセル全体のスライド画像(WSI)を分類する方法として好まれている。現在のMIL研究ストリームの焦点は、事前訓練された特徴抽出器を使用してパッチから特徴ベクトルを抽出する組み込みベースのMILアプローチである。これらの特徴ベクトルは、スライドレベルの予測のためにMILアグリゲータに入力される。 ImageNet-1Kで事前訓練された最も一般的なResNet50教師付きモデルの強化に関する以前の研究提案にもかかわらず、WSI性能を最大化するために最適な特徴抽出器を選択するための明確なガイダンスがない。本研究は,3次元のMIL特徴抽出器(事前学習データセット,バックボーンモデル,事前学習手法)を用いて,このギャップに対処することを目的とする。 4つのSOTA MILモデルを用いて2つのWSIデータセット(TCGA-NSCLCとCamelyon16)で大規模な実験を行った。主な発見は以下のとおりである。 1) CNNとTransformerのバックボーンにおいて,より大きく,より多様な事前トレーニングデータセットにより,パフォーマンスが大幅に向上する。 2) `Modern and Deep' バックボーンは ‘standard' バックボーン(ResNet と ViT)を大幅に上回り、Transformer ベースのバックボーンではパフォーマンス改善がより保証されている。 3) Transformer (ViT) バックボーンに適用した場合, 自己教師あり学習 (SSL) の選択は極めて重要である。研究結果は、より効果的な病理基盤モデルの設計を含む、実践的な意味を持つ。私たちのコードは、https://anonymous.4open.science/r/MIL-Feature-Extractor-Selectionで利用可能です。

Multiple instance learning (MIL) has become a preferred method for classifying gigapixel whole slide images (WSIs) without requiring patch-level label annotation. The current MIL research stream focuses on the embedding-based MIL approach, which extracts feature vectors from patches using a pre-trained feature extractor. These feature vectors are then fed into an MIL aggregator for slide-level prediction. Despite prior suggestions for improving on the most commonly used ResNet50 supervised model pre-trained on ImageNet-1K, there remains a lack of clear guidance on selecting the optimal feature extractor to maximize WSI performance. This study addresses this gap by examining MIL feature extractors across three dimensions: pre-training dataset, backbone model, and pre-training method. Extensive experiments were carried out on two public WSI datasets (TCGA-NSCLC and Camelyon16) using four SOTA MIL models. The main findings are as follows: 1) Performance significantly improves with larger and more varied pre-training datasets in both CNN and Transformer backbones. 2) 'Modern and deeper' backbones greatly outperform 'standard' backbones (ResNet and ViT), with performance improvements more consistently observed in Transformer-based backbones. 3) The choice of self-supervised learning (SSL) method is crucial, with the most significant benefits observed when applied to the Transformer (ViT) backbone. The study findings have practical implications, including for designing more effective pathological foundation models. Our code is available at: https://anonymous.4open.science/r/MIL-Feature-Extractor-Selection
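
The aggregator stage of the embedding-based pipeline above can be illustrated with attention-based MIL pooling in the style of Ilse et al. (2018), a common aggregator choice; whether the four SOTA MIL models in the study use exactly this form is not stated here. Patch features $h_k$ come from the pre-trained extractor under study, so changing the extractor changes the inputs while the aggregator stays the same. `V` and `w` would be learned; here they are fixed toy values.

```python
import math


def attention_mil_pool(instances, V, w):
    """Attention-based MIL pooling (toy, non-gated form).

    instances: list of patch feature vectors h_k (dimension D)
    V: L x D matrix, w: length-L vector (learned in practice)
    a_k = softmax_k(w . tanh(V h_k)); slide embedding z = sum_k a_k h_k.
    """
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * x for wi, x in zip(w, hidden)))
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(instances[0])
    z = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return z, weights
```

The attention weights sum to one, so the slide embedding is a weighted average of patch features, and the weights themselves indicate which patches drove the slide-level prediction.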

翻訳日:2024-08-05 13:47:29 公開日:2024-08-02

# LLMへの誤情報: 脆弱性、課題、機会

Misinforming LLMs: vulnerabilities, challenges and opportunities ( http://arxiv.org/abs/2408.01168v1 )

ライセンス: Link先を確認

Bo Zhou, Daniel Geißler, Paul Lukowicz,

(参考訳) 大規模言語モデル(LLM)は自然言語処理において大きな進歩を遂げているが、その基盤となるメカニズムはしばしば誤解されている。一貫性のある答えと明らかな推論行動を示すにもかかわらず、LLMは真の認知過程ではなく、単語の埋め込みにおける統計的パターンに依存している。これは"幻覚"や誤情報といった脆弱性につながる。この論文は、現在のLLMアーキテクチャは、単語埋め込みベクトルの逐次パターンの相関に依存するため、本質的に信頼できないと論じている。しかし、生成Transformerベースのモデルを事実ベースや論理プログラミング言語と組み合わせる進行中の研究は、与えられた真実に基づいてステートメントを生成し、自己推論プロセスを説明することができる信頼できるLLMの開発に繋がる可能性がある。

Large Language Models (LLMs) have made significant advances in natural language processing, but their underlying mechanisms are often misunderstood. Despite exhibiting coherent answers and apparent reasoning behaviors, LLMs rely on statistical patterns in word embeddings rather than true cognitive processes. This leads to vulnerabilities such as "hallucination" and misinformation. The paper argues that current LLM architectures are inherently untrustworthy due to their reliance on correlations of sequential patterns of word embedding vectors. However, ongoing research into combining generative transformer-based models with fact bases and logic programming languages may lead to the development of trustworthy LLMs capable of generating statements based on given truth and explaining their self-reasoning process.

翻訳日:2024-08-05 13:47:29 公開日:2024-08-02

# 産業用サイバーフィジカルシステムにおける生成AI駆動デジタルツインのための持続可能な拡散ベースのインセンティブメカニズム

Sustainable Diffusion-based Incentive Mechanism for Generative AI-driven Digital Twins in Industrial Cyber-Physical Systems ( http://arxiv.org/abs/2408.01173v1 )

ライセンス: Link先を確認

Jinbo Wen, Jiawen Kang, Dusit Niyato, Yang Zhang, Shiwen Mao,

(参考訳) 産業用サイバーフィジカルシステム(ICPS)は、現代の製造業と産業にとって不可欠なコンポーネントである。製品ライフサイクル全体にわたってデータをデジタル化することで、ICPSにおけるデジタルツイン(DT)は、現在の産業インフラからインテリジェントで適応的なインフラへの移行を可能にする。そのデータ処理能力により、生成人工知能(GAI)はDTの構築と更新を推進して予測精度を向上させ、多様なスマート製造に備えることができる。しかし、産業用IoT(Industrial Internet of Things, IIoT)のセンシングデバイスを活用してDT構築のためのデータを共有する仕組みは、逆選択問題の影響を受けやすい。本稿ではまず、ICPSのためのGAI駆動型DTアーキテクチャを開発する。情報の非対称性に起因する逆選択問題に対処するため、契約理論モデルを提案し、最適な実行可能契約を特定する持続可能な拡散ベースのソフトアクター・クリティック・アルゴリズムを開発する。具体的には、動的構造化プルーニング技術を利用してアクターネットワークのパラメータ数を削減し、提案アルゴリズムの持続可能で効率的な実装を可能にする。最後に、数値結果により提案手法の有効性を示す。

Industrial Cyber-Physical Systems (ICPSs) are an integral component of modern manufacturing and industries. By digitizing data throughout the product life cycle, Digital Twins (DTs) in ICPSs enable a shift from current industrial infrastructures to intelligent and adaptive infrastructures. Thanks to its data processing capability, Generative Artificial Intelligence (GAI) can drive the construction and update of DTs to improve predictive accuracy and prepare for diverse smart manufacturing. However, mechanisms that leverage sensing Industrial Internet of Things (IIoT) devices to share data for the construction of DTs are susceptible to adverse selection problems. In this paper, we first develop a GAI-driven DT architecture for ICPSs. To address the adverse selection problem caused by information asymmetry, we propose a contract theory model and develop a sustainable diffusion-based soft actor-critic algorithm to identify the optimal feasible contract. Specifically, we leverage a dynamic structured pruning technique to reduce the number of parameters in the actor networks, enabling a sustainable and efficient implementation of the proposed algorithm. Finally, numerical results demonstrate the effectiveness of the proposed scheme.
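
The contract-theoretic part can be illustrated by the two standard feasibility conditions that any candidate contract menu must satisfy. Individual rationality (IR) means each device type prefers its own contract to opting out, and incentive compatibility (IC) means no type gains by picking a contract designed for another type. The device utility `reward - quality / theta` is a hypothetical placeholder, not the paper's model. A search procedure such as the proposed diffusion-based soft actor-critic would look for the requester-optimal menu among those that pass such a check.

```python
def utility(theta, contract):
    """Hypothetical device utility: reward minus a sensing cost quality / theta,
    so higher-type devices supply a given data quality more cheaply."""
    quality, reward = contract
    return reward - quality / theta


def is_feasible(types, contracts, tol=1e-12):
    """Check IR and IC for a menu in which contracts[i] targets types[i]."""
    for i, theta in enumerate(types):
        own = utility(theta, contracts[i])
        if own < -tol:  # IR: accepting must be no worse than opting out (utility 0)
            return False
        for j, other in enumerate(contracts):
            if j != i and utility(theta, other) > own + tol:  # IC: no gain from misreporting
                return False
    return True
```

For example, with types `[1.0, 2.0]` the menu `[(1.0, 1.0), (2.0, 1.5)]` (quality, reward) is feasible, but lowering the high type's reward to 1.2 violates IC, because that type would then prefer the low type's contract.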

翻訳日:2024-08-05 13:47:29 公開日:2024-08-02

# 量子ネットワーク間の分数的状態伝達による決定論的多部絡み合い

Deterministic multipartite entanglement via fractional state transfer across quantum networks ( http://arxiv.org/abs/2408.01177v1 )

ライセンス: Link先を確認

G. F. Peñas, J. -J. García-Ripoll, R. Puebla,

(参考訳) 分散量子アーキテクチャにおける異なるノード間の絡み合いの生成は、異なるアプリケーションにおいて重要な役割を果たす。特に、決定論的で堅牢で高速なプロトコルは、真のマルチパートの絡み合った状態を作るのが非常に望ましい。本稿では,エミッタの励起が部分的に量子通信チャネルを介して伝達され,空間的に分離されたノードで吸収される分数量子状態伝達を提案する。このプロトコルは2つの量子レジスタ間のベル状態の高速な決定論的生成を可能にし、ネットワークのトポロジに応じて連続的または同時的に$N$ qubitsの一般的な設定に対して$W$状態を提供する。詳細な数値シミュレーションにより, 真のマルチパーティント絡み合った状態は, 現在の実験プラットフォーム内で忠実に準備できることを示し, ネットワークトポロジに応じて, 主デコヒーレンス源, クビットデフォーカス, 緩和の役割について議論する。

The generation of entanglement across different nodes in distributed quantum architectures plays a pivotal role in various applications. In particular, deterministic, robust, and fast protocols that prepare genuine multipartite entangled states are highly desirable. In this article, we propose a fractional quantum state transfer, in which the excitation of an emitter is partially transmitted through the quantum communication channel and then absorbed at a spatially separated node. This protocol is based on wavepacket shaping, allowing for a fast deterministic generation of Bell states between two quantum registers and of $W$ states for a general setting of $N$ qubits, either in a sequential or simultaneous fashion, depending on the topology of the network. By means of detailed numerical simulations, we show that genuine multipartite entangled states can be faithfully prepared within current experimental platforms, and we discuss the role of the main decoherence sources, qubit dephasing and relaxation, depending on the network topology.
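
The sequential variant of fractional state transfer can be illustrated with ideal single-excitation bookkeeping. If each transfer hands over a fraction $p_k = 1/(N-k+1)$ of the population still on the emitter, every one of the $N$ qubits ends with population $1/N$, i.e. a $W$ state up to local phases; $N = 2$ gives a Bell state. This schedule and the lossless, phase-free model are assumptions for illustration; the paper's wavepacket shaping and the decoherence it analyses are not modelled.

```python
import math


def sequential_w_fractions(n_qubits):
    """Hypothetical schedule: fraction of the emitter's remaining population
    handed over at each of the n-1 transfers, p_k = 1 / (n - k + 1)."""
    return [1.0 / (n_qubits - k + 1) for k in range(1, n_qubits)]


def simulate_transfers(fractions):
    """Ideal, lossless single-excitation bookkeeping with real amplitudes.

    Returns the final amplitudes of the receiving nodes, then the emitter.
    """
    emitter = 1.0  # amplitude of the excitation still on the emitter
    receivers = []
    for p in fractions:
        receivers.append(emitter * math.sqrt(p))  # amplitude absorbed by the next node
        emitter *= math.sqrt(1.0 - p)             # amplitude kept by the emitter
    return receivers + [emitter]
```

For $N = 4$ the fractions are 1/4, 1/3 and 1/2, and each of the four qubits ends with amplitude 1/2.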

翻訳日:2024-08-05 13:37:26 公開日:2024-08-02

# EmoBack:感情韻律を用いた話者識別に対するバックドア攻撃

EmoBack: Backdoor Attacks Against Speaker Identification Using Emotional Prosody ( http://arxiv.org/abs/2408.01178v1 )

ライセンス: Link先を確認

Coen Schoof, Stefanos Koffas, Mauro Conti, Stjepan Picek,

(参考訳) 話者識別(SI)は、話者の発話に基づいて話者の身元を決定する。先行研究は、SIディープニューラルネットワーク(DNN)がバックドア攻撃に対して脆弱であることを示している。バックドア攻撃とは、DNNの学習データに隠れたトリガーを埋め込むことで、推論時にそのトリガーが存在するとDNNに誤った出力を生成させるものである。本研究は、話者の感情的韻律を用いたバックドア攻撃に対するSI DNNの脆弱性を探求する初めての研究であり、動的で目立たないトリガーをもたらす。3つの異なるデータセットとDNNアーキテクチャを用いてパラメータスタディを行い、バックドアトリガーとしての感情がSIシステムの精度に与える影響を調べた。さらに、プルーニング、STRIP-ViTA、および3つの一般的な前処理手法(量子化、メディアンフィルタリング、スクイージング)といった防御を適用して、攻撃の頑健性を検討した。結果から、上記のモデルは我々の攻撃に脆弱であり、感情的トリガー(悲しみと中立の韻律)がSIシステムの完全性を損なうために効果的に利用できることが示された。しかし、プルーニング実験の結果は、攻撃に対してモデルを強化する潜在的な解決策を示唆しており、攻撃成功率を最大40%低下させた。

Speaker identification (SI) determines a speaker's identity based on their spoken utterances. Previous work indicates that SI deep neural networks (DNNs) are vulnerable to backdoor attacks. Backdoor attacks involve embedding hidden triggers in DNNs' training data, causing the DNN to produce incorrect output when these triggers are present during inference. This is the first work that explores SI DNNs' vulnerability to backdoor attacks using speakers' emotional prosody, resulting in dynamic, inconspicuous triggers. We conducted a parameter study using three different datasets and DNN architectures to determine the impact of emotions as backdoor triggers on the accuracy of SI systems. Additionally, we have explored the robustness of our attacks by applying defenses like pruning, STRIP-ViTA, and three popular preprocessing techniques: quantization, median filtering, and squeezing. Our findings show that the aforementioned models are prone to our attack, indicating that emotional triggers (sad and neutral prosody) can be effectively used to compromise the integrity of SI systems. However, the results of our pruning experiments suggest potential solutions for reinforcing the models against our attacks, decreasing the attack success rate up to 40%.
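
The attack success rate quoted above is typically computed on triggered test inputs as the fraction classified as the attacker's target. The sketch below assumes the common convention of excluding utterances whose true speaker already is the target, which the abstract does not specify. Clean accuracy, the usual companion metric for checking that a backdoored model still behaves normally on untriggered inputs, is included for contrast.

```python
def attack_success_rate(true_labels, predictions_on_triggered, target_label):
    """Fraction of triggered utterances classified as the target speaker,
    ignoring utterances whose true speaker already is the target."""
    pairs = [(t, p) for t, p in zip(true_labels, predictions_on_triggered)
             if t != target_label]
    if not pairs:
        return 0.0
    return sum(p == target_label for _, p in pairs) / len(pairs)


def clean_accuracy(true_labels, predictions_on_clean):
    """Ordinary accuracy on untriggered utterances."""
    return sum(t == p for t, p in zip(true_labels, predictions_on_clean)) / len(true_labels)
```

A defense such as pruning is judged by how far it lowers the first number while keeping the second close to its original value.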

翻訳日:2024-08-05 13:37:26 公開日:2024-08-02

# Nested Music Transformer:シンボリック・ミュージックとオーディオ・ジェネレーションにおける複合トークンの逐次デコード

Nested Music Transformer: Sequentially Decoding Compound Tokens in Symbolic Music and Audio Generation ( http://arxiv.org/abs/2408.01180v1 )

ライセンス: Link先を確認

Jiwoo Ryu, Hao-Wen Dong, Jongmin Jung, Dasaem Jeong,

(参考訳) 記号を複合トークンで表現し、それぞれのトークンは異なる音楽の特徴や属性を表すいくつかの異なるサブトークンで構成されており、シーケンス長を減少させる利点がある。音楽シーケンスモデリングにおける複合トークンの有効性は過去の研究で検証されているが、全てのサブトークンを同時に予測することは、それらの相互依存性を完全に把握できないため、最適以下の結果につながる可能性がある。我々はNested Music Transformer(NMT)を紹介した。これは、フラット化トークンの処理と似ているが、メモリ使用量の少ない複合トークンを自動回帰的に復号するアーキテクチャである。 NMTは、複合トークンの列をモデル化するメインデコーダと、各複合トークンのサブトークンをモデル化するサブデコーダの2つのトランスフォーマから構成される。実験の結果,複合トークンにNMTを適用することで,MAESTROデータセットから様々なシンボリック音楽データセットや離散音声トークンを処理する際の難易度が向上することが示された。

Representing symbolic music with compound tokens, where each token consists of several different sub-tokens, each representing a distinct musical feature or attribute, offers the advantage of reducing sequence length. While previous research has validated the efficacy of compound tokens in music sequence modeling, predicting all sub-tokens simultaneously can lead to suboptimal results, as it may not fully capture the interdependencies between them. We introduce the Nested Music Transformer (NMT), an architecture tailored for decoding compound tokens autoregressively, similar to processing flattened tokens, but with low memory usage. The NMT consists of two transformers: the main decoder, which models a sequence of compound tokens, and the sub-decoder, which models the sub-tokens of each compound token. Experimental results show that applying the NMT to compound tokens improves performance in terms of perplexity when processing various symbolic music datasets and discrete audio tokens from the MAESTRO dataset.
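
The difference between predicting sub-tokens simultaneously and decoding them in sequence can be sketched as follows. In the nested scheme each sub-token is conditioned on the ones already chosen, so, for example, a duration can depend on the pitch just picked. `sub_decoder` is a hypothetical stand-in for the NMT sub-decoder, the pitch/duration vocabulary in `toy_sub_decoder` is a toy assumption, and greedy selection is used only for simplicity. Perplexity, the evaluation metric reported above, is the exponentiated mean negative log-likelihood.

```python
import math


def decode_compound_token(sub_decoder, context, n_sub):
    """Greedy nested decoding of one compound token.

    Each sub-token is chosen given the context AND the sub-tokens already
    chosen for this compound token. sub_decoder(context, prefix) is a
    hypothetical stand-in returning {sub_token: probability}.
    """
    prefix = []
    for _ in range(n_sub):
        dist = sub_decoder(context, tuple(prefix))
        prefix.append(max(dist, key=dist.get))
    return tuple(prefix)


def perplexity(log_probs):
    """exp of the mean negative log-likelihood over predicted tokens."""
    return math.exp(-sum(log_probs) / len(log_probs))


def toy_sub_decoder(context, prefix):
    """Toy assumption: sub-token 0 is a pitch, sub-token 1 a duration that depends on it."""
    if not prefix:
        return {"C4": 0.6, "E4": 0.4}
    if prefix[0] == "C4":
        return {"eighth": 0.9, "quarter": 0.1}
    return {"quarter": 0.9, "eighth": 0.1}
```

With `toy_sub_decoder`, the nested decoder returns `("C4", "eighth")`; a simultaneous predictor would have to choose the duration without knowing which pitch was selected.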

翻訳日:2024-08-05 13:37:26 公開日:2024-08-02

# VAR-CLIP:視覚的自己回帰モデルを用いたテキスト・画像生成装置

VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling ( http://arxiv.org/abs/2408.01181v1 )

ライセンス: Link先を確認

Qian Zhang, Xiangzi Dai, Ninghua Yang, Xiang An, Ziyong Feng, Xingyu Ren,

(参考訳) VARは「次トークン予測」ではなく「次スケール予測」を用いる新しい生成パラダイムである。この革新的な転換により、自己回帰(AR)Transformerは視覚分布を迅速に学習し、頑健な汎化を実現できる。しかし、オリジナルのVARモデルはクラス条件付き合成に限定されており、テキストキャプションによるガイダンスを利用できない。本稿では、視覚的自己回帰(Visual Auto-Regressive)技術とCLIPの能力を統合した新しいテキスト画像生成モデルVAR-CLIPを紹介する。VAR-CLIPフレームワークはキャプションをテキスト埋め込みにエンコードし、それを画像生成のテキスト条件として利用する。ImageNetのような大規模データセットでの学習を容易にするため、BLIP2を活用して大規模な画像テキストデータセットを構築した。さらに、キャプションガイダンスにおけるCLIP内の単語位置の重要性についても詳しく検討した。広範な実験により、VAR-CLIPが高い忠実度、テキストとの整合性、優れた美的品質を備えた幻想的な画像の生成に長けていることが確認された。プロジェクトページはhttps://github.com/daixiangzi/VAR-CLIPである。

VAR is a new generation paradigm that employs 'next-scale prediction' as opposed to 'next-token prediction'. This innovative transformation enables auto-regressive (AR) transformers to rapidly learn visual distributions and achieve robust generalization. However, the original VAR model is constrained to class-conditioned synthesis and cannot use textual captions for guidance. In this paper, we introduce VAR-CLIP, a novel text-to-image model that integrates Visual Auto-Regressive techniques with the capabilities of CLIP. The VAR-CLIP framework encodes captions into text embeddings, which are then utilized as textual conditions for image generation. To facilitate training on extensive datasets, such as ImageNet, we have constructed a substantial image-text dataset leveraging BLIP2. Furthermore, we delve into the significance of word positioning within CLIP for the purpose of caption guidance. Extensive experiments confirm VAR-CLIP's proficiency in generating fantasy images with high fidelity, textual congruence, and aesthetic excellence. Our project page is https://github.com/daixiangzi/VAR-CLIP
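
Next-scale prediction, as opposed to next-token prediction, can be sketched as a coarse-to-fine loop in which each step predicts an entire token map at a larger resolution, conditioned on the text embedding (from CLIP, per the abstract) and on all coarser maps. This is a simplified illustration: `predict_map` is a hypothetical stand-in for the transformer, and conditioning on nearest-neighbour-upsampled maps is an assumption rather than VAR-CLIP's exact mechanism.

```python
def upsample_nearest(token_map, size):
    """Nearest-neighbour upsampling of a square token map to size x size."""
    old = len(token_map)
    return [[token_map[r * old // size][c * old // size] for c in range(size)]
            for r in range(size)]


def generate_next_scale(predict_map, text_condition, scales):
    """Coarse-to-fine generation: one whole s x s token map per step.

    predict_map(text_condition, previous_maps, size) is a hypothetical
    stand-in for the transformer; previous maps are passed upsampled
    to the current size.
    """
    maps = []
    for s in scales:
        previous = [upsample_nearest(m, s) for m in maps]
        maps.append(predict_map(text_condition, previous, s))
    return maps
```

The number of sequential steps equals the number of scales rather than the number of tokens: an illustrative schedule of (1, 2, 3, 4, 5, 6, 8, 10, 13, 16) produces 680 tokens in 10 steps.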

翻訳日:2024-08-05 13:37:26 公開日:2024-08-02

# 強化学習におけるメタヒューリスティック戦略を用いた変分量子回路の最適化

Optimizing Variational Quantum Circuits Using Metaheuristic Strategies in Reinforcement Learning ( http://arxiv.org/abs/2408.01187v1 )

ライセンス: Link先を確認

Michael Kölle, Daniel Seidl, Maximilian Zorn, Philipp Altmann, Jonas Stein, Thomas Gabor,

(参考訳) 量子強化学習(QRL)は、特定のシナリオにおいて、コンパクトな状態空間表現やより高速な収束など、古典的な強化学習よりも潜在的に有利である。しかし、実際的な利点についてはさらなる検証が必要である。QRLは平坦な解のランドスケープといった課題に直面しており、そこでは従来の勾配ベースの手法が非効率となるため、勾配フリーのアルゴリズムを用いる必要がある。本研究では、メタヒューリスティックアルゴリズム(Particle Swarm Optimization、Ant Colony Optimization、Tabu Search、Genetic Algorithm、Simulated Annealing、Harmony Search)のQRLへの統合について検討する。これらのアルゴリズムはパラメータ最適化に柔軟性と効率性をもたらす。$5\times5$のMiniGrid強化学習環境での評価により、全てのアルゴリズムが最適に近い結果を得ること、特にSimulated AnnealingとParticle Swarm Optimizationが最良の性能を示すことが分かった。Cart Pole環境では、Simulated Annealing、Genetic Algorithm、Particle Swarm Optimizationが最適な結果を得る一方、他のアルゴリズムはランダムな行動選択よりわずかに良い程度にとどまる。これらの結果は、効率的なQRL学習におけるParticle Swarm OptimizationとSimulated Annealingの可能性を示すとともに、アルゴリズムの慎重な選択と適応の必要性を強調している。

Quantum Reinforcement Learning (QRL) offers potential advantages over classical Reinforcement Learning, such as compact state space representation and faster convergence in certain scenarios. However, practical benefits require further validation. QRL faces challenges like flat solution landscapes, where traditional gradient-based methods are inefficient, necessitating the use of gradient-free algorithms. This work explores the integration of metaheuristic algorithms (Particle Swarm Optimization, Ant Colony Optimization, Tabu Search, Genetic Algorithm, Simulated Annealing, and Harmony Search) into QRL. These algorithms provide flexibility and efficiency in parameter optimization. Evaluations in $5\times5$ MiniGrid Reinforcement Learning environments show that all algorithms yield near-optimal results, with Simulated Annealing and Particle Swarm Optimization performing best. In the Cart Pole environment, Simulated Annealing, Genetic Algorithms, and Particle Swarm Optimization achieve optimal results, while the others perform only slightly better than random action selection. These findings demonstrate the potential of Particle Swarm Optimization and Simulated Annealing for efficient QRL learning, emphasizing the need for careful algorithm selection and adaptation.
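
As an illustration of gradient-free parameter optimization, the sketch below applies simulated annealing to a toy variational cost: independent qubits prepared with $R_Y(\theta)$ on $|0\rangle$, whose $\langle Z \rangle = \cos\theta$ is minimized at $\theta = \pi$. In QRL the cost would instead come from the agent's returns. The toy cost, Gaussian proposals, geometric cooling schedule and all hyperparameters are assumptions, not the paper's settings.

```python
import math
import random


def vqc_cost(params):
    """Toy cost: sum of <Z> on independent qubits after RY(theta)|0>,
    since <Z> = cos(theta). Minimum is -len(params), at theta = pi."""
    return sum(math.cos(t) for t in params)


def simulated_annealing(cost, params, steps=3000, t0=1.0, cooling=0.995, sigma=0.3, seed=0):
    """Gradient-free search: propose a Gaussian perturbation, always accept
    improvements, accept worse moves with probability exp(-delta / T)."""
    rng = random.Random(seed)
    current, current_cost = list(params), cost(params)
    best, best_cost = list(current), current_cost
    temp = t0
    for _ in range(steps):
        candidate = [p + rng.gauss(0.0, sigma) for p in current]
        c = cost(candidate)
        delta = c - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, c
            if c < best_cost:
                best, best_cost = list(candidate), c
        temp = max(temp * cooling, 1e-9)  # geometric cooling with a floor
    return best, best_cost
```

Only cost evaluations are needed, never gradients, which is why such methods remain usable when the cost landscape is too flat for gradient estimates to be informative.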

Translated: 2024-08-05 13:37:26, Published: 2024-08-02

# Multi-Objective Deep Reinforcement Learning for Optimisation in Autonomous Systems

http://arxiv.org/abs/2408.01188v1

License: see the linked page

Juan C. Rosero, Ivana Dusparic, Nicolás Cardozo,

Reinforcement Learning (RL) is used extensively in Autonomous Systems (AS) because it enables learning at runtime without a model of the environment or predefined actions. However, most applications of RL in AS, such as those based on Q-learning, can optimize only one objective, so multi-objective systems must combine their objectives into a single objective function with predefined weights. A number of Multi-Objective Reinforcement Learning (MORL) techniques exist, but they have mostly been applied to RL benchmarks rather than to real-world AS. In this work, we apply a MORL technique called Deep W-Learning (DWN) to the Emergent Web Servers exemplar, a self-adaptive server, to find the optimal configuration for runtime performance optimization. We compare DWN to two single-objective optimization implementations: an $\epsilon$-greedy algorithm and Deep Q-Networks (DQN). Our initial evaluation shows that DWN optimizes multiple objectives simultaneously with results similar to those of the DQN and $\epsilon$-greedy approaches, performs better on some metrics, and avoids the issues associated with combining multiple objectives into a single utility function.
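
W-learning style methods let several single-objective policies compete for control instead of merging their rewards: each policy nominates its preferred action, and the policy with the largest W-value (roughly, how much it stands to lose if ignored) wins that state. The tabular sketch below shows only this arbitration step and a simple W update; DWN itself uses neural networks, and the objectives, Q-tables, and learning rates here are hypothetical.

```python
from typing import Dict, List, Tuple

QTable = Dict[str, List[float]]  # state -> estimated return for each action

def nominate(q: QTable, state: str) -> int:
    """Greedy action preferred by one single-objective policy."""
    values = q[state]
    return max(range(len(values)), key=values.__getitem__)

def arbitrate(q_tables: List[QTable], w_tables: List[Dict[str, float]],
              state: str) -> Tuple[int, int]:
    """Return (winning policy index, its action); the largest W-value wins the state."""
    winner = max(range(len(q_tables)), key=lambda i: w_tables[i][state])
    return winner, nominate(q_tables[winner], state)

def update_w(w: Dict[str, float], q: QTable, state: str, reward: float,
             next_state: str, gamma: float = 0.9, alpha: float = 0.1) -> None:
    """For a policy that was NOT obeyed: move W toward the return it expected
    from its own action minus the return it actually observed."""
    expected = max(q[state])
    observed = reward + gamma * max(q[next_state])
    w[state] = (1 - alpha) * w[state] + alpha * (expected - observed)

# Hypothetical objectives for a self-adaptive server: latency and energy.
q_latency = {"s0": [1.0, 0.2], "s1": [0.5, 0.5]}
q_energy = {"s0": [0.1, 0.9], "s1": [0.3, 0.3]}
w_latency = {"s0": 0.4, "s1": 0.0}
w_energy = {"s0": 0.7, "s1": 0.0}

winner, action = arbitrate([q_latency, q_energy], [w_latency, w_energy], "s0")
```

Because each objective keeps its own value estimates, no fixed weighting between latency and energy has to be chosen in advance.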

Translated: 2024-08-05 13:37:26, Published: 2024-08-02

# A Weakly Supervised and Globally Explainable Learning Framework for Brain Tumor Segmentation

http://arxiv.org/abs/2408.01191v1

License: see the linked page

Ruitao Xie, Limai Jiang, Xiaoxi He, Yi Pan, Yunpeng Cai,

Machine-based brain tumor segmentation can help doctors make better diagnoses. However, the complex structure of brain tumors and the cost of pixel-level annotations make automatic tumor segmentation challenging. In this paper, we propose a counterfactual generation framework that not only achieves exceptional brain tumor segmentation performance without pixel-level annotations but also provides explainability. Our framework separates the class-related features of samples from their class-unrelated features, and generates new samples that preserve identity features while altering class attributes by embedding different class-related features. We perform topological data analysis on the extracted class-related features to obtain a globally explainable manifold. For each abnormal sample to be segmented, rule-based paths designed within this manifold guide the generation of a meaningful normal counterpart, and comparing the two identifies the tumor regions. We evaluate the proposed method on two datasets, where it demonstrates superior brain tumor segmentation performance. The code is available at https://github.com/xrt11/tumor-segmentation.

Translated: 2024-08-05 13:37:26, Published: 2024-08-02

# Certifiably Robust Encoding Schemes

http://arxiv.org/abs/2408.01200v1

License: see the linked page

Aman Saxena, Tom Wollschläger, Nicola Franco, Jeanette Miriam Lorenz, Stephan Günnemann,

Quantum machine learning uses principles from quantum mechanics to process data, offering potential advances in speed and performance. However, previous work has shown that these models are susceptible to attacks that manipulate input data or exploit noise in quantum circuits. Following this, various studies have explored the robustness of these models, focusing on certifying robustness against manipulations of the quantum states. We extend this line of research by investigating robustness against perturbations in the classical data for a general class of data encoding schemes. We show that for such schemes, adding suitable noise channels is equivalent to evaluating the mean value of the noiseless classifier at the smoothed data, akin to Randomized Smoothing in classical machine learning. Using our general framework, we show that suitable additions of phase-damping noise channels improve empirical and provable robustness for the considered class of encoding schemes.
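
The classical Randomized Smoothing analogy can be made concrete: the smoothed classifier reports the average of a base classifier's output over noisy copies of the input. The Monte Carlo sketch below assumes Gaussian input noise and a hypothetical 1-D threshold classifier; it illustrates only the classical technique, not the paper's quantum phase-damping construction.

```python
import math
import random
from typing import Callable

def smoothed_score(f: Callable[[float], float], x: float, sigma: float = 0.5,
                   n_samples: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[f(x + eps)] with eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += f(x + rng.gauss(0.0, sigma))
    return total / n_samples

def base_classifier(x: float) -> float:
    # Hypothetical base classifier: class 1 iff x > 0, returned as 1.0 / 0.0.
    return 1.0 if x > 0 else 0.0

def closed_form_score(x: float, sigma: float = 0.5) -> float:
    """Exact smoothed score for this threshold classifier: Phi(x / sigma)."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))
```

Because the smoothed score changes gradually with `x` (here as a Gaussian CDF), small input perturbations can only move it by a bounded amount, which is what makes certification possible.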

Translated: 2024-08-05 13:37:26, Published: 2024-08-02

# High-Throughput Phenotyping of Clinical Text Using Large Language Models

http://arxiv.org/abs/2408.01214v1

License: see the linked page

Daniel B. Hier, S. Ilyas Munzir, Anne Stahlfeld, Tayo Obafemi-Ajayi, Michael D. Carrithers,

High-throughput phenotyping automates the mapping of patient signs to standardized ontology concepts and is essential for precision medicine. This study evaluates the automation of phenotyping of clinical summaries from the Online Mendelian Inheritance in Man (OMIM) database using large language models. Due to their rich phenotype data, these summaries can be surrogates for physician notes. We conduct a performance comparison of GPT-4 and GPT-3.5-Turbo. Our results indicate that GPT-4 surpasses GPT-3.5-Turbo in identifying, categorizing, and normalizing signs, achieving concordance with manual annotators comparable to inter-rater agreement. Despite some limitations in sign normalization, the extensive pre-training of GPT-4 results in high performance and generalizability across several phenotyping tasks while obviating the need for manually annotated training data. Large language models are expected to be the dominant method for automating high-throughput phenotyping of clinical text.

Translated: 2024-08-05 13:37:26, Published: 2024-08-02

# ZNorm: Z-Score Gradient Normalization for Accelerating Neural Network Training

http://arxiv.org/abs/2408.01215v1

License: see the linked page

Juyoung Yun, Hoyoung Kim, Suin Cho, Hangil Kang,

The rapid advancements in deep learning necessitate efficient training methods for deep neural networks (DNNs). As models grow in complexity, vanishing and exploding gradients impede convergence and performance. We propose Z-Score Normalization for Gradient Descent (ZNorm), an innovative technique that adjusts only the gradients to enhance training efficiency and improve model performance. ZNorm normalizes the overall gradients, providing consistent gradient scaling across layers, thereby reducing the risks of vanishing and exploding gradients. Our extensive experiments on CIFAR-10 and medical datasets demonstrate that ZNorm not only accelerates convergence but also enhances performance metrics. ZNorm consistently outperforms existing methods, achieving superior results using the same computational settings. In medical imaging applications, ZNorm improves tumor prediction and segmentation performances, underscoring its practical utility. These findings highlight ZNorm's potential as a robust and versatile tool for improving the efficiency and effectiveness of deep neural network training across a wide range of architectures and applications.
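
A minimal sketch of the idea follows. It assumes that ZNorm standardizes each layer's gradient to zero mean and unit standard deviation before an ordinary SGD step; the abstract does not give the exact formula, the epsilon, or whether statistics are computed per layer or globally. Plain Python lists stand in for framework tensors.

```python
import math
from typing import Dict, List

def z_score(values: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize a flat gradient vector: (g - mean) / (std + eps)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / (std + eps) for v in values]

def znorm_step(params: Dict[str, List[float]], grads: Dict[str, List[float]],
               lr: float = 0.01) -> None:
    """Plain SGD update that uses z-scored gradients for every layer."""
    for name, g in grads.items():
        g_hat = z_score(g)
        params[name] = [p - lr * gh for p, gh in zip(params[name], g_hat)]

params = {"layer1": [0.0, 0.0, 0.0]}
znorm_step(params, {"layer1": [1.0, 2.0, 3.0]}, lr=0.1)
```

Under this assumption, a layer whose raw gradients are around 1e-3 and one whose gradients are around 1e3 receive updates of the same scale, which is the mechanism the abstract credits for mitigating vanishing and exploding gradients.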

Translated: 2024-08-05 13:37:26, Published: 2024-08-02

# S2TD-Face: Reconstruct a Detailed 3D Face with Controllable Texture from a Single Sketch

http://arxiv.org/abs/2408.01218v1

License: see the linked page

Zidu Wang, Xiangyu Zhu, Jiang Yu, Tianshuo Zhang, Zhen Lei,

3D textured face reconstruction from sketches, applicable in scenarios such as animation, 3D avatars, artistic design, and missing-person searches, is a highly promising but underdeveloped research topic. On the one hand, the stylistic diversity of sketches means that existing sketch-to-3D-face methods can handle only pose-limited and realistically shaded sketches. On the other hand, texture plays a vital role in representing facial appearance, yet sketches lack this information, so the reconstruction process needs additional texture control. This paper proposes S2TD-Face, a novel method for reconstructing detailed 3D faces with controllable texture from sketches. S2TD-Face introduces a two-stage geometry reconstruction framework that directly reconstructs detailed geometry from the input sketch. To keep the geometry consistent with the sketch's delicate strokes, we propose a novel sketch-to-geometry loss that ensures the reconstruction accurately fits input features such as dimples and wrinkles. Our training strategies do not rely on hard-to-obtain 3D face scans or labor-intensive hand-drawn sketches. Furthermore, S2TD-Face introduces a texture control module that uses text prompts to select the most suitable textures from a library and seamlessly integrates them into the geometry, producing a detailed 3D face with controllable texture. S2TD-Face surpasses existing state-of-the-art methods in extensive quantitative and qualitative experiments. Our project is available at https://github.com/wang-zidu/S2TD-Face.

Translated: 2024-08-05 13:37:26, Published: 2024-08-02

# Rubric-based Learner Modelling via Noisy Gates Bayesian Networks for Computational Thinking Skills Assessment

http://arxiv.org/abs/2408.01221v1

License: see the linked page

Giorgia Adorni, Francesca Mangili, Alberto Piatti, Claudio Bonesana, Alessandro Antonucci,

In modern, personalised education, there is growing interest in developing learners' competencies and assessing them accurately. In previous work, we proposed a procedure for deriving a learner model for automatic skill assessment from a task-specific competence rubric, thus simplifying the implementation of automated assessment tools. However, that approach suffered from two main limitations: (i) the ordering between competencies defined by the assessment rubric was modelled only indirectly; (ii) supplementary skills, which are not under assessment but are necessary for accomplishing the task, were not included in the model. In this work, we address issue (i) by introducing dummy observed nodes that strictly enforce the skill ordering without changing the network's structure. For issue (ii), we design a network with two layers of gates, one performing disjunctive operations through noisy-OR gates and the other performing conjunctive operations through logical ANDs. These changes improve the coherence of the model's outcomes and the flexibility of the modelling tool without compromising the model's compact parametrisation, interpretability, or simple elicitation from experts. We used this approach to develop a learner model for Computational Thinking (CT) skills assessment. The CT-cube skills assessment framework and the Cross Array Task (CAT) are used to exemplify the approach and demonstrate its feasibility.
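
The two gate types have simple closed forms. Under the standard noisy-OR parametrisation (an assumption here; the paper may add a leak term or parametrise differently), the child node is true with probability one minus the product of the inhibition probabilities of its active parents, while a logical AND is true only when every parent is. The parent names and link probabilities below are hypothetical.

```python
from typing import Dict

def noisy_or(active: Dict[str, bool], link_prob: Dict[str, float], leak: float = 0.0) -> float:
    """P(child = true | parent states) for a noisy-OR gate with an optional leak."""
    p_all_inhibited = 1.0 - leak
    for parent, is_on in active.items():
        if is_on:
            # Each active parent independently fails to switch the child on.
            p_all_inhibited *= 1.0 - link_prob[parent]
    return 1.0 - p_all_inhibited

def logical_and(active: Dict[str, bool]) -> bool:
    """Deterministic conjunction: true only if every parent is true."""
    return all(active.values())

# Hypothetical parents: two alternative skills that can each solve a task step.
link_prob = {"skill_a": 0.8, "skill_b": 0.6}
```

Each noisy-OR parent needs only one parameter, which is why such gates keep the parametrisation compact and easy to elicit from experts.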

Translated: 2024-08-05 13:37:26, Published: 2024-08-02

# Multi-head Spatial-Spectral Mamba for Hyperspectral Image Classification

http://arxiv.org/abs/2408.01224v1

License: see the linked page

Muhammad Ahmad, Muhammad Hassaan Farooq Butt, Muhammad Usama, Hamad Ahmed Altuwaijri, Manual Mazzara, Salvatore Distenano,

Spatial-Spectral Mamba (SSM) improves computational efficiency and captures long-range dependencies, addressing limitations of Transformers. However, traditional Mamba models overlook the rich spectral information in hyperspectral images (HSIs) and struggle with high dimensionality and sequential data. To address these issues, we propose SSM with multi-head self-attention and token enhancement (MHSSMamba). This model integrates spectral and spatial information by enhancing spectral tokens and using multi-head attention to capture complex relationships between spectral bands and spatial locations. It also handles long-range dependencies and the sequential nature of HSI data, preserving contextual information across spectral bands. MHSSMamba achieved classification accuracies of 97.62% on Pavia University, 96.92% on University of Houston, 96.85% on Salinas, and 99.49% on the Wuhan-longKou dataset.

Translated: 2024-08-05 13:37:26, Published: 2024-08-02

# The Phantom Menace: Unmasking Privacy Leakages in Vision-Language Models

http://arxiv.org/abs/2408.01228v1

License: see the linked page

Simone Caldarella, Massimiliano Mancini, Elisa Ricci, Rahaf Aljundi,

Vision-Language Models (VLMs) combine visual and textual understanding, making them well suited for diverse tasks such as generating image captions and answering visual questions across various domains. However, these capabilities are built on training with large amounts of uncurated data crawled from the web. Such data may include sensitive information that VLMs could memorize and leak, raising significant privacy concerns. In this paper, we assess whether these vulnerabilities exist, focusing on identity leakage. Our study leads to three key findings: (i) VLMs leak identity information even when the vision-language alignment and the fine-tuning use anonymized data; (ii) context has little influence on identity leakage; (iii) simple, widely used anonymization techniques, such as blurring, are not sufficient to address the problem. These findings underscore the urgent need for robust privacy protection strategies when deploying VLMs. Ethical awareness and responsible development practices are essential to mitigate these risks.

Translated: 2024-08-05 13:37:26, Published: 2024-08-02

# HeteroMorpheus: Universal Control Based on Morphological Heterogeneity Modeling

http://arxiv.org/abs/2408.01230v1

License: see the linked page

YiFan Hao, Yang Yang, Junru Song, Wei Peng, Weien Zhou, Tingsong Jiang, Wen Yao,

In robotic control, designing an individual controller for each robot leads to high computational costs. Universal control policies applicable across diverse robot morphologies promise to mitigate this challenge. Models based on Graph Neural Networks (GNNs) and Transformers predominate, owing to their effectiveness in capturing relational dynamics across a robot's limbs. However, these models typically use homogeneous graph structures that overlook the functional diversity of different limbs. To bridge this gap, we introduce HeteroMorpheus, a novel method based on a heterogeneous graph Transformer. This method explicitly addresses limb heterogeneity, yielding better representations of the dynamics of robots with various morphologies. Through extensive experiments, we demonstrate that HeteroMorpheus outperforms state-of-the-art methods in policy generalization, including zero-shot generalization and sample-efficient transfer to unfamiliar robot morphologies.

Translated: 2024-08-05 13:37:26, Published: 2024-08-02

# WaveMamba: Spatial-Spectral Wavelet Mamba for Hyperspectral Image Classification

http://arxiv.org/abs/2408.01231v1

License: see the linked page

Muhammad Ahmad, Muhammad Usama, Manual Mazzara,

Hyperspectral Imaging (HSI) has proven to be a powerful tool for capturing detailed spectral and spatial information across diverse applications. Despite advances in Deep Learning (DL) and Transformer architectures for HSI Classification (HSIC), challenges such as computational efficiency and the need for extensive labeled data persist. This paper introduces WaveMamba, a novel approach that integrates the wavelet transform with the Spatial-Spectral Mamba architecture to enhance HSIC. WaveMamba captures both local texture patterns and global contextual relationships in an end-to-end trainable model. The wavelet-enhanced features are then processed through the state-space architecture to model spatial-spectral relationships and temporal dependencies. The experimental results indicate that WaveMamba surpasses existing models, improving accuracy by 4.5% on the University of Houston dataset and by 2.0% on the Pavia University dataset. These findings validate its effectiveness in addressing the complex data interactions inherent in HSIs.
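
The abstract does not state which wavelet WaveMamba uses. As a generic illustration of what a wavelet transform contributes, a single-level orthonormal Haar transform splits a 1-D signal (for example, one pixel's spectrum) into a low-frequency approximation and high-frequency details, and it is exactly invertible. The spectrum values below are hypothetical.

```python
import math
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)

def haar_dwt(signal: List[float]) -> Tuple[List[float], List[float]]:
    """Single-level orthonormal Haar DWT; the signal length must be even."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx: List[float], detail: List[float]) -> List[float]:
    """Invert haar_dwt by recombining each (approximation, detail) pair."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out

spectrum = [4.0, 4.0, 2.0, 0.0]  # hypothetical 4-band spectrum
approx, detail = haar_dwt(spectrum)
```

The detail coefficients are zero wherever neighbouring bands are equal, so they isolate local changes (texture), while the approximation keeps the smooth trend.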

Translated: 2024-08-05 13:27:42, Published: 2024-08-02

# CLIP4Sketch: Enhancing Sketch to Mugshot Matching through Dataset Augmentation using Diffusion Models

http://arxiv.org/abs/2408.01233v1

License: see the linked page

Kushal Kumar Jain, Steve Grosz, Anoop M. Namboodiri, Anil K. Jain,

Forensic sketch-to-mugshot matching is a challenging task in face recognition, hindered primarily by the scarcity of annotated forensic sketches and the modality gap between sketches and photographs. To address this, we propose CLIP4Sketch, a novel approach that leverages diffusion models to generate a large and diverse set of sketch images, which improves the performance of face recognition systems in sketch-to-mugshot matching. Our method uses Denoising Diffusion Probabilistic Models (DDPMs) to generate sketches with explicit control over identity and style. We condition the diffusion model on the CLIP and AdaFace embeddings of a reference mugshot, together with textual descriptions of style. We demonstrate the efficacy of our approach by generating a comprehensive dataset of sketches corresponding to mugshots and training a face recognition model on this synthetic data. Our results show significant improvements in sketch-to-mugshot matching accuracy over training on the limited real face-sketch data that exists, validating the potential of diffusion models to improve face recognition across modalities. We also compare our dataset with datasets generated using GAN-based methods to show its superiority.

Translated: 2024-08-05 13:27:42, Published: 2024-08-02

# Entanglement Routing in Quantum Networks: A Comprehensive Survey

http://arxiv.org/abs/2408.01234v1

License: see the linked page

Amar Abane, Michael Cubeddu, Van Sy Mai, Abdella Battou,

Entanglement routing in near-term quantum networks consists of choosing the optimal sequence of short-range entanglements to combine through swapping operations so as to establish end-to-end entanglement between two distant nodes. As with traditional routing technologies, a quantum routing protocol uses network information to choose the best paths to satisfy a set of end-to-end entanglement requests. However, in addition to network state information, a quantum routing protocol must also account for the requested entanglement fidelity, the probabilistic nature of swapping operations, and the short lifetime of entangled states. In this work, we formulate a practical entanglement routing problem, analyze and categorize the main approaches to address it, and draw comparisons to, and inspiration from, classical network routing strategies where applicable. We classify the studied quantum routing schemes as reactive, proactive, opportunistic, or virtual routing and discuss each category.
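
A toy version of path selection illustrates why classical shortest-path machinery carries over. Assume (hypothetically) that each link establishes entanglement with a fixed probability and that every intermediate swap succeeds with probability `swap_prob`; this ignores decoherence and fidelity, which the survey stresses also matter. A path's end-to-end success probability is then the product of its link probabilities times `swap_prob` raised to the number of swaps, and maximizing that product is a shortest-path problem on `-log` weights, solved here with Dijkstra's algorithm.

```python
import heapq
import math
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: link success probability}

def best_entanglement_path(graph: Graph, src: str, dst: str,
                           swap_prob: float) -> Tuple[List[str], float]:
    """Path maximizing (product of link probs) * swap_prob ** (hops - 1)."""
    # Each hop costs -log(p_link * swap_prob); the one surplus swap factor is the
    # same for every path, so it does not change which path is best.
    dist = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == dst:
            break
        for nbr, p in graph[node].items():
            nd = d - math.log(p * swap_prob)  # non-negative since p * swap_prob <= 1
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if dst not in visited:
        return [], 0.0
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    prob = math.prod(graph[a][b] for a, b in zip(path, path[1:]))
    prob *= swap_prob ** (len(path) - 2) if len(path) > 1 else 1.0
    return path, prob

# Hypothetical topology: a short path with weaker links, a longer one with strong links.
links = {("A", "B"): 0.9, ("B", "D"): 0.9, ("A", "C"): 0.99,
         ("C", "E"): 0.99, ("E", "D"): 0.99, ("A", "D"): 0.3}
graph: Graph = {}
for (u, v), p in links.items():
    graph.setdefault(u, {})[v] = p
    graph.setdefault(v, {})[u] = p
```

With unreliable swaps (`swap_prob=0.5`) the two-hop path A-B-D wins, while with perfect swaps the three-hop path A-C-E-D is preferred. This is one reason quantum routing metrics differ from hop count or link quality alone.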

Translated: 2024-08-05 13:27:42, Published: 2024-08-02

# Tailoring Graph Neural Network-based Flow-guided Localization to Individual Bloodstreams and Activities

http://arxiv.org/abs/2408.01239v1

License: see the linked page

Pablo Galván, Filip Lemic, Gerard Calvo Bartra, Sergi Abadal, Xavier Costa Pérez,

Flow-guided localization using in-body nanodevices in the bloodstream is expected to benefit early disease detection, continuous monitoring of biological conditions, and targeted treatment. The nanodevices face size and power constraints that produce erroneous raw data for localization purposes. On-body anchors receive this data and use it to derive the locations of diagnostic events of interest. Different Machine Learning (ML) approaches have recently been proposed for this task, yet they are currently restricted to the reference bloodstream of a resting patient. As such, they cannot deal with the physical diversity of patients' bloodstreams, nor can they provide continuous monitoring as individual patients' activities change. To address these issues for the current State-of-the-Art (SotA) flow-guided localization approach based on Graph Neural Networks (GNNs), we propose a pipeline for GNN adaptation based on individual physiological indicators, including height, weight, and heart rate. Our results indicate that the proposed adaptations are beneficial in reconciling the individual differences between bloodstreams and activities.

Translated: 2024-08-05 13:27:42, Published: 2024-08-02

# Exchange control in a MOS double quantum dot made using a 300 mm wafer process

http://arxiv.org/abs/2408.01241v1

License: see the linked page

Jacob F. Chittock-Wood, Ross C. C. Leon, Michael A. Fogarty, Tara Murphy, Sofia M. Patomäki, Giovanni A. Oakes, Felix-Ekkehard von Horstig, Nathan Johnson, Julien Jussot, Stefan Kubicek, Bogdan Govoreanu, David F. Wise, M. Fernando Gonzalez-Zalba, John J. L. Morton,

Leveraging the advanced manufacturing capabilities of the semiconductor industry promises to help scale up silicon-based quantum processors by increasing yield, uniformity, and integration. Recent studies of quantum dots fabricated on 300 mm wafer metal-oxide-semiconductor (MOS) processes have shown control and readout of individual spin qubits, yet quantum processors require two-qubit interactions to operate. Here, we use a 300 mm wafer MOS process customized for spin qubits and demonstrate coherent control of two electron spins using the spin-spin exchange interaction, which forms the basis of entangling gates such as $\sqrt{\text{SWAP}}$. We observe gate dephasing times of up to $T_2^{*}\approx500$ ns and a gate quality factor of 10. We further extend the coherence by up to an order of magnitude using an echo sequence. For readout, we introduce a dispersive readout technique, the radiofrequency electron cascade, that amplifies the signal while retaining the spin-projective nature of dispersive measurements. Our results demonstrate an industrial-grade platform for two-qubit operations, alongside integration with dispersive sensing techniques.
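
For reference, the textbook matrix of the $\sqrt{\text{SWAP}}$ gate in the two-qubit basis order |00>, |01>, |10>, |11> is written out below in plain Python. Squaring it reproduces SWAP, which is why half of a full exchange operation acts as an entangling gate. This checks only the standard definition and says nothing about the device's actual pulse sequence.

```python
from typing import List

Matrix = List[List[complex]]

SQRT_SWAP: Matrix = [
    [1, 0, 0, 0],
    [0, (1 + 1j) / 2, (1 - 1j) / 2, 0],
    [0, (1 - 1j) / 2, (1 + 1j) / 2, 0],
    [0, 0, 0, 1],
]

SWAP: Matrix = [
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def close(a: Matrix, b: Matrix, tol: float = 1e-12) -> bool:
    """Element-wise comparison within an absolute tolerance."""
    return all(abs(x - y) < tol for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

squared = matmul(SQRT_SWAP, SQRT_SWAP)
```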

Translated: 2024-08-05 13:27:42, Published: 2024-08-02

# Automated Classification of Dry Bean Varieties Using XGBoost and SVM Models

http://arxiv.org/abs/2408.01244v1

License: see the linked page

Ramtin Ardeshirifar,

This paper presents a comparative study on the automated classification of seven different varieties of dry beans using machine learning models. Leveraging a dataset of 12,909 dry bean samples, reduced from an initial 13,611 through outlier removal and feature extraction, we applied Principal Component Analysis (PCA) for dimensionality reduction and trained two multiclass classifiers: XGBoost and Support Vector Machine (SVM). The models were evaluated using nested cross-validation to ensure robust performance assessment and hyperparameter tuning. The XGBoost and SVM models achieved overall correct classification rates of 94.00% and 94.39%, respectively. The results underscore the efficacy of these machine learning approaches in agricultural applications, particularly in enhancing the uniformity and efficiency of seed classification. This study contributes to the growing body of work on precision agriculture, demonstrating that automated systems can significantly support seed quality control and crop yield optimization. Future work will explore incorporating more diverse datasets and advanced algorithms to further improve classification accuracy.

Translated: 2024-08-05 13:27:42, Published: 2024-08-02

# MapComp: A Secure View-based Collaborative Analytics Framework for Join-Group-Aggregation

http://arxiv.org/abs/2408.01246v1

License: see the linked page

Xinyu Peng, Feng Han, Li Peng, Weiran Liu, Zheng Yan, Kai Kang, Xinyuan Zhang, Guoxing Wei, Jianling Sun, Jinfei Liu,

This paper introduces MapComp, a novel view-based framework that facilitates join-group-aggregation (JGA) queries for collaborative analytics. Through a specially crafted materialized view for the join and a novel design of group-aggregation (GA) protocols, MapComp removes duplicated join workload and expedites the subsequent GA, improving the efficiency of JGA query execution. To support continuous data updates, our materialized view offers payload independence, which significantly improves the efficiency of view refreshing while remaining free of secure multi-party computation (MPC) overhead. This feature also allows further acceleration of GA, for which we devised multiple novel protocols that outperform prior work. Notably, our work is the first to expedite secure collaborative JGA queries using materialized views. Our experiments demonstrate a significant advantage of MapComp, which achieves up to a 2189.9x efficiency improvement over the non-view-based baseline when executing queries eight times.
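
To make the query class concrete, this is what a join-group-aggregation query computes in the clear: join two parties' tables on a key, group the joined rows by an attribute, and aggregate a value per group. MapComp's contribution is evaluating such queries securely under MPC and reusing a materialized join view across queries. The plaintext sketch below shows only the query semantics and the idea of computing the join once; the table contents and column meanings are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def build_join_view(left: List[Tuple[str, float]], right: Dict[str, str]) -> List[Tuple[str, float]]:
    """Inner-join party A's (key, value) rows with party B's key -> group map, once."""
    return [(right[key], value) for key, value in left if key in right]

def group_sum(view: List[Tuple[str, float]]) -> Dict[str, float]:
    """SUM(value) GROUP BY group over the joined view."""
    totals: Dict[str, float] = defaultdict(float)
    for group, value in view:
        totals[group] += value
    return dict(totals)

def group_count(view: List[Tuple[str, float]]) -> Dict[str, int]:
    """COUNT(*) GROUP BY group over the joined view."""
    counts: Dict[str, int] = defaultdict(int)
    for group, _ in view:
        counts[group] += 1
    return dict(counts)

orders = [("u1", 10.0), ("u2", 5.0), ("u1", 2.5), ("u9", 7.0)]  # party A: (user_id, amount)
regions = {"u1": "EU", "u2": "US", "u3": "EU"}                    # party B: user_id -> region
view = build_join_view(orders, regions)  # computed once, reused by both aggregations
```

In the secure setting the join is the expensive step, so materializing it once and running several group-aggregations over the same view is where the reported savings come from.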

翻訳日:2024-08-05 13:27:42 公開日:2024-08-02

# IRSおよびUAV支援MECシステムのための深層プログレッシブ強化学習に基づく柔軟なリソーススケジューリングフレームワーク

Deep progressive reinforcement learning-based flexible resource scheduling framework for IRS and UAV-assisted MEC system ( http://arxiv.org/abs/2408.01248v1 )

ライセンス: Link先を確認

Li Dong, Feibo Jiang, Minjie Wang, Yubo Peng, Xiaolong Li,

(参考訳) インテリジェント反射面(IRS)と無人航空機(UAV)を用いたモバイルエッジコンピューティング(MEC)システムは、一時的なシナリオや緊急時のシナリオで広く利用されている。我々の目標は、UAVの数が可変である条件の下で、UAVの位置、IRSの位相シフト、タスクオフロード、リソース割り当てを共同最適化することにより、MECシステムのエネルギー消費を最小化することである。この目的のために、新しい深層プログレッシブ強化学習を用いた柔軟なリソーススケジューリング(Flexible REsource Scheduling, FRES)フレームワークを提案する。その革新点は次のとおりである。第一に、混合整数非線形計画(MINLP)問題に対処するため、新しいマルチタスクエージェントを提示する。このマルチタスクエージェントは異なるタスク向けに設計された2つの出力ヘッドを持ち、分類ヘッドは整数変数によるオフロード決定を行い、フィッティングヘッドは連続変数によるリソース割り当てを解く。第二に、エージェント内のニューロンの一部を段階的に調整することで、エージェントを可変数のUAVに適応させるプログレッシブスケジューラを導入する。この構造は自然に経験を蓄積でき、破滅的忘却の影響を受けない。最後に、FRESの大域的探索を強化するために軽量タブー探索(LTS)を導入する。数値結果は、動的なMECシステムにおいてもリアルタイムかつ最適なリソーススケジューリングを実現できるFRESフレームワークの優位性を示している。

The intelligent reflection surface (IRS) and unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) system is widely used in temporary and emergency scenarios. Our goal is to minimize the energy consumption of the MEC system by jointly optimizing UAV locations, IRS phase shift, task offloading, and resource allocation with a variable number of UAVs. To this end, we propose a Flexible REsource Scheduling (FRES) framework by employing a novel deep progressive reinforcement learning which includes the following innovations: Firstly, a novel multi-task agent is presented to deal with the mixed integer nonlinear programming (MINLP) problem. The multi-task agent has two output heads designed for different tasks, in which a classified head is employed to make offloading decisions with integer variables while a fitting head is applied to solve resource allocation with continuous variables. Secondly, a progressive scheduler is introduced to adapt the agent to the varying number of UAVs by progressively adjusting a part of neurons in the agent. This structure can naturally accumulate experiences and be immune to catastrophic forgetting. Finally, a light taboo search (LTS) is introduced to enhance the global search of the FRES. The numerical results demonstrate the superiority of the FRES framework which can make real-time and optimal resource scheduling even in dynamic MEC systems.
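The sketch below makes the two-head design concrete by decoding a hypothetical agent output. The classification head yields one integer offloading target per task via argmax, and the fitting head yields continuous resource shares. Normalizing the fitting-head outputs with a softmax scaled to a budget is our own assumption for illustration; the paper does not specify this decoding, and no network or training is shown.

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def decode_action(offload_logits, alloc_outputs, budget):
    """Turn raw head outputs into a mixed integer/continuous action (hypothetical).

    offload_logits: per task, one logit per discrete target (e.g. local, UAV 1, UAV 2).
    alloc_outputs:  per task, one real value from the fitting head.
    """
    # Classification head -> integer decision variables (argmax per task).
    decisions = [max(range(len(row)), key=row.__getitem__) for row in offload_logits]
    # Fitting head -> continuous shares of a resource budget (assumed normalization).
    allocation = [budget * w for w in softmax(alloc_outputs)]
    return decisions, allocation

decisions, allocation = decode_action(
    [[0.1, 2.0, -1.0], [3.0, 0.0, 0.5]], [0.0, 0.0], budget=10.0)
print(decisions, allocation)   # [1, 0] [5.0, 5.0]
```

Splitting the output this way keeps the integer part exact (argmax), while the continuous part is guaranteed to respect the budget constraint by construction.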

翻訳日:2024-08-05 13:27:42 公開日:2024-08-02

# 不確実な環境におけるメタ推論:メタBAMDPフレームワーク

Metareasoning in uncertain environments: a meta-BAMDP framework ( http://arxiv.org/abs/2408.01253v1 )

ライセンス: Link先を確認

Prakhar Godara, Tilman Diego Aléman, Angela J. Yu,

(参考訳) 意思決定のシナリオにおいて、「推論」は、マルコフ決定過程(MDP)の価値関数の最大化などの結果の最適化を目指して行動 $a^* \in \mathcal{A}$ を選択するアルゴリズム $P$ とみなすことができる。しかし、$P$ の実行自体にもコスト(時間、エネルギー、限られた容量など)がかかる可能性があり、根底にある意思決定問題で選択を行うことで得られる明示的な効用と併せて考慮する必要がある。すべての物理システムはリソース制約に直面するため、人間の行動を正確にモデル化するためにも、AIの計画を最適化するためにも、このようなコストを考慮する必要がある。適切な $P$ を見つけること自体を、推論プロセス $P$ の空間上の最適化問題として定式化でき、これは一般に「メタ推論」と呼ばれる。従来、人間のメタ推論モデルは、エージェントが基礎となるMDPの遷移分布と報酬分布を知っていると仮定してきた。本稿では、報酬・遷移分布が未知の環境におけるメタ推論を扱うメタベイズ適応MDP(meta-BAMDP)フレームワークを提案することで、このようなモデルを一般化する。これは、人間やAIシステムが直面する、はるかに大規模で現実的な計画問題の集合を包含する。最初のステップとして、人間の意思決定の研究によく用いられてきた2本腕ベルヌーイバンディット(TABB)タスクにこのフレームワークを適用する。メタ問題の複雑さのため我々の解は近似的にならざるを得ないが、人間の意思決定シナリオにとって現実的といえる仮定の範囲内では頑健である。これらの結果は、認知的制約下における人間の探索を理解するための規範的な枠組みを提供する。ベイズ適応戦略とメタ推論のこの統合は、意思決定研究の理論的展望と、不確実性とリソース制約の下で計画を行うAIシステムの設計という実践的応用の両方を豊かにする。

In decision-making scenarios, \textit{reasoning} can be viewed as an algorithm $P$ that makes a choice of an action $a^* \in \mathcal{A}$, aiming to optimize some outcome such as maximizing the value function of a Markov decision process (MDP). However, executing $P$ itself may bear some costs (time, energy, limited capacity, etc.) and needs to be considered alongside explicit utility obtained by making the choice in the underlying decision problem. Such costs need to be taken into account in order to accurately model human behavior, as well as optimizing AI planning, as all physical systems are bound to face resource constraints. Finding the right $P$ can itself be framed as an optimization problem over the space of reasoning processes $P$, generally referred to as \textit{metareasoning}. Conventionally, human metareasoning models assume that the agent knows the transition and reward distributions of the underlying MDP. This paper generalizes such models by proposing a meta Bayes-Adaptive MDP (meta-BAMDP) framework to handle metareasoning in environments with unknown reward/transition distributions, which encompasses a far larger and more realistic set of planning problems that humans and AI systems face. As a first step, we apply the framework to two-armed Bernoulli bandit (TABB) tasks, which have often been used to study human decision making. Owing to the meta problem's complexity, our solutions are necessarily approximate, but nevertheless robust within a range of assumptions that are arguably realistic for human decision-making scenarios. These results offer a normative framework for understanding human exploration under cognitive constraints. This integration of Bayesian adaptive strategies with metareasoning enriches both the theoretical landscape of decision-making research and practical applications in designing AI systems that plan under uncertainty and resource constraints.
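The two-armed Bernoulli bandit shows the Bayes-adaptive part of the framework in its simplest form. With independent Beta(1, 1) priors, the belief state is just the success/failure counts of each arm. The Bayes-optimal value over a short horizon then follows from exact dynamic programming over those counts. This is a minimal sketch under stated assumptions. It computes only this object-level BAMDP value and does not model the computational costs that the meta level of the meta-BAMDP adds.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def bayes_value(counts, horizon):
    """Bayes-optimal expected total reward for a two-armed Bernoulli bandit.

    counts = (s1, f1, s2, f2): successes/failures observed on each arm, which is
    the complete belief state under independent uniform Beta(1, 1) priors.
    """
    if horizon == 0:
        return 0.0
    best = 0.0
    for arm in (0, 1):
        s, f = counts[2 * arm], counts[2 * arm + 1]
        p = (s + 1) / (s + f + 2)            # posterior mean success probability
        win = list(counts)
        win[2 * arm] += 1
        lose = list(counts)
        lose[2 * arm + 1] += 1
        # Expected immediate reward plus value of the updated belief state.
        q = (p * (1 + bayes_value(tuple(win), horizon - 1))
             + (1 - p) * bayes_value(tuple(lose), horizon - 1))
        best = max(best, q)
    return best

print(bayes_value((0, 0, 0, 0), 1))   # 0.5
print(bayes_value((0, 0, 0, 0), 2))   # about 1.0833 (= 13/12)
```

The state space grows only polynomially in the horizon here, but adding a per-step cost of deliberation, as the meta level does, is what makes the full problem hard enough to require approximation.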

翻訳日:2024-08-05 13:27:42 公開日:2024-08-02

# TrIM:畳み込みニューラルネットワークのための三角入力移動シストリックアレイ-その1:データフローと解析モデリング

TrIM: Triangular Input Movement Systolic Array for Convolutional Neural Networks -- Part I: Dataflow and Analytical Modelling ( http://arxiv.org/abs/2408.01254v1 )

ライセンス: Link先を確認

Cristian Sestito, Shady Agwa, Themis Prodromakis,

(参考訳) 最先端AIモデルの増大し続ける計算複雑性とデータ量に追従するため、新しい計算パラダイムが提案されている。これらのパラダイムは、処理コアとメモリの間のデータ移動に伴うエネルギーコストに関わるフォン・ノイマン・ボトルネックを緩和することで、高いエネルギー効率の達成を目指している。畳み込みニューラルネットワーク(CNN)は、扱わなければならないデータが膨大であるため、このボトルネックの影響を特に受けやすい。シストリックアレイ(SA)は、処理要素(PE)の配列によって実現される高いデータ利用率により、データ転送コストを軽減できる有望なアーキテクチャである。これらのPEは、特定のデータフロー(重み固定(weight stationary)や行固定(row stationary)など)に基づいて局所的にデータを絶えず交換・処理し、その結果メインメモリへのアクセス回数を削減する。SAのハードウェア特化により、行列乗算から多次元畳み込みまで、さまざまなワークロードに対応できる。本稿では、三角入力移動(Triangular Input Movement)に基づき、CNNの計算に適合するSA向けの新しいデータフローTrIMを提案する。重み固定や行固定のような最先端のSAデータフローと比較して、TrIMが提供する高いデータ利用率はメモリアクセスを約10分の1に削減する。さらに、PEが乗算と累算を絶えずオーバーラップさせることから、TrIMは限られた数のレジスタ(行固定より最大15.6分の1)しか必要とせずに、高いスループット(行固定より最大81.8%高い)を達成する。

In order to follow the ever-growing computational complexity and data intensity of state-of-the-art AI models, new computing paradigms are being proposed. These paradigms aim at achieving high energy efficiency, by mitigating the Von Neumann bottleneck that relates to the energy cost of moving data between the processing cores and the memory. Convolutional Neural Networks (CNNs) are particularly susceptible to this bottleneck, given the massive data they have to manage. Systolic Arrays (SAs) are promising architectures to mitigate the data transmission cost, thanks to high data utilization carried out by an array of Processing Elements (PEs). These PEs continuously exchange and process data locally based on specific dataflows (like weight stationary and row stationary), in turn reducing the number of memory accesses to the main memory. The hardware specialization of SAs can meet different workloads, ranging from matrix multiplications to multi-dimensional convolutions. In this paper, we propose TrIM: a novel dataflow for SAs based on a Triangular Input Movement and compatible with CNN computing. When compared to state-of-the-art SA dataflows, like weight stationary and row stationary, the high data utilization offered by TrIM guarantees ~10x less memory access. Furthermore, considering that PEs continuously overlap multiplications and accumulations, TrIM achieves high throughput (up to 81.8% higher than row stationary), while requiring only a limited number of registers (up to 15.6x fewer registers than row stationary).

翻訳日:2024-08-05 13:27:42 公開日:2024-08-02

# SeCritMass:しきい値付き秘密請願

SeCritMass: Threshold Secret Petitions ( http://arxiv.org/abs/2408.01255v1 )

ライセンス: Link先を確認

Florian Breuer,

(参考訳) 我々は、$n$-しきい値秘密請願という概念を導入する。これは、ユーザが請願に暗号化された署名を追加し、少なくとも $n$ 件の署名が集まった場合に限り署名が復号されるものである。これにより、ユーザが請願への署名や大義への賛同を望みながらも、十分な数の他者が署名する前に署名者として特定されたくないという調整問題が解決される。本稿では、ElGamal暗号系に基づくこのような請願の実装を示す。応用例としては、セクシャルハラスメントや警察による暴力の告発など、申立人が単独で名乗り出ることをためらう状況での不正行為の報告が挙げられる。

We introduce the notion of an $n$-threshold secret petition, in which users add encrypted signatures to a petition, and the signatures are decrypted if and only if at least $n$ signatures have been gathered. This solves the coordination problem in which users wish to sign a petition or commit to a cause, but do not want to be identified as having signed it before enough others have signed it too. We present an implementation of such a petition based on the ElGamal cryptosystem. Applications include reporting misconduct in situations where complainants hesitate to come forward alone, such as in allegations of sexual harassment or police brutality.
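One simple way to get "decrypt only after n signatures" behaviour combines textbook ElGamal with Shamir secret sharing of the decryption key. Each signer submits an encrypted signature together with one key share. Once n shares are in, the key is rebuilt and every signature decrypts. This is a hypothetical toy, not the paper's construction. It assumes a trusted dealer who generates and splits the key, uses deliberately tiny, insecure parameters (P = 2039 = 2·1019 + 1), and encodes signatures as small integers.

```python
import random

# Toy group (insecure, illustration only): safe prime P = 2Q + 1; G = 4 has prime order Q.
P, Q, G = 2039, 1019, 4

def shamir_shares(secret, threshold, n_shares, rng):
    """Split `secret` (an element of Z_Q) so that any `threshold` shares recover it."""
    coeffs = [secret] + [rng.randrange(Q) for _ in range(threshold - 1)]
    return [(x, sum(c * pow(x, k, Q) for k, c in enumerate(coeffs)) % Q)
            for x in range(1, n_shares + 1)]

def shamir_recover(shares):
    """Lagrange interpolation of the sharing polynomial at x = 0 over Z_Q."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % Q
                den = den * (xi - xj) % Q
        secret = (secret + yi * num * pow(den, -1, Q)) % Q
    return secret

def encrypt(m, h, rng):
    """Textbook ElGamal encryption of 0 < m < P under public key h = G^x mod P."""
    k = rng.randrange(1, Q)
    return pow(G, k, P), m * pow(h, k, P) % P

def decrypt(ciphertext, x):
    """Recover m = c2 / c1^x mod P."""
    c1, c2 = ciphertext
    return c2 * pow(pow(c1, x, P), -1, P) % P

rng = random.Random(1)
x = rng.randrange(1, Q)              # dealer's secret key (trusted-dealer assumption)
h = pow(G, x, P)                     # public key published with the petition
shares = shamir_shares(x, threshold=3, n_shares=5, rng=rng)

signatures = [101, 202, 303]         # hypothetical signature encodings
ciphertexts = [encrypt(s, h, rng) for s in signatures]
# Each signer hands in one share; after 3 signers the key can be rebuilt.
key = shamir_recover(shares[:3])
print([decrypt(c, key) for c in ciphertexts])   # [101, 202, 303]
```

With fewer than the threshold number of shares, the Lagrange interpolation is underdetermined, which is what keeps early signatures sealed in this toy.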

翻訳日:2024-08-05 13:27:42 公開日:2024-08-02

# 協調的オンライン行動の検出と特徴付け:サーベイ

Detection and Characterization of Coordinated Online Behavior: A Survey ( http://arxiv.org/abs/2408.01257v1 )

ライセンス: Link先を確認

Lorenzo Mannocci, Michele Mazza, Anna Monreale, Maurizio Tesconi, Stefano Cresci,

(参考訳) 協調は生活の基本的な側面である。ソーシャルメディアの出現により、協調は、活発なオンラインコミュニティや社会運動を特徴づけるような、オンライン上の人間同士の交流にも不可欠なものとなった。同時に、協調は効果的な偽情報、世論操作、ヘイトキャンペーンの中核でもある。本サーベイは、協調的オンライン行動への関心の高まりの結果として生み出された研究成果を収集・分類し、批判的に議論する。我々は産業界と学術界の定義を調和させ、協調的オンライン行動を研究するための包括的な枠組みを提案し、既存の検出手法と特徴付け手法をレビューして批判的に議論する。本分析では未解決の課題と有望な研究の方向性を特定し、オンライン協調に固有の複雑さを理解し対処するための、研究者、実務家、政策立案者向けの指針を提供する。

Coordination is a fundamental aspect of life. The advent of social media has made it integral also to online human interactions, such as those that characterize thriving online communities and social movements. At the same time, coordination is also core to effective disinformation, manipulation, and hate campaigns. This survey collects, categorizes, and critically discusses the body of work produced as a result of the growing interest in coordinated online behavior. We reconcile industry and academic definitions, propose a comprehensive framework to study coordinated online behavior, and review and critically discuss the existing detection and characterization methods. Our analysis identifies open challenges and promising directions of research, serving as a guide for scholars, practitioners, and policymakers in understanding and addressing the complexities inherent to online coordination.

翻訳日:2024-08-05 13:27:42 公開日:2024-08-02

# エンタングルメントエントロピーのQCD発展

QCD evolution of entanglement entropy ( http://arxiv.org/abs/2408.01259v1 )

ライセンス: Link先を確認

Martin Hentschinski, Dmitri E. Kharzeev, Krzysztof Kutak, Zhoudunming Tu,

(参考訳) エンタングルメントエントロピーは、陽子におけるカラーの閉じ込めのような非摂動的量子色力学(QCD)現象を探る新しいツールとして登場した。近年の研究では、深非弾性散乱におけるハドロン生成の記述において重要な能力が示されているが、エンタングルメントエントロピーのQCD発展は未解明のままである。本研究では、QCD発展の解明を目指し、陽子内におけるラピディティ依存の微分エンタングルメントエントロピーと、それと終状態ハドロンとの関係を調べる。解析の結果、QCD発展方程式から得られたフォン・ノイマンエントロピーのラピディティ依存性と、対応するハドロンエントロピーの実験データとの間に強い一致が見られた。これらの発見は、最大エンタングル状態の出現を示す説得力のある証拠であり、陽子の非摂動的構造に関する新たな知見を与える。

Entanglement entropy has emerged as a novel tool for probing nonperturbative quantum chromodynamics (QCD) phenomena, such as color confinement in protons. While recent studies have demonstrated its significant capability in describing hadron production in deep inelastic scatterings, the QCD evolution of entanglement entropy remains unexplored. In this work, we investigate the differential rapidity-dependent entanglement entropy within the proton and its connection to final-state hadrons, aiming to elucidate its QCD evolution. Our analysis reveals a strong agreement between the rapidity dependence of von Neumann entropy, obtained from QCD evolution equations, and the corresponding experimental data on hadron entropy. These findings provide compelling evidence for the emergence of a maximally entangled state, offering new insights into the nonperturbative structure of protons.

翻訳日:2024-08-05 13:27:42 公開日:2024-08-02

# RAGEval:シナリオ固有のRAG評価データセット生成フレームワーク

RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework ( http://arxiv.org/abs/2408.01262v1 )

ライセンス: Link先を確認

Kunlun Zhu, Yifan Luo, Dingling Xu, Ruobing Wang, Shi Yu, Shuo Wang, Yukun Yan, Zhenghao Liu, Xu Han, Zhiyuan Liu, Maosong Sun,

(参考訳) 検索拡張生成(Retrieval-Augmented Generation, RAG)システムは、大規模言語モデル(LLM)の幻覚を緩和するうえで有利であることが示されている。既存のRAGベンチマークは主に、LLMが一般知識に正しく答えられるかどうかの評価に焦点を当てている。しかし、それらは異なる垂直ドメインのデータを扱う際のRAGシステムの有効性を評価できない。本稿では、異なるシナリオにおける各種LLMの知識活用能力を評価するための評価データセットを自動生成するフレームワークRAGEvalを紹介する。具体的には、RAGEvalはシード文書からスキーマを要約し、構成を適用して多様な文書を生成し、記事と構成の両方に基づいて質問応答ペアを構築する。LLMが生成した応答を慎重に評価するため、完全性(Completeness)、幻覚(Hallucination)、無関連性(Irrelevance)の3つの新しい指標を提案する。RAGEvalは垂直ドメインでRAGモデルをベンチマークすることで、LLMの知識活用能力をより適切に評価でき、既存のQAデータセットにおいて回答の知識がパラメータ化された記憶と検索のどちらに由来するのかという混乱を回避できる。

Retrieval-Augmented Generation (RAG) systems have demonstrated their advantages in alleviating the hallucination of Large Language Models (LLMs). Existing RAG benchmarks mainly focus on evaluating whether LLMs can correctly answer the general knowledge. However, they are unable to evaluate the effectiveness of the RAG system in dealing with the data from different vertical domains. This paper introduces RAGEval, a framework for automatically generating evaluation datasets to evaluate the knowledge usage ability of different LLMs in different scenarios. Specifically, RAGEval summarizes a schema from seed documents, applies the configurations to generate diverse documents, and constructs question-answering pairs according to both articles and configurations. We propose three novel metrics, Completeness, Hallucination, and Irrelevance, to carefully evaluate the responses generated by LLMs. By benchmarking RAG models in vertical domains, RAGEval has the ability to better evaluate the knowledge usage ability of LLMs, which avoids the confusion regarding the source of knowledge when answering questions in existing QA datasets--whether it comes from parameterized memory or retrieval.

翻訳日:2024-08-05 13:27:42 公開日:2024-08-02

# 仮想CAT:スイスの義務教育におけるアルゴリズム的思考評価ツール

The virtual CAT: A tool for algorithmic thinking assessment in Swiss compulsory education ( http://arxiv.org/abs/2408.01263v1 )

ライセンス: Link先を確認

Giorgia Adorni, Alberto Piatti,

(参考訳) 今日のデジタル時代において、アルゴリズム的思考(AT)スキルを持つことは、コンピュータサイエンス関連分野に限らず重要である。これらの能力により、個人は複雑な問題をより扱いやすいステップに分解し、それを解決するための一連の行動を組み立てることができる。教育現場におけるAT評価の需要の高まりと現行手法の限界に対処するため、本稿では、スイスの義務教育におけるアルゴリズム的スキルの評価を目的としたアンプラグド評価活動のデジタル版である仮想クロスアレイタスク(CAT)を紹介する。このツールはスケーラブルで自動化された評価を提供し、人手の関与を減らし、データ収集における潜在的な誤りを軽減する。このプラットフォームはジェスチャーベースと視覚的ブロックベースのプログラミングインタフェースを備えており、多様な学習者にとっての使いやすさを確保し、さらに多言語対応によって支えられている。仮想CATプラットフォームを評価するため、スイスで多様な生徒集団を対象とした試行評価を実施した。その結果、年齢、発達段階、教育的背景の異なる生徒のATスキルを評価するうえでのプラットフォームの使いやすさ、習熟度、適合性、ならびに大規模データ収集の実現可能性が示された。

In today's digital era, holding algorithmic thinking (AT) skills is crucial, not only in computer science-related fields. These abilities enable individuals to break down complex problems into more manageable steps and create a sequence of actions to solve them. To address the increasing demand for AT assessments in educational settings and the limitations of current methods, this paper introduces the virtual Cross Array Task (CAT), a digital adaptation of an unplugged assessment activity designed to evaluate algorithmic skills in Swiss compulsory education. This tool offers scalable and automated assessment, reducing human involvement and mitigating potential data collection errors. The platform features gesture-based and visual block-based programming interfaces, ensuring its usability for diverse learners, further supported by multilingual capabilities. To evaluate the virtual CAT platform, we conducted a pilot evaluation in Switzerland involving a heterogeneous group of students. The findings show the platform's usability, proficiency and suitability for assessing AT skills among students of diverse ages, development stages, and educational backgrounds, as well as the feasibility of large-scale data collection.

翻訳日:2024-08-05 13:27:42 公開日:2024-08-02

# 浮遊ナノ粒子の量子非局在化

Quantum Delocalization of a Levitated Nanoparticle ( http://arxiv.org/abs/2408.01264v1 )

ライセンス: Link先を確認

Massimiliano Rossi, Andrei Militaru, Nicola Carlon Zambon, Andreu Riera-Campeny, Oriol Romero-Isart, Martin Frimmer, Lukas Novotny,

(参考訳) 量子物理学によれば、質量を持つすべての粒子は波のように振る舞う。しかし、この特徴的な波動性は、原子や分子のような微視的な系を用いた二重スリット実験でしか観測されていない。重要な点は、これらの系の運動を記述する波動関数が、系自体の大きさよりはるかに大きい、スリット間隔に匹敵する距離にわたってコヒーレントに広がっていることである。より質量が大きく複雑な物体をこのような状態に準備することは、依然として未解決の大きな課題である。固体振動子の運動は現在では単一量子のレベルで制御できるが、そのコヒーレンス長はゼロ点運動と同程度にとどまり、原子以下の距離に制限されている。本研究では、ゼロ点運動を超えるコヒーレンス長を持つ、浮遊した固体ナノ球の非局在状態を作成する。まず、その運動を基底状態まで冷却する。次に、閉じ込めポテンシャルの剛性を変調することで、付加雑音を最小限に抑えつつ、初期コヒーレンス長を3倍以上に増大させる。光学浮遊は、他の機械的プラットフォームに欠けている閉じ込めの制御を可能にする。本研究は、巨視的量子実験にとって重要な領域である物体サイズに匹敵する非局在化スケールの生成、そして浮遊粒子を用いた量子増強力センシングに向けた足がかりである。

Every massive particle behaves like a wave, according to quantum physics. Yet, this characteristic wave nature has only been observed in double-slit experiments with microscopic systems, such as atoms and molecules. The key aspect is that the wavefunction describing the motion of these systems extends coherently over a distance comparable to the slit separation, much larger than the size of the system itself. Preparing these states of more massive and complex objects remains an outstanding challenge. While the motion of solid-state oscillators can now be controlled at the level of single quanta, their coherence length remains comparable to the zero-point motion, limited to subatomic distances. Here, we prepare a delocalized state of a levitating solid-state nanosphere with coherence length exceeding the zero-point motion. We first cool its motion to the ground state. Then, by modulating the stiffness of the confinement potential, we achieve more than a threefold increment of the initial coherence length with minimal added noise. Optical levitation gives us the necessary control over the confinement that other mechanical platforms lack. Our work is a stepping stone towards the generation of delocalization scales comparable to the object size, a crucial regime for macroscopic quantum experiments, and towards quantum-enhanced force sensing with levitated particles.

翻訳日:2024-08-05 13:17:55 公開日:2024-08-02

# 非エルミート系における不純物誘起カウンター表皮効果と線形モード

Impurity-induced counter skin-effect and linear modes in non-Hermitian systems ( http://arxiv.org/abs/2408.01265v1 )

ライセンス: Link先を確認

Nico G. Leumer, Dario Bercioux,

(参考訳) 非相反格子系は最も単純な非エルミート系の一つであり、エルミート系には見られないいくつかの重要な特徴を示す。本研究では、不純物を含む波多野・ネルソン模型を調べ、不純物が系固有の非エルミート表皮効果にどのように影響するかを明らかにする。不純物の位置や強度にかかわらず、開境界条件と周期境界条件の両方の下でこの問題の厳密な解析解を示す。この厳密解は数値シミュレーションによって十分に検証されている。解析により、非相反ホッピングパラメータによって決まる特定の不純物強度が、不純物サイトに特有の表皮状態を誘起するという特徴的な現象が明らかになった。この不純物状態は、境界に起因する表皮効果を打ち消す表皮効果を示し、我々はこの現象を不純物誘起カウンター表皮効果と呼ぶ。これらの発見は、不純物を含む非エルミート系のダイナミクスに対する前例のない知見を与え、不純物と系の非相反性との複雑な相互作用を解明する。さらに、我々の結果は、不純物誘起カウンター表皮効果の特有の性質を感度と精度の向上に活用できる量子センシングへの応用可能性を示唆している。

Non-reciprocal lattice systems are among the simplest non-Hermitian systems, exhibiting several key features absent in their Hermitian counterparts. In this study, we investigate the Hatano-Nelson model with an impurity and unveil how the impurity influences the intrinsic non-Hermitian skin effect of the system. We present an exact analytical solution to the problem under both open and periodic boundary conditions, irrespective of the impurity's position and strength. This exact solution is thoroughly validated by numerical simulations. Our analysis reveals a distinctive phenomenon where a specific impurity strength, determined by the non-reciprocal hopping parameters, induces a unique skin state at the impurity site. This impurity state exhibits a skin effect that counterbalances the boundary-induced skin effect, a phenomenon we term the impurity-induced counter skin-effect. These findings offer unprecedented insights into the dynamics of non-Hermitian systems with impurities, elucidating the complex interplay between impurities and the system's non-reciprocal nature. Furthermore, our results suggest potential applications in quantum sensing, where the unique characteristics of the impurity-induced counter skin-effect could be harnessed for enhanced sensitivity and precision.
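The clean Hatano-Nelson chain already shows the boundary-induced skin effect that the impurity state counteracts. Take right hopping $t_R$ and left hopping $t_L$. The similarity transform $S = \mathrm{diag}(r^j)$ with $r = \sqrt{t_R/t_L}$ maps the open chain onto a Hermitian chain with hopping $\sqrt{t_R t_L}$. Each open-boundary eigenvector is therefore a standing wave multiplied by $r^j$, and its weight piles up at one edge. The sketch below checks this numerically with the standard library. It covers only the impurity-free model, and the hopping convention and function names are our assumptions.

```python
import math

def hatano_nelson_obc(n, t_right, t_left):
    """Open-chain Hatano-Nelson Hamiltonian: H[j+1][j] = t_right, H[j][j+1] = t_left."""
    H = [[0.0] * n for _ in range(n)]
    for j in range(n - 1):
        H[j + 1][j] = t_right
        H[j][j + 1] = t_left
    return H

def similarity_transform(H, r):
    """Return S^-1 H S for S = diag(r**j); element (j, k) picks up r**(k - j)."""
    n = len(H)
    return [[H[j][k] * r ** (k - j) for k in range(n)] for j in range(n)]

def skin_mode(n, t_right, t_left, m):
    """m-th open-boundary eigenpair: a standing wave times the skin factor r**j."""
    r = math.sqrt(t_right / t_left)
    energy = 2 * math.sqrt(t_right * t_left) * math.cos(m * math.pi / (n + 1))
    vec = [r ** j * math.sin(m * math.pi * (j + 1) / (n + 1)) for j in range(n)]
    return energy, vec

n, t_right, t_left = 6, 2.0, 0.5
H = hatano_nelson_obc(n, t_right, t_left)
H_sym = similarity_transform(H, math.sqrt(t_right / t_left))
E, v = skin_mode(n, t_right, t_left, m=1)
Hv = [sum(H[j][k] * v[k] for k in range(n)) for j in range(n)]
residual = max(abs(a - E * b) for a, b in zip(Hv, v))
print(H_sym[1][0], H_sym[0][1])   # 1.0 1.0: symmetric hopping sqrt(t_R * t_L)
print(residual)                   # close to zero: (E, v) is an eigenpair of H
print(abs(v[-1]) > abs(v[0]))     # True: the mode is localized at the right edge
```

With $t_R > t_L$ we get $r > 1$, so every open-boundary mode grows toward the right edge. Swapping the hoppings moves the accumulation to the left edge.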

翻訳日:2024-08-05 13:17:55 公開日:2024-08-02

# 語彙的豊かさに基づくテキストから3D生成のための3D GS初期化を強化する汎用フレームワーク

A General Framework to Boost 3D GS Initialization for Text-to-3D Generation by Lexical Richness ( http://arxiv.org/abs/2408.01269v1 )

ライセンス: Link先を確認

Lutao Jiang, Hangyu Li, Lin Wang,

(参考訳) テキストからの3Dコンテンツ生成は、特に3Dガウススプラッティングの普及に伴い、近年大きな注目を集めている。一般に、GSベースの手法は初期化とレンダリング最適化という2つの主要な段階から構成される。初期化を行うために、既存研究はランダムな球による初期化や、Point-Eなどの3D拡散モデルを直接適用して初期形状を導出している。しかし、このような戦略は2つの重大かつ困難な問題を抱えている。1) 学習後も最終形状が初期形状と似たままである。2) 「a dog」のような単純なテキストからしか形状を生成できず、「a dog is sitting on the top of the airplane」のような語彙的により豊かなテキストには対応できない。これらの問題に対処するため、本稿では、語彙的豊かさに基づいてテキストから3D生成のための3D GS初期化を強化する新しい汎用フレームワークを提案する。我々の鍵となるアイデアは、3Dガウシアンを空間的に均一なボクセルに集約して複雑な形状を表現すると同時に、3Dガウシアン間の空間的相互作用と、ガウシアンとテキスト間の意味的相互作用を可能にすることである。具体的には、まずボクセル化表現を構築する。各ボクセルは位置、スケール、回転を固定した3Dガウシアンを1つ保持し、不透明度をその位置の占有を決定する唯一の要素とする。次に、主に2つの新しいコンポーネント、1) Global Information Perception(GIP)ブロックと 2) Gaussians-Text Fusion(GTF)ブロックからなる初期化ネットワークを設計する。この設計により、各3Dガウシアンは他の領域からの空間情報とテキストからの意味情報を取り込むことができる。語彙的に単純、中程度、難しいテキストを用いた大規模な実験により、Shap-Eなどの既存手法に対して、本フレームワークによる高品質な3D GS初期化の優位性が示された。また、本フレームワークはLucidDreamerなどの最先端(SoTA)学習フレームワークにシームレスに組み込むことができ、意味的に一貫したテキストから3D生成を実現する。

Text-to-3D content creation has recently received much attention, especially with the prevalence of 3D Gaussian Splatting. In general, GS-based methods comprise two key stages: initialization and rendering optimization. To achieve initialization, existing works directly apply random sphere initialization or 3D diffusion models, e.g., Point-E, to derive the initial shapes. However, such strategies suffer from two critical yet challenging problems: 1) the final shapes are still similar to the initial ones even after training; 2) shapes can be produced only from simple texts, e.g., "a dog", not for lexically richer texts, e.g., "a dog is sitting on the top of the airplane". To address these problems, this paper proposes a novel general framework to boost the 3D GS Initialization for text-to-3D generation upon the lexical richness. Our key idea is to aggregate 3D Gaussians into spatially uniform voxels to represent complex shapes while enabling the spatial interaction among the 3D Gaussians and semantic interaction between Gaussians and texts. Specifically, we first construct a voxelized representation, where each voxel holds a 3D Gaussian with its position, scale, and rotation fixed while setting opacity as the sole factor to determine a position's occupancy. We then design an initialization network mainly consisting of two novel components: 1) Global Information Perception (GIP) block and 2) Gaussians-Text Fusion (GTF) block. Such a design enables each 3D Gaussian to assimilate the spatial information from other areas and semantic information from texts. Extensive experiments show the superiority of our framework of high-quality 3D GS initialization against the existing methods, e.g., Shap-E, by taking lexically simple, medium, and hard texts. Also, our framework can be seamlessly plugged into SoTA training frameworks, e.g., LucidDreamer, for semantically consistent text-to-3D generation.

翻訳日:2024-08-05 13:17:55 公開日:2024-08-02

# ニューラル制御ODEにおける認証付きロバスト不変ポリトープの学習

Certified Robust Invariant Polytope Training in Neural Controlled ODEs ( http://arxiv.org/abs/2408.01273v1 )

ライセンス: Link先を確認

Akash Harapanahalli, Samuel Coogan,

(参考訳) 本研究では、外乱を受ける常微分方程式としてモデル化され、状態フィードバック制御器がフィードフォワードニューラルネットワークとしてパラメータ化された非線形制御系を考える。ポリトープ内で初期化された任意の軌道が外乱にかかわらずそのポリトープ内にとどまるような、認証付きのロバストな前方不変ポリトープを持つ制御器を学習するためのフレームワークを提案する。まず、高次元空間において持ち上げた制御系の族をパラメータ化する。このとき、元のニューラル制御系は各持ち上げ系の不変部分空間上で発展する。さらに、区間解析とニューラルネットワーク検証器を用いて、この不変部分空間に関する知識を注意深く取り込んだ持ち上げ埋め込み系の族を構築する。いずれかの持ち上げ埋め込み系のベクトル場が1点で符号制約を満たせば、元の系のある凸ポリトープはロバストに前方不変となる。ニューラルネットワーク制御器と持ち上げ系のパラメータを変数として扱い、閉ループ制御系において認証付き前方不変ポリトープを持つ制御器を学習するアルゴリズムを提案する。2つの例を通して、符号制約の単純さにより、本手法がシステム次元に関して50を超える状態までスケールし、実行時間において最先端のリアプノフ関数ベースのサンプリング手法を上回ることを示す。

We consider a nonlinear control system modeled as an ordinary differential equation subject to disturbance, with a state feedback controller parameterized as a feedforward neural network. We propose a framework for training controllers with certified robust forward invariant polytopes, where any trajectory initialized inside the polytope remains within the polytope, regardless of the disturbance. First, we parameterize a family of lifted control systems in a higher dimensional space, where the original neural controlled system evolves on an invariant subspace of each lifted system. We use interval analysis and neural network verifiers to further construct a family of lifted embedding systems, carefully capturing the knowledge of this invariant subspace. If the vector field of any lifted embedding system satisfies a sign constraint at a single point, then a certain convex polytope of the original system is robustly forward invariant. Treating the neural network controller and the lifted system parameters as variables, we propose an algorithm to train controllers with certified forward invariant polytopes in the closed-loop control system. Through two examples, we demonstrate how the simplicity of the sign constraint allows our approach to scale with system dimension to over $50$ states, and outperform state-of-the-art Lyapunov-based sampling approaches in runtime.

翻訳日:2024-08-05 13:17:55 公開日:2024-08-02

# ウェーブマンバ:超高精細低光画像強調のためのウェーブレット状態空間モデル

Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement ( http://arxiv.org/abs/2408.01276v1 )

ライセンス: Link先を確認

Wenbin Zou, Hongxia Gao, Weipeng Yang, Tongtong Liu,

(参考訳) 超高精細(UHD)技術は、その卓越した視覚品質により広く注目を集めているが、低照度画像強調(LLIE)技術に新たな課題ももたらしている。UHD画像は本質的に計算複雑性が高いため、既存のUHD LLIE手法は計算コストを削減するために高倍率のダウンサンプリングを用いており、その結果として情報損失が生じている。ウェーブレット変換は、情報を失わずにダウンサンプリングできるだけでなく、画像のコンテンツとノイズを分離する。これにより、状態空間モデル(SSM)は長い系列をモデル化する際にノイズの影響を受けずに済み、SSMの長系列モデリング能力を最大限に活用できる。これに基づき、本研究ではウェーブレット領域から得られる2つの重要な知見に基づく新しい手法Wave-Mambaを提案する。1) 画像のコンテンツ情報の大部分は低周波成分に存在し、高周波成分に含まれる量は少ない。2) 高周波成分は低照度強調の結果に与える影響が最小限である。具体的には、UHD画像上のグローバルなコンテンツ情報を効率的にモデル化するため、低周波サブバンドの情報復元に焦点を当てるようSSMを改良した低周波状態空間ブロック(LFSSBlock)を提案する。さらに、高周波サブバンド情報に対して、強調された低周波情報を用いて高周波情報を補正し、正しい高周波の詳細を効果的に復元する高周波強調ブロック(HFEBlock)を提案する。包括的な評価により、本手法はより簡潔なアーキテクチャを維持しつつ優れた性能を示し、現在の主要な手法を大きく上回った。コードは https://github.com/AlexZou14/Wave-Mamba で入手できる。

Ultra-high-definition (UHD) technology has attracted widespread attention due to its exceptional visual quality, but it also poses new challenges for low-light image enhancement (LLIE) techniques. UHD images inherently possess high computational complexity, leading existing UHD LLIE methods to employ high-magnification downsampling to reduce computational costs, which in turn results in information loss. The wavelet transform not only allows downsampling without loss of information, but also separates the image content from the noise. It enables state space models (SSMs) to avoid being affected by noise when modeling long sequences, thus making full use of the long-sequence modeling capability of SSMs. On this basis, we propose Wave-Mamba, a novel approach based on two pivotal insights derived from the wavelet domain: 1) most of the content information of an image exists in the low-frequency component, less in the high-frequency component. 2) The high-frequency component exerts a minimal influence on the outcomes of low-light enhancement. Specifically, to efficiently model global content information on UHD images, we propose a low-frequency state space block (LFSSBlock) by improving SSMs to focus on restoring the information of low-frequency sub-bands. Moreover, we propose a high-frequency enhance block (HFEBlock) for high-frequency sub-band information, which uses the enhanced low-frequency information to correct the high-frequency information and effectively restore the correct high-frequency details. Through comprehensive evaluation, our method has demonstrated superior performance, significantly outshining current leading techniques while maintaining a more streamlined architecture. The code is available at https://github.com/AlexZou14/Wave-Mamba.
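The simplest case of the claim that a wavelet transform downsamples without losing information is a one-level 2-D Haar transform, which can be checked directly. It splits an H×W image into four H/2×W/2 sub-bands, with LL holding the low-frequency content. The sub-bands together are exactly as large as the input, and the inverse reconstructs the image exactly. This is a minimal standard-library sketch. The Haar wavelet and the averaging normalization are our assumptions, not necessarily the choices made in Wave-Mamba.

```python
def haar_rows(m):
    """One Haar step along each row: (pairwise averages, pairwise half-differences)."""
    lo = [[(r[i] + r[i + 1]) / 2 for i in range(0, len(r), 2)] for r in m]
    hi = [[(r[i] - r[i + 1]) / 2 for i in range(0, len(r), 2)] for r in m]
    return lo, hi

def unhaar_rows(lo, hi):
    """Invert haar_rows: each (a, d) pair restores the samples a + d and a - d."""
    out = []
    for a_row, d_row in zip(lo, hi):
        row = []
        for a, d in zip(a_row, d_row):
            row += [a + d, a - d]
        out.append(row)
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def haar_cols(m):
    """One Haar step along each column, via transposition."""
    lo, hi = haar_rows(transpose(m))
    return transpose(lo), transpose(hi)

def unhaar_cols(lo, hi):
    return transpose(unhaar_rows(transpose(lo), transpose(hi)))

def haar2d(img):
    """Return the LL, LH, HL, HH sub-bands, each half the height and width of img."""
    L, H = haar_rows(img)
    LL, LH = haar_cols(L)
    HL, HH = haar_cols(H)
    return LL, LH, HL, HH

def ihaar2d(LL, LH, HL, HH):
    return unhaar_rows(unhaar_cols(LL, LH), unhaar_cols(HL, HH))

img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
bands = haar2d(img)
print(bands[0])                 # LL: [[3.5, 5.5], [11.5, 13.5]], the 2x2 block means
print(ihaar2d(*bands) == img)   # True: the downsampling is lossless
```

The LL band is an ordinary half-resolution image, so a model can run its expensive global processing there while the other three bands keep the detail needed for exact reconstruction.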

翻訳日:2024-08-05 13:17:55 公開日:2024-08-02

# 共鳴原理を超えたタイプIIポンピング:エネルギー則から幾何学的規則へ

Type-II pumping beyond resonance principle: From energetic to geometric rules ( http://arxiv.org/abs/2408.01282v1 )

ライセンス: Link先を確認

B. Q. Song, J. D. H. Smith, Y. X. Yao, J. Wang,

(参考訳) 従来、ポンピングはエネルギー共鳴に依存している。すなわち、エネルギー量子 ${\hbar}{\omega}$ がギャップ $\Delta$ に一致する。線形近似の下では、これはフェルミの黄金律(FGR)として知られている。しかし、この原理は ${\omega},{\Delta}{\rightarrow}0$ が同時に成り立つ「0/0」極限では適用が難しくなる。トポロジカル相転移(TPT)のような「0/0」のシナリオでは、FGRに支配されるタイプIとは区別される、幾何学的規則に従うタイプIIのポンピング、すなわち幾何学的ポンピング(GP)が認められる。タイプIは「エネルギーの矢」を特徴とし、粒子をより高いエネルギーへ送る。これは、FGRがフェルミ分布 $f_v-f_c$(価電子帯と伝導帯の占有確率)に依存することに反映されている。一方、GPは方向性を持たず、その確率は代わりに $f_v+f_c-2f_v f_c$ に依存し、これが検出の鍵となる特徴である。本研究では、(1) GPの概念、(2) その分数性、不可逆性、TPTへの依存性という特徴、(3) ZrTe$_5$ のコヒーレントフォノン駆動における超高速分光による実験的検出、について論じる。

Conventionally, pumping relies on energetic resonance: energy quanta ${\hbar}{\omega}$ matches the gap $\Delta$. Under linear approximation, this is known as the Fermi golden rule (FGR). However, this principle becomes challenging to apply in the "0/0" limit, where ${\omega},{\Delta}{\rightarrow}0$ simultaneously. In "0/0" scenarios, such as topological phase transition (TPT), a type-II pumping, geometric pumping (GP), is recognized subject to geometric rules, distinguished from type-I dictated by FGR. Type-I features an "arrow of energy", sending particles higher in energy, reflected by FGR's dependence on Fermi distribution $f_v-f_c$ (probabilities of valence and conduction bands). While GP is non-directional, its probability relies on $f_v+f_c-2f_v f_c$ instead, a key signature for detection. In this work, we address: (1) the concept of GP; (2) its features of fractionality, irreversibility, and dependence on TPT; (3) experimental detection with ultra-fast spectrum in coherent phonon driving of ZrTe$_5$.
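The two occupation factors quoted above can be compared directly. The identity $f_v+f_c-2f_vf_c = f_v(1-f_c)+f_c(1-f_v)$ shows that, if the two occupations are treated as independent, the geometric factor is the probability that exactly one of the two bands is occupied. It therefore stays finite when $f_v=f_c$, whereas the golden-rule factor $f_v-f_c$ vanishes. The sketch below evaluates both factors with Fermi-Dirac occupations for a gap centred on the chemical potential. The parameter names and that symmetric band placement are our assumptions, and the code computes only these factors, not a pumping rate.

```python
import math

def fermi(energy, mu=0.0, kT=1.0):
    """Fermi-Dirac occupation."""
    return 1.0 / (math.exp((energy - mu) / kT) + 1.0)

def occupation_factors(gap, kT=1.0):
    """Return (f_v - f_c, f_v + f_c - 2 f_v f_c) for bands at -gap/2 and +gap/2
    around mu = 0 (assumed symmetric placement)."""
    f_v, f_c = fermi(-gap / 2, kT=kT), fermi(gap / 2, kT=kT)
    return f_v - f_c, f_v + f_c - 2 * f_v * f_c

for gap in (10.0, 1.0, 0.0):
    print(gap, occupation_factors(gap))
# As the gap closes, the type-I (FGR) factor tends to 0 while the geometric factor tends to 0.5.
```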

翻訳日:2024-08-05 13:17:55 公開日:2024-08-02

# 人間行動認識のための自動データプルーニングを備えた小型教師ありODLコア

A Tiny Supervised ODL Core with Auto Data Pruning for Human Activity Recognition ( http://arxiv.org/abs/2408.01283v1 )

ライセンス: Link先を確認

Hiroki Matsutani, Radu Marculescu,

(参考訳) 本稿では、人間行動認識のための入力データの分布シフトに対処できる、低コスト・低消費電力の小型教師ありオンデバイス学習(ODL)コアを紹介する。リソースが限られたエッジデバイス向けのODLは近年研究されているが、実行時にこれらのデバイスへ学習ラベルをどのように提供するかは未解決の問題である。この問題に対処するため、自動データプルーニングと教師ありODLを組み合わせることで、近くの教師デバイスから予測ラベルを取得するために必要なクエリ数を削減し、モデル再学習時の消費電力を節約することを提案する。データプルーニングのしきい値は自動的に調整されるため、手動でのしきい値調整が不要になる。人間行動認識向けの数mW級のtinyMLソリューションとして、45nm CMOSプロセス技術を用いて、自動データプルーニングをサポートする教師ありODLコアを設計する。コアに必要なメモリサイズは同形状の多層パーセプトロン(MLP)よりも小さく、消費電力はわずか3.39mWであることを示す。人間行動認識データセットを用いた実験では、提案する自動データプルーニングにより、精度低下をわずか0.9%に抑えつつ、通信量を55.7%削減し、それに応じて消費電力も削減できることが示された。

In this paper, we introduce a low-cost and low-power tiny supervised on-device learning (ODL) core that can address the distributional shift of input data for human activity recognition. Although ODL for resource-limited edge devices has been studied recently, how exactly to provide the training labels to these devices at runtime remains an open issue. To address this problem, we propose to combine an automatic data pruning with supervised ODL to reduce the number of queries needed to acquire predicted labels from a nearby teacher device and thus save power consumption during model retraining. The data pruning threshold is automatically tuned, eliminating a manual threshold tuning. As a tinyML solution at a few mW for the human activity recognition, we design a supervised ODL core that supports our automatic data pruning using a 45nm CMOS process technology. We show that the required memory size for the core is smaller than the same-shaped multilayer perceptron (MLP) and the power consumption is only 3.39mW. Experiments using a human activity recognition dataset show that the proposed automatic data pruning reduces the communication volume by 55.7% and power consumption accordingly with only 0.9% accuracy loss.

翻訳日:2024-08-05 13:17:55 公開日:2024-08-02

# オーディオビジュアル一般化ゼロショット学習のための分布外検出:汎用フレームワーク

Out-Of-Distribution Detection for Audio-visual Generalized Zero-Shot Learning: A General Framework ( http://arxiv.org/abs/2408.01284v1 )

ライセンス: Link先を確認

Liuyuan Wen,

(参考訳) 一般化ゼロショット学習(GZSL)は、既知クラスと未知クラスの両方を正確に分類する必要がある困難なタスクである。この領域の中でも、視覚特徴と音響特徴の両方をマルチモーダル入力として含む音声視覚GZSLは、非常に興味深い一方で難しいタスクである。この分野の既存研究は、主に埋め込みベースの手法か生成ベースの手法のいずれかを利用している。しかし、生成的な訓練は困難で不安定であり、埋め込みベースの手法はドメインシフト問題に直面することが多い。そこで、両手法を統一的なフレームワークに統合し、それぞれの欠点を軽減しつつ利点を活用することが有望であると考えた。本研究では、両アプローチの強みを活かすことを目的として、分布外(OOD)検出を用いた汎用フレームワークを提案する。まず、敵対的生成ネットワークを用いて未知クラスの特徴を合成し、既知クラスおよび未知クラスの分類器とともにOOD検出器を訓練できるようにする。この検出器はテスト特徴が既知クラスと未知クラスのどちらに属するかを判定し、その後、特徴の種類ごとに別々の分類器を用いて分類を行う。3つの代表的な音声視覚データセットでフレームワークを評価し、既存の最先端手法と比較して大幅な改善を確認した。コードはhttps://github.com/liuyuan-wen/AV-OOD-GZSLで公開されている。

Generalized Zero-Shot Learning (GZSL) is a challenging task requiring accurate classification of both seen and unseen classes. Within this domain, Audio-visual GZSL emerges as an extremely exciting yet difficult task, given the inclusion of both visual and acoustic features as multi-modal inputs. Existing efforts in this field mostly utilize either embedding-based or generative-based methods. However, generative training is difficult and unstable, while embedding-based methods often encounter the domain shift problem. Thus, we find it promising to integrate both methods into a unified framework to leverage their advantages while mitigating their respective disadvantages. Our study introduces a general framework employing out-of-distribution (OOD) detection, aiming to harness the strengths of both approaches. We first employ generative adversarial networks to synthesize unseen features, enabling the training of an OOD detector alongside classifiers for seen and unseen classes. This detector determines whether a test feature belongs to seen or unseen classes, followed by classification utilizing separate classifiers for each feature type. We test our framework on three popular audio-visual datasets and observe a significant improvement compared to existing state-of-the-art works. Code is available at https://github.com/liuyuan-wen/AV-OOD-GZSL.
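The two-stage inference described above can be sketched as a simple router. An OOD score decides whether a feature is "seen" or "unseen", and the matching classifier then labels it. In this sketch the nearest-prototype classifiers and the distance-based OOD score are hypothetical toy stand-ins. The paper instead trains its detector and classifiers with GAN-synthesized unseen features, which is not modeled here.

```python
from typing import Callable, Tuple

import numpy as np

Classifier = Callable[[np.ndarray], int]


def gzsl_predict(
    x: np.ndarray,
    ood_score: Callable[[np.ndarray], float],
    seen_clf: Classifier,
    unseen_clf: Classifier,
    threshold: float = 0.5,
) -> Tuple[str, int]:
    """Gate a test feature with an OOD detector, then use the matching classifier."""
    if ood_score(x) > threshold:
        return "unseen", unseen_clf(x)
    return "seen", seen_clf(x)


# Toy stand-ins (hypothetical): nearest-prototype classifiers and a
# distance-based OOD score in [0, 1) instead of a trained detector.
SEEN_PROTOS = np.array([[0.0, 0.0], [1.0, 0.0]])
UNSEEN_PROTOS = np.array([[5.0, 5.0], [6.0, 5.0]])


def nearest_prototype(protos: np.ndarray) -> Classifier:
    """Return a classifier that picks the index of the closest prototype."""
    return lambda x: int(np.argmin(np.linalg.norm(protos - x, axis=1)))


def distance_ood_score(x: np.ndarray) -> float:
    """Grows toward 1 as x moves away from every seen-class prototype."""
    d = float(np.min(np.linalg.norm(SEEN_PROTOS - x, axis=1)))
    return 1.0 - float(np.exp(-d))
```

The design choice this illustrates is separation of concerns. Each branch classifier only competes among its own label set, which is one way to limit the bias toward seen classes that single-classifier GZSL models tend to show.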

翻訳日:2024-08-05 13:17:55 公開日:2024-08-02

# 人間とモデルの測りまちがい:大規模言語モデルにおけるアロケーション・ハームの評価

The Mismeasure of Man and Models: Evaluating Allocational Harms in Large Language Models ( http://arxiv.org/abs/2408.01285v1 )

ライセンス: Link先を確認

Hannah Chen, Yangfeng Ji, David Evans,

(参考訳) 大規模言語モデル(LLM)は現在、採用や臨床判断など重大な意思決定を支援するアプリケーションへの利用が検討され、実際に導入も進んでいる。バイアスを測定する手法はいくつか提案されているが、それらの手法が対象とする予測と、その予測が意思決定にどのように使われるかとの間には依然としてギャップがある。本研究では、LLMの予測におけるバイアスから生じ得る配分上の害(アロケーション・ハーム)を評価する、モデル非依存のバイアス指標であるランク・アロケーション・ベース・バイアス指標(RABBI)を導入する。2つの配分決定タスクにおいて、RABBIと既存のバイアス指標を比較する。10個のLLMにわたる予測妥当性と、モデル選択における有用性を評価した。その結果、平均性能差や分布間距離に基づく一般的なバイアス指標は配分結果におけるグループ間格差を確実には捉えられない一方で、RABBIは配分格差と強い相関を示すことが明らかになった。本研究は、リソースが限られた状況でモデルがどのように使われるかを考慮する必要性を強調するものである。

Large language models (LLMs) are now being considered and even deployed for applications that support high-stakes decision-making, such as recruitment and clinical decisions. While several methods have been proposed for measuring bias, there remains a gap between predictions, which are what the proposed methods consider, and how those predictions are used to make decisions. In this work, we introduce Rank-Allocational-Based Bias Index (RABBI), a model-agnostic bias measure that assesses potential allocational harms arising from biases in LLM predictions. We compare RABBI and current bias metrics on two allocation decision tasks. We evaluate their predictive validity across ten LLMs and their utility for model selection. Our results reveal that commonly used bias metrics based on average performance gap and distribution distance fail to reliably capture group disparities in allocation outcomes, whereas RABBI exhibits a strong correlation with allocation disparities. Our work highlights the need to account for how models are used in resource-constrained contexts.
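The abstract does not give the RABBI formula, so the sketch below does not implement it. It only illustrates the gap the paper points to: an average-score bias metric can report no difference between groups while a top-k allocation based on the same scores is highly unequal. Both functions are hypothetical illustrations. Ties in the ranking are broken by Python's stable sort, which places group "a" first.

```python
from typing import List, Sequence, Tuple


def mean_score_gap(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """Average-performance-style bias: difference of group mean scores."""
    return sum(scores_a) / len(scores_a) - sum(scores_b) / len(scores_b)


def topk_selection_gap(
    scores_a: Sequence[float], scores_b: Sequence[float], k: int
) -> float:
    """Allocation outcome: selection-rate gap when the k top-scored candidates are chosen."""
    pool: List[Tuple[float, str]] = [(s, "a") for s in scores_a] + [
        (s, "b") for s in scores_b
    ]
    # Stable sort: equal scores keep their order (group "a" before group "b").
    pool.sort(key=lambda item: item[0], reverse=True)
    chosen = [group for _, group in pool[:k]]
    return chosen.count("a") / len(scores_a) - chosen.count("b") / len(scores_b)
```

For example, take scores `[1.0, 1.0, 0.0, 0.0]` for group a and `[0.5, 0.5, 0.5, 0.5]` for group b. Both groups have mean 0.5, so the mean gap is 0. Yet if two candidates are selected (k = 2), both come from group a, giving a selection-rate gap of 0.5.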

翻訳日:2024-08-05 13:17:55 公開日:2024-08-02

# 深層学習に基づく視覚的にリッチな文書のコンテンツ理解:サーベイ

Deep Learning based Visually Rich Document Content Understanding: A Survey ( http://arxiv.org/abs/2408.01287v1 )

ライセンス: Link先を確認

Yihao Ding, Jean Lee, Soyeon Caren Han,

(参考訳) 視覚的にリッチな文書(VRD)は、マルチモーダルな情報を含むことから、学術、金融、医療、マーケティングにおいて不可欠である。VRDから情報を抽出する従来の方法は専門知識と手作業に依存しており、コストが高く非効率である。深層学習の登場はこのプロセスに革命をもたらし、視覚・テキスト・レイアウトといったマルチモーダルな情報と事前学習タスクを活用して包括的な文書表現を構築するモデルが導入された。これらのモデルは様々な下流タスクで最先端の性能を達成し、VRDからの情報抽出の効率と精度を大幅に向上させた。本稿では、視覚的にリッチな文書理解(VRDU)に対する需要の高まりと急速な発展を受けて、深層学習ベースのVRDUフレームワークを包括的にレビューする。既存の手法とベンチマークデータセットを体系的に調査・分析し、採用された戦略と下流タスクに基づいて分類する。さらに、特徴表現と融合、モデルアーキテクチャ、事前学習手法に着目してVRDUモデルで用いられる様々な手法を比較し、それぞれの強み、限界、適したシナリオを明らかにする。最後に、VRDUにおける新たな傾向と課題を特定し、今後の研究の方向性と実用的な応用への洞察を提供する。本サーベイは、VRDUの進展を深く理解する助けとなり、学術界と産業界の双方に資することを目的としている。

Visually Rich Documents (VRDs) are essential in academia, finance, medical fields, and marketing due to their multimodal information content. Traditional methods for extracting information from VRDs depend on expert knowledge and manual labor, making them costly and inefficient. The advent of deep learning has revolutionized this process, introducing models that leverage multimodal information (vision, text, and layout) along with pretraining tasks to develop comprehensive document representations. These models have achieved state-of-the-art performance across various downstream tasks, significantly enhancing the efficiency and accuracy of information extraction from VRDs. In response to the growing demands and rapid developments in Visually Rich Document Understanding (VRDU), this paper provides a comprehensive review of deep learning-based VRDU frameworks. We systematically survey and analyze existing methods and benchmark datasets, categorizing them based on adopted strategies and downstream tasks. Furthermore, we compare different techniques used in VRDU models, focusing on feature representation and fusion, model architecture, and pretraining methods, while highlighting their strengths, limitations, and appropriate scenarios. Finally, we identify emerging trends and challenges in VRDU, offering insights into future research directions and practical applications. This survey aims to provide a thorough understanding of VRDU advancements, benefiting both academic and industrial sectors.

翻訳日:2024-08-05 13:17:55 公開日:2024-08-02

# TexGen:マルチビューサンプリングと再サンプリングによるテキストガイド型3Dテクスチャ生成

TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling ( http://arxiv.org/abs/2408.01291v1 )

ライセンス: Link先を確認

Dong Huo, Zixin Guo, Xinxin Zuo, Zhihao Shi, Juwei Lu, Peng Dai, Songcen Xu, Li Cheng, Yee-Hong Yang,

(参考訳) 3Dメッシュが与えられたとき、任意のテキスト記述に対応する3Dテクスチャを合成することを目指す。サンプリングしたビューからテクスチャを生成・統合する現在の手法では、目立つ継ぎ目や過度な平滑化が生じることが多い。これらの課題に対処するため、事前学習済みのテキスト画像拡散モデルを活用した、テクスチャ生成のための新しいマルチビューサンプリング・再サンプリングフレームワークであるTexGenを提案する。ビュー間で一貫したサンプリングを行うため、まず、デノイジングステップによってパラメータ化されたRGB空間のテクスチャマップを保持する。このテクスチャマップを拡散モデルの各サンプリングステップ後に更新することで、ビュー間の不一致を段階的に低減する。また、アテンション誘導型のマルチビューサンプリング戦略を用いて、外観情報をビュー間に伝播させる。テクスチャの詳細を保持するため、ノイズ再サンプリング手法を開発した。この手法は、テキストプロンプトと現在のテクスチャマップに導かれてノイズ推定を補助し、後続のデノイジングステップの入力を生成する。大規模な定性的・定量的評価により、提案手法は多様な3Dオブジェクトに対して、高いビュー一貫性と豊かな外観の詳細を備えた大幅に優れたテクスチャ品質を実現し、最先端手法を上回ることを示す。さらに、提案するテクスチャ生成手法は、元のアイデンティティを保持したままテクスチャ編集にも適用できる。さらなる実験結果はhttps://dong-huo.github.io/TexGen/で公開されている。

Given a 3D mesh, we aim to synthesize 3D textures that correspond to arbitrary textual descriptions. Current methods for generating and assembling textures from sampled views often result in prominent seams or excessive smoothing. To tackle these issues, we present TexGen, a novel multi-view sampling and resampling framework for texture generation leveraging a pre-trained text-to-image diffusion model. For view-consistent sampling, we first maintain a texture map in RGB space that is parameterized by the denoising step and updated after each sampling step of the diffusion model to progressively reduce the view discrepancy. An attention-guided multi-view sampling strategy is exploited to broadcast the appearance information across views. To preserve texture details, we develop a noise resampling technique that aids in the estimation of noise, generating inputs for subsequent denoising steps, as directed by the text prompt and current texture map. Through extensive qualitative and quantitative evaluations, we demonstrate that our proposed method produces significantly better texture quality for diverse 3D objects with a high degree of view consistency and rich appearance details, outperforming current state-of-the-art methods. Furthermore, our proposed texture generation technique can also be applied to texture editing while preserving the original identity. More experimental results are available at https://dong-huo.github.io/TexGen/
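TexGen maintains a shared RGB texture map that is updated after every sampling step so that views stop disagreeing. A minimal way to picture such an update is a visibility-weighted average of the colors each view proposes for each texel. The sketch below shows only that generic blending step. The flattened `(N, 3)` texel layout and the per-view weights are assumptions, and TexGen's attention-guided sampling and noise resampling are not reproduced here.

```python
import numpy as np


def fuse_views_into_texture(
    texture: np.ndarray,
    view_colors: np.ndarray,
    view_weights: np.ndarray,
    eps: float = 1e-8,
) -> np.ndarray:
    """Weighted average of per-view texel colors (generic blending, not TexGen's exact rule).

    texture:      (N, 3) current RGB texture map, one row per texel
    view_colors:  (V, N, 3) color each view proposes for each texel
    view_weights: (V, N) visibility/confidence weight, 0 if the texel is not visible
    """
    w = view_weights[..., None]               # (V, N, 1)
    num = (w * view_colors).sum(axis=0)       # (N, 3) weighted color sum
    den = w.sum(axis=0)                       # (N, 1) total weight per texel
    fused = num / np.maximum(den, eps)
    # Texels that no view sees keep their previous color.
    return np.where(den > 0, fused, texture)
```

In an iterative sampler, a function like this would be called once per denoising step. The updated texture would then be re-rendered into every view, which is what lets the views progressively converge.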

翻訳日:2024-08-05 13:17:55 公開日:2024-08-02

# 3DPX:ハイブリッドMLP-CNNネットワークを用いた2Dから3Dへのプログレッシブ口腔画像再構成

3DPX: Progressive 2D-to-3D Oral Image Reconstruction with Hybrid MLP-CNN Networks ( http://arxiv.org/abs/2408.01292v1 )

ライセンス: Link先を確認

Xiaoshuang Li, Mingyuan Meng, Zimo Huang, Lei Bi, Eduardo Delamare, Dagan Feng, Bin Sheng, Jinman Kim,

(参考訳) パノラマX線(PX)は、入手しやすく低コストであることから、歯科診療で広く用いられているモダリティである。しかし、PXは2D投影画像であるため3Dの解剖学的情報を含まず、歯の角度異常の検出・分類など、3D情報の恩恵を受け得る歯科応用での利用は限られている。この制約に対処するため、近年2D PXから3D構造を直接再構成する研究が行われており、既存手法は主に畳み込みニューラルネットワーク(CNN)による直接的な2Dから3Dへのマッピングに依存している。しかし、これらの手法は深さ軸方向の空間情報を正しく推定できない。さらに、畳み込みカーネルはすぐ近傍のピクセルの情報しか捉えないため、これらの手法は畳み込み演算に固有の局所性によって制限される。本研究では、2Dから3Dへの口腔PX再構成のための、プログレッシブなハイブリッド多層パーセプトロン(MLP)-CNNピラミッドネットワーク(3DPX)を提案する。3DPXでは3D画像を段階的に再構成し、各ピラミッドレベルの中間再構成結果にガイダンスを課すプログレッシブ再構成戦略を導入する。さらに、細粒度の長距離依存関係の捕捉に有望であることが示されている近年のMLPの進歩に着目し、3DPXはMLPとCNNを統合して再構成時の意味理解を向上させる。464件の研究を含む2つの大規模データセットでの広範な実験により、3DPXが単独のMLPやTransformerを含む最先端の2Dから3Dへの口腔再構成手法を再構成品質で上回ることを示した。また、下流の角度異常分類タスクの性能も向上させることを示した。

Panoramic X-ray (PX) is a prevalent modality in dental practice for its wide availability and low cost. However, as a 2D projection image, PX does not contain 3D anatomical information, and therefore has limited use in dental applications that can benefit from 3D information, e.g., tooth angular misalignment detection and classification. Reconstructing 3D structures directly from 2D PX has recently been explored to address this limitation, with existing methods primarily relying on Convolutional Neural Networks (CNNs) for direct 2D-to-3D mapping. These methods, however, are unable to correctly infer depth-axis spatial information. In addition, they are limited by the intrinsic locality of convolution operations, as the convolution kernels only capture the information of immediate neighborhood pixels. In this study, we propose a progressive hybrid Multilayer Perceptron (MLP)-CNN pyramid network (3DPX) for 2D-to-3D oral PX reconstruction. We introduce a progressive reconstruction strategy, where 3D images are progressively reconstructed in the 3DPX with guidance imposed on the intermediate reconstruction result at each pyramid level. Further, motivated by the recent advancement of MLPs that show promise in capturing fine-grained long-range dependency, our 3DPX integrates MLPs and CNNs to improve the semantic understanding during reconstruction. Extensive experiments on two large datasets involving 464 studies demonstrate that our 3DPX outperforms state-of-the-art 2D-to-3D oral reconstruction methods, including standalone MLP and transformers, in reconstruction quality, and also improves the performance of downstream angular misalignment classification tasks.
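"Guidance imposed on the intermediate reconstruction result at each pyramid level" is commonly realized as deep supervision. Each coarse output is compared against a downsampled copy of the ground-truth volume, and the per-level losses are summed. The sketch below assumes mean-squared error, average pooling, and a factor-of-2 resolution step between levels. The abstract does not specify the loss or the pyramid geometry, so treat all three as assumptions.

```python
from typing import Sequence

import numpy as np


def avg_pool3d(volume: np.ndarray, factor: int) -> np.ndarray:
    """Downsample a (D, H, W) volume by non-overlapping mean pooling."""
    d, h, w = volume.shape
    assert d % factor == 0 and h % factor == 0 and w % factor == 0
    return volume.reshape(
        d // factor, factor, h // factor, factor, w // factor, factor
    ).mean(axis=(1, 3, 5))


def pyramid_loss(preds: Sequence[np.ndarray], target: np.ndarray) -> float:
    """Sum of per-level MSE; preds run coarse -> fine, each level doubling resolution."""
    n = len(preds)
    total = 0.0
    for level, pred in enumerate(preds):
        factor = 2 ** (n - 1 - level)  # the finest level compares at full resolution
        gt = target if factor == 1 else avg_pool3d(target, factor)
        total += float(np.mean((pred - gt) ** 2))
    return total
```

Supervising every level keeps the coarse stages anchored to the true 3D geometry. This is one plausible reading of why a progressive strategy helps with the depth-axis ambiguity the abstract mentions.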

翻訳日:2024-08-05 13:17:55 公開日:2024-08-02

# チャネル安定化による水中物体検出の強化

Underwater Object Detection Enhancement via Channel Stabilization ( http://arxiv.org/abs/2408.01293v1 )

ライセンス: Link先を確認

Muhammad Ali, Salman Khan,

(参考訳) 複雑な海洋環境は、物体検出の課題を何倍にも増大させる。海洋ごみは水生生態系を脅かし、根強い課題となっている。海中に堆積したごみを正確に検出することは、この被害を軽減するうえで極めて重要である。本研究では、画像品質の向上と検出手法の評価によって水中物体検出に取り組む。このタスクには、様々なベースモデルと設定を備えたDetectron2のバックボーンを使用する。訓練画像のヘイズや色かぶりを低減してマルチスケール物体検出を改善するため、簡略化した画像強調モデルとともに新しいチャネル安定化手法を提案する。画像処理の後、最適な検出精度を得るために様々なDetectron2バックボーンを検証する。さらに、シャープ化フィルタとデータ拡張手法を適用して物体の輪郭を強調し、認識を容易にする。結果はTrashCanデータセットのインスタンス版とマテリアル版の両方で示す。最も性能の高いバックボーン手法は、提案するチャネル安定化と拡張手法を組み込んだものである。また、Detectron2による検出結果をDeformable Transformerと比較する。TrashCan 1.0のインスタンス版では、ベースラインと比較して、小物体の平均適合率が絶対値で9.53%、バウンディングボックス検出が絶対値で7%向上した。コードは https://github.com/aliman80/Underwater-Object-Detection-via-Channel-Stablization で公開予定である。

The complex marine environment multiplies the challenges of object detection. Marine trash endangers the aquatic ecosystem, presenting a persistent challenge. Accurate detection of marine deposits is crucial for mitigating this harm. Our work addresses underwater object detection by enhancing image quality and evaluating detection methods. We use Detectron2's backbone with various base models and configurations for this task. We propose a novel channel stabilization technique alongside a simplified image enhancement model to reduce haze and color cast in training images, improving multi-scale object detection. Following image processing, we test different Detectron2 backbones for optimal detection accuracy. Additionally, we apply a sharpening filter with augmentation techniques to highlight object profiles for easier recognition. Results are demonstrated on both the instance and material versions of the TrashCan dataset. The best-performing backbone method incorporates our channel stabilization and augmentation techniques. We also compare our Detectron2 detection results with the Deformable Transformer. In the instance version of TrashCan 1.0, our method achieves a 9.53% absolute increase in average precision for small objects and a 7% absolute gain in bounding box detection compared to the baseline. The code will be available at https://github.com/aliman80/Underwater-Object-Detection-via-Channel-Stablization
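The abstract does not describe how the proposed channel stabilization works. To show the kind of preprocessing involved, the sketch below substitutes two standard techniques, and neither is the authors' method. The first is gray-world white balance, which rescales each RGB channel so that all channel means match and thereby reduces the blue-green cast typical of underwater images. The second is a 3x3 sharpening kernel applied with edge padding. Images are assumed to be float arrays in [0, 1] with shape `(H, W, 3)`.

```python
import numpy as np


def gray_world(img: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Gray-world white balance: scale channels so their means become equal."""
    means = img.reshape(-1, 3).mean(axis=0)
    gain = means.mean() / np.maximum(means, eps)
    return np.clip(img * gain, 0.0, 1.0)


def sharpen(img: np.ndarray) -> np.ndarray:
    """Apply a 3x3 sharpening kernel (sums to 1) to each channel with edge padding."""
    kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float)
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    h, w = img.shape[:2]
    out = np.zeros_like(img, dtype=float)
    for dy in range(3):
        for dx in range(3):
            # The kernel is symmetric, so correlation equals convolution here.
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return np.clip(out, 0.0, 1.0)
```

Because the kernel weights sum to 1, flat regions pass through unchanged while edges gain contrast. This matches the stated goal of highlighting object profiles for easier recognition.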

翻訳日:2024-08-05 13:17:55 公開日:2024-08-02

# 特徴時計:二次元プロットにおける高次元効果

Feature Clock: High-Dimensional Effects in Two-Dimensional Plots ( http://arxiv.org/abs/2408.01294v1 )

ライセンス: Link先を確認

Olga Ovcharenko, Rita Sevastjanova, Valentina Boeva,

(参考訳) 人間は高次元データを知覚し解釈することが苦手である。そのため、高次元データは可視化のために2次元へ射影されることが多い。多くの応用は複雑な非線形次元削減手法の恩恵を受けるが、個々の高次元特徴の影響を2次元空間で説明するのは難しい。多くの可視化手法は複数の2次元プロットを用い、それぞれが1つの高次元特徴の影響を2次元上に示す。このアプローチでは、k次元の入力空間に対してk個のプロットを目視で確認する必要が生じる。我々の手法であるFeature Clockは、2次元で表現されたデータ構造に対する元の特徴の影響を把握するために、これらk個のプロットを確認する必要をなくす新しいアプローチを提供する。Feature Clockは埋め込みデータの可視化の説明可能性とコンパクトさを高め、オープンソースのPythonライブラリとして利用できる。

Humans struggle to perceive and interpret high-dimensional data. Therefore, high-dimensional data are often projected into two dimensions for visualization. Many applications benefit from complex nonlinear dimensionality reduction techniques, but the effects of individual high-dimensional features are hard to explain in the two-dimensional space. Most visualization solutions use multiple two-dimensional plots, each showing the effect of one high-dimensional feature in two dimensions; this approach creates a need for a visual inspection of k plots for a k-dimensional input space. Our solution, Feature Clock, provides a novel approach that eliminates the need to inspect these k plots to grasp the influence of original features on the data structure depicted in two dimensions. Feature Clock enhances the explainability and compactness of visualizations of embedded data and is available in an open-source Python library.
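One simple way to replace k separate plots with a single summary is to fit, for every original feature, the direction in the 2-D embedding along which that feature increases. That direction can then be drawn as one "clock hand". The sketch below does this with an ordinary least-squares fit. It is a hypothetical illustration of the idea only: it is not Feature Clock's actual algorithm, and it does not use the API of its Python library.

```python
from typing import Tuple

import numpy as np


def feature_directions(X: np.ndarray, Y: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Linear 'clock hands' for each original feature (hypothetical approach).

    X: (n, k) original features; Y: (n, 2) 2-D embedding of the same points.
    Fits centered X_j ~ centered Y @ c_j by least squares; the unit vector of c_j
    is the plot direction along which feature j increases. Returns (k, 2) unit
    directions and their angles in degrees.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    coef, *_ = np.linalg.lstsq(Yc, Xc, rcond=None)      # (2, k)
    norms = np.linalg.norm(coef, axis=0)
    dirs = (coef / np.where(norms > 0, norms, 1.0)).T   # (k, 2), zero stays zero
    angles = np.degrees(np.arctan2(dirs[:, 1], dirs[:, 0]))
    return dirs, angles
```

A linear fit captures only the dominant trend of each feature over the whole embedding. Non-linear embeddings such as UMAP or t-SNE may need local fits per region to give faithful arrows.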

翻訳日:2024-08-05 13:17:55 公開日:2024-08-02

# 混合整数線形最適化により学習した最適多変量分類木

Optimal Mixed Integer Linear Optimization Trained Multivariate Classification Trees ( http://arxiv.org/abs/2408.01297v1 )

ライセンス: Link先を確認

Brandon Alston, Illya V. Hicks,

(参考訳) 多変量決定木は分類と回帰のための強力な機械学習ツールであり、多くの研究者や業界の専門家を惹きつけている。最適二分木は2種類の頂点を持つ。(i) ちょうど2つの子を持ち、データ点が離散特徴の集合に基づいて評価される分岐頂点と、(ii) データ点に予測が与えられる葉頂点である。最適二分木は、(i) 正しく分類されるデータ点の数を最大化し、(ii) 分岐頂点の数を最小化する二目的最適化問題を解くことで得られる。分岐頂点は訓練特徴の線形結合であるため、超平面とみなすことができる。本稿では、最適二分分類木(葉頂点が離散クラスを割り当てる)を設計するための、カットベースの混合整数線形最適化(MILO)定式化を2つ提案する。提案モデルは極小実行不可能部分系(MIS)をその場で特定し、そこからパッキング制約の形をとる切断平面を導出する。現在の文献で最も強力なフローベースMILO定式化に対する理論的な改善を示す。また、公開データセットを用いた実験により、提案モデルのスケーラビリティ、従来の分枝限定法に対する優位性、サンプル外テスト性能の頑健性を示す。コードとデータはGitHubで公開している。

Multivariate decision trees are powerful machine learning tools for classification and regression that attract many researchers and industry professionals. An optimal binary tree has two types of vertices: (i) branching vertices, which have exactly two children and where datapoints are assessed on a set of discrete features, and (ii) leaf vertices, at which datapoints are given a prediction. Such a tree can be obtained by solving a biobjective optimization problem that seeks to (i) maximize the number of correctly classified datapoints and (ii) minimize the number of branching vertices. Branching vertices are linear combinations of training features and therefore can be thought of as hyperplanes. In this paper, we propose two cut-based mixed integer linear optimization (MILO) formulations for designing optimal binary classification trees (leaf vertices assign discrete classes). Our models leverage on-the-fly identification of minimal infeasible subsystems (MISs), from which we derive cutting planes that take the form of packing constraints. We show theoretical improvements on the strongest flow-based MILO formulation currently in the literature and conduct experiments on publicly available datasets to show our models' ability to scale, their strength against traditional branch-and-bound approaches, and their robustness in out-of-sample test performance. Our code and data are available on GitHub.
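To make the two objectives concrete, the sketch below evaluates a given multivariate binary tree. Each branching vertex tests a hyperplane `a @ x <= b` and sends the point left or right, and each leaf assigns a class. The function returns the number of correctly classified datapoints (to maximize) and the number of branching vertices (to minimize). This only scores a fixed tree. It does not implement the paper's MILO formulations or MIS-based cuts, and the `Vertex` structure is a hypothetical representation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple

import numpy as np


@dataclass
class Vertex:
    """Either a branching vertex (a, b, left, right) or a leaf (label)."""
    a: Optional[np.ndarray] = None   # hyperplane coefficients
    b: float = 0.0                   # hyperplane offset
    left: Optional[int] = None       # child id taken when a @ x <= b
    right: Optional[int] = None      # child id taken when a @ x > b
    label: Optional[int] = None      # class assigned at a leaf


def predict(tree: Dict[int, Vertex], x: np.ndarray, root: int = 0) -> int:
    """Route x from the root through hyperplane tests until a leaf is reached."""
    v = tree[root]
    while v.label is None:
        v = tree[v.left] if float(v.a @ x) <= v.b else tree[v.right]
    return v.label


def objectives(
    tree: Dict[int, Vertex], X: np.ndarray, y: Sequence[int]
) -> Tuple[int, int]:
    """(correctly classified datapoints, branching vertices) for a fixed tree."""
    correct = sum(int(predict(tree, x) == t) for x, t in zip(X, y))
    branching = sum(1 for v in tree.values() if v.label is None)
    return correct, branching
```

In the optimization setting, these two quantities become the objective terms over binary decision variables rather than being computed after the fact.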

翻訳日:2024-08-05 13:17:55 公開日:2024-08-02

# 遠隔超電導量子ビットシステムの完全自己試験

Complete Self-Testing of a System of Remote Superconducting Qubits ( http://arxiv.org/abs/2408.01299v1 )

ライセンス: Link先を確認

Simon Storz, Anatoly Kulikov, Josua D. Schär, Victor Barizien, Xavier Valcarce, Florence Berterottière, Nicolas Sangouard, Jean-Daniel Bancal, Andreas Wallraff,

(参考訳) 自己試験プロトコルは、デバイスに依存しない方法、すなわち試験対象の量子デバイスの内部動作を知ることなく、量子システムを認証することを可能にする。本研究では、大規模量子コンピューティングシステム構築の有力なプラットフォームである超伝導回路を用いて、特性評価手法としてのこの高い基準を実証する。まず、パウリ測定の自己試験を可能にする、これまで欠けていた理論を構築する。次に、ベル対の生成と測定を同時に自己試験し、30m離れて動作する2つのもつれた超伝導回路からなるシステムにおいて完全な自己試験を行う。1700万回の試行に基づく実験で、平均CHSH(Clauser-Horne-Shimony-Holt)S値2.236を測定した。実験装置に関する追加の仮定に頼ることなく、デバイス非依存な方法で、平均ベル状態忠実度が少なくとも58.9%、平均測定忠実度が少なくとも89.5%であることを、いずれも信頼度99%で認証した。これにより、委任量子計算など、超伝導回路による分散量子計算・通信の分野での応用が可能になる。

Self-testing protocols enable the certification of quantum systems in a device-independent manner, i.e., without knowledge of the inner workings of the quantum devices under test. Here, we demonstrate this high standard for characterization routines with superconducting circuits, a prime platform for building large-scale quantum computing systems. We first develop the missing theory allowing for the self-testing of Pauli measurements. We then self-test Bell pair generation and measurements at the same time, performing a complete self-test in a system composed of two entangled superconducting circuits operated at a separation of 30 meters. In an experiment based on 17 million trials, we measure an average CHSH (Clauser-Horne-Shimony-Holt) S-value of 2.236. Without relying on additional assumptions on the experimental setup, we certify an average Bell state fidelity of at least 58.9% and an average measurement fidelity of at least 89.5% in a device-independent manner, both with 99% confidence. This enables applications in the field of distributed quantum computing and communication with superconducting circuits, such as delegated quantum computing.
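The sketch below shows how an S-value connects to a device-independent fidelity statement. It computes correlators from outcome counts, combines them into the CHSH expression, and applies one well-known analytic self-testing bound (Kaniewski, 2016): F >= 1/2 + 1/2 (S - beta*) / (2*sqrt(2) - beta*), with beta* = (16 + 14*sqrt(2))/17, approximately 2.11. Plugging in the reported mean S = 2.236 gives about 0.590, close to the reported 58.9%. The abstract does not say which bound or statistical correction the authors used, so this agreement is suggestive only. Finite-statistics confidence intervals are not modeled here.

```python
import math
from typing import Dict, Tuple


def correlator(counts: Dict[Tuple[int, int], int]) -> float:
    """E = <a*b> from outcome counts, with outcomes a, b in {+1, -1}."""
    total = sum(counts.values())
    return sum(a * b * n for (a, b), n in counts.items()) / total


def chsh_s(e00: float, e01: float, e10: float, e11: float) -> float:
    """CHSH combination; local models give |S| <= 2, quantum mechanics |S| <= 2*sqrt(2)."""
    return e00 + e01 + e10 - e11


def bell_fidelity_lower_bound(s: float) -> float:
    """Kaniewski-type analytic lower bound on the extractable Bell-state fidelity.

    Only informative above beta* ~= 2.11; below that it is clamped to the
    trivial value 1/2. No finite-statistics correction is applied.
    """
    beta_star = (16 + 14 * math.sqrt(2)) / 17
    s_max = 2 * math.sqrt(2)
    return max(0.5, 0.5 + 0.5 * (s - beta_star) / (s_max - beta_star))
```

The ideal correlators E = 1/sqrt(2), 1/sqrt(2), 1/sqrt(2), -1/sqrt(2) give S = 2*sqrt(2) and a certified fidelity of 1. Any S at or below the classical limit of 2 certifies nothing beyond the trivial 1/2.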